From patchwork Thu Mar 20 01:55:51 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Changyuan Lyu
X-Patchwork-Id: 14023316
Date: Wed, 19 Mar 2025 18:55:51 -0700
X-Mailer: git-send-email 2.49.0.rc1.451.g8f38331e32-goog
Subject: [PATCH v5 16/16] Documentation: add documentation for KHO
From: Changyuan Lyu

From: Alexander Graf

With KHO in place, let's add documentation that describes what it is and
how to use it.

Signed-off-by: Alexander Graf
Co-developed-by: Mike Rapoport (Microsoft)
Signed-off-by: Mike Rapoport (Microsoft)
Co-developed-by: Changyuan Lyu
Signed-off-by: Changyuan Lyu
---
 .../admin-guide/kernel-parameters.txt |  25 ++++
 Documentation/kho/concepts.rst        |  70 +++++++++++
 Documentation/kho/fdt.rst             |  62 +++++++++
 Documentation/kho/index.rst           |  14 +++
 Documentation/kho/usage.rst           | 118 ++++++++++++++++++
 Documentation/subsystem-apis.rst      |   1 +
 MAINTAINERS                           |   1 +
 7 files changed, 291 insertions(+)
 create mode 100644 Documentation/kho/concepts.rst
 create mode 100644 Documentation/kho/fdt.rst
 create mode 100644 Documentation/kho/index.rst
 create mode 100644 Documentation/kho/usage.rst

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb8752b42ec8..d715c6d9dbb3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2698,6 +2698,31 @@
 	kgdbwait	[KGDB,EARLY] Stop kernel execution and enter the
 			kernel debugger at the earliest opportunity.
 
+	kho=		[KEXEC,EARLY]
+			Format: { "0" | "1" | "off" | "on" | "y" | "n" }
+			Enables or disables Kexec HandOver.
+			"0" | "off" | "n" - kexec handover is disabled
+			"1" | "on" | "y" - kexec handover is enabled
+
+	kho_scratch=	[KEXEC,EARLY]
+			Format: ll[KMG],mm[KMG],nn[KMG] | nn%
+			Defines the size of the KHO scratch regions. The KHO
+			scratch regions are physically contiguous memory
+			ranges that can only be used for non-kernel
+			allocations. That way, even when memory is heavily
+			fragmented with handed-over memory, the kexeced
+			kernel will always have enough contiguous ranges to
+			bootstrap itself.
+
+			It is possible to specify the exact amount of
+			memory in the form of "ll[KMG],mm[KMG],nn[KMG]"
+			where the first parameter defines the size of a low
+			memory scratch area, the second parameter defines
+			the size of a global scratch area and the third
+			parameter defines the size of additional per-node
+			scratch areas. The form "nn%" defines a scale factor
+			(in percent) of the memory that was used during boot.
+
 	kmac=		[MIPS] Korina ethernet MAC address.
 			Configure the RouterBoard 532 series on-chip
 			Ethernet adapter MAC address.
diff --git a/Documentation/kho/concepts.rst b/Documentation/kho/concepts.rst
new file mode 100644
index 000000000000..174e23404ebc
--- /dev/null
+++ b/Documentation/kho/concepts.rst
@@ -0,0 +1,70 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+.. _concepts:
+
+=======================
+Kexec Handover Concepts
+=======================
+
+Kexec HandOver (KHO) is a mechanism that allows Linux to preserve state -
+arbitrary properties as well as memory locations - across kexec.
+
+It introduces multiple concepts:
+
+KHO State tree
+==============
+
+Every KHO kexec carries a state tree, in the format of a flattened device tree
+(FDT), that describes the state of the system. Device drivers can register with
+KHO to serialize their state before kexec. After the KHO kexec, device drivers
+can read the FDT and extract their previous state.
+
+KHO only uses the FDT container format and the libfdt library, but does not
+adhere to the same property semantics that normal device trees do: properties
+are passed in native endianness, and standardized properties like ``reg`` and
+``ranges`` do not exist, hence there are no ``#...-cells`` properties.
+
+Scratch Regions
+===============
+
+To boot into kexec, we need to have a physically contiguous memory range that
+contains no handed-over memory. Kexec then places the target kernel and initrd
+into that region.
+Until the page allocator is initialized, the new kernel allocates only from it.
+
+We guarantee that we always have such regions through the scratch regions: on
+first boot, KHO allocates several physically contiguous memory regions. Since
+after kexec these regions will be used by early memory allocations, there is a
+scratch region per NUMA node plus a scratch region to satisfy allocation
+requests that do not require a particular NUMA node assignment. By default,
+the size of the scratch regions is calculated based on the amount of memory
+allocated during boot. The ``kho_scratch`` kernel command line option may be
+used to explicitly define the size of the scratch regions. The scratch regions
+are declared as CMA when the page allocator is initialized so that their memory
+can be used during the system lifetime. CMA gives us the guarantee that no
+handover pages land in that region, because handover pages must be at a static
+physical memory location and CMA enforces that only movable pages can
+be located inside.
+
+After a KHO kexec, we ignore the ``kho_scratch`` kernel command line option and
+instead reuse the exact same region that was originally allocated. This allows
+us to recursively execute any number of KHO kexecs. Because we used this region
+for boot memory allocations and as target memory for kexec blobs, some parts
+of that memory region may be reserved. These reservations are irrelevant for
+the next KHO, because kexec can overwrite even the original kernel.
+
+.. _finalization_phase:
+
+KHO finalization phase
+======================
+
+To enable the user space based kexec file loader, the kernel needs to be able
+to provide the FDT that describes its own state before performing the actual
+kexec. The process of generating that FDT is called serialization. When the
+FDT is generated, some properties of the system may become immutable because
+they are already written down in the FDT.
+That state is called the KHO finalization phase.
+
+With the in-kernel kexec file loader, i.e., using the ``kexec_file_load``
+syscall, the KHO FDT is not created until the actual kexec. Thus, the
+finalization phase is much shorter. User space can optionally generate the FDT
+early using the debugfs interface.
diff --git a/Documentation/kho/fdt.rst b/Documentation/kho/fdt.rst
new file mode 100644
index 000000000000..70b508533b77
--- /dev/null
+++ b/Documentation/kho/fdt.rst
@@ -0,0 +1,62 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+=======
+KHO FDT
+=======
+
+KHO uses the flattened device tree (FDT) container format and the libfdt
+library to create and parse the data that is passed between the
+kernels. The properties in the KHO FDT are stored in native format and can
+include any data KHO users need to preserve. Parsing of FDT subnodes is the
+responsibility of KHO users, except for nodes and properties defined by
+KHO itself.
+
+KHO nodes and properties
+========================
+
+Node ``preserved-memory``
+-------------------------
+
+KHO saves a special node named ``preserved-memory`` under the root node.
+This node contains the metadata for KHO to preserve pages across kexec.
+
+Property ``compatible``
+-----------------------
+
+The ``compatible`` property determines compatibility between the kernel
+that created the KHO FDT and the kernel that attempts to load it.
+If the kernel that loads the KHO FDT is not compatible with it, the entire
+KHO process will be bypassed.
+
+Examples
+========
+
+The following example demonstrates a KHO FDT that preserves two memory
+regions created with the ``reserve_mem`` kernel command line parameter::
+
+	/dts-v1/;
+
+	/ {
+		compatible = "kho-v1";
+
+		memblock {
+			compatible = "memblock-v1";
+
+			region1 {
+				compatible = "reserve-mem-v1";
+				start = <0xc07a 0x4000000>;
+				size = <0x01 0x00>;
+			};
+
+			region2 {
+				compatible = "reserve-mem-v1";
+				start = <0xc07b 0x4000000>;
+				size = <0x8000 0x00>;
+			};
+
+		};
+
+		preserved-memory {
+			metadata = <0x00 0x00>;
+		};
+	};
diff --git a/Documentation/kho/index.rst b/Documentation/kho/index.rst
new file mode 100644
index 000000000000..d108c3f8d15c
--- /dev/null
+++ b/Documentation/kho/index.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+========================
+Kexec Handover Subsystem
+========================
+
+.. toctree::
+   :maxdepth: 1
+
+   concepts
+   usage
+   fdt
+
+.. only:: subproject and html
diff --git a/Documentation/kho/usage.rst b/Documentation/kho/usage.rst
new file mode 100644
index 000000000000..b45dc58e8d3f
--- /dev/null
+++ b/Documentation/kho/usage.rst
@@ -0,0 +1,118 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+====================
+Kexec Handover Usage
+====================
+
+Kexec HandOver (KHO) is a mechanism that allows Linux to preserve state -
+arbitrary properties as well as memory locations - across kexec.
+
+This document expects that you are familiar with the base KHO
+:ref:`concepts <concepts>`. If you have not read them yet, please do so now.
+
+Prerequisites
+=============
+
+KHO is available when the ``CONFIG_KEXEC_HANDOVER`` config option is set to y
+at compile time. Every KHO producer may have its own config option that you
+need to enable if you would like to preserve their respective state across
+kexec.
+
+To use KHO, please boot the kernel with the ``kho=on`` command line parameter.
+You may use the ``kho_scratch`` parameter to define the size of the scratch
+regions.
+For example, ``kho_scratch=16M,512M,256M`` reserves a 16 MiB low memory
+scratch area, a 512 MiB global scratch area, and a 256 MiB scratch area per
+NUMA node on boot.
+
+Perform a KHO kexec
+===================
+
+First, before you perform a KHO kexec, you can optionally move the system into
+the :ref:`KHO finalization phase <finalization_phase>` ::
+
+  # echo 1 > /sys/kernel/debug/kho/out/finalize
+
+After this command, the KHO FDT is available in
+``/sys/kernel/debug/kho/out/fdt``.
+
+Next, load the target payload and kexec into it. It is important that you
+use the ``-s`` parameter to select the in-kernel kexec file loader, as user
+space kexec tooling currently has no support for KHO with the user space
+based file loader ::
+
+  # kexec -l Image --initrd=initrd -s
+  # kexec -e
+
+If you skipped finalization in the first step, ``kexec -e`` triggers
+FDT finalization automatically. The new kernel will boot up and contain
+some of the previous kernel's state.
+
+For example, if you used the ``reserve_mem`` command line parameter to create
+an early memory reservation, the new kernel will have that memory at the
+same physical address as the old kernel.
+
+Unfreeze KHO FDT data
+=====================
+
+You can move the system out of the KHO finalization phase by running ::
+
+  # echo 0 > /sys/kernel/debug/kho/out/finalize
+
+After this command, the KHO FDT is no longer available in
+``/sys/kernel/debug/kho/out/fdt``, and the state kept in KHO can be
+modified by other kernel subsystems again.
+
+debugfs Interfaces
+==================
+
+Currently, KHO creates the following debugfs interfaces. Note that these
+interfaces may change in the future. They will be moved to sysfs once KHO is
+stabilized.
+
+``/sys/kernel/debug/kho/out/finalize``
+    Kexec HandOver (KHO) allows Linux to transition the state of
+    compatible drivers into the next kexec'ed kernel. To do so,
+    device drivers will serialize their current state into an FDT.
+    While the state is serialized, they are unable to perform
+    any modifications to the state that was serialized, such as
+    handed-over memory allocations.
+
+    When this file contains "1", the system is in the transition
+    state. When it contains "0", it is not. To switch between the
+    two states, echo the respective number into this file.
+
+``/sys/kernel/debug/kho/out/fdt_max``
+    KHO needs to allocate a buffer for the FDT that gets
+    generated before it knows the final size. By default, it
+    will allocate 10 MiB for it. You can write to this file
+    to modify the size of that allocation.
+
+``/sys/kernel/debug/kho/out/fdt``
+    When the KHO state tree is finalized, the kernel exposes the
+    flattened device tree blob that carries its current KHO
+    state in this file. Kexec user space tooling can use this
+    as the input file for the KHO payload image.
+
+``/sys/kernel/debug/kho/out/scratch_len``
+    To support consecutive KHO kexecs, we need to reserve
+    physically contiguous memory regions that will always stay
+    available for future kexec allocations. This file describes
+    the length of these memory regions. Kexec user space tooling
+    can use this to determine where it should place its payload
+    images.
+
+``/sys/kernel/debug/kho/out/scratch_phys``
+    To support consecutive KHO kexecs, we need to reserve
+    physically contiguous memory regions that will always stay
+    available for future kexec allocations. This file describes
+    the physical location of these memory regions. Kexec user space
+    tooling can use this to determine where it should place its
+    payload images.
+
+``/sys/kernel/debug/kho/in/fdt``
+    When the kernel was booted with Kexec HandOver (KHO),
+    the state tree that carries metadata about the previous
+    kernel's state is in this file in the format of a flattened
+    device tree. This file may disappear once all of its consumers
+    have finished interpreting their metadata.
diff --git a/Documentation/subsystem-apis.rst b/Documentation/subsystem-apis.rst
index b52ad5b969d4..5fc69d6ff9f0 100644
--- a/Documentation/subsystem-apis.rst
+++ b/Documentation/subsystem-apis.rst
@@ -90,3 +90,4 @@ Other subsystems
    peci/index
    wmi/index
    tee/index
+   kho/index
diff --git a/MAINTAINERS b/MAINTAINERS
index a000a277ccf7..d0df0b380e34 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12828,6 +12828,7 @@ F:	include/linux/kernfs.h
 KEXEC
 L:
 W:
+F:	Documentation/kho/
 F:	include/linux/kexec*.h
 F:	include/uapi/linux/kexec.h
 F:	kernel/kexec*