X-Patchwork-Id: 13521806
From: Alexander Graf
CC: Eric Biederman, "H . Peter Anvin", Andy Lutomirski, Peter Zijlstra,
 Steven Rostedt, Andrew Morton, Mark Rutland, "Tom Lendacky", Ashish Kalra,
 James Gowans, Stanislav Kinsburskii, Anthony Yznaga, Usama Arif,
 David Woodhouse, Benjamin Herrenschmidt, Rob Herring, Krzysztof Kozlowski
Subject: [PATCH v3 07/17] kexec: Add documentation for KHO
Date: Wed, 17 Jan 2024 14:46:54 +0000
Message-ID: <20240117144704.602-8-graf@amazon.com>
In-Reply-To: <20240117144704.602-1-graf@amazon.com>
References: <20240117144704.602-1-graf@amazon.com>
With KHO in place, let's add documentation that describes what it is and
how to use it.

Signed-off-by: Alexander Graf

---

v2 -> v3:

  - Fix wording
  - Add Documentation to MAINTAINERS file
---
 Documentation/kho/concepts.rst   | 88 ++++++++++++++++++++++++++++++++
 Documentation/kho/index.rst      | 19 +++++++
 Documentation/kho/usage.rst      | 57 +++++++++++++++++++++
 Documentation/subsystem-apis.rst |  1 +
 MAINTAINERS                      |  1 +
 5 files changed, 166 insertions(+)
 create mode 100644 Documentation/kho/concepts.rst
 create mode 100644 Documentation/kho/index.rst
 create mode 100644 Documentation/kho/usage.rst

diff --git a/Documentation/kho/concepts.rst b/Documentation/kho/concepts.rst
new file mode 100644
index 000000000000..cb8330bcb06c
--- /dev/null
+++ b/Documentation/kho/concepts.rst
@@ -0,0 +1,88 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+=======================
+Kexec Handover Concepts
+=======================
+
+Kexec HandOver (KHO) is a mechanism that allows Linux to preserve state -
+arbitrary properties as well as memory locations - across kexec.
+
+It introduces multiple concepts:
+
+KHO Device Tree
+---------------
+
+Every KHO kexec carries a KHO-specific flattened device tree blob that
+describes the state of the system. Device drivers can register with KHO to
+serialize their state before kexec. After a KHO kexec, device drivers can read
+the device tree and extract their previous state.
+
+KHO only uses the fdt container format and libfdt library, but does not
+adhere to the same property semantics that normal device trees do: properties
+are passed in native endianness and standardized properties like ``reg`` and
+``ranges`` do not exist, hence there are no ``#...-cells`` properties.
+
+KHO introduces a new concept to its device tree: ``mem`` properties. A
+``mem`` property can be inside any subnode in the device tree. When present,
+it contains an array of physical memory ranges that the new kernel must mark
+as reserved on boot. It is recommended, but not required, to make these ranges
+as physically contiguous as possible to reduce the number of array elements::
+
+    struct kho_mem {
+            __u64 addr;
+            __u64 len;
+    };
+
+After boot, drivers can call the KHO subsystem to take ownership of memory
+that was reserved via a ``mem`` property, so that they can keep using memory
+from the previous execution.
+
+The KHO device tree follows the in-Linux schema requirements. Every element in
+the device tree is documented via device tree schema YAML files that explain
+what data gets transferred.
+
+Mem cache
+---------
+
+The new kernel needs to know about all memory reservations, but cannot yet
+parse the device tree in early boot code because of memory limitations.
+To simplify the initial memory reservation flow, the old kernel passes a
+preprocessed array of physically contiguous reserved ranges to the new kernel.
+
+These reservations have to be separate from architectural memory maps and
+reservations because they differ on every kexec, while the architectural ones
+get passed directly between invocations.
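To illustrate why physically contiguous ranges keep both the ``mem`` arrays and the mem cache small, here is a minimal user-space C sketch that sorts ``struct kho_mem`` entries by address and merges ranges that touch or overlap. The helper ``kho_mem_coalesce()`` is hypothetical, ``__u64`` is swapped for ``uint64_t`` so the sketch builds outside the kernel, and it is not presented as how the KHO patches actually build the mem cache.

```c
#include <stdint.h>
#include <stdlib.h>

/* Same layout as the documented struct, with uint64_t instead of __u64. */
struct kho_mem {
	uint64_t addr;
	uint64_t len;
};

/* qsort() comparator: order ranges by start address. */
static int kho_mem_cmp(const void *a, const void *b)
{
	const struct kho_mem *x = a, *y = b;

	if (x->addr < y->addr)
		return -1;
	return x->addr > y->addr;
}

/*
 * Hypothetical helper: sort ranges by start address, drop zero-length
 * ranges and merge ranges that touch or overlap. Works in place and
 * returns the new number of entries. Address overflow is not handled.
 */
static size_t kho_mem_coalesce(struct kho_mem *mem, size_t n)
{
	size_t out = 0;

	qsort(mem, n, sizeof(*mem), kho_mem_cmp);
	for (size_t i = 0; i < n; i++) {
		if (!mem[i].len)
			continue;
		if (out && mem[i].addr <= mem[out - 1].addr + mem[out - 1].len) {
			/* Extends (or lies inside) the previous range. */
			uint64_t end = mem[i].addr + mem[i].len;
			uint64_t cur = mem[out - 1].addr + mem[out - 1].len;

			if (end > cur)
				mem[out - 1].len = end - mem[out - 1].addr;
		} else {
			mem[out++] = mem[i];
		}
	}
	return out;
}
```

Merging only touching or overlapping ranges leaves the set of reserved bytes unchanged; it just shrinks the number of entries the new kernel has to process early in boot.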
+
+The fewer entries this cache contains, the faster the new kernel will boot.
+
+Scratch Region
+--------------
+
+To boot into kexec, we need a physically contiguous memory range that
+contains no handed-over memory. Kexec then places the target kernel and initrd
+into that region. The new kernel exclusively uses this region for memory
+allocations before it ingests the mem cache.
+
+The scratch region guarantees that such a range always exists: on
+first boot, you can pass the ``kho_scratch`` kernel command line option. When
+it is set, Linux allocates a CMA region of the given size. CMA guarantees
+that no handover pages land in that region, because handover
+pages must stay at a fixed physical memory location and CMA enforces that
+only movable pages can be located inside.
+
+After a KHO kexec, we ignore the ``kho_scratch`` kernel command line option and
+instead reuse the exact same region that was originally allocated. This allows
+us to chain any number of KHO kexecs. Because we used this region
+for boot memory allocations and as target memory for kexec blobs, some parts
+of that memory region may be reserved. These reservations are irrelevant for
+the next KHO, because kexec can overwrite even the original kernel.
+
+KHO active phase
+----------------
+
+To enable the user space based kexec file loader, the kernel needs to be able
+to provide the device tree that describes the previous kernel's state before
+performing the actual kexec. The process of generating that device tree is
+called serialization. Once the device tree is generated, some properties
+of the system may become immutable because they are already written down
+in the device tree. That state is called the KHO active phase.
diff --git a/Documentation/kho/index.rst b/Documentation/kho/index.rst
new file mode 100644
index 000000000000..5e7eeeca8520
--- /dev/null
+++ b/Documentation/kho/index.rst
@@ -0,0 +1,19 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+========================
+Kexec Handover Subsystem
+========================
+
+.. toctree::
+   :maxdepth: 1
+
+   concepts
+   usage
+
+.. only:: subproject and html
+
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/kho/usage.rst b/Documentation/kho/usage.rst
new file mode 100644
index 000000000000..59e82f609f75
--- /dev/null
+++ b/Documentation/kho/usage.rst
@@ -0,0 +1,57 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+====================
+Kexec Handover Usage
+====================
+
+Kexec HandOver (KHO) is a mechanism that allows Linux to preserve state -
+arbitrary properties as well as memory locations - across kexec.
+
+This document expects that you are familiar with the base KHO
+:ref:`Documentation/kho/concepts.rst `. If you have not read
+it yet, please do so now.
+
+Prerequisites
+-------------
+
+KHO is available when the ``CONFIG_KEXEC_KHO`` config option is set to y
+at compile time. Every KHO producer has its own config option that you
+need to enable if you would like to preserve its state across
+kexec.
+
+To use KHO, boot the kernel with the ``kho_scratch`` command
+line parameter set, which allocates a scratch region. For example,
+``kho_scratch=512M`` reserves a 512 MiB scratch region on boot.
+
+Perform a KHO kexec
+-------------------
+
+Before you can perform a KHO kexec, you need to move the system into the
+:ref:`Documentation/kho/concepts.rst ` ::
+
+  # echo 1 > /sys/kernel/kho/active
+
+After this command, the KHO device tree is available in ``/sys/kernel/kho/dt``.
+
+Next, load the target payload and kexec into it. It is important that you
+pass the ``-s`` parameter to select the in-kernel kexec file loader, as user
+space kexec tooling currently has no support for KHO with the user space
+based file loader ::
+
+  # kexec -l Image --initrd=initrd -s
+  # kexec -e
+
+The new kernel will boot up and contain some of the previous kernel's state.
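Because the KHO device tree uses the standard flattened device tree container, one quick sanity check on the blob read from ``/sys/kernel/kho/dt`` (for example before issuing ``kexec -e``) is to validate its header. The C sketch below assumes only the generic FDT header layout from the Devicetree Specification, whose header fields are always big-endian even though KHO property payloads are in native endianness. The function name ``kho_dt_check_header()`` is hypothetical and not part of any kernel or libfdt API.

```c
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC	0xd00dfeedU
#define FDT_HDR_SIZE	40	/* ten big-endian 32-bit header fields */

/* Read a big-endian 32-bit value; FDT header fields are always big-endian. */
static uint32_t be32_at(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/*
 * Hypothetical helper: return 0 if buf starts with a plausible FDT header
 * whose totalsize fits in len, -1 otherwise. Only the header is checked,
 * not the structure or strings blocks themselves.
 */
static int kho_dt_check_header(const unsigned char *buf, size_t len)
{
	uint32_t totalsize, off_struct, off_strings;

	if (len < FDT_HDR_SIZE || be32_at(buf) != FDT_MAGIC)
		return -1;
	totalsize = be32_at(buf + 4);
	off_struct = be32_at(buf + 8);
	off_strings = be32_at(buf + 12);
	if (totalsize < FDT_HDR_SIZE || totalsize > len)
		return -1;
	if (off_struct > totalsize || off_strings > totalsize)
		return -1;
	return 0;
}
```

In practice you would first read ``/sys/kernel/kho/dt`` into a buffer, and for anything beyond this quick check rely on libfdt (for example ``fdt_check_header()``) rather than hand-written parsing.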
+
+For example, if you enabled ``CONFIG_FTRACE_KHO``, the new kernel will contain
+the old kernel's trace buffers in ``/sys/kernel/debug/tracing/trace``.
+
+Abort a KHO kexec
+-----------------
+
+You can move the system out of the KHO active phase again by calling ::
+
+  # echo 0 > /sys/kernel/kho/active
+
+After this command, the KHO device tree is no longer available in
+``/sys/kernel/kho/dt``.
diff --git a/Documentation/subsystem-apis.rst b/Documentation/subsystem-apis.rst
index 2d353fb8ea26..7c366337db5d 100644
--- a/Documentation/subsystem-apis.rst
+++ b/Documentation/subsystem-apis.rst
@@ -87,3 +87,4 @@ Storage interfaces
    peci/index
    wmi/index
    tee/index
+   kho/index
diff --git a/MAINTAINERS b/MAINTAINERS
index 88bf6730d801..1c48e4ea4005 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11826,6 +11826,7 @@ S:	Maintained
 W:	http://kernel.org/pub/linux/utils/kernel/kexec/
 F:	Documentation/ABI/testing/sysfs-firmware-kho
 F:	Documentation/ABI/testing/sysfs-kernel-kho
+F:	Documentation/kho/
 F:	include/linux/kexec.h
 F:	include/uapi/linux/kexec.h
 F:	kernel/kexec*
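Entering and aborting the KHO active phase, as described in usage.rst, differ only in the value written to ``/sys/kernel/kho/active`` (``1`` to enter, ``0`` to leave). A tool that drives KHO could wrap both steps as below. This is a minimal sketch: ``kho_set_active()`` is a hypothetical helper, the path is a parameter so the logic can be exercised against an ordinary file, writing the real sysfs file requires root, and the specific error codes the kernel returns are not covered by this documentation.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define KHO_ACTIVE_PATH "/sys/kernel/kho/active"

/*
 * Hypothetical helper: write "1" (enter the KHO active phase) or "0"
 * (abort it) to the given attribute file, like `echo 1 > ...` would.
 * Returns 0 on success, -1 on any open/write/close failure.
 */
static int kho_set_active(const char *path, int active)
{
	const char *val = active ? "1\n" : "0\n";
	ssize_t len = (ssize_t)strlen(val);
	int fd = open(path, O_WRONLY);
	ssize_t ret;

	if (fd < 0)
		return -1;
	ret = write(fd, val, (size_t)len);
	if (close(fd) < 0 || ret != len)
		return -1;
	return 0;
}
```

For instance, ``kho_set_active(KHO_ACTIVE_PATH, 1)`` corresponds to the step before ``kexec -l``, and ``kho_set_active(KHO_ACTIVE_PATH, 0)`` to aborting the handover.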