From patchwork Fri Mar 7 00:57:36 2025
X-Patchwork-Submitter: Pratyush Yadav
X-Patchwork-Id: 14005622
From: Pratyush Yadav
To:
CC: Pratyush Yadav, Jonathan Corbet, Eric Biederman, Arnd Bergmann,
 Greg Kroah-Hartman, Alexander Viro, Christian Brauner, Jan Kara,
 Hugh Dickins, Alexander Graf, Benjamin Herrenschmidt, David Woodhouse,
 James Gowans, Mike Rapoport, Paolo Bonzini, Pasha Tatashin,
 Anthony Yznaga, Dave Hansen, David Hildenbrand, Jason Gunthorpe,
 Matthew Wilcox, Wei Yang, Andrew Morton
Subject: [RFC PATCH 2/5] misc: add documentation for FDBox
Date: Fri, 7 Mar 2025 00:57:36 +0000
Message-ID: <20250307005830.65293-3-ptyadav@amazon.de>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250307005830.65293-1-ptyadav@amazon.de>
References: <20250307005830.65293-1-ptyadav@amazon.de>

With FDBox in place, add
documentation that describes what it is and how it is used, along with its UAPI
and in-kernel API.

Since the document refers to KHO, add a reference tag in kho/index.rst.

Signed-off-by: Pratyush Yadav
---
 Documentation/filesystems/locking.rst |  21 +++
 Documentation/kho/fdbox.rst           | 224 ++++++++++++++++++++++++++
 Documentation/kho/index.rst           |   3 +
 MAINTAINERS                           |   1 +
 4 files changed, 249 insertions(+)
 create mode 100644 Documentation/kho/fdbox.rst

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index d20a32b77b60f..5526833faf79a 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -607,6 +607,27 @@ used. To block changes to file contents via a memory mapping during the
 operation, the filesystem must take mapping->invalidate_lock to coordinate
 with ->page_mkwrite.
 
+fdbox_file_ops
+==============
+
+prototypes::
+
+	int (*kho_write)(struct fdbox_fd *box_fd, void *fdt);
+	int (*seal)(struct fdbox *box);
+	int (*unseal)(struct fdbox *box);
+
+
+locking rules:
+	all may block
+
+==============	==================================================
+ops		i_rwsem(box_fd->file->f_inode)
+==============	==================================================
+kho_write:	exclusive
+seal:		no
+unseal:		no
+==============	==================================================
+
 dquot_operations
 ================
 
diff --git a/Documentation/kho/fdbox.rst b/Documentation/kho/fdbox.rst
new file mode 100644
index 0000000000000..44a3f5cdf1efb
--- /dev/null
+++ b/Documentation/kho/fdbox.rst
@@ -0,0 +1,224 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+===========================
+File Descriptor Box (FDBox)
+===========================
+
+:Author: Pratyush Yadav
+
+Introduction
+============
+
+The File Descriptor Box (FDBox) is a mechanism for userspace to name file
+descriptors and give them over to the kernel to hold. They can later be
+retrieved by passing in the same name.
+
+The primary purpose of FDBox is to be used with :ref:`kho`. There are many kinds
+of anonymous file descriptors in the kernel like memfd, guest_memfd, iommufd, etc.
+that would be useful to preserve using KHO. To be able to do that, there
+needs to be a mechanism to label FDs that allows userspace to set the label
+before doing KHO and to use the label to map them back after KHO. FDBox achieves
+that purpose by exposing a miscdevice which exposes ioctls to label and transfer
+FDs between the kernel and userspace. FDBox is not intended to work with any
+generic file descriptor. Support for each kind of FD must be explicitly
+enabled.
+
+FDBox can be enabled by setting the ``CONFIG_FDBOX`` option to ``y``. While the
+primary purpose of FDBox is to be used with KHO, it does not explicitly require
+``CONFIG_KEXEC_HANDOVER``, since it can be used without KHO, simply as a way to
+preserve or transfer FDs when userspace exits.
+
+Concepts
+========
+
+Box
+---
+
+The box is a container for FDs. Boxes are identified by their name, which must
+be unique. Userspace can put FDs in the box using the ``FDBOX_PUT_FD``
+operation, and take them out of the box using the ``FDBOX_GET_FD`` operation.
+Once all the required FDs are put into the box, it can be sealed to make it
+ready for shipping. This can be done by the ``FDBOX_SEAL`` operation. The seal
+operation notifies each FD in the box. If any of the FDs have a dependency on
+another, this gives them an opportunity to ensure all dependencies are met, or
+fail the seal if not. Once a box is sealed, no FDs can be added to or removed from
+the box until it is unsealed. Only sealed boxes are transported to a new kernel
+via KHO. The box can be unsealed by the ``FDBOX_UNSEAL`` operation. This is the
+opposite of seal. It also notifies each FD in the box to ensure all dependencies
+are met. This can be useful in case some FDs fail to be restored after KHO.
+
+Box FD
+------
+
+The Box FD is an FD that is currently in a box. It is identified by its name,
+which must be unique in the box it belongs to. The Box FD is created when an FD
+is put into a box by using the ``FDBOX_PUT_FD`` operation. This operation
+removes the FD from the calling task. The FD can be restored by passing the
+unique name to the ``FDBOX_GET_FD`` operation.
+
+FDBox control device
+--------------------
+
+This is the ``/dev/fdbox/fdbox`` device. A box can be created using the
+``FDBOX_CREATE_BOX`` operation on the device. A box can be removed using the
+``FDBOX_DELETE_BOX`` operation.
+
+UAPI
+====
+
+FDBOX_NAME_LEN
+--------------
+
+.. code-block:: c
+
+    #define FDBOX_NAME_LEN 256
+
+Maximum length of the name of a Box or Box FD.
+
+Ioctls on /dev/fdbox/fdbox
+--------------------------
+
+FDBOX_CREATE_BOX
+~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
+    #define FDBOX_CREATE_BOX _IO(FDBOX_TYPE, FDBOX_BASE + 0)
+    struct fdbox_create_box {
+        __u64 flags;
+        __u8 name[FDBOX_NAME_LEN];
+    };
+
+Create a box.
+
+After this returns, the box is available at ``/dev/fdbox/``.
+
+``name``
+    The name of the box to be created. Must be unique.
+
+``flags``
+    Flags to the operation. Currently, no flags are defined.
+
+Returns:
+    0 on success, -1 on error, with errno set.
+
+FDBOX_DELETE_BOX
+~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
+    #define FDBOX_DELETE_BOX _IO(FDBOX_TYPE, FDBOX_BASE + 1)
+    struct fdbox_delete_box {
+        __u64 flags;
+        __u8 name[FDBOX_NAME_LEN];
+    };
+
+Delete a box.
+
+After this returns, the box is no longer available at ``/dev/fdbox/``.
+
+``name``
+    The name of the box to be deleted.
+
+``flags``
+    Flags to the operation. Currently, no flags are defined.
+
+Returns:
+    0 on success, -1 on error, with errno set.
+
+Ioctls on /dev/fdbox/
+------------------------------
+
+These must be performed on the ``/dev/fdbox/`` device.
+
+FDBOX_PUT_FD
+~~~~~~~~~~~~
+
+.. code-block:: c
+
+    #define FDBOX_PUT_FD _IO(FDBOX_TYPE, FDBOX_BASE + 2)
+    struct fdbox_put_fd {
+        __u64 flags;
+        __u32 fd;
+        __u32 pad;
+        __u8 name[FDBOX_NAME_LEN];
+    };
+
+
+Put FD into the box.
+
+After this returns, ``fd`` is removed from the task and can no longer be used by
+it.
+
+``name``
+    The name of the FD.
+
+``fd``
+    The file descriptor number to be put into the box.
+
+``flags``
+    Flags to the operation. Currently, no flags are defined.
+
+Returns:
+    0 on success, -1 on error, with errno set.
+
+FDBOX_GET_FD
+~~~~~~~~~~~~
+
+.. code-block:: c
+
+    #define FDBOX_GET_FD _IO(FDBOX_TYPE, FDBOX_BASE + 3)
+    struct fdbox_get_fd {
+        __u64 flags;
+        __u8 name[FDBOX_NAME_LEN];
+    };
+
+Get an FD from the box.
+
+After this returns, the FD identified by ``name`` is mapped into the task and is
+available for use.
+
+``name``
+    The name of the FD to get.
+
+``flags``
+    Flags to the operation. Currently, no flags are defined.
+
+Returns:
+    FD number on success, -1 on error, with errno set.
+
+FDBOX_SEAL
+~~~~~~~~~~
+
+.. code-block:: c
+
+    #define FDBOX_SEAL _IO(FDBOX_TYPE, FDBOX_BASE + 4)
+
+Seal the box.
+
+Gives the kernel an opportunity to ensure all dependencies are met in the box.
+After this returns, the box is sealed and FDs can no longer be added to or removed
+from it. A box must be sealed for it to be transported across KHO.
+
+Returns:
+    0 on success, -1 on error, with errno set.
+
+FDBOX_UNSEAL
+~~~~~~~~~~~~
+
+.. code-block:: c
+
+    #define FDBOX_UNSEAL _IO(FDBOX_TYPE, FDBOX_BASE + 5)
+
+Unseal the box.
+
+Gives the kernel an opportunity to ensure all dependencies are met in the box,
+and in case of KHO, no FDs have been lost in transit.
+
+Returns:
+    0 on success, -1 on error, with errno set.
+
+Kernel functions and structures
+===============================
+
+.. kernel-doc:: include/linux/fdbox.h
diff --git a/Documentation/kho/index.rst b/Documentation/kho/index.rst
index 5e7eeeca8520f..051513b956075 100644
--- a/Documentation/kho/index.rst
+++ b/Documentation/kho/index.rst
@@ -1,5 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0-or-later
 
+.. _kho:
+
 ========================
 Kexec Handover Subsystem
 ========================
@@ -9,6 +11,7 @@ Kexec Handover Subsystem
 
    concepts
    usage
+   fdbox
 
 .. only:: subproject and html
 
diff --git a/MAINTAINERS b/MAINTAINERS
index d329d3e5514c5..135427582e60f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8866,6 +8866,7 @@ FDBOX
 M:	Pratyush Yadav
 L:	linux-fsdevel@vger.kernel.org
 S:	Maintained
+F:	Documentation/kho/fdbox.rst
 F:	drivers/misc/fdbox.c
 F:	include/linux/fdbox.h
 F:	include/uapi/linux/fdbox.h
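
For readers of fdbox.rst, here is a minimal userspace sketch of how a request
for FDBOX_PUT_FD might be filled in. It only mirrors the struct layouts quoted
in the UAPI section; a real program would include the series' uapi header
instead of redefining them. The helper names (fdbox_set_name,
fdbox_prepare_put) are hypothetical, and the sketch assumes that names are
NUL-terminated and therefore at most FDBOX_NAME_LEN - 1 bytes long, which the
document does not state.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the structures quoted in fdbox.rst (illustration only). */
#define FDBOX_NAME_LEN 256

struct fdbox_put_fd {
	uint64_t flags;
	uint32_t fd;
	uint32_t pad;
	uint8_t name[FDBOX_NAME_LEN];
};

struct fdbox_get_fd {
	uint64_t flags;
	uint8_t name[FDBOX_NAME_LEN];
};

/* The layouts have no implicit padding on common ABIs: 8 + 4 + 4 + 256. */
static_assert(sizeof(struct fdbox_put_fd) == 272, "unexpected put_fd layout");
static_assert(sizeof(struct fdbox_get_fd) == 264, "unexpected get_fd layout");

/*
 * Hypothetical helper: copy a label into a fixed-size name field.
 * Assumption: the name must be non-empty and leave room for a NUL byte.
 */
static int fdbox_set_name(uint8_t dst[FDBOX_NAME_LEN], const char *name)
{
	size_t len = strlen(name);

	if (len == 0 || len >= FDBOX_NAME_LEN) {
		errno = EINVAL;
		return -1;
	}
	memset(dst, 0, FDBOX_NAME_LEN);
	memcpy(dst, name, len);
	return 0;
}

/*
 * Hypothetical helper: build a FDBOX_PUT_FD request for descriptor fd.
 * flags and pad are zeroed, since the document defines no flags.
 */
static int fdbox_prepare_put(struct fdbox_put_fd *req, int fd, const char *name)
{
	if (fd < 0) {
		errno = EBADF;
		return -1;
	}
	memset(req, 0, sizeof(*req));
	req->fd = (uint32_t)fd;
	return fdbox_set_name(req->name, name);
}
```

A caller would then pass the prepared request to ioctl() on an open box
device with FDBOX_PUT_FD, and later use the same name in a struct
fdbox_get_fd with FDBOX_GET_FD to get the descriptor back.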