diff mbox series

[v4,06/10] fwctl: Add documentation

Message ID 6-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com (mailing list archive)
State Not Applicable
Headers show
Series Introduce fwctl subystem | expand

Commit Message

Jason Gunthorpe Feb. 7, 2025, 12:13 a.m. UTC
Document the purpose and rules for the fwctl subsystem.

Link in kdocs to the doc tree.

Nacked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240603114250.5325279c@kernel.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 Documentation/userspace-api/fwctl/fwctl.rst | 285 ++++++++++++++++++++
 Documentation/userspace-api/fwctl/index.rst |  12 +
 Documentation/userspace-api/index.rst       |   1 +
 MAINTAINERS                                 |   2 +-
 4 files changed, 299 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst
 create mode 100644 Documentation/userspace-api/fwctl/index.rst

Comments

Jonathan Cameron Feb. 7, 2025, 2:42 p.m. UTC | #1
On Thu,  6 Feb 2025 20:13:28 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> Document the purpose and rules for the fwctl subsystem.
> 
> Link in kdocs to the doc tree.
> 
> Nacked-by: Jakub Kicinski <kuba@kernel.org>
> Link: https://lore.kernel.org/r/20240603114250.5325279c@kernel.org
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

A few tiny things inline.

> ---
>  Documentation/userspace-api/fwctl/fwctl.rst | 285 ++++++++++++++++++++
>  Documentation/userspace-api/fwctl/index.rst |  12 +
>  Documentation/userspace-api/index.rst       |   1 +
>  MAINTAINERS                                 |   2 +-
>  4 files changed, 299 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst
>  create mode 100644 Documentation/userspace-api/fwctl/index.rst
> 
> diff --git a/Documentation/userspace-api/fwctl/fwctl.rst b/Documentation/userspace-api/fwctl/fwctl.rst
> new file mode 100644
> index 00000000000000..428f6f5bb9b4f9
> --- /dev/null
> +++ b/Documentation/userspace-api/fwctl/fwctl.rst
> @@ -0,0 +1,285 @@

> +Operations exposed through fwctl's non-taining interfaces should be fully
> +sharable with other users of the device. For instance exposing a RPC through
> +fwctl should never prevent a kernel subsystem from also concurrently using that
> +same RPC or hardware unit down the road. In such cases fwctl will be less
> +important than proper kernel subsystems that eventually emerge. Mistakes in this
> +area resulting in clashes will be resolved in favour of a kernel implementation.
> +
> +fwctl User API
> +==============
> +
> +.. kernel-doc:: include/uapi/fwctl/fwctl.h
> +.. kernel-doc:: include/uapi/fwctl/mlx5.h

Doesn't exist yet...  I'm not sure if that actually causes a build issue
or not but probably better to just slip this in later in the series.

> +Development and debugging focused RPCs under more permissive scopes can have
> +less stablitiy if the tools using them are only run under exceptional

stability 

> +circumstances and not for every day use of the device. Debugging tools may even
> +require exact version matching as they may require something similar to DWARF
> +debug information from the FW binary.
> +

...

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5f30adbe6c8521..319169f7cb7e1c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9561,7 +9561,7 @@ FWCTL SUBSYSTEM
>  M:	Jason Gunthorpe <jgg@nvidia.com>
>  M:	Saeed Mahameed <saeedm@nvidia.com>
>  S:	Maintained
> -F:	Documentation/userspace-api/fwctl.rst
> +F:	Documentation/userspace-api/fwctl/

Push back to patch 1 or introduce this here for the first time.

>  F:	drivers/fwctl/
>  F:	include/linux/fwctl.h
>  F:	include/uapi/fwctl/
Dave Jiang Feb. 8, 2025, 12:40 a.m. UTC | #2
On 2/6/25 5:13 PM, Jason Gunthorpe wrote:
> Document the purpose and rules for the fwctl subsystem.
> 
> Link in kdocs to the doc tree.
> 
> Nacked-by: Jakub Kicinski <kuba@kernel.org>
> Link: https://lore.kernel.org/r/20240603114250.5325279c@kernel.org
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

One spelling correction below. Otherwise
Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  Documentation/userspace-api/fwctl/fwctl.rst | 285 ++++++++++++++++++++
>  Documentation/userspace-api/fwctl/index.rst |  12 +
>  Documentation/userspace-api/index.rst       |   1 +
>  MAINTAINERS                                 |   2 +-
>  4 files changed, 299 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/userspace-api/fwctl/fwctl.rst
>  create mode 100644 Documentation/userspace-api/fwctl/index.rst
> 
> diff --git a/Documentation/userspace-api/fwctl/fwctl.rst b/Documentation/userspace-api/fwctl/fwctl.rst
> new file mode 100644
> index 00000000000000..428f6f5bb9b4f9
> --- /dev/null
> +++ b/Documentation/userspace-api/fwctl/fwctl.rst
> @@ -0,0 +1,285 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============
> +fwctl subsystem
> +===============
> +
> +:Author: Jason Gunthorpe
> +
> +Overview
> +========
> +
> +Modern devices contain extensive amounts of FW, and in many cases, are largely
> +software-defined pieces of hardware. The evolution of this approach is largely a
> +reaction to Moore's Law where a chip tape out is now highly expensive, and the
> +chip design is extremely large. Replacing fixed HW logic with a flexible and
> +tightly coupled FW/HW combination is an effective risk mitigation against chip
> +respin. Problems in the HW design can be counteracted in device FW. This is
> +especially true for devices which present a stable and backwards compatible
> +interface to the operating system driver (such as NVMe).
> +
> +The FW layer in devices has grown to incredible size and devices frequently
> +integrate clusters of fast processors to run it. For example, mlx5 devices have
> +over 30MB of FW code, and big configurations operate with over 1GB of FW managed
> +runtime state.
> +
> +The availability of such a flexible layer has created quite a variety in the
> +industry where single pieces of silicon are now configurable software-defined
> +devices and can operate in substantially different ways depending on the need.
> +Further, we often see cases where specific sites wish to operate devices in ways
> +that are highly specialized and require applications that have been tailored to
> +their unique configuration.
> +
> +Further, devices have become multi-functional and integrated to the point they
> +no longer fit neatly into the kernel's division of subsystems. Modern
> +multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
> +subsystems while sharing the underlying hardware using the auxiliary device
> +system.
> +
> +All together this creates a challenge for the operating system, where devices
> +have an expansive FW environment that needs robust device-specific debugging
> +support, and FW-driven functionality that is not well suited to “generic”
> +interfaces. fwctl seeks to allow access to the full device functionality from
> +user space in the areas of debuggability, management, and first-boot/nth-boot
> +provisioning.
> +
> +fwctl is aimed at the common device design pattern where the OS and FW
> +communicate via an RPC message layer constructed with a queue or mailbox scheme.
> +In this case the driver will typically have some layer to deliver RPC messages
> +and collect RPC responses from device FW. The in-kernel subsystem drivers that
> +operate the device for its primary purposes will use these RPCs to build their
> +drivers, but devices also usually have a set of ancillary RPCs that don't really
> +fit into any specific subsystem. For example, a HW RAID controller is primarily
> +operated by the block layer but also comes with a set of RPCs to administer the
> +construction of drives within the HW RAID.
> +
> +In the past when devices were more single function, individual subsystems would
> +grow different approaches to solving some of these common problems. For instance
> +monitoring device health, manipulating its FLASH, debugging the FW,
> +provisioning, all have various unique interfaces across the kernel.
> +
> +fwctl's purpose is to define a common set of limited rules, described below,
> +that allow user space to securely construct and execute RPCs inside device FW.
> +The rules serve as an agreement between the operating system and FW on how to
> +correctly design the RPC interface. As a uAPI the subsystem provides a thin
> +layer of discovery and a generic uAPI to deliver the RPCs and collect the
> +response. It supports a system of user space libraries and tools which will
> +use this interface to control the device using the device native protocols.
> +
> +Scope of Action
> +---------------
> +
> +fwctl drivers are strictly restricted to being a way to operate the device FW.
> +It is not an avenue to access random kernel internals, or other operating system
> +SW states.
> +
> +fwctl instances must operate on a well-defined device function, and the device
> +should have a well-defined security model for what scope within the physical
> +device the function is permitted to access. For instance, the most complex PCIe
> +device today may broadly have several function-level scopes:
> +
> + 1. A privileged function with full access to the on-device global state and
> +    configuration
> +
> + 2. Multiple hypervisor functions with control over itself and child functions
> +    used with VMs
> +
> + 3. Multiple VM functions tightly scoped within the VM
> +
> +The device may create a logical parent/child relationship between these scopes.
> +For instance a child VM's FW may be within the scope of the hypervisor FW. It is
> +quite common in the VFIO world that the hypervisor environment has a complex
> +provisioning/profiling/configuration responsibility for the function VFIO
> +assigns to the VM.
> +
> +Further, within the function, devices often have RPC commands that fall within
> +some general scopes of action (see enum fwctl_rpc_scope):
> +
> + 1. Access to function & child configuration, FLASH, etc. that becomes live at a
> +    function reset. Access to function & child runtime configuration that is
> +    transparent or non-disruptive to any driver or VM.
> +
> + 2. Read-only access to function debug information that may report on FW objects
> +    in the function & child, including FW objects owned by other kernel
> +    subsystems.
> +
> + 3. Write access to function & child debug information strictly compatible with
> +    the principles of kernel lockdown and kernel integrity protection. Triggers
> +    a kernel Taint.
> +
> + 4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.
> +
> +User space will provide a scope label on each RPC and the kernel must enforce the
> +above CAPs and taints based on that scope. A combination of kernel and FW can
> +enforce that RPCs are placed in the correct scope by user space.
> +
> +Denied behavior
> +---------------
> +
> +There are many things this interface must not allow user space to do (without a
> +Taint or CAP), broadly derived from the principles of kernel lockdown. Some
> +examples:
> +
> + 1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with
> +    untrusted code, or otherwise compromise device or system security and
> +    integrity.
> +
> + 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
> +    objects owned by kernel drivers.
> +
> + 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
> +    driver can react to the device configuration at function reset/driver load
> +    time, but otherwise must not be coupled to fwctl.
> +
> + 4. Operate the HW in a way that overlaps with the core purpose of another
> +    primary kernel subsystem, such as read/write to LBAs, send/receive of
> +    network packets, or operate an accelerator's data plane.
> +
> +fwctl is not a replacement for device direct access subsystems like uacce or
> +VFIO.
> +
> +Operations exposed through fwctl's non-taining interfaces should be fully
> +sharable with other users of the device. For instance exposing a RPC through
> +fwctl should never prevent a kernel subsystem from also concurrently using that
> +same RPC or hardware unit down the road. In such cases fwctl will be less
> +important than proper kernel subsystems that eventually emerge. Mistakes in this
> +area resulting in clashes will be resolved in favour of a kernel implementation.
> +
> +fwctl User API
> +==============
> +
> +.. kernel-doc:: include/uapi/fwctl/fwctl.h
> +.. kernel-doc:: include/uapi/fwctl/mlx5.h
> +
> +sysfs Class
> +-----------
> +
> +fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
> +(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
> +operates the iotcl uAPI described above.
> +
> +fwctl devices can be related to driver components in other subsystems through
> +sysfs::
> +
> +    $ ls /sys/class/fwctl/fwctl0/device/infiniband/
> +    ibp0s10f0
> +
> +    $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
> +    fwctl0/
> +
> +    $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
> +    dev  device  power  subsystem  uevent
> +
> +User space Community
> +--------------------
> +
> +Drawing inspiration from nvme-cli, participating in the kernel side must come
> +with a user space in a common TBD git tree, at a minimum to usefully operate the
> +kernel driver. Providing such an implementation is a pre-condition to merging a
> +kernel driver.
> +
> +The goal is to build user space community around some of the shared problems
> +we all have, and ideally develop some common user space programs with some
> +starting themes of:
> +
> + - Device in-field debugging
> +
> + - HW provisioning
> +
> + - VFIO child device profiling before VM boot
> +
> + - Confidential Compute topics (attestation, secure provisioning)
> +
> +that stretch across all subsystems in the kernel. fwupd is a great example of
> +how an excellent user space experience can emerge out of kernel-side diversity.
> +
> +fwctl Kernel API
> +================
> +
> +.. kernel-doc:: drivers/fwctl/main.c
> +   :export:
> +.. kernel-doc:: include/linux/fwctl.h
> +
> +fwctl Driver design
> +-------------------
> +
> +In many cases a fwctl driver is going to be part of a larger cross-subsystem
> +device possibly using the auxiliary_device mechanism. In that case several
> +subsystems are going to be sharing the same device and FW interface layer so the
> +device design must already provide for isolation and cooperation between kernel
> +subsystems. fwctl should fit into that same model.
> +
> +Part of the driver should include a description of how its scope restrictions
> +and security model work. The driver and FW together must ensure that RPCs
> +provided by user space are mapped to the appropriate scope. If the validation is
> +done in the driver then the validation can read a 'command effects' report from
> +the device, or hardwire the enforcement. If the validation is done in the FW,
> +then the driver should pass the fwctl_rpc_scope to the FW along with the command.
> +
> +The driver and FW must cooperate to ensure that either fwctl cannot allocate
> +any FW resources, or any resources it does allocate are freed on FD closure.  A
> +driver primarily constructed around FW RPCs may find that its core PCI function
> +and RPC layer belongs under fwctl with auxiliary devices connecting to other
> +subsystems.
> +
> +Each device type must be mindful of Linux's philosophy for stable ABI. The FW
> +RPC interface does not have to meet a strictly stable ABI, but it does need to
> +meet an expectation that userspace tools that are deployed and in significant
> +use don't needlessly break. FW upgrade and kernel upgrade should keep widely
> +deployed tooling working.
> +
> +Development and debugging focused RPCs under more permissive scopes can have
> +less stablitiy if the tools using them are only run under exceptional

s/stablitiy/stability/

DJ

> +circumstances and not for every day use of the device. Debugging tools may even
> +require exact version matching as they may require something similar to DWARF
> +debug information from the FW binary.
> +
> +Security Response
> +=================
> +
> +The kernel remains the gatekeeper for this interface. If violations of the
> +scopes, security or isolation principles are found, we have options to let
> +devices fix them with a FW update, push a kernel patch to parse and block RPC
> +commands or push a kernel patch to block entire firmware versions/devices.
> +
> +While the kernel can always directly parse and restrict RPCs, it is expected
> +that the existing kernel pattern of allowing drivers to delegate validation to
> +FW to be a useful design.
> +
> +Existing Similar Examples
> +=========================
> +
> +The approach described in this document is not a new idea. Direct, or near
> +direct device access has been offered by the kernel in different areas for
> +decades. With more devices wanting to follow this design pattern it is becoming
> +clear that it is not entirely well understood and, more importantly, the
> +security considerations are not well defined or agreed upon.
> +
> +Some examples:
> +
> + - HW RAID controllers. This includes RPCs to do things like compose drives into
> +   a RAID volume, configure RAID parameters, monitor the HW and more.
> +
> + - Baseboard managers. RPCs for configuring settings in the device and more
> +
> + - NVMe vendor command capsules. nvme-cli provides access to some monitoring
> +   functions that different products have defined, but more exist.
> +
> + - CXL also has a NVMe-like vendor command system.
> +
> + - DRM allows user space drivers to send commands to the device via kernel
> +   mediation
> +
> + - RDMA allows user space drivers to directly push commands to the device
> +   without kernel involvement
> +
> + - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc.
> +
> +The first 4 are examples of areas that fwctl intends to cover. The latter three
> +are examples of denied behavior as they fully overlap with the primary purpose
> +of a kernel subsystem.
> +
> +Some key lessons learned from these past efforts are the importance of having a
> +common user space project to use as a pre-condition for obtaining a kernel
> +driver. Developing good community around useful software in user space is key to
> +getting companies to fund participation to enable their products.
> diff --git a/Documentation/userspace-api/fwctl/index.rst b/Documentation/userspace-api/fwctl/index.rst
> new file mode 100644
> index 00000000000000..06959fbf154743
> --- /dev/null
> +++ b/Documentation/userspace-api/fwctl/index.rst
> @@ -0,0 +1,12 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Firmware Control (FWCTL) Userspace API
> +======================================
> +
> +A framework that define a common set of limited rules that allows user space
> +to securely construct and execute RPCs inside device firmware.
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   fwctl
> diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> index b1395d94b3fd0a..e8e861f767fd5c 100644
> --- a/Documentation/userspace-api/index.rst
> +++ b/Documentation/userspace-api/index.rst
> @@ -45,6 +45,7 @@ Devices and I/O
>  
>     accelerators/ocxl
>     dma-buf-alloc-exchange
> +   fwctl/index
>     gpio/index
>     iommufd
>     media/index
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5f30adbe6c8521..319169f7cb7e1c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9561,7 +9561,7 @@ FWCTL SUBSYSTEM
>  M:	Jason Gunthorpe <jgg@nvidia.com>
>  M:	Saeed Mahameed <saeedm@nvidia.com>
>  S:	Maintained
> -F:	Documentation/userspace-api/fwctl.rst
> +F:	Documentation/userspace-api/fwctl/
>  F:	drivers/fwctl/
>  F:	include/linux/fwctl.h
>  F:	include/uapi/fwctl/
Jason Gunthorpe Feb. 10, 2025, 3:17 p.m. UTC | #3
On Fri, Feb 07, 2025 at 02:42:49PM +0000, Jonathan Cameron wrote:
> On Thu,  6 Feb 2025 20:13:28 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > Document the purpose and rules for the fwctl subsystem.
> > 
> > Link in kdocs to the doc tree.
> > 
> > Nacked-by: Jakub Kicinski <kuba@kernel.org>
> > Link: https://lore.kernel.org/r/20240603114250.5325279c@kernel.org
> > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> A few tiny things inline.

Got them all, thanks

Jason
diff mbox series

Patch

diff --git a/Documentation/userspace-api/fwctl/fwctl.rst b/Documentation/userspace-api/fwctl/fwctl.rst
new file mode 100644
index 00000000000000..428f6f5bb9b4f9
--- /dev/null
+++ b/Documentation/userspace-api/fwctl/fwctl.rst
@@ -0,0 +1,285 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+fwctl subsystem
+===============
+
+:Author: Jason Gunthorpe
+
+Overview
+========
+
+Modern devices contain extensive amounts of FW, and in many cases, are largely
+software-defined pieces of hardware. The evolution of this approach is largely a
+reaction to Moore's Law where a chip tape out is now highly expensive, and the
+chip design is extremely large. Replacing fixed HW logic with a flexible and
+tightly coupled FW/HW combination is an effective risk mitigation against chip
+respin. Problems in the HW design can be counteracted in device FW. This is
+especially true for devices which present a stable and backwards compatible
+interface to the operating system driver (such as NVMe).
+
+The FW layer in devices has grown to incredible size and devices frequently
+integrate clusters of fast processors to run it. For example, mlx5 devices have
+over 30MB of FW code, and big configurations operate with over 1GB of FW managed
+runtime state.
+
+The availability of such a flexible layer has created quite a variety in the
+industry where single pieces of silicon are now configurable software-defined
+devices and can operate in substantially different ways depending on the need.
+Further, we often see cases where specific sites wish to operate devices in ways
+that are highly specialized and require applications that have been tailored to
+their unique configuration.
+
+Further, devices have become multi-functional and integrated to the point they
+no longer fit neatly into the kernel's division of subsystems. Modern
+multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
+subsystems while sharing the underlying hardware using the auxiliary device
+system.
+
+All together this creates a challenge for the operating system, where devices
+have an expansive FW environment that needs robust device-specific debugging
+support, and FW-driven functionality that is not well suited to “generic”
+interfaces. fwctl seeks to allow access to the full device functionality from
+user space in the areas of debuggability, management, and first-boot/nth-boot
+provisioning.
+
+fwctl is aimed at the common device design pattern where the OS and FW
+communicate via an RPC message layer constructed with a queue or mailbox scheme.
+In this case the driver will typically have some layer to deliver RPC messages
+and collect RPC responses from device FW. The in-kernel subsystem drivers that
+operate the device for its primary purposes will use these RPCs to build their
+drivers, but devices also usually have a set of ancillary RPCs that don't really
+fit into any specific subsystem. For example, a HW RAID controller is primarily
+operated by the block layer but also comes with a set of RPCs to administer the
+construction of drives within the HW RAID.
+
+In the past when devices were more single function, individual subsystems would
+grow different approaches to solving some of these common problems. For instance
+monitoring device health, manipulating its FLASH, debugging the FW,
+provisioning, all have various unique interfaces across the kernel.
+
+fwctl's purpose is to define a common set of limited rules, described below,
+that allow user space to securely construct and execute RPCs inside device FW.
+The rules serve as an agreement between the operating system and FW on how to
+correctly design the RPC interface. As a uAPI the subsystem provides a thin
+layer of discovery and a generic uAPI to deliver the RPCs and collect the
+response. It supports a system of user space libraries and tools which will
+use this interface to control the device using the device native protocols.
+
+Scope of Action
+---------------
+
+fwctl drivers are strictly restricted to being a way to operate the device FW.
+It is not an avenue to access random kernel internals, or other operating system
+SW states.
+
+fwctl instances must operate on a well-defined device function, and the device
+should have a well-defined security model for what scope within the physical
+device the function is permitted to access. For instance, the most complex PCIe
+device today may broadly have several function-level scopes:
+
+ 1. A privileged function with full access to the on-device global state and
+    configuration
+
+ 2. Multiple hypervisor functions with control over itself and child functions
+    used with VMs
+
+ 3. Multiple VM functions tightly scoped within the VM
+
+The device may create a logical parent/child relationship between these scopes.
+For instance a child VM's FW may be within the scope of the hypervisor FW. It is
+quite common in the VFIO world that the hypervisor environment has a complex
+provisioning/profiling/configuration responsibility for the function VFIO
+assigns to the VM.
+
+Further, within the function, devices often have RPC commands that fall within
+some general scopes of action (see enum fwctl_rpc_scope):
+
+ 1. Access to function & child configuration, FLASH, etc. that becomes live at a
+    function reset. Access to function & child runtime configuration that is
+    transparent or non-disruptive to any driver or VM.
+
+ 2. Read-only access to function debug information that may report on FW objects
+    in the function & child, including FW objects owned by other kernel
+    subsystems.
+
+ 3. Write access to function & child debug information strictly compatible with
+    the principles of kernel lockdown and kernel integrity protection. Triggers
+    a kernel Taint.
+
+ 4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.
+
+User space will provide a scope label on each RPC and the kernel must enforce the
+above CAPs and taints based on that scope. A combination of kernel and FW can
+enforce that RPCs are placed in the correct scope by user space.
+
+Denied behavior
+---------------
+
+There are many things this interface must not allow user space to do (without a
+Taint or CAP), broadly derived from the principles of kernel lockdown. Some
+examples:
+
+ 1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with
+    untrusted code, or otherwise compromise device or system security and
+    integrity.
+
+ 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
+    objects owned by kernel drivers.
+
+ 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
+    driver can react to the device configuration at function reset/driver load
+    time, but otherwise must not be coupled to fwctl.
+
+ 4. Operate the HW in a way that overlaps with the core purpose of another
+    primary kernel subsystem, such as read/write to LBAs, send/receive of
+    network packets, or operate an accelerator's data plane.
+
+fwctl is not a replacement for device direct access subsystems like uacce or
+VFIO.
+
+Operations exposed through fwctl's non-taining interfaces should be fully
+sharable with other users of the device. For instance exposing a RPC through
+fwctl should never prevent a kernel subsystem from also concurrently using that
+same RPC or hardware unit down the road. In such cases fwctl will be less
+important than proper kernel subsystems that eventually emerge. Mistakes in this
+area resulting in clashes will be resolved in favour of a kernel implementation.
+
+fwctl User API
+==============
+
+.. kernel-doc:: include/uapi/fwctl/fwctl.h
+.. kernel-doc:: include/uapi/fwctl/mlx5.h
+
+sysfs Class
+-----------
+
+fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
+(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
+operates the iotcl uAPI described above.
+
+fwctl devices can be related to driver components in other subsystems through
+sysfs::
+
+    $ ls /sys/class/fwctl/fwctl0/device/infiniband/
+    ibp0s10f0
+
+    $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
+    fwctl0/
+
+    $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
+    dev  device  power  subsystem  uevent
+
+User space Community
+--------------------
+
+Drawing inspiration from nvme-cli, participating in the kernel side must come
+with a user space in a common TBD git tree, at a minimum to usefully operate the
+kernel driver. Providing such an implementation is a pre-condition to merging a
+kernel driver.
+
+The goal is to build user space community around some of the shared problems
+we all have, and ideally develop some common user space programs with some
+starting themes of:
+
+ - Device in-field debugging
+
+ - HW provisioning
+
+ - VFIO child device profiling before VM boot
+
+ - Confidential Compute topics (attestation, secure provisioning)
+
+that stretch across all subsystems in the kernel. fwupd is a great example of
+how an excellent user space experience can emerge out of kernel-side diversity.
+
+fwctl Kernel API
+================
+
+.. kernel-doc:: drivers/fwctl/main.c
+   :export:
+.. kernel-doc:: include/linux/fwctl.h
+
+fwctl Driver design
+-------------------
+
+In many cases a fwctl driver is going to be part of a larger cross-subsystem
+device possibly using the auxiliary_device mechanism. In that case several
+subsystems are going to be sharing the same device and FW interface layer so the
+device design must already provide for isolation and cooperation between kernel
+subsystems. fwctl should fit into that same model.
+
+Part of the driver should include a description of how its scope restrictions
+and security model work. The driver and FW together must ensure that RPCs
+provided by user space are mapped to the appropriate scope. If the validation is
+done in the driver then the validation can read a 'command effects' report from
+the device, or hardwire the enforcement. If the validation is done in the FW,
+then the driver should pass the fwctl_rpc_scope to the FW along with the command.
+
+The driver and FW must cooperate to ensure that either fwctl cannot allocate
+any FW resources, or any resources it does allocate are freed on FD closure.  A
+driver primarily constructed around FW RPCs may find that its core PCI function
+and RPC layer belongs under fwctl with auxiliary devices connecting to other
+subsystems.
+
+Each device type must be mindful of Linux's philosophy for stable ABI. The FW
+RPC interface does not have to meet a strictly stable ABI, but it does need to
+meet an expectation that userspace tools that are deployed and in significant
+use don't needlessly break. FW upgrade and kernel upgrade should keep widely
+deployed tooling working.
+
+Development and debugging focused RPCs under more permissive scopes can have
+less stablitiy if the tools using them are only run under exceptional
+circumstances and not for every day use of the device. Debugging tools may even
+require exact version matching as they may require something similar to DWARF
+debug information from the FW binary.
+
+Security Response
+=================
+
+The kernel remains the gatekeeper for this interface. If violations of the
+scopes, security or isolation principles are found, we have options to let
+devices fix them with a FW update, push a kernel patch to parse and block RPC
+commands or push a kernel patch to block entire firmware versions/devices.
+
+While the kernel can always directly parse and restrict RPCs, it is expected
+that the existing kernel pattern of allowing drivers to delegate validation to
+FW to be a useful design.
+
+Existing Similar Examples
+=========================
+
+The approach described in this document is not a new idea. Direct, or near
+direct device access has been offered by the kernel in different areas for
+decades. With more devices wanting to follow this design pattern it is becoming
+clear that it is not entirely well understood and, more importantly, the
+security considerations are not well defined or agreed upon.
+
+Some examples:
+
+ - HW RAID controllers. This includes RPCs to do things like compose drives into
+   a RAID volume, configure RAID parameters, monitor the HW and more.
+
+ - Baseboard managers. RPCs for configuring settings in the device and more
+
+ - NVMe vendor command capsules. nvme-cli provides access to some monitoring
+   functions that different products have defined, but more exist.
+
+ - CXL also has a NVMe-like vendor command system.
+
+ - DRM allows user space drivers to send commands to the device via kernel
+   mediation
+
+ - RDMA allows user space drivers to directly push commands to the device
+   without kernel involvement
+
+ - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc.
+
+The first 4 are examples of areas that fwctl intends to cover. The latter three
+are examples of denied behavior as they fully overlap with the primary purpose
+of a kernel subsystem.
+
+Some key lessons learned from these past efforts are the importance of having a
+common user space project to use as a pre-condition for obtaining a kernel
+driver. Developing good community around useful software in user space is key to
+getting companies to fund participation to enable their products.
diff --git a/Documentation/userspace-api/fwctl/index.rst b/Documentation/userspace-api/fwctl/index.rst
new file mode 100644
index 00000000000000..06959fbf154743
--- /dev/null
+++ b/Documentation/userspace-api/fwctl/index.rst
@@ -0,0 +1,12 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+Firmware Control (FWCTL) Userspace API
+======================================
+
+A framework that define a common set of limited rules that allows user space
+to securely construct and execute RPCs inside device firmware.
+
+.. toctree::
+   :maxdepth: 1
+
+   fwctl
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index b1395d94b3fd0a..e8e861f767fd5c 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -45,6 +45,7 @@  Devices and I/O
 
    accelerators/ocxl
    dma-buf-alloc-exchange
+   fwctl/index
    gpio/index
    iommufd
    media/index
diff --git a/MAINTAINERS b/MAINTAINERS
index 5f30adbe6c8521..319169f7cb7e1c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9561,7 +9561,7 @@  FWCTL SUBSYSTEM
 M:	Jason Gunthorpe <jgg@nvidia.com>
 M:	Saeed Mahameed <saeedm@nvidia.com>
 S:	Maintained
-F:	Documentation/userspace-api/fwctl.rst
+F:	Documentation/userspace-api/fwctl/
 F:	drivers/fwctl/
 F:	include/linux/fwctl.h
 F:	include/uapi/fwctl/