From patchwork Wed Aug 21 18:10:58 2024
From: Jason Gunthorpe
Cc: Andy Gospodarek, Aron Silverton, Dan Williams, Daniel Vetter, Dave Jiang,
 David Ahern, Greg Kroah-Hartman, Christoph Hellwig, Itay Avraham, Jiri Pirko,
 Jakub Kicinski, Leonid Bloch, Leon Romanovsky, linux-cxl@vger.kernel.org,
 linux-rdma@vger.kernel.org, Saeed Mahameed
Subject: [PATCH v3 06/10] fwctl: Add documentation
Date: Wed, 21 Aug 2024 15:10:58 -0300
Message-ID: <6-v3-960f17f90f17+516-fwctl_jgg@nvidia.com>
In-Reply-To: <0-v3-960f17f90f17+516-fwctl_jgg@nvidia.com>
X-Patchwork-Id: 13771949
X-Mailing-List: linux-rdma@vger.kernel.org

Document the purpose and rules for the fwctl subsystem.

Link in kdocs to the doc tree.

Nacked-by: Jakub Kicinski
Link: https://lore.kernel.org/r/20240603114250.5325279c@kernel.org
Acked-by: Daniel Vetter
https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
Signed-off-by: Jason Gunthorpe
Reviewed-by: Jonathan Cameron
---
 Documentation/userspace-api/fwctl.rst | 285 ++++++++++++++++++++++++++
 Documentation/userspace-api/index.rst |   1 +
 2 files changed, 286 insertions(+)
 create mode 100644 Documentation/userspace-api/fwctl.rst

diff --git a/Documentation/userspace-api/fwctl.rst b/Documentation/userspace-api/fwctl.rst
new file mode 100644
index 00000000000000..8f3da30ee7c91b
--- /dev/null
+++ b/Documentation/userspace-api/fwctl.rst
@@ -0,0 +1,285 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+fwctl subsystem
+===============
+
+:Author: Jason Gunthorpe
+
+Overview
+========
+
+Modern devices contain extensive amounts of FW and, in many cases, are largely
+software-defined pieces of hardware. The evolution of this approach is largely a
+reaction to Moore's Law, where a chip tape-out is now highly expensive and the
+chip design is extremely large. Replacing fixed HW logic with a flexible and
+tightly coupled FW/HW combination is an effective risk mitigation against chip
+respin. Problems in the HW design can be counteracted in device FW. This is
+especially true for devices which present a stable and backwards-compatible
+interface to the operating system driver (such as NVMe).
+
+The FW layer in devices has grown to incredible sizes, and devices frequently
+integrate clusters of fast processors to run it. For example, mlx5 devices have
+over 30MB of FW code, and big configurations operate with over 1GB of FW-managed
+runtime state.
+
+The availability of such a flexible layer has created quite a variety in the
+industry, where single pieces of silicon are now configurable software-defined
+devices and can operate in substantially different ways depending on the need.
+Further, we often see cases where specific sites wish to operate devices in ways
+that are highly specialized and require applications that have been tailored to
+their unique configuration.
+
+Additionally, devices have become multi-functional and integrated to the point
+they no longer fit neatly into the kernel's division of subsystems. Modern
+multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
+subsystems while sharing the underlying hardware using the auxiliary device
+system.
+
+Altogether, this creates a challenge for the operating system, where devices
+have an expansive FW environment that needs robust device-specific debugging
+support, and FW-driven functionality that is not well suited to “generic”
+interfaces. fwctl seeks to allow access to the full device functionality from
+user space in the areas of debuggability, management, and first-boot/nth-boot
+provisioning.
+
+fwctl is aimed at the common device design pattern where the OS and FW
+communicate via an RPC message layer constructed with a queue or mailbox scheme.
+In this case, the driver will typically have some layer to deliver RPC messages
+and collect RPC responses from device FW. The in-kernel subsystem drivers that
+operate the device for its primary purposes will use these RPCs to build their
+drivers, but devices also usually have a set of ancillary RPCs that don't really
+fit into any specific subsystem. For example, a HW RAID controller is primarily
+operated by the block layer but also comes with a set of RPCs to administer the
+construction of drives within the HW RAID.
+
+In the past, when devices were more single-function, individual subsystems would
+grow different approaches to solving some of these common problems.
+For instance, monitoring device health, manipulating its FLASH, debugging the
+FW, and provisioning all have various unique interfaces across the kernel.
+
+fwctl's purpose is to define a common set of limited rules, described below,
+that allow user space to securely construct and execute RPCs inside device FW.
+The rules serve as an agreement between the operating system and FW on how to
+correctly design the RPC interface. As a uAPI, the subsystem provides a thin
+layer of discovery and a generic uAPI to deliver the RPCs and collect the
+response. It supports a system of user space libraries and tools that will
+use this interface to control the device using the device-native protocols.
+
+Scope of Action
+---------------
+
+fwctl drivers are strictly restricted to being a way to operate the device FW.
+They are not an avenue to access random kernel internals or other operating
+system SW states.
+
+fwctl instances must operate on a well-defined device function, and the device
+should have a well-defined security model for what scope within the physical
+device the function is permitted to access. For instance, the most complex PCIe
+device today may broadly have several function-level scopes:
+
+ 1. A privileged function with full access to the on-device global state and
+    configuration
+
+ 2. Multiple hypervisor functions with control over themselves and child
+    functions used with VMs
+
+ 3. Multiple VM functions tightly scoped within the VM
+
+The device may create a logical parent/child relationship between these scopes.
+For instance, a child VM's FW may be within the scope of the hypervisor FW. It
+is quite common in the VFIO world that the hypervisor environment has a complex
+provisioning/profiling/configuration responsibility for the function VFIO
+assigns to the VM.
+
+Further, within the function, devices often have RPC commands that fall within
+some general scopes of action (see enum fwctl_rpc_scope):
+
+ 1. Access to function & child configuration, FLASH, etc. that becomes live at a
+    function reset. Access to function & child runtime configuration that is
+    transparent or non-disruptive to any driver or VM.
+
+ 2. Read-only access to function debug information that may report on FW
+    objects in the function & child, including FW objects owned by other kernel
+    subsystems.
+
+ 3. Write access to function & child debug information strictly compatible
+    with the principles of kernel lockdown and kernel integrity protection.
+    Triggers a kernel Taint.
+
+ 4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.
+
+User space will provide a scope label on each RPC, and the kernel must enforce
+the above CAPs and taints based on that scope. A combination of kernel and FW
+can enforce that RPCs are placed in the correct scope by user space.
+
+Denied behavior
+---------------
+
+There are many things this interface must not allow user space to do (without a
+Taint or CAP), broadly derived from the principles of kernel lockdown. Some
+examples:
+
+ 1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with
+    untrusted code, or otherwise compromise device or system security and
+    integrity.
+
+ 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
+    objects owned by kernel drivers.
+
+ 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
+    driver can react to the device configuration at function reset/driver load
+    time, but otherwise must not be coupled to fwctl.
+
+ 4. Operate the HW in a way that overlaps with the core purpose of another
+    primary kernel subsystem, such as read/write to LBAs, send/receive of
+    network packets, or operate an accelerator's data plane.
+
+fwctl is not a replacement for device direct access subsystems like uacce or
+VFIO.
+
+Operations exposed through fwctl's non-tainting interfaces should be fully
+sharable with other users of the device. For instance, exposing an RPC through
For instance exposing a RPC through +fwctl should never prevent a kernel subsystem from also concurrently using that +same RPC or hardware unit down the road. In such cases fwctl will be less +important than proper kernel subsystems that eventually emerge. Mistakes in this +area resulting in clashes will be resolved in favour of a kernel implementation. + +fwctl User API +============== + +.. kernel-doc:: include/uapi/fwctl/fwctl.h +.. kernel-doc:: include/uapi/fwctl/mlx5.h + +sysfs Class +----------- + +fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices +(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device +operates the iotcl uAPI described above. + +fwctl devices can be related to driver components in other subsystems through +sysfs:: + + $ ls /sys/class/fwctl/fwctl0/device/infiniband/ + ibp0s10f0 + + $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/ + fwctl0/ + + $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0 + dev device power subsystem uevent + +User space Community +-------------------- + +Drawing inspiration from nvme-cli, participating in the kernel side must come +with a user space in a common TBD git tree, at a minimum to usefully operate the +kernel driver. Providing such an implementation is a pre-condition to merging a +kernel driver. + +The goal is to build user space community around some of the shared problems +we all have, and ideally develop some common user space programs with some +starting themes of: + + - Device in-field debugging + + - HW provisioning + + - VFIO child device profiling before VM boot + + - Confidential Compute topics (attestation, secure provisioning) + +that stretch across all subsystems in the kernel. fwupd is a great example of +how an excellent user space experience can emerge out of kernel-side diversity. + +fwctl Kernel API +================ + +.. kernel-doc:: drivers/fwctl/main.c + :export: +.. 
kernel-doc:: include/linux/fwctl.h + +fwctl Driver design +------------------- + +In many cases a fwctl driver is going to be part of a larger cross-subsystem +device possibly using the auxiliary_device mechanism. In that case several +subsystems are going to be sharing the same device and FW interface layer so the +device design must already provide for isolation and cooperation between kernel +subsystems. fwctl should fit into that same model. + +Part of the driver should include a description of how its scope restrictions +and security model work. The driver and FW together must ensure that RPCs +provided by user space are mapped to the appropriate scope. If the validation is +done in the driver then the validation can read a 'command effects' report from +the device, or hardwire the enforcement. If the validation is done in the FW, +then the driver should pass the fwctl_rpc_scope to the FW along with the command. + +The driver and FW must cooperate to ensure that either fwctl cannot allocate +any FW resources, or any resources it does allocate are freed on FD closure. A +driver primarily constructed around FW RPCs may find that its core PCI function +and RPC layer belongs under fwctl with auxiliary devices connecting to other +subsystems. + +Each device type must be mindful of Linux's philosophy for stable ABI. The FW +RPC interface does not have to meet a strictly stable ABI, but it does need to +meet an expectation that userspace tools that are deployed and in significant +use don't needlessly break. FW upgrade and kernel upgrade should keep widely +deployed tooling working. + +Development and debugging focused RPCs under more permissive scopes can have +less stablitiy if the tools using them are only run under exceptional +circumstances and not for every day use of the device. Debugging tools may even +require exact version matching as they may require something similar to DWARF +debug information from the FW binary. 
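The driver-side option described above, validating each RPC against a 'command effects' report or a hardwired table, can be sketched as plain C that compiles in user space. This is a hypothetical illustration only, not the fwctl core or any real driver. The ex_ names, opcodes, error codes and table contents are invented. The enum only mirrors the four scopes from the "Scope of Action" list and does not reproduce enum fwctl_rpc_scope. The has_rawio flag stands in for a CAP_SYS_RAWIO capability check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the four scopes in "Scope of Action", ordered from
 * least to most permissive. This is NOT the real enum fwctl_rpc_scope.
 */
enum ex_rpc_scope {
	EX_SCOPE_CONFIGURATION = 0,    /* 1. configuration, FLASH, etc.  */
	EX_SCOPE_DEBUG_READ_ONLY = 1,  /* 2. read-only debug information */
	EX_SCOPE_DEBUG_WRITE = 2,      /* 3. lockdown-compatible writes  */
	EX_SCOPE_DEBUG_WRITE_FULL = 3, /* 4. full debug device access    */
};

/* The checks below compare scopes with < and >=, so the order matters. */
static_assert(EX_SCOPE_CONFIGURATION < EX_SCOPE_DEBUG_READ_ONLY &&
	      EX_SCOPE_DEBUG_READ_ONLY < EX_SCOPE_DEBUG_WRITE &&
	      EX_SCOPE_DEBUG_WRITE < EX_SCOPE_DEBUG_WRITE_FULL,
	      "scopes must be ordered least to most permissive");

/* Hypothetical hardwired "command effects" entry: FW opcode -> minimum scope. */
struct ex_cmd_effect {
	uint16_t opcode;
	enum ex_rpc_scope min_scope;
};

/* Invented opcodes, for illustration only. */
static const struct ex_cmd_effect ex_effects[] = {
	{ 0x0100, EX_SCOPE_CONFIGURATION },
	{ 0x0200, EX_SCOPE_DEBUG_READ_ONLY },
	{ 0x0300, EX_SCOPE_DEBUG_WRITE },
	{ 0x0400, EX_SCOPE_DEBUG_WRITE_FULL },
};

/*
 * Validate one RPC. @user_scope is the label supplied by user space.
 * Returns 0 if the RPC may be forwarded to FW, or a negative errno.
 * On success, *taint tells the caller whether to taint the kernel first.
 */
static int ex_validate_rpc(uint16_t opcode, enum ex_rpc_scope user_scope,
			   bool has_rawio, bool *taint)
{
	size_t i;

	*taint = false;

	/* The label comes from user space, so reject out-of-range values. */
	if ((unsigned int)user_scope > EX_SCOPE_DEBUG_WRITE_FULL)
		return -EINVAL;

	/* Scope 4 (full debug device access) requires CAP_SYS_RAWIO. */
	if (user_scope == EX_SCOPE_DEBUG_WRITE_FULL && !has_rawio)
		return -EPERM;

	for (i = 0; i < sizeof(ex_effects) / sizeof(ex_effects[0]); i++) {
		if (ex_effects[i].opcode != opcode)
			continue;
		/* The user-supplied label must cover what the command can do. */
		if (user_scope < ex_effects[i].min_scope)
			return -EACCES;
		/* Scopes 3 and 4 taint the kernel, based on the label. */
		*taint = user_scope >= EX_SCOPE_DEBUG_WRITE;
		return 0;
	}

	/* Default deny: opcodes missing from the table are rejected. */
	return -EOPNOTSUPP;
}
```

In this sketch, the CAP and taint requirements follow the label that user space supplied, as the scope rules above require. The table only decides whether that label is permissive enough for the command. For example, submitting opcode 0x0100 under the scope 3 label is allowed but still asks the caller to taint. Opcodes missing from the table are denied by default.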
+
+Security Response
+=================
+
+The kernel remains the gatekeeper for this interface. If violations of the
+scopes, security or isolation principles are found, we have options to let
+devices fix them with a FW update, to push a kernel patch that parses and blocks
+RPC commands, or to push a kernel patch that blocks entire firmware
+versions/devices.
+
+While the kernel can always directly parse and restrict RPCs, it is expected
+that the existing kernel pattern of allowing drivers to delegate validation to
+FW will be a useful design.
+
+Existing Similar Examples
+=========================
+
+The approach described in this document is not a new idea. Direct, or near
+direct, device access has been offered by the kernel in different areas for
+decades. With more devices wanting to follow this design pattern, it is becoming
+clear that it is not entirely well understood and, more importantly, the
+security considerations are not well defined or agreed upon.
+
+Some examples:
+
+ - HW RAID controllers. This includes RPCs to do things like compose drives into
+   a RAID volume, configure RAID parameters, monitor the HW and more.
+
+ - Baseboard managers. RPCs for configuring settings in the device and more.
+
+ - NVMe vendor command capsules. nvme-cli provides access to some monitoring
+   functions that different products have defined, but more exist.
+
+ - CXL also has an NVMe-like vendor command system.
+
+ - DRM allows user space drivers to send commands to the device via kernel
+   mediation.
+
+ - RDMA allows user space drivers to directly push commands to the device
+   without kernel involvement.
+
+ - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc.
+
+The first four are examples of areas that fwctl intends to cover. The latter
+three are examples of denied behavior as they fully overlap with the primary
+purpose of a kernel subsystem.
+
+A key lesson learned from these past efforts is the importance of having a
+common user space project as a pre-condition for obtaining a kernel driver.
+Developing a good community around useful software in user space is key to
+getting companies to fund participation to enable their products.
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 274cc7546efc2a..2bc43a65807486 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -44,6 +44,7 @@ Devices and I/O
 
    accelerators/ocxl
    dma-buf-alloc-exchange
+   fwctl
    gpio/index
    iommufd
    media/index