Message ID | 10-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Introduce fwctl subystem | expand |
On Thu, 6 Feb 2025 20:13:32 -0400 Jason Gunthorpe wrote: > From: Andy Gospodarek <gospo@broadcom.com> > > Signed-off-by: Andy Gospodarek <gospo@broadcom.com> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> This is only needed for RDMA, why can't you make this part of bnxt_re ?
On Thu, Feb 6, 2025 at 7:44 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 6 Feb 2025 20:13:32 -0400 Jason Gunthorpe wrote: > > From: Andy Gospodarek <gospo@broadcom.com> > > > > Signed-off-by: Andy Gospodarek <gospo@broadcom.com> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> > > This is only needed for RDMA, why can't you make this part of bnxt_re ? This is not just needed for RDMA, so having the aux device for fwctl as part of the base driver is preferred.
On Thu, Feb 06, 2025 at 10:17:58PM -0500, Andy Gospodarek wrote: > On Thu, Feb 6, 2025 at 7:44 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Thu, 6 Feb 2025 20:13:32 -0400 Jason Gunthorpe wrote: > > > From: Andy Gospodarek <gospo@broadcom.com> > > > > > > Signed-off-by: Andy Gospodarek <gospo@broadcom.com> > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> > > > > This is only needed for RDMA, why can't you make this part of bnxt_re ? > > This is not just needed for RDMA, so having the aux device for fwctl > as part of the base driver is preferred. Same for mlx5 I have to apologize, somehow the bnxt WIP patches got included here, I did not intend that, it was late and I'm juggling too many things this week. Jason
On Thu, 6 Feb 2025 22:17:58 -0500 Andy Gospodarek wrote: > On Thu, Feb 6, 2025 at 7:44 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 6 Feb 2025 20:13:32 -0400 Jason Gunthorpe wrote: > > > From: Andy Gospodarek <gospo@broadcom.com> > > > > > > Signed-off-by: Andy Gospodarek <gospo@broadcom.com> > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> > > > > This is only needed for RDMA, why can't you make this part of bnxt_re ? > > This is not just needed for RDMA, so having the aux device for fwctl > as part of the base driver is preferred. Please elaborate. As you well know I have experience using Broadcom devices in large TCP/IP networks, without the need for proprietary tooling. Now, I understand that it may be expedient for Broadcom and nVidia to skip the upstream process and ship "features" to customers using DOCA and whatever you call your tooling. But let's be honest that this is the motivation here. Unified support for proprietary tooling across subsystems and product lines for a given vendor. This way migrating from in-tree networking to proprietary IPU/DPU networking is easier, while migrating to another vendor would require full tooling replacement. I have nothing against RDMA and CXL subsystems adding whatever APIs they want. But I don't understand why you think it's okay to force this on normal networking, which does not need it. nVidia is already refusing to add basic minoring features to their upstream driver, and keeps asking its customers to migrate to libdoca. So the concern that merging this will negatively impact standard tooling is no longer theoretical. Anyway, rant over. Give us some technical details.
On 07 Feb 07:36, Jakub Kicinski wrote: >On Thu, 6 Feb 2025 22:17:58 -0500 Andy Gospodarek wrote: >> On Thu, Feb 6, 2025 at 7:44 PM Jakub Kicinski <kuba@kernel.org> wrote: >> > On Thu, 6 Feb 2025 20:13:32 -0400 Jason Gunthorpe wrote: >> > > From: Andy Gospodarek <gospo@broadcom.com> >> > > >> > > Signed-off-by: Andy Gospodarek <gospo@broadcom.com> >> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> >> > >> > This is only needed for RDMA, why can't you make this part of bnxt_re ? >> >> This is not just needed for RDMA, so having the aux device for fwctl >> as part of the base driver is preferred. > >Please elaborate. As you well know I have experience using Broadcom >devices in large TCP/IP networks, without the need for proprietary >tooling. > I think Andy was referring to the fact that the aux bus management is implemented in the base driver which is currently under netdev stack. fwctl as well as netdev/RDMA are aux drivers sharing the same base "pci" device. So it's not part of netdev TCP/IP, due to fact that the pci base driver is part of netdev due to historical reasons, driver developers have to go through netdev mailing list to review pci/bus/aux device driver patches, which has nothing to do with TCP/IP. If netdev doesn't welcome non-TCP/IP patches, then I think we have a bigger problem here.. >Now, I understand that it may be expedient for Broadcom and nVidia >to skip the upstream process and ship "features" to customers using >DOCA and whatever you call your tooling. But let's be honest that >this is the motivation here. Unified support for proprietary tooling DOCA doesn't need FWCTL. DOCA and DPDK has all the support they need to configure the pipeline and their own transport without FWCTL. This patchset has nothing to do with DOCA, this is pretty clear from mutli vendor and corss-kernel support for FWCTL. >across subsystems and product lines for a given vendor. This way >migrating from in-tree networking to proprietary IPU/DPU networking >is easier, while migrating to another vendor would require full tooling >replacement. This is old netdev mentality, we must not apply it to all subsystems. And I don't understand why tooling has to be standardized in-kernel-tree? We have plans to standardize tooling in user-space, this was already very successfully done in the nvme-cli.. > >I have nothing against RDMA and CXL subsystems adding whatever APIs >they want. But I don't understand why you think it's okay to force >this on normal networking, which does not need it. > As explained above, netdev doesn't need it, but netdev subsystem also hosts the pci base drivers, so you are going to see fwctl patches the same as you see rdma and other non netdev patches flowing through netdev ML. >nVidia is already refusing to add basic minoring features to their >upstream driver, and keeps asking its customers to migrate to libdoca. nVidia is one of the top contributers to netdev, we submit patches on weekly bases and due to netdev mailing list review backlog and policy we barely make quota, so please elaborate on what features we are refusing to do ?? >So the concern that merging this will negatively impact standard >tooling is no longer theoretical. > >Anyway, rant over. Give us some technical details. Technical details: Driver specific aux subsystem implementation is base driver responsibility which is hosted under netdev.
On Fri, 7 Feb 2025 12:25:28 -0800 Saeed Mahameed wrote: > >nVidia is already refusing to add basic minoring features to their > >upstream driver, and keeps asking its customers to migrate to libdoca. > > nVidia is one of the top contributers to netdev, That's inaccurate. I can't think of a single meaningful contribution from nVidia's NIC team outside of your own driver in the last 2 years. > we submit patches on weekly bases and due to netdev mailing list > review backlog and policy we barely make quota, Luckily we have development statistics so we don't have to argue: Top reviewers (cs): Top reviewers (msg): 1 ( +1) [27] Meta 1 ( +1) [68] Meta 2 ( -1) [25] RedHat 2 ( -1) [57] RedHat 3 ( +1) [19] Intel 3 ( +2) [49] Intel 4 ( -1) [15] Andrew Lunn 4 ( ) [43] Andrew Lunn 5 ( ) [12] Google 5 ( -2) [32] Google 6 ( +2) [ 5] Linaro 6 ( +3) [13] NXP 7 ( +3) [ 4] Oracle 7 ( +5) [13] Oracle Top authors (cs): Top authors (msg): 1 ( ) [9] RedHat 1 ( ) [48] Intel 2 ( +2) [8] Google 2 ( ) [42] RedHat 3 ( -1) [7] Intel 3 ( +1) [39] Meta 4 ( -1) [7] Meta 4 ( -1) [31] Huawei 5 ( +2) [5] nVidia 5 ( ) [31] nVidia 6 ( +7) [3] Oracle 6 (+11) [28] Oracle 7 ( +9) [2] Linaro 7 (+15) [23] Pengutronix Top scores (positive): Top scores (negative): 1 ( +1) [329] Meta 1 ( ) [92] Huawei 2 ( +1) [265] Andrew Lunn 2 ( +1) [76] OpenVPN 3 ( -2) [238] RedHat 3 (***) [54] Pengutronix 4 ( +3) [125] Intel 4 ( +2) [53] Marvell 5 ( -1) [116] Google 5 ( +5) [50] Dent 6 ( -1) [ 70] Linaro 6 (***) [45] nVidia 7 ( -1) [ 39] Broadcom 7 (+12) [43] AMD https://lore.kernel.org/all/20250121200710.19126f7d@kernel.org/ nVidia has a negative review vs authorship score. It'd probably be much worse if it wasn't for the work of the switch team. > so please elaborate on what features we are refusing to do ?? nVidia likes to send these threads to my management so I need to be careful. An issue was discovered during new platform evaluation. That's all I'm gonna say. > As explained above, netdev doesn't need it, but netdev subsystem also > hosts the pci base drivers, so you are going to see fwctl patches the > same as you see rdma and other non netdev patches flowing through > netdev ML. Sure, but we're deadlocked here. It may be a slight inconvenience to redo the interface so that its not a standalone aux bus driver. But if you agree the netdev doesn't need it seems like a fairly straightforward way to unblock your progress. I am glad that you at least agree now that nedev doesn't need it.
On Fri, Feb 7, 2025 at 10:36 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 6 Feb 2025 22:17:58 -0500 Andy Gospodarek wrote: > > On Thu, Feb 6, 2025 at 7:44 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > On Thu, 6 Feb 2025 20:13:32 -0400 Jason Gunthorpe wrote: > > > > From: Andy Gospodarek <gospo@broadcom.com> > > > > > > > > Signed-off-by: Andy Gospodarek <gospo@broadcom.com> > > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> > > > > > > This is only needed for RDMA, why can't you make this part of bnxt_re ? > > > > This is not just needed for RDMA, so having the aux device for fwctl > > as part of the base driver is preferred. > > Please elaborate. As you well know I have experience using Broadcom > devices in large TCP/IP networks, without the need for proprietary > tooling. I totally get that. As a user it is not satisfying to have to download and attempt to compile complicated proprietary tools to use hardware features that seem like they should just work. I don't think fwctl should be used as a crutch to avoid doing the work that is needed to get support upstream. > Now, I understand that it may be expedient for Broadcom and nVidia > to skip the upstream process and ship "features" to customers using > DOCA and whatever you call your tooling. But let's be honest that > this is the motivation here. Unified support for proprietary tooling > across subsystems and product lines for a given vendor. This way > migrating from in-tree networking to proprietary IPU/DPU networking > is easier, while migrating to another vendor would require full tooling > replacement. > > I have nothing against RDMA and CXL subsystems adding whatever APIs > they want. But I don't understand why you think it's okay to force > this on normal networking, which does not need it. > > nVidia is already refusing to add basic minoring features to their > upstream driver, and keeps asking its customers to migrate to libdoca. > So the concern that merging this will negatively impact standard > tooling is no longer theoretical. > > Anyway, rant over. Give us some technical details. The primary use-case that I find valuable is the ability to perform debug of different parts of a hardware pipeline when devices are already in the field. This could be the standard ethernet pipeline, RoCE, crypto, etc. We do have the ability to gather all the information we need via tools like ethtool and devlink, but there are cases where running a tool in real-time can help us know what is happening in a system on a per packet basis. We actually did something like this this week. When I look at fwctl, I don't see it as something that is valuable only today -- I see it as something that is valuable 2 years from now. When someone is still running v6.17 and we have discovered that a debug counter/infra that was added in v7.0, but they cannot use it without installing a new kernel or an OOB driver we don't have an option to easily help narrow down the problem. If a fairly simple tool can help perform RPC to FW to glean some of this hardware information we save days of back and forth debugging with special drivers to try and help narrow down the issue.
On Fri, 7 Feb 2025 18:29:15 -0500 Andy Gospodarek wrote: > The primary use-case that I find valuable is the ability to perform > debug of different parts of a hardware pipeline when devices are > already in the field. I think we covered the "debug by people who have access to the RTL" in previous iterations. For networking that's not a sufficient use case for a backdoor this powerful, sorry.
On 07 Feb 13:51, Jakub Kicinski wrote: >On Fri, 7 Feb 2025 12:25:28 -0800 Saeed Mahameed wrote: >> >nVidia is already refusing to add basic minoring features to their >> >upstream driver, and keeps asking its customers to migrate to libdoca. >> >> nVidia is one of the top contributers to netdev, > >That's inaccurate. I can't think of a single meaningful contribution >from nVidia's NIC team outside of your own driver in the last 2 years. > I can help refresh your memory. Switchdev, devlink, XFRM, TLS, XDP (multi buff, meta data), page pool, and I'm pretty sure much more. I can also point out a lot of projects that are still stuck for many years due to lack of agreements on design and communicated importance e.g: PSP, TCP ZC, devlink params, and some more.. Maybe you mean meaningful to you, which is very hard to predict what is meaningful to you without clear communication. >> we submit patches on weekly bases and due to netdev mailing list >> review backlog and policy we barely make quota, > >Luckily we have development statistics so we don't have to argue: > Yes we don't have to argue, thanks for sharing. [...] snip top reviewers since it's not part of the discussion. >Top authors (cs): Top authors (msg): > 1 ( ) [9] RedHat 1 ( ) [48] Intel > 2 ( +2) [8] Google 2 ( ) [42] RedHat > 3 ( -1) [7] Intel 3 ( +1) [39] Meta > 4 ( -1) [7] Meta 4 ( -1) [31] Huawei > 5 ( +2) [5] nVidia 5 ( ) [31] nVidia ^^^^^^ ^^^^^^ So we do contribute to netdev.. and we are not moving away from netdev which was the whole point of your argument. [...] snip Top scores, since doing reviews is not the issue here. It's a separate topic. If you want we can discuss in a separate thread since I got a lot of what to say on this. >https://lore.kernel.org/all/20250121200710.19126f7d@kernel.org/ > >nVidia has a negative review vs authorship score. It'd probably >be much worse if it wasn't for the work of the switch team. > Irrelevant to FWCTL. And yes very important topic to discuss, we have our own reasons and concerns. Let me know if you want to open this topic for discussion in a separate thread. >> so please elaborate on what features we are refusing to do ?? > >nVidia likes to send these threads to my management so I need >to be careful. An issue was discovered during new platform evaluation. >That's all I'm gonna say. > I am not sure what you are talking about, but as one of the mlx5 maintainers I am 100% we are not refusing to do anything that we've been asked, it is all about priorities, you have to sort this out with whoever is reaching out to you :). It's really hard to keep the discussion coherent and objective when you are referring to private discussions I am not really part of, that we can't discuss here, yet you brought them up. >> As explained above, netdev doesn't need it, but netdev subsystem also >> hosts the pci base drivers, so you are going to see fwctl patches the >> same as you see rdma and other non netdev patches flowing through >> netdev ML. > >Sure, but we're deadlocked here. It may be a slight inconvenience to >redo the interface so that its not a standalone aux bus driver. But if >you agree the netdev doesn't need it seems like a fairly straightforward >way to unblock your progress. Yes Aux needs some improvements and it must and can be abstracted out of netdev relatively easily, to remove this unnecessary workload on netdev ML. > >I am glad that you at least agree now that nedev doesn't need it. netdev can perfectly operate with all the standard tooling we got and we will keep on developing them, TCP/IP configurability is well-established, that being said, netdev is very bad at debug, and really really behind, the few debugfs' and devlinks we have really don't cut it and will never be as good as fwctl, so mlx5 fwctl has to run side by side with netdev, I believe the same is true for all other vendors.
On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote: > But if you agree the netdev doesn't need it seems like a fairly > straightforward way to unblock your progress. I'm trying to understand what you are suggesting here. We have many scenarios where mlx5_core spawns all kinds of different devices, including recovery cases where there is no networking at all and only fwctl. So we can't just discard the aux dev or mlx5_core triggered setup without breaking scenarios. However, you seem to be suggesting that netdev-only configurations (ie netdev loaded but no rdma loaded) should disable fwctl. Is that the case? All else would remain the same. It is very ugly but I could see a technical path to do it, and would consider it if that brings peace. Jason
On Fri, Feb 7, 2025 at 8:16 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote: > > > But if you agree the netdev doesn't need it seems like a fairly > > straightforward way to unblock your progress. > > I'm trying to understand what you are suggesting here. > > We have many scenarios where mlx5_core spawns all kinds of different > devices, including recovery cases where there is no networking at all > and only fwctl. So we can't just discard the aux dev or mlx5_core > triggered setup without breaking scenarios. > > However, you seem to be suggesting that netdev-only configurations (ie > netdev loaded but no rdma loaded) should disable fwctl. Is that the > case? All else would remain the same. It is very ugly but I could see > a technical path to do it, and would consider it if that brings peace. > We can probably live with that as well if it's required to keep fwctl in an RDMA driver and out of pure netdevs.
On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote: > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote: > > > But if you agree the netdev doesn't need it seems like a fairly > > straightforward way to unblock your progress. > > I'm trying to understand what you are suggesting here. > > We have many scenarios where mlx5_core spawns all kinds of different > devices, including recovery cases where there is no networking at all > and only fwctl. So we can't just discard the aux dev or mlx5_core > triggered setup without breaking scenarios. > > However, you seem to be suggesting that netdev-only configurations (ie > netdev loaded but no rdma loaded) should disable fwctl. Is that the > case? All else would remain the same. It is very ugly but I could see > a technical path to do it, and would consider it if that brings peace. Yes, when RDMA driver is not loaded there should be no access to fwctl. When RDMA is disabled on the device via devlink there should be no fwctl access. To disincentivize "creative workarounds" we have to also agree and document that fwctl must not be used to configure TCP/IP functions of the device, or host queues used by the netdev stack.
On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote: > On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote: > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote: > > > > > But if you agree the netdev doesn't need it seems like a fairly > > > straightforward way to unblock your progress. > > > > I'm trying to understand what you are suggesting here. > > > > We have many scenarios where mlx5_core spawns all kinds of different > > devices, including recovery cases where there is no networking at all > > and only fwctl. So we can't just discard the aux dev or mlx5_core > > triggered setup without breaking scenarios. > > > > However, you seem to be suggesting that netdev-only configurations (ie > > netdev loaded but no rdma loaded) should disable fwctl. Is that the > > case? All else would remain the same. It is very ugly but I could see > > a technical path to do it, and would consider it if that brings peace. > > Yes, when RDMA driver is not loaded there should be no access to fwctl. There are users mentioned in cover letter, which need FWCTL without RDMA. https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/ I want to suggest something different. What about to move all XXX_core logic (mlx5_core, bnxt_core, e.t.c.) from netdev to some other dedicated place? There is no technical need to have PCI/FW logic inside networking stack. Thanks
On Tue, Feb 11, 2025 at 2:55 AM Leon Romanovsky <leon@kernel.org> wrote: > > On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote: > > On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote: > > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote: > > > > > > > But if you agree the netdev doesn't need it seems like a fairly > > > > straightforward way to unblock your progress. > > > > > > I'm trying to understand what you are suggesting here. > > > > > > We have many scenarios where mlx5_core spawns all kinds of different > > > devices, including recovery cases where there is no networking at all > > > and only fwctl. So we can't just discard the aux dev or mlx5_core > > > triggered setup without breaking scenarios. > > > > > > However, you seem to be suggesting that netdev-only configurations (ie > > > netdev loaded but no rdma loaded) should disable fwctl. Is that the > > > case? All else would remain the same. It is very ugly but I could see > > > a technical path to do it, and would consider it if that brings peace. > > > > Yes, when RDMA driver is not loaded there should be no access to fwctl. > > There are users mentioned in cover letter, which need FWCTL without RDMA. > https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/ > > I want to suggest something different. What about to move all XXX_core > logic (mlx5_core, bnxt_core, e.t.c.) from netdev to some other dedicated > place? > I understand the logic in your statement, but I do not want to separate/split PCI driver from the NIC driver for bnxt-based devices. We can look at doing that for future generations of hardware, but splitting/switching drivers for existing hardware creates a poor user-experience for distro users.
On 2/10/25 6:04 PM, Jakub Kicinski wrote: > Yes, when RDMA driver is not loaded there should be no access to fwctl. > When RDMA is disabled on the device via devlink there should be no > fwctl access. > > To disincentivize "creative workarounds" we have to also agree and > document that fwctl must not be used to configure TCP/IP functions > of the device, or host queues used by the netdev stack. Your request is not about "RDMA only" since there are non-RDMA use cases at play (e.g., CXL). It seems like what you are really asking for is a hard exception for "netdev" use cases, right? So a summary along the lines of: "Any resources in use by the netdev stack can only be created and modified by established netdev tools."
On 2/10/2025 11:55 PM, Leon Romanovsky wrote: > > On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote: >> On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote: >>> On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote: >>> >>>> But if you agree the netdev doesn't need it seems like a fairly >>>> straightforward way to unblock your progress. >>> >>> I'm trying to understand what you are suggesting here. >>> >>> We have many scenarios where mlx5_core spawns all kinds of different >>> devices, including recovery cases where there is no networking at all >>> and only fwctl. So we can't just discard the aux dev or mlx5_core >>> triggered setup without breaking scenarios. >>> >>> However, you seem to be suggesting that netdev-only configurations (ie >>> netdev loaded but no rdma loaded) should disable fwctl. Is that the >>> case? All else would remain the same. It is very ugly but I could see >>> a technical path to do it, and would consider it if that brings peace. >> >> Yes, when RDMA driver is not loaded there should be no access to fwctl. > > There are users mentioned in cover letter, which need FWCTL without RDMA. > https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/ > > I want to suggest something different. What about to move all XXX_core > logic (mlx5_core, bnxt_core, e.t.c.) from netdev to some other dedicated > place? > > There is no technical need to have PCI/FW logic inside networking stack. > > Thanks Our pds_core device fits this description as well: it is not an ethernet PCI device, but helps manage the FW/HW for Eth and other things that are separate PCI functions. We ended up in the netdev arena because we first went in as a support for vDPA VFs. Should these 'core' devices live in linux-pci land? Is it possible that some 'core' things might be platform devices rather than PCI? sln
On Tue, Feb 11, 2025 at 10:36:37AM -0800, Nelson, Shannon wrote: > On 2/10/2025 11:55 PM, Leon Romanovsky wrote: > > > > On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote: > > > On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote: > > > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote: > > > > > > > > > But if you agree the netdev doesn't need it seems like a fairly > > > > > straightforward way to unblock your progress. > > > > > > > > I'm trying to understand what you are suggesting here. > > > > > > > > We have many scenarios where mlx5_core spawns all kinds of different > > > > devices, including recovery cases where there is no networking at all > > > > and only fwctl. So we can't just discard the aux dev or mlx5_core > > > > triggered setup without breaking scenarios. > > > > > > > > However, you seem to be suggesting that netdev-only configurations (ie > > > > netdev loaded but no rdma loaded) should disable fwctl. Is that the > > > > case? All else would remain the same. It is very ugly but I could see > > > > a technical path to do it, and would consider it if that brings peace. > > > > > > Yes, when RDMA driver is not loaded there should be no access to fwctl. > > > > There are users mentioned in cover letter, which need FWCTL without RDMA. > > https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/ > > > > I want to suggest something different. What about to move all XXX_core > > logic (mlx5_core, bnxt_core, e.t.c.) from netdev to some other dedicated > > place? > > > > There is no technical need to have PCI/FW logic inside networking stack. > > > > Thanks > > Our pds_core device fits this description as well: it is not an ethernet PCI > device, but helps manage the FW/HW for Eth and other things that are > separate PCI functions. We ended up in the netdev arena because we first > went in as a support for vDPA VFs. > > Should these 'core' devices live in linux-pci land? Is it possible that > some 'core' things might be platform devices rather than PCI? IMHO, linux-pci was right place before FWCTL and auxbus arrived, but now these core drivers can be placed in drivers/fwctl instead. It will be natural place for them as they will be located near the UAPI which provides an access to them. All other components will be auxbus devices in their respective subsystems (eth, RDMA ...). Thanks > > sln > >
On Tue, Feb 11, 2025 at 09:27:08AM -0500, Andy Gospodarek wrote: > On Tue, Feb 11, 2025 at 2:55 AM Leon Romanovsky <leon@kernel.org> wrote: > > > > On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote: > > > On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote: > > > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote: > > > > > > > > > But if you agree the netdev doesn't need it seems like a fairly > > > > > straightforward way to unblock your progress. > > > > > > > > I'm trying to understand what you are suggesting here. > > > > > > > > We have many scenarios where mlx5_core spawns all kinds of different > > > > devices, including recovery cases where there is no networking at all > > > > and only fwctl. So we can't just discard the aux dev or mlx5_core > > > > triggered setup without breaking scenarios. > > > > > > > > However, you seem to be suggesting that netdev-only configurations (ie > > > > netdev loaded but no rdma loaded) should disable fwctl. Is that the > > > > case? All else would remain the same. It is very ugly but I could see > > > > a technical path to do it, and would consider it if that brings peace. > > > > > > Yes, when RDMA driver is not loaded there should be no access to fwctl. > > > > There are users mentioned in cover letter, which need FWCTL without RDMA. > > https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/ > > > > I want to suggest something different. What about to move all XXX_core > > logic (mlx5_core, bnxt_core, e.t.c.) from netdev to some other dedicated > > place? > > > > I understand the logic in your statement, but I do not want to > separate/split PCI driver from the NIC driver for bnxt-based devices. It is just an idea and there is no need to worry yet. There is no evidence that netdev community will allow such move. > > We can look at doing that for future generations of hardware, but > splitting/switching drivers for existing hardware creates a poor > user-experience for distro users. It is already solved with module autoload, dependencies and aliases. Thanks
On 12 Feb 15:22, Leon Romanovsky wrote: >On Tue, Feb 11, 2025 at 10:36:37AM -0800, Nelson, Shannon wrote: >> On 2/10/2025 11:55 PM, Leon Romanovsky wrote: >> > >> > On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote: >> > > On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote: >> > > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote: >> > > > >> > > > > But if you agree the netdev doesn't need it seems like a fairly >> > > > > straightforward way to unblock your progress. >> > > > >> > > > I'm trying to understand what you are suggesting here. >> > > > >> > > > We have many scenarios where mlx5_core spawns all kinds of different >> > > > devices, including recovery cases where there is no networking at all >> > > > and only fwctl. So we can't just discard the aux dev or mlx5_core >> > > > triggered setup without breaking scenarios. >> > > > >> > > > However, you seem to be suggesting that netdev-only configurations (ie >> > > > netdev loaded but no rdma loaded) should disable fwctl. Is that the >> > > > case? All else would remain the same. It is very ugly but I could see >> > > > a technical path to do it, and would consider it if that brings peace. >> > > >> > > Yes, when RDMA driver is not loaded there should be no access to fwctl. >> > >> > There are users mentioned in cover letter, which need FWCTL without RDMA. >> > https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/ >> > >> > I want to suggest something different. What about to move all XXX_core >> > logic (mlx5_core, bnxt_core, e.t.c.) from netdev to some other dedicated >> > place? >> > >> > There is no technical need to have PCI/FW logic inside networking stack. >> > >> > Thanks >> >> Our pds_core device fits this description as well: it is not an ethernet PCI >> device, but helps manage the FW/HW for Eth and other things that are >> separate PCI functions. We ended up in the netdev arena because we first >> went in as a support for vDPA VFs. >> >> Should these 'core' devices live in linux-pci land? Is it possible that >> some 'core' things might be platform devices rather than PCI? > >IMHO, linux-pci was right place before FWCTL and auxbus arrived, but now >these core drivers can be placed in drivers/fwctl instead. It will be natural +1 Fwctl subsystem is perfect for shared modules that need to initialize the pci device to a minimal state where fwctl uAPIs are enabled for debug and bare metal device configs while aux sunsystem can carry out the spawning of other subsystems. >place for them as they will be located near the UAPI which provides an access >to them. > >All other components will be auxbus devices in their respective >subsystems (eth, RDMA ...). > >Thanks > >> >> sln >> >> >
Fri, Feb 14, 2025 at 02:03:56AM +0100, saeed@kernel.org wrote: >On 12 Feb 15:22, Leon Romanovsky wrote: >> On Tue, Feb 11, 2025 at 10:36:37AM -0800, Nelson, Shannon wrote: >> > On 2/10/2025 11:55 PM, Leon Romanovsky wrote: >> > > >> > > On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote: >> > > > On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote: >> > > > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote: >> > > > > >> > > > > > But if you agree the netdev doesn't need it seems like a fairly >> > > > > > straightforward way to unblock your progress. >> > > > > >> > > > > I'm trying to understand what you are suggesting here. >> > > > > >> > > > > We have many scenarios where mlx5_core spawns all kinds of different >> > > > > devices, including recovery cases where there is no networking at all >> > > > > and only fwctl. So we can't just discard the aux dev or mlx5_core >> > > > > triggered setup without breaking scenarios. >> > > > > >> > > > > However, you seem to be suggesting that netdev-only configurations (ie >> > > > > netdev loaded but no rdma loaded) should disable fwctl. Is that the >> > > > > case? All else would remain the same. It is very ugly but I could see >> > > > > a technical path to do it, and would consider it if that brings peace. >> > > > >> > > > Yes, when RDMA driver is not loaded there should be no access to fwctl. >> > > >> > > There are users mentioned in cover letter, which need FWCTL without RDMA. >> > > https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/ >> > > >> > > I want to suggest something different. What about to move all XXX_core >> > > logic (mlx5_core, bnxt_core, e.t.c.) from netdev to some other dedicated >> > > place? >> > > >> > > There is no technical need to have PCI/FW logic inside networking stack. >> > > >> > > Thanks >> > >> > Our pds_core device fits this description as well: it is not an ethernet PCI >> > device, but helps manage the FW/HW for Eth and other things that are >> > separate PCI functions. We ended up in the netdev arena because we first >> > went in as a support for vDPA VFs. >> > >> > Should these 'core' devices live in linux-pci land? Is it possible that >> > some 'core' things might be platform devices rather than PCI? >> >> IMHO, linux-pci was right place before FWCTL and auxbus arrived, but now >> these core drivers can be placed in drivers/fwctl instead. It will be natural >+1 > >Fwctl subsystem is perfect for shared modules that need to initialize the >pci device to a minimal state where fwctl uAPIs are enabled for debug and >bare metal device configs while aux sunsystem can carry out the >spawning of other subsystems. Wouldn't it be better to call it drivers/core/ and have corectl or corefwctl ? > >> place for them as they will be located near the UAPI which provides an access >> to them. >> >> All other components will be auxbus devices in their respective >> subsystems (eth, RDMA ...). >> >> Thanks >> >> > >> > sln >> > >> > >> >
On Mon, Feb 17, 2025 at 01:49:12PM +0100, Jiri Pirko wrote: > Fri, Feb 14, 2025 at 02:03:56AM +0100, saeed@kernel.org wrote: > >On 12 Feb 15:22, Leon Romanovsky wrote: > >> On Tue, Feb 11, 2025 at 10:36:37AM -0800, Nelson, Shannon wrote: > >> > On 2/10/2025 11:55 PM, Leon Romanovsky wrote: > >> > > > >> > > On Mon, Feb 10, 2025 at 05:04:23PM -0800, Jakub Kicinski wrote: > >> > > > On Fri, 7 Feb 2025 21:16:47 -0400 Jason Gunthorpe wrote: > >> > > > > On Fri, Feb 07, 2025 at 01:51:11PM -0800, Jakub Kicinski wrote: > >> > > > > > >> > > > > > But if you agree the netdev doesn't need it seems like a fairly > >> > > > > > straightforward way to unblock your progress. > >> > > > > > >> > > > > I'm trying to understand what you are suggesting here. > >> > > > > > >> > > > > We have many scenarios where mlx5_core spawns all kinds of different > >> > > > > devices, including recovery cases where there is no networking at all > >> > > > > and only fwctl. So we can't just discard the aux dev or mlx5_core > >> > > > > triggered setup without breaking scenarios. > >> > > > > > >> > > > > However, you seem to be suggesting that netdev-only configurations (ie > >> > > > > netdev loaded but no rdma loaded) should disable fwctl. Is that the > >> > > > > case? All else would remain the same. It is very ugly but I could see > >> > > > > a technical path to do it, and would consider it if that brings peace. > >> > > > > >> > > > Yes, when RDMA driver is not loaded there should be no access to fwctl. > >> > > > >> > > There are users mentioned in cover letter, which need FWCTL without RDMA. > >> > > https://lore.kernel.org/all/0-v4-0cf4ec3b8143+4995-fwctl_jgg@nvidia.com/ > >> > > > >> > > I want to suggest something different. What about to move all XXX_core > >> > > logic (mlx5_core, bnxt_core, e.t.c.) from netdev to some other dedicated > >> > > place? > >> > > > >> > > There is no technical need to have PCI/FW logic inside networking stack. > >> > > > >> > > Thanks > >> > > >> > Our pds_core device fits this description as well: it is not an ethernet PCI > >> > device, but helps manage the FW/HW for Eth and other things that are > >> > separate PCI functions. We ended up in the netdev arena because we first > >> > went in as a support for vDPA VFs. > >> > > >> > Should these 'core' devices live in linux-pci land? Is it possible that > >> > some 'core' things might be platform devices rather than PCI? > >> > >> IMHO, linux-pci was right place before FWCTL and auxbus arrived, but now > >> these core drivers can be placed in drivers/fwctl instead. It will be natural > >+1 > > > >Fwctl subsystem is perfect for shared modules that need to initialize the > >pci device to a minimal state where fwctl uAPIs are enabled for debug and > >bare metal device configs while aux sunsystem can carry out the > >spawning of other subsystems. > > Wouldn't it be better to call it drivers/core/ and have corectl or > corefwctl ? Before names, let's first agree that this is the right thing to do. I'm fine with any proposed name. Thanks > > > > >> place for them as they will be located near the UAPI which provides an access > >> to them. > >> > >> All other components will be auxbus devices in their respective > >> subsystems (eth, RDMA ...). > >> > >> Thanks > >> > >> > > >> > sln > >> > > >> > > >> > > >
On Tue, Feb 11, 2025 at 09:24:35AM -0700, David Ahern wrote: > "Any resources in use by the netdev stack can only be created and > modified by established netdev tools." That is already a restriction described in the doc, not just netdev, but any kernel driver running with any kernel owned resource. You can't reach in and change kernel owned objects. Jason
On 2/18/25 1:05 PM, Jason Gunthorpe wrote: > On Tue, Feb 11, 2025 at 09:24:35AM -0700, David Ahern wrote: > >> "Any resources in use by the netdev stack can only be created and >> modified by established netdev tools." > > That is already a restriction described in the doc, not just netdev, > but any kernel driver running with any kernel owned resource. You > can't reach in and change kernel owned objects. > ok, then Jakub's concerns should be met.
On Tue, 18 Feb 2025 14:42:48 -0700 David Ahern wrote: > On 2/18/25 1:05 PM, Jason Gunthorpe wrote: > > On Tue, Feb 11, 2025 at 09:24:35AM -0700, David Ahern wrote: > > > >> "Any resources in use by the netdev stack can only be created and > >> modified by established netdev tools." > > > > That is already a restriction described in the doc, not just netdev, > > but any kernel driver running with any kernel owned resource. You > > can't reach in and change kernel owned objects. > > ok, then Jakub's concerns should be met. I appreciate the doc, but no, it's not enough. The fwctl interface must not be exposed if RDMA is disabled or driver not loaded.
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 7b8b5b39c7bbe8..bf33e7f26b1fd2 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -16291,6 +16291,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) bnxt_init_ring_params(bp); bnxt_set_ring_params(bp); bnxt_rdma_aux_device_init(bp); + bnxt_fwctl_aux_device_init(bp); + rc = bnxt_set_dflt_rings(bp, true); if (rc) { if (BNXT_VF(bp) && rc == -ENODEV) { @@ -16353,6 +16355,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) bnxt_dl_fw_reporters_create(bp); bnxt_rdma_aux_device_add(bp); + bnxt_fwctl_aux_device_add(bp); bnxt_print_device_info(bp); diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 2373f423a523ec..1951fdda8820d0 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -30,6 +30,7 @@ #include <net/xdp.h> #include <linux/dim.h> #include <linux/io-64-nonatomic-lo-hi.h> +#include "bnxt_hsi.h" #ifdef CONFIG_TEE_BNXT_FW #include <linux/firmware/broadcom/tee_bnxt_fw.h> #endif @@ -2326,7 +2327,9 @@ struct bnxt { (BNXT_CHIP_P3(bp) || BNXT_CHIP_P4(bp) || BNXT_CHIP_P5(bp)) struct bnxt_aux_priv *aux_priv; + struct bnxt_aux_priv *aux_priv_fwctl; struct bnxt_en_dev *edev; + struct bnxt_en_dev *edev_fwctl; struct bnxt_napi **bnapi; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c index e4a7f37036edba..7e39d9695af339 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c @@ -26,7 +26,8 @@ #include "bnxt_hwrm.h" #include "bnxt_ulp.h" -static DEFINE_IDA(bnxt_aux_dev_ids); +static DEFINE_IDA(bnxt_rdma_aux_dev_ids); +static DEFINE_IDA(bnxt_fwctl_aux_dev_ids); static void bnxt_fill_msix_vecs(struct bnxt *bp, struct bnxt_msix_entry *ent) { @@ -413,7 +414,7 @@ static void bnxt_aux_dev_release(struct device *dev) container_of(dev, struct bnxt_aux_priv, aux_dev.dev); struct bnxt *bp = netdev_priv(aux_priv->edev->net); - ida_free(&bnxt_aux_dev_ids, aux_priv->id); + ida_free(&bnxt_rdma_aux_dev_ids, aux_priv->id); kfree(aux_priv->edev->ulp_tbl); bp->edev = NULL; kfree(aux_priv->edev); @@ -488,7 +489,7 @@ void bnxt_rdma_aux_device_init(struct bnxt *bp) if (!aux_priv) goto exit; - aux_priv->id = ida_alloc(&bnxt_aux_dev_ids, GFP_KERNEL); + aux_priv->id = ida_alloc(&bnxt_rdma_aux_dev_ids, GFP_KERNEL); if (aux_priv->id < 0) { netdev_warn(bp->dev, "ida alloc failed for ROCE auxiliary device\n"); @@ -504,7 +505,7 @@ void bnxt_rdma_aux_device_init(struct bnxt *bp) rc = auxiliary_device_init(aux_dev); if (rc) { - ida_free(&bnxt_aux_dev_ids, aux_priv->id); + ida_free(&bnxt_rdma_aux_dev_ids, aux_priv->id); kfree(aux_priv); goto exit; } @@ -536,3 +537,120 @@ void bnxt_rdma_aux_device_init(struct bnxt *bp) exit: bp->flags &= ~BNXT_FLAG_ROCE_CAP; } + +void bnxt_fwctl_aux_device_uninit(struct bnxt *bp) +{ + struct bnxt_aux_priv *aux_priv; + struct auxiliary_device *adev; + + /* Skip if no auxiliary device init was done. */ + if (!bp->aux_priv_fwctl) + return; + + aux_priv = bp->aux_priv_fwctl; + adev = &aux_priv->aux_dev; + auxiliary_device_uninit(adev); +} + + +void bnxt_fwctl_aux_device_del(struct bnxt *bp) +{ + if (!bp->edev_fwctl) + return; + + auxiliary_device_delete(&bp->aux_priv_fwctl->aux_dev); +} + +void bnxt_fwctl_aux_device_add(struct bnxt *bp) +{ + struct auxiliary_device *aux_dev; + int rc; + + if (!bp->edev_fwctl) { + printk(KERN_CRIT "edev_fwctl is NULL %s\n", __func__); + return; + } + + aux_dev = &bp->aux_priv_fwctl->aux_dev; + rc = auxiliary_device_add(aux_dev); + if (rc) { + netdev_warn(bp->dev, "Failed to add auxiliary device for FWCTL\n"); + auxiliary_device_uninit(aux_dev); + bp->flags &= ~BNXT_FLAG_ROCE_CAP; + } else { + netdev_warn(bp->dev, "Added auxiliary device for FWCTL!!!\n"); + } +} + +static void bnxt_fwctl_aux_dev_release(struct device *dev) +{ + struct bnxt_aux_priv *aux_priv = + container_of(dev, struct bnxt_aux_priv, aux_dev.dev); + struct bnxt *bp = netdev_priv(aux_priv->edev->net); + + ida_free(&bnxt_fwctl_aux_dev_ids, aux_priv->id); + kfree(aux_priv->edev); + bp->edev_fwctl = NULL; + kfree(bp->aux_priv_fwctl); + bp->aux_priv_fwctl = NULL; +} + + +void bnxt_fwctl_aux_device_init(struct bnxt *bp) +{ + struct auxiliary_device *aux_dev; + struct bnxt_aux_priv *aux_priv; + struct bnxt_en_dev *edev; + struct bnxt_ulp *ulp; + int rc; + + aux_priv = kzalloc(sizeof(*bp->aux_priv), GFP_KERNEL); + if (!aux_priv) + return; + + aux_priv->id = ida_alloc(&bnxt_fwctl_aux_dev_ids, GFP_KERNEL); + if (aux_priv->id < 0) { + netdev_warn(bp->dev, + "ida alloc failed for FWCTL auxiliary device\n"); + kfree(aux_priv); + return; + } + + aux_dev = &aux_priv->aux_dev; + aux_dev->id = aux_priv->id; + aux_dev->name = "fwctl"; + aux_dev->dev.parent = &bp->pdev->dev; + aux_dev->dev.release = bnxt_fwctl_aux_dev_release; + + rc = auxiliary_device_init(aux_dev); + if (rc) { + ida_free(&bnxt_fwctl_aux_dev_ids, aux_priv->id); + kfree(aux_priv); + return; + } + bp->aux_priv_fwctl = aux_priv; + + /* From this point, all cleanup will happen via the .release callback & + * any error unwinding will need to include a call to + * auxiliary_device_uninit. + */ + edev = kzalloc(sizeof(*edev), GFP_KERNEL); + if (!edev) + goto aux_dev_uninit; + + aux_priv->edev = edev; + + ulp = kzalloc(sizeof(*ulp), GFP_KERNEL); + if (!ulp) + goto aux_dev_uninit; + + edev->ulp_tbl = ulp; + bp->edev_fwctl = edev; + bnxt_set_edev_info(edev, bp); + /* bp->ulp_num_msix_want = bnxt_set_dflt_ulp_msix(bp); */ + printk(KERN_CRIT "Made it %s\n", __func__); + return; + +aux_dev_uninit: + auxiliary_device_uninit(aux_dev); +} diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h index 7fa3b8d1ebd288..148e3eb32be001 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h @@ -124,6 +124,10 @@ void bnxt_rdma_aux_device_uninit(struct bnxt *bp); void bnxt_rdma_aux_device_del(struct bnxt *bp); void bnxt_rdma_aux_device_add(struct bnxt *bp); void bnxt_rdma_aux_device_init(struct bnxt *bp); +void bnxt_fwctl_aux_device_uninit(struct bnxt *bp); +void bnxt_fwctl_aux_device_del(struct bnxt *bp); +void bnxt_fwctl_aux_device_add(struct bnxt *bp); +void bnxt_fwctl_aux_device_init(struct bnxt *bp); int bnxt_register_dev(struct bnxt_en_dev *edev, struct bnxt_ulp_ops *ulp_ops, void *handle); void bnxt_unregister_dev(struct bnxt_en_dev *edev);