Message ID | 20230119013602.607466-1-tianfei.zhang@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | add FPGA hotplug manager driver | expand |
Hello! On Wednesday 18 January 2023 20:35:50 Tianfei Zhang wrote: > This patchset introduces the FPGA hotplug manager (fpgahp) driver which > has been verified on the Intel N3000 card. > > When a PCIe-based FPGA card is reprogrammed, it temporarily disappears > from the PCIe bus. This needs to be managed to avoid PCIe errors and to > reprobe the device after reprogramming. > > To change the FPGA image, the kernel burns a new image into the flash on > the card, and then triggers the card BMC to load the new image into FPGA. > A new FPGA hotplug manager driver is introduced that leverages the PCIe > hotplug framework to trigger and manage the update of the FPGA image, > including the disappearance and reappearance of the card on the PCIe bus. > The fpgahp driver uses APIs from the pciehp driver. Just I'm thinking about one thing. PCIe cards can support PCIe hotplug mechanism (via standard PCIe capabilities). So what would happen when FPGA based PCIe card is also hotplug-able? Will be there two PCI hotplug drivers/devices (one fpgahp and one pciehp)? Or just one and which? > Two new operation > callbacks are defined in hotplug_slot_ops: > > - available_images: Optional: available FPGA images > - image_load: Optional: trigger the FPGA to load a new image > > > The process of reprogramming an FPGA card begins by removing all devices > associated with the card that are not required for the reprogramming of > the card. This includes PCIe devices (PFs and VFs) associated with the > card as well as any other types of devices (platform, etc.) defined within > the FPGA. The remaining devices are referred to here as "reserved" devices. > After triggering the update of the FPGA card, the reserved devices are also > removed. > > The complete process for reprogramming the FPGA are: > 1. remove all PFs and VFs except for PF0 (reserved). > 2. remove all non-reserved devices of PF0. > 3. trigger FPGA card to do the image update. > 4. disable the link of the hotplug bridge. > 5. remove all reserved devices under hotplug bridge. > 6. wait for image reload done via BMC, e.g. 10s. > 7. re-enable the link of hotplug bridge > 8. enumerate PCI devices below the hotplug bridge > > usage example: > [root@localhost]# cd /sys/bus/pci/slot/X-X/ > > Get the available images. > [root@localhost 2-1]# cat available_images > bmc_factory bmc_user retimer_fw > > Load the request images for FPGA Card, for example load the BMC user image: > [root@localhost 2-1]# echo bmc_user > image_load > > Tianfei Zhang (12): > PCI: hotplug: add new callbacks on hotplug_slot_ops > PCI: hotplug: expose APIs from pciehp driver > PCI: hotplug: add and expose link disable API > PCI: hotplug: add FPGA PCI hotplug manager driver > fpga: dfl: register dfl-pci device into fpgahph driver > driver core: expose device_is_ancestor() API > PCI: hotplug: add register/unregister function for BMC device > fpga: m10bmc-sec: register BMC device into fpgahp driver > fpga: dfl: remove non-reserved devices > PCI: hotplug: implement the hotplug_slot_ops callback for fpgahp > fpga: m10bmc-sec: add m10bmc_sec_retimer_load callback > Documentation: fpga: add description of fpgahp driver > > Documentation/ABI/testing/sysfs-driver-fpgahp | 21 + > Documentation/fpga/fpgahp.rst | 29 + > Documentation/fpga/index.rst | 1 + > MAINTAINERS | 10 + > drivers/base/core.c | 3 +- > drivers/fpga/Kconfig | 2 + > drivers/fpga/dfl-pci.c | 95 +++- > drivers/fpga/dfl.c | 58 ++ > drivers/fpga/dfl.h | 4 + > drivers/fpga/intel-m10-bmc-sec-update.c | 246 ++++++++ > drivers/pci/hotplug/Kconfig | 14 + > drivers/pci/hotplug/Makefile | 1 + > drivers/pci/hotplug/fpgahp.c | 526 ++++++++++++++++++ > drivers/pci/hotplug/pci_hotplug_core.c | 88 +++ > drivers/pci/hotplug/pciehp.h | 3 + > drivers/pci/hotplug/pciehp_hpc.c | 11 +- > drivers/pci/hotplug/pciehp_pci.c | 2 + > include/linux/device.h | 1 + > include/linux/fpga/fpgahp_manager.h | 100 ++++ > include/linux/mfd/intel-m10-bmc.h | 31 ++ > include/linux/pci_hotplug.h | 5 + > 21 files changed, 1243 insertions(+), 8 deletions(-) > create mode 100644 Documentation/ABI/testing/sysfs-driver-fpgahp > create mode 100644 Documentation/fpga/fpgahp.rst > create mode 100644 drivers/pci/hotplug/fpgahp.c > create mode 100644 include/linux/fpga/fpgahp_manager.h > > > base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4 > -- > 2.38.1 >
> -----Original Message----- > From: Pali Rohár <pali@kernel.org> > Sent: Thursday, January 19, 2023 4:06 PM > To: Zhang, Tianfei <tianfei.zhang@intel.com> > Cc: bhelgaas@google.com; linux-pci@vger.kernel.org; linux-fpga@vger.kernel.org; > lukas@wunner.de; kabel@kernel.org; mani@kernel.org; mdf@kernel.org; Wu, Hao > <hao.wu@intel.com>; Xu, Yilun <yilun.xu@intel.com>; Rix, Tom <trix@redhat.com>; > jgg@ziepe.ca; Weiny, Ira <ira.weiny@intel.com>; > andriy.shevchenko@linux.intel.com; Williams, Dan J <dan.j.williams@intel.com>; > keescook@chromium.org; rafael@kernel.org; Weight, Russell H > <russell.h.weight@intel.com>; corbet@lwn.net; linux-doc@vger.kernel.org; > ilpo.jarvinen@linux.intel.com; lee@kernel.org; gregkh@linuxfoundation.org; > matthew.gerlach@linux.intel.com > Subject: Re: [PATCH v1 00/12] add FPGA hotplug manager driver > > Hello! > > On Wednesday 18 January 2023 20:35:50 Tianfei Zhang wrote: > > This patchset introduces the FPGA hotplug manager (fpgahp) driver > > which has been verified on the Intel N3000 card. > > > > When a PCIe-based FPGA card is reprogrammed, it temporarily disappears > > from the PCIe bus. This needs to be managed to avoid PCIe errors and > > to reprobe the device after reprogramming. > > > > To change the FPGA image, the kernel burns a new image into the flash > > on the card, and then triggers the card BMC to load the new image into FPGA. > > A new FPGA hotplug manager driver is introduced that leverages the > > PCIe hotplug framework to trigger and manage the update of the FPGA > > image, including the disappearance and reappearance of the card on the PCIe bus. > > The fpgahp driver uses APIs from the pciehp driver. > > Just I'm thinking about one thing. PCIe cards can support PCIe hotplug mechanism > (via standard PCIe capabilities). So what would happen when FPGA based PCIe card is > also hotplug-able? Will be there two PCI hotplug drivers/devices (one fpgahp and > one pciehp)? Or just one and which? For our Intel PAC N3000 and N6000 FPGA card, there are not support PCIe hotplug capability from hardware side now, but from software perspective, the process of FPGA image load is very similar with PCIe hotplug, like removing all of devices under PCIe bridge, re-scan the PCIe device under the bridge, so we are looking for the PCIe hotplug framework and APIs from pciehp driver to manager this process, and reduce some duplicate code. > > > Two new operation > > callbacks are defined in hotplug_slot_ops: > > > > - available_images: Optional: available FPGA images > > - image_load: Optional: trigger the FPGA to load a new image > > > > > > The process of reprogramming an FPGA card begins by removing all > > devices associated with the card that are not required for the > > reprogramming of the card. This includes PCIe devices (PFs and VFs) > > associated with the card as well as any other types of devices > > (platform, etc.) defined within the FPGA. The remaining devices are referred to > here as "reserved" devices. > > After triggering the update of the FPGA card, the reserved devices are > > also removed. > > > > The complete process for reprogramming the FPGA are: > > 1. remove all PFs and VFs except for PF0 (reserved). > > 2. remove all non-reserved devices of PF0. > > 3. trigger FPGA card to do the image update. > > 4. disable the link of the hotplug bridge. > > 5. remove all reserved devices under hotplug bridge. > > 6. wait for image reload done via BMC, e.g. 10s. > > 7. re-enable the link of hotplug bridge > > 8. enumerate PCI devices below the hotplug bridge > > > > usage example: > > [root@localhost]# cd /sys/bus/pci/slot/X-X/ > > > > Get the available images. > > [root@localhost 2-1]# cat available_images bmc_factory bmc_user > > retimer_fw > > > > Load the request images for FPGA Card, for example load the BMC user image: > > [root@localhost 2-1]# echo bmc_user > image_load > > > > Tianfei Zhang (12): > > PCI: hotplug: add new callbacks on hotplug_slot_ops > > PCI: hotplug: expose APIs from pciehp driver > > PCI: hotplug: add and expose link disable API > > PCI: hotplug: add FPGA PCI hotplug manager driver > > fpga: dfl: register dfl-pci device into fpgahph driver > > driver core: expose device_is_ancestor() API > > PCI: hotplug: add register/unregister function for BMC device > > fpga: m10bmc-sec: register BMC device into fpgahp driver > > fpga: dfl: remove non-reserved devices > > PCI: hotplug: implement the hotplug_slot_ops callback for fpgahp > > fpga: m10bmc-sec: add m10bmc_sec_retimer_load callback > > Documentation: fpga: add description of fpgahp driver > > > > Documentation/ABI/testing/sysfs-driver-fpgahp | 21 + > > Documentation/fpga/fpgahp.rst | 29 + > > Documentation/fpga/index.rst | 1 + > > MAINTAINERS | 10 + > > drivers/base/core.c | 3 +- > > drivers/fpga/Kconfig | 2 + > > drivers/fpga/dfl-pci.c | 95 +++- > > drivers/fpga/dfl.c | 58 ++ > > drivers/fpga/dfl.h | 4 + > > drivers/fpga/intel-m10-bmc-sec-update.c | 246 ++++++++ > > drivers/pci/hotplug/Kconfig | 14 + > > drivers/pci/hotplug/Makefile | 1 + > > drivers/pci/hotplug/fpgahp.c | 526 ++++++++++++++++++ > > drivers/pci/hotplug/pci_hotplug_core.c | 88 +++ > > drivers/pci/hotplug/pciehp.h | 3 + > > drivers/pci/hotplug/pciehp_hpc.c | 11 +- > > drivers/pci/hotplug/pciehp_pci.c | 2 + > > include/linux/device.h | 1 + > > include/linux/fpga/fpgahp_manager.h | 100 ++++ > > include/linux/mfd/intel-m10-bmc.h | 31 ++ > > include/linux/pci_hotplug.h | 5 + > > 21 files changed, 1243 insertions(+), 8 deletions(-) create mode > > 100644 Documentation/ABI/testing/sysfs-driver-fpgahp > > create mode 100644 Documentation/fpga/fpgahp.rst create mode 100644 > > drivers/pci/hotplug/fpgahp.c create mode 100644 > > include/linux/fpga/fpgahp_manager.h > > > > > > base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4 > > -- > > 2.38.1 > >
On Thu, Jan 19, 2023 at 08:17:05AM +0000, Zhang, Tianfei wrote: > > From: Pali Rohár <pali@kernel.org> > > Sent: Thursday, January 19, 2023 4:06 PM > > On Wednesday 18 January 2023 20:35:50 Tianfei Zhang wrote: ... > > > To change the FPGA image, the kernel burns a new image into the flash > > > on the card, and then triggers the card BMC to load the new image into FPGA. > > > A new FPGA hotplug manager driver is introduced that leverages the > > > PCIe hotplug framework to trigger and manage the update of the FPGA > > > image, including the disappearance and reappearance of the card on the PCIe bus. > > > The fpgahp driver uses APIs from the pciehp driver. > > > > Just I'm thinking about one thing. PCIe cards can support PCIe hotplug mechanism > > (via standard PCIe capabilities). So what would happen when FPGA based PCIe card is > > also hotplug-able? Will be there two PCI hotplug drivers/devices (one fpgahp and > > one pciehp)? Or just one and which? > > For our Intel PAC N3000 and N6000 FPGA card, there are not support PCIe > hotplug capability from hardware side now, but from software perspective, the > process of FPGA image load is very similar with PCIe hotplug, like removing > all of devices under PCIe bridge, re-scan the PCIe device under the bridge, > so we are looking for the PCIe hotplug framework and APIs from pciehp driver > to manager this process, and reduce some duplicate code. Exactly, from the OS perspective they both should be equivalent.
> -----Original Message----- > From: andriy.shevchenko@linux.intel.com <andriy.shevchenko@linux.intel.com> > Sent: Thursday, January 19, 2023 7:28 PM > To: Zhang, Tianfei <tianfei.zhang@intel.com> > Cc: Pali Rohár <pali@kernel.org>; bhelgaas@google.com; linux- > pci@vger.kernel.org; linux-fpga@vger.kernel.org; lukas@wunner.de; > kabel@kernel.org; mani@kernel.org; mdf@kernel.org; Wu, Hao > <hao.wu@intel.com>; Xu, Yilun <yilun.xu@intel.com>; Rix, Tom <trix@redhat.com>; > jgg@ziepe.ca; Weiny, Ira <ira.weiny@intel.com>; Williams, Dan J > <dan.j.williams@intel.com>; keescook@chromium.org; rafael@kernel.org; Weight, > Russell H <russell.h.weight@intel.com>; corbet@lwn.net; linux- > doc@vger.kernel.org; ilpo.jarvinen@linux.intel.com; lee@kernel.org; > gregkh@linuxfoundation.org; matthew.gerlach@linux.intel.com > Subject: Re: [PATCH v1 00/12] add FPGA hotplug manager driver > > On Thu, Jan 19, 2023 at 08:17:05AM +0000, Zhang, Tianfei wrote: > > > From: Pali Rohár <pali@kernel.org> > > > Sent: Thursday, January 19, 2023 4:06 PM On Wednesday 18 January > > > 2023 20:35:50 Tianfei Zhang wrote: > > ... > > > > > To change the FPGA image, the kernel burns a new image into the > > > > flash on the card, and then triggers the card BMC to load the new image into > FPGA. > > > > A new FPGA hotplug manager driver is introduced that leverages the > > > > PCIe hotplug framework to trigger and manage the update of the > > > > FPGA image, including the disappearance and reappearance of the card on the > PCIe bus. > > > > The fpgahp driver uses APIs from the pciehp driver. > > > > > > Just I'm thinking about one thing. PCIe cards can support PCIe > > > hotplug mechanism (via standard PCIe capabilities). So what would > > > happen when FPGA based PCIe card is also hotplug-able? Will be there > > > two PCI hotplug drivers/devices (one fpgahp and one pciehp)? Or just one and > which? > > > > For our Intel PAC N3000 and N6000 FPGA card, there are not support > > PCIe hotplug capability from hardware side now, but from software > > perspective, the process of FPGA image load is very similar with PCIe > > hotplug, like removing all of devices under PCIe bridge, re-scan the > > PCIe device under the bridge, so we are looking for the PCIe hotplug > > framework and APIs from pciehp driver to manager this process, and reduce some > duplicate code. > > Exactly, from the OS perspective they both should be equivalent. Additionally, for FPGA usage, it doesn't need physical hot-add and hot-removal of the card for FPGA card managements like load a new FPGA image in FPGA deployment of data center or cloud. So whatever the card support PCIe hotplug capability or not, it can be done by leveraging the PCI hotplug framework and APIs from pciehp driver. I think it also has benefits for other vendor's FPGA Card to do some card managements by using fpgahp driver.
On Wed, Jan 18, 2023 at 08:35:50PM -0500, Tianfei Zhang wrote: > This patchset introduces the FPGA hotplug manager (fpgahp) driver which > has been verified on the Intel N3000 card. > > When a PCIe-based FPGA card is reprogrammed, it temporarily disappears > from the PCIe bus. This needs to be managed to avoid PCIe errors and to > reprobe the device after reprogramming. > > To change the FPGA image, the kernel burns a new image into the flash on > the card, and then triggers the card BMC to load the new image into FPGA. > A new FPGA hotplug manager driver is introduced that leverages the PCIe > hotplug framework to trigger and manage the update of the FPGA image, > including the disappearance and reappearance of the card on the PCIe bus. > The fpgahp driver uses APIs from the pciehp driver. Two new operation > callbacks are defined in hotplug_slot_ops: > > - available_images: Optional: available FPGA images > - image_load: Optional: trigger the FPGA to load a new image > > > The process of reprogramming an FPGA card begins by removing all devices > associated with the card that are not required for the reprogramming of > the card. This includes PCIe devices (PFs and VFs) associated with the > card as well as any other types of devices (platform, etc.) defined within > the FPGA. The remaining devices are referred to here as "reserved" devices. > After triggering the update of the FPGA card, the reserved devices are also > removed. > > The complete process for reprogramming the FPGA are: > 1. remove all PFs and VFs except for PF0 (reserved). > 2. remove all non-reserved devices of PF0. > 3. trigger FPGA card to do the image update. > 4. disable the link of the hotplug bridge. > 5. remove all reserved devices under hotplug bridge. > 6. wait for image reload done via BMC, e.g. 10s. > 7. re-enable the link of hotplug bridge > 8. enumerate PCI devices below the hotplug bridge > > usage example: > [root@localhost]# cd /sys/bus/pci/slot/X-X/ > > Get the available images. > [root@localhost 2-1]# cat available_images > bmc_factory bmc_user retimer_fw > > Load the request images for FPGA Card, for example load the BMC user image: > [root@localhost 2-1]# echo bmc_user > image_load Why is all of this tied into the pci hotplug code? Shouldn't it be specific to this one driver instead? pci hotplug is for removing/adding PCI devices to the system, not messing with FPGA images. This feels like an abuse of the pci hotplug bus to me as this is NOT really a PCI hotplug bus at all, right? Or is it? If so, then the slots should show up under the PCI device itself, not in /sys/bus/pci/slot/. That location is there for old old stuff, we probably should move it one of these days as there's lots of special-cases in the driver core just because of that :( thanks, greg k-h
On Thu, Jan 19, 2023 at 2:34 PM Greg KH <gregkh@linuxfoundation.org> wrote: > > On Wed, Jan 18, 2023 at 08:35:50PM -0500, Tianfei Zhang wrote: > > This patchset introduces the FPGA hotplug manager (fpgahp) driver which > > has been verified on the Intel N3000 card. > > > > When a PCIe-based FPGA card is reprogrammed, it temporarily disappears > > from the PCIe bus. This needs to be managed to avoid PCIe errors and to > > reprobe the device after reprogramming. > > > > To change the FPGA image, the kernel burns a new image into the flash on > > the card, and then triggers the card BMC to load the new image into FPGA. > > A new FPGA hotplug manager driver is introduced that leverages the PCIe > > hotplug framework to trigger and manage the update of the FPGA image, > > including the disappearance and reappearance of the card on the PCIe bus. > > The fpgahp driver uses APIs from the pciehp driver. Two new operation > > callbacks are defined in hotplug_slot_ops: > > > > - available_images: Optional: available FPGA images > > - image_load: Optional: trigger the FPGA to load a new image > > > > > > The process of reprogramming an FPGA card begins by removing all devices > > associated with the card that are not required for the reprogramming of > > the card. This includes PCIe devices (PFs and VFs) associated with the > > card as well as any other types of devices (platform, etc.) defined within > > the FPGA. The remaining devices are referred to here as "reserved" devices. > > After triggering the update of the FPGA card, the reserved devices are also > > removed. > > > > The complete process for reprogramming the FPGA are: > > 1. remove all PFs and VFs except for PF0 (reserved). > > 2. remove all non-reserved devices of PF0. > > 3. trigger FPGA card to do the image update. > > 4. disable the link of the hotplug bridge. > > 5. remove all reserved devices under hotplug bridge. > > 6. wait for image reload done via BMC, e.g. 10s. > > 7. re-enable the link of hotplug bridge > > 8. enumerate PCI devices below the hotplug bridge > > > > usage example: > > [root@localhost]# cd /sys/bus/pci/slot/X-X/ > > > > Get the available images. > > [root@localhost 2-1]# cat available_images > > bmc_factory bmc_user retimer_fw > > > > Load the request images for FPGA Card, for example load the BMC user image: > > [root@localhost 2-1]# echo bmc_user > image_load > > Why is all of this tied into the pci hotplug code? Shouldn't it be > specific to this one driver instead? pci hotplug is for removing/adding > PCI devices to the system, not messing with FPGA images. > > This feels like an abuse of the pci hotplug bus to me as this is NOT > really a PCI hotplug bus at all, right? > > Or is it? If so, then the slots should show up under the PCI device > itself, not in /sys/bus/pci/slot/. That location is there for old old > stuff, we probably should move it one of these days as there's lots of > special-cases in the driver core just because of that :( I'm not sure if I can agree with this statement. The slot here is what is registered via pci_hp_register(), isn't it? There are multiple users of this in the tree, including ACPI-based PCI hotplug, which is not really that old. Are you saying that this should not be used?
On Thu, Jan 19, 2023 at 02:43:21PM +0100, Rafael J. Wysocki wrote: > On Thu, Jan 19, 2023 at 2:34 PM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > On Wed, Jan 18, 2023 at 08:35:50PM -0500, Tianfei Zhang wrote: > > > This patchset introduces the FPGA hotplug manager (fpgahp) driver which > > > has been verified on the Intel N3000 card. > > > > > > When a PCIe-based FPGA card is reprogrammed, it temporarily disappears > > > from the PCIe bus. This needs to be managed to avoid PCIe errors and to > > > reprobe the device after reprogramming. > > > > > > To change the FPGA image, the kernel burns a new image into the flash on > > > the card, and then triggers the card BMC to load the new image into FPGA. > > > A new FPGA hotplug manager driver is introduced that leverages the PCIe > > > hotplug framework to trigger and manage the update of the FPGA image, > > > including the disappearance and reappearance of the card on the PCIe bus. > > > The fpgahp driver uses APIs from the pciehp driver. Two new operation > > > callbacks are defined in hotplug_slot_ops: > > > > > > - available_images: Optional: available FPGA images > > > - image_load: Optional: trigger the FPGA to load a new image > > > > > > > > > The process of reprogramming an FPGA card begins by removing all devices > > > associated with the card that are not required for the reprogramming of > > > the card. This includes PCIe devices (PFs and VFs) associated with the > > > card as well as any other types of devices (platform, etc.) defined within > > > the FPGA. The remaining devices are referred to here as "reserved" devices. > > > After triggering the update of the FPGA card, the reserved devices are also > > > removed. > > > > > > The complete process for reprogramming the FPGA are: > > > 1. remove all PFs and VFs except for PF0 (reserved). > > > 2. remove all non-reserved devices of PF0. > > > 3. trigger FPGA card to do the image update. > > > 4. disable the link of the hotplug bridge. > > > 5. remove all reserved devices under hotplug bridge. > > > 6. wait for image reload done via BMC, e.g. 10s. > > > 7. re-enable the link of hotplug bridge > > > 8. enumerate PCI devices below the hotplug bridge > > > > > > usage example: > > > [root@localhost]# cd /sys/bus/pci/slot/X-X/ > > > > > > Get the available images. > > > [root@localhost 2-1]# cat available_images > > > bmc_factory bmc_user retimer_fw > > > > > > Load the request images for FPGA Card, for example load the BMC user image: > > > [root@localhost 2-1]# echo bmc_user > image_load > > > > Why is all of this tied into the pci hotplug code? Shouldn't it be > > specific to this one driver instead? pci hotplug is for removing/adding > > PCI devices to the system, not messing with FPGA images. > > > > This feels like an abuse of the pci hotplug bus to me as this is NOT > > really a PCI hotplug bus at all, right? > > > > Or is it? If so, then the slots should show up under the PCI device > > itself, not in /sys/bus/pci/slot/. That location is there for old old > > stuff, we probably should move it one of these days as there's lots of > > special-cases in the driver core just because of that :( > > I'm not sure if I can agree with this statement. > > The slot here is what is registered via pci_hp_register(), isn't it? Yes, but is it really a "slot" like a normal PCI slot? > There are multiple users of this in the tree, including ACPI-based PCI > hotplug, which is not really that old. It's really old, I think I worked on that in the 2.4/2.5 days? Anyway, it's been around a long time. > Are you saying that this should not be used? I'm saying that PCI is the only subsystem/bus that has something like this and we have a number of functions exported in the driver core only for the pci hotplug slot list. Which kind of implies that maybe it should be moved to something else? I don't have any specific ideas what it should be, just that it feels really odd as-is still. thanks, greg k-h
On 1/19/23 05:33, Greg KH wrote: > On Wed, Jan 18, 2023 at 08:35:50PM -0500, Tianfei Zhang wrote: >> This patchset introduces the FPGA hotplug manager (fpgahp) driver which >> has been verified on the Intel N3000 card. >> >> When a PCIe-based FPGA card is reprogrammed, it temporarily disappears >> from the PCIe bus. This needs to be managed to avoid PCIe errors and to >> reprobe the device after reprogramming. >> >> To change the FPGA image, the kernel burns a new image into the flash on >> the card, and then triggers the card BMC to load the new image into FPGA. >> A new FPGA hotplug manager driver is introduced that leverages the PCIe >> hotplug framework to trigger and manage the update of the FPGA image, >> including the disappearance and reappearance of the card on the PCIe bus. >> The fpgahp driver uses APIs from the pciehp driver. Two new operation >> callbacks are defined in hotplug_slot_ops: >> >> - available_images: Optional: available FPGA images >> - image_load: Optional: trigger the FPGA to load a new image >> >> >> The process of reprogramming an FPGA card begins by removing all devices >> associated with the card that are not required for the reprogramming of >> the card. This includes PCIe devices (PFs and VFs) associated with the >> card as well as any other types of devices (platform, etc.) defined within >> the FPGA. The remaining devices are referred to here as "reserved" devices. >> After triggering the update of the FPGA card, the reserved devices are also >> removed. >> >> The complete process for reprogramming the FPGA are: >> 1. remove all PFs and VFs except for PF0 (reserved). >> 2. remove all non-reserved devices of PF0. >> 3. trigger FPGA card to do the image update. >> 4. disable the link of the hotplug bridge. >> 5. remove all reserved devices under hotplug bridge. >> 6. wait for image reload done via BMC, e.g. 10s. >> 7. re-enable the link of hotplug bridge >> 8. enumerate PCI devices below the hotplug bridge >> >> usage example: >> [root@localhost]# cd /sys/bus/pci/slot/X-X/ >> >> Get the available images. >> [root@localhost 2-1]# cat available_images >> bmc_factory bmc_user retimer_fw >> >> Load the request images for FPGA Card, for example load the BMC user image: >> [root@localhost 2-1]# echo bmc_user > image_load > Why is all of this tied into the pci hotplug code? Shouldn't it be > specific to this one driver instead? pci hotplug is for removing/adding > PCI devices to the system, not messing with FPGA images. > > This feels like an abuse of the pci hotplug bus to me as this is NOT > really a PCI hotplug bus at all, right? While it is true that triggering an FPGA image-load does not involve hotplug specific registers to be managed, the RTL that comprises the PCIe interface will disappear and then reappear after the FPGA is reprogrammed. When it reappears, it_could/_/have a different PCI ID. The process of managing this event has a lot of similarity to a PCIe hotplug event; there is a lot of existing PCIe hotplug related code that could be leveraged. As alternatives to the idea of creating a hotplug driver, we have considered creating a new PCIe service driver specifically to handle FPGA reprogramming, or modifying the existing hotplug driver(s) to add FPGA support. We have also considered a separate fpga-reload driver that would not be bound to a PCIe interface, but would still leverage the PCIe code to manage the event. Do any of these options sound preferable to creating an FPGA hotplug driver? > > Or is it? If so, then the slots should show up under the PCI device > itself, not in /sys/bus/pci/slot/. That location is there for old old > stuff, we probably should move it one of these days as there's lots of > special-cases in the driver core just because of that :( > > thanks, > > greg k-h
On Fri, Jan 20, 2023 at 08:28:51AM -0800, Russ Weight wrote: > On 1/19/23 05:33, Greg KH wrote: > > On Wed, Jan 18, 2023 at 08:35:50PM -0500, Tianfei Zhang wrote: > > > This patchset introduces the FPGA hotplug manager (fpgahp) driver which > > > has been verified on the Intel N3000 card. > > > > > > When a PCIe-based FPGA card is reprogrammed, it temporarily disappears > > > from the PCIe bus. This needs to be managed to avoid PCIe errors and to > > > reprobe the device after reprogramming. > > > > > > To change the FPGA image, the kernel burns a new image into the flash on > > > the card, and then triggers the card BMC to load the new image into FPGA. > > > A new FPGA hotplug manager driver is introduced that leverages the PCIe > > > hotplug framework to trigger and manage the update of the FPGA image, > > > including the disappearance and reappearance of the card on the PCIe bus. > > > The fpgahp driver uses APIs from the pciehp driver. Two new operation > > > callbacks are defined in hotplug_slot_ops: > > > > > > - available_images: Optional: available FPGA images > > > - image_load: Optional: trigger the FPGA to load a new image > > > > Why is all of this tied into the pci hotplug code? Shouldn't it be > > specific to this one driver instead? pci hotplug is for removing/adding > > PCI devices to the system, not messing with FPGA images. > > > > This feels like an abuse of the pci hotplug bus to me as this is NOT > > really a PCI hotplug bus at all, right? > > While it is true that triggering an FPGA image-load does not involve > hotplug specific registers to be managed, the RTL that comprises > the PCIe interface will disappear and then reappear after the FPGA > is reprogrammed. When it reappears, it_could/_/have a different PCI > ID. The process of managing this event has a lot of similarity to a > PCIe hotplug event; there is a lot of existing PCIe hotplug related > code that could be leveraged. It sounds like the N3000 is a PCI endpoint device which, when reprogrammed, briefly disappears from the bus and then may reappear under a different device ID. What you want to do then is make sure that the slot into which the N3000 is plugged is hotplug-capable. In that case, pciehp will handle disappearance and reappearance of the card just fine. Once the N3000 disables the link, pciehp will bring down the slot. Once it re-enables the link, it will bring the slot up again. It's as if the card was removed and replaced with a different one. pciehp will bind to the Root Port or Downstream Port associated with the hotplug slot. The pci_hotplug_port infrastructure is for hotplug controllers which handle devices disappearing and reappearing *below* them. It is not for endpoint devices. Thanks, Lukas
On Fri, Jan 20, 2023 at 07:42:53PM +0100, Lukas Wunner wrote: > On Fri, Jan 20, 2023 at 08:28:51AM -0800, Russ Weight wrote: > > On 1/19/23 05:33, Greg KH wrote: > > > On Wed, Jan 18, 2023 at 08:35:50PM -0500, Tianfei Zhang wrote: > > > > This patchset introduces the FPGA hotplug manager (fpgahp) driver which > > > > has been verified on the Intel N3000 card. > > > > > > > > When a PCIe-based FPGA card is reprogrammed, it temporarily disappears > > > > from the PCIe bus. This needs to be managed to avoid PCIe errors and to > > > > reprobe the device after reprogramming. > > > > > > > > To change the FPGA image, the kernel burns a new image into the flash on > > > > the card, and then triggers the card BMC to load the new image into FPGA. > > > > A new FPGA hotplug manager driver is introduced that leverages the PCIe > > > > hotplug framework to trigger and manage the update of the FPGA image, > > > > including the disappearance and reappearance of the card on the PCIe bus. > > > > The fpgahp driver uses APIs from the pciehp driver. Two new operation > > > > callbacks are defined in hotplug_slot_ops: > > > > > > > > - available_images: Optional: available FPGA images > > > > - image_load: Optional: trigger the FPGA to load a new image > > > > > > Why is all of this tied into the pci hotplug code? Shouldn't it be > > > specific to this one driver instead? pci hotplug is for removing/adding > > > PCI devices to the system, not messing with FPGA images. > > > > > > This feels like an abuse of the pci hotplug bus to me as this is NOT > > > really a PCI hotplug bus at all, right? > > > > While it is true that triggering an FPGA image-load does not involve > > hotplug specific registers to be managed, the RTL that comprises > > the PCIe interface will disappear and then reappear after the FPGA > > is reprogrammed. When it reappears, it_could/_/have a different PCI > > ID. The process of managing this event has a lot of similarity to a > > PCIe hotplug event; there is a lot of existing PCIe hotplug related > > code that could be leveraged. > > It sounds like the N3000 is a PCI endpoint device which, when reprogrammed, > briefly disappears from the bus and then may reappear under a different > device ID. > > What you want to do then is make sure that the slot into which the N3000 > is plugged is hotplug-capable. In that case, pciehp will handle > disappearance and reappearance of the card just fine. Once the N3000 > disables the link, pciehp will bring down the slot. Once it re-enables > the link, it will bring the slot up again. It's as if the card was > removed and replaced with a different one. pciehp will bind to the > Root Port or Downstream Port associated with the hotplug slot. > > The pci_hotplug_port infrastructure is for hotplug controllers which > handle devices disappearing and reappearing *below* them. It is not > for endpoint devices. Yes, thank you for expressing my concerns about this design much better than I originally did. I totally agree. thanks, greg k-h