Message ID | 1726733624-2142-2-git-send-email-shivasharan.srikanteshwara@broadcom.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI/portdrv: Report inter switch P2P links through sysfs | expand |
Hi Shivasharan, kernel test robot noticed the following build warnings: [auto build test WARNING on pci/next] [also build test WARNING on next-20240919] [cannot apply to pci/for-linus linus/master v6.11] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Shivasharan-S/PCI-portdrv-Enable-reporting-inter-switch-P2P-links/20240919-162626 base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next patch link: https://lore.kernel.org/r/1726733624-2142-2-git-send-email-shivasharan.srikanteshwara%40broadcom.com patch subject: [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links config: x86_64-kexec (https://download.01.org/0day-ci/archive/20240920/202409201219.feYAxGor-lkp@intel.com/config) compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240920/202409201219.feYAxGor-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202409201219.feYAxGor-lkp@intel.com/ All warnings (new ones prefixed by >>): >> drivers/pci/pcie/portdrv.c:86: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Determine if device supports Inter switch P2P links. vim +86 drivers/pci/pcie/portdrv.c 84 85 /** > 86 * Determine if device supports Inter switch P2P links. 87 * 88 * Return value: true if inter switch P2P is supported, return false otherwise. 89 */ 90 static bool pcie_port_is_p2p_supported(struct pci_dev *dev) 91 { 92 /* P2P link attribute is supported on upstream ports only */ 93 if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) 94 return false; 95 96 /* 97 * Currently Broadcom PEX switches are supported. 98 */ 99 if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC && 100 (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC || 101 dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC)) 102 return pcie_brcm_is_p2p_supported(dev); 103 104 return false; 105 } 106
On Thu, 19 Sep 2024 01:13:43 -0700 Shivasharan S <shivasharan.srikanteshwara@broadcom.com> wrote: > Broadcom PCI switches supports inter-switch P2P links between two > PCI-to-PCI bridges. This presents an optimal data path for data > movement. The patch exports a new sysfs entry for PCI devices that > support the inter switch P2P links and reports the B:D:F information > of the devices that are connected through this inter switch link as > shown below: > > Host root bridge > --------------------------------------- > | | > NIC1 --- PCI Switch1 --- Inter-switch link --- PCI Switch2 --- NIC2 > (2c:00.0) (2a:00.0) (3d:00.0) (40:00.0) > | | > GPU1 GPU2 > (2d:00.0) (3f:00.0) > SERVER 1 > > $ find /sys/ -name "p2p_link" | xargs grep . > /sys/devices/pci0000:29/0000:29:01.0/0000:2a:00.0/p2p_link:0000:3d:00.0 > /sys/devices/pci0000:3c/0000:3c:01.0/0000:3d:00.0/p2p_link:0000:2a:00.0 > > Current support is added to report the P2P links available for > Broadcom switches based on the capability that is reported by the > upstream bridges through their vendor-specific capability registers. > > Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> > Signed-off-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com> > --- > Changes in v2: > Integrated the code into PCIe portdrv to create the sysfs entries during > probe, as suggested by Mani. Hmm. So we are trying to rework portdrv in general to support extensibility. I'm a little nervous about taking in vendor specific code in the meantime even if it is trivial like this is. We will be having an extensible discovery scheme but for now the plan is that will be child device based for non PCI standard features. It is a fairly small bit of code, so maybe it is fine - I'm not keen on adding the implementation of the vendor specific parts to the main driver though. Push that to an optional c file. A few general comments inline. > > Documentation/ABI/testing/sysfs-bus-pci | 14 +++ > drivers/pci/pcie/portdrv.c | 131 ++++++++++++++++++++++++ > drivers/pci/pcie/portdrv.h | 10 ++ > 3 files changed, 155 insertions(+) > > diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci > index ecf47559f495..e5d02f20655f 100644 > --- a/Documentation/ABI/testing/sysfs-bus-pci > +++ b/Documentation/ABI/testing/sysfs-bus-pci > @@ -572,3 +572,17 @@ Description: > enclosure-specific indications "specific0" to "specific7", > hence the corresponding led class devices are unavailable if > the DSM interface is used. > + > +What: /sys/bus/pci/devices/.../p2p_link > +Date: September 2024 > +Contact: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> > +Description: > + This file appears on PCIe upstream ports which supports an > + internal P2P link. > + Reading this attribute will provide the list of other upstream > + ports on the system which have an internal P2P link available > + between the two ports. Given this only applies to 'internal' links and not inter switch physical links I think you should rename it. An eventual p2p link between physical switches is going to be much more complex as will need routing information. Let us avoid trampling on that space. > +Users: > + Userspace applications interested in determining a optimal P2P > + link for data transfers between devices connected to the PCIe > + switches. Need more data that 'there is a link' for this. I'd like to see some info on bandwidth and latency. I've previously been in discussions about similar devices that provide a low latency but low bandwidth direct path. That gets more likely if we scale up to multiple physical switches or the software stack is choosing between multiple p2p targets (e.g. getting nearest path to a multiheaded NIC). Perhaps best bet is to leave space for that by using a directory here to cover everything about p2p? Maybe have links under there to the other upstream ports? That might be fiddly to manage though. > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > index 6af5e0425872..c940b4b242fd 100644 > --- a/drivers/pci/pcie/portdrv.c > +++ b/drivers/pci/pcie/portdrv.c > @@ -18,6 +18,7 @@ > #include <linux/string.h> > #include <linux/slab.h> > #include <linux/aer.h> > +#include <linux/bitops.h> > > #include "../pci.h" > #include "portdrv.h" > @@ -37,6 +38,134 @@ struct portdrv_service_data { > u32 service; > }; > > +/** > + * pcie_brcm_is_p2p_supported - Broadcom device specific handler > + * to check if the upstream port supports inter switch P2P. > + * > + * @dev: PCIe upstream port to check > + * > + * This function assumes the PCIe upstream port is a Broadcom > + * PCIe device. > + */ > +static bool pcie_brcm_is_p2p_supported(struct pci_dev *dev) Put this in a separate c file + use a config option and some stubs to make it go away if people don't want to support it. The layering and separation in portdrv is currently messy but we shouldn't make it worse :( > +{ > + u64 dsn; > + u16 vsec; > + u32 vsec_data; > + > + dsn = pci_get_dsn(dev); > + if (!dsn) { > + pci_dbg(dev, "DSN capability is not present\n"); > + return false; > + } Why get the dsn (which will frequently exist on other devices) before getting the vsec which won't? Reorder these first two checks. For most devices the match on vendor will fail in the vsec check. > + > + vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_LSI_LOGIC, > + PCIE_BRCM_SW_P2P_VSEC_ID); > + if (!vsec) { > + pci_dbg(dev, "Failed to get VSEC capability\n"); > + return false; > + } > + > + pci_read_config_dword(dev, vsec + PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET, > + &vsec_data); > + > + pci_dbg(dev, "Serial Number: 0x%llx VSEC 0x%x\n", > + dsn, vsec_data); > + > + if (!PCIE_BRCM_SW_IS_SECURE_PART(dsn)) Add a comment on this. Not obvious what this is checking as it's picking a single bit out of a serial number... > + return false; > + > + if (FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) != > + PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK) > + return false; > + > + return true; return FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) == PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK; perhaps. > +} > + > +/** > + * Determine if device supports Inter switch P2P links. > + * > + * Return value: true if inter switch P2P is supported, return false otherwise. > + */ > +static bool pcie_port_is_p2p_supported(struct pci_dev *dev) > +{ > + /* P2P link attribute is supported on upstream ports only */ > + if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) > + return false; > + > + /* > + * Currently Broadcom PEX switches are supported. > + */ > + if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC && > + (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC || > + dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC)) > + return pcie_brcm_is_p2p_supported(dev); > + > + return false; > +} > + > +/* > + * Traverse list of all PCI bridges and find devices that support Inter switch P2P > + * and have the same serial number to create report the BDF over sysfs. > + */ > +static ssize_t p2p_link_show(struct device *dev, struct device_attribute *attr, > + char *buf) > +{ > + struct pci_dev *pdev = to_pci_dev(dev), *pdev_link = NULL; > + size_t len = 0; > + u64 dsn, dsn_link; > + > + dsn = pci_get_dsn(pdev); Maybe add a comment that we don't need to repeat checks that were done to make the attribute visible. Hence dsn will exist and it will be p2p link capable. > + > + /* Traverse list of PCI bridges to determine any available P2P links */ > + while ((pdev_link = pci_get_class(PCI_CLASS_BRIDGE_PCI << 8, pdev_link)) Feels like we should have a for_each_pci_bridge() similar to for_each_pci_dev() that does this, but that is already defined to mean something else... Bjorn, is this something we should be looking to make more consistent perhaps with naming to make it clear what is a search of all instances on any bus and what is a search below a particular bus? > + != NULL) { > + if (pdev_link == pdev) > + continue; > + > + if (!pcie_port_is_p2p_supported(pdev_link)) > + continue; > + > + dsn_link = pci_get_dsn(pdev_link); > + if (!dsn_link) > + continue; > + > + if (dsn == dsn_link) > + len += sysfs_emit_at(buf, len, "%04x:%02x:%02x.%d\n", > + pci_domain_nr(pdev_link->bus), > + pdev_link->bus->number, PCI_SLOT(pdev_link->devfn), > + PCI_FUNC(pdev_link->devfn)); > + } > + > + return len; > +} > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h > index 12c89ea0313b..1be06cb45665 100644 > --- a/drivers/pci/pcie/portdrv.h > +++ b/drivers/pci/pcie/portdrv.h > @@ -25,6 +25,16 @@ > > #define PCIE_PORT_DEVICE_MAXSERVICES 5 > > +/* P2P Link supported device IDs */ > +#define PCI_DEVICE_ID_BRCM_PEX_89000_HLC 0xC030 > +#define PCI_DEVICE_ID_BRCM_PEX_89000_LLC 0xC034 > + > +#define PCIE_BRCM_SW_P2P_VSEC_ID 0x1 > +#define PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET 0xC > +#define PCIE_BRCM_SW_P2P_MODE_MASK GENMASK(9, 8) > +#define PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK 0x2 > +#define PCIE_BRCM_SW_IS_SECURE_PART(dsn) ((dsn) & 0x8) Define the mask, and use FIELD_GET() to get that. > + > extern bool pcie_ports_dpc_native; > > #ifdef CONFIG_PCIEAER
Hi Jonathan, >> Need more data that 'there is a link' for this. >>I'd like to see some info on bandwidth and latency. As you too noted in your comments, for now, we are only addressing p2p connection between "virtual switches", i.e. switches that look different to the host, but are actually part of the same physical hardware. Given that, I am not sure what we should display for bandwidth and latency. There is no physical link to traverse between the virtual switches, and usually we take that as "infinite" bandwidth and "zero" latency. As such, any number here will make little sense until we start supporting p2p connection between physical switches. We could, of course, have some encoding for the time being, like have "INF" for bandwidth and 0 for latency, but again, those will not be very useful till the day this scheme is extended to physical switch and we display real values, like bandwidth and latency for a x16 PCIe link. Thoughts? sincerely, Sumanesh On Tue, Sep 24, 2024 at 8:57 AM Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: > > On Thu, 19 Sep 2024 01:13:43 -0700 > Shivasharan S <shivasharan.srikanteshwara@broadcom.com> wrote: > > > Broadcom PCI switches supports inter-switch P2P links between two > > PCI-to-PCI bridges. This presents an optimal data path for data > > movement. The patch exports a new sysfs entry for PCI devices that > > support the inter switch P2P links and reports the B:D:F information > > of the devices that are connected through this inter switch link as > > shown below: > > > > Host root bridge > > --------------------------------------- > > | | > > NIC1 --- PCI Switch1 --- Inter-switch link --- PCI Switch2 --- NIC2 > > (2c:00.0) (2a:00.0) (3d:00.0) (40:00.0) > > | | > > GPU1 GPU2 > > (2d:00.0) (3f:00.0) > > SERVER 1 > > > > $ find /sys/ -name "p2p_link" | xargs grep . > > /sys/devices/pci0000:29/0000:29:01.0/0000:2a:00.0/p2p_link:0000:3d:00.0 > > /sys/devices/pci0000:3c/0000:3c:01.0/0000:3d:00.0/p2p_link:0000:2a:00.0 > > > > Current support is added to report the P2P links available for > > Broadcom switches based on the capability that is reported by the > > upstream bridges through their vendor-specific capability registers. > > > > Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> > > Signed-off-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com> > > --- > > Changes in v2: > > Integrated the code into PCIe portdrv to create the sysfs entries during > > probe, as suggested by Mani. > > Hmm. So we are trying to rework portdrv in general to support extensibility. > I'm a little nervous about taking in vendor specific code in the meantime > even if it is trivial like this is. We will be having an extensible > discovery scheme but for now the plan is that will be child device based > for non PCI standard features. > > It is a fairly small bit of code, so maybe it is fine - I'm not keen > on adding the implementation of the vendor specific parts to the > main driver though. Push that to an optional c file. > > A few general comments inline. > > > > > Documentation/ABI/testing/sysfs-bus-pci | 14 +++ > > drivers/pci/pcie/portdrv.c | 131 ++++++++++++++++++++++++ > > drivers/pci/pcie/portdrv.h | 10 ++ > > 3 files changed, 155 insertions(+) > > > > diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci > > index ecf47559f495..e5d02f20655f 100644 > > --- a/Documentation/ABI/testing/sysfs-bus-pci > > +++ b/Documentation/ABI/testing/sysfs-bus-pci > > @@ -572,3 +572,17 @@ Description: > > enclosure-specific indications "specific0" to "specific7", > > hence the corresponding led class devices are unavailable if > > the DSM interface is used. > > + > > +What: /sys/bus/pci/devices/.../p2p_link > > +Date: September 2024 > > +Contact: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> > > +Description: > > + This file appears on PCIe upstream ports which supports an > > + internal P2P link. > > + Reading this attribute will provide the list of other upstream > > + ports on the system which have an internal P2P link available > > + between the two ports. > > Given this only applies to 'internal' links and not inter switch physical links > I think you should rename it. An eventual p2p link between physical switches > is going to be much more complex as will need routing information. > Let us avoid trampling on that space. > > > +Users: > > + Userspace applications interested in determining a optimal P2P > > + link for data transfers between devices connected to the PCIe > > + switches. > > Need more data that 'there is a link' for this. > I'd like to see some info on bandwidth and latency. I've previously been > in discussions about similar devices that provide a low latency but low > bandwidth direct path. That gets more likely if we scale up to > multiple physical switches or the software stack is choosing between > multiple p2p targets (e.g. getting nearest path to a multiheaded NIC). > > Perhaps best bet is to leave space for that by using a directory > here to cover everything about p2p? Maybe have links under there to the > other upstream ports? That might be fiddly to manage though. > > > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > > index 6af5e0425872..c940b4b242fd 100644 > > --- a/drivers/pci/pcie/portdrv.c > > +++ b/drivers/pci/pcie/portdrv.c > > @@ -18,6 +18,7 @@ > > #include <linux/string.h> > > #include <linux/slab.h> > > #include <linux/aer.h> > > +#include <linux/bitops.h> > > > > #include "../pci.h" > > #include "portdrv.h" > > @@ -37,6 +38,134 @@ struct portdrv_service_data { > > u32 service; > > }; > > > > +/** > > + * pcie_brcm_is_p2p_supported - Broadcom device specific handler > > + * to check if the upstream port supports inter switch P2P. > > + * > > + * @dev: PCIe upstream port to check > > + * > > + * This function assumes the PCIe upstream port is a Broadcom > > + * PCIe device. > > + */ > > +static bool pcie_brcm_is_p2p_supported(struct pci_dev *dev) > > Put this in a separate c file + use a config option and some > stubs to make it go away if people don't want to support it. > The layering and separation in portdrv is currently messy but > we shouldn't make it worse :( > > > +{ > > + u64 dsn; > > + u16 vsec; > > + u32 vsec_data; > > + > > + dsn = pci_get_dsn(dev); > > + if (!dsn) { > > + pci_dbg(dev, "DSN capability is not present\n"); > > + return false; > > + } > > Why get the dsn (which will frequently exist on other devices) > before getting the vsec which won't? Reorder these first > two checks. For most devices the match on vendor will fail in the > vsec check. > > > + > > + vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_LSI_LOGIC, > > + PCIE_BRCM_SW_P2P_VSEC_ID); > > + if (!vsec) { > > + pci_dbg(dev, "Failed to get VSEC capability\n"); > > + return false; > > + } > > + > > + pci_read_config_dword(dev, vsec + PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET, > > + &vsec_data); > > + > > + pci_dbg(dev, "Serial Number: 0x%llx VSEC 0x%x\n", > > + dsn, vsec_data); > > + > > + if (!PCIE_BRCM_SW_IS_SECURE_PART(dsn)) > > Add a comment on this. Not obvious what this is checking as it's picking > a single bit out of a serial number... > > > + return false; > > + > > + if (FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) != > > + PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK) > > + return false; > > + > > + return true; > return FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) == > PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK; > perhaps. > > > +} > > + > > +/** > > + * Determine if device supports Inter switch P2P links. > > + * > > + * Return value: true if inter switch P2P is supported, return false otherwise. > > + */ > > +static bool pcie_port_is_p2p_supported(struct pci_dev *dev) > > +{ > > + /* P2P link attribute is supported on upstream ports only */ > > + if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) > > + return false; > > + > > + /* > > + * Currently Broadcom PEX switches are supported. > > + */ > > + if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC && > > + (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC || > > + dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC)) > > + return pcie_brcm_is_p2p_supported(dev); > > + > > + return false; > > +} > > + > > +/* > > + * Traverse list of all PCI bridges and find devices that support Inter switch P2P > > + * and have the same serial number to create report the BDF over sysfs. > > + */ > > +static ssize_t p2p_link_show(struct device *dev, struct device_attribute *attr, > > + char *buf) > > +{ > > + struct pci_dev *pdev = to_pci_dev(dev), *pdev_link = NULL; > > + size_t len = 0; > > + u64 dsn, dsn_link; > > + > > + dsn = pci_get_dsn(pdev); > > Maybe add a comment that we don't need to repeat checks that were done > to make the attribute visible. Hence dsn will exist and it will be p2p link capable. > > > + > > + /* Traverse list of PCI bridges to determine any available P2P links */ > > + while ((pdev_link = pci_get_class(PCI_CLASS_BRIDGE_PCI << 8, pdev_link)) > > Feels like we should have a for_each_pci_bridge() similar to for_each_pci_dev() > that does this, but that is already defined to mean something else... > > Bjorn, is this something we should be looking to make more consistent > perhaps with naming to make it clear what is a search of all instances > on any bus and what is a search below a particular bus? > > > + != NULL) { > > + if (pdev_link == pdev) > > + continue; > > + > > + if (!pcie_port_is_p2p_supported(pdev_link)) > > + continue; > > + > > + dsn_link = pci_get_dsn(pdev_link); > > + if (!dsn_link) > > + continue; > > + > > + if (dsn == dsn_link) > > + len += sysfs_emit_at(buf, len, "%04x:%02x:%02x.%d\n", > > + pci_domain_nr(pdev_link->bus), > > + pdev_link->bus->number, PCI_SLOT(pdev_link->devfn), > > + PCI_FUNC(pdev_link->devfn)); > > + } > > + > > + return len; > > +} > > > > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h > > index 12c89ea0313b..1be06cb45665 100644 > > --- a/drivers/pci/pcie/portdrv.h > > +++ b/drivers/pci/pcie/portdrv.h > > @@ -25,6 +25,16 @@ > > > > #define PCIE_PORT_DEVICE_MAXSERVICES 5 > > > > +/* P2P Link supported device IDs */ > > +#define PCI_DEVICE_ID_BRCM_PEX_89000_HLC 0xC030 > > +#define PCI_DEVICE_ID_BRCM_PEX_89000_LLC 0xC034 > > + > > +#define PCIE_BRCM_SW_P2P_VSEC_ID 0x1 > > +#define PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET 0xC > > +#define PCIE_BRCM_SW_P2P_MODE_MASK GENMASK(9, 8) > > +#define PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK 0x2 > > +#define PCIE_BRCM_SW_IS_SECURE_PART(dsn) ((dsn) & 0x8) > Define the mask, and use FIELD_GET() to get that. > > + > > extern bool pcie_ports_dpc_native; > > > > #ifdef CONFIG_PCIEAER >
On Thu, 3 Oct 2024 14:41:07 -0600 Sumanesh Samanta <sumanesh.samanta@broadcom.com> wrote: > Hi Jonathan, > > >> Need more data that 'there is a link' for this. > >>I'd like to see some info on bandwidth and latency. > > As you too noted in your comments, for now, we are only addressing p2p > connection between "virtual switches", i.e. switches that look > different to the host, but are actually part of the same physical > hardware. > Given that, I am not sure what we should display for bandwidth and > latency. There is no physical link to traverse between the virtual > switches, and usually we take that as "infinite" bandwidth and "zero" > latency. For a case where you have no information, not having attributes is sensible. If there is information (CXL CDAT provides this for switches for instance) then we should have an interface that provides space for that information. > As such, any number here will make little sense until we > start supporting p2p connection between physical switches. As above, it makes sense in a switch as well - if the information is available. > We could, > of course, have some encoding for the time being, like have "INF" for > bandwidth and 0 for latency, but again, those will not be very useful > till the day this scheme is extended to physical switch and we display > real values, like bandwidth and latency for a x16 PCIe link. Thoughts? Hide the sysfs attributes for latency and bandwidth if we simply don't know. Software built on top of this can then assume full bandwidth is available or better still run some measurements to establish the missing data. All I really meant by this suggestion is a directory with space for other info is probably more extensible than a single file. Jonathan > > sincerely, > Sumanesh > > > On Tue, Sep 24, 2024 at 8:57 AM Jonathan Cameron > <Jonathan.Cameron@huawei.com> wrote: > > > > On Thu, 19 Sep 2024 01:13:43 -0700 > > Shivasharan S <shivasharan.srikanteshwara@broadcom.com> wrote: > > > > > Broadcom PCI switches supports inter-switch P2P links between two > > > PCI-to-PCI bridges. This presents an optimal data path for data > > > movement. The patch exports a new sysfs entry for PCI devices that > > > support the inter switch P2P links and reports the B:D:F information > > > of the devices that are connected through this inter switch link as > > > shown below: > > > > > > Host root bridge > > > --------------------------------------- > > > | | > > > NIC1 --- PCI Switch1 --- Inter-switch link --- PCI Switch2 --- NIC2 > > > (2c:00.0) (2a:00.0) (3d:00.0) (40:00.0) > > > | | > > > GPU1 GPU2 > > > (2d:00.0) (3f:00.0) > > > SERVER 1 > > > > > > $ find /sys/ -name "p2p_link" | xargs grep . > > > /sys/devices/pci0000:29/0000:29:01.0/0000:2a:00.0/p2p_link:0000:3d:00.0 > > > /sys/devices/pci0000:3c/0000:3c:01.0/0000:3d:00.0/p2p_link:0000:2a:00.0 > > > > > > Current support is added to report the P2P links available for > > > Broadcom switches based on the capability that is reported by the > > > upstream bridges through their vendor-specific capability registers. > > > > > > Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> > > > Signed-off-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com> > > > --- > > > Changes in v2: > > > Integrated the code into PCIe portdrv to create the sysfs entries during > > > probe, as suggested by Mani. > > > > Hmm. So we are trying to rework portdrv in general to support extensibility. > > I'm a little nervous about taking in vendor specific code in the meantime > > even if it is trivial like this is. We will be having an extensible > > discovery scheme but for now the plan is that will be child device based > > for non PCI standard features. > > > > It is a fairly small bit of code, so maybe it is fine - I'm not keen > > on adding the implementation of the vendor specific parts to the > > main driver though. Push that to an optional c file. > > > > A few general comments inline. > > > > > > > > Documentation/ABI/testing/sysfs-bus-pci | 14 +++ > > > drivers/pci/pcie/portdrv.c | 131 ++++++++++++++++++++++++ > > > drivers/pci/pcie/portdrv.h | 10 ++ > > > 3 files changed, 155 insertions(+) > > > > > > diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci > > > index ecf47559f495..e5d02f20655f 100644 > > > --- a/Documentation/ABI/testing/sysfs-bus-pci > > > +++ b/Documentation/ABI/testing/sysfs-bus-pci > > > @@ -572,3 +572,17 @@ Description: > > > enclosure-specific indications "specific0" to "specific7", > > > hence the corresponding led class devices are unavailable if > > > the DSM interface is used. > > > + > > > +What: /sys/bus/pci/devices/.../p2p_link > > > +Date: September 2024 > > > +Contact: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> > > > +Description: > > > + This file appears on PCIe upstream ports which supports an > > > + internal P2P link. > > > + Reading this attribute will provide the list of other upstream > > > + ports on the system which have an internal P2P link available > > > + between the two ports. > > > > Given this only applies to 'internal' links and not inter switch physical links > > I think you should rename it. An eventual p2p link between physical switches > > is going to be much more complex as will need routing information. > > Let us avoid trampling on that space. > > > > > +Users: > > > + Userspace applications interested in determining a optimal P2P > > > + link for data transfers between devices connected to the PCIe > > > + switches. > > > > Need more data that 'there is a link' for this. > > I'd like to see some info on bandwidth and latency. I've previously been > > in discussions about similar devices that provide a low latency but low > > bandwidth direct path. That gets more likely if we scale up to > > multiple physical switches or the software stack is choosing between > > multiple p2p targets (e.g. getting nearest path to a multiheaded NIC). > > > > Perhaps best bet is to leave space for that by using a directory > > here to cover everything about p2p? Maybe have links under there to the > > other upstream ports? That might be fiddly to manage though. > > > > > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > > > index 6af5e0425872..c940b4b242fd 100644 > > > --- a/drivers/pci/pcie/portdrv.c > > > +++ b/drivers/pci/pcie/portdrv.c > > > @@ -18,6 +18,7 @@ > > > #include <linux/string.h> > > > #include <linux/slab.h> > > > #include <linux/aer.h> > > > +#include <linux/bitops.h> > > > > > > #include "../pci.h" > > > #include "portdrv.h" > > > @@ -37,6 +38,134 @@ struct portdrv_service_data { > > > u32 service; > > > }; > > > > > > +/** > > > + * pcie_brcm_is_p2p_supported - Broadcom device specific handler > > > + * to check if the upstream port supports inter switch P2P. > > > + * > > > + * @dev: PCIe upstream port to check > > > + * > > > + * This function assumes the PCIe upstream port is a Broadcom > > > + * PCIe device. > > > + */ > > > +static bool pcie_brcm_is_p2p_supported(struct pci_dev *dev) > > > > Put this in a separate c file + use a config option and some > > stubs to make it go away if people don't want to support it. > > The layering and separation in portdrv is currently messy but > > we shouldn't make it worse :( > > > > > +{ > > > + u64 dsn; > > > + u16 vsec; > > > + u32 vsec_data; > > > + > > > + dsn = pci_get_dsn(dev); > > > + if (!dsn) { > > > + pci_dbg(dev, "DSN capability is not present\n"); > > > + return false; > > > + } > > > > Why get the dsn (which will frequently exist on other devices) > > before getting the vsec which won't? Reorder these first > > two checks. For most devices the match on vendor will fail in the > > vsec check. > > > > > + > > > + vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_LSI_LOGIC, > > > + PCIE_BRCM_SW_P2P_VSEC_ID); > > > + if (!vsec) { > > > + pci_dbg(dev, "Failed to get VSEC capability\n"); > > > + return false; > > > + } > > > + > > > + pci_read_config_dword(dev, vsec + PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET, > > > + &vsec_data); > > > + > > > + pci_dbg(dev, "Serial Number: 0x%llx VSEC 0x%x\n", > > > + dsn, vsec_data); > > > + > > > + if (!PCIE_BRCM_SW_IS_SECURE_PART(dsn)) > > > > Add a comment on this. Not obvious what this is checking as it's picking > > a single bit out of a serial number... > > > > > + return false; > > > + > > > + if (FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) != > > > + PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK) > > > + return false; > > > + > > > + return true; > > return FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) == > > PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK; > > perhaps. > > > > > +} > > > + > > > +/** > > > + * Determine if device supports Inter switch P2P links. > > > + * > > > + * Return value: true if inter switch P2P is supported, return false otherwise. > > > + */ > > > +static bool pcie_port_is_p2p_supported(struct pci_dev *dev) > > > +{ > > > + /* P2P link attribute is supported on upstream ports only */ > > > + if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) > > > + return false; > > > + > > > + /* > > > + * Currently Broadcom PEX switches are supported. > > > + */ > > > + if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC && > > > + (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC || > > > + dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC)) > > > + return pcie_brcm_is_p2p_supported(dev); > > > + > > > + return false; > > > +} > > > + > > > +/* > > > + * Traverse list of all PCI bridges and find devices that support Inter switch P2P > > > + * and have the same serial number to create report the BDF over sysfs. > > > + */ > > > +static ssize_t p2p_link_show(struct device *dev, struct device_attribute *attr, > > > + char *buf) > > > +{ > > > + struct pci_dev *pdev = to_pci_dev(dev), *pdev_link = NULL; > > > + size_t len = 0; > > > + u64 dsn, dsn_link; > > > + > > > + dsn = pci_get_dsn(pdev); > > > > Maybe add a comment that we don't need to repeat checks that were done > > to make the attribute visible. Hence dsn will exist and it will be p2p link capable. > > > > > + > > > + /* Traverse list of PCI bridges to determine any available P2P links */ > > > + while ((pdev_link = pci_get_class(PCI_CLASS_BRIDGE_PCI << 8, pdev_link)) > > > > Feels like we should have a for_each_pci_bridge() similar to for_each_pci_dev() > > that does this, but that is already defined to mean something else... > > > > Bjorn, is this something we should be looking to make more consistent > > perhaps with naming to make it clear what is a search of all instances > > on any bus and what is a search below a particular bus? > > > > > + != NULL) { > > > + if (pdev_link == pdev) > > > + continue; > > > + > > > + if (!pcie_port_is_p2p_supported(pdev_link)) > > > + continue; > > > + > > > + dsn_link = pci_get_dsn(pdev_link); > > > + if (!dsn_link) > > > + continue; > > > + > > > + if (dsn == dsn_link) > > > + len += sysfs_emit_at(buf, len, "%04x:%02x:%02x.%d\n", > > > + pci_domain_nr(pdev_link->bus), > > > + pdev_link->bus->number, PCI_SLOT(pdev_link->devfn), > > > + PCI_FUNC(pdev_link->devfn)); > > > + } > > > + > > > + return len; > > > +} > > > > > > > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h > > > index 12c89ea0313b..1be06cb45665 100644 > > > --- a/drivers/pci/pcie/portdrv.h > > > +++ b/drivers/pci/pcie/portdrv.h > > > @@ -25,6 +25,16 @@ > > > > > > #define PCIE_PORT_DEVICE_MAXSERVICES 5 > > > > > > +/* P2P Link supported device IDs */ > > > +#define PCI_DEVICE_ID_BRCM_PEX_89000_HLC 0xC030 > > > +#define PCI_DEVICE_ID_BRCM_PEX_89000_LLC 0xC034 > > > + > > > +#define PCIE_BRCM_SW_P2P_VSEC_ID 0x1 > > > +#define PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET 0xC > > > +#define PCIE_BRCM_SW_P2P_MODE_MASK GENMASK(9, 8) > > > +#define PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK 0x2 > > > +#define PCIE_BRCM_SW_IS_SECURE_PART(dsn) ((dsn) & 0x8) > > Define the mask, and use FIELD_GET() to get that. > > > + > > > extern bool pcie_ports_dpc_native; > > > > > > #ifdef CONFIG_PCIEAER > > >
On Fri, Oct 4, 2024 at 4:09 PM Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: > > On Thu, 3 Oct 2024 14:41:07 -0600 > Sumanesh Samanta <sumanesh.samanta@broadcom.com> wrote: > > > Hi Jonathan, > > > > >> Need more data that 'there is a link' for this. > > >>I'd like to see some info on bandwidth and latency. > > > > As you too noted in your comments, for now, we are only addressing p2p > > connection between "virtual switches", i.e. switches that look > > different to the host, but are actually part of the same physical > > hardware. > > Given that, I am not sure what we should display for bandwidth and > > latency. There is no physical link to traverse between the virtual > > switches, and usually we take that as "infinite" bandwidth and "zero" > > latency. > > For a case where you have no information, not having attributes is > sensible. If there is information (CXL CDAT provides this for switches > for instance) then we should have an interface that provides space for > that information. > > > As such, any number here will make little sense until we > > start supporting p2p connection between physical switches. > > As above, it makes sense in a switch as well - if the information > is available. > > > We could, > > of course, have some encoding for the time being, like have "INF" for > > bandwidth and 0 for latency, but again, those will not be very useful > > till the day this scheme is extended to physical switch and we display > > real values, like bandwidth and latency for a x16 PCIe link. Thoughts? > > Hide the sysfs attributes for latency and bandwidth if we simply don't > know. Software built on top of this can then assume full bandwidth > is available or better still run some measurements to establish the > missing data. > > All I really meant by this suggestion is a directory with space for > other info is probably more extensible than a single file. Hi Jonathan, We will make the changes to add a directory for p2p_link related information to be exposed to user. We will only populate the information related to the inter-switch P2P links. Rest of the attributes can be added for devices that report them at a later stage. Please check if the directory structure makes sense to you: /sys/devices/.../B:D:F/p2p_link/links -> Reading this file will return the same information that is returned currently by the p2p_link file. > > Jonathan > > > > > sincerely, > > Sumanesh > > > > > > On Tue, Sep 24, 2024 at 8:57 AM Jonathan Cameron > > <Jonathan.Cameron@huawei.com> wrote: > > > > > > On Thu, 19 Sep 2024 01:13:43 -0700 > > > Shivasharan S <shivasharan.srikanteshwara@broadcom.com> wrote: > > > > > > > Broadcom PCI switches supports inter-switch P2P links between two > > > > PCI-to-PCI bridges. This presents an optimal data path for data > > > > movement. The patch exports a new sysfs entry for PCI devices that > > > > support the inter switch P2P links and reports the B:D:F information > > > > of the devices that are connected through this inter switch link as > > > > shown below: > > > > > > > > Host root bridge > > > > --------------------------------------- > > > > | | > > > > NIC1 --- PCI Switch1 --- Inter-switch link --- PCI Switch2 --- NIC2 > > > > (2c:00.0) (2a:00.0) (3d:00.0) (40:00.0) > > > > | | > > > > GPU1 GPU2 > > > > (2d:00.0) (3f:00.0) > > > > SERVER 1 > > > > > > > > $ find /sys/ -name "p2p_link" | xargs grep . > > > > /sys/devices/pci0000:29/0000:29:01.0/0000:2a:00.0/p2p_link:0000:3d:00.0 > > > > /sys/devices/pci0000:3c/0000:3c:01.0/0000:3d:00.0/p2p_link:0000:2a:00.0 > > > > > > > > Current support is added to report the P2P links available for > > > > Broadcom switches based on the capability that is reported by the > > > > upstream bridges through their vendor-specific capability registers. > > > > > > > > Signed-off-by: Shivasharan S < shivasharan.srikanteshwara@broadcom.com> > > > > Signed-off-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com> > > > > --- > > > > Changes in v2: > > > > Integrated the code into PCIe portdrv to create the sysfs entries during > > > > probe, as suggested by Mani. > > > > > > Hmm. So we are trying to rework portdrv in general to support extensibility. > > > I'm a little nervous about taking in vendor specific code in the meantime > > > even if it is trivial like this is. We will be having an extensible > > > discovery scheme but for now the plan is that will be child device based > > > for non PCI standard features. > > > > > > It is a fairly small bit of code, so maybe it is fine - I'm not keen > > > on adding the implementation of the vendor specific parts to the > > > main driver though. Push that to an optional c file. > > > > > > A few general comments inline. > > > > > > > > > > > Documentation/ABI/testing/sysfs-bus-pci | 14 +++ > > > > drivers/pci/pcie/portdrv.c | 131 ++++++++++++++++++++++++ > > > > drivers/pci/pcie/portdrv.h | 10 ++ > > > > 3 files changed, 155 insertions(+) > > > > > > > > diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci > > > > index ecf47559f495..e5d02f20655f 100644 > > > > --- a/Documentation/ABI/testing/sysfs-bus-pci > > > > +++ b/Documentation/ABI/testing/sysfs-bus-pci > > > > @@ -572,3 +572,17 @@ Description: > > > > enclosure-specific indications "specific0" to "specific7", > > > > hence the corresponding led class devices are unavailable if > > > > the DSM interface is used. > > > > + > > > > +What: /sys/bus/pci/devices/.../p2p_link > > > > +Date: September 2024 > > > > +Contact: Shivasharan S <shivasharan.srikanteshwara@broadcom.com > > > > > +Description: > > > > + This file appears on PCIe upstream ports which supports an > > > > + internal P2P link. > > > > + Reading this attribute will provide the list of other upstream > > > > + ports on the system which have an internal P2P link available > > > > + between the two ports. > > > > > > Given this only applies to 'internal' links and not inter switch physical links > > > I think you should rename it. An eventual p2p link between physical switches > > > is going to be much more complex as will need routing information. > > > Let us avoid trampling on that space. > > > > > > > +Users: > > > > + Userspace applications interested in determining a optimal P2P > > > > + link for data transfers between devices connected to the PCIe > > > > + switches. > > > > > > Need more data that 'there is a link' for this. > > > I'd like to see some info on bandwidth and latency. I've previously been > > > in discussions about similar devices that provide a low latency but low > > > bandwidth direct path. That gets more likely if we scale up to > > > multiple physical switches or the software stack is choosing between > > > multiple p2p targets (e.g. getting nearest path to a multiheaded NIC). > > > > > > Perhaps best bet is to leave space for that by using a directory > > > here to cover everything about p2p? Maybe have links under there to the > > > other upstream ports? That might be fiddly to manage though. > > > > > > > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > > > > index 6af5e0425872..c940b4b242fd 100644 > > > > --- a/drivers/pci/pcie/portdrv.c > > > > +++ b/drivers/pci/pcie/portdrv.c > > > > @@ -18,6 +18,7 @@ > > > > #include <linux/string.h> > > > > #include <linux/slab.h> > > > > #include <linux/aer.h> > > > > +#include <linux/bitops.h> > > > > > > > > #include "../pci.h" > > > > #include "portdrv.h" > > > > @@ -37,6 +38,134 @@ struct portdrv_service_data { > > > > u32 service; > > > > }; > > > > > > > > +/** > > > > + * pcie_brcm_is_p2p_supported - Broadcom device specific handler > > > > + * to check if the upstream port supports inter switch P2P. > > > > + * > > > > + * @dev: PCIe upstream port to check > > > > + * > > > > + * This function assumes the PCIe upstream port is a Broadcom > > > > + * PCIe device. > > > > + */ > > > > +static bool pcie_brcm_is_p2p_supported(struct pci_dev *dev) > > > > > > Put this in a separate c file + use a config option and some > > > stubs to make it go away if people don't want to support it. > > > The layering and separation in portdrv is currently messy but > > > we shouldn't make it worse :( > > > Understood. We will move the p2p_link creation code to a separate .c/.h file . > > > > +{ > > > > + u64 dsn; > > > > + u16 vsec; > > > > + u32 vsec_data; > > > > + > > > > + dsn = pci_get_dsn(dev); > > > > + if (!dsn) { > > > > + pci_dbg(dev, "DSN capability is not present\n"); > > > > + return false; > > > > + } > > > > > > Why get the dsn (which will frequently exist on other devices) > > > before getting the vsec which won't? Reorder these first > > > two checks. For most devices the match on vendor will fail in the > > > vsec check. > > > This will be fixed in the next version of this patch. > > > > + > > > > + vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_LSI_LOGIC, > > > > + PCIE_BRCM_SW_P2P_VSEC_ID); > > > > + if (!vsec) { > > > > + pci_dbg(dev, "Failed to get VSEC capability\n"); > > > > + return false; > > > > + } > > > > + > > > > + pci_read_config_dword(dev, vsec + PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET, > > > > + &vsec_data); > > > > + > > > > + pci_dbg(dev, "Serial Number: 0x%llx VSEC 0x%x\n", > > > > + dsn, vsec_data); > > > > + > > > > + if (!PCIE_BRCM_SW_IS_SECURE_PART(dsn)) > > > > > > Add a comment on this. Not obvious what this is checking as it's picking > > > a single bit out of a serial number... > > > Will do. > > > > + return false; > > > > + > > > > + if (FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) != > > > > + PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK) > > > > + return false; > > > > + > > > > + return true; > > > return FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) == > > > PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK; > > > perhaps. > > > Will take care of this. > > > > +} > > > > + > > > > +/** > > > > + * Determine if device supports Inter switch P2P links. > > > > + * > > > > + * Return value: true if inter switch P2P is supported, return false otherwise. > > > > + */ > > > > +static bool pcie_port_is_p2p_supported(struct pci_dev *dev) > > > > +{ > > > > + /* P2P link attribute is supported on upstream ports only */ > > > > + if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) > > > > + return false; > > > > + > > > > + /* > > > > + * Currently Broadcom PEX switches are supported. > > > > + */ > > > > + if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC && > > > > + (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC || > > > > + dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC)) > > > > + return pcie_brcm_is_p2p_supported(dev); > > > > + > > > > + return false; > > > > +} > > > > + > > > > +/* > > > > + * Traverse list of all PCI bridges and find devices that support Inter switch P2P > > > > + * and have the same serial number to create report the BDF over sysfs. > > > > + */ > > > > +static ssize_t p2p_link_show(struct device *dev, struct device_attribute *attr, > > > > + char *buf) > > > > +{ > > > > + struct pci_dev *pdev = to_pci_dev(dev), *pdev_link = NULL; > > > > + size_t len = 0; > > > > + u64 dsn, dsn_link; > > > > + > > > > + dsn = pci_get_dsn(pdev); > > > > > > Maybe add a comment that we don't need to repeat checks that were done > > > to make the attribute visible. Hence dsn will exist and it will be p2p link capable. > > > Will take care of this. > > > > + > > > > + /* Traverse list of PCI bridges to determine any available P2P links */ > > > > + while ((pdev_link = pci_get_class(PCI_CLASS_BRIDGE_PCI << 8, pdev_link)) > > > > > > Feels like we should have a for_each_pci_bridge() similar to for_each_pci_dev() > > > that does this, but that is already defined to mean something else... > > > > > > Bjorn, is this something we should be looking to make more consistent > > > perhaps with naming to make it clear what is a search of all instances > > > on any bus and what is a search below a particular bus? > > > > > > > + != NULL) { > > > > + if (pdev_link == pdev) > > > > + continue; > > > > + > > > > + if (!pcie_port_is_p2p_supported(pdev_link)) > > > > + continue; > > > > + > > > > + dsn_link = pci_get_dsn(pdev_link); > > > > + if (!dsn_link) > > > > + continue; > > > > + > > > > + if (dsn == dsn_link) > > > > + len += sysfs_emit_at(buf, len, "%04x:%02x:%02x.%d\n", > > > > + pci_domain_nr(pdev_link->bus), > > > > + pdev_link->bus->number, PCI_SLOT(pdev_link->devfn), > > > > + PCI_FUNC(pdev_link->devfn)); > > > > + } > > > > + > > > > + return len; > > > > +} > > > > > > > > > > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h > > > > index 12c89ea0313b..1be06cb45665 100644 > > > > --- a/drivers/pci/pcie/portdrv.h > > > > +++ b/drivers/pci/pcie/portdrv.h > > > > @@ -25,6 +25,16 @@ > > > > > > > > #define PCIE_PORT_DEVICE_MAXSERVICES 5 > > > > > > > > +/* P2P Link supported device IDs */ > > > > +#define PCI_DEVICE_ID_BRCM_PEX_89000_HLC 0xC030 > > > > +#define PCI_DEVICE_ID_BRCM_PEX_89000_LLC 0xC034 > > > > + > > > > +#define PCIE_BRCM_SW_P2P_VSEC_ID 0x1 > > > > +#define PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET 0xC > > > > +#define PCIE_BRCM_SW_P2P_MODE_MASK GENMASK(9, 8) > > > > +#define PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK 0x2 > > > > +#define PCIE_BRCM_SW_IS_SECURE_PART(dsn) ((dsn) & 0x8) > > > Define the mask, and use FIELD_GET() to get that. Will take care of this. Best Regards, Shivasharan > > > > + > > > > extern bool pcie_ports_dpc_native; > > > > > > > > #ifdef CONFIG_PCIEAER > > > > > >
On Mon, 14 Oct 2024 15:10:57 +0530 Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com> wrote: > On Fri, Oct 4, 2024 at 4:09 PM Jonathan Cameron <Jonathan.Cameron@huawei.com> > wrote: > > > > On Thu, 3 Oct 2024 14:41:07 -0600 > > Sumanesh Samanta <sumanesh.samanta@broadcom.com> wrote: > > > > > Hi Jonathan, > > > > > > >> Need more data that 'there is a link' for this. > > > >>I'd like to see some info on bandwidth and latency. > > > > > > As you too noted in your comments, for now, we are only addressing p2p > > > connection between "virtual switches", i.e. switches that look > > > different to the host, but are actually part of the same physical > > > hardware. > > > Given that, I am not sure what we should display for bandwidth and > > > latency. There is no physical link to traverse between the virtual > > > switches, and usually we take that as "infinite" bandwidth and "zero" > > > latency. > > > > For a case where you have no information, not having attributes is > > sensible. If there is information (CXL CDAT provides this for switches > > for instance) then we should have an interface that provides space for > > that information. > > > > > As such, any number here will make little sense until we > > > start supporting p2p connection between physical switches. > > > > As above, it makes sense in a switch as well - if the information > > is available. > > > > > We could, > > > of course, have some encoding for the time being, like have "INF" for > > > bandwidth and 0 for latency, but again, those will not be very useful > > > till the day this scheme is extended to physical switch and we display > > > real values, like bandwidth and latency for a x16 PCIe link. Thoughts? > > > > Hide the sysfs attributes for latency and bandwidth if we simply don't > > know. Software built on top of this can then assume full bandwidth > > is available or better still run some measurements to establish the > > missing data. > > > > All I really meant by this suggestion is a directory with space for > > other info is probably more extensible than a single file. > > Hi Jonathan, > We will make the changes to add a directory for p2p_link related information > to be exposed to user. We will only populate the information related to the > inter-switch P2P links. Rest of the attributes can be added for devices that > report them at a later stage. > Please check if the directory structure makes sense to you: > /sys/devices/.../B:D:F/p2p_link/links -> Reading this file will return the > same > information that is returned currently by the p2p_link file. Sounds good to me. Jonathan
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index ecf47559f495..e5d02f20655f 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -572,3 +572,17 @@ Description: enclosure-specific indications "specific0" to "specific7", hence the corresponding led class devices are unavailable if the DSM interface is used. + +What: /sys/bus/pci/devices/.../p2p_link +Date: September 2024 +Contact: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> +Description: + This file appears on PCIe upstream ports which supports an + internal P2P link. + Reading this attribute will provide the list of other upstream + ports on the system which have an internal P2P link available + between the two ports. +Users: + Userspace applications interested in determining a optimal P2P + link for data transfers between devices connected to the PCIe + switches. diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c index 6af5e0425872..c940b4b242fd 100644 --- a/drivers/pci/pcie/portdrv.c +++ b/drivers/pci/pcie/portdrv.c @@ -18,6 +18,7 @@ #include <linux/string.h> #include <linux/slab.h> #include <linux/aer.h> +#include <linux/bitops.h> #include "../pci.h" #include "portdrv.h" @@ -37,6 +38,134 @@ struct portdrv_service_data { u32 service; }; +/** + * pcie_brcm_is_p2p_supported - Broadcom device specific handler + * to check if the upstream port supports inter switch P2P. + * + * @dev: PCIe upstream port to check + * + * This function assumes the PCIe upstream port is a Broadcom + * PCIe device. + */ +static bool pcie_brcm_is_p2p_supported(struct pci_dev *dev) +{ + u64 dsn; + u16 vsec; + u32 vsec_data; + + dsn = pci_get_dsn(dev); + if (!dsn) { + pci_dbg(dev, "DSN capability is not present\n"); + return false; + } + + vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_LSI_LOGIC, + PCIE_BRCM_SW_P2P_VSEC_ID); + if (!vsec) { + pci_dbg(dev, "Failed to get VSEC capability\n"); + return false; + } + + pci_read_config_dword(dev, vsec + PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET, + &vsec_data); + + pci_dbg(dev, "Serial Number: 0x%llx VSEC 0x%x\n", + dsn, vsec_data); + + if (!PCIE_BRCM_SW_IS_SECURE_PART(dsn)) + return false; + + if (FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) != + PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK) + return false; + + return true; +} + +/** + * Determine if device supports Inter switch P2P links. + * + * Return value: true if inter switch P2P is supported, return false otherwise. + */ +static bool pcie_port_is_p2p_supported(struct pci_dev *dev) +{ + /* P2P link attribute is supported on upstream ports only */ + if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) + return false; + + /* + * Currently Broadcom PEX switches are supported. + */ + if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC && + (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC || + dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC)) + return pcie_brcm_is_p2p_supported(dev); + + return false; +} + +/* + * Traverse list of all PCI bridges and find devices that support Inter switch P2P + * and have the same serial number to create report the BDF over sysfs. + */ +static ssize_t p2p_link_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pci_dev *pdev = to_pci_dev(dev), *pdev_link = NULL; + size_t len = 0; + u64 dsn, dsn_link; + + dsn = pci_get_dsn(pdev); + + /* Traverse list of PCI bridges to determine any available P2P links */ + while ((pdev_link = pci_get_class(PCI_CLASS_BRIDGE_PCI << 8, pdev_link)) + != NULL) { + if (pdev_link == pdev) + continue; + + if (!pcie_port_is_p2p_supported(pdev_link)) + continue; + + dsn_link = pci_get_dsn(pdev_link); + if (!dsn_link) + continue; + + if (dsn == dsn_link) + len += sysfs_emit_at(buf, len, "%04x:%02x:%02x.%d\n", + pci_domain_nr(pdev_link->bus), + pdev_link->bus->number, PCI_SLOT(pdev_link->devfn), + PCI_FUNC(pdev_link->devfn)); + } + + return len; +} + +/* P2P link sysfs attribute. */ +static struct device_attribute dev_attr_p2p_link = + __ATTR(p2p_link, 0444, p2p_link_show, NULL); + +static struct attribute *pcie_port_p2p_link_attrs[] = { + &dev_attr_p2p_link.attr, + NULL +}; + +static umode_t pcie_port_p2p_link_attrs_is_visible(struct kobject *kobj, + struct attribute *a, int n) +{ + struct device *dev = kobj_to_dev(kobj); + struct pci_dev *pdev = to_pci_dev(dev); + + if (!pcie_port_is_p2p_supported(pdev)) + return 0; + + return a->mode; +} + +const struct attribute_group pcie_port_p2p_link_attr_group = { + .attrs = pcie_port_p2p_link_attrs, + .is_visible = pcie_port_p2p_link_attrs_is_visible, +}; + /** * release_pcie_device - free PCI Express port service device structure * @dev: Port service device to release @@ -715,6 +844,8 @@ static int pcie_portdrv_probe(struct pci_dev *dev, pm_runtime_allow(&dev->dev); } + sysfs_update_group(&dev->dev.kobj, &pcie_port_p2p_link_attr_group); + return 0; } diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h index 12c89ea0313b..1be06cb45665 100644 --- a/drivers/pci/pcie/portdrv.h +++ b/drivers/pci/pcie/portdrv.h @@ -25,6 +25,16 @@ #define PCIE_PORT_DEVICE_MAXSERVICES 5 +/* P2P Link supported device IDs */ +#define PCI_DEVICE_ID_BRCM_PEX_89000_HLC 0xC030 +#define PCI_DEVICE_ID_BRCM_PEX_89000_LLC 0xC034 + +#define PCIE_BRCM_SW_P2P_VSEC_ID 0x1 +#define PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET 0xC +#define PCIE_BRCM_SW_P2P_MODE_MASK GENMASK(9, 8) +#define PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK 0x2 +#define PCIE_BRCM_SW_IS_SECURE_PART(dsn) ((dsn) & 0x8) + extern bool pcie_ports_dpc_native; #ifdef CONFIG_PCIEAER