Message ID | 20230521115141.2384444-1-vladimir.oltean@nxp.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | [pci] PCI: don't skip probing entire device if first fn OF node has status = "disabled" | expand |
Hi Bjorn, On Sun, May 21, 2023 at 02:51:41PM +0300, Vladimir Oltean wrote: > pci_scan_child_bus_extend() calls pci_scan_slot() with devfn > (bus:device:function) being a multiple of 8, i.e. for each unique > device. > > pci_scan_slot() has logic to say that if the function 0 of a device is > absent, the entire device is absent and we can skip the other functions > entirely. Traditionally, this has meant that pci_bus_read_dev_vendor_id() > returns an error code for that function. > > However, since the blamed commit, there is an extra confounding > condition: function 0 of the device exists and has a valid vendor id, > but it is disabled in the device tree. In that case, pci_scan_slot() > would incorrectly skip the entire device instead of just that function. > > Such is the case with the NXP LS1028A SoC, which has an ECAM > for embedded Ethernet (see pcie@1f0000000 in > arm64/boot/dts/freescale/fsl-ls1028a.dtsi). Each Ethernet port > represents a function within the ENETC ECAM, with function 0 going > to ENETC Ethernet port 0, connected to SERDES port 0 (SGMII or USXGMII). > > When using a SERDES protocol such as 0x9999, all 4 SERDES lanes go to > the Ethernet switch (function 5 on this ECAM) and none go to ENETC > port 0. So, ENETC port 0 needs to have status = "disabled", and embedded > Ethernet takes place just through the other functions (fn 2 is the DSA > master, fn 3 is the MDIO controller, fn 5 is the DSA switch etc). > Contrast this with other SERDES protocols like 0x85bb, where the switch > takes up a single SERDES lane and uses the QSGMII protocol - so ENETC > port 0 also gets access to a SERDES lane. > > Therefore, here, function 0 being unused has nothing to do with the > entire PCI device being unused. > > Add a "bool present_but_skipped" which is propagated from the caller > of pci_set_of_node() all the way to pci_scan_slot(), so that it can > distinguish an error reading the ECAM from a disabled device in the > device tree. > > Fixes: 6fffbc7ae137 ("PCI: Honor firmware's device disabled status") > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > --- Do you have some comments here? Thanks, Vladimir
On Sun, May 21, 2023 at 02:51:41PM +0300, Vladimir Oltean wrote: > pci_scan_child_bus_extend() calls pci_scan_slot() with devfn > (bus:device:function) being a multiple of 8, i.e. for each unique > device. > > pci_scan_slot() has logic to say that if the function 0 of a device is > absent, the entire device is absent and we can skip the other functions > entirely. Traditionally, this has meant that pci_bus_read_dev_vendor_id() > returns an error code for that function. > > However, since the blamed commit, there is an extra confounding > condition: function 0 of the device exists and has a valid vendor id, > but it is disabled in the device tree. In that case, pci_scan_slot() > would incorrectly skip the entire device instead of just that function. > > Such is the case with the NXP LS1028A SoC, which has an ECAM > for embedded Ethernet (see pcie@1f0000000 in > arm64/boot/dts/freescale/fsl-ls1028a.dtsi). Each Ethernet port > represents a function within the ENETC ECAM, with function 0 going > to ENETC Ethernet port 0, connected to SERDES port 0 (SGMII or USXGMII). > > When using a SERDES protocol such as 0x9999, all 4 SERDES lanes go to > the Ethernet switch (function 5 on this ECAM) and none go to ENETC > port 0. So, ENETC port 0 needs to have status = "disabled", and embedded > Ethernet takes place just through the other functions (fn 2 is the DSA > master, fn 3 is the MDIO controller, fn 5 is the DSA switch etc). > Contrast this with other SERDES protocols like 0x85bb, where the switch > takes up a single SERDES lane and uses the QSGMII protocol - so ENETC > port 0 also gets access to a SERDES lane. Can you write this description in terms of PCI topology? The nitty-gritty SERDES details are not relevant at this level, except to say that Function 0 is present in some cases but not others, and when it is not present, *other* functions may be present. Sigh. Per spec (PCIe r6.0, sec 7.5.1.1.9), software is not permitted to probe for Functions other than 0 unless "explicitly indicated by another mechanism, such as an ARI or SR-IOV Capability." Does it "work" to probe when the spec prohibits it? Probably. Does it lead to some breakage elsewhere eventually? Quite possibly. They didn't put "software must not probe" in the spec just to make enumeration faster. So I'm a little grumpy about further complicating this already messy path just to accommodate a new non-compliant SoC. Everybody pays the price of understanding all this stuff, and it doesn't seem in balance. Can you take advantage of some existing mechanism like PCI_SCAN_ALL_PCIE_DEVS or hypervisor_isolated_pci_functions() (which could be renamed and made more general)? > Therefore, here, function 0 being unused has nothing to do with the > entire PCI device being unused. > > Add a "bool present_but_skipped" which is propagated from the caller > of pci_set_of_node() all the way to pci_scan_slot(), so that it can > distinguish an error reading the ECAM from a disabled device in the > device tree. > > Fixes: 6fffbc7ae137 ("PCI: Honor firmware's device disabled status") > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > --- > drivers/pci/pci.h | 1 + > drivers/pci/probe.c | 58 +++++++++++++++++++++++++++++++-------------- > 2 files changed, 41 insertions(+), 18 deletions(-) > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 2475098f6518..dc11e0945744 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -240,6 +240,7 @@ bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl, > int crs_timeout); > int pci_idt_bus_quirk(struct pci_bus *bus, int devfn, u32 *pl, int crs_timeout); > > +int __pci_setup_device(struct pci_dev *dev, bool *present_but_skipped); > int pci_setup_device(struct pci_dev *dev); > int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type, > struct resource *res, unsigned int reg); > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 0b2826c4a832..17a51fa55020 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1811,17 +1811,7 @@ static void early_dump_pci_device(struct pci_dev *pdev) > value, 256, false); > } > > -/** > - * pci_setup_device - Fill in class and map information of a device > - * @dev: the device structure to fill > - * > - * Initialize the device structure with information about the device's > - * vendor,class,memory and IO-space addresses, IRQ lines etc. > - * Called at initialisation of the PCI subsystem and by CardBus services. > - * Returns 0 on success and negative if unknown type of device (not normal, > - * bridge or CardBus). > - */ > -int pci_setup_device(struct pci_dev *dev) > +int __pci_setup_device(struct pci_dev *dev, bool *present_but_skipped) > { > u32 class; > u16 cmd; > @@ -1841,8 +1831,10 @@ int pci_setup_device(struct pci_dev *dev) > set_pcie_port_type(dev); > > err = pci_set_of_node(dev); > - if (err) > + if (err) { > + *present_but_skipped = true; > return err; > + } > pci_set_acpi_fwnode(dev); > > pci_dev_assign_slot(dev); > @@ -1995,6 +1987,23 @@ int pci_setup_device(struct pci_dev *dev) > return 0; > } > > +/** > + * pci_setup_device - Fill in class and map information of a device > + * @dev: the device structure to fill > + * > + * Initialize the device structure with information about the device's > + * vendor,class,memory and IO-space addresses, IRQ lines etc. > + * Called at initialisation of the PCI subsystem and by CardBus services. > + * Returns 0 on success and negative if unknown type of device (not normal, > + * bridge or CardBus). > + */ > +int pci_setup_device(struct pci_dev *dev) > +{ > + bool present_but_skipped = false; > + > + return __pci_setup_device(dev, &present_but_skipped); > +} > + > static void pci_configure_mps(struct pci_dev *dev) > { > struct pci_dev *bridge = pci_upstream_bridge(dev); > @@ -2414,7 +2423,8 @@ EXPORT_SYMBOL(pci_bus_read_dev_vendor_id); > * Read the config data for a PCI device, sanity-check it, > * and fill in the dev structure. > */ > -static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn) > +static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn, > + bool *present_but_skipped) > { > struct pci_dev *dev; > u32 l; > @@ -2430,7 +2440,7 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn) > dev->vendor = l & 0xffff; > dev->device = (l >> 16) & 0xffff; > > - if (pci_setup_device(dev)) { > + if (__pci_setup_device(dev, present_but_skipped)) { > pci_bus_put(dev->bus); > kfree(dev); > return NULL; > @@ -2575,17 +2585,20 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) > WARN_ON(ret < 0); > } > > -struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn) > +static struct pci_dev *__pci_scan_single_device(struct pci_bus *bus, int devfn, > + bool *present_but_skipped) > { > struct pci_dev *dev; > > + *present_but_skipped = false; > + > dev = pci_get_slot(bus, devfn); > if (dev) { > pci_dev_put(dev); > return dev; > } > > - dev = pci_scan_device(bus, devfn); > + dev = pci_scan_device(bus, devfn, present_but_skipped); > if (!dev) > return NULL; > > @@ -2593,6 +2606,13 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn) > > return dev; > } > + > +struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn) > +{ > + bool present_but_skipped; > + > + return __pci_scan_single_device(bus, devfn, &present_but_skipped); > +} > EXPORT_SYMBOL(pci_scan_single_device); > > static int next_ari_fn(struct pci_bus *bus, struct pci_dev *dev, int fn) > @@ -2665,6 +2685,7 @@ static int only_one_child(struct pci_bus *bus) > */ > int pci_scan_slot(struct pci_bus *bus, int devfn) > { > + bool present_but_skipped; > struct pci_dev *dev; > int fn = 0, nr = 0; > > @@ -2672,13 +2693,14 @@ int pci_scan_slot(struct pci_bus *bus, int devfn) > return 0; /* Already scanned the entire slot */ > > do { > - dev = pci_scan_single_device(bus, devfn + fn); > + dev = __pci_scan_single_device(bus, devfn + fn, > + &present_but_skipped); > if (dev) { > if (!pci_dev_is_added(dev)) > nr++; > if (fn > 0) > dev->multifunction = 1; > - } else if (fn == 0) { > + } else if (fn == 0 && !present_but_skipped) { > /* > * Function 0 is required unless we are running on > * a hypervisor that passes through individual PCI > -- > 2.34.1 >
On Tue, May 30, 2023 at 04:58:55PM -0500, Bjorn Helgaas wrote: > Can you write this description in terms of PCI topology? The > nitty-gritty SERDES details are not relevant at this level, except to > say that Function 0 is present in some cases but not others, and when > it is not present, *other* functions may be present. No. It is to say that within the device, all PCIe functions (including 0) are always available and have the same number, but depending on SERDES configuration, their PCIe presence might be practically useful or not. So that's how function 0 may end having status = "disabled" in the device tree. > Sigh. Per spec (PCIe r6.0, sec 7.5.1.1.9), software is not permitted > to probe for Functions other than 0 unless "explicitly indicated by > another mechanism, such as an ARI or SR-IOV Capability." > > Does it "work" to probe when the spec prohibits it? Probably. Does > it lead to some breakage elsewhere eventually? Quite possibly. They > didn't put "software must not probe" in the spec just to make > enumeration faster. > > So I'm a little grumpy about further complicating this already messy > path just to accommodate a new non-compliant SoC. Everybody pays the > price of understanding all this stuff, and it doesn't seem in balance. > > Can you take advantage of some existing mechanism like > PCI_SCAN_ALL_PCIE_DEVS or hypervisor_isolated_pci_functions() (which > could be renamed and made more general)? Not responding yet to the rest of the email since it's not clear to me that you've understood function 0 is absolutely present and responds to all config space accesses - it's just disabled in the device tree because the user doesn't have something useful to do with it.
On Wed, May 31, 2023 at 01:04:36AM +0300, Vladimir Oltean wrote: > On Tue, May 30, 2023 at 04:58:55PM -0500, Bjorn Helgaas wrote: > > Can you write this description in terms of PCI topology? The > > nitty-gritty SERDES details are not relevant at this level, except to > > say that Function 0 is present in some cases but not others, and when > > it is not present, *other* functions may be present. > > No. It is to say that within the device, all PCIe functions (including 0) > are always available and have the same number, but depending on SERDES > configuration, their PCIe presence might be practically useful or not. > So that's how function 0 may end having status = "disabled" in the > device tree. > > > Sigh. Per spec (PCIe r6.0, sec 7.5.1.1.9), software is not permitted > > to probe for Functions other than 0 unless "explicitly indicated by > > another mechanism, such as an ARI or SR-IOV Capability." > > > > Does it "work" to probe when the spec prohibits it? Probably. Does > > it lead to some breakage elsewhere eventually? Quite possibly. They > > didn't put "software must not probe" in the spec just to make > > enumeration faster. > > > > So I'm a little grumpy about further complicating this already messy > > path just to accommodate a new non-compliant SoC. Everybody pays the > > price of understanding all this stuff, and it doesn't seem in balance. > > > > Can you take advantage of some existing mechanism like > > PCI_SCAN_ALL_PCIE_DEVS or hypervisor_isolated_pci_functions() (which > > could be renamed and made more general)? > > Not responding yet to the rest of the email since it's not clear to me > that you've understood function 0 is absolutely present and responds > to all config space accesses - it's just disabled in the device tree > because the user doesn't have something useful to do with it. Ah, you're right, sorry I missed that. Dispensing with the SERDES details would make this more obvious. Not sure why this needs to change the pci_scan_slot() path, since Function 0 is present and enumerable even though it's not useful in some cases. Seems like something in pci_set_of_node() or a quirk could do whatever you need to do. Bjorn
On Tue, May 30, 2023 at 05:27:24PM -0500, Bjorn Helgaas wrote: > Ah, you're right, sorry I missed that. Dispensing with the SERDES > details would make this more obvious. Lesson learned. When I had just gotten out of college, every time I asked the coworkers in my company what they're up to, I was amazed by them just proceeding to tell me all the nitty gritty details of what they're doing and debugging, like I was supposed to understand or care for that matter. "Dude, can't you just paint the high level idea without using dorky words?" Now I'm one of them... > Not sure why this needs to change the pci_scan_slot() path, since > Function 0 is present and enumerable even though it's not useful in > some cases. Well, the rationale for me was pretty simple: it's the pci_scan_slot() logic that I want to change - continue enumeration in some cases when the pci_dev for fn 0 is NULL - and I'm otherwise perfectly okay with pci_scan_slot() getting a NULL pci_dev from pci_setup_device() for fn 0. That wasn't something I had in mind to change. This patch is what it takes to propagate a qualifier, without leaving a mark in any structure, for that NULL return code: is it NULL because enumeration came up with nothing, or is it NULL because pci_set_of_node() said so? > Seems like something in pci_set_of_node() or a quirk could do whatever > you need to do. Could you help me out with a more detailed hint here? I'm not really familiar with the PCI core code. You probably mean to suggest leaving a stateful flag somewhere, though I'm not exactly sure where that is, that would reach pci_scan_slot() enough to be able to alter its decision.
On Wed, May 31, 2023 at 02:15:09AM +0300, Vladimir Oltean wrote: > On Tue, May 30, 2023 at 05:27:24PM -0500, Bjorn Helgaas wrote: > > Ah, you're right, sorry I missed that. Dispensing with the SERDES > > details would make this more obvious. > > Lesson learned. When I had just gotten out of college, every time I asked > the coworkers in my company what they're up to, I was amazed by them just > proceeding to tell me all the nitty gritty details of what they're doing > and debugging, like I was supposed to understand or care for that matter. > "Dude, can't you just paint the high level idea without using dorky words?" > Now I'm one of them... Haha :) Communication is the hardest part! > > Seems like something in pci_set_of_node() or a quirk could do whatever > > you need to do. > > Could you help me out with a more detailed hint here? I'm not really > familiar with the PCI core code. You probably mean to suggest leaving a > stateful flag somewhere, though I'm not exactly sure where that is, that > would reach pci_scan_slot() enough to be able to alter its decision. What bad things happen without this patch? I guess we enumerate Function 0 but in some cases it's not useful? That in itself wouldn't be a disaster; there are lots of things we enumerate but don't use. But in this case, maybe a driver would claim Function 0 but it wouldn't work as expected? Bjorn
On Wed, May 31, 2023 at 11:56:02AM -0500, Bjorn Helgaas wrote: > On Wed, May 31, 2023 at 02:15:09AM +0300, Vladimir Oltean wrote: > > On Tue, May 30, 2023 at 05:27:24PM -0500, Bjorn Helgaas wrote: > > > Ah, you're right, sorry I missed that. Dispensing with the SERDES > > > details would make this more obvious. > > > > Lesson learned. When I had just gotten out of college, every time I asked > > the coworkers in my company what they're up to, I was amazed by them just > > proceeding to tell me all the nitty gritty details of what they're doing > > and debugging, like I was supposed to understand or care for that matter. > > "Dude, can't you just paint the high level idea without using dorky words?" > > Now I'm one of them... > > Haha :) Communication is the hardest part! I know... > What bad things happen without this patch? It's in the commit title: probing the entire device (PCI device!!!) is skipped if function 0 has status = "disabled". Aka PCIe functions 1, 2, 3, 4, ...
On Wed, May 31, 2023 at 03:24:46PM -0500, Bjorn Helgaas wrote: > I guess I should have asked "what bad things happen without this patch > and without the DT 'disabled' status"? Well, now that you put it this way, I do realize that things are not so ideal for me. Our drivers for the functions of this device were already checking for of_device_is_available() during probe. So, reverting the core PCIe patch, they would still not register a network interface, which is good. However (and this is the bad part), multiple functions of this PCIe device unfortunately share a common memory, which is not zeroized by hardware, and so, to avoid multi-bit ECC errors, it must be zeroized by software, using some memory space accesses from all functions that have access to that shared memory (every function zeroizes its piece of it). This, sadly, includes functions which have status = "disabled". See commit 3222b5b613db ("net: enetc: initialize RFS/RSS memories for unused ports too"). What we used to do was start probing a bit in enetc_pf_probe(), enable the memory space, zeroize our part of the shared memory, then check of_device_is_available() and finally, we disable the memory space again and exit probing with -ENODEV. That is not possible anymore with the core patch, because the PCIe core will not probe our disabled functions at all anymore. The ENETC is not a hot-pluggable PCIe device. It uses Enhanced Allocation to essentially describe on-chip memory spaces, which are always present. So presumably, a different system-level solution to initialize those shared memories (U-Boot?) may be chosen, if implementing this workaround in Linux puts too much pressure on the PCIe core and the way in which it does things. Initially I didn't want to do this in prior boot stages because we only enable the RCEC in Linux, nothing is broken other than the spurious AER messages, and, you know.. the kernel may still run indefinitely on top of bootloaders which don't have the workaround applied. So working around it in Linux avoids one dependency.
On Thu, Jun 01, 2023 at 11:11:56AM +0300, Vladimir Oltean wrote: > On Wed, May 31, 2023 at 03:24:46PM -0500, Bjorn Helgaas wrote: > > I guess I should have asked "what bad things happen without this patch > > and without the DT 'disabled' status"? > > Well, now that you put it this way, I do realize that things are not so > ideal for me. > > Our drivers for the functions of this device were already checking for > of_device_is_available() during probe. So, reverting the core PCIe > patch, they would still not register a network interface, which is good. > > However (and this is the bad part), multiple functions of this PCIe > device unfortunately share a common memory, which is not zeroized by > hardware, and so, to avoid multi-bit ECC errors, it must be zeroized by > software, using some memory space accesses from all functions that have > access to that shared memory (every function zeroizes its piece of it). > This, sadly, includes functions which have status = "disabled". See > commit 3222b5b613db ("net: enetc: initialize RFS/RSS memories for unused > ports too"). > > What we used to do was start probing a bit in enetc_pf_probe(), enable > the memory space, zeroize our part of the shared memory, then check > of_device_is_available() and finally, we disable the memory space again > and exit probing with -ENODEV. > > That is not possible anymore with the core patch, because the PCIe core > will not probe our disabled functions at all anymore. To make sure I understand you, I think you're saying that if Function 0 has DT status "disabled", 6fffbc7ae137 ("PCI: Honor firmware's device disabled status") breaks things because we don't enumerate Function 0 and the driver can't temporarily claim it to zero out its piece of the shared memory. With just 6fffbc7ae137, we don't enumerate Function 0, which means we don't see that it's a multi-function device, so we don't enumerate Functions 1, 2, etc, either. With both 6fffbc7ae137 and your current patch, we would enumerate Functions 1, 2, etc, but we still skip Function 0, so its piece of the shared memory still doesn't get zeroed. > The ENETC is not a hot-pluggable PCIe device. It uses Enhanced Allocation > to essentially describe on-chip memory spaces, which are always present. > So presumably, a different system-level solution to initialize those > shared memories (U-Boot?) may be chosen, if implementing this workaround > in Linux puts too much pressure on the PCIe core and the way in which it > does things. Initially I didn't want to do this in prior boot stages > because we only enable the RCEC in Linux, nothing is broken other than > the spurious AER messages, and, you know.. the kernel may still run > indefinitely on top of bootloaders which don't have the workaround applied. > So working around it in Linux avoids one dependency. If I understand correctly, something (bootloader or Linux) needs to do something to Function 0 (e.g., clear memory). Doing it in Linux would minimize dependences on the bootloader, so that seems desirable to me. That means Linux needs to enumerate Function 0 so it is visible to a driver or possibly a quirk. I think we could contemplate implementing 6fffbc7ae137 in a different way. Checking DT status at driver probe-time would probably work for Loongson, but wouldn't quite solve the NXP problem because the driver wouldn't be able to claim Function 0 even temporarily. Is DT the only way to learn the NXP SERDES configuration? I think it would be much better if there were a way to programmatically learn it, because then you wouldn't have to worry about syncing the DT with the platform configuration, and it would decouple this from the Loongson situation. (If there were a way to actually discover the Loongson situation instead of relying on DT, e.g., by keying off a Device ID or something, that would be much better, too. I assume we explored that, but I don't remember the details.) Bjorn
On Thu, Jun 01, 2023 at 10:44:45AM -0500, Bjorn Helgaas wrote: > To make sure I understand you, I think you're saying that if Function > 0 has DT status "disabled", 6fffbc7ae137 ("PCI: Honor firmware's > device disabled status") breaks things because we don't enumerate > Function 0 and the driver can't temporarily claim it to zero out its > piece of the shared memory. > > With just 6fffbc7ae137, we don't enumerate Function 0, which means we > don't see that it's a multi-function device, so we don't enumerate > Functions 1, 2, etc, either. > > With both 6fffbc7ae137 and your current patch, we would enumerate > Functions 1, 2, etc, but we still skip Function 0, so its piece of the > shared memory still doesn't get zeroed. I'm saying that as long as commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled status") exists in the form where the pci_driver :: probe() is completely skipped for disabled functions, the NXP ENETC PCIe device has a problem no matter what the function number is. That problem is: the device drivers of all PCIe functions need to clear some memory before they ultimately fail to probe (as they should), because of some hardware design oversight. That is no longer possible if the driver has no hook to execute code for those devices that are disabled. On top of that, function 0 having status = "disabled" is extra problematic, because the PCI core will now just assume that functions 1 .. N don't exist at all, which is simply false, because the usefulness of ENETC port 0 (PCIe function 0) from a networking perspective is independent from the usefulness of ENETC port 1 (PCIe function 1), ENETC port 2 etc. > > > The ENETC is not a hot-pluggable PCIe device. It uses Enhanced Allocation > > to essentially describe on-chip memory spaces, which are always present. > > So presumably, a different system-level solution to initialize those > > shared memories (U-Boot?) may be chosen, if implementing this workaround > > in Linux puts too much pressure on the PCIe core and the way in which it > > does things. Initially I didn't want to do this in prior boot stages > > because we only enable the RCEC in Linux, nothing is broken other than > > the spurious AER messages, and, you know.. the kernel may still run > > indefinitely on top of bootloaders which don't have the workaround applied. > > So working around it in Linux avoids one dependency. > > If I understand correctly, something (bootloader or Linux) needs to do > something to Function 0 (e.g., clear memory). To more than just function 0 (also 1, 2 and 6). There are 2 confounding problems, the latter being something that was exposed by your question: what will happen that's bad with the current mainline code structure, *notwithstanding* the fact that function 0 may have status = "disabled" (which currently will skip enumeration for the rest of the functions which don't have status = "disabled"). > Doing it in Linux would minimize dependences on the bootloader, so > that seems desirable to me. That means Linux needs to enumerate > Function 0 so it is visible to a driver or possibly a quirk. Uhm... no, that wouldn't be enough. Only a straight revert would satisfy the workaround that we currently have for NXP ENETC in Linux. Also, I'm not sure if it was completely reasonable of me in the first place to exploit this quirk of the Linux PCI bus - that the probe function is called even if a device is disabled in the device tree. I would understand if I was forced to rethink that. > I think we could contemplate implementing 6fffbc7ae137 in a different > way. Checking DT status at driver probe-time would probably work for > Loongson, but wouldn't quite solve the NXP problem because the driver > wouldn't be able to claim Function 0 even temporarily. Not sure what you mean by "checking DT status at driver probe-time". Does enetc_pf_probe() -> of_device_is_available() qualify? You probably mean earlier than that. My problem is that I don't really understand what was the functional need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled status") in the first place, considering that any device driver can already fail to probe based on the same condition at its own will. > Is DT the only way to learn the NXP SERDES configuration? I think it > would be much better if there were a way to programmatically learn it, > because then you wouldn't have to worry about syncing the DT with the > platform configuration, and it would decouple this from the Loongson > situation. Syncing the DT with the platform configuration will always be necessary, because for networking we will also need extra information which is completely non-discoverable, like a phy-handle or such, and that depends on the wiring and static pinmuxing of the SoC. So it is practically reasonable to expect that what is usable has status = "okay", and what isn't has status = "disabled". Not to mention, there are already device trees in circulation which are written that way, and those need to continue to work. > (If there were a way to actually discover the Loongson situation > instead of relying on DT, e.g., by keying off a Device ID or > something, that would be much better, too. I assume we explored that, > but I don't remember the details.) > > Bjorn What is it that's special about the Loongson situation?
On Thu, Jun 01, 2023 at 07:33:35PM +0300, Vladimir Oltean wrote: > On Thu, Jun 01, 2023 at 10:44:45AM -0500, Bjorn Helgaas wrote: > > To make sure I understand you, I think you're saying that if Function > > 0 has DT status "disabled", 6fffbc7ae137 ("PCI: Honor firmware's > > device disabled status") breaks things because we don't enumerate > > Function 0 and the driver can't temporarily claim it to zero out its > > piece of the shared memory. > > > > With just 6fffbc7ae137, we don't enumerate Function 0, which means we > > don't see that it's a multi-function device, so we don't enumerate > > Functions 1, 2, etc, either. > > > > With both 6fffbc7ae137 and your current patch, we would enumerate > > Functions 1, 2, etc, but we still skip Function 0, so its piece of the > > shared memory still doesn't get zeroed. > > I'm saying that as long as commit 6fffbc7ae137 ("PCI: Honor firmware's > device disabled status") exists in the form where the pci_driver :: probe() > is completely skipped for disabled functions, the NXP ENETC PCIe device > has a problem no matter what the function number is. Yep. > That problem is: > the device drivers of all PCIe functions need to clear some memory > before they ultimately fail to probe (as they should), because of some > hardware design oversight. That is no longer possible if the driver has > no hook to execute code for those devices that are disabled. Yep. If there's no pci_dev, there's no nice way to do anything to the device. > On top of that, function 0 having status = "disabled" is extra > problematic, because the PCI core will now just assume that functions 1 .. N > don't exist at all, which is simply false, because the usefulness of > ENETC port 0 (PCIe function 0) from a networking perspective is > independent from the usefulness of ENETC port 1 (PCIe function 1), ENETC > port 2 etc. Yes. > > > The ENETC is not a hot-pluggable PCIe device. It uses Enhanced Allocation > > > to essentially describe on-chip memory spaces, which are always present. > > > So presumably, a different system-level solution to initialize those > > > shared memories (U-Boot?) may be chosen, if implementing this workaround > > > in Linux puts too much pressure on the PCIe core and the way in which it > > > does things. Initially I didn't want to do this in prior boot stages > > > because we only enable the RCEC in Linux, nothing is broken other than > > > the spurious AER messages, and, you know.. the kernel may still run > > > indefinitely on top of bootloaders which don't have the workaround applied. > > > So working around it in Linux avoids one dependency. > > > > If I understand correctly, something (bootloader or Linux) needs to do > > something to Function 0 (e.g., clear memory). > > To more than just function 0 (also 1, 2 and 6). Yes. > There are 2 confounding > problems, the latter being something that was exposed by your question: > what will happen that's bad with the current mainline code structure, > *notwithstanding* the fact that function 0 may have status = "disabled" > (which currently will skip enumeration for the rest of the functions > which don't have status = "disabled"). > > > Doing it in Linux would minimize dependences on the bootloader, so > > that seems desirable to me. That means Linux needs to enumerate > > Function 0 so it is visible to a driver or possibly a quirk. > > Uhm... no, that wouldn't be enough. Only a straight revert would satisfy > the workaround that we currently have for NXP ENETC in Linux. I guess you mean a revert of 6fffbc7ae137? This whole conversation is about whether we can rework 6fffbc7ae137 to work both for Loongson and for you, so nothing is decided yet. The point is, I assume you agree that it's preferable if we don't have to depend on a bootloader to clear the memory. > Also, I'm not sure if it was completely reasonable of me in the first > place to exploit this quirk of the Linux PCI bus - that the probe > function is called even if a device is disabled in the device tree. > I would understand if I was forced to rethink that. After 6fffbc7ae137, the probe function is not called if the device is disabled in DT because there's no pci_dev for it at all. > > I think we could contemplate implementing 6fffbc7ae137 in a different > > way. Checking DT status at driver probe-time would probably work for > > Loongson, but wouldn't quite solve the NXP problem because the driver > > wouldn't be able to claim Function 0 even temporarily. > > Not sure what you mean by "checking DT status at driver probe-time". > Does enetc_pf_probe() -> of_device_is_available() qualify? You probably > mean earlier than that. I was thinking about something in pci_device_probe(), e.g., by extending pci_device_can_probe(). But again, we're just exploring the solution space; I'm not saying this is the best or only path. > My problem is that I don't really understand what was the functional > need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled > status") in the first place, considering that any device driver can > already fail to probe based on the same condition at its own will. In general, PCI drivers shouldn't rely on DT. If the bus driver (PCI in this case) calls a driver's probe function, the driver can assume the device exists. But enetc is not a general-purpose driver, and if DT is the only way to discover this property, I guess you're stuck doing that. > > Is DT the only way to learn the NXP SERDES configuration? I think it > > would be much better if there were a way to programmatically learn it, > > because then you wouldn't have to worry about syncing the DT with the > > platform configuration, and it would decouple this from the Loongson > > situation. > > Syncing the DT with the platform configuration will always be necessary, > because for networking we will also need extra information which is > completely non-discoverable, like a phy-handle or such, and that depends > on the wiring and static pinmuxing of the SoC. So it is practically > reasonable to expect that what is usable has status = "okay", and what > isn't has status = "disabled". Not to mention, there are already device > trees in circulation which are written that way, and those need to > continue to work. Just because we need DT for non-discoverable info A doesn't mean we should depend on it for B if B *is* discoverable. This question of disabling a device via DT but still needing to do things to the device is ... kind of a sticky wicket. Maybe this should be a different DT property (not "status"). Then PCI enumeration could work normally and 6fffbc7ae137 wouldn't be in the way. > > (If there were a way to actually discover the Loongson situation > > instead of relying on DT, e.g., by keying off a Device ID or > > something, that would be much better, too. I assume we explored that, > > but I don't remember the details.) > > What is it that's special about the Loongson situation?
On Thu, Jun 01, 2023 at 12:51:39PM -0500, Bjorn Helgaas wrote: > > > Doing it in Linux would minimize dependences on the bootloader, so > > > that seems desirable to me. That means Linux needs to enumerate > > > Function 0 so it is visible to a driver or possibly a quirk. > > > > Uhm... no, that wouldn't be enough. Only a straight revert would satisfy > > the workaround that we currently have for NXP ENETC in Linux. > > I guess you mean a revert of 6fffbc7ae137? Yes. > This whole conversation is about whether we can rework 6fffbc7ae137 to > work both for Loongson and for you, so nothing is decided yet. After reading https://lore.kernel.org/linux-pci/20221117020935.32086-1-liupeibao@loongson.cn/ and https://lore.kernel.org/linux-pci/20221103090040.836-1-liupeibao@loongson.cn/ and seeing the GMAC OF node at arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi, I believe that a solution that would work for both Loongson and NXP would be to: - patch loongson_dwmac_probe() to check for of_device_is_available() - revert commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled status") I'm not sure what else of what was concretely proposed would work. Anything else is just wishful thinking that the PCI core can start enforcing a central policy, after letting device drivers get to choose how (and whether) to treat the "status" OF property for years on end. As an added benefit, the disabled GMAC would become visible in lspci for the Loongson SoC. > The point is, I assume you agree that it's preferable if we don't have > to depend on a bootloader to clear the memory. I am confused by the message you are transmitting here. With my user hat on, yes, maintaining the effect of commit 3222b5b613db from Linux is preferable. Although Rob will probably not be happy about the way in which that will be achieved. And you haven't proposed ways in which that would remain possible, short of a revert of commit 6fffbc7ae137. > After 6fffbc7ae137, the probe function is not called if the device is > disabled in DT because there's no pci_dev for it at all. Correct, but commit 3222b5b613db pre-dates it by 2 years, and thus, it is broken by Rob's change. > > My problem is that I don't really understand what was the functional > > need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled > > status") in the first place, considering that any device driver can > > already fail to probe based on the same condition at its own will. > > In general, PCI drivers shouldn't rely on DT. If the bus driver (PCI > in this case) calls a driver's probe function, the driver can assume > the device exists. Well, the device exists... > But enetc is not a general-purpose driver, and if DT is the only way > to discover this property, I guess you're stuck doing that. So what Loongson tried to do - break enumeration of the on-chip GMAC PCIe device at the level of the PCIe controller, if the GMAC's pinmuxing doesn't make it available for networking - is encouraged? Do you consider that their patch would have been better in the original form, if instead of the "skip-scan" property, they would have built some smarts into drivers/pci/controller/pci-loongson.c which would intentionally break config space access to gmac@3,0, without requiring OF to specify this? Are you saying that this "present but unusable due to pinmuxing" is an incorrect use of status = "disabled"? What would it constitute correct use of, then? The analogous situation for ENETC would be to patch the "pci-host-ecam-generic" driver to read the SERDES and pinmuxing configuration of the SoC, and to mask/unmask the config access to function 0 based on that. I mean - I could... but is it really a good idea? The principle of separation of concerns tells me no. The fact that the pinmuxing of the device makes it unavailable pertains to the IP-specific logic, it doesn't change whether it's enumerable or accessible on its bus. > > > Is DT the only way to learn the NXP SERDES configuration? I think it > > > would be much better if there were a way to programmatically learn it, > > > because then you wouldn't have to worry about syncing the DT with the > > > platform configuration, and it would decouple this from the Loongson > > > situation. > > > > Syncing the DT with the platform configuration will always be necessary, > > because for networking we will also need extra information which is > > completely non-discoverable, like a phy-handle or such, and that depends > > on the wiring and static pinmuxing of the SoC. So it is practically > > reasonable to expect that what is usable has status = "okay", and what > > isn't has status = "disabled". Not to mention, there are already device > > trees in circulation which are written that way, and those need to > > continue to work. > > Just because we need DT for non-discoverable info A doesn't mean we > should depend on it for B if B *is* discoverable. But the argument was: we already have device trees with a certain convention, and that is to expect having status = "disabled" for unusable ports. I don't believe that changing that is realistically in scope for fixing this. And if we have device trees with status = "disabled" in circulation which we (I) don't want to break, then we're back to square 1 regarding the probing of disabled devices. > This question of disabling a device via DT but still needing to do > things to the device is ... kind of a sticky wicket. It boils down to whether accessing a disabled device is permitted or not. I opened the devicetree specification and it didn't say anything conclusive. Though it's certainly above my pay grade to say anything with certainty in this area. Apart from "okay" and "disabled", "status" takes other documented values too, like "reserved", "fail" and "fail-sss". Linux treats everything that's not "okay" the same. Krzysztof Kozlowski came with the suggestion for Loongson to replace "skip-scan" with "status", during the review of their v1 patch. In any case, that question will only recur one level lower - in U-Boot, where we make an effort to keep device trees in sync in Linux. Why would U-Boot need to do things to a disabled device? :) > Maybe this should be a different DT property (not "status"). Then PCI > enumeration could work normally and 6fffbc7ae137 wouldn't be in the > way. I'm not quite sure where you're going with this. More concretely?
+Cc Jianmin Lv Hi, Jianmin, You are the most familiar person in this field, could you please give some suggestions? Thank you. Huacai > -----原始邮件----- > 发件人: "Vladimir Oltean" <vladimir.oltean@nxp.com> > 发送时间:2023-06-02 06:15:32 (星期五) > 收件人: "Bjorn Helgaas" <helgaas@kernel.org> > 抄送: linux-pci@vger.kernel.org, netdev@vger.kernel.org, "Bjorn Helgaas" <bhelgaas@google.com>, "Rob Herring" <robh@kernel.org>, "Claudiu Manoil" <claudiu.manoil@nxp.com>, "Michael Walle" <michael@walle.cc>, linux-kernel@vger.kernel.org, "Liu Peibao" <liupeibao@loongson.cn>, "Binbin Zhou" <zhoubinbin@loongson.cn>, "Huacai Chen" <chenhuacai@loongson.cn> > 主题: Re: [PATCH pci] PCI: don't skip probing entire device if first fn OF node has status = "disabled" > > On Thu, Jun 01, 2023 at 12:51:39PM -0500, Bjorn Helgaas wrote: > > > > Doing it in Linux would minimize dependences on the bootloader, so > > > > that seems desirable to me. That means Linux needs to enumerate > > > > Function 0 so it is visible to a driver or possibly a quirk. > > > > > > Uhm... no, that wouldn't be enough. Only a straight revert would satisfy > > > the workaround that we currently have for NXP ENETC in Linux. > > > > I guess you mean a revert of 6fffbc7ae137? > > Yes. > > > This whole conversation is about whether we can rework 6fffbc7ae137 to > > work both for Loongson and for you, so nothing is decided yet. > > After reading > https://lore.kernel.org/linux-pci/20221117020935.32086-1-liupeibao@loongson.cn/ > and > https://lore.kernel.org/linux-pci/20221103090040.836-1-liupeibao@loongson.cn/ > and seeing the GMAC OF node at arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi, > I believe that a solution that would work for both Loongson and NXP would be to: > > - patch loongson_dwmac_probe() to check for of_device_is_available() > - revert commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled > status") > > I'm not sure what else of what was concretely proposed would work. > Anything else is just wishful thinking that the PCI core can start > enforcing a central policy, after letting device drivers get to choose > how (and whether) to treat the "status" OF property for years on end. > > As an added benefit, the disabled GMAC would become visible in lspci for > the Loongson SoC. > > > The point is, I assume you agree that it's preferable if we don't have > > to depend on a bootloader to clear the memory. > > I am confused by the message you are transmitting here. > > With my user hat on, yes, maintaining the effect of commit 3222b5b613db > from Linux is preferable. > > Although Rob will probably not be happy about the way in which that will > be achieved. And you haven't proposed ways in which that would remain > possible, short of a revert of commit 6fffbc7ae137. > > > After 6fffbc7ae137, the probe function is not called if the device is > > disabled in DT because there's no pci_dev for it at all. > > Correct, but commit 3222b5b613db pre-dates it by 2 years, and thus, it > is broken by Rob's change. > > > > My problem is that I don't really understand what was the functional > > > need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled > > > status") in the first place, considering that any device driver can > > > already fail to probe based on the same condition at its own will. > > > > In general, PCI drivers shouldn't rely on DT. If the bus driver (PCI > > in this case) calls a driver's probe function, the driver can assume > > the device exists. > > Well, the device exists... > > > But enetc is not a general-purpose driver, and if DT is the only way > > to discover this property, I guess you're stuck doing that. > > So what Loongson tried to do - break enumeration of the on-chip GMAC > PCIe device at the level of the PCIe controller, if the GMAC's pinmuxing > doesn't make it available for networking - is encouraged? > > Do you consider that their patch would have been better in the original > form, if instead of the "skip-scan" property, they would have built some > smarts into drivers/pci/controller/pci-loongson.c which would intentionally > break config space access to gmac@3,0, without requiring OF to specify this? > > Are you saying that this "present but unusable due to pinmuxing" is an > incorrect use of status = "disabled"? What would it constitute correct > use of, then? > > The analogous situation for ENETC would be to patch the "pci-host-ecam-generic" > driver to read the SERDES and pinmuxing configuration of the SoC, and to > mask/unmask the config access to function 0 based on that. I mean - I could... > but is it really a good idea? The principle of separation of concerns > tells me no. The fact that the pinmuxing of the device makes it unavailable > pertains to the IP-specific logic, it doesn't change whether it's enumerable > or accessible on its bus. > > > > > Is DT the only way to learn the NXP SERDES configuration? I think it > > > > would be much better if there were a way to programmatically learn it, > > > > because then you wouldn't have to worry about syncing the DT with the > > > > platform configuration, and it would decouple this from the Loongson > > > > situation. > > > > > > Syncing the DT with the platform configuration will always be necessary, > > > because for networking we will also need extra information which is > > > completely non-discoverable, like a phy-handle or such, and that depends > > > on the wiring and static pinmuxing of the SoC. So it is practically > > > reasonable to expect that what is usable has status = "okay", and what > > > isn't has status = "disabled". Not to mention, there are already device > > > trees in circulation which are written that way, and those need to > > > continue to work. > > > > Just because we need DT for non-discoverable info A doesn't mean we > > should depend on it for B if B *is* discoverable. > > But the argument was: we already have device trees with a certain > convention, and that is to expect having status = "disabled" for > unusable ports. I don't believe that changing that is realistically in > scope for fixing this. And if we have device trees with status = > "disabled" in circulation which we (I) don't want to break, then we're > back to square 1 regarding the probing of disabled devices. > > > This question of disabling a device via DT but still needing to do > > things to the device is ... kind of a sticky wicket. > > It boils down to whether accessing a disabled device is permitted or > not. I opened the devicetree specification and it didn't say anything > conclusive. Though it's certainly above my pay grade to say anything > with certainty in this area. Apart from "okay" and "disabled", "status" > takes other documented values too, like "reserved", "fail" and > "fail-sss". Linux treats everything that's not "okay" the same. > Krzysztof Kozlowski came with the suggestion for Loongson to replace > "skip-scan" with "status", during the review of their v1 patch. > > In any case, that question will only recur one level lower - in U-Boot, > where we make an effort to keep device trees in sync in Linux. Why would > U-Boot need to do things to a disabled device? :) > > > Maybe this should be a different DT property (not "status"). Then PCI > > enumeration could work normally and 6fffbc7ae137 wouldn't be in the > > way. > > I'm not quite sure where you're going with this. More concretely? 本邮件及其附件含有龙芯中科的商业秘密信息,仅限于发送给上面地址中列出的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制或散发)本邮件及其附件中的信息。如果您错收本邮件,请您立即电话或邮件通知发件人并删除本邮件。 This email and its attachments contain confidential information from Loongson Technology , which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this email in error, please notify the sender by phone or email immediately and delete it.
Hi all, It seems that modification for current PCI enumeration framework is needed to solve the problem. If the effect of this modification is not easy to evaluate, for the requirement of Loongson, it should be OK that do the things in Loongson PCI controller driver like discussed before[1]. Br, Peibao [1] https://lore.kernel.org/all/20221114074346.23008-1-liupeibao@loongson.cn/ On 6/2/23 6:15 AM, Vladimir Oltean wrote: > On Thu, Jun 01, 2023 at 12:51:39PM -0500, Bjorn Helgaas wrote: >>>> Doing it in Linux would minimize dependences on the bootloader, so >>>> that seems desirable to me. That means Linux needs to enumerate >>>> Function 0 so it is visible to a driver or possibly a quirk. >>> >>> Uhm... no, that wouldn't be enough. Only a straight revert would satisfy >>> the workaround that we currently have for NXP ENETC in Linux. >> >> I guess you mean a revert of 6fffbc7ae137? > > Yes. > >> This whole conversation is about whether we can rework 6fffbc7ae137 to >> work both for Loongson and for you, so nothing is decided yet. > > After reading > https://lore.kernel.org/linux-pci/20221117020935.32086-1-liupeibao@loongson.cn/ > and > https://lore.kernel.org/linux-pci/20221103090040.836-1-liupeibao@loongson.cn/ > and seeing the GMAC OF node at arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi, > I believe that a solution that would work for both Loongson and NXP would be to: > > - patch loongson_dwmac_probe() to check for of_device_is_available() > - revert commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled > status") > > I'm not sure what else of what was concretely proposed would work. > Anything else is just wishful thinking that the PCI core can start > enforcing a central policy, after letting device drivers get to choose > how (and whether) to treat the "status" OF property for years on end. > > As an added benefit, the disabled GMAC would become visible in lspci for > the Loongson SoC. > >> The point is, I assume you agree that it's preferable if we don't have >> to depend on a bootloader to clear the memory. > > I am confused by the message you are transmitting here. > > With my user hat on, yes, maintaining the effect of commit 3222b5b613db > from Linux is preferable. > > Although Rob will probably not be happy about the way in which that will > be achieved. And you haven't proposed ways in which that would remain > possible, short of a revert of commit 6fffbc7ae137. > >> After 6fffbc7ae137, the probe function is not called if the device is >> disabled in DT because there's no pci_dev for it at all. > > Correct, but commit 3222b5b613db pre-dates it by 2 years, and thus, it > is broken by Rob's change. > >>> My problem is that I don't really understand what was the functional >>> need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled >>> status") in the first place, considering that any device driver can >>> already fail to probe based on the same condition at its own will. >> >> In general, PCI drivers shouldn't rely on DT. If the bus driver (PCI >> in this case) calls a driver's probe function, the driver can assume >> the device exists. > > Well, the device exists... > >> But enetc is not a general-purpose driver, and if DT is the only way >> to discover this property, I guess you're stuck doing that. > > So what Loongson tried to do - break enumeration of the on-chip GMAC > PCIe device at the level of the PCIe controller, if the GMAC's pinmuxing > doesn't make it available for networking - is encouraged? > > Do you consider that their patch would have been better in the original > form, if instead of the "skip-scan" property, they would have built some > smarts into drivers/pci/controller/pci-loongson.c which would intentionally > break config space access to gmac@3,0, without requiring OF to specify this? > > Are you saying that this "present but unusable due to pinmuxing" is an > incorrect use of status = "disabled"? What would it constitute correct > use of, then? > > The analogous situation for ENETC would be to patch the "pci-host-ecam-generic" > driver to read the SERDES and pinmuxing configuration of the SoC, and to > mask/unmask the config access to function 0 based on that. I mean - I could... > but is it really a good idea? The principle of separation of concerns > tells me no. The fact that the pinmuxing of the device makes it unavailable > pertains to the IP-specific logic, it doesn't change whether it's enumerable > or accessible on its bus. > >>>> Is DT the only way to learn the NXP SERDES configuration? I think it >>>> would be much better if there were a way to programmatically learn it, >>>> because then you wouldn't have to worry about syncing the DT with the >>>> platform configuration, and it would decouple this from the Loongson >>>> situation. >>> >>> Syncing the DT with the platform configuration will always be necessary, >>> because for networking we will also need extra information which is >>> completely non-discoverable, like a phy-handle or such, and that depends >>> on the wiring and static pinmuxing of the SoC. So it is practically >>> reasonable to expect that what is usable has status = "okay", and what >>> isn't has status = "disabled". Not to mention, there are already device >>> trees in circulation which are written that way, and those need to >>> continue to work. >> >> Just because we need DT for non-discoverable info A doesn't mean we >> should depend on it for B if B *is* discoverable. > > But the argument was: we already have device trees with a certain > convention, and that is to expect having status = "disabled" for > unusable ports. I don't believe that changing that is realistically in > scope for fixing this. And if we have device trees with status = > "disabled" in circulation which we (I) don't want to break, then we're > back to square 1 regarding the probing of disabled devices. > >> This question of disabling a device via DT but still needing to do >> things to the device is ... kind of a sticky wicket. > > It boils down to whether accessing a disabled device is permitted or > not. I opened the devicetree specification and it didn't say anything > conclusive. Though it's certainly above my pay grade to say anything > with certainty in this area. Apart from "okay" and "disabled", "status" > takes other documented values too, like "reserved", "fail" and > "fail-sss". Linux treats everything that's not "okay" the same. > Krzysztof Kozlowski came with the suggestion for Loongson to replace > "skip-scan" with "status", during the review of their v1 patch. > > In any case, that question will only recur one level lower - in U-Boot, > where we make an effort to keep device trees in sync in Linux. Why would > U-Boot need to do things to a disabled device? :) > >> Maybe this should be a different DT property (not "status"). Then PCI >> enumeration could work normally and 6fffbc7ae137 wouldn't be in the >> way. > > I'm not quite sure where you're going with this. More concretely? >
On 2023/6/2 下午3:21, Liu Peibao wrote: > Hi all, > > It seems that modification for current PCI enumeration framework is > needed to solve the problem. If the effect of this modification is not > easy to evaluate, for the requirement of Loongson, it should be OK that > do the things in Loongson PCI controller driver like discussed > before[1]. > > Br, > Peibao > > [1] https://lore.kernel.org/all/20221114074346.23008-1-liupeibao@loongson.cn/ > Agree. For current pci core code, all functions of the device will be skipped if function 0 is not found, even without the patch 6fffbc7ae137 (e.g. the func 0 is disabled in bios by setting pci header to 0xffffffff). So it seems that there are two ways for the issue: 1. Adjust the pci scan core code to allow separate function to be enumerated, which will affect widely the pci core code. 2. Only Adjust loongson pci controller driver as Peibao said, and any function of the device should use platform device in DT if function 0 is disabled, which is acceptable for loongson. Thanks, Jianmin > On 6/2/23 6:15 AM, Vladimir Oltean wrote: >> On Thu, Jun 01, 2023 at 12:51:39PM -0500, Bjorn Helgaas wrote: >>>>> Doing it in Linux would minimize dependences on the bootloader, so >>>>> that seems desirable to me. That means Linux needs to enumerate >>>>> Function 0 so it is visible to a driver or possibly a quirk. >>>> >>>> Uhm... no, that wouldn't be enough. Only a straight revert would satisfy >>>> the workaround that we currently have for NXP ENETC in Linux. >>> >>> I guess you mean a revert of 6fffbc7ae137? >> >> Yes. >> >>> This whole conversation is about whether we can rework 6fffbc7ae137 to >>> work both for Loongson and for you, so nothing is decided yet. >> >> After reading >> https://lore.kernel.org/linux-pci/20221117020935.32086-1-liupeibao@loongson.cn/ >> and >> https://lore.kernel.org/linux-pci/20221103090040.836-1-liupeibao@loongson.cn/ >> and seeing the GMAC OF node at arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi, >> I believe that a solution that would work for both Loongson and NXP would be to: >> >> - patch loongson_dwmac_probe() to check for of_device_is_available() >> - revert commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled >> status") >> >> I'm not sure what else of what was concretely proposed would work. >> Anything else is just wishful thinking that the PCI core can start >> enforcing a central policy, after letting device drivers get to choose >> how (and whether) to treat the "status" OF property for years on end. >> >> As an added benefit, the disabled GMAC would become visible in lspci for >> the Loongson SoC. >> >>> The point is, I assume you agree that it's preferable if we don't have >>> to depend on a bootloader to clear the memory. >> >> I am confused by the message you are transmitting here. >> >> With my user hat on, yes, maintaining the effect of commit 3222b5b613db >> from Linux is preferable. >> >> Although Rob will probably not be happy about the way in which that will >> be achieved. And you haven't proposed ways in which that would remain >> possible, short of a revert of commit 6fffbc7ae137. >> >>> After 6fffbc7ae137, the probe function is not called if the device is >>> disabled in DT because there's no pci_dev for it at all. >> >> Correct, but commit 3222b5b613db pre-dates it by 2 years, and thus, it >> is broken by Rob's change. >> >>>> My problem is that I don't really understand what was the functional >>>> need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled >>>> status") in the first place, considering that any device driver can >>>> already fail to probe based on the same condition at its own will. >>> >>> In general, PCI drivers shouldn't rely on DT. If the bus driver (PCI >>> in this case) calls a driver's probe function, the driver can assume >>> the device exists. >> >> Well, the device exists... >> >>> But enetc is not a general-purpose driver, and if DT is the only way >>> to discover this property, I guess you're stuck doing that. >> >> So what Loongson tried to do - break enumeration of the on-chip GMAC >> PCIe device at the level of the PCIe controller, if the GMAC's pinmuxing >> doesn't make it available for networking - is encouraged? >> >> Do you consider that their patch would have been better in the original >> form, if instead of the "skip-scan" property, they would have built some >> smarts into drivers/pci/controller/pci-loongson.c which would intentionally >> break config space access to gmac@3,0, without requiring OF to specify this? >> >> Are you saying that this "present but unusable due to pinmuxing" is an >> incorrect use of status = "disabled"? What would it constitute correct >> use of, then? >> >> The analogous situation for ENETC would be to patch the "pci-host-ecam-generic" >> driver to read the SERDES and pinmuxing configuration of the SoC, and to >> mask/unmask the config access to function 0 based on that. I mean - I could... >> but is it really a good idea? The principle of separation of concerns >> tells me no. The fact that the pinmuxing of the device makes it unavailable >> pertains to the IP-specific logic, it doesn't change whether it's enumerable >> or accessible on its bus. >> >>>>> Is DT the only way to learn the NXP SERDES configuration? I think it >>>>> would be much better if there were a way to programmatically learn it, >>>>> because then you wouldn't have to worry about syncing the DT with the >>>>> platform configuration, and it would decouple this from the Loongson >>>>> situation. >>>> >>>> Syncing the DT with the platform configuration will always be necessary, >>>> because for networking we will also need extra information which is >>>> completely non-discoverable, like a phy-handle or such, and that depends >>>> on the wiring and static pinmuxing of the SoC. So it is practically >>>> reasonable to expect that what is usable has status = "okay", and what >>>> isn't has status = "disabled". Not to mention, there are already device >>>> trees in circulation which are written that way, and those need to >>>> continue to work. >>> >>> Just because we need DT for non-discoverable info A doesn't mean we >>> should depend on it for B if B *is* discoverable. >> >> But the argument was: we already have device trees with a certain >> convention, and that is to expect having status = "disabled" for >> unusable ports. I don't believe that changing that is realistically in >> scope for fixing this. And if we have device trees with status = >> "disabled" in circulation which we (I) don't want to break, then we're >> back to square 1 regarding the probing of disabled devices. >> >>> This question of disabling a device via DT but still needing to do >>> things to the device is ... kind of a sticky wicket. >> >> It boils down to whether accessing a disabled device is permitted or >> not. I opened the devicetree specification and it didn't say anything >> conclusive. Though it's certainly above my pay grade to say anything >> with certainty in this area. Apart from "okay" and "disabled", "status" >> takes other documented values too, like "reserved", "fail" and >> "fail-sss". Linux treats everything that's not "okay" the same. >> Krzysztof Kozlowski came with the suggestion for Loongson to replace >> "skip-scan" with "status", during the review of their v1 patch. >> >> In any case, that question will only recur one level lower - in U-Boot, >> where we make an effort to keep device trees in sync in Linux. Why would >> U-Boot need to do things to a disabled device? :) >> >>> Maybe this should be a different DT property (not "status"). Then PCI >>> enumeration could work normally and 6fffbc7ae137 wouldn't be in the >>> way. >> >> I'm not quite sure where you're going with this. More concretely? >>
Hi Jianmin, On Fri, Jun 02, 2023 at 03:36:18PM +0800, Jianmin Lv wrote: > On 2023/6/2 下午3:21, Liu Peibao wrote: > > Hi all, > > > > It seems that modification for current PCI enumeration framework is > > needed to solve the problem. If the effect of this modification is not > > easy to evaluate, for the requirement of Loongson, it should be OK that > > do the things in Loongson PCI controller driver like discussed > > before[1]. > > > > Br, > > Peibao > > > > [1] https://lore.kernel.org/all/20221114074346.23008-1-liupeibao@loongson.cn/ > > > > Agree. For current pci core code, all functions of the device will be > skipped if function 0 is not found, even without the patch 6fffbc7ae137 > (e.g. the func 0 is disabled in bios by setting pci header to 0xffffffff). > So it seems that there are two ways for the issue: > > 1. Adjust the pci scan core code to allow separate function to be > enumerated, which will affect widely the pci core code. > 2. Only Adjust loongson pci controller driver as Peibao said, and any > function of the device should use platform device in DT if function 0 is > disabled, which is acceptable for loongson. > > Thanks, > Jianmin How about 3. handle of_device_is_available() in the probe function of the "loongson, pci-gmac" driver? Would that not work?
On 2023/6/2 下午6:16, Vladimir Oltean wrote: > Hi Jianmin, > > On Fri, Jun 02, 2023 at 03:36:18PM +0800, Jianmin Lv wrote: >> On 2023/6/2 下午3:21, Liu Peibao wrote: >>> Hi all, >>> >>> It seems that modification for current PCI enumeration framework is >>> needed to solve the problem. If the effect of this modification is not >>> easy to evaluate, for the requirement of Loongson, it should be OK that >>> do the things in Loongson PCI controller driver like discussed >>> before[1]. >>> >>> Br, >>> Peibao >>> >>> [1] https://lore.kernel.org/all/20221114074346.23008-1-liupeibao@loongson.cn/ >>> >> >> Agree. For current pci core code, all functions of the device will be >> skipped if function 0 is not found, even without the patch 6fffbc7ae137 >> (e.g. the func 0 is disabled in bios by setting pci header to 0xffffffff). >> So it seems that there are two ways for the issue: >> >> 1. Adjust the pci scan core code to allow separate function to be >> enumerated, which will affect widely the pci core code. >> 2. Only Adjust loongson pci controller driver as Peibao said, and any >> function of the device should use platform device in DT if function 0 is >> disabled, which is acceptable for loongson. >> >> Thanks, >> Jianmin > > How about 3. handle of_device_is_available() in the probe function of > the "loongson, pci-gmac" driver? Would that not work? > This way does work only for the specified device. There are other devices, such as HDA, I2S, etc, which have shared pins. Then we have to add of_device_is_available() checking to those drivers one by one. And we are not sure if there are other devices in new generation chips in future. So I'm afraid that the way you mentioned is not suitable for us. Thanks, Jianmin
On Sat, Jun 03, 2023 at 10:35:50AM +0800, Jianmin Lv wrote: > > How about 3. handle of_device_is_available() in the probe function of > > the "loongson, pci-gmac" driver? Would that not work? > > > This way does work only for the specified device. There are other devices, > such as HDA, I2S, etc, which have shared pins. Then we have to add > of_device_is_available() checking to those drivers one by one. And we are > not sure if there are other devices in new generation chips in future. So > I'm afraid that the way you mentioned is not suitable for us. Got it, so you have more on-chip PCIe devices than the ones listed in loongson64-2k1000.dtsi, and you don't want to describe them in the device tree just to put status = "disabled" for those devices/functions that you don't want Linux to use - although you could, and it wouldn't be that hard or have unintended side effects. Though you need to admit, in case you had an on-chip multi-function PCIe device like the NXP ENETC, and you wanted Linux to not use function 0, the strategy you're suggesting here that is acceptable for Loongson would not have worked. I believe we need a bit of coordination from PCIe and device tree maintainers, to suggest what would be the encouraged best practices and ways to solve this regression for the ENETC.
On 2023/6/4 下午4:55, Vladimir Oltean wrote: > On Sat, Jun 03, 2023 at 10:35:50AM +0800, Jianmin Lv wrote: >>> How about 3. handle of_device_is_available() in the probe function of >>> the "loongson, pci-gmac" driver? Would that not work? >>> >> This way does work only for the specified device. There are other devices, >> such as HDA, I2S, etc, which have shared pins. Then we have to add >> of_device_is_available() checking to those drivers one by one. And we are >> not sure if there are other devices in new generation chips in future. So >> I'm afraid that the way you mentioned is not suitable for us. > > Got it, so you have more on-chip PCIe devices than the ones listed in > loongson64-2k1000.dtsi, and you don't want to describe them in the > device tree just to put status = "disabled" for those devices/functions > that you don't want Linux to use - although you could, and it wouldn't > be that hard or have unintended side effects. > > Though you need to admit, in case you had an on-chip multi-function PCIe > device like the NXP ENETC, and you wanted Linux to not use function 0, > the strategy you're suggesting here that is acceptable for Loongson > would not have worked. > > I believe we need a bit of coordination from PCIe and device tree > maintainers, to suggest what would be the encouraged best practices and > ways to solve this regression for the ENETC. > For a multi-function device, if func 0 is not allowed to be scanned, as I said in way of 2, the other funcs of the device will be described as platform devices instead of pci and be not scanned either, which is acceptable for Loongson. The main goal by any way for us is to resolve the problem that shared pins can not be used simultaneously by devices sharing them. IMO, configure them in DT one by one may be reasonable, but adapting each driver will be bothered. Any way, let's listen to opinions from Bjorn and Rob.
On Mon, Jun 05, 2023 at 08:59:23AM +0800, Jianmin Lv wrote: > For a multi-function device, if func 0 is not allowed to be scanned, as I > said in way of 2, the other funcs of the device will be described as > platform devices instead of pci and be not scanned either, which is > acceptable for Loongson. The main goal by any way for us is to resolve the > problem that shared pins can not be used simultaneously by devices sharing > them. IMO, configure them in DT one by one may be reasonable, but adapting > each driver will be bothered. Could you give an example of PCIe functions being described as platform devices, and how does that work for Loongson? Are you saying that there will be 2 drivers for the same hardware, one pci_driver and one platform_driver? In the case of the platform_driver, who will do the PCI-specific stuff required by the IP, like function level reset and enabling the memory space?
On 2023/6/5 下午5:34, Vladimir Oltean wrote: > On Mon, Jun 05, 2023 at 08:59:23AM +0800, Jianmin Lv wrote: >> For a multi-function device, if func 0 is not allowed to be scanned, as I >> said in way of 2, the other funcs of the device will be described as >> platform devices instead of pci and be not scanned either, which is >> acceptable for Loongson. The main goal by any way for us is to resolve the >> problem that shared pins can not be used simultaneously by devices sharing >> them. IMO, configure them in DT one by one may be reasonable, but adapting >> each driver will be bothered. > > Could you give an example of PCIe functions being described as platform > devices, and how does that work for Loongson? Are you saying that there > will be 2 drivers for the same hardware, one pci_driver and one platform_driver? > In the case of the platform_driver, who will do the PCI-specific stuff > required by the IP, like function level reset and enabling the memory space? > E.g. there are two functions , func0 is HDA controller and func1 is I2S controller and they have shared pins. When HDA or I2S is used, both are disabled for PCI enumeration in BIOS (e.g. by filling PCI header with 0xffffffff), and mem space has been reserved from host bridge window for them in BIOS, of cause, reserved space will not be seen by kernel because it has been removed in host bridge mem range when passed to kernel in DT. Then the reserved mem base is passed into kernel by DT, CPU will use remapped address of the mem base, and these devices will not be enumerated in PCI bus. The way is only used for PCI devices (share common pins and exist on bus 0) integrated in Loongson CPU or chipset.
On Sun, Jun 4, 2023 at 2:55 AM Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > Sorry, just now seeing this as I've been out the last month. > On Sat, Jun 03, 2023 at 10:35:50AM +0800, Jianmin Lv wrote: > > > How about 3. handle of_device_is_available() in the probe function of > > > the "loongson, pci-gmac" driver? Would that not work? > > > > > This way does work only for the specified device. There are other devices, > > such as HDA, I2S, etc, which have shared pins. Then we have to add > > of_device_is_available() checking to those drivers one by one. And we are > > not sure if there are other devices in new generation chips in future. So > > I'm afraid that the way you mentioned is not suitable for us. If we decided that disabled devices should probe, then that is exactly what will have to be done. The restriction (of shared pins) is in the devices and is potentially per device, so it makes more sense for the device's drivers to handle than the host bridge IMO. (Assuming the core doesn't handle a per device property.) > Got it, so you have more on-chip PCIe devices than the ones listed in > loongson64-2k1000.dtsi, and you don't want to describe them in the > device tree just to put status = "disabled" for those devices/functions > that you don't want Linux to use - although you could, and it wouldn't > be that hard or have unintended side effects. > > Though you need to admit, in case you had an on-chip multi-function PCIe > device like the NXP ENETC, and you wanted Linux to not use function 0, > the strategy you're suggesting here that is acceptable for Loongson > would not have worked. > > I believe we need a bit of coordination from PCIe and device tree > maintainers, to suggest what would be the encouraged best practices and > ways to solve this regression for the ENETC. I think we need to define what behavior is correct for 'status = "disabled"'. For almost everywhere in DT, it is equivalent to the device is not present. A not present device doesn't probe. There are unfortunately cases where status got ignored/forgotten and PCI was one of those. PCI is a bit different since there are 2 sources of information about a device being present. The intent with PCI is DT overrides what's discovered. For example, 'vendor-id' overrides what's read from the h/w. I think we can fix making the status per function simply by making 'match_driver' be set based on the status. This would move the check later to just before probing. That would not work for a case where accessing the config registers is a problem. It doesn't sound like that's a problem for Loongson based on the above response, but their original solution did prevent that. This change would also mean the PCI quirks would run. Perhaps the func0 memory clearing you need could be run as a quirk instead? Rob
Hi Rob, On Fri, Jun 16, 2023 at 11:57:43AM -0600, Rob Herring wrote: > On Sun, Jun 4, 2023 at 2:55 AM Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > Sorry, just now seeing this as I've been out the last month. > > > On Sat, Jun 03, 2023 at 10:35:50AM +0800, Jianmin Lv wrote: > > > > How about 3. handle of_device_is_available() in the probe function of > > > > the "loongson, pci-gmac" driver? Would that not work? > > > > > > > This way does work only for the specified device. There are other devices, > > > such as HDA, I2S, etc, which have shared pins. Then we have to add > > > of_device_is_available() checking to those drivers one by one. And we are > > > not sure if there are other devices in new generation chips in future. So > > > I'm afraid that the way you mentioned is not suitable for us. > > If we decided that disabled devices should probe, then that is exactly > what will have to be done. The restriction (of shared pins) is in the > devices and is potentially per device, so it makes more sense for the > device's drivers to handle than the host bridge IMO. (Assuming the > core doesn't handle a per device property.) > > > > Got it, so you have more on-chip PCIe devices than the ones listed in > > loongson64-2k1000.dtsi, and you don't want to describe them in the > > device tree just to put status = "disabled" for those devices/functions > > that you don't want Linux to use - although you could, and it wouldn't > > be that hard or have unintended side effects. > > > > Though you need to admit, in case you had an on-chip multi-function PCIe > > device like the NXP ENETC, and you wanted Linux to not use function 0, > > the strategy you're suggesting here that is acceptable for Loongson > > would not have worked. > > > > I believe we need a bit of coordination from PCIe and device tree > > maintainers, to suggest what would be the encouraged best practices and > > ways to solve this regression for the ENETC. > > I think we need to define what behavior is correct for 'status = > "disabled"'. For almost everywhere in DT, it is equivalent to the > device is not present. A not present device doesn't probe. There are > unfortunately cases where status got ignored/forgotten and PCI was one > of those. PCI is a bit different since there are 2 sources of > information about a device being present. The intent with PCI is DT > overrides what's discovered. For example, 'vendor-id' overrides what's > read from the h/w. > > I think we can fix making the status per function simply by making > 'match_driver' be set based on the status. This would move the check > later to just before probing. That would not work for a case where > accessing the config registers is a problem. It doesn't sound like > that's a problem for Loongson based on the above response, but their > original solution did prevent that. This change would also mean the > PCI quirks would run. Perhaps the func0 memory clearing you need could > be run as a quirk instead? > > Rob Sorry to return to this thread very late. I had lots of other stuff to take care of, and somehow *this* breakage had less priority :) So, first off, there's a confusion regarding the "func0 memory clearing" that could be run as a quirk instead. It's not memory clearing for fn 0, but memory clearing for all ENETC functions, regardless or not whether they have status = "disabled" or not in the device tree. That being said, I've implemented the workaround below in a quirk as you've said, and the quirks only get applied for those PCI functions which don't have status = "disabled" in the device tree. So, as things stand, it won't work. Also, the original patch on which we're commenting ("PCI: don't skip probing entire device if first fn OF node has status = "disabled"") is needed in any case, because of the other issue: the PCI core thinks that when fn 0 has status = "disabled", fn 1 .. 6 are also unavailable. False. From 9c3b88196a7c7e2b010d051c6d48faf36791e220 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean <vladimir.oltean@nxp.com> Date: Tue, 20 Jun 2023 16:31:07 +0300 Subject: [PATCH] net: enetc: reimplement RFS/RSS memory clearing as PCI quirk Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> --- .../net/ethernet/freescale/enetc/enetc_pf.c | 57 ++++++++++++++----- 1 file changed, 43 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c index 1416262d4296..b8f6f0799170 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c @@ -1242,18 +1242,6 @@ static int enetc_pf_probe(struct pci_dev *pdev, if (err) goto err_setup_cbdr; - err = enetc_init_port_rfs_memory(si); - if (err) { - dev_err(&pdev->dev, "Failed to initialize RFS memory\n"); - goto err_init_port_rfs; - } - - err = enetc_init_port_rss_memory(si); - if (err) { - dev_err(&pdev->dev, "Failed to initialize RSS memory\n"); - goto err_init_port_rss; - } - if (node && !of_device_is_available(node)) { dev_info(&pdev->dev, "device is disabled, skipping\n"); err = -ENODEV; @@ -1339,8 +1327,6 @@ static int enetc_pf_probe(struct pci_dev *pdev, si->ndev = NULL; free_netdev(ndev); err_alloc_netdev: -err_init_port_rss: -err_init_port_rfs: err_device_disabled: err_setup_mac_addresses: enetc_teardown_cbdr(&si->cbd_ring); @@ -1377,6 +1363,49 @@ static void enetc_pf_remove(struct pci_dev *pdev) enetc_pci_remove(pdev); } +static void enetc_fixup_clear_rss_rfs(struct pci_dev *pdev) +{ + struct enetc_si *si; + struct enetc_pf *pf; + int err; + + err = enetc_pci_probe(pdev, KBUILD_MODNAME, sizeof(*pf)); + if (err) + goto out; + + si = pci_get_drvdata(pdev); + if (!si->hw.port || !si->hw.global) { + err = -ENODEV; + goto out_pci_remove; + } + + err = enetc_setup_cbdr(&pdev->dev, &si->hw, ENETC_CBDR_DEFAULT_SIZE, + &si->cbd_ring); + if (err) + goto out_pci_remove; + + err = enetc_init_port_rfs_memory(si); + if (err) + goto out_teardown_cbdr; + + err = enetc_init_port_rss_memory(si); + if (err) + goto out_teardown_cbdr; + +out_teardown_cbdr: + enetc_teardown_cbdr(&si->cbd_ring); +out_pci_remove: + enetc_pci_remove(pdev); +out: + if (err) { + dev_err(&pdev->dev, + "Failed to apply PCI fixup for clearing RFS/RSS memories: %pe\n", + ERR_PTR(err)); + } +} +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_FREESCALE, ENETC_DEV_ID_PF, + enetc_fixup_clear_rss_rfs); + static const struct pci_device_id enetc_pf_id_table[] = { { PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, ENETC_DEV_ID_PF) }, { 0, } /* End of table. */ -- 2.34.1
On Thu, Aug 03, 2023 at 01:39:55PM +0300, Vladimir Oltean wrote: > Hi Rob, > > On Fri, Jun 16, 2023 at 11:57:43AM -0600, Rob Herring wrote: > > On Sun, Jun 4, 2023 at 2:55 AM Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > > > > Sorry, just now seeing this as I've been out the last month. > > > > > On Sat, Jun 03, 2023 at 10:35:50AM +0800, Jianmin Lv wrote: > > > > > How about 3. handle of_device_is_available() in the probe function of > > > > > the "loongson, pci-gmac" driver? Would that not work? > > > > > > > > > This way does work only for the specified device. There are other devices, > > > > such as HDA, I2S, etc, which have shared pins. Then we have to add > > > > of_device_is_available() checking to those drivers one by one. And we are > > > > not sure if there are other devices in new generation chips in future. So > > > > I'm afraid that the way you mentioned is not suitable for us. > > > > If we decided that disabled devices should probe, then that is exactly > > what will have to be done. The restriction (of shared pins) is in the > > devices and is potentially per device, so it makes more sense for the > > device's drivers to handle than the host bridge IMO. (Assuming the > > core doesn't handle a per device property.) > > > > > > > Got it, so you have more on-chip PCIe devices than the ones listed in > > > loongson64-2k1000.dtsi, and you don't want to describe them in the > > > device tree just to put status = "disabled" for those devices/functions > > > that you don't want Linux to use - although you could, and it wouldn't > > > be that hard or have unintended side effects. > > > > > > Though you need to admit, in case you had an on-chip multi-function PCIe > > > device like the NXP ENETC, and you wanted Linux to not use function 0, > > > the strategy you're suggesting here that is acceptable for Loongson > > > would not have worked. > > > > > > I believe we need a bit of coordination from PCIe and device tree > > > maintainers, to suggest what would be the encouraged best practices and > > > ways to solve this regression for the ENETC. > > > > I think we need to define what behavior is correct for 'status = > > "disabled"'. For almost everywhere in DT, it is equivalent to the > > device is not present. A not present device doesn't probe. There are > > unfortunately cases where status got ignored/forgotten and PCI was one > > of those. PCI is a bit different since there are 2 sources of > > information about a device being present. The intent with PCI is DT > > overrides what's discovered. For example, 'vendor-id' overrides what's > > read from the h/w. > > > > I think we can fix making the status per function simply by making > > 'match_driver' be set based on the status. This would move the check > > later to just before probing. That would not work for a case where > > accessing the config registers is a problem. It doesn't sound like > > that's a problem for Loongson based on the above response, but their > > original solution did prevent that. This change would also mean the > > PCI quirks would run. Perhaps the func0 memory clearing you need could > > be run as a quirk instead? > > > > Rob > > Sorry to return to this thread very late. I had lots of other stuff to > take care of, and somehow *this* breakage had less priority :) > > So, first off, there's a confusion regarding the "func0 memory clearing" > that could be run as a quirk instead. It's not memory clearing for fn 0, > but memory clearing for all ENETC functions, regardless or not whether > they have status = "disabled" or not in the device tree. > > That being said, I've implemented the workaround below in a quirk as > you've said, and the quirks only get applied for those PCI functions > which don't have status = "disabled" in the device tree. So, as things > stand, it won't work. > > Also, the original patch on which we're commenting ("PCI: don't skip > probing entire device if first fn OF node has status = "disabled"") is > needed in any case, because of the other issue: the PCI core thinks that > when fn 0 has status = "disabled", fn 1 .. 6 are also unavailable. False. > > From 9c3b88196a7c7e2b010d051c6d48faf36791e220 Mon Sep 17 00:00:00 2001 > From: Vladimir Oltean <vladimir.oltean@nxp.com> > Date: Tue, 20 Jun 2023 16:31:07 +0300 > Subject: [PATCH] net: enetc: reimplement RFS/RSS memory clearing as PCI quirk > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > --- > .../net/ethernet/freescale/enetc/enetc_pf.c | 57 ++++++++++++++----- > 1 file changed, 43 insertions(+), 14 deletions(-) > > diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c > index 1416262d4296..b8f6f0799170 100644 > --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c > +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c > @@ -1242,18 +1242,6 @@ static int enetc_pf_probe(struct pci_dev *pdev, > if (err) > goto err_setup_cbdr; > > - err = enetc_init_port_rfs_memory(si); > - if (err) { > - dev_err(&pdev->dev, "Failed to initialize RFS memory\n"); > - goto err_init_port_rfs; > - } > - > - err = enetc_init_port_rss_memory(si); > - if (err) { > - dev_err(&pdev->dev, "Failed to initialize RSS memory\n"); > - goto err_init_port_rss; > - } > - > if (node && !of_device_is_available(node)) { > dev_info(&pdev->dev, "device is disabled, skipping\n"); > err = -ENODEV; > @@ -1339,8 +1327,6 @@ static int enetc_pf_probe(struct pci_dev *pdev, > si->ndev = NULL; > free_netdev(ndev); > err_alloc_netdev: > -err_init_port_rss: > -err_init_port_rfs: > err_device_disabled: > err_setup_mac_addresses: > enetc_teardown_cbdr(&si->cbd_ring); > @@ -1377,6 +1363,49 @@ static void enetc_pf_remove(struct pci_dev *pdev) > enetc_pci_remove(pdev); > } > > +static void enetc_fixup_clear_rss_rfs(struct pci_dev *pdev) > +{ > + struct enetc_si *si; > + struct enetc_pf *pf; > + int err; > + > + err = enetc_pci_probe(pdev, KBUILD_MODNAME, sizeof(*pf)); > + if (err) > + goto out; > + > + si = pci_get_drvdata(pdev); > + if (!si->hw.port || !si->hw.global) { > + err = -ENODEV; > + goto out_pci_remove; > + } > + > + err = enetc_setup_cbdr(&pdev->dev, &si->hw, ENETC_CBDR_DEFAULT_SIZE, > + &si->cbd_ring); > + if (err) > + goto out_pci_remove; > + > + err = enetc_init_port_rfs_memory(si); > + if (err) > + goto out_teardown_cbdr; > + > + err = enetc_init_port_rss_memory(si); > + if (err) > + goto out_teardown_cbdr; > + > +out_teardown_cbdr: > + enetc_teardown_cbdr(&si->cbd_ring); > +out_pci_remove: > + enetc_pci_remove(pdev); > +out: > + if (err) { > + dev_err(&pdev->dev, > + "Failed to apply PCI fixup for clearing RFS/RSS memories: %pe\n", > + ERR_PTR(err)); > + } > +} > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_FREESCALE, ENETC_DEV_ID_PF, > + enetc_fixup_clear_rss_rfs); > + > static const struct pci_device_id enetc_pf_id_table[] = { > { PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, ENETC_DEV_ID_PF) }, > { 0, } /* End of table. */ > -- > 2.34.1 > Ah, sorry, I completely missed your comment about match_driver. So I've added this extra patch and both issues are solved. The fixup runs for all functions (thus I see no AER errors), and functions 1 .. 6 continue to probe even when function 0 has status = "disabled". From 3528d4a48cb37dce8d1d83d2fbb465f21a32adcd Mon Sep 17 00:00:00 2001 From: Vladimir Oltean <vladimir.oltean@nxp.com> Date: Thu, 3 Aug 2023 14:31:27 +0300 Subject: [PATCH] PCI: move OF status = "disabled" detection to dev->match_driver Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> --- drivers/pci/bus.c | 4 +++- drivers/pci/of.c | 5 ----- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c index 5bc81cc0a2de..46b252bbe500 100644 --- a/drivers/pci/bus.c +++ b/drivers/pci/bus.c @@ -11,6 +11,7 @@ #include <linux/pci.h> #include <linux/errno.h> #include <linux/ioport.h> +#include <linux/of.h> #include <linux/proc_fs.h> #include <linux/slab.h> @@ -332,6 +333,7 @@ void __weak pcibios_bus_add_device(struct pci_dev *pdev) { } */ void pci_bus_add_device(struct pci_dev *dev) { + struct device_node *dn = dev->dev.of_node; int retval; /* @@ -344,7 +346,7 @@ void pci_bus_add_device(struct pci_dev *dev) pci_proc_attach_device(dev); pci_bridge_d3_update(dev); - dev->match_driver = true; + dev->match_driver = !dn || of_device_is_available(dn); retval = device_attach(&dev->dev); if (retval < 0 && retval != -EPROBE_DEFER) pci_warn(dev, "device attach failed (%d)\n", retval); diff --git a/drivers/pci/of.c b/drivers/pci/of.c index e51219f9f523..3c158b17dcb5 100644 --- a/drivers/pci/of.c +++ b/drivers/pci/of.c @@ -34,11 +34,6 @@ int pci_set_of_node(struct pci_dev *dev) if (!node) return 0; - if (!of_device_is_available(node)) { - of_node_put(node); - return -ENODEV; - } - device_set_node(&dev->dev, of_fwnode_handle(node)); return 0; } -- 2.34.1
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 2475098f6518..dc11e0945744 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -240,6 +240,7 @@ bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl, int crs_timeout); int pci_idt_bus_quirk(struct pci_bus *bus, int devfn, u32 *pl, int crs_timeout); +int __pci_setup_device(struct pci_dev *dev, bool *present_but_skipped); int pci_setup_device(struct pci_dev *dev); int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type, struct resource *res, unsigned int reg); diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 0b2826c4a832..17a51fa55020 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1811,17 +1811,7 @@ static void early_dump_pci_device(struct pci_dev *pdev) value, 256, false); } -/** - * pci_setup_device - Fill in class and map information of a device - * @dev: the device structure to fill - * - * Initialize the device structure with information about the device's - * vendor,class,memory and IO-space addresses, IRQ lines etc. - * Called at initialisation of the PCI subsystem and by CardBus services. - * Returns 0 on success and negative if unknown type of device (not normal, - * bridge or CardBus). - */ -int pci_setup_device(struct pci_dev *dev) +int __pci_setup_device(struct pci_dev *dev, bool *present_but_skipped) { u32 class; u16 cmd; @@ -1841,8 +1831,10 @@ int pci_setup_device(struct pci_dev *dev) set_pcie_port_type(dev); err = pci_set_of_node(dev); - if (err) + if (err) { + *present_but_skipped = true; return err; + } pci_set_acpi_fwnode(dev); pci_dev_assign_slot(dev); @@ -1995,6 +1987,23 @@ int pci_setup_device(struct pci_dev *dev) return 0; } +/** + * pci_setup_device - Fill in class and map information of a device + * @dev: the device structure to fill + * + * Initialize the device structure with information about the device's + * vendor,class,memory and IO-space addresses, IRQ lines etc. + * Called at initialisation of the PCI subsystem and by CardBus services. + * Returns 0 on success and negative if unknown type of device (not normal, + * bridge or CardBus). + */ +int pci_setup_device(struct pci_dev *dev) +{ + bool present_but_skipped = false; + + return __pci_setup_device(dev, &present_but_skipped); +} + static void pci_configure_mps(struct pci_dev *dev) { struct pci_dev *bridge = pci_upstream_bridge(dev); @@ -2414,7 +2423,8 @@ EXPORT_SYMBOL(pci_bus_read_dev_vendor_id); * Read the config data for a PCI device, sanity-check it, * and fill in the dev structure. */ -static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn) +static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn, + bool *present_but_skipped) { struct pci_dev *dev; u32 l; @@ -2430,7 +2440,7 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn) dev->vendor = l & 0xffff; dev->device = (l >> 16) & 0xffff; - if (pci_setup_device(dev)) { + if (__pci_setup_device(dev, present_but_skipped)) { pci_bus_put(dev->bus); kfree(dev); return NULL; @@ -2575,17 +2585,20 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) WARN_ON(ret < 0); } -struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn) +static struct pci_dev *__pci_scan_single_device(struct pci_bus *bus, int devfn, + bool *present_but_skipped) { struct pci_dev *dev; + *present_but_skipped = false; + dev = pci_get_slot(bus, devfn); if (dev) { pci_dev_put(dev); return dev; } - dev = pci_scan_device(bus, devfn); + dev = pci_scan_device(bus, devfn, present_but_skipped); if (!dev) return NULL; @@ -2593,6 +2606,13 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn) return dev; } + +struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn) +{ + bool present_but_skipped; + + return __pci_scan_single_device(bus, devfn, &present_but_skipped); +} EXPORT_SYMBOL(pci_scan_single_device); static int next_ari_fn(struct pci_bus *bus, struct pci_dev *dev, int fn) @@ -2665,6 +2685,7 @@ static int only_one_child(struct pci_bus *bus) */ int pci_scan_slot(struct pci_bus *bus, int devfn) { + bool present_but_skipped; struct pci_dev *dev; int fn = 0, nr = 0; @@ -2672,13 +2693,14 @@ int pci_scan_slot(struct pci_bus *bus, int devfn) return 0; /* Already scanned the entire slot */ do { - dev = pci_scan_single_device(bus, devfn + fn); + dev = __pci_scan_single_device(bus, devfn + fn, + &present_but_skipped); if (dev) { if (!pci_dev_is_added(dev)) nr++; if (fn > 0) dev->multifunction = 1; - } else if (fn == 0) { + } else if (fn == 0 && !present_but_skipped) { /* * Function 0 is required unless we are running on * a hypervisor that passes through individual PCI
pci_scan_child_bus_extend() calls pci_scan_slot() with devfn (bus:device:function) being a multiple of 8, i.e. for each unique device. pci_scan_slot() has logic to say that if the function 0 of a device is absent, the entire device is absent and we can skip the other functions entirely. Traditionally, this has meant that pci_bus_read_dev_vendor_id() returns an error code for that function. However, since the blamed commit, there is an extra confounding condition: function 0 of the device exists and has a valid vendor id, but it is disabled in the device tree. In that case, pci_scan_slot() would incorrectly skip the entire device instead of just that function. Such is the case with the NXP LS1028A SoC, which has an ECAM for embedded Ethernet (see pcie@1f0000000 in arm64/boot/dts/freescale/fsl-ls1028a.dtsi). Each Ethernet port represents a function within the ENETC ECAM, with function 0 going to ENETC Ethernet port 0, connected to SERDES port 0 (SGMII or USXGMII). When using a SERDES protocol such as 0x9999, all 4 SERDES lanes go to the Ethernet switch (function 5 on this ECAM) and none go to ENETC port 0. So, ENETC port 0 needs to have status = "disabled", and embedded Ethernet takes place just through the other functions (fn 2 is the DSA master, fn 3 is the MDIO controller, fn 5 is the DSA switch etc). Contrast this with other SERDES protocols like 0x85bb, where the switch takes up a single SERDES lane and uses the QSGMII protocol - so ENETC port 0 also gets access to a SERDES lane. Therefore, here, function 0 being unused has nothing to do with the entire PCI device being unused. Add a "bool present_but_skipped" which is propagated from the caller of pci_set_of_node() all the way to pci_scan_slot(), so that it can distinguish an error reading the ECAM from a disabled device in the device tree. Fixes: 6fffbc7ae137 ("PCI: Honor firmware's device disabled status") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> --- drivers/pci/pci.h | 1 + drivers/pci/probe.c | 58 +++++++++++++++++++++++++++++++-------------- 2 files changed, 41 insertions(+), 18 deletions(-)