diff mbox series

[pci] PCI: don't skip probing entire device if first fn OF node has status = "disabled"

Message ID 20230521115141.2384444-1-vladimir.oltean@nxp.com (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Headers show
Series [pci] PCI: don't skip probing entire device if first fn OF node has status = "disabled" | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Vladimir Oltean May 21, 2023, 11:51 a.m. UTC
pci_scan_child_bus_extend() calls pci_scan_slot() with devfn
(bus:device:function) being a multiple of 8, i.e. for each unique
device.

pci_scan_slot() has logic to say that if the function 0 of a device is
absent, the entire device is absent and we can skip the other functions
entirely. Traditionally, this has meant that pci_bus_read_dev_vendor_id()
returns an error code for that function.

However, since the blamed commit, there is an extra confounding
condition: function 0 of the device exists and has a valid vendor id,
but it is disabled in the device tree. In that case, pci_scan_slot()
would incorrectly skip the entire device instead of just that function.

Such is the case with the NXP LS1028A SoC, which has an ECAM
for embedded Ethernet (see pcie@1f0000000 in
arm64/boot/dts/freescale/fsl-ls1028a.dtsi). Each Ethernet port
represents a function within the ENETC ECAM, with function 0 going
to ENETC Ethernet port 0, connected to SERDES port 0 (SGMII or USXGMII).

When using a SERDES protocol such as 0x9999, all 4 SERDES lanes go to
the Ethernet switch (function 5 on this ECAM) and none go to ENETC
port 0. So, ENETC port 0 needs to have status = "disabled", and embedded
Ethernet takes place just through the other functions (fn 2 is the DSA
master, fn 3 is the MDIO controller, fn 5 is the DSA switch etc).
Contrast this with other SERDES protocols like 0x85bb, where the switch
takes up a single SERDES lane and uses the QSGMII protocol - so ENETC
port 0 also gets access to a SERDES lane.

Therefore, here, function 0 being unused has nothing to do with the
entire PCI device being unused.

Add a "bool present_but_skipped" which is propagated from the caller
of pci_set_of_node() all the way to pci_scan_slot(), so that it can
distinguish an error reading the ECAM from a disabled device in the
device tree.

Fixes: 6fffbc7ae137 ("PCI: Honor firmware's device disabled status")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/pci/pci.h   |  1 +
 drivers/pci/probe.c | 58 +++++++++++++++++++++++++++++++--------------
 2 files changed, 41 insertions(+), 18 deletions(-)

Comments

Vladimir Oltean May 29, 2023, 8:48 p.m. UTC | #1
Hi Bjorn,

On Sun, May 21, 2023 at 02:51:41PM +0300, Vladimir Oltean wrote:
> pci_scan_child_bus_extend() calls pci_scan_slot() with devfn
> (bus:device:function) being a multiple of 8, i.e. for each unique
> device.
> 
> pci_scan_slot() has logic to say that if the function 0 of a device is
> absent, the entire device is absent and we can skip the other functions
> entirely. Traditionally, this has meant that pci_bus_read_dev_vendor_id()
> returns an error code for that function.
> 
> However, since the blamed commit, there is an extra confounding
> condition: function 0 of the device exists and has a valid vendor id,
> but it is disabled in the device tree. In that case, pci_scan_slot()
> would incorrectly skip the entire device instead of just that function.
> 
> Such is the case with the NXP LS1028A SoC, which has an ECAM
> for embedded Ethernet (see pcie@1f0000000 in
> arm64/boot/dts/freescale/fsl-ls1028a.dtsi). Each Ethernet port
> represents a function within the ENETC ECAM, with function 0 going
> to ENETC Ethernet port 0, connected to SERDES port 0 (SGMII or USXGMII).
> 
> When using a SERDES protocol such as 0x9999, all 4 SERDES lanes go to
> the Ethernet switch (function 5 on this ECAM) and none go to ENETC
> port 0. So, ENETC port 0 needs to have status = "disabled", and embedded
> Ethernet takes place just through the other functions (fn 2 is the DSA
> master, fn 3 is the MDIO controller, fn 5 is the DSA switch etc).
> Contrast this with other SERDES protocols like 0x85bb, where the switch
> takes up a single SERDES lane and uses the QSGMII protocol - so ENETC
> port 0 also gets access to a SERDES lane.
> 
> Therefore, here, function 0 being unused has nothing to do with the
> entire PCI device being unused.
> 
> Add a "bool present_but_skipped" which is propagated from the caller
> of pci_set_of_node() all the way to pci_scan_slot(), so that it can
> distinguish an error reading the ECAM from a disabled device in the
> device tree.
> 
> Fixes: 6fffbc7ae137 ("PCI: Honor firmware's device disabled status")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---

Do you have some comments here?

Thanks,
Vladimir
Bjorn Helgaas May 30, 2023, 9:58 p.m. UTC | #2
On Sun, May 21, 2023 at 02:51:41PM +0300, Vladimir Oltean wrote:
> pci_scan_child_bus_extend() calls pci_scan_slot() with devfn
> (bus:device:function) being a multiple of 8, i.e. for each unique
> device.
> 
> pci_scan_slot() has logic to say that if the function 0 of a device is
> absent, the entire device is absent and we can skip the other functions
> entirely. Traditionally, this has meant that pci_bus_read_dev_vendor_id()
> returns an error code for that function.
> 
> However, since the blamed commit, there is an extra confounding
> condition: function 0 of the device exists and has a valid vendor id,
> but it is disabled in the device tree. In that case, pci_scan_slot()
> would incorrectly skip the entire device instead of just that function.
> 
> Such is the case with the NXP LS1028A SoC, which has an ECAM
> for embedded Ethernet (see pcie@1f0000000 in
> arm64/boot/dts/freescale/fsl-ls1028a.dtsi). Each Ethernet port
> represents a function within the ENETC ECAM, with function 0 going
> to ENETC Ethernet port 0, connected to SERDES port 0 (SGMII or USXGMII).
> 
> When using a SERDES protocol such as 0x9999, all 4 SERDES lanes go to
> the Ethernet switch (function 5 on this ECAM) and none go to ENETC
> port 0. So, ENETC port 0 needs to have status = "disabled", and embedded
> Ethernet takes place just through the other functions (fn 2 is the DSA
> master, fn 3 is the MDIO controller, fn 5 is the DSA switch etc).
> Contrast this with other SERDES protocols like 0x85bb, where the switch
> takes up a single SERDES lane and uses the QSGMII protocol - so ENETC
> port 0 also gets access to a SERDES lane.

Can you write this description in terms of PCI topology?  The
nitty-gritty SERDES details are not relevant at this level, except to
say that Function 0 is present in some cases but not others, and when
it is not present, *other* functions may be present.

Sigh.  Per spec (PCIe r6.0, sec 7.5.1.1.9), software is not permitted
to probe for Functions other than 0 unless "explicitly indicated by
another mechanism, such as an ARI or SR-IOV Capability."

Does it "work" to probe when the spec prohibits it?  Probably.  Does
it lead to some breakage elsewhere eventually?  Quite possibly.  They
didn't put "software must not probe" in the spec just to make
enumeration faster.

So I'm a little grumpy about further complicating this already messy
path just to accommodate a new non-compliant SoC.  Everybody pays the
price of understanding all this stuff, and it doesn't seem in balance.

Can you take advantage of some existing mechanism like
PCI_SCAN_ALL_PCIE_DEVS or hypervisor_isolated_pci_functions() (which
could be renamed and made more general)?

> Therefore, here, function 0 being unused has nothing to do with the
> entire PCI device being unused.
> 
> Add a "bool present_but_skipped" which is propagated from the caller
> of pci_set_of_node() all the way to pci_scan_slot(), so that it can
> distinguish an error reading the ECAM from a disabled device in the
> device tree.
> 
> Fixes: 6fffbc7ae137 ("PCI: Honor firmware's device disabled status")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  drivers/pci/pci.h   |  1 +
>  drivers/pci/probe.c | 58 +++++++++++++++++++++++++++++++--------------
>  2 files changed, 41 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 2475098f6518..dc11e0945744 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -240,6 +240,7 @@ bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
>  					int crs_timeout);
>  int pci_idt_bus_quirk(struct pci_bus *bus, int devfn, u32 *pl, int crs_timeout);
>  
> +int __pci_setup_device(struct pci_dev *dev, bool *present_but_skipped);
>  int pci_setup_device(struct pci_dev *dev);
>  int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
>  		    struct resource *res, unsigned int reg);
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 0b2826c4a832..17a51fa55020 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1811,17 +1811,7 @@ static void early_dump_pci_device(struct pci_dev *pdev)
>  		       value, 256, false);
>  }
>  
> -/**
> - * pci_setup_device - Fill in class and map information of a device
> - * @dev: the device structure to fill
> - *
> - * Initialize the device structure with information about the device's
> - * vendor,class,memory and IO-space addresses, IRQ lines etc.
> - * Called at initialisation of the PCI subsystem and by CardBus services.
> - * Returns 0 on success and negative if unknown type of device (not normal,
> - * bridge or CardBus).
> - */
> -int pci_setup_device(struct pci_dev *dev)
> +int __pci_setup_device(struct pci_dev *dev, bool *present_but_skipped)
>  {
>  	u32 class;
>  	u16 cmd;
> @@ -1841,8 +1831,10 @@ int pci_setup_device(struct pci_dev *dev)
>  	set_pcie_port_type(dev);
>  
>  	err = pci_set_of_node(dev);
> -	if (err)
> +	if (err) {
> +		*present_but_skipped = true;
>  		return err;
> +	}
>  	pci_set_acpi_fwnode(dev);
>  
>  	pci_dev_assign_slot(dev);
> @@ -1995,6 +1987,23 @@ int pci_setup_device(struct pci_dev *dev)
>  	return 0;
>  }
>  
> +/**
> + * pci_setup_device - Fill in class and map information of a device
> + * @dev: the device structure to fill
> + *
> + * Initialize the device structure with information about the device's
> + * vendor,class,memory and IO-space addresses, IRQ lines etc.
> + * Called at initialisation of the PCI subsystem and by CardBus services.
> + * Returns 0 on success and negative if unknown type of device (not normal,
> + * bridge or CardBus).
> + */
> +int pci_setup_device(struct pci_dev *dev)
> +{
> +	bool present_but_skipped = false;
> +
> +	return __pci_setup_device(dev, &present_but_skipped);
> +}
> +
>  static void pci_configure_mps(struct pci_dev *dev)
>  {
>  	struct pci_dev *bridge = pci_upstream_bridge(dev);
> @@ -2414,7 +2423,8 @@ EXPORT_SYMBOL(pci_bus_read_dev_vendor_id);
>   * Read the config data for a PCI device, sanity-check it,
>   * and fill in the dev structure.
>   */
> -static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
> +static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn,
> +				       bool *present_but_skipped)
>  {
>  	struct pci_dev *dev;
>  	u32 l;
> @@ -2430,7 +2440,7 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>  	dev->vendor = l & 0xffff;
>  	dev->device = (l >> 16) & 0xffff;
>  
> -	if (pci_setup_device(dev)) {
> +	if (__pci_setup_device(dev, present_but_skipped)) {
>  		pci_bus_put(dev->bus);
>  		kfree(dev);
>  		return NULL;
> @@ -2575,17 +2585,20 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
>  	WARN_ON(ret < 0);
>  }
>  
> -struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
> +static struct pci_dev *__pci_scan_single_device(struct pci_bus *bus, int devfn,
> +						bool *present_but_skipped)
>  {
>  	struct pci_dev *dev;
>  
> +	*present_but_skipped = false;
> +
>  	dev = pci_get_slot(bus, devfn);
>  	if (dev) {
>  		pci_dev_put(dev);
>  		return dev;
>  	}
>  
> -	dev = pci_scan_device(bus, devfn);
> +	dev = pci_scan_device(bus, devfn, present_but_skipped);
>  	if (!dev)
>  		return NULL;
>  
> @@ -2593,6 +2606,13 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
>  
>  	return dev;
>  }
> +
> +struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
> +{
> +	bool present_but_skipped;
> +
> +	return __pci_scan_single_device(bus, devfn, &present_but_skipped);
> +}
>  EXPORT_SYMBOL(pci_scan_single_device);
>  
>  static int next_ari_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
> @@ -2665,6 +2685,7 @@ static int only_one_child(struct pci_bus *bus)
>   */
>  int pci_scan_slot(struct pci_bus *bus, int devfn)
>  {
> +	bool present_but_skipped;
>  	struct pci_dev *dev;
>  	int fn = 0, nr = 0;
>  
> @@ -2672,13 +2693,14 @@ int pci_scan_slot(struct pci_bus *bus, int devfn)
>  		return 0; /* Already scanned the entire slot */
>  
>  	do {
> -		dev = pci_scan_single_device(bus, devfn + fn);
> +		dev = __pci_scan_single_device(bus, devfn + fn,
> +					       &present_but_skipped);
>  		if (dev) {
>  			if (!pci_dev_is_added(dev))
>  				nr++;
>  			if (fn > 0)
>  				dev->multifunction = 1;
> -		} else if (fn == 0) {
> +		} else if (fn == 0 && !present_but_skipped) {
>  			/*
>  			 * Function 0 is required unless we are running on
>  			 * a hypervisor that passes through individual PCI
> -- 
> 2.34.1
>
Vladimir Oltean May 30, 2023, 10:04 p.m. UTC | #3
On Tue, May 30, 2023 at 04:58:55PM -0500, Bjorn Helgaas wrote:
> Can you write this description in terms of PCI topology?  The
> nitty-gritty SERDES details are not relevant at this level, except to
> say that Function 0 is present in some cases but not others, and when
> it is not present, *other* functions may be present.

No. It is to say that within the device, all PCIe functions (including 0)
are always available and have the same number, but depending on SERDES
configuration, their PCIe presence might be practically useful or not.
So that's how function 0 may end having status = "disabled" in the
device tree.

> Sigh.  Per spec (PCIe r6.0, sec 7.5.1.1.9), software is not permitted
> to probe for Functions other than 0 unless "explicitly indicated by
> another mechanism, such as an ARI or SR-IOV Capability."
> 
> Does it "work" to probe when the spec prohibits it?  Probably.  Does
> it lead to some breakage elsewhere eventually?  Quite possibly.  They
> didn't put "software must not probe" in the spec just to make
> enumeration faster.
> 
> So I'm a little grumpy about further complicating this already messy
> path just to accommodate a new non-compliant SoC.  Everybody pays the
> price of understanding all this stuff, and it doesn't seem in balance.
> 
> Can you take advantage of some existing mechanism like
> PCI_SCAN_ALL_PCIE_DEVS or hypervisor_isolated_pci_functions() (which
> could be renamed and made more general)?

Not responding yet to the rest of the email since it's not clear to me
that you've understood function 0 is absolutely present and responds
to all config space accesses - it's just disabled in the device tree
because the user doesn't have something useful to do with it.
Bjorn Helgaas May 30, 2023, 10:27 p.m. UTC | #4
On Wed, May 31, 2023 at 01:04:36AM +0300, Vladimir Oltean wrote:
> On Tue, May 30, 2023 at 04:58:55PM -0500, Bjorn Helgaas wrote:
> > Can you write this description in terms of PCI topology?  The
> > nitty-gritty SERDES details are not relevant at this level, except to
> > say that Function 0 is present in some cases but not others, and when
> > it is not present, *other* functions may be present.
> 
> No. It is to say that within the device, all PCIe functions (including 0)
> are always available and have the same number, but depending on SERDES
> configuration, their PCIe presence might be practically useful or not.
> So that's how function 0 may end having status = "disabled" in the
> device tree.
>
> > Sigh.  Per spec (PCIe r6.0, sec 7.5.1.1.9), software is not permitted
> > to probe for Functions other than 0 unless "explicitly indicated by
> > another mechanism, such as an ARI or SR-IOV Capability."
> > 
> > Does it "work" to probe when the spec prohibits it?  Probably.  Does
> > it lead to some breakage elsewhere eventually?  Quite possibly.  They
> > didn't put "software must not probe" in the spec just to make
> > enumeration faster.
> > 
> > So I'm a little grumpy about further complicating this already messy
> > path just to accommodate a new non-compliant SoC.  Everybody pays the
> > price of understanding all this stuff, and it doesn't seem in balance.
> > 
> > Can you take advantage of some existing mechanism like
> > PCI_SCAN_ALL_PCIE_DEVS or hypervisor_isolated_pci_functions() (which
> > could be renamed and made more general)?
> 
> Not responding yet to the rest of the email since it's not clear to me
> that you've understood function 0 is absolutely present and responds
> to all config space accesses - it's just disabled in the device tree
> because the user doesn't have something useful to do with it.

Ah, you're right, sorry I missed that.  Dispensing with the SERDES
details would make this more obvious.

Not sure why this needs to change the pci_scan_slot() path, since
Function 0 is present and enumerable even though it's not useful in
some cases.  Seems like something in pci_set_of_node() or a quirk
could do whatever you need to do.

Bjorn
Vladimir Oltean May 30, 2023, 11:15 p.m. UTC | #5
On Tue, May 30, 2023 at 05:27:24PM -0500, Bjorn Helgaas wrote:
> Ah, you're right, sorry I missed that.  Dispensing with the SERDES
> details would make this more obvious.

Lesson learned. When I had just gotten out of college, every time I asked
the coworkers in my company what they're up to, I was amazed by them just
proceeding to tell me all the nitty gritty details of what they're doing
and debugging, like I was supposed to understand or care for that matter.
"Dude, can't you just paint the high level idea without using dorky words?"
Now I'm one of them...

> Not sure why this needs to change the pci_scan_slot() path, since
> Function 0 is present and enumerable even though it's not useful in
> some cases.

Well, the rationale for me was pretty simple: it's the pci_scan_slot() logic
that I want to change - continue enumeration in some cases when the pci_dev
for fn 0 is NULL - and I'm otherwise perfectly okay with pci_scan_slot()
getting a NULL pci_dev from pci_setup_device() for fn 0. That wasn't something
I had in mind to change.

This patch is what it takes to propagate a qualifier, without leaving a mark
in any structure, for that NULL return code: is it NULL because enumeration
came up with nothing, or is it NULL because pci_set_of_node() said so?

> Seems like something in pci_set_of_node() or a quirk could do whatever
> you need to do.

Could you help me out with a more detailed hint here? I'm not really
familiar with the PCI core code. You probably mean to suggest leaving a
stateful flag somewhere, though I'm not exactly sure where that is, that
would reach pci_scan_slot() enough to be able to alter its decision.
Bjorn Helgaas May 31, 2023, 4:56 p.m. UTC | #6
On Wed, May 31, 2023 at 02:15:09AM +0300, Vladimir Oltean wrote:
> On Tue, May 30, 2023 at 05:27:24PM -0500, Bjorn Helgaas wrote:
> > Ah, you're right, sorry I missed that.  Dispensing with the SERDES
> > details would make this more obvious.
> 
> Lesson learned. When I had just gotten out of college, every time I asked
> the coworkers in my company what they're up to, I was amazed by them just
> proceeding to tell me all the nitty gritty details of what they're doing
> and debugging, like I was supposed to understand or care for that matter.
> "Dude, can't you just paint the high level idea without using dorky words?"
> Now I'm one of them...

Haha :)  Communication is the hardest part!

> > Seems like something in pci_set_of_node() or a quirk could do whatever
> > you need to do.
> 
> Could you help me out with a more detailed hint here? I'm not really
> familiar with the PCI core code. You probably mean to suggest leaving a
> stateful flag somewhere, though I'm not exactly sure where that is, that
> would reach pci_scan_slot() enough to be able to alter its decision.

What bad things happen without this patch?  I guess we enumerate
Function 0 but in some cases it's not useful?  That in itself wouldn't
be a disaster; there are lots of things we enumerate but don't use.
But in this case, maybe a driver would claim Function 0 but it
wouldn't work as expected?

Bjorn
Vladimir Oltean May 31, 2023, 4:58 p.m. UTC | #7
On Wed, May 31, 2023 at 11:56:02AM -0500, Bjorn Helgaas wrote:
> On Wed, May 31, 2023 at 02:15:09AM +0300, Vladimir Oltean wrote:
> > On Tue, May 30, 2023 at 05:27:24PM -0500, Bjorn Helgaas wrote:
> > > Ah, you're right, sorry I missed that.  Dispensing with the SERDES
> > > details would make this more obvious.
> > 
> > Lesson learned. When I had just gotten out of college, every time I asked
> > the coworkers in my company what they're up to, I was amazed by them just
> > proceeding to tell me all the nitty gritty details of what they're doing
> > and debugging, like I was supposed to understand or care for that matter.
> > "Dude, can't you just paint the high level idea without using dorky words?"
> > Now I'm one of them...
> 
> Haha :)  Communication is the hardest part!

I know...

> What bad things happen without this patch?

It's in the commit title: probing the entire device (PCI device!!!) is
skipped if function 0 has status = "disabled". Aka PCIe functions 1, 2, 3, 4, ...
Bjorn Helgaas May 31, 2023, 8:24 p.m. UTC | #8
[+cc Loongson folks, thread at
https://lore.kernel.org/r/20230521115141.2384444-1-vladimir.oltean@nxp.com]

On Wed, May 31, 2023 at 07:58:19PM +0300, Vladimir Oltean wrote:
> On Wed, May 31, 2023 at 11:56:02AM -0500, Bjorn Helgaas wrote:
> > What bad things happen without this patch?
> 
> It's in the commit title: probing the entire device (PCI device!!!) is
> skipped if function 0 has status = "disabled". Aka PCIe functions 1, 2, 3, 4, ...

I guess I should have asked "what bad things happen without this patch
and without the DT 'disabled' status"?

I think 6fffbc7ae137 ("PCI: Honor firmware's device disabled status")
was basically a workaround for Loongson making a device visible in PCI
config space when it shouldn't have been [1].

6fffbc7ae137 [2] means we pretend the PCI device doesn't exist if DT
status is "disabled".  If the device happens to be Function 0, that
means we don't look for any more functions.  I guess that doesn't
matter for Loongson.  But it does matter for this NXP platform, where
we don't want to use Function 0, but we *do* want to use other
Functions.

There are several PCIe things that are required to be in Function 0
(MPS, ASPM, IDE, CMA/SPDM, etc), at least in certain cases.

What would happen if instead of making pci_setup_device() fail (as
both 6fffbc7ae137 and this patch do, which means the device doesn't
even show up in "lspci"), we just prevent drivers from binding to it,
e.g., by making pci_device_probe() fail?  The device would then appear
in "lspci" and the PCI core would configure things as usual, but no
drivers would be able to claim it.

Bjorn

[1] https://lore.kernel.org/all/20221114074346.23008-1-liupeibao@loongson.cn/
[2] https://git.kernel.org/linus/6fffbc7ae137
Vladimir Oltean June 1, 2023, 8:11 a.m. UTC | #9
On Wed, May 31, 2023 at 03:24:46PM -0500, Bjorn Helgaas wrote:
> I guess I should have asked "what bad things happen without this patch
> and without the DT 'disabled' status"?

Well, now that you put it this way, I do realize that things are not so
ideal for me.

Our drivers for the functions of this device were already checking for
of_device_is_available() during probe. So, reverting the core PCIe
patch, they would still not register a network interface, which is good.

However (and this is the bad part), multiple functions of this PCIe
device unfortunately share a common memory, which is not zeroized by
hardware, and so, to avoid multi-bit ECC errors, it must be zeroized by
software, using some memory space accesses from all functions that have
access to that shared memory (every function zeroizes its piece of it).
This, sadly, includes functions which have status = "disabled". See
commit 3222b5b613db ("net: enetc: initialize RFS/RSS memories for unused
ports too").

What we used to do was start probing a bit in enetc_pf_probe(), enable
the memory space, zeroize our part of the shared memory, then check
of_device_is_available() and finally, we disable the memory space again
and exit probing with -ENODEV.

That is not possible anymore with the core patch, because the PCIe core
will not probe our disabled functions at all anymore.

The ENETC is not a hot-pluggable PCIe device. It uses Enhanced Allocation
to essentially describe on-chip memory spaces, which are always present.
So presumably, a different system-level solution to initialize those
shared memories (U-Boot?) may be chosen, if implementing this workaround
in Linux puts too much pressure on the PCIe core and the way in which it
does things. Initially I didn't want to do this in prior boot stages
because we only enable the RCEC in Linux, nothing is broken other than
the spurious AER messages, and, you know.. the kernel may still run
indefinitely on top of bootloaders which don't have the workaround applied.
So working around it in Linux avoids one dependency.
Bjorn Helgaas June 1, 2023, 3:44 p.m. UTC | #10
On Thu, Jun 01, 2023 at 11:11:56AM +0300, Vladimir Oltean wrote:
> On Wed, May 31, 2023 at 03:24:46PM -0500, Bjorn Helgaas wrote:
> > I guess I should have asked "what bad things happen without this patch
> > and without the DT 'disabled' status"?
> 
> Well, now that you put it this way, I do realize that things are not so
> ideal for me.
> 
> Our drivers for the functions of this device were already checking for
> of_device_is_available() during probe. So, reverting the core PCIe
> patch, they would still not register a network interface, which is good.
> 
> However (and this is the bad part), multiple functions of this PCIe
> device unfortunately share a common memory, which is not zeroized by
> hardware, and so, to avoid multi-bit ECC errors, it must be zeroized by
> software, using some memory space accesses from all functions that have
> access to that shared memory (every function zeroizes its piece of it).
> This, sadly, includes functions which have status = "disabled". See
> commit 3222b5b613db ("net: enetc: initialize RFS/RSS memories for unused
> ports too").
> 
> What we used to do was start probing a bit in enetc_pf_probe(), enable
> the memory space, zeroize our part of the shared memory, then check
> of_device_is_available() and finally, we disable the memory space again
> and exit probing with -ENODEV.
> 
> That is not possible anymore with the core patch, because the PCIe core
> will not probe our disabled functions at all anymore.

To make sure I understand you, I think you're saying that if Function
0 has DT status "disabled", 6fffbc7ae137 ("PCI: Honor firmware's
device disabled status") breaks things because we don't enumerate
Function 0 and the driver can't temporarily claim it to zero out its
piece of the shared memory.

With just 6fffbc7ae137, we don't enumerate Function 0, which means we
don't see that it's a multi-function device, so we don't enumerate
Functions 1, 2, etc, either.

With both 6fffbc7ae137 and your current patch, we would enumerate
Functions 1, 2, etc, but we still skip Function 0, so its piece of the
shared memory still doesn't get zeroed.

> The ENETC is not a hot-pluggable PCIe device. It uses Enhanced Allocation
> to essentially describe on-chip memory spaces, which are always present.
> So presumably, a different system-level solution to initialize those
> shared memories (U-Boot?) may be chosen, if implementing this workaround
> in Linux puts too much pressure on the PCIe core and the way in which it
> does things. Initially I didn't want to do this in prior boot stages
> because we only enable the RCEC in Linux, nothing is broken other than
> the spurious AER messages, and, you know.. the kernel may still run
> indefinitely on top of bootloaders which don't have the workaround applied.
> So working around it in Linux avoids one dependency.

If I understand correctly, something (bootloader or Linux) needs to do
something to Function 0 (e.g., clear memory).  Doing it in Linux would
minimize dependences on the bootloader, so that seems desirable to me.
That means Linux needs to enumerate Function 0 so it is visible to a
driver or possibly a quirk.

I think we could contemplate implementing 6fffbc7ae137 in a different
way.  Checking DT status at driver probe-time would probably work for
Loongson, but wouldn't quite solve the NXP problem because the driver
wouldn't be able to claim Function 0 even temporarily.

Is DT the only way to learn the NXP SERDES configuration?  I think it
would be much better if there were a way to programmatically learn it,
because then you wouldn't have to worry about syncing the DT with the
platform configuration, and it would decouple this from the Loongson
situation.

(If there were a way to actually discover the Loongson situation
instead of relying on DT, e.g., by keying off a Device ID or
something, that would be much better, too.  I assume we explored that,
but I don't remember the details.)

Bjorn
Vladimir Oltean June 1, 2023, 4:33 p.m. UTC | #11
On Thu, Jun 01, 2023 at 10:44:45AM -0500, Bjorn Helgaas wrote:
> To make sure I understand you, I think you're saying that if Function
> 0 has DT status "disabled", 6fffbc7ae137 ("PCI: Honor firmware's
> device disabled status") breaks things because we don't enumerate
> Function 0 and the driver can't temporarily claim it to zero out its
> piece of the shared memory.
> 
> With just 6fffbc7ae137, we don't enumerate Function 0, which means we
> don't see that it's a multi-function device, so we don't enumerate
> Functions 1, 2, etc, either.
> 
> With both 6fffbc7ae137 and your current patch, we would enumerate
> Functions 1, 2, etc, but we still skip Function 0, so its piece of the
> shared memory still doesn't get zeroed.

I'm saying that as long as commit 6fffbc7ae137 ("PCI: Honor firmware's
device disabled status") exists in the form where the pci_driver :: probe()
is completely skipped for disabled functions, the NXP ENETC PCIe device
has a problem no matter what the function number is. That problem is:
the device drivers of all PCIe functions need to clear some memory
before they ultimately fail to probe (as they should), because of some
hardware design oversight. That is no longer possible if the driver has
no hook to execute code for those devices that are disabled.

On top of that, function 0 having status = "disabled" is extra
problematic, because the PCI core will now just assume that functions 1 .. N
don't exist at all, which is simply false, because the usefulness of
ENETC port 0 (PCIe function 0) from a networking perspective is
independent from the usefulness of ENETC port 1 (PCIe function 1), ENETC
port 2 etc.

> 
> > The ENETC is not a hot-pluggable PCIe device. It uses Enhanced Allocation
> > to essentially describe on-chip memory spaces, which are always present.
> > So presumably, a different system-level solution to initialize those
> > shared memories (U-Boot?) may be chosen, if implementing this workaround
> > in Linux puts too much pressure on the PCIe core and the way in which it
> > does things. Initially I didn't want to do this in prior boot stages
> > because we only enable the RCEC in Linux, nothing is broken other than
> > the spurious AER messages, and, you know.. the kernel may still run
> > indefinitely on top of bootloaders which don't have the workaround applied.
> > So working around it in Linux avoids one dependency.
> 
> If I understand correctly, something (bootloader or Linux) needs to do
> something to Function 0 (e.g., clear memory).

To more than just function 0 (also 1, 2 and 6). There are 2 confounding
problems, the latter being something that was exposed by your question:
what will happen that's bad with the current mainline code structure,
*notwithstanding* the fact that function 0 may have status = "disabled"
(which currently will skip enumeration for the rest of the functions
which don't have status = "disabled").

> Doing it in Linux would minimize dependences on the bootloader, so
> that seems desirable to me. That means Linux needs to enumerate
> Function 0 so it is visible to a driver or possibly a quirk.

Uhm... no, that wouldn't be enough. Only a straight revert would satisfy
the workaround that we currently have for NXP ENETC in Linux.

Also, I'm not sure if it was completely reasonable of me in the first
place to exploit this quirk of the Linux PCI bus - that the probe
function is called even if a device is disabled in the device tree.
I would understand if I was forced to rethink that.

> I think we could contemplate implementing 6fffbc7ae137 in a different
> way.  Checking DT status at driver probe-time would probably work for
> Loongson, but wouldn't quite solve the NXP problem because the driver
> wouldn't be able to claim Function 0 even temporarily.

Not sure what you mean by "checking DT status at driver probe-time".
Does enetc_pf_probe() -> of_device_is_available() qualify? You probably
mean earlier than that.

My problem is that I don't really understand what was the functional
need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
status") in the first place, considering that any device driver can
already fail to probe based on the same condition at its own will.

> Is DT the only way to learn the NXP SERDES configuration?  I think it
> would be much better if there were a way to programmatically learn it,
> because then you wouldn't have to worry about syncing the DT with the
> platform configuration, and it would decouple this from the Loongson
> situation.

Syncing the DT with the platform configuration will always be necessary,
because for networking we will also need extra information which is
completely non-discoverable, like a phy-handle or such, and that depends
on the wiring and static pinmuxing of the SoC. So it is practically
reasonable to expect that what is usable has status = "okay", and what
isn't has status = "disabled". Not to mention, there are already device
trees in circulation which are written that way, and those need to
continue to work.

> (If there were a way to actually discover the Loongson situation
> instead of relying on DT, e.g., by keying off a Device ID or
> something, that would be much better, too.  I assume we explored that,
> but I don't remember the details.)
> 
> Bjorn

What is it that's special about the Loongson situation?
Bjorn Helgaas June 1, 2023, 5:51 p.m. UTC | #12
On Thu, Jun 01, 2023 at 07:33:35PM +0300, Vladimir Oltean wrote:
> On Thu, Jun 01, 2023 at 10:44:45AM -0500, Bjorn Helgaas wrote:
> > To make sure I understand you, I think you're saying that if Function
> > 0 has DT status "disabled", 6fffbc7ae137 ("PCI: Honor firmware's
> > device disabled status") breaks things because we don't enumerate
> > Function 0 and the driver can't temporarily claim it to zero out its
> > piece of the shared memory.
> > 
> > With just 6fffbc7ae137, we don't enumerate Function 0, which means we
> > don't see that it's a multi-function device, so we don't enumerate
> > Functions 1, 2, etc, either.
> > 
> > With both 6fffbc7ae137 and your current patch, we would enumerate
> > Functions 1, 2, etc, but we still skip Function 0, so its piece of the
> > shared memory still doesn't get zeroed.
> 
> I'm saying that as long as commit 6fffbc7ae137 ("PCI: Honor firmware's
> device disabled status") exists in the form where the pci_driver :: probe()
> is completely skipped for disabled functions, the NXP ENETC PCIe device
> has a problem no matter what the function number is.

Yep.

> That problem is:
> the device drivers of all PCIe functions need to clear some memory
> before they ultimately fail to probe (as they should), because of some
> hardware design oversight. That is no longer possible if the driver has
> no hook to execute code for those devices that are disabled.

Yep.  If there's no pci_dev, there's no nice way to do anything to the
device.

> On top of that, function 0 having status = "disabled" is extra
> problematic, because the PCI core will now just assume that functions 1 .. N
> don't exist at all, which is simply false, because the usefulness of
> ENETC port 0 (PCIe function 0) from a networking perspective is
> independent from the usefulness of ENETC port 1 (PCIe function 1), ENETC
> port 2 etc.

Yes.

> > > The ENETC is not a hot-pluggable PCIe device. It uses Enhanced Allocation
> > > to essentially describe on-chip memory spaces, which are always present.
> > > So presumably, a different system-level solution to initialize those
> > > shared memories (U-Boot?) may be chosen, if implementing this workaround
> > > in Linux puts too much pressure on the PCIe core and the way in which it
> > > does things. Initially I didn't want to do this in prior boot stages
> > > because we only enable the RCEC in Linux, nothing is broken other than
> > > the spurious AER messages, and, you know.. the kernel may still run
> > > indefinitely on top of bootloaders which don't have the workaround applied.
> > > So working around it in Linux avoids one dependency.
> > 
> > If I understand correctly, something (bootloader or Linux) needs to do
> > something to Function 0 (e.g., clear memory).
> 
> To more than just function 0 (also 1, 2 and 6).

Yes.

> There are 2 confounding
> problems, the latter being something that was exposed by your question:
> what will happen that's bad with the current mainline code structure,
> *notwithstanding* the fact that function 0 may have status = "disabled"
> (which currently will skip enumeration for the rest of the functions
> which don't have status = "disabled").
> 
> > Doing it in Linux would minimize dependences on the bootloader, so
> > that seems desirable to me. That means Linux needs to enumerate
> > Function 0 so it is visible to a driver or possibly a quirk.
> 
> Uhm... no, that wouldn't be enough. Only a straight revert would satisfy
> the workaround that we currently have for NXP ENETC in Linux.

I guess you mean a revert of 6fffbc7ae137?  This whole conversation is
about whether we can rework 6fffbc7ae137 to work both for Loongson and
for you, so nothing is decided yet.

The point is, I assume you agree that it's preferable if we don't have
to depend on a bootloader to clear the memory.

> Also, I'm not sure if it was completely reasonable of me in the first
> place to exploit this quirk of the Linux PCI bus - that the probe
> function is called even if a device is disabled in the device tree.
> I would understand if I was forced to rethink that.

After 6fffbc7ae137, the probe function is not called if the device is
disabled in DT because there's no pci_dev for it at all.

> > I think we could contemplate implementing 6fffbc7ae137 in a different
> > way.  Checking DT status at driver probe-time would probably work for
> > Loongson, but wouldn't quite solve the NXP problem because the driver
> > wouldn't be able to claim Function 0 even temporarily.
> 
> Not sure what you mean by "checking DT status at driver probe-time".
> Does enetc_pf_probe() -> of_device_is_available() qualify? You probably
> mean earlier than that.

I was thinking about something in pci_device_probe(), e.g., by
extending pci_device_can_probe().  But again, we're just exploring the
solution space; I'm not saying this is the best or only path.

> My problem is that I don't really understand what was the functional
> need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
> status") in the first place, considering that any device driver can
> already fail to probe based on the same condition at its own will.

In general, PCI drivers shouldn't rely on DT.  If the bus driver (PCI
in this case) calls a driver's probe function, the driver can assume
the device exists.  But enetc is not a general-purpose driver, and if
DT is the only way to discover this property, I guess you're stuck
doing that.

> > Is DT the only way to learn the NXP SERDES configuration?  I think it
> > would be much better if there were a way to programmatically learn it,
> > because then you wouldn't have to worry about syncing the DT with the
> > platform configuration, and it would decouple this from the Loongson
> > situation.
> 
> Syncing the DT with the platform configuration will always be necessary,
> because for networking we will also need extra information which is
> completely non-discoverable, like a phy-handle or such, and that depends
> on the wiring and static pinmuxing of the SoC. So it is practically
> reasonable to expect that what is usable has status = "okay", and what
> isn't has status = "disabled". Not to mention, there are already device
> trees in circulation which are written that way, and those need to
> continue to work.

Just because we need DT for non-discoverable info A doesn't mean we
should depend on it for B if B *is* discoverable.

This question of disabling a device via DT but still needing to do
things to the device is ... kind of a sticky wicket.

Maybe this should be a different DT property (not "status").  Then PCI
enumeration could work normally and 6fffbc7ae137 wouldn't be in the
way.

> > (If there were a way to actually discover the Loongson situation
> > instead of relying on DT, e.g., by keying off a Device ID or
> > something, that would be much better, too.  I assume we explored that,
> > but I don't remember the details.)
> 
> What is it that's special about the Loongson situation?
Vladimir Oltean June 1, 2023, 10:15 p.m. UTC | #13
On Thu, Jun 01, 2023 at 12:51:39PM -0500, Bjorn Helgaas wrote:
> > > Doing it in Linux would minimize dependences on the bootloader, so
> > > that seems desirable to me. That means Linux needs to enumerate
> > > Function 0 so it is visible to a driver or possibly a quirk.
> > 
> > Uhm... no, that wouldn't be enough. Only a straight revert would satisfy
> > the workaround that we currently have for NXP ENETC in Linux.
> 
> I guess you mean a revert of 6fffbc7ae137?

Yes.

> This whole conversation is about whether we can rework 6fffbc7ae137 to
> work both for Loongson and for you, so nothing is decided yet.

After reading
https://lore.kernel.org/linux-pci/20221117020935.32086-1-liupeibao@loongson.cn/
and
https://lore.kernel.org/linux-pci/20221103090040.836-1-liupeibao@loongson.cn/
and seeing the GMAC OF node at arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi,
I believe that a solution that would work for both Loongson and NXP would be to:

- patch loongson_dwmac_probe() to check for of_device_is_available()
- revert commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
  status")

I'm not sure what else of what was concretely proposed would work.
Anything else is just wishful thinking that the PCI core can start
enforcing a central policy, after letting device drivers get to choose
how (and whether) to treat the "status" OF property for years on end.

As an added benefit, the disabled GMAC would become visible in lspci for
the Loongson SoC.

> The point is, I assume you agree that it's preferable if we don't have
> to depend on a bootloader to clear the memory.

I am confused by the message you are transmitting here.

With my user hat on, yes, maintaining the effect of commit 3222b5b613db
from Linux is preferable.

Although Rob will probably not be happy about the way in which that will
be achieved. And you haven't proposed ways in which that would remain
possible, short of a revert of commit 6fffbc7ae137.

> After 6fffbc7ae137, the probe function is not called if the device is
> disabled in DT because there's no pci_dev for it at all.

Correct, but commit 3222b5b613db pre-dates it by 2 years, and thus, it
is broken by Rob's change.

> > My problem is that I don't really understand what was the functional
> > need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
> > status") in the first place, considering that any device driver can
> > already fail to probe based on the same condition at its own will.
> 
> In general, PCI drivers shouldn't rely on DT.  If the bus driver (PCI
> in this case) calls a driver's probe function, the driver can assume
> the device exists.

Well, the device exists...

> But enetc is not a general-purpose driver, and if DT is the only way
> to discover this property, I guess you're stuck doing that.

So what Loongson tried to do - break enumeration of the on-chip GMAC
PCIe device at the level of the PCIe controller, if the GMAC's pinmuxing
doesn't make it available for networking - is encouraged?

Do you consider that their patch would have been better in the original
form, if instead of the "skip-scan" property, they would have built some
smarts into drivers/pci/controller/pci-loongson.c which would intentionally
break config space access to gmac@3,0, without requiring OF to specify this?

Are you saying that this "present but unusable due to pinmuxing" is an
incorrect use of status = "disabled"? What would it constitute correct
use of, then?

The analogous situation for ENETC would be to patch the "pci-host-ecam-generic"
driver to read the SERDES and pinmuxing configuration of the SoC, and to
mask/unmask the config access to function 0 based on that. I mean - I could...
but is it really a good idea? The principle of separation of concerns
tells me no. The fact that the pinmuxing of the device makes it unavailable
pertains to the IP-specific logic, it doesn't change whether it's enumerable
or accessible on its bus.

> > > Is DT the only way to learn the NXP SERDES configuration?  I think it
> > > would be much better if there were a way to programmatically learn it,
> > > because then you wouldn't have to worry about syncing the DT with the
> > > platform configuration, and it would decouple this from the Loongson
> > > situation.
> > 
> > Syncing the DT with the platform configuration will always be necessary,
> > because for networking we will also need extra information which is
> > completely non-discoverable, like a phy-handle or such, and that depends
> > on the wiring and static pinmuxing of the SoC. So it is practically
> > reasonable to expect that what is usable has status = "okay", and what
> > isn't has status = "disabled". Not to mention, there are already device
> > trees in circulation which are written that way, and those need to
> > continue to work.
> 
> Just because we need DT for non-discoverable info A doesn't mean we
> should depend on it for B if B *is* discoverable.

But the argument was: we already have device trees with a certain
convention, and that is to expect having status = "disabled" for
unusable ports. I don't believe that changing that is realistically in
scope for fixing this. And if we have device trees with status =
"disabled" in circulation which we (I) don't want to break, then we're
back to square 1 regarding the probing of disabled devices.

> This question of disabling a device via DT but still needing to do
> things to the device is ... kind of a sticky wicket.

It boils down to whether accessing a disabled device is permitted or
not. I opened the devicetree specification and it didn't say anything
conclusive. Though it's certainly above my pay grade to say anything
with certainty in this area. Apart from "okay" and "disabled", "status"
takes other documented values too, like "reserved", "fail" and
"fail-sss". Linux treats everything that's not "okay" the same.
Krzysztof Kozlowski came with the suggestion for Loongson to replace
"skip-scan" with "status", during the review of their v1 patch.

In any case, that question will only recur one level lower - in U-Boot,
where we make an effort to keep device trees in sync in Linux. Why would
U-Boot need to do things to a disabled device? :)

> Maybe this should be a different DT property (not "status").  Then PCI
> enumeration could work normally and 6fffbc7ae137 wouldn't be in the
> way.

I'm not quite sure where you're going with this. More concretely?
Huacai Chen June 2, 2023, 4:06 a.m. UTC | #14
+Cc Jianmin Lv

Hi, Jianmin,

You are the most familiar person in this field, could you please give some suggestions? Thank you.

Huacai


> -----原始邮件-----
> 发件人: "Vladimir Oltean" <vladimir.oltean@nxp.com>
> 发送时间:2023-06-02 06:15:32 (星期五)
> 收件人: "Bjorn Helgaas" <helgaas@kernel.org>
> 抄送: linux-pci@vger.kernel.org, netdev@vger.kernel.org, "Bjorn Helgaas" <bhelgaas@google.com>, "Rob Herring" <robh@kernel.org>, "Claudiu Manoil" <claudiu.manoil@nxp.com>, "Michael Walle" <michael@walle.cc>, linux-kernel@vger.kernel.org, "Liu Peibao" <liupeibao@loongson.cn>, "Binbin Zhou" <zhoubinbin@loongson.cn>, "Huacai Chen" <chenhuacai@loongson.cn>
> 主题: Re: [PATCH pci] PCI: don't skip probing entire device if first fn OF node has status = "disabled"
> 
> On Thu, Jun 01, 2023 at 12:51:39PM -0500, Bjorn Helgaas wrote:
> > > > Doing it in Linux would minimize dependences on the bootloader, so
> > > > that seems desirable to me. That means Linux needs to enumerate
> > > > Function 0 so it is visible to a driver or possibly a quirk.
> > > 
> > > Uhm... no, that wouldn't be enough. Only a straight revert would satisfy
> > > the workaround that we currently have for NXP ENETC in Linux.
> > 
> > I guess you mean a revert of 6fffbc7ae137?
> 
> Yes.
> 
> > This whole conversation is about whether we can rework 6fffbc7ae137 to
> > work both for Loongson and for you, so nothing is decided yet.
> 
> After reading
> https://lore.kernel.org/linux-pci/20221117020935.32086-1-liupeibao@loongson.cn/
> and
> https://lore.kernel.org/linux-pci/20221103090040.836-1-liupeibao@loongson.cn/
> and seeing the GMAC OF node at arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi,
> I believe that a solution that would work for both Loongson and NXP would be to:
> 
> - patch loongson_dwmac_probe() to check for of_device_is_available()
> - revert commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
>   status")
> 
> I'm not sure what else of what was concretely proposed would work.
> Anything else is just wishful thinking that the PCI core can start
> enforcing a central policy, after letting device drivers get to choose
> how (and whether) to treat the "status" OF property for years on end.
> 
> As an added benefit, the disabled GMAC would become visible in lspci for
> the Loongson SoC.
> 
> > The point is, I assume you agree that it's preferable if we don't have
> > to depend on a bootloader to clear the memory.
> 
> I am confused by the message you are transmitting here.
> 
> With my user hat on, yes, maintaining the effect of commit 3222b5b613db
> from Linux is preferable.
> 
> Although Rob will probably not be happy about the way in which that will
> be achieved. And you haven't proposed ways in which that would remain
> possible, short of a revert of commit 6fffbc7ae137.
> 
> > After 6fffbc7ae137, the probe function is not called if the device is
> > disabled in DT because there's no pci_dev for it at all.
> 
> Correct, but commit 3222b5b613db pre-dates it by 2 years, and thus, it
> is broken by Rob's change.
> 
> > > My problem is that I don't really understand what was the functional
> > > need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
> > > status") in the first place, considering that any device driver can
> > > already fail to probe based on the same condition at its own will.
> > 
> > In general, PCI drivers shouldn't rely on DT.  If the bus driver (PCI
> > in this case) calls a driver's probe function, the driver can assume
> > the device exists.
> 
> Well, the device exists...
> 
> > But enetc is not a general-purpose driver, and if DT is the only way
> > to discover this property, I guess you're stuck doing that.
> 
> So what Loongson tried to do - break enumeration of the on-chip GMAC
> PCIe device at the level of the PCIe controller, if the GMAC's pinmuxing
> doesn't make it available for networking - is encouraged?
> 
> Do you consider that their patch would have been better in the original
> form, if instead of the "skip-scan" property, they would have built some
> smarts into drivers/pci/controller/pci-loongson.c which would intentionally
> break config space access to gmac@3,0, without requiring OF to specify this?
> 
> Are you saying that this "present but unusable due to pinmuxing" is an
> incorrect use of status = "disabled"? What would it constitute correct
> use of, then?
> 
> The analogous situation for ENETC would be to patch the "pci-host-ecam-generic"
> driver to read the SERDES and pinmuxing configuration of the SoC, and to
> mask/unmask the config access to function 0 based on that. I mean - I could...
> but is it really a good idea? The principle of separation of concerns
> tells me no. The fact that the pinmuxing of the device makes it unavailable
> pertains to the IP-specific logic, it doesn't change whether it's enumerable
> or accessible on its bus.
> 
> > > > Is DT the only way to learn the NXP SERDES configuration?  I think it
> > > > would be much better if there were a way to programmatically learn it,
> > > > because then you wouldn't have to worry about syncing the DT with the
> > > > platform configuration, and it would decouple this from the Loongson
> > > > situation.
> > > 
> > > Syncing the DT with the platform configuration will always be necessary,
> > > because for networking we will also need extra information which is
> > > completely non-discoverable, like a phy-handle or such, and that depends
> > > on the wiring and static pinmuxing of the SoC. So it is practically
> > > reasonable to expect that what is usable has status = "okay", and what
> > > isn't has status = "disabled". Not to mention, there are already device
> > > trees in circulation which are written that way, and those need to
> > > continue to work.
> > 
> > Just because we need DT for non-discoverable info A doesn't mean we
> > should depend on it for B if B *is* discoverable.
> 
> But the argument was: we already have device trees with a certain
> convention, and that is to expect having status = "disabled" for
> unusable ports. I don't believe that changing that is realistically in
> scope for fixing this. And if we have device trees with status =
> "disabled" in circulation which we (I) don't want to break, then we're
> back to square 1 regarding the probing of disabled devices.
> 
> > This question of disabling a device via DT but still needing to do
> > things to the device is ... kind of a sticky wicket.
> 
> It boils down to whether accessing a disabled device is permitted or
> not. I opened the devicetree specification and it didn't say anything
> conclusive. Though it's certainly above my pay grade to say anything
> with certainty in this area. Apart from "okay" and "disabled", "status"
> takes other documented values too, like "reserved", "fail" and
> "fail-sss". Linux treats everything that's not "okay" the same.
> Krzysztof Kozlowski came with the suggestion for Loongson to replace
> "skip-scan" with "status", during the review of their v1 patch.
> 
> In any case, that question will only recur one level lower - in U-Boot,
> where we make an effort to keep device trees in sync in Linux. Why would
> U-Boot need to do things to a disabled device? :)
> 
> > Maybe this should be a different DT property (not "status").  Then PCI
> > enumeration could work normally and 6fffbc7ae137 wouldn't be in the
> > way.
> 
> I'm not quite sure where you're going with this. More concretely?


本邮件及其附件含有龙芯中科的商业秘密信息,仅限于发送给上面地址中列出的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制或散发)本邮件及其附件中的信息。如果您错收本邮件,请您立即电话或邮件通知发件人并删除本邮件。 
This email and its attachments contain confidential information from Loongson Technology , which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this email in error, please notify the sender by phone or email immediately and delete it.
Liu Peibao June 2, 2023, 7:21 a.m. UTC | #15
Hi all,

It seems that modification for current PCI enumeration framework is
needed to solve the problem. If the effect of this modification is not
easy to evaluate, for the requirement of Loongson, it should be OK that
do the things in Loongson PCI controller driver like discussed
before[1].

Br,
Peibao

[1] https://lore.kernel.org/all/20221114074346.23008-1-liupeibao@loongson.cn/

On 6/2/23 6:15 AM, Vladimir Oltean wrote:
> On Thu, Jun 01, 2023 at 12:51:39PM -0500, Bjorn Helgaas wrote:
>>>> Doing it in Linux would minimize dependences on the bootloader, so
>>>> that seems desirable to me. That means Linux needs to enumerate
>>>> Function 0 so it is visible to a driver or possibly a quirk.
>>>
>>> Uhm... no, that wouldn't be enough. Only a straight revert would satisfy
>>> the workaround that we currently have for NXP ENETC in Linux.
>>
>> I guess you mean a revert of 6fffbc7ae137?
> 
> Yes.
> 
>> This whole conversation is about whether we can rework 6fffbc7ae137 to
>> work both for Loongson and for you, so nothing is decided yet.
> 
> After reading
> https://lore.kernel.org/linux-pci/20221117020935.32086-1-liupeibao@loongson.cn/
> and
> https://lore.kernel.org/linux-pci/20221103090040.836-1-liupeibao@loongson.cn/
> and seeing the GMAC OF node at arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi,
> I believe that a solution that would work for both Loongson and NXP would be to:
> 
> - patch loongson_dwmac_probe() to check for of_device_is_available()
> - revert commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
>   status")
> 
> I'm not sure what else of what was concretely proposed would work.
> Anything else is just wishful thinking that the PCI core can start
> enforcing a central policy, after letting device drivers get to choose
> how (and whether) to treat the "status" OF property for years on end.
> 
> As an added benefit, the disabled GMAC would become visible in lspci for
> the Loongson SoC.
> 
>> The point is, I assume you agree that it's preferable if we don't have
>> to depend on a bootloader to clear the memory.
> 
> I am confused by the message you are transmitting here.
> 
> With my user hat on, yes, maintaining the effect of commit 3222b5b613db
> from Linux is preferable.
> 
> Although Rob will probably not be happy about the way in which that will
> be achieved. And you haven't proposed ways in which that would remain
> possible, short of a revert of commit 6fffbc7ae137.
> 
>> After 6fffbc7ae137, the probe function is not called if the device is
>> disabled in DT because there's no pci_dev for it at all.
> 
> Correct, but commit 3222b5b613db pre-dates it by 2 years, and thus, it
> is broken by Rob's change.
> 
>>> My problem is that I don't really understand what was the functional
>>> need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
>>> status") in the first place, considering that any device driver can
>>> already fail to probe based on the same condition at its own will.
>>
>> In general, PCI drivers shouldn't rely on DT.  If the bus driver (PCI
>> in this case) calls a driver's probe function, the driver can assume
>> the device exists.
> 
> Well, the device exists...
> 
>> But enetc is not a general-purpose driver, and if DT is the only way
>> to discover this property, I guess you're stuck doing that.
> 
> So what Loongson tried to do - break enumeration of the on-chip GMAC
> PCIe device at the level of the PCIe controller, if the GMAC's pinmuxing
> doesn't make it available for networking - is encouraged?
> 
> Do you consider that their patch would have been better in the original
> form, if instead of the "skip-scan" property, they would have built some
> smarts into drivers/pci/controller/pci-loongson.c which would intentionally
> break config space access to gmac@3,0, without requiring OF to specify this?
> 
> Are you saying that this "present but unusable due to pinmuxing" is an
> incorrect use of status = "disabled"? What would it constitute correct
> use of, then?
> 
> The analogous situation for ENETC would be to patch the "pci-host-ecam-generic"
> driver to read the SERDES and pinmuxing configuration of the SoC, and to
> mask/unmask the config access to function 0 based on that. I mean - I could...
> but is it really a good idea? The principle of separation of concerns
> tells me no. The fact that the pinmuxing of the device makes it unavailable
> pertains to the IP-specific logic, it doesn't change whether it's enumerable
> or accessible on its bus.
> 
>>>> Is DT the only way to learn the NXP SERDES configuration?  I think it
>>>> would be much better if there were a way to programmatically learn it,
>>>> because then you wouldn't have to worry about syncing the DT with the
>>>> platform configuration, and it would decouple this from the Loongson
>>>> situation.
>>>
>>> Syncing the DT with the platform configuration will always be necessary,
>>> because for networking we will also need extra information which is
>>> completely non-discoverable, like a phy-handle or such, and that depends
>>> on the wiring and static pinmuxing of the SoC. So it is practically
>>> reasonable to expect that what is usable has status = "okay", and what
>>> isn't has status = "disabled". Not to mention, there are already device
>>> trees in circulation which are written that way, and those need to
>>> continue to work.
>>
>> Just because we need DT for non-discoverable info A doesn't mean we
>> should depend on it for B if B *is* discoverable.
> 
> But the argument was: we already have device trees with a certain
> convention, and that is to expect having status = "disabled" for
> unusable ports. I don't believe that changing that is realistically in
> scope for fixing this. And if we have device trees with status =
> "disabled" in circulation which we (I) don't want to break, then we're
> back to square 1 regarding the probing of disabled devices.
> 
>> This question of disabling a device via DT but still needing to do
>> things to the device is ... kind of a sticky wicket.
> 
> It boils down to whether accessing a disabled device is permitted or
> not. I opened the devicetree specification and it didn't say anything
> conclusive. Though it's certainly above my pay grade to say anything
> with certainty in this area. Apart from "okay" and "disabled", "status"
> takes other documented values too, like "reserved", "fail" and
> "fail-sss". Linux treats everything that's not "okay" the same.
> Krzysztof Kozlowski came with the suggestion for Loongson to replace
> "skip-scan" with "status", during the review of their v1 patch.
> 
> In any case, that question will only recur one level lower - in U-Boot,
> where we make an effort to keep device trees in sync in Linux. Why would
> U-Boot need to do things to a disabled device? :)
> 
>> Maybe this should be a different DT property (not "status").  Then PCI
>> enumeration could work normally and 6fffbc7ae137 wouldn't be in the
>> way.
> 
> I'm not quite sure where you're going with this. More concretely?
>
吕建民 June 2, 2023, 7:36 a.m. UTC | #16
On 2023/6/2 下午3:21, Liu Peibao wrote:
> Hi all,
> 
> It seems that modification for current PCI enumeration framework is
> needed to solve the problem. If the effect of this modification is not
> easy to evaluate, for the requirement of Loongson, it should be OK that
> do the things in Loongson PCI controller driver like discussed
> before[1].
> 
> Br,
> Peibao
> 
> [1] https://lore.kernel.org/all/20221114074346.23008-1-liupeibao@loongson.cn/
> 

Agree. For current pci core code, all functions of the device will be 
skipped if function 0 is not found, even without the patch 6fffbc7ae137 
(e.g. the func 0 is disabled in bios by setting pci header to 
0xffffffff). So it seems that there are two ways for the issue:

1. Adjust the pci scan core code to allow separate function to be 
enumerated, which will affect widely the pci core code.
2. Only Adjust loongson pci controller driver as Peibao said, and any 
function of the device should use platform device in DT if function 0 is 
disabled, which is acceptable for loongson.

Thanks,
Jianmin

> On 6/2/23 6:15 AM, Vladimir Oltean wrote:
>> On Thu, Jun 01, 2023 at 12:51:39PM -0500, Bjorn Helgaas wrote:
>>>>> Doing it in Linux would minimize dependences on the bootloader, so
>>>>> that seems desirable to me. That means Linux needs to enumerate
>>>>> Function 0 so it is visible to a driver or possibly a quirk.
>>>>
>>>> Uhm... no, that wouldn't be enough. Only a straight revert would satisfy
>>>> the workaround that we currently have for NXP ENETC in Linux.
>>>
>>> I guess you mean a revert of 6fffbc7ae137?
>>
>> Yes.
>>
>>> This whole conversation is about whether we can rework 6fffbc7ae137 to
>>> work both for Loongson and for you, so nothing is decided yet.
>>
>> After reading
>> https://lore.kernel.org/linux-pci/20221117020935.32086-1-liupeibao@loongson.cn/
>> and
>> https://lore.kernel.org/linux-pci/20221103090040.836-1-liupeibao@loongson.cn/
>> and seeing the GMAC OF node at arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi,
>> I believe that a solution that would work for both Loongson and NXP would be to:
>>
>> - patch loongson_dwmac_probe() to check for of_device_is_available()
>> - revert commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
>>    status")
>>
>> I'm not sure what else of what was concretely proposed would work.
>> Anything else is just wishful thinking that the PCI core can start
>> enforcing a central policy, after letting device drivers get to choose
>> how (and whether) to treat the "status" OF property for years on end.
>>
>> As an added benefit, the disabled GMAC would become visible in lspci for
>> the Loongson SoC.
>>
>>> The point is, I assume you agree that it's preferable if we don't have
>>> to depend on a bootloader to clear the memory.
>>
>> I am confused by the message you are transmitting here.
>>
>> With my user hat on, yes, maintaining the effect of commit 3222b5b613db
>> from Linux is preferable.
>>
>> Although Rob will probably not be happy about the way in which that will
>> be achieved. And you haven't proposed ways in which that would remain
>> possible, short of a revert of commit 6fffbc7ae137.
>>
>>> After 6fffbc7ae137, the probe function is not called if the device is
>>> disabled in DT because there's no pci_dev for it at all.
>>
>> Correct, but commit 3222b5b613db pre-dates it by 2 years, and thus, it
>> is broken by Rob's change.
>>
>>>> My problem is that I don't really understand what was the functional
>>>> need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
>>>> status") in the first place, considering that any device driver can
>>>> already fail to probe based on the same condition at its own will.
>>>
>>> In general, PCI drivers shouldn't rely on DT.  If the bus driver (PCI
>>> in this case) calls a driver's probe function, the driver can assume
>>> the device exists.
>>
>> Well, the device exists...
>>
>>> But enetc is not a general-purpose driver, and if DT is the only way
>>> to discover this property, I guess you're stuck doing that.
>>
>> So what Loongson tried to do - break enumeration of the on-chip GMAC
>> PCIe device at the level of the PCIe controller, if the GMAC's pinmuxing
>> doesn't make it available for networking - is encouraged?
>>
>> Do you consider that their patch would have been better in the original
>> form, if instead of the "skip-scan" property, they would have built some
>> smarts into drivers/pci/controller/pci-loongson.c which would intentionally
>> break config space access to gmac@3,0, without requiring OF to specify this?
>>
>> Are you saying that this "present but unusable due to pinmuxing" is an
>> incorrect use of status = "disabled"? What would it constitute correct
>> use of, then?
>>
>> The analogous situation for ENETC would be to patch the "pci-host-ecam-generic"
>> driver to read the SERDES and pinmuxing configuration of the SoC, and to
>> mask/unmask the config access to function 0 based on that. I mean - I could...
>> but is it really a good idea? The principle of separation of concerns
>> tells me no. The fact that the pinmuxing of the device makes it unavailable
>> pertains to the IP-specific logic, it doesn't change whether it's enumerable
>> or accessible on its bus.
>>
>>>>> Is DT the only way to learn the NXP SERDES configuration?  I think it
>>>>> would be much better if there were a way to programmatically learn it,
>>>>> because then you wouldn't have to worry about syncing the DT with the
>>>>> platform configuration, and it would decouple this from the Loongson
>>>>> situation.
>>>>
>>>> Syncing the DT with the platform configuration will always be necessary,
>>>> because for networking we will also need extra information which is
>>>> completely non-discoverable, like a phy-handle or such, and that depends
>>>> on the wiring and static pinmuxing of the SoC. So it is practically
>>>> reasonable to expect that what is usable has status = "okay", and what
>>>> isn't has status = "disabled". Not to mention, there are already device
>>>> trees in circulation which are written that way, and those need to
>>>> continue to work.
>>>
>>> Just because we need DT for non-discoverable info A doesn't mean we
>>> should depend on it for B if B *is* discoverable.
>>
>> But the argument was: we already have device trees with a certain
>> convention, and that is to expect having status = "disabled" for
>> unusable ports. I don't believe that changing that is realistically in
>> scope for fixing this. And if we have device trees with status =
>> "disabled" in circulation which we (I) don't want to break, then we're
>> back to square 1 regarding the probing of disabled devices.
>>
>>> This question of disabling a device via DT but still needing to do
>>> things to the device is ... kind of a sticky wicket.
>>
>> It boils down to whether accessing a disabled device is permitted or
>> not. I opened the devicetree specification and it didn't say anything
>> conclusive. Though it's certainly above my pay grade to say anything
>> with certainty in this area. Apart from "okay" and "disabled", "status"
>> takes other documented values too, like "reserved", "fail" and
>> "fail-sss". Linux treats everything that's not "okay" the same.
>> Krzysztof Kozlowski came with the suggestion for Loongson to replace
>> "skip-scan" with "status", during the review of their v1 patch.
>>
>> In any case, that question will only recur one level lower - in U-Boot,
>> where we make an effort to keep device trees in sync in Linux. Why would
>> U-Boot need to do things to a disabled device? :)
>>
>>> Maybe this should be a different DT property (not "status").  Then PCI
>>> enumeration could work normally and 6fffbc7ae137 wouldn't be in the
>>> way.
>>
>> I'm not quite sure where you're going with this. More concretely?
>>
Vladimir Oltean June 2, 2023, 10:16 a.m. UTC | #17
Hi Jianmin,

On Fri, Jun 02, 2023 at 03:36:18PM +0800, Jianmin Lv wrote:
> On 2023/6/2 下午3:21, Liu Peibao wrote:
> > Hi all,
> > 
> > It seems that modification for current PCI enumeration framework is
> > needed to solve the problem. If the effect of this modification is not
> > easy to evaluate, for the requirement of Loongson, it should be OK that
> > do the things in Loongson PCI controller driver like discussed
> > before[1].
> > 
> > Br,
> > Peibao
> > 
> > [1] https://lore.kernel.org/all/20221114074346.23008-1-liupeibao@loongson.cn/
> > 
> 
> Agree. For current pci core code, all functions of the device will be
> skipped if function 0 is not found, even without the patch 6fffbc7ae137
> (e.g. the func 0 is disabled in bios by setting pci header to 0xffffffff).
> So it seems that there are two ways for the issue:
> 
> 1. Adjust the pci scan core code to allow separate function to be
> enumerated, which will affect widely the pci core code.
> 2. Only Adjust loongson pci controller driver as Peibao said, and any
> function of the device should use platform device in DT if function 0 is
> disabled, which is acceptable for loongson.
> 
> Thanks,
> Jianmin

How about 3. handle of_device_is_available() in the probe function of
the "loongson, pci-gmac" driver? Would that not work?
吕建民 June 3, 2023, 2:35 a.m. UTC | #18
On 2023/6/2 下午6:16, Vladimir Oltean wrote:
> Hi Jianmin,
> 
> On Fri, Jun 02, 2023 at 03:36:18PM +0800, Jianmin Lv wrote:
>> On 2023/6/2 下午3:21, Liu Peibao wrote:
>>> Hi all,
>>>
>>> It seems that modification for current PCI enumeration framework is
>>> needed to solve the problem. If the effect of this modification is not
>>> easy to evaluate, for the requirement of Loongson, it should be OK that
>>> do the things in Loongson PCI controller driver like discussed
>>> before[1].
>>>
>>> Br,
>>> Peibao
>>>
>>> [1] https://lore.kernel.org/all/20221114074346.23008-1-liupeibao@loongson.cn/
>>>
>>
>> Agree. For current pci core code, all functions of the device will be
>> skipped if function 0 is not found, even without the patch 6fffbc7ae137
>> (e.g. the func 0 is disabled in bios by setting pci header to 0xffffffff).
>> So it seems that there are two ways for the issue:
>>
>> 1. Adjust the pci scan core code to allow separate function to be
>> enumerated, which will affect widely the pci core code.
>> 2. Only Adjust loongson pci controller driver as Peibao said, and any
>> function of the device should use platform device in DT if function 0 is
>> disabled, which is acceptable for loongson.
>>
>> Thanks,
>> Jianmin
> 
> How about 3. handle of_device_is_available() in the probe function of
> the "loongson, pci-gmac" driver? Would that not work?
> 
This way does work only for the specified device. There are other 
devices, such as HDA, I2S, etc, which have shared pins. Then we have to 
add of_device_is_available() checking to those drivers one by one. And 
we are not sure if there are other devices in new generation chips in 
future. So I'm afraid that the way you mentioned is not suitable for us.

Thanks,
Jianmin
Vladimir Oltean June 4, 2023, 8:55 a.m. UTC | #19
On Sat, Jun 03, 2023 at 10:35:50AM +0800, Jianmin Lv wrote:
> > How about 3. handle of_device_is_available() in the probe function of
> > the "loongson, pci-gmac" driver? Would that not work?
> > 
> This way does work only for the specified device. There are other devices,
> such as HDA, I2S, etc, which have shared pins. Then we have to add
> of_device_is_available() checking to those drivers one by one. And we are
> not sure if there are other devices in new generation chips in future. So
> I'm afraid that the way you mentioned is not suitable for us.

Got it, so you have more on-chip PCIe devices than the ones listed in
loongson64-2k1000.dtsi, and you don't want to describe them in the
device tree just to put status = "disabled" for those devices/functions
that you don't want Linux to use - although you could, and it wouldn't
be that hard or have unintended side effects.

Though you need to admit, in case you had an on-chip multi-function PCIe
device like the NXP ENETC, and you wanted Linux to not use function 0,
the strategy you're suggesting here that is acceptable for Loongson
would not have worked.

I believe we need a bit of coordination from PCIe and device tree
maintainers, to suggest what would be the encouraged best practices and
ways to solve this regression for the ENETC.
吕建民 June 5, 2023, 12:59 a.m. UTC | #20
On 2023/6/4 下午4:55, Vladimir Oltean wrote:
> On Sat, Jun 03, 2023 at 10:35:50AM +0800, Jianmin Lv wrote:
>>> How about 3. handle of_device_is_available() in the probe function of
>>> the "loongson, pci-gmac" driver? Would that not work?
>>>
>> This way does work only for the specified device. There are other devices,
>> such as HDA, I2S, etc, which have shared pins. Then we have to add
>> of_device_is_available() checking to those drivers one by one. And we are
>> not sure if there are other devices in new generation chips in future. So
>> I'm afraid that the way you mentioned is not suitable for us.
> 
> Got it, so you have more on-chip PCIe devices than the ones listed in
> loongson64-2k1000.dtsi, and you don't want to describe them in the
> device tree just to put status = "disabled" for those devices/functions
> that you don't want Linux to use - although you could, and it wouldn't
> be that hard or have unintended side effects.
> 
> Though you need to admit, in case you had an on-chip multi-function PCIe
> device like the NXP ENETC, and you wanted Linux to not use function 0,
> the strategy you're suggesting here that is acceptable for Loongson
> would not have worked.
> 
> I believe we need a bit of coordination from PCIe and device tree
> maintainers, to suggest what would be the encouraged best practices and
> ways to solve this regression for the ENETC.
> 

For a multi-function device, if func 0 is not allowed to be scanned, as 
I said in way of 2, the other funcs of the device will be described as 
platform devices instead of pci and be not scanned either, which is 
acceptable for Loongson. The main goal by any way for us is to resolve 
the problem that shared pins can not be used simultaneously by devices 
sharing them. IMO, configure them in DT one by one may be reasonable, 
but adapting each driver will be bothered.

Any way, let's listen to opinions from Bjorn and Rob.
Vladimir Oltean June 5, 2023, 9:34 a.m. UTC | #21
On Mon, Jun 05, 2023 at 08:59:23AM +0800, Jianmin Lv wrote:
> For a multi-function device, if func 0 is not allowed to be scanned, as I
> said in way of 2, the other funcs of the device will be described as
> platform devices instead of pci and be not scanned either, which is
> acceptable for Loongson. The main goal by any way for us is to resolve the
> problem that shared pins can not be used simultaneously by devices sharing
> them. IMO, configure them in DT one by one may be reasonable, but adapting
> each driver will be bothered.

Could you give an example of PCIe functions being described as platform
devices, and how does that work for Loongson? Are you saying that there
will be 2 drivers for the same hardware, one pci_driver and one platform_driver?
In the case of the platform_driver, who will do the PCI-specific stuff
required by the IP, like function level reset and enabling the memory space?
吕建民 June 16, 2023, 2:12 a.m. UTC | #22
On 2023/6/5 下午5:34, Vladimir Oltean wrote:
> On Mon, Jun 05, 2023 at 08:59:23AM +0800, Jianmin Lv wrote:
>> For a multi-function device, if func 0 is not allowed to be scanned, as I
>> said in way of 2, the other funcs of the device will be described as
>> platform devices instead of pci and be not scanned either, which is
>> acceptable for Loongson. The main goal by any way for us is to resolve the
>> problem that shared pins can not be used simultaneously by devices sharing
>> them. IMO, configure them in DT one by one may be reasonable, but adapting
>> each driver will be bothered.
> 
> Could you give an example of PCIe functions being described as platform
> devices, and how does that work for Loongson? Are you saying that there
> will be 2 drivers for the same hardware, one pci_driver and one platform_driver?
> In the case of the platform_driver, who will do the PCI-specific stuff
> required by the IP, like function level reset and enabling the memory space?
> 

E.g. there are two functions , func0 is HDA controller and func1 is I2S 
controller and they have shared pins.
When HDA or I2S is used, both are disabled for PCI enumeration in BIOS 
(e.g. by filling PCI header with 0xffffffff), and mem space has been 
reserved from host bridge window for them in BIOS, of cause, reserved 
space will not be seen by kernel because it has been removed in host 
bridge mem range when passed to kernel in DT. Then the reserved mem base 
is passed into kernel by DT, CPU will use remapped address of the mem 
base, and these devices will not be enumerated in PCI bus. The way is 
only used for PCI devices (share common pins and exist on bus 0) 
integrated in Loongson CPU or chipset.
Rob Herring (Arm) June 16, 2023, 5:57 p.m. UTC | #23
On Sun, Jun 4, 2023 at 2:55 AM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>

Sorry, just now seeing this as I've been out the last month.

> On Sat, Jun 03, 2023 at 10:35:50AM +0800, Jianmin Lv wrote:
> > > How about 3. handle of_device_is_available() in the probe function of
> > > the "loongson, pci-gmac" driver? Would that not work?
> > >
> > This way does work only for the specified device. There are other devices,
> > such as HDA, I2S, etc, which have shared pins. Then we have to add
> > of_device_is_available() checking to those drivers one by one. And we are
> > not sure if there are other devices in new generation chips in future. So
> > I'm afraid that the way you mentioned is not suitable for us.

If we decided that disabled devices should probe, then that is exactly
what will have to be done. The restriction (of shared pins) is in the
devices and is potentially per device, so it makes more sense for the
device's drivers to handle than the host bridge IMO. (Assuming the
core doesn't handle a per device property.)


> Got it, so you have more on-chip PCIe devices than the ones listed in
> loongson64-2k1000.dtsi, and you don't want to describe them in the
> device tree just to put status = "disabled" for those devices/functions
> that you don't want Linux to use - although you could, and it wouldn't
> be that hard or have unintended side effects.
>
> Though you need to admit, in case you had an on-chip multi-function PCIe
> device like the NXP ENETC, and you wanted Linux to not use function 0,
> the strategy you're suggesting here that is acceptable for Loongson
> would not have worked.
>
> I believe we need a bit of coordination from PCIe and device tree
> maintainers, to suggest what would be the encouraged best practices and
> ways to solve this regression for the ENETC.

I think we need to define what behavior is correct for 'status =
"disabled"'. For almost everywhere in DT, it is equivalent to the
device is not present. A not present device doesn't probe. There are
unfortunately cases where status got ignored/forgotten and PCI was one
of those. PCI is a bit different since there are 2 sources of
information about a device being present. The intent with PCI is DT
overrides what's discovered. For example, 'vendor-id' overrides what's
read from the h/w.

I think we can fix making the status per function simply by making
'match_driver' be set based on the status. This would move the check
later to just before probing. That would not work for a case where
accessing the config registers is a problem. It doesn't sound like
that's a problem for Loongson based on the above response, but their
original solution did prevent that. This change would also mean the
PCI quirks would run. Perhaps the func0 memory clearing you need could
be run as a quirk instead?

Rob
Vladimir Oltean Aug. 3, 2023, 10:39 a.m. UTC | #24
Hi Rob,

On Fri, Jun 16, 2023 at 11:57:43AM -0600, Rob Herring wrote:
> On Sun, Jun 4, 2023 at 2:55 AM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> >
> 
> Sorry, just now seeing this as I've been out the last month.
> 
> > On Sat, Jun 03, 2023 at 10:35:50AM +0800, Jianmin Lv wrote:
> > > > How about 3. handle of_device_is_available() in the probe function of
> > > > the "loongson, pci-gmac" driver? Would that not work?
> > > >
> > > This way does work only for the specified device. There are other devices,
> > > such as HDA, I2S, etc, which have shared pins. Then we have to add
> > > of_device_is_available() checking to those drivers one by one. And we are
> > > not sure if there are other devices in new generation chips in future. So
> > > I'm afraid that the way you mentioned is not suitable for us.
> 
> If we decided that disabled devices should probe, then that is exactly
> what will have to be done. The restriction (of shared pins) is in the
> devices and is potentially per device, so it makes more sense for the
> device's drivers to handle than the host bridge IMO. (Assuming the
> core doesn't handle a per device property.)
> 
> 
> > Got it, so you have more on-chip PCIe devices than the ones listed in
> > loongson64-2k1000.dtsi, and you don't want to describe them in the
> > device tree just to put status = "disabled" for those devices/functions
> > that you don't want Linux to use - although you could, and it wouldn't
> > be that hard or have unintended side effects.
> >
> > Though you need to admit, in case you had an on-chip multi-function PCIe
> > device like the NXP ENETC, and you wanted Linux to not use function 0,
> > the strategy you're suggesting here that is acceptable for Loongson
> > would not have worked.
> >
> > I believe we need a bit of coordination from PCIe and device tree
> > maintainers, to suggest what would be the encouraged best practices and
> > ways to solve this regression for the ENETC.
> 
> I think we need to define what behavior is correct for 'status =
> "disabled"'. For almost everywhere in DT, it is equivalent to the
> device is not present. A not present device doesn't probe. There are
> unfortunately cases where status got ignored/forgotten and PCI was one
> of those. PCI is a bit different since there are 2 sources of
> information about a device being present. The intent with PCI is DT
> overrides what's discovered. For example, 'vendor-id' overrides what's
> read from the h/w.
> 
> I think we can fix making the status per function simply by making
> 'match_driver' be set based on the status. This would move the check
> later to just before probing. That would not work for a case where
> accessing the config registers is a problem. It doesn't sound like
> that's a problem for Loongson based on the above response, but their
> original solution did prevent that. This change would also mean the
> PCI quirks would run. Perhaps the func0 memory clearing you need could
> be run as a quirk instead?
> 
> Rob

Sorry to return to this thread very late. I had lots of other stuff to
take care of, and somehow *this* breakage had less priority :)

So, first off, there's a confusion regarding the "func0 memory clearing"
that could be run as a quirk instead. It's not memory clearing for fn 0,
but memory clearing for all ENETC functions, regardless or not whether
they have status = "disabled" or not in the device tree.

That being said, I've implemented the workaround below in a quirk as
you've said, and the quirks only get applied for those PCI functions
which don't have status = "disabled" in the device tree. So, as things
stand, it won't work.

Also, the original patch on which we're commenting ("PCI: don't skip
probing entire device if first fn OF node has status = "disabled"") is
needed in any case, because of the other issue: the PCI core thinks that
when fn 0 has status = "disabled", fn 1 .. 6 are also unavailable. False.

>From 9c3b88196a7c7e2b010d051c6d48faf36791e220 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Tue, 20 Jun 2023 16:31:07 +0300
Subject: [PATCH] net: enetc: reimplement RFS/RSS memory clearing as PCI quirk

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 .../net/ethernet/freescale/enetc/enetc_pf.c   | 57 ++++++++++++++-----
 1 file changed, 43 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index 1416262d4296..b8f6f0799170 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -1242,18 +1242,6 @@ static int enetc_pf_probe(struct pci_dev *pdev,
 	if (err)
 		goto err_setup_cbdr;

-	err = enetc_init_port_rfs_memory(si);
-	if (err) {
-		dev_err(&pdev->dev, "Failed to initialize RFS memory\n");
-		goto err_init_port_rfs;
-	}
-
-	err = enetc_init_port_rss_memory(si);
-	if (err) {
-		dev_err(&pdev->dev, "Failed to initialize RSS memory\n");
-		goto err_init_port_rss;
-	}
-
 	if (node && !of_device_is_available(node)) {
 		dev_info(&pdev->dev, "device is disabled, skipping\n");
 		err = -ENODEV;
@@ -1339,8 +1327,6 @@ static int enetc_pf_probe(struct pci_dev *pdev,
 	si->ndev = NULL;
 	free_netdev(ndev);
 err_alloc_netdev:
-err_init_port_rss:
-err_init_port_rfs:
 err_device_disabled:
 err_setup_mac_addresses:
 	enetc_teardown_cbdr(&si->cbd_ring);
@@ -1377,6 +1363,49 @@ static void enetc_pf_remove(struct pci_dev *pdev)
 	enetc_pci_remove(pdev);
 }

+static void enetc_fixup_clear_rss_rfs(struct pci_dev *pdev)
+{
+	struct enetc_si *si;
+	struct enetc_pf *pf;
+	int err;
+
+	err = enetc_pci_probe(pdev, KBUILD_MODNAME, sizeof(*pf));
+	if (err)
+		goto out;
+
+	si = pci_get_drvdata(pdev);
+	if (!si->hw.port || !si->hw.global) {
+		err = -ENODEV;
+		goto out_pci_remove;
+	}
+
+	err = enetc_setup_cbdr(&pdev->dev, &si->hw, ENETC_CBDR_DEFAULT_SIZE,
+			       &si->cbd_ring);
+	if (err)
+		goto out_pci_remove;
+
+	err = enetc_init_port_rfs_memory(si);
+	if (err)
+		goto out_teardown_cbdr;
+
+	err = enetc_init_port_rss_memory(si);
+	if (err)
+		goto out_teardown_cbdr;
+
+out_teardown_cbdr:
+	enetc_teardown_cbdr(&si->cbd_ring);
+out_pci_remove:
+	enetc_pci_remove(pdev);
+out:
+	if (err) {
+		dev_err(&pdev->dev,
+			"Failed to apply PCI fixup for clearing RFS/RSS memories: %pe\n",
+			ERR_PTR(err));
+	}
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_FREESCALE, ENETC_DEV_ID_PF,
+			enetc_fixup_clear_rss_rfs);
+
 static const struct pci_device_id enetc_pf_id_table[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, ENETC_DEV_ID_PF) },
 	{ 0, } /* End of table. */
--
2.34.1
Vladimir Oltean Aug. 3, 2023, 11:34 a.m. UTC | #25
On Thu, Aug 03, 2023 at 01:39:55PM +0300, Vladimir Oltean wrote:
> Hi Rob,
> 
> On Fri, Jun 16, 2023 at 11:57:43AM -0600, Rob Herring wrote:
> > On Sun, Jun 4, 2023 at 2:55 AM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> > >
> > 
> > Sorry, just now seeing this as I've been out the last month.
> > 
> > > On Sat, Jun 03, 2023 at 10:35:50AM +0800, Jianmin Lv wrote:
> > > > > How about 3. handle of_device_is_available() in the probe function of
> > > > > the "loongson, pci-gmac" driver? Would that not work?
> > > > >
> > > > This way does work only for the specified device. There are other devices,
> > > > such as HDA, I2S, etc, which have shared pins. Then we have to add
> > > > of_device_is_available() checking to those drivers one by one. And we are
> > > > not sure if there are other devices in new generation chips in future. So
> > > > I'm afraid that the way you mentioned is not suitable for us.
> > 
> > If we decided that disabled devices should probe, then that is exactly
> > what will have to be done. The restriction (of shared pins) is in the
> > devices and is potentially per device, so it makes more sense for the
> > device's drivers to handle than the host bridge IMO. (Assuming the
> > core doesn't handle a per device property.)
> > 
> > 
> > > Got it, so you have more on-chip PCIe devices than the ones listed in
> > > loongson64-2k1000.dtsi, and you don't want to describe them in the
> > > device tree just to put status = "disabled" for those devices/functions
> > > that you don't want Linux to use - although you could, and it wouldn't
> > > be that hard or have unintended side effects.
> > >
> > > Though you need to admit, in case you had an on-chip multi-function PCIe
> > > device like the NXP ENETC, and you wanted Linux to not use function 0,
> > > the strategy you're suggesting here that is acceptable for Loongson
> > > would not have worked.
> > >
> > > I believe we need a bit of coordination from PCIe and device tree
> > > maintainers, to suggest what would be the encouraged best practices and
> > > ways to solve this regression for the ENETC.
> > 
> > I think we need to define what behavior is correct for 'status =
> > "disabled"'. For almost everywhere in DT, it is equivalent to the
> > device is not present. A not present device doesn't probe. There are
> > unfortunately cases where status got ignored/forgotten and PCI was one
> > of those. PCI is a bit different since there are 2 sources of
> > information about a device being present. The intent with PCI is DT
> > overrides what's discovered. For example, 'vendor-id' overrides what's
> > read from the h/w.
> > 
> > I think we can fix making the status per function simply by making
> > 'match_driver' be set based on the status. This would move the check
> > later to just before probing. That would not work for a case where
> > accessing the config registers is a problem. It doesn't sound like
> > that's a problem for Loongson based on the above response, but their
> > original solution did prevent that. This change would also mean the
> > PCI quirks would run. Perhaps the func0 memory clearing you need could
> > be run as a quirk instead?
> > 
> > Rob
> 
> Sorry to return to this thread very late. I had lots of other stuff to
> take care of, and somehow *this* breakage had less priority :)
> 
> So, first off, there's a confusion regarding the "func0 memory clearing"
> that could be run as a quirk instead. It's not memory clearing for fn 0,
> but memory clearing for all ENETC functions, regardless or not whether
> they have status = "disabled" or not in the device tree.
> 
> That being said, I've implemented the workaround below in a quirk as
> you've said, and the quirks only get applied for those PCI functions
> which don't have status = "disabled" in the device tree. So, as things
> stand, it won't work.
> 
> Also, the original patch on which we're commenting ("PCI: don't skip
> probing entire device if first fn OF node has status = "disabled"") is
> needed in any case, because of the other issue: the PCI core thinks that
> when fn 0 has status = "disabled", fn 1 .. 6 are also unavailable. False.
> 
> From 9c3b88196a7c7e2b010d051c6d48faf36791e220 Mon Sep 17 00:00:00 2001
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> Date: Tue, 20 Jun 2023 16:31:07 +0300
> Subject: [PATCH] net: enetc: reimplement RFS/RSS memory clearing as PCI quirk
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  .../net/ethernet/freescale/enetc/enetc_pf.c   | 57 ++++++++++++++-----
>  1 file changed, 43 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> index 1416262d4296..b8f6f0799170 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> @@ -1242,18 +1242,6 @@ static int enetc_pf_probe(struct pci_dev *pdev,
>  	if (err)
>  		goto err_setup_cbdr;
> 
> -	err = enetc_init_port_rfs_memory(si);
> -	if (err) {
> -		dev_err(&pdev->dev, "Failed to initialize RFS memory\n");
> -		goto err_init_port_rfs;
> -	}
> -
> -	err = enetc_init_port_rss_memory(si);
> -	if (err) {
> -		dev_err(&pdev->dev, "Failed to initialize RSS memory\n");
> -		goto err_init_port_rss;
> -	}
> -
>  	if (node && !of_device_is_available(node)) {
>  		dev_info(&pdev->dev, "device is disabled, skipping\n");
>  		err = -ENODEV;
> @@ -1339,8 +1327,6 @@ static int enetc_pf_probe(struct pci_dev *pdev,
>  	si->ndev = NULL;
>  	free_netdev(ndev);
>  err_alloc_netdev:
> -err_init_port_rss:
> -err_init_port_rfs:
>  err_device_disabled:
>  err_setup_mac_addresses:
>  	enetc_teardown_cbdr(&si->cbd_ring);
> @@ -1377,6 +1363,49 @@ static void enetc_pf_remove(struct pci_dev *pdev)
>  	enetc_pci_remove(pdev);
>  }
> 
> +static void enetc_fixup_clear_rss_rfs(struct pci_dev *pdev)
> +{
> +	struct enetc_si *si;
> +	struct enetc_pf *pf;
> +	int err;
> +
> +	err = enetc_pci_probe(pdev, KBUILD_MODNAME, sizeof(*pf));
> +	if (err)
> +		goto out;
> +
> +	si = pci_get_drvdata(pdev);
> +	if (!si->hw.port || !si->hw.global) {
> +		err = -ENODEV;
> +		goto out_pci_remove;
> +	}
> +
> +	err = enetc_setup_cbdr(&pdev->dev, &si->hw, ENETC_CBDR_DEFAULT_SIZE,
> +			       &si->cbd_ring);
> +	if (err)
> +		goto out_pci_remove;
> +
> +	err = enetc_init_port_rfs_memory(si);
> +	if (err)
> +		goto out_teardown_cbdr;
> +
> +	err = enetc_init_port_rss_memory(si);
> +	if (err)
> +		goto out_teardown_cbdr;
> +
> +out_teardown_cbdr:
> +	enetc_teardown_cbdr(&si->cbd_ring);
> +out_pci_remove:
> +	enetc_pci_remove(pdev);
> +out:
> +	if (err) {
> +		dev_err(&pdev->dev,
> +			"Failed to apply PCI fixup for clearing RFS/RSS memories: %pe\n",
> +			ERR_PTR(err));
> +	}
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_FREESCALE, ENETC_DEV_ID_PF,
> +			enetc_fixup_clear_rss_rfs);
> +
>  static const struct pci_device_id enetc_pf_id_table[] = {
>  	{ PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, ENETC_DEV_ID_PF) },
>  	{ 0, } /* End of table. */
> --
> 2.34.1
>

Ah, sorry, I completely missed your comment about match_driver.
So I've added this extra patch and both issues are solved. The fixup
runs for all functions (thus I see no AER errors), and functions 1 .. 6
continue to probe even when function 0 has status = "disabled".

From 3528d4a48cb37dce8d1d83d2fbb465f21a32adcd Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Thu, 3 Aug 2023 14:31:27 +0300
Subject: [PATCH] PCI: move OF status = "disabled" detection to
 dev->match_driver

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/pci/bus.c | 4 +++-
 drivers/pci/of.c  | 5 -----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 5bc81cc0a2de..46b252bbe500 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -11,6 +11,7 @@
 #include <linux/pci.h>
 #include <linux/errno.h>
 #include <linux/ioport.h>
+#include <linux/of.h>
 #include <linux/proc_fs.h>
 #include <linux/slab.h>

@@ -332,6 +333,7 @@ void __weak pcibios_bus_add_device(struct pci_dev *pdev) { }
  */
 void pci_bus_add_device(struct pci_dev *dev)
 {
+	struct device_node *dn = dev->dev.of_node;
 	int retval;

 	/*
@@ -344,7 +346,7 @@ void pci_bus_add_device(struct pci_dev *dev)
 	pci_proc_attach_device(dev);
 	pci_bridge_d3_update(dev);

-	dev->match_driver = true;
+	dev->match_driver = !dn || of_device_is_available(dn);
 	retval = device_attach(&dev->dev);
 	if (retval < 0 && retval != -EPROBE_DEFER)
 		pci_warn(dev, "device attach failed (%d)\n", retval);
diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index e51219f9f523..3c158b17dcb5 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -34,11 +34,6 @@ int pci_set_of_node(struct pci_dev *dev)
 	if (!node)
 		return 0;

-	if (!of_device_is_available(node)) {
-		of_node_put(node);
-		return -ENODEV;
-	}
-
 	device_set_node(&dev->dev, of_fwnode_handle(node));
 	return 0;
 }
--
2.34.1
diff mbox series

Patch

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 2475098f6518..dc11e0945744 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -240,6 +240,7 @@  bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
 					int crs_timeout);
 int pci_idt_bus_quirk(struct pci_bus *bus, int devfn, u32 *pl, int crs_timeout);
 
+int __pci_setup_device(struct pci_dev *dev, bool *present_but_skipped);
 int pci_setup_device(struct pci_dev *dev);
 int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 		    struct resource *res, unsigned int reg);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 0b2826c4a832..17a51fa55020 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1811,17 +1811,7 @@  static void early_dump_pci_device(struct pci_dev *pdev)
 		       value, 256, false);
 }
 
-/**
- * pci_setup_device - Fill in class and map information of a device
- * @dev: the device structure to fill
- *
- * Initialize the device structure with information about the device's
- * vendor,class,memory and IO-space addresses, IRQ lines etc.
- * Called at initialisation of the PCI subsystem and by CardBus services.
- * Returns 0 on success and negative if unknown type of device (not normal,
- * bridge or CardBus).
- */
-int pci_setup_device(struct pci_dev *dev)
+int __pci_setup_device(struct pci_dev *dev, bool *present_but_skipped)
 {
 	u32 class;
 	u16 cmd;
@@ -1841,8 +1831,10 @@  int pci_setup_device(struct pci_dev *dev)
 	set_pcie_port_type(dev);
 
 	err = pci_set_of_node(dev);
-	if (err)
+	if (err) {
+		*present_but_skipped = true;
 		return err;
+	}
 	pci_set_acpi_fwnode(dev);
 
 	pci_dev_assign_slot(dev);
@@ -1995,6 +1987,23 @@  int pci_setup_device(struct pci_dev *dev)
 	return 0;
 }
 
+/**
+ * pci_setup_device - Fill in class and map information of a device
+ * @dev: the device structure to fill
+ *
+ * Initialize the device structure with information about the device's
+ * vendor,class,memory and IO-space addresses, IRQ lines etc.
+ * Called at initialisation of the PCI subsystem and by CardBus services.
+ * Returns 0 on success and negative if unknown type of device (not normal,
+ * bridge or CardBus).
+ */
+int pci_setup_device(struct pci_dev *dev)
+{
+	bool present_but_skipped = false;
+
+	return __pci_setup_device(dev, &present_but_skipped);
+}
+
 static void pci_configure_mps(struct pci_dev *dev)
 {
 	struct pci_dev *bridge = pci_upstream_bridge(dev);
@@ -2414,7 +2423,8 @@  EXPORT_SYMBOL(pci_bus_read_dev_vendor_id);
  * Read the config data for a PCI device, sanity-check it,
  * and fill in the dev structure.
  */
-static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
+static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn,
+				       bool *present_but_skipped)
 {
 	struct pci_dev *dev;
 	u32 l;
@@ -2430,7 +2440,7 @@  static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
 	dev->vendor = l & 0xffff;
 	dev->device = (l >> 16) & 0xffff;
 
-	if (pci_setup_device(dev)) {
+	if (__pci_setup_device(dev, present_but_skipped)) {
 		pci_bus_put(dev->bus);
 		kfree(dev);
 		return NULL;
@@ -2575,17 +2585,20 @@  void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 	WARN_ON(ret < 0);
 }
 
-struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
+static struct pci_dev *__pci_scan_single_device(struct pci_bus *bus, int devfn,
+						bool *present_but_skipped)
 {
 	struct pci_dev *dev;
 
+	*present_but_skipped = false;
+
 	dev = pci_get_slot(bus, devfn);
 	if (dev) {
 		pci_dev_put(dev);
 		return dev;
 	}
 
-	dev = pci_scan_device(bus, devfn);
+	dev = pci_scan_device(bus, devfn, present_but_skipped);
 	if (!dev)
 		return NULL;
 
@@ -2593,6 +2606,13 @@  struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
 
 	return dev;
 }
+
+struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
+{
+	bool present_but_skipped;
+
+	return __pci_scan_single_device(bus, devfn, &present_but_skipped);
+}
 EXPORT_SYMBOL(pci_scan_single_device);
 
 static int next_ari_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
@@ -2665,6 +2685,7 @@  static int only_one_child(struct pci_bus *bus)
  */
 int pci_scan_slot(struct pci_bus *bus, int devfn)
 {
+	bool present_but_skipped;
 	struct pci_dev *dev;
 	int fn = 0, nr = 0;
 
@@ -2672,13 +2693,14 @@  int pci_scan_slot(struct pci_bus *bus, int devfn)
 		return 0; /* Already scanned the entire slot */
 
 	do {
-		dev = pci_scan_single_device(bus, devfn + fn);
+		dev = __pci_scan_single_device(bus, devfn + fn,
+					       &present_but_skipped);
 		if (dev) {
 			if (!pci_dev_is_added(dev))
 				nr++;
 			if (fn > 0)
 				dev->multifunction = 1;
-		} else if (fn == 0) {
+		} else if (fn == 0 && !present_but_skipped) {
 			/*
 			 * Function 0 is required unless we are running on
 			 * a hypervisor that passes through individual PCI