
[v2] PCI/portdrv: Use link bandwidth notification capability bit

Message ID 20210512213314.7778-1-stuart.w.hayes@gmail.com (mailing list archive)
State Accepted
Delegated to: Bjorn Helgaas
Series [v2] PCI/portdrv: Use link bandwidth notification capability bit

Commit Message

stuart hayes May 12, 2021, 9:33 p.m. UTC
The pcieport driver can fail to attach to a downstream port that doesn't
support bandwidth notification.  This can happen when, in
pcie_port_device_register(), pcie_init_service_irqs() tries (and fails) to
set up a bandwidth notification IRQ for a device that doesn't support it.

This patch changes get_port_device_capability() to look at the link
bandwidth notification capability bit in the link capabilities register of
the port, instead of assuming that all downstream ports have that
capability.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
---

changes from v1:
	- corrected spelling errors in commit message
	- added Lukas' reviewed-by:

---
 drivers/pci/pcie/portdrv_core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Krzysztof Wilczyński May 14, 2021, 1:03 p.m. UTC | #1
Hi Stuart,

> The pcieport driver can fail to attach to a downstream port that doesn't
> support bandwidth notification.  This can happen when, in
> pcie_port_device_register(), pcie_init_service_irqs() tries (and fails) to
> set up a bandwidth notification IRQ for a device that doesn't support it.
> 
> This patch changes get_port_device_capability() to look at the link
> bandwidth notification capability bit in the link capabilities register of
> the port, instead of assuming that all downstream ports have that
> capability.
[...]

Thank you for fixing this!

I was wondering - is this fix connected to an issue filed in Bugzilla
or does it fix a known commit that introduced this problem?  Basically,
I am trying to see whether a "Fixes:" would be in order.

Krzysztof
Lukas Wunner May 14, 2021, 1:08 p.m. UTC | #2
On Fri, May 14, 2021 at 03:03:03PM +0200, Krzysztof Wilczyński wrote:
> > The pcieport driver can fail to attach to a downstream port that doesn't
> > support bandwidth notification.  This can happen when, in
> > pcie_port_device_register(), pcie_init_service_irqs() tries (and fails) to
> > set up a bandwidth notification IRQ for a device that doesn't support it.
> > 
> > This patch changes get_port_device_capability() to look at the link
> > bandwidth notification capability bit in the link capabilities register of
> > the port, instead of assuming that all downstream ports have that
> > capability.
> 
> I was wondering - is this fix connected to an issue filed in Bugzilla
> or does it fix a known commit that introduced this problem?  Basically,
> I am trying to see whether a "Fixes:" would be in order.

The fix is for a driver which has been removed from the tree (for now),
including in stable kernels.  The fix will prevent an issue that will
occur once the driver is re-introduced (once we've found a way to
overcome the issues that led to its removal).  A Fixes tag is thus
uncalled for.

Thanks,

Lukas
Krzysztof Wilczyński May 14, 2021, 1:17 p.m. UTC | #3
Hi Lukas,

[...]
> > I was wondering - is this fix connected to an issue filed in Bugzilla
> > or does it fix a known commit that introduced this problem?  Basically,
> > I am trying to see whether a "Fixes:" would be in order.
> 
> The fix is for a driver which has been removed from the tree (for now),
> including in stable kernels.  The fix will prevent an issue that will
> occur once the driver is re-introduced (once we've found a way to
> overcome the issues that led to its removal).  A Fixes tag is thus
> uncalled for.

Thank you for adding more details.  Much appreciated.

Krzysztof
stuart hayes May 27, 2021, 1:12 a.m. UTC | #4
On 5/14/2021 8:17 AM, Krzysztof Wilczyński wrote:
> Hi Lukas,
> 
> [...]
>>> I was wondering - is this fix connected to an issue filed in Bugzilla
>>> or does it fix a known commit that introduced this problem?  Basically,
>>> I am trying to see whether a "Fixes:" would be in order.
>>
>> The fix is for a driver which has been removed from the tree (for now),
>> including in stable kernels.  The fix will prevent an issue that will
>> occur once the driver is re-introduced (once we've found a way to
>> overcome the issues that led to its removal).  A Fixes tag is thus
>> uncalled for.
> 
> Thank you for adding more details.  Much appreciated.
> 
> Krzysztof
> 

I made the patch because it was causing the config space for a 
downstream port to not get restored when a DPC event occurred, and all 
the NVMe drives under it disappeared.  I found that myself, though--I'm 
not aware of anyone else reporting the issue.
stuart hayes July 7, 2021, 3:48 p.m. UTC | #5
On 5/26/2021 8:12 PM, stuart hayes wrote:
> 
> 
> On 5/14/2021 8:17 AM, Krzysztof Wilczyński wrote:
>> Hi Lukas,
>>
>> [...]
>>>> I was wondering - is this fix connected to an issue filed in Bugzilla
>>>> or does it fix a known commit that introduced this problem?  Basically,
>>>> I am trying to see whether a "Fixes:" would be in order.
>>>
>>> The fix is for a driver which has been removed from the tree (for now),
>>> including in stable kernels.  The fix will prevent an issue that will
>>> occur once the driver is re-introduced (once we've found a way to
>>> overcome the issues that led to its removal).  A Fixes tag is thus
>>> uncalled for.
>>
>> Thank you for adding more details.  Much appreciated.
>>
>> Krzysztof
>>
> 
> I made the patch because it was causing the config space for a 
> downstream port to not get restored when a DPC event occurred, and all 
> the NVMe drives under it disappeared.  I found that myself, though--I'm 
> not aware of anyone else reporting the issue.

Any chance this might get accepted?  I'm not sure if it just got 
forgotten, or if there's a problem with it.  Thanks!  --Stuart
Bjorn Helgaas July 7, 2021, 6:59 p.m. UTC | #6
On Wed, Jul 07, 2021 at 10:48:20AM -0500, stuart hayes wrote:
> On 5/26/2021 8:12 PM, stuart hayes wrote:
> > On 5/14/2021 8:17 AM, Krzysztof Wilczyński wrote:
> > > Hi Lukas,
> > > 
> > > [...]
> > > > > I was wondering - is this fix connected to an issue filed in Bugzilla
> > > > > or does it fix a known commit that introduced this problem?  Basically,
> > > > > I am trying to see whether a "Fixes:" would be in order.
> > > > 
> > > > The fix is for a driver which has been removed from the tree (for now),
> > > > including in stable kernels.  The fix will prevent an issue that will
> > > > occur once the driver is re-introduced (once we've found a way to
> > > > overcome the issues that led to its removal).  A Fixes tag is thus
> > > > uncalled for.
> > > 
> > > Thank you for adding more details.  Much appreciated.
> > > 
> > > Krzysztof
> > > 
> > 
> > I made the patch because it was causing the config space for a
> > downstream port to not get restored when a DPC event occurred, and all
> > the NVMe drives under it disappeared.  I found that myself, though--I'm
> > not aware of anyone else reporting the issue.
> 
> Any chance this might get accepted?  I'm not sure if it just got forgotten,
> or if there's a problem with it.  Thanks!  --Stuart

Sorry, still on my list but I'll look at it after -rc1 comes out this
weekend.
Bjorn Helgaas July 16, 2021, 9:56 p.m. UTC | #7
On Thu, May 13, 2021 at 03:03:14AM +0530, Stuart Hayes wrote:
> The pcieport driver can fail to attach to a downstream port that doesn't
> support bandwidth notification.  This can happen when, in
> pcie_port_device_register(), pcie_init_service_irqs() tries (and fails) to
> set up a bandwidth notification IRQ for a device that doesn't support it.

I'm trying to figure out exactly how this fails.  The only failure
path in pcie_init_service_irqs() is when
pci_alloc_irq_vectors(PCI_IRQ_LEGACY) fails, which I guess means the
port's dev->irq was zero?

And to even attempt legacy IRQs, either we had pcie_pme_no_msi() or
pcie_port_enable_irq_vec() failed?

> This patch changes get_port_device_capability() to look at the link
> bandwidth notification capability bit in the link capabilities register of
> the port, instead of assuming that all downstream ports have that
> capability.

I think this needs:

  Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")

because even though b4c7d2076b4e ("PCI/LINK: Remove bandwidth
notification") removed *most* of e8303bb7a75c, it did not remove the
code in get_port_device_capability() that you're fixing here.

I can fix this up locally, no need to resend.  I think the patch
itself is fine, just trying to understand the commit log.

> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> Reviewed-by: Lukas Wunner <lukas@wunner.de>
> ---
> 
> changes from v1:
> 	- corrected spelling errors in commit message
> 	- added Lukas' reviewed-by:
> 
> ---
>  drivers/pci/pcie/portdrv_core.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index e1fed6649c41..3ee63968deaa 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -257,8 +257,13 @@ static int get_port_device_capability(struct pci_dev *dev)
>  		services |= PCIE_PORT_SERVICE_DPC;
>  
>  	if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> -	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
> -		services |= PCIE_PORT_SERVICE_BWNOTIF;
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
> +		u32 linkcap;
> +
> +		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
> +		if (linkcap & PCI_EXP_LNKCAP_LBNC)
> +			services |= PCIE_PORT_SERVICE_BWNOTIF;
> +	}
>  
>  	return services;
>  }
> -- 
> 2.27.0
>
stuart hayes July 17, 2021, 2:43 a.m. UTC | #8
On 7/16/2021 4:56 PM, Bjorn Helgaas wrote:
> On Thu, May 13, 2021 at 03:03:14AM +0530, Stuart Hayes wrote:
>> The pcieport driver can fail to attach to a downstream port that doesn't
>> support bandwidth notification.  This can happen when, in
>> pcie_port_device_register(), pcie_init_service_irqs() tries (and fails) to
>> set up a bandwidth notification IRQ for a device that doesn't support it.
> 
> I'm trying to figure out exactly how this fails.  The only failure
> path in pcie_init_service_irqs() is when
> pci_alloc_irq_vectors(PCI_IRQ_LEGACY) fails, which I guess means the
> port's dev->irq was zero?
> 

Yes... dev->irq is zero.  Here's "lspci -v" for the device in case it helps:

66:00.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0) 
(prog-if 00 [Normal decode])
         Flags: bus master, fast devsel, latency 0, NUMA node 0
         Bus: primary=66, secondary=67, subordinate=70, sec-latency=0
         I/O behind bridge: [disabled]
         Memory behind bridge: be000000-bf4fffff [size=21M]
         Prefetchable memory behind bridge: 
00000d0000000000-00000d00017fffff [size=24M]
         Capabilities: [40] Power Management version 3
         Capabilities: [68] Express Downstream Port (Slot-), MSI 00
         Capabilities: [a4] Subsystem: Broadcom / LSI Device a064
         Capabilities: [100] Device Serial Number 00-80-5e-10-56-39-53-50
         Capabilities: [fb4] Advanced Error Reporting
         Capabilities: [148] Virtual Channel
         Capabilities: [b70] Vendor Specific Information: ID=0001 Rev=0 
Len=010 <?>
         Capabilities: [e00] Multicast
         Capabilities: [f24] Access Control Services

> And to even attempt legacy IRQs, either we had pcie_pme_no_msi() or
> pcie_port_enable_irq_vec() failed?
> 

pcie_port_enable_irq_vec() failed.  I don't believe this device has the 
ability to generate interrupts.
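
For reference, the path that fails boils down to roughly this (a
simplified sketch of pcie_init_service_irqs() in portdrv_core.c, with
the PME-no-MSI special case and the irqs[] initialization omitted, so
not the literal source).  When it returns -ENODEV,
pcie_port_device_register() drops every service except hotplug, and
since this port reports Slot-, nothing is left and the port is not
claimed:

  static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
  {
          int i;

          /* MSI-X/MSI first -- fails, this port exposes no MSI/MSI-X capability */
          if (pcie_port_enable_irq_vec(dev, irqs, mask) == 0)
                  return 0;

          /* INTx fallback -- also fails because dev->irq is zero */
          if (pci_alloc_irq_vectors(dev, 1, 1, PCI_IRQ_LEGACY) < 0)
                  return -ENODEV;

          for (i = 0; i < PCIE_PORT_DEVICE_MAXSERVICES; i++)
                  irqs[i] = pci_irq_vector(dev, 0);
          return 0;
  }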

>> This patch changes get_port_device_capability() to look at the link
>> bandwidth notification capability bit in the link capabilities register of
>> the port, instead of assuming that all downstream ports have that
>> capability.
> 
> I think this needs:
> 
>    Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")
> 
> because even though b4c7d2076b4e ("PCI/LINK: Remove bandwidth
> notification") removed *most* of e8303bb7a75c, it did not remove the
> code in get_port_device_capability() that you're fixing here.
> 
> I can fix this up locally, no need to resend.  I think the patch
> itself is fine, just trying to understand the commit log.
> 

Thank you!


>> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
>> Reviewed-by: Lukas Wunner <lukas@wunner.de>
>> ---
>>
>> changes from v1:
>> 	- corrected spelling errors in commit message
>> 	- added Lukas' reviewed-by:
>>
>> ---
>>   drivers/pci/pcie/portdrv_core.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
>> index e1fed6649c41..3ee63968deaa 100644
>> --- a/drivers/pci/pcie/portdrv_core.c
>> +++ b/drivers/pci/pcie/portdrv_core.c
>> @@ -257,8 +257,13 @@ static int get_port_device_capability(struct pci_dev *dev)
>>   		services |= PCIE_PORT_SERVICE_DPC;
>>   
>>   	if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>> -	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
>> -		services |= PCIE_PORT_SERVICE_BWNOTIF;
>> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
>> +		u32 linkcap;
>> +
>> +		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
>> +		if (linkcap & PCI_EXP_LNKCAP_LBNC)
>> +			services |= PCIE_PORT_SERVICE_BWNOTIF;
>> +	}
>>   
>>   	return services;
>>   }
>> -- 
>> 2.27.0
>>
Bjorn Helgaas Aug. 31, 2021, 7:11 p.m. UTC | #9
[+cc Myron]

On Thu, May 13, 2021 at 03:03:14AM +0530, Stuart Hayes wrote:
> The pcieport driver can fail to attach to a downstream port that doesn't
> support bandwidth notification.  This can happen when, in
> pcie_port_device_register(), pcie_init_service_irqs() tries (and fails) to
> set up a bandwidth notification IRQ for a device that doesn't support it.
> 
> This patch changes get_port_device_capability() to look at the link
> bandwidth notification capability bit in the link capabilities register of
> the port, instead of assuming that all downstream ports have that
> capability.
> 
> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> Reviewed-by: Lukas Wunner <lukas@wunner.de>

Sorry, I totally dropped the ball on this.  I added a stable tag and
put this on pci/portdrv and plan to try to squeeze it into v5.15.

> ---
> 
> changes from v1:
> 	- corrected spelling errors in commit message
> 	- added Lukas' reviewed-by:
> 
> ---
>  drivers/pci/pcie/portdrv_core.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index e1fed6649c41..3ee63968deaa 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -257,8 +257,13 @@ static int get_port_device_capability(struct pci_dev *dev)
>  		services |= PCIE_PORT_SERVICE_DPC;
>  
>  	if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> -	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
> -		services |= PCIE_PORT_SERVICE_BWNOTIF;
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
> +		u32 linkcap;
> +
> +		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
> +		if (linkcap & PCI_EXP_LNKCAP_LBNC)
> +			services |= PCIE_PORT_SERVICE_BWNOTIF;
> +	}
>  
>  	return services;
>  }
> -- 
> 2.27.0
>
Bjorn Helgaas Aug. 31, 2021, 7:20 p.m. UTC | #10
On Wed, May 26, 2021 at 08:12:04PM -0500, stuart hayes wrote:
> ...
> I made the patch because it was causing the config space for a downstream
> port to not get restored when a DPC event occurred, and all the NVMe drives
> under it disappeared.  I found that myself, though--I'm not aware of anyone
> else reporting the issue.

This niggles at me.  IIUC the problem you're reporting is that portdrv
didn't claim a port because portdrv incorrectly assumed the port
supported bandwidth notification interrupts.  That's all fine, and I
think this is a good fix.

But why should it matter whether portdrv claims the port?  What if
CONFIG_PCIEPORTBUS isn't even enabled?  I guess CONFIG_PCIE_DPC
wouldn't be enabled then either.

In your situation, you have CONFIG_PCIEPORTBUS=y and (I assume)
CONFIG_PCIE_DPC=y.  I guess you must have two levels of downstream
ports, e.g.,

  Root Port -> Switch Upstream Port -> Switch Downstream Port -> NVMe

and portdrv claimed the Root Port and you enabled DPC there, but it
didn't claim the Switch Downstream Port?

The failure to restore config space because portdrv didn't claim the
port seems wrong to me.

Bjorn
stuart hayes Aug. 31, 2021, 9:39 p.m. UTC | #11
On 8/31/2021 2:20 PM, Bjorn Helgaas wrote:
> On Wed, May 26, 2021 at 08:12:04PM -0500, stuart hayes wrote:
>> ...
>> I made the patch because it was causing the config space for a downstream
>> port to not get restored when a DPC event occurred, and all the NVMe drives
>> under it disappeared.  I found that myself, though--I'm not aware of anyone
>> else reporting the issue.
> 
> This niggles at me.  IIUC the problem you're reporting is that portdrv
> didn't claim a port because portdrv incorrectly assumed the port
> supported bandwidth notification interrupts.  That's all fine, and I
> think this is a good fix.
> 
> But why should it matter whether portdrv claims the port?  What if
> CONFIG_PCIEPORTBUS isn't even enabled?  I guess CONFIG_PCIE_DPC
> wouldn't be enabled then either.
> 
> In your situation, you have CONFIG_PCIEPORTBUS=y and (I assume)
> CONFIG_PCIE_DPC=y.  I guess you must have two levels of downstream
> ports, e.g.,
> 
>    Root Port -> Switch Upstream Port -> Switch Downstream Port -> NVMe
> 
> and portdrv claimed the Root Port and you enabled DPC there, but it
> didn't claim the Switch Downstream Port?
> 

That's correct.  On the system I was using, there was another layer of 
upstream/downstream ports, but I don't think that matters... I had:

Root Port -> Switch Upstream Port (portdrv claimed) -> Switch Downstream 
Port (portdrv did NOT claim) -> Switch Upstream Port (portdrv claimed) 
-> Switch Downstream Port (portdrv claimed) -> NVMe

> The failure to restore config space because portdrv didn't claim the
> port seems wrong to me.
> 

When a DPC event is triggered on the root port, the downstream devices 
get reset, and portdrv is what restores the switch downstream port's 
config space (in pcie_portdrv_slot_reset).

So if portdrv doesn't claim the downstream port, the config space 
doesn't get restored at all, so it won't forward anything to subordinate 
buses, and everything below the port disappears once the DPC event happens.
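
To illustrate, portdrv's ->slot_reset() error handler is essentially
just that config space restore (a simplified sketch, not the exact
source); if portdrv isn't bound to the port, nobody runs this after the
reset:

  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
  {
          /*
           * Write back the config space saved at probe time
           * (bridge windows, bus numbers, command register, ...)
           */
          pci_restore_state(dev);
          pci_save_state(dev);
          return PCI_ERS_RESULT_RECOVERED;
  }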

I'm not really sure how else it would recover from a DPC event, I guess.

> Bjorn
>
Bjorn Helgaas Aug. 31, 2021, 9:58 p.m. UTC | #12
On Tue, Aug 31, 2021 at 04:39:44PM -0500, stuart hayes wrote:
> On 8/31/2021 2:20 PM, Bjorn Helgaas wrote:
> > On Wed, May 26, 2021 at 08:12:04PM -0500, stuart hayes wrote:
> > > ...
> > > I made the patch because it was causing the config space for a downstream
> > > port to not get restored when a DPC event occurred, and all the NVMe drives
> > > under it disappeared.  I found that myself, though--I'm not aware of anyone
> > > else reporting the issue.
> > 
> > This niggles at me.  IIUC the problem you're reporting is that portdrv
> > didn't claim a port because portdrv incorrectly assumed the port
> > supported bandwidth notification interrupts.  That's all fine, and I
> > think this is a good fix.
> > 
> > But why should it matter whether portdrv claims the port?  What if
> > CONFIG_PCIEPORTBUS isn't even enabled?  I guess CONFIG_PCIE_DPC
> > wouldn't be enabled then either.
> > 
> > In your situation, you have CONFIG_PCIEPORTBUS=y and (I assume)
> > CONFIG_PCIE_DPC=y.  I guess you must have two levels of downstream
> > ports, e.g.,
> > 
> >    Root Port -> Switch Upstream Port -> Switch Downstream Port -> NVMe
> > 
> > and portdrv claimed the Root Port and you enabled DPC there, but it
> > didn't claim the Switch Downstream Port?
> 
> That's correct.  On the system I was using, there was another layer of
> upstream/downstream ports, but I don't think that matters... I had:
> 
> Root Port -> Switch Upstream Port (portdrv claimed) -> Switch Downstream
> Port (portdrv did NOT claim) -> Switch Upstream Port (portdrv claimed) ->
> Switch Downstream Port (portdrv claimed) -> NVMe
> 
> > The failure to restore config space because portdrv didn't claim the
> > port seems wrong to me.
> 
> When a DPC event is triggered on the root port, the downstream devices get
> reset, and portdrv is what restores the switch downstream port's config
> space (in pcie_portdrv_slot_reset).
> 
> So if portdrv doesn't claim the downstream port, the config space doesn't
> get restored at all, so it won't forward anything to subordinate buses, and
> everything below the port disappears once the DPC event happens.

Right.  That's what I assumed was happening.  I just think it's
conceivable that one might *want* portdrv to not claim an intermediate
switch like that.  Maybe that switch doesn't support any of the
portdrv services.

Or maybe you don't have portdrv configured at all.  Do we still
save/restore config space for suspend/resume of the switch?  I guess
this would probably have to be putting the switch into D3cold, since
the device should preserve its config space in D3hot.

I think this kind of functionality ought to be built into the PCI core
instead of being in the portdrv.

> I'm not really sure how else it would recover from a DPC event, I guess.
Lukas Wunner Sept. 1, 2021, 5:48 a.m. UTC | #13
On Tue, Aug 31, 2021 at 04:58:01PM -0500, Bjorn Helgaas wrote:
> I just think it's
> conceivable that one might *want* portdrv to not claim an intermediate
> switch like that.

It's possible to manually unbind portdrv from the device via sysfs
(because portdrv is a driver).  In that case the port will not restore
config space upon an error-induced reset and any devices downstream
of the port will be inaccessible after the reset.

That's the only possible way to screw this up I think.
And it requires deliberate, manual action.  One *could* argue that's
not correct and the kernel shouldn't allow the incorrect behavior
in the first place.  The behavior follows from portdrv being a driver,
instead of its functionality being baked into the PCI core.


> Or maybe you don't have portdrv configured at all.  Do we still
> save/restore config space for suspend/resume of the switch?

We do, because the PCI core takes care of that.  E.g. on resume
from system sleep:

  pci_pm_resume_noirq()
    pci_pm_default_resume_early()
      pci_restore_state()

However after an error-induced reset, it's the job of the device
driver's ->slot_reset() callback to restore config space.
That's a design decision that was made back in 2005 when EEH
was introduced.  See Documentation/PCI/pci-error-recovery.rst:

  It is important for the platform to restore the PCI config space
  to the "fresh poweron" state, rather than the "last state". After
  a slot reset, the device driver will almost always use its standard
  device initialization routines, and an unusual config space setup
  may result in hung devices, kernel panics, or silent data corruption.

I guess it would be possible to perform certain tasks such as
pci_restore_state() centrally in report_slot_reset() instead
(in drivers/pci/pcie/err.c) and alleviate each driver from doing that.
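
Roughly what I have in mind (a hypothetical illustration only, not
actual err.c code; locking and the io_state handling omitted):

  static int report_slot_reset(struct pci_dev *dev, void *data)
  {
          pci_ers_result_t vote, *result = data;

          /*
           * Hypothetical: restore config space centrally for every
           * device, instead of relying on each ->slot_reset() to do it.
           */
          pci_restore_state(dev);

          if (dev->driver && dev->driver->err_handler &&
              dev->driver->err_handler->slot_reset) {
                  vote = dev->driver->err_handler->slot_reset(dev);
                  *result = merge_result(*result, vote);
          }
          return 0;
  }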

One has to bear in mind though that a device may require specific
steps before pci_restore_state() is called.  E.g. in the case of portdrv,
spurious hotplug DLLSC events need to be acknowledged first:

https://patchwork.ozlabs.org/project/linux-pci/patch/251f4edcc04c14f873ff1c967bc686169cd07d2d.1627638184.git.lukas@wunner.de/


If portdrv isn't configured at all, AER and DPC support cannot be
configured either (because they depend on PCIEPORTBUS), and it's the
reset performed by AER or DPC which necessitates calling pci_restore_state().

If a port supports none of portdrv's services, portdrv still binds to
the port and is thus able to restore config space if a reset is performed
at a port further upstream.  That's because of ...

	if (!capabilities)
		return 0;

... in pcie_port_device_register().  So that should be working correctly.

Thanks,

Lukas
Bjorn Helgaas Sept. 1, 2021, 9:28 p.m. UTC | #14
On Wed, Sep 01, 2021 at 07:48:18AM +0200, Lukas Wunner wrote:
> On Tue, Aug 31, 2021 at 04:58:01PM -0500, Bjorn Helgaas wrote:
> > I just think it's
> > conceivable that one might *want* portdrv to not claim an intermediate
> > switch like that.
> 
> It's possible to manually unbind portdrv from the device via sysfs
> (because portdrv is a driver).  In that case the port will not restore
> config space upon an error-induced reset and any devices downstream
> of the port will be inaccessible after the reset.
> 
> That's the only possible way to screw this up I think.
> And it requires deliberate, manual action.  One *could* argue that's
> not correct and the kernel shouldn't allow the incorrect behavior
> in the first place.  The behavior follows from portdrv being a driver,
> instead of its functionality being baked into the PCI core.

Right.  I do think the overall PCI design would be significantly
cleaner if the portdrv functionality were baked into the PCI core
instead of being a driver.

> > Or maybe you don't have portdrv configured at all.  Do we still
> > save/restore config space for suspend/resume of the switch?
> 
> We do, because the PCI core takes care of that.  E.g. on resume
> from system sleep:
> 
>   pci_pm_resume_noirq()
>     pci_pm_default_resume_early()
>       pci_restore_state()
> 
> However after an error-induced reset, it's the job of the device
> driver's ->slot_reset() callback to restore config space.
> That's a design decision that was made back in 2005 when EEH
> was introduced.  See Documentation/PCI/pci-error-recovery.rst:
> 
>   It is important for the platform to restore the PCI config space
>   to the "fresh poweron" state, rather than the "last state". After
>   a slot reset, the device driver will almost always use its standard
>   device initialization routines, and an unusual config space setup
>   may result in hung devices, kernel panics, or silent data corruption.
> 
> I guess it would be possible to perform certain tasks such as
> pci_restore_state() centrally in report_slot_reset() instead
> (in drivers/pci/pcie/err.c) and alleviate each driver from doing that.
> 
> One has to bear in mind though that a device may require specific
> steps before pci_restore_state() is called.  E.g. in the case of
> portdrv, spurious hotplug DLLSC events need to be acknowledged
> first:
> 
> https://patchwork.ozlabs.org/project/linux-pci/patch/251f4edcc04c14f873ff1c967bc686169cd07d2d.1627638184.git.lukas@wunner.de/

As far as I know, pci_restore_state() only restores things specified
by the PCIe spec.  It doesn't restore any device-specific state, so
I'm a little hesitant about inserting device-specific things in the
middle of that flow.  I know you're solving a real problem with that
patch, and I don't have any better suggestions, but it will take me a
while to assimilate this.

Thanks for all your analysis; it is very helpful!

> If portdrv isn't configured at all, AER and DPC support cannot be
> configured either (because they depend on PCIEPORTBUS), and it's the
> reset performed by AER or DPC which necessitates calling pci_restore_state().
> 
> If a port supports none of portdrv's services, portdrv still binds to
> the port and is thus able to restore config space if a reset is performed
> at a port further upstream.  That's because of ...
> 
> 	if (!capabilities)
> 		return 0;
> 
> ... in pcie_port_device_register().  So that should be working correctly.
> 
> Thanks,
> 
> Lukas

Patch

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index e1fed6649c41..3ee63968deaa 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -257,8 +257,13 @@  static int get_port_device_capability(struct pci_dev *dev)
 		services |= PCIE_PORT_SERVICE_DPC;
 
 	if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
-	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
-		services |= PCIE_PORT_SERVICE_BWNOTIF;
+	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
+		u32 linkcap;
+
+		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
+		if (linkcap & PCI_EXP_LNKCAP_LBNC)
+			services |= PCIE_PORT_SERVICE_BWNOTIF;
+	}
 
 	return services;
 }