diff mbox series

[RFC,v2] PCI: Waiting command completed in get_port_device_capability()

Message ID 1641525769-113099-1-git-send-email-yaohongbo@linux.alibaba.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show
Series [RFC,v2] PCI: Waiting command completed in get_port_device_capability() | expand

Commit Message

Yao Hongbo Jan. 7, 2022, 3:22 a.m. UTC
According to the PCIe specification Revision 5.0, section
7.5.3.11 (slot Status Register), if Command Complete notification
is supported,  a write to the slot control register needs to set
the command completed bit, which can indicate the controller is
ready to receive the next command.

However, before probing the pcie hotplug service, there needs to set
HPIE bit in the slot ctrl register to disable hotplug interrupts,
and there is no wait currently.

The interval between the two functions get_port_device_capability() and
pcie_disable_notification() is not long, which may cause the latter to
be interfered by the former.

The command complete event received by pcie_disable_notification() may
belong to the operation of get_port_device_capability().

Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Yao Hongbo <yaohongbo@linux.alibaba.com>
---
 drivers/pci/pcie/portdrv_core.c | 40 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)

Comments

Bjorn Helgaas Jan. 11, 2022, 6:55 p.m. UTC | #1
[+cc Lukas, Rafael (in case you have any recollection of 2bd50dd800b5)]

On Fri, Jan 07, 2022 at 11:22:49AM +0800, Yao Hongbo wrote:
> According to the PCIe specification Revision 5.0, section
> 7.5.3.11 (slot Status Register), if Command Complete notification
> is supported,  a write to the slot control register needs to set
> the command completed bit, which can indicate the controller is
> ready to receive the next command.
> 
> However, before probing the pcie hotplug service, there needs to set
> HPIE bit in the slot ctrl register to disable hotplug interrupts,
> and there is no wait currently.
> 
> The interval between the two functions get_port_device_capability() and
> pcie_disable_notification() is not long, which may cause the latter to
> be interfered by the former.
> 
> The command complete event received by pcie_disable_notification() may
> belong to the operation of get_port_device_capability().

Yes, looks like a potential problem.

> Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
> Signed-off-by: Yao Hongbo <yaohongbo@linux.alibaba.com>
> ---
>  drivers/pci/pcie/portdrv_core.c | 40 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index bda6308..ec2088b6e 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -15,6 +15,7 @@
>  #include <linux/string.h>
>  #include <linux/slab.h>
>  #include <linux/aer.h>
> +#include <linux/delay.h>
>  
>  #include "../pci.h"
>  #include "portdrv.h"
> @@ -190,6 +191,42 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
>  	return 0;
>  }
>  
> +static void pcie_port_disable_hp_interrupt(struct pci_dev *dev)
> +{
> +	u16 slot_status;
> +	u32 slot_cap;
> +	int timeout = 1000;
> +
> +	pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> +			PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> +
> +	/*
> +	 * If the command completed notification is not supported,
> +	 * we don't need to wait after writing to the slot ctrl register.
> +	 */
> +	pcie_capability_read_dword(dev, PCI_EXP_SLTCAP, &slot_cap);
> +	if (slot_cap & PCI_EXP_SLTCAP_NCCS)
> +		return;
> +
> +	do {
> +		pcie_capability_read_word(dev, PCI_EXP_SLTSTA, &slot_status);
> +		if (slot_status == (u16) ~0) {
> +			pci_info(dev, "%s: no response from device\n",  __func__);
> +			return;
> +		}
> +
> +		if (slot_status & PCI_EXP_SLTSTA_CC) {
> +			pcie_capability_write_word(dev, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
> +			return;
> +		}
> +
> +		msleep(10);
> +		timeout -= 10;
> +	} while (timeout >= 0);
> +
> +	pci_info(dev, "Timeout on hotplug disable interrupt!\n");
> +}
> +
>  /**
>   * get_port_device_capability - discover capabilities of a PCI Express port
>   * @dev: PCI Express port to examine
> @@ -213,8 +250,7 @@ static int get_port_device_capability(struct pci_dev *dev)
>  		 * Disable hot-plug interrupts in case they have been enabled
>  		 * by the BIOS and the hot-plug service driver is not loaded.
>  		 */
> -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> +		pcie_port_disable_hp_interrupt(dev);

This originally came from 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port
services during port initialization"), where we disable hotplug
interrupts in case the hotplug driver is not available.

In general, I think the OS should not be responsible for disabling
interrupts for feature X.  The OS may predate feature X and may not
know anything about X at all.  The power-on default for interrupts
related to X should be "disabled" (as it is for HPIE and CCIE), and if
firmware enables them, it should disable them or arrange to handle
them itself before handing off to the OS.

I don't know whether 2bd50dd800b5 was prompted by spurious hotplug
interrupts or not.  If it was, I think we were seeing a firmware
defect or possibly a pciehp initialization issue.

At the time of 2bd50dd800b5, we always cleared HPIE and CCIE here.

But now, on ACPI systems, we only clear HPIE and CCIE here if we *do*
have the hotplug driver (because host->native_pcie_hotplug only
remains set if we have been granted control via _OSC, and we only
request control when CONFIG_HOTPLUG_PCI_PCIE is enabled).  On these
systems, we should be able to remove this disable code because pciehp
will do whatever it needs.

For non-ACPI systems, bridge->native_pcie_hotplug will always be set,
so we will clear HPIE and CCIE here and then (if
CONFIG_HOTPLUG_PCI_PCIE is enabled) initialize pciehp soon after,
which may be a problem as you describe.

What kind of system are you seeing the problem on?  It seems like it
should be safe to drop the HPIE and CCIE disable here for ACPI
systems.  And *likely* we could do the same for non-ACPI systems,
though I have no experience there.

Bjorn
Lukas Wunner Jan. 11, 2022, 9:55 p.m. UTC | #2
On Tue, Jan 11, 2022 at 12:55:38PM -0600, Bjorn Helgaas wrote:
> On Fri, Jan 07, 2022 at 11:22:49AM +0800, Yao Hongbo wrote:
> > According to the PCIe specification Revision 5.0, section
> > 7.5.3.11 (slot Status Register), if Command Complete notification

Hm, I only have the PCIe r4.0 Base Spec available and the section
number is 7.8.11 there.  Is 7.5.3.11 really correct for the r5.0 spec?


> > However, before probing the pcie hotplug service, there needs to set
> > HPIE bit in the slot ctrl register to disable hotplug interrupts,
> > and there is no wait currently.

There is a similar issue on resume.  The PCI core writes to the
Slot Control register in...

  pci_pm_resume_noirq()
    pci_pm_default_resume_early()
      pci_restore_state()
        pci_restore_pcie_state()

...and to ensure that the proper Command Completed dance is performed,
we do this in pciehp_resume_noirq() and pciehp_runtime_resume():

	/* pci_restore_state() just wrote to the Slot Control register */
	ctrl->cmd_started = jiffies;
	ctrl->cmd_busy = true;

That was introduced by 469e764c4a3c.

I'd prefer that we do the same in pciehp_probe() (or pcie_init())
if we come to the conlusion that we want to continue disabling HPIE
and CCIE in get_port_device_capability().  It's a much simpler and
smaller solution than the one proposed in the present patch.


> But now, on ACPI systems, we only clear HPIE and CCIE here if we *do*
> have the hotplug driver (because host->native_pcie_hotplug only
> remains set if we have been granted control via _OSC, and we only
> request control when CONFIG_HOTPLUG_PCI_PCIE is enabled).  On these
> systems, we should be able to remove this disable code because pciehp
> will do whatever it needs.

I guess an argument could be made that even though CONFIG_HOTPLUG_PCI_PCIE
is enabled, pciehp may fail to probe, e.g. because the call to kzalloc()
in pcie_init() returns NULL due to insufficient memory.  Then the interrupt
would remain enabled if the BIOS neglected to disable it.  That could be
accounted for by rearranging pciehp such that the interrupt is disabled
first thing on probe.  Of course that approach doesn't solve the problem
if CONFIG_HOTPLUG_PCI_PCIE is disabled and we'd still have to disable
the interrupt in get_port_device_capability() in that case.

I think it generally makes sense to assume the worst and compensate for
broken BIOSes.

Hotplug and PME share the same MSI or MSI-X vector per section 6.7.3.4
of the PCIe Base Spec, so enabled but unhandled hotplug interrupts will
cause the PME interrupt handler to run.  That's probably not fatal,
but still undesirable.

Thanks,

Lukas
Yao Hongbo Jan. 12, 2022, 7:33 a.m. UTC | #3
在 2022/1/12 上午2:55, Bjorn Helgaas 写道:
> [+cc Lukas, Rafael (in case you have any recollection of 2bd50dd800b5)]
> 
> On Fri, Jan 07, 2022 at 11:22:49AM +0800, Yao Hongbo wrote:
>> According to the PCIe specification Revision 5.0, section
>> 7.5.3.11 (slot Status Register), if Command Complete notification
>> is supported,  a write to the slot control register needs to set
>> the command completed bit, which can indicate the controller is
>> ready to receive the next command.
>>
>> However, before probing the pcie hotplug service, there needs to set
>> HPIE bit in the slot ctrl register to disable hotplug interrupts,
>> and there is no wait currently.
>>
>> The interval between the two functions get_port_device_capability() and
>> pcie_disable_notification() is not long, which may cause the latter to
>> be interfered by the former.
>>
>> The command complete event received by pcie_disable_notification() may
>> belong to the operation of get_port_device_capability().
> 
> Yes, looks like a potential problem.
> 
>> Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
>> Signed-off-by: Yao Hongbo <yaohongbo@linux.alibaba.com>
>> ---
>>  drivers/pci/pcie/portdrv_core.c | 40 ++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 38 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
>> index bda6308..ec2088b6e 100644
>> --- a/drivers/pci/pcie/portdrv_core.c
>> +++ b/drivers/pci/pcie/portdrv_core.c
>> @@ -15,6 +15,7 @@
>>  #include <linux/string.h>
>>  #include <linux/slab.h>
>>  #include <linux/aer.h>
>> +#include <linux/delay.h>
>>  
>>  #include "../pci.h"
>>  #include "portdrv.h"
>> @@ -190,6 +191,42 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
>>  	return 0;
>>  }
>>  
>> +static void pcie_port_disable_hp_interrupt(struct pci_dev *dev)
>> +{
>> +	u16 slot_status;
>> +	u32 slot_cap;
>> +	int timeout = 1000;
>> +
>> +	pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
>> +			PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
>> +
>> +	/*
>> +	 * If the command completed notification is not supported,
>> +	 * we don't need to wait after writing to the slot ctrl register.
>> +	 */
>> +	pcie_capability_read_dword(dev, PCI_EXP_SLTCAP, &slot_cap);
>> +	if (slot_cap & PCI_EXP_SLTCAP_NCCS)
>> +		return;
>> +
>> +	do {
>> +		pcie_capability_read_word(dev, PCI_EXP_SLTSTA, &slot_status);
>> +		if (slot_status == (u16) ~0) {
>> +			pci_info(dev, "%s: no response from device\n",  __func__);
>> +			return;
>> +		}
>> +
>> +		if (slot_status & PCI_EXP_SLTSTA_CC) {
>> +			pcie_capability_write_word(dev, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
>> +			return;
>> +		}
>> +
>> +		msleep(10);
>> +		timeout -= 10;
>> +	} while (timeout >= 0);
>> +
>> +	pci_info(dev, "Timeout on hotplug disable interrupt!\n");
>> +}
>> +
>>  /**
>>   * get_port_device_capability - discover capabilities of a PCI Express port
>>   * @dev: PCI Express port to examine
>> @@ -213,8 +250,7 @@ static int get_port_device_capability(struct pci_dev *dev)
>>  		 * Disable hot-plug interrupts in case they have been enabled
>>  		 * by the BIOS and the hot-plug service driver is not loaded.
>>  		 */
>> -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
>> -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
>> +		pcie_port_disable_hp_interrupt(dev);
> 
> This originally came from 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port
> services during port initialization"), where we disable hotplug
> interrupts in case the hotplug driver is not available.
> 
> In general, I think the OS should not be responsible for disabling
> interrupts for feature X.  The OS may predate feature X and may not
> know anything about X at all.  The power-on default for interrupts
> related to X should be "disabled" (as it is for HPIE and CCIE), and if
> firmware enables them, it should disable them or arrange to handle
> them itself before handing off to the OS.
> 
> I don't know whether 2bd50dd800b5 was prompted by spurious hotplug
> interrupts or not.  If it was, I think we were seeing a firmware
> defect or possibly a pciehp initialization issue.
> 
> At the time of 2bd50dd800b5, we always cleared HPIE and CCIE here.
> 
> But now, on ACPI systems, we only clear HPIE and CCIE here if we *do*
> have the hotplug driver (because host->native_pcie_hotplug only
> remains set if we have been granted control via _OSC, and we only
> request control when CONFIG_HOTPLUG_PCI_PCIE is enabled).  On these
> systems, we should be able to remove this disable code because pciehp
> will do whatever it needs.
> 
> For non-ACPI systems, bridge->native_pcie_hotplug will always be set,
> so we will clear HPIE and CCIE here and then (if
> CONFIG_HOTPLUG_PCI_PCIE is enabled) initialize pciehp soon after,
> which may be a problem as you describe.
> 
> What kind of system are you seeing the problem on?  It seems like it
> should be safe to drop the HPIE and CCIE disable here for ACPI
> systems.  And *likely* we could do the same for non-ACPI systems,
> though I have no experience there.

Hi, Bjorn
Thanks for your comments.

The problem occurs on ACPI systems.

[    7.739359] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
[    7.749238] acpi PNP0A08:00: _OSC: platform does not support [SHPCHotplug AER LTR DPC]
[    7.757609] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability]


We clear HPIE and CCIE here because the firmware doesn't control Hotplug via __OSC.

And on ACPI systems, we can also set pcie_ports=native, which will also encounter such problems.



> Bjorn
Bjorn Helgaas Jan. 12, 2022, 6:01 p.m. UTC | #4
On Wed, Jan 12, 2022 at 03:33:25PM +0800, Yao Hongbo wrote:
> 
> 
> 在 2022/1/12 上午2:55, Bjorn Helgaas 写道:
> > [+cc Lukas, Rafael (in case you have any recollection of 2bd50dd800b5)]
> > 
> > On Fri, Jan 07, 2022 at 11:22:49AM +0800, Yao Hongbo wrote:
> >> According to the PCIe specification Revision 5.0, section
> >> 7.5.3.11 (slot Status Register), if Command Complete notification
> >> is supported,  a write to the slot control register needs to set
> >> the command completed bit, which can indicate the controller is
> >> ready to receive the next command.
> >>
> >> However, before probing the pcie hotplug service, there needs to set
> >> HPIE bit in the slot ctrl register to disable hotplug interrupts,
> >> and there is no wait currently.
> >>
> >> The interval between the two functions get_port_device_capability() and
> >> pcie_disable_notification() is not long, which may cause the latter to
> >> be interfered by the former.
> >>
> >> The command complete event received by pcie_disable_notification() may
> >> belong to the operation of get_port_device_capability().
> > 
> > Yes, looks like a potential problem.
> > 
> >> Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
> >> Signed-off-by: Yao Hongbo <yaohongbo@linux.alibaba.com>
> >> ---
> >>  drivers/pci/pcie/portdrv_core.c | 40 ++++++++++++++++++++++++++++++++++++++--
> >>  1 file changed, 38 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> >> index bda6308..ec2088b6e 100644
> >> --- a/drivers/pci/pcie/portdrv_core.c
> >> +++ b/drivers/pci/pcie/portdrv_core.c
> >> @@ -15,6 +15,7 @@
> >>  #include <linux/string.h>
> >>  #include <linux/slab.h>
> >>  #include <linux/aer.h>
> >> +#include <linux/delay.h>
> >>  
> >>  #include "../pci.h"
> >>  #include "portdrv.h"
> >> @@ -190,6 +191,42 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
> >>  	return 0;
> >>  }
> >>  
> >> +static void pcie_port_disable_hp_interrupt(struct pci_dev *dev)
> >> +{
> >> +	u16 slot_status;
> >> +	u32 slot_cap;
> >> +	int timeout = 1000;
> >> +
> >> +	pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> >> +			PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> >> +
> >> +	/*
> >> +	 * If the command completed notification is not supported,
> >> +	 * we don't need to wait after writing to the slot ctrl register.
> >> +	 */
> >> +	pcie_capability_read_dword(dev, PCI_EXP_SLTCAP, &slot_cap);
> >> +	if (slot_cap & PCI_EXP_SLTCAP_NCCS)
> >> +		return;
> >> +
> >> +	do {
> >> +		pcie_capability_read_word(dev, PCI_EXP_SLTSTA, &slot_status);
> >> +		if (slot_status == (u16) ~0) {
> >> +			pci_info(dev, "%s: no response from device\n",  __func__);
> >> +			return;
> >> +		}
> >> +
> >> +		if (slot_status & PCI_EXP_SLTSTA_CC) {
> >> +			pcie_capability_write_word(dev, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
> >> +			return;
> >> +		}
> >> +
> >> +		msleep(10);
> >> +		timeout -= 10;
> >> +	} while (timeout >= 0);
> >> +
> >> +	pci_info(dev, "Timeout on hotplug disable interrupt!\n");
> >> +}
> >> +
> >>  /**
> >>   * get_port_device_capability - discover capabilities of a PCI Express port
> >>   * @dev: PCI Express port to examine
> >> @@ -213,8 +250,7 @@ static int get_port_device_capability(struct pci_dev *dev)
> >>  		 * Disable hot-plug interrupts in case they have been enabled
> >>  		 * by the BIOS and the hot-plug service driver is not loaded.
> >>  		 */
> >> -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> >> -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> >> +		pcie_port_disable_hp_interrupt(dev);
> > 
> > This originally came from 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port
> > services during port initialization"), where we disable hotplug
> > interrupts in case the hotplug driver is not available.
> > 
> > In general, I think the OS should not be responsible for disabling
> > interrupts for feature X.  The OS may predate feature X and may not
> > know anything about X at all.  The power-on default for interrupts
> > related to X should be "disabled" (as it is for HPIE and CCIE), and if
> > firmware enables them, it should disable them or arrange to handle
> > them itself before handing off to the OS.
> > 
> > I don't know whether 2bd50dd800b5 was prompted by spurious hotplug
> > interrupts or not.  If it was, I think we were seeing a firmware
> > defect or possibly a pciehp initialization issue.
> > 
> > At the time of 2bd50dd800b5, we always cleared HPIE and CCIE here.
> > 
> > But now, on ACPI systems, we only clear HPIE and CCIE here if we *do*
> > have the hotplug driver (because host->native_pcie_hotplug only
> > remains set if we have been granted control via _OSC, and we only
> > request control when CONFIG_HOTPLUG_PCI_PCIE is enabled).  On these
> > systems, we should be able to remove this disable code because pciehp
> > will do whatever it needs.
> > 
> > For non-ACPI systems, bridge->native_pcie_hotplug will always be set,
> > so we will clear HPIE and CCIE here and then (if
> > CONFIG_HOTPLUG_PCI_PCIE is enabled) initialize pciehp soon after,
> > which may be a problem as you describe.
> > 
> > What kind of system are you seeing the problem on?  It seems like it
> > should be safe to drop the HPIE and CCIE disable here for ACPI
> > systems.  And *likely* we could do the same for non-ACPI systems,
> > though I have no experience there.
> 
> Hi, Bjorn
> Thanks for your comments.
> 
> The problem occurs on ACPI systems.
> 
>  acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
>  acpi PNP0A08:00: _OSC: platform does not support [SHPCHotplug AER LTR DPC]
>  acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability]
> 
> We clear HPIE and CCIE here because the firmware doesn't control
> Hotplug via __OSC.
> 
> And on ACPI systems, we can also set pcie_ports=native, which will
> also encounter such problems.

What happens if you just drop that call like the patch below?

If that avoids the problem, then we can talk about whether we need to
worry about broken firmware in the non-ACPI or "pcie_ports=native"
cases.

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index bda630889f95..76a3bd237bf9 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -208,13 +208,6 @@ static int get_port_device_capability(struct pci_dev *dev)
 	if (dev->is_hotplug_bridge &&
 	    (pcie_ports_native || host->native_pcie_hotplug)) {
 		services |= PCIE_PORT_SERVICE_HP;
-
-		/*
-		 * Disable hot-plug interrupts in case they have been enabled
-		 * by the BIOS and the hot-plug service driver is not loaded.
-		 */
-		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
-			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
 	}
 
 #ifdef CONFIG_PCIEAER
Yao Hongbo Jan. 13, 2022, 7:34 a.m. UTC | #5
在 2022/1/13 上午2:01, Bjorn Helgaas 写道:
> On Wed, Jan 12, 2022 at 03:33:25PM +0800, Yao Hongbo wrote:
>>
>>
>> 在 2022/1/12 上午2:55, Bjorn Helgaas 写道:
>>> [+cc Lukas, Rafael (in case you have any recollection of 2bd50dd800b5)]
>>>
>>> On Fri, Jan 07, 2022 at 11:22:49AM +0800, Yao Hongbo wrote:
>>>> According to the PCIe specification Revision 5.0, section
>>>> 7.5.3.11 (slot Status Register), if Command Complete notification
>>>> is supported,  a write to the slot control register needs to set
>>>> the command completed bit, which can indicate the controller is
>>>> ready to receive the next command.
>>>>
>>>> However, before probing the pcie hotplug service, there needs to set
>>>> HPIE bit in the slot ctrl register to disable hotplug interrupts,
>>>> and there is no wait currently.
>>>>
>>>> The interval between the two functions get_port_device_capability() and
>>>> pcie_disable_notification() is not long, which may cause the latter to
>>>> be interfered by the former.
>>>>
>>>> The command complete event received by pcie_disable_notification() may
>>>> belong to the operation of get_port_device_capability().
>>>
>>> Yes, looks like a potential problem.
>>>
>>>> Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
>>>> Signed-off-by: Yao Hongbo <yaohongbo@linux.alibaba.com>
>>>> ---
>>>>  drivers/pci/pcie/portdrv_core.c | 40 ++++++++++++++++++++++++++++++++++++++--
>>>>  1 file changed, 38 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
>>>> index bda6308..ec2088b6e 100644
>>>> --- a/drivers/pci/pcie/portdrv_core.c
>>>> +++ b/drivers/pci/pcie/portdrv_core.c
>>>> @@ -15,6 +15,7 @@
>>>>  #include <linux/string.h>
>>>>  #include <linux/slab.h>
>>>>  #include <linux/aer.h>
>>>> +#include <linux/delay.h>
>>>>  
>>>>  #include "../pci.h"
>>>>  #include "portdrv.h"
>>>> @@ -190,6 +191,42 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
>>>>  	return 0;
>>>>  }
>>>>  
>>>> +static void pcie_port_disable_hp_interrupt(struct pci_dev *dev)
>>>> +{
>>>> +	u16 slot_status;
>>>> +	u32 slot_cap;
>>>> +	int timeout = 1000;
>>>> +
>>>> +	pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
>>>> +			PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
>>>> +
>>>> +	/*
>>>> +	 * If the command completed notification is not supported,
>>>> +	 * we don't need to wait after writing to the slot ctrl register.
>>>> +	 */
>>>> +	pcie_capability_read_dword(dev, PCI_EXP_SLTCAP, &slot_cap);
>>>> +	if (slot_cap & PCI_EXP_SLTCAP_NCCS)
>>>> +		return;
>>>> +
>>>> +	do {
>>>> +		pcie_capability_read_word(dev, PCI_EXP_SLTSTA, &slot_status);
>>>> +		if (slot_status == (u16) ~0) {
>>>> +			pci_info(dev, "%s: no response from device\n",  __func__);
>>>> +			return;
>>>> +		}
>>>> +
>>>> +		if (slot_status & PCI_EXP_SLTSTA_CC) {
>>>> +			pcie_capability_write_word(dev, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
>>>> +			return;
>>>> +		}
>>>> +
>>>> +		msleep(10);
>>>> +		timeout -= 10;
>>>> +	} while (timeout >= 0);
>>>> +
>>>> +	pci_info(dev, "Timeout on hotplug disable interrupt!\n");
>>>> +}
>>>> +
>>>>  /**
>>>>   * get_port_device_capability - discover capabilities of a PCI Express port
>>>>   * @dev: PCI Express port to examine
>>>> @@ -213,8 +250,7 @@ static int get_port_device_capability(struct pci_dev *dev)
>>>>  		 * Disable hot-plug interrupts in case they have been enabled
>>>>  		 * by the BIOS and the hot-plug service driver is not loaded.
>>>>  		 */
>>>> -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
>>>> -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
>>>> +		pcie_port_disable_hp_interrupt(dev);
>>>
>>> This originally came from 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port
>>> services during port initialization"), where we disable hotplug
>>> interrupts in case the hotplug driver is not available.
>>>
>>> In general, I think the OS should not be responsible for disabling
>>> interrupts for feature X.  The OS may predate feature X and may not
>>> know anything about X at all.  The power-on default for interrupts
>>> related to X should be "disabled" (as it is for HPIE and CCIE), and if
>>> firmware enables them, it should disable them or arrange to handle
>>> them itself before handing off to the OS.
>>>
>>> I don't know whether 2bd50dd800b5 was prompted by spurious hotplug
>>> interrupts or not.  If it was, I think we were seeing a firmware
>>> defect or possibly a pciehp initialization issue.
>>>
>>> At the time of 2bd50dd800b5, we always cleared HPIE and CCIE here.
>>>
>>> But now, on ACPI systems, we only clear HPIE and CCIE here if we *do*
>>> have the hotplug driver (because host->native_pcie_hotplug only
>>> remains set if we have been granted control via _OSC, and we only
>>> request control when CONFIG_HOTPLUG_PCI_PCIE is enabled).  On these
>>> systems, we should be able to remove this disable code because pciehp
>>> will do whatever it needs.
>>>
>>> For non-ACPI systems, bridge->native_pcie_hotplug will always be set,
>>> so we will clear HPIE and CCIE here and then (if
>>> CONFIG_HOTPLUG_PCI_PCIE is enabled) initialize pciehp soon after,
>>> which may be a problem as you describe.
>>>
>>> What kind of system are you seeing the problem on?  It seems like it
>>> should be safe to drop the HPIE and CCIE disable here for ACPI
>>> systems.  And *likely* we could do the same for non-ACPI systems,
>>> though I have no experience there.
>>
>> Hi, Bjorn
>> Thanks for your comments.
>>
>> The problem occurs on ACPI systems.
>>
>>  acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
>>  acpi PNP0A08:00: _OSC: platform does not support [SHPCHotplug AER LTR DPC]
>>  acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability]
>>
>> We clear HPIE and CCIE here because the firmware doesn't control
>> Hotplug via __OSC.
>>
>> And on ACPI systems, we can also set pcie_ports=native, which will
>> also encounter such problems.
> 
> What happens if you just drop that call like the patch below?
> 
> If that avoids the problem, then we can talk about whether we need to
> worry about broken firmware in the non-ACPI or "pcie_ports=native"
> cases.

Hi, Bjorn.
This can avoid the problem currently.

But i'm not sure if removing this code will introduce other problems,
such as suprious hotplug before probing hotplug service.

Thanks,
Hongbo.


> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index bda630889f95..76a3bd237bf9 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -208,13 +208,6 @@ static int get_port_device_capability(struct pci_dev *dev)
>  	if (dev->is_hotplug_bridge &&
>  	    (pcie_ports_native || host->native_pcie_hotplug)) {
>  		services |= PCIE_PORT_SERVICE_HP;
> -
> -		/*
> -		 * Disable hot-plug interrupts in case they have been enabled
> -		 * by the BIOS and the hot-plug service driver is not loaded.
> -		 */
> -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
>  	}
>  
>  #ifdef CONFIG_PCIEAER
Bjorn Helgaas Jan. 13, 2022, 2:58 p.m. UTC | #6
On Thu, Jan 13, 2022 at 03:34:41PM +0800, Yao Hongbo wrote:
> 在 2022/1/13 上午2:01, Bjorn Helgaas 写道:
> > On Wed, Jan 12, 2022 at 03:33:25PM +0800, Yao Hongbo wrote:
> >> 在 2022/1/12 上午2:55, Bjorn Helgaas 写道:

> >>> What kind of system are you seeing the problem on?  It seems like it
> >>> should be safe to drop the HPIE and CCIE disable here for ACPI
> >>> systems.  And *likely* we could do the same for non-ACPI systems,
> >>> though I have no experience there.
> >>
> >> Hi, Bjorn
> >> Thanks for your comments.
> >>
> >> The problem occurs on ACPI systems.
> >>
> >>  acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
> >>  acpi PNP0A08:00: _OSC: platform does not support [SHPCHotplug AER LTR DPC]
> >>  acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability]
> >>
> >> We clear HPIE and CCIE here because the firmware doesn't control
> >> Hotplug via __OSC.
> >>
> >> And on ACPI systems, we can also set pcie_ports=native, which will
> >> also encounter such problems.
> > 
> > What happens if you just drop that call like the patch below?
> > 
> > If that avoids the problem, then we can talk about whether we need to
> > worry about broken firmware in the non-ACPI or "pcie_ports=native"
> > cases.
> 
> Hi, Bjorn.
> This can avoid the problem currently.

Thanks for collecting that data!

> But i'm not sure if removing this code will introduce other problems,
> such as suprious hotplug before probing hotplug service.

I don't think spurious hotplug before probing the hotplug service is
an issue because there is always a window like this.  It's either
between:

  A: Handoff from firmware to OS
  B: get_port_device_capability()

or between:

  A: Handoff from firmware to OS
  C: pciehp initialization

Either way there is *some* window, and we have to decide how to deal
with it.  The exact duration is not very important.

Bjorn
diff mbox series

Patch

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index bda6308..ec2088b6e 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -15,6 +15,7 @@ 
 #include <linux/string.h>
 #include <linux/slab.h>
 #include <linux/aer.h>
+#include <linux/delay.h>
 
 #include "../pci.h"
 #include "portdrv.h"
@@ -190,6 +191,42 @@  static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
 	return 0;
 }
 
+static void pcie_port_disable_hp_interrupt(struct pci_dev *dev)
+{
+	u16 slot_status;
+	u32 slot_cap;
+	int timeout = 1000;
+
+	pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
+			PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
+
+	/*
+	 * If the command completed notification is not supported,
+	 * we don't need to wait after writing to the slot ctrl register.
+	 */
+	pcie_capability_read_dword(dev, PCI_EXP_SLTCAP, &slot_cap);
+	if (slot_cap & PCI_EXP_SLTCAP_NCCS)
+		return;
+
+	do {
+		pcie_capability_read_word(dev, PCI_EXP_SLTSTA, &slot_status);
+		if (slot_status == (u16) ~0) {
+			pci_info(dev, "%s: no response from device\n",  __func__);
+			return;
+		}
+
+		if (slot_status & PCI_EXP_SLTSTA_CC) {
+			pcie_capability_write_word(dev, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
+			return;
+		}
+
+		msleep(10);
+		timeout -= 10;
+	} while (timeout >= 0);
+
+	pci_info(dev, "Timeout on hotplug disable interrupt!\n");
+}
+
 /**
  * get_port_device_capability - discover capabilities of a PCI Express port
  * @dev: PCI Express port to examine
@@ -213,8 +250,7 @@  static int get_port_device_capability(struct pci_dev *dev)
 		 * Disable hot-plug interrupts in case they have been enabled
 		 * by the BIOS and the hot-plug service driver is not loaded.
 		 */
-		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
-			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
+		pcie_port_disable_hp_interrupt(dev);
 	}
 
 #ifdef CONFIG_PCIEAER