diff mbox

[RFC] PCI/portdrv: Fix switch devctrl error report enable.

Message ID 1507885320-122365-1-git-send-email-liudongdong3@huawei.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Dongdong Liu Oct. 13, 2017, 9:02 a.m. UTC
Current code has a bug, switch upstream/downstream port error report
is disabled.

DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
-[0000:00]-+-00.0-[01]--
           \-08.0-[02-07]----00.0-[03-07]--+-04.0-[04]--
                                           +-05.0-[05]--
                                           +-08.0-[06]--
                                           \-09.0-[07]--
portdrv_pci.c
pcie_portdrv_probe
	--->pcie_port_device_register
		--->get_port_device_capability
			--->pci_disable_pcie_error_reporting

aerdrv.c
aer_probe
	--->aer_enable_rootport
		--->set_downstream_devices_error_reporting

But current code RP AER driver probe is before switch UP/DP portdrv probe.
This is a RFC PATCH for discussing.

Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
---
 drivers/pci/pcie/portdrv_core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas Oct. 20, 2017, 7:38 p.m. UTC | #1
On Fri, Oct 13, 2017 at 05:02:00PM +0800, Dongdong Liu wrote:
> Current code has a bug, switch upstream/downstream port error report
> is disabled.
> 
> DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> -[0000:00]-+-00.0-[01]--
>            \-08.0-[02-07]----00.0-[03-07]--+-04.0-[04]--
>                                            +-05.0-[05]--
>                                            +-08.0-[06]--
>                                            \-09.0-[07]--
> portdrv_pci.c
> pcie_portdrv_probe
> 	--->pcie_port_device_register
> 		--->get_port_device_capability
> 			--->pci_disable_pcie_error_reporting
> 
> aerdrv.c
> aer_probe
> 	--->aer_enable_rootport
> 		--->set_downstream_devices_error_reporting
> 
> But current code RP AER driver probe is before switch UP/DP portdrv probe.
> This is a RFC PATCH for discussing.

OK, I think I see the issue you're pointing out.  aer_probe() only
binds to root ports (".port_type = PCI_EXP_TYPE_ROOT_PORT" in the
aerdriver struct).

We call aer_probe() for a root port, and it enables AER reporting for
the root port and any downstream devices:

  aer_probe(root port)                        # only binds to root ports
    aer_enable_rootport
      set_downstream_devices_error_reporting(root, true)
        set_device_error_reporting(root, true)
          pci_enable_pcie_error_reporting
            pcie_capability_set_word(root, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS)
        pci_walk_bus(root->subordinate set_device_error_reporting, true)
          set_device_error_reporting(dev, true)
            pci_enable_pcie_error_reporting
              pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS)

We later call pcie_portdrv_probe() for every downstream bridge (it
matches PCI_CLASS_BRIDGE_PCI devices, then discards any non-PCIe
devices), and it *disables* AER reporting:

  pcie_portdrv_probe(switch port)
    pcie_port_device_register
      get_port_device_capability
        pci_disable_pcie_error_reporting
          pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS)

The result is that we first enable AER for the downstream switch
ports, then we disable it again.

I agree, that looks like a problem.  I think aer_enable_rootport() is
broken because it binds to one device and it also configures other
downstream devices.  That opens the door to concurrency issues and
makes it really hard to ensure that hotplug works correctly.

I think the aer_probe() path should only touch the device it is
binding, i.e., it should not use pci_walk_bus().  If we need to
configure another device, that should be done in the enumeration path
for *that device*.

> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
> ---
>  drivers/pci/pcie/portdrv_core.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 313a21d..6bb89ce 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -261,7 +261,9 @@ static int get_port_device_capability(struct pci_dev *dev)
>  		 * Disable AER on this port in case it's been enabled by the
>  		 * BIOS (the AER service driver will enable it when necessary).
>  		 */
> -		pci_disable_pcie_error_reporting(dev);
> +		if((pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
> +		   (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM))
> +			pci_disable_pcie_error_reporting(dev);
>  	}
>  	/* VC support */
>  	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_VC))
> -- 
> 1.9.1
>
Dongdong Liu Oct. 24, 2017, 8:29 a.m. UTC | #2
Hi Bjorn

Many thanks for your review.

在 2017/10/21 3:38, Bjorn Helgaas 写道:
> On Fri, Oct 13, 2017 at 05:02:00PM +0800, Dongdong Liu wrote:
>> Current code has a bug, switch upstream/downstream port error report
>> is disabled.
>>
>> DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>> -[0000:00]-+-00.0-[01]--
>>            \-08.0-[02-07]----00.0-[03-07]--+-04.0-[04]--
>>                                            +-05.0-[05]--
>>                                            +-08.0-[06]--
>>                                            \-09.0-[07]--
>> portdrv_pci.c
>> pcie_portdrv_probe
>> 	--->pcie_port_device_register
>> 		--->get_port_device_capability
>> 			--->pci_disable_pcie_error_reporting
>>
>> aerdrv.c
>> aer_probe
>> 	--->aer_enable_rootport
>> 		--->set_downstream_devices_error_reporting
>>
>> But current code RP AER driver probe is before switch UP/DP portdrv probe.
>> This is a RFC PATCH for discussing.
>
> OK, I think I see the issue you're pointing out.  aer_probe() only
> binds to root ports (".port_type = PCI_EXP_TYPE_ROOT_PORT" in the
> aerdriver struct).
>
> We call aer_probe() for a root port, and it enables AER reporting for
> the root port and any downstream devices:
>
>   aer_probe(root port)                        # only binds to root ports
>     aer_enable_rootport
>       set_downstream_devices_error_reporting(root, true)
>         set_device_error_reporting(root, true)
>           pci_enable_pcie_error_reporting
>             pcie_capability_set_word(root, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS)
>         pci_walk_bus(root->subordinate set_device_error_reporting, true)
>           set_device_error_reporting(dev, true)
>             pci_enable_pcie_error_reporting
>               pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS)
>
> We later call pcie_portdrv_probe() for every downstream bridge (it
> matches PCI_CLASS_BRIDGE_PCI devices, then discards any non-PCIe
> devices), and it *disables* AER reporting:
>
>   pcie_portdrv_probe(switch port)
>     pcie_port_device_register
>       get_port_device_capability
>         pci_disable_pcie_error_reporting
>           pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS)
>
> The result is that we first enable AER for the downstream switch
> ports, then we disable it again.

Yes, that is the reason why upstream/downstream port error report
is disabled.

>
> I agree, that looks like a problem.  I think aer_enable_rootport() is
> broken because it binds to one device and it also configures other
> downstream devices.  That opens the door to concurrency issues and
> makes it really hard to ensure that hotplug works correctly.
>
> I think the aer_probe() path should only touch the device it is
> binding, i.e., it should not use pci_walk_bus().  If we need to
> configure another device, that should be done in the enumeration path
> for *that device*.

It seems that ACPI _HPX can set PCI_EXP_DEVCTL to enable error report
and ensure that hotplug works correctly.
But we still need the RFC PATCH as the pcie_portdrv_probe() is after _HPX setting.
OK, I will also modify aer_probe() code to make sure that the aer_probe() path
should only touch the device it is binding in PATCH V2.

Thanks,
Dongdong

>
>> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
>> ---
>>  drivers/pci/pcie/portdrv_core.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
>> index 313a21d..6bb89ce 100644
>> --- a/drivers/pci/pcie/portdrv_core.c
>> +++ b/drivers/pci/pcie/portdrv_core.c
>> @@ -261,7 +261,9 @@ static int get_port_device_capability(struct pci_dev *dev)
>>  		 * Disable AER on this port in case it's been enabled by the
>>  		 * BIOS (the AER service driver will enable it when necessary).
>>  		 */
>> -		pci_disable_pcie_error_reporting(dev);
>> +		if((pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
>> +		   (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM))
>> +			pci_disable_pcie_error_reporting(dev);
>>  	}
>>  	/* VC support */
>>  	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_VC))
>> --
>> 1.9.1
>>
>
> .
>
diff mbox

Patch

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 313a21d..6bb89ce 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -261,7 +261,9 @@  static int get_port_device_capability(struct pci_dev *dev)
 		 * Disable AER on this port in case it's been enabled by the
 		 * BIOS (the AER service driver will enable it when necessary).
 		 */
-		pci_disable_pcie_error_reporting(dev);
+		if((pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
+		   (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM))
+			pci_disable_pcie_error_reporting(dev);
 	}
 	/* VC support */
 	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_VC))