diff mbox series

[v2] PCI/AER: Add support for reset of RCiEPs for APEI/Firmware first reporting only

Message ID 20200622114402.892798-1-Jonathan.Cameron@huawei.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show
Series [v2] PCI/AER: Add support for reset of RCiEPs for APEI/Firmware first reporting only | expand

Commit Message

Jonathan Cameron June 22, 2020, 11:44 a.m. UTC
Was previously: PCI/AER: Add partial initial supprot for RCiEPs using RCEC or
firmware first.

Currently the kernel does not handle AER errors for Root Complex integrated
End Points (RCiEPs)[0].  These devices sit on a root bus within the Root Complex
(RC).  AER handling is performed by a Root Complex Event Collector (RCEC) [1]
which is a effectively a type of RCiEP on the same root bus.

This code will only perform the correct reset flow for the case where there
is no need to take any actions on the RCEC because the firmware is
responsible for them.   This is true where APEI [2] is used to report the AER
errors via a GHES[v2] HEST entry [3] and relevant AER CPER record [4] and Firmware
First handling is in use.

As there is no current RCEC driver support, it should not be possible to get
to this code via any routes other than the one above. Hence appropriate RCEC
handling can be added when the RCEC driver support is ready.

The error handling is different from a normal PCIe End Point because:

1) There is no downstream port above an RCiEP as these devices sit on a root
   bus.

2) In general, it makes little sense to reset other devices on on the same
   root bus.  For error handling outside the of the root complex (RC) an AER
   error will indicate that all the topology below the physical link, which
   the error is related to, will need to be reset as they share a common
   path to the host.  For an RCiEP there is no such defined shared path
   relationship with other elements on the root bus.

A new walk function, similar to pci_bus_walk is provided that takes a pci_dev
instead of a bus.  If that dev corresponds to a downstream port it will walk
the subordinate bus of that downstream port.  If the dev does not then it
will call the function on that device alone.   This function allows us to
avoid adding special cases to the majority of the error handling.

Open questions:

1. Are we better protecting against link reset for an RCiEP in here or
   should we put the check in the link reset functions?

2. If we were to get a stupid firmware record with the relevant reset flag
   set to trigger a link reset, what is the correct response?  For now I
   try to report that we haven't done anything and print a warning.

3. Naming of pci_walk_below_dev is rather unsatisfying. Any better ideas?

4. pci_walk_below_dev is perhaps not of general utility. Shall I make it local
   in err.c?  If not would a precursor patch for that be preferred?

Testing has been performed via error injection on a QEMU platform as that lets
me create a wide range of topologies and report errors at any chosen location.
Currently I have no plans to upstream this injection support, but am happy to
share if useful to others.

[0] ACPI PCI Express Base Specification 4.0 1.3.2.3 Root Complex Integrated
    Endpoint Rules.
[1] ACPI PCI Express Base Specification 4.0 6.2 Error Signalling and Logging
[2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface (APEI)
[3] ACPI Sepcification 6.3 18.2.3.7 Generic Hardware Error Source
[4] UEFI Specification 2.8, N.2.7 PCI Express Error Section

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
Changes since v1:
* Separated from the largely unrelated fix so the two can move forwards separately.
* Instead of separate path for RCiEP handling use the method suggested by Bjorn
  and Sathyanarayanan with an adjusted pci_bus_walk.

Thanks all for reviews of V1.
  
 drivers/pci/bus.c      | 28 ++++++++++++++++++++++++++++
 drivers/pci/pcie/err.c | 29 +++++++++++++++++++----------
 include/linux/pci.h    |  2 ++
 3 files changed, 49 insertions(+), 10 deletions(-)

Comments

Kuppuswamy Sathyanarayanan June 26, 2020, 4:18 p.m. UTC | #1
Hi, Jonathan,

On 6/22/20 4:44 AM, Jonathan Cameron wrote:
> Was previously: PCI/AER: Add partial initial supprot for RCiEPs using RCEC or
> firmware first.
> 
> Currently the kernel does not handle AER errors for Root Complex integrated
> End Points (RCiEPs)[0].
This patch makes pcie_do_recovery() aware of RCEC and RCiEP. But, did
you add support to receive IRQ for error detected in RCEC?
   These devices sit on a root bus within the Root Complex
> (RC).  AER handling is performed by a Root Complex Event Collector (RCEC) [1]
> which is a effectively a type of RCiEP on the same root bus.
> 
> This code will only perform the correct reset flow for the case where there
> is no need to take any actions on the RCEC because the firmware is
> responsible for them.   This is true where APEI [2] is used to report the AER
> errors via a GHES[v2] HEST entry [3] and relevant AER CPER record [4] and Firmware
> First handling is in use.
> 
> As there is no current RCEC driver support, it should not be possible to get
> to this code via any routes other than the one above. Hence appropriate RCEC
> handling can be added when the RCEC driver support is ready.
> 
> The error handling is different from a normal PCIe End Point because:
> 
> 1) There is no downstream port above an RCiEP as these devices sit on a root
>     bus.
> 
> 2) In general, it makes little sense to reset other devices on on the same
>     root bus.  For error handling outside the of the root complex (RC) an AER
>     error will indicate that all the topology below the physical link, which
>     the error is related to, will need to be reset as they share a common
>     path to the host.  For an RCiEP there is no such defined shared path
>     relationship with other elements on the root bus.
> 
> A new walk function, similar to pci_bus_walk is provided that takes a pci_dev
> instead of a bus.  If that dev corresponds to a downstream port it will walk
> the subordinate bus of that downstream port.  If the dev does not then it
> will call the function on that device alone.   This function allows us to
> avoid adding special cases to the majority of the error handling.
> 
> Open questions:
> 
> 1. Are we better protecting against link reset for an RCiEP in here or
>     should we put the check in the link reset functions?
> 
> 2. If we were to get a stupid firmware record with the relevant reset flag
>     set to trigger a link reset, what is the correct response?  For now I
>     try to report that we haven't done anything and print a warning.
> 
> 3. Naming of pci_walk_below_dev is rather unsatisfying. Any better ideas?
> 
> 4. pci_walk_below_dev is perhaps not of general utility. Shall I make it local
>     in err.c?  If not would a precursor patch for that be preferred?
> 
> Testing has been performed via error injection on a QEMU platform as that lets
> me create a wide range of topologies and report errors at any chosen location.
> Currently I have no plans to upstream this injection support, but am happy to
> share if useful to others.
> 
> [0] ACPI PCI Express Base Specification 4.0 1.3.2.3 Root Complex Integrated
>      Endpoint Rules.
> [1] ACPI PCI Express Base Specification 4.0 6.2 Error Signalling and Logging
> [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface (APEI)
> [3] ACPI Sepcification 6.3 18.2.3.7 Generic Hardware Error Source
> [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
> 
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> Changes since v1:
> * Separated from the largely unrelated fix so the two can move forwards separately.
> * Instead of separate path for RCiEP handling use the method suggested by Bjorn
>    and Sathyanarayanan with an adjusted pci_bus_walk.
> 
> Thanks all for reviews of V1.
>    
>   drivers/pci/bus.c      | 28 ++++++++++++++++++++++++++++
>   drivers/pci/pcie/err.c | 29 +++++++++++++++++++----------
>   include/linux/pci.h    |  2 ++
>   3 files changed, 49 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 8e40b3e6da77..7cbe1ed2db3d 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -411,6 +411,34 @@ void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
>   }
>   EXPORT_SYMBOL_GPL(pci_walk_bus);
>   
> +/** pci_walk_below_dev - walk devices below (or on) another device
> + *  @dev      device for which we should walk below, include device when not a port.
> + *  @cb       callback to be called for each device found
> + *  @userdata arbitrary pointer to be passed to callback.
> + *
> + *  If the device provided is a port,
> + *  walk the subordinate bus, including any bridged devices
> + *  on buses under this bus.  Call the provided callback
> + *  on each device found.
> + *
> + *  If the device provided hs no subordinate bus, call the provided
> + *  callback on the device itself.
> + *
> + */
> +void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct pci_dev *, void *),
> +			void *userdata)
> +{
> +	struct pci_bus *bus;
> +
> +	if (dev->subordinate) {
> +		bus = dev->subordinate;
> +		pci_walk_bus(bus, cb, userdata);
> +	} else {
> +		cb(dev, userdata);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(pci_walk_below_dev);
> +
>   struct pci_bus *pci_bus_get(struct pci_bus *bus)
>   {
>   	if (bus)
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 14bb8f54723e..fa08b1cc3d96 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -151,33 +151,39 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
>   {
>   	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> -	struct pci_bus *bus;
>   
>   	/*
>   	 * Error recovery runs on all subordinates of the first downstream port.
>   	 * If the downstream port detected the error, it is cleared at the end.
> +	 * For RCiEPs we should reset just the RCiEP itself.
>   	 */
>   	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END))
>   		dev = dev->bus->self;
> -	bus = dev->subordinate;
>   
>   	pci_dbg(dev, "broadcast error_detected message\n");
>   	if (state == pci_channel_io_frozen) {
> -		pci_walk_bus(bus, report_frozen_detected, &status);
> +		pci_walk_below_dev(dev, report_frozen_detected, &status);
> +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
> +			pci_warn(dev, "link reset not possible for RCiEP\n");
> +			status = PCI_ERS_RESULT_NONE;
> +			goto failed;
> +		}
> +
>   		status = reset_link(dev);
>   		if (status != PCI_ERS_RESULT_RECOVERED) {
>   			pci_warn(dev, "link reset failed\n");
>   			goto failed;
>   		}
>   	} else {
> -		pci_walk_bus(bus, report_normal_detected, &status);
> +		pci_walk_below_dev(dev, report_normal_detected, &status);
>   	}
>   
>   	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>   		status = PCI_ERS_RESULT_RECOVERED;
>   		pci_dbg(dev, "broadcast mmio_enabled message\n");
> -		pci_walk_bus(bus, report_mmio_enabled, &status);
> +		pci_walk_below_dev(dev, report_mmio_enabled, &status);
>   	}
>   
>   	if (status == PCI_ERS_RESULT_NEED_RESET) {
> @@ -188,17 +194,20 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   		 */
>   		status = PCI_ERS_RESULT_RECOVERED;
>   		pci_dbg(dev, "broadcast slot_reset message\n");
> -		pci_walk_bus(bus, report_slot_reset, &status);
> +		pci_walk_below_dev(dev, report_slot_reset, &status);
>   	}
>   
>   	if (status != PCI_ERS_RESULT_RECOVERED)
>   		goto failed;
>   
>   	pci_dbg(dev, "broadcast resume message\n");
> -	pci_walk_bus(bus, report_resume, &status);
> +	pci_walk_below_dev(dev, report_resume, &status);
>   
> -	pci_aer_clear_device_status(dev);
> -	pci_aer_clear_nonfatal_status(dev);
> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> +		pci_aer_clear_device_status(dev);
> +		pci_aer_clear_nonfatal_status(dev);
> +	}
>   	pci_info(dev, "device recovery successful\n");
>   	return status;
>   
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c79d83304e52..538bf0a76d33 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1411,6 +1411,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max,
>   
>   void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
>   		  void *userdata);
> +void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct pci_dev *, void *),
> +			void *userdata);
>   int pci_cfg_space_size(struct pci_dev *dev);
>   unsigned char pci_bus_max_busnr(struct pci_bus *bus);
>   void pci_setup_bridge(struct pci_bus *bus);
>
Sean V Kelley June 26, 2020, 4:29 p.m. UTC | #2
Hi,


On 22 Jun 2020, at 4:44, Jonathan Cameron wrote:

> Was previously: PCI/AER: Add partial initial supprot for RCiEPs using 
> RCEC or
> firmware first.
>
> Currently the kernel does not handle AER errors for Root Complex 
> integrated
> End Points (RCiEPs)[0].  These devices sit on a root bus within the 
> Root Complex
> (RC).  AER handling is performed by a Root Complex Event Collector 
> (RCEC) [1]
> which is a effectively a type of RCiEP on the same root bus.
>
> This code will only perform the correct reset flow for the case where 
> there
> is no need to take any actions on the RCEC because the firmware is
> responsible for them.   This is true where APEI [2] is used to report 
> the AER
> errors via a GHES[v2] HEST entry [3] and relevant AER CPER record [4] 
> and Firmware
> First handling is in use.

Right, in the case of the RCEC one identifies the RCiEPs by the RCiEP 
bitmap as a part of the RCEC Associated Endpoint Extended Capabilities.  
This ‘search’ so to speak would make use also of the RCEC Associated 
Bus Numbers Register to associate the devices with an RCEC when not on 
the same bus.

>
> As there is no current RCEC driver support, it should not be possible 
> to get
> to this code via any routes other than the one above. Hence 
> appropriate RCEC
> handling can be added when the RCEC driver support is ready.


>
> The error handling is different from a normal PCIe End Point because:
>
> 1) There is no downstream port above an RCiEP as these devices sit on 
> a root
>    bus.
>
> 2) In general, it makes little sense to reset other devices on on the 
> same
>    root bus.  For error handling outside the of the root complex (RC) 
> an AER
>    error will indicate that all the topology below the physical link, 
> which
>    the error is related to, will need to be reset as they share a 
> common
>    path to the host.  For an RCiEP there is no such defined shared 
> path
>    relationship with other elements on the root bus.
>
> A new walk function, similar to pci_bus_walk is provided that takes a 
> pci_dev
> instead of a bus.  If that dev corresponds to a downstream port it 
> will walk
> the subordinate bus of that downstream port.  If the dev does not then 
> it
> will call the function on that device alone.   This function allows us 
> to
> avoid adding special cases to the majority of the error handling.

Then in that case the callback could add the additional checks specific 
to identifying the associated RCiEPs.

>
> Open questions:
>
> 1. Are we better protecting against link reset for an RCiEP in here or
>    should we put the check in the link reset functions?
>
> 2. If we were to get a stupid firmware record with the relevant reset 
> flag
>    set to trigger a link reset, what is the correct response?  For now 
> I
>    try to report that we haven't done anything and print a warning.
>
> 3. Naming of pci_walk_below_dev is rather unsatisfying. Any better 
> ideas?
>
> 4. pci_walk_below_dev is perhaps not of general utility. Shall I make 
> it local
>    in err.c?  If not would a precursor patch for that be preferred?’

It depends.  Is it intended as a drop in replacement where needed for 
pci_walk_bus()? So in that case you are now passing the dev structure 
and do the check for subordinate or is it intended as being specific to 
say RCEC? With AER, one could either first check for RC_EC type before 
using this one.  Or one could just drop in replace (passing the dev 
structure instead) and the call back performs the RCEC specific checks 
when a device is encountered.

>
> Testing has been performed via error injection on a QEMU platform as 
> that lets
> me create a wide range of topologies and report errors at any chosen 
> location.
> Currently I have no plans to upstream this injection support, but am 
> happy to
> share if useful to others.

I’m experimenting with it in my RCEC code in AER and will give you 
additional feedback.

Thanks,

Sean


>
> [0] ACPI PCI Express Base Specification 4.0 1.3.2.3 Root Complex 
> Integrated
>     Endpoint Rules.
> [1] ACPI PCI Express Base Specification 4.0 6.2 Error Signalling and 
> Logging
> [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface 
> (APEI)
> [3] ACPI Sepcification 6.3 18.2.3.7 Generic Hardware Error Source
> [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> Changes since v1:
> * Separated from the largely unrelated fix so the two can move 
> forwards separately.
> * Instead of separate path for RCiEP handling use the method suggested 
> by Bjorn
>   and Sathyanarayanan with an adjusted pci_bus_walk.
>
> Thanks all for reviews of V1.
>
>  drivers/pci/bus.c      | 28 ++++++++++++++++++++++++++++
>  drivers/pci/pcie/err.c | 29 +++++++++++++++++++----------
>  include/linux/pci.h    |  2 ++
>  3 files changed, 49 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 8e40b3e6da77..7cbe1ed2db3d 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -411,6 +411,34 @@ void pci_walk_bus(struct pci_bus *top, int 
> (*cb)(struct pci_dev *, void *),
>  }
>  EXPORT_SYMBOL_GPL(pci_walk_bus);
>
> +/** pci_walk_below_dev - walk devices below (or on) another device
> + *  @dev      device for which we should walk below, include device 
> when not a port.
> + *  @cb       callback to be called for each device found
> + *  @userdata arbitrary pointer to be passed to callback.
> + *
> + *  If the device provided is a port,
> + *  walk the subordinate bus, including any bridged devices
> + *  on buses under this bus.  Call the provided callback
> + *  on each device found.
> + *
> + *  If the device provided hs no subordinate bus, call the provided
> + *  callback on the device itself.
> + *
> + */
> +void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct pci_dev 
> *, void *),
> +			void *userdata)
> +{
> +	struct pci_bus *bus;
> +
> +	if (dev->subordinate) {
> +		bus = dev->subordinate;
> +		pci_walk_bus(bus, cb, userdata);
> +	} else {
> +		cb(dev, userdata);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(pci_walk_below_dev);
> +
>  struct pci_bus *pci_bus_get(struct pci_bus *bus)
>  {
>  	if (bus)
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 14bb8f54723e..fa08b1cc3d96 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -151,33 +151,39 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev 
> *dev,
>  			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
>  {
>  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> -	struct pci_bus *bus;
>
>  	/*
>  	 * Error recovery runs on all subordinates of the first downstream 
> port.
>  	 * If the downstream port detected the error, it is cleared at the 
> end.
> +	 * For RCiEPs we should reset just the RCiEP itself.
>  	 */
>  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END))
>  		dev = dev->bus->self;
> -	bus = dev->subordinate;
>
>  	pci_dbg(dev, "broadcast error_detected message\n");
>  	if (state == pci_channel_io_frozen) {
> -		pci_walk_bus(bus, report_frozen_detected, &status);
> +		pci_walk_below_dev(dev, report_frozen_detected, &status);
> +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
> +			pci_warn(dev, "link reset not possible for RCiEP\n");
> +			status = PCI_ERS_RESULT_NONE;
> +			goto failed;
> +		}
> +
>  		status = reset_link(dev);
>  		if (status != PCI_ERS_RESULT_RECOVERED) {
>  			pci_warn(dev, "link reset failed\n");
>  			goto failed;
>  		}
>  	} else {
> -		pci_walk_bus(bus, report_normal_detected, &status);
> +		pci_walk_below_dev(dev, report_normal_detected, &status);
>  	}
>
>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>  		status = PCI_ERS_RESULT_RECOVERED;
>  		pci_dbg(dev, "broadcast mmio_enabled message\n");
> -		pci_walk_bus(bus, report_mmio_enabled, &status);
> +		pci_walk_below_dev(dev, report_mmio_enabled, &status);
>  	}
>
>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
> @@ -188,17 +194,20 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev 
> *dev,
>  		 */
>  		status = PCI_ERS_RESULT_RECOVERED;
>  		pci_dbg(dev, "broadcast slot_reset message\n");
> -		pci_walk_bus(bus, report_slot_reset, &status);
> +		pci_walk_below_dev(dev, report_slot_reset, &status);
>  	}
>
>  	if (status != PCI_ERS_RESULT_RECOVERED)
>  		goto failed;
>
>  	pci_dbg(dev, "broadcast resume message\n");
> -	pci_walk_bus(bus, report_resume, &status);
> +	pci_walk_below_dev(dev, report_resume, &status);
>
> -	pci_aer_clear_device_status(dev);
> -	pci_aer_clear_nonfatal_status(dev);
> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> +		pci_aer_clear_device_status(dev);
> +		pci_aer_clear_nonfatal_status(dev);
> +	}
>  	pci_info(dev, "device recovery successful\n");
>  	return status;
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c79d83304e52..538bf0a76d33 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1411,6 +1411,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct 
> pci_dev *dev, int max,
>
>  void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, 
> void *),
>  		  void *userdata);
> +void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct pci_dev 
> *, void *),
> +			void *userdata);
>  int pci_cfg_space_size(struct pci_dev *dev);
>  unsigned char pci_bus_max_busnr(struct pci_bus *bus);
>  void pci_setup_bridge(struct pci_bus *bus);
> -- 
> 2.19.1
Jonathan Cameron June 26, 2020, 6:24 p.m. UTC | #3
On Fri, 26 Jun 2020 09:18:36 -0700
"Kuppuswamy, Sathyanarayanan" <sathyanarayanan.kuppuswamy@linux.intel.com> wrote:

> Hi, Jonathan,
> 
> On 6/22/20 4:44 AM, Jonathan Cameron wrote:
> > Was previously: PCI/AER: Add partial initial supprot for RCiEPs using RCEC or
> > firmware first.
> > 
> > Currently the kernel does not handle AER errors for Root Complex integrated
> > End Points (RCiEPs)[0].  
> This patch makes pcie_do_recovery() aware of RCEC and RCiEP. But, did
> you add support to receive IRQ for error detected in RCEC?
Hi,

No.  This patch only handles the case where the AER error is reported
via APEI GHESv2.  In that path the firmware is responsible for everything to
do with the RCEC.  Completely generic infrastructure is used to deliver
a CPER block that tells us there was an AER error and where it was etc.

Unless I'm missunderstanding something!

The v1 patch had some hints of handling for the path where the RCEC IRQ
etc are handled by the OS, but I have removed that entirely from v2.

Thanks,

Jonathan


>    These devices sit on a root bus within the Root Complex
> > (RC).  AER handling is performed by a Root Complex Event Collector (RCEC) [1]
> > which is a effectively a type of RCiEP on the same root bus.
> > 
> > This code will only perform the correct reset flow for the case where there
> > is no need to take any actions on the RCEC because the firmware is
> > responsible for them.   This is true where APEI [2] is used to report the AER
> > errors via a GHES[v2] HEST entry [3] and relevant AER CPER record [4] and Firmware
> > First handling is in use.
> > 
> > As there is no current RCEC driver support, it should not be possible to get
> > to this code via any routes other than the one above. Hence appropriate RCEC
> > handling can be added when the RCEC driver support is ready.
> > 
> > The error handling is different from a normal PCIe End Point because:
> > 
> > 1) There is no downstream port above an RCiEP as these devices sit on a root
> >     bus.
> > 
> > 2) In general, it makes little sense to reset other devices on on the same
> >     root bus.  For error handling outside the of the root complex (RC) an AER
> >     error will indicate that all the topology below the physical link, which
> >     the error is related to, will need to be reset as they share a common
> >     path to the host.  For an RCiEP there is no such defined shared path
> >     relationship with other elements on the root bus.
> > 
> > A new walk function, similar to pci_bus_walk is provided that takes a pci_dev
> > instead of a bus.  If that dev corresponds to a downstream port it will walk
> > the subordinate bus of that downstream port.  If the dev does not then it
> > will call the function on that device alone.   This function allows us to
> > avoid adding special cases to the majority of the error handling.
> > 
> > Open questions:
> > 
> > 1. Are we better protecting against link reset for an RCiEP in here or
> >     should we put the check in the link reset functions?
> > 
> > 2. If we were to get a stupid firmware record with the relevant reset flag
> >     set to trigger a link reset, what is the correct response?  For now I
> >     try to report that we haven't done anything and print a warning.
> > 
> > 3. Naming of pci_walk_below_dev is rather unsatisfying. Any better ideas?
> > 
> > 4. pci_walk_below_dev is perhaps not of general utility. Shall I make it local
> >     in err.c?  If not would a precursor patch for that be preferred?
> > 
> > Testing has been performed via error injection on a QEMU platform as that lets
> > me create a wide range of topologies and report errors at any chosen location.
> > Currently I have no plans to upstream this injection support, but am happy to
> > share if useful to others.
> > 
> > [0] ACPI PCI Express Base Specification 4.0 1.3.2.3 Root Complex Integrated
> >      Endpoint Rules.
> > [1] ACPI PCI Express Base Specification 4.0 6.2 Error Signalling and Logging
> > [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface (APEI)
> > [3] ACPI Sepcification 6.3 18.2.3.7 Generic Hardware Error Source
> > [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
> > 
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> > Changes since v1:
> > * Separated from the largely unrelated fix so the two can move forwards separately.
> > * Instead of separate path for RCiEP handling use the method suggested by Bjorn
> >    and Sathyanarayanan with an adjusted pci_bus_walk.
> > 
> > Thanks all for reviews of V1.
> >    
> >   drivers/pci/bus.c      | 28 ++++++++++++++++++++++++++++
> >   drivers/pci/pcie/err.c | 29 +++++++++++++++++++----------
> >   include/linux/pci.h    |  2 ++
> >   3 files changed, 49 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > index 8e40b3e6da77..7cbe1ed2db3d 100644
> > --- a/drivers/pci/bus.c
> > +++ b/drivers/pci/bus.c
> > @@ -411,6 +411,34 @@ void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
> >   }
> >   EXPORT_SYMBOL_GPL(pci_walk_bus);
> >   
> > +/** pci_walk_below_dev - walk devices below (or on) another device
> > + *  @dev      device for which we should walk below, include device when not a port.
> > + *  @cb       callback to be called for each device found
> > + *  @userdata arbitrary pointer to be passed to callback.
> > + *
> > + *  If the device provided is a port,
> > + *  walk the subordinate bus, including any bridged devices
> > + *  on buses under this bus.  Call the provided callback
> > + *  on each device found.
> > + *
> > + *  If the device provided hs no subordinate bus, call the provided
> > + *  callback on the device itself.
> > + *
> > + */
> > +void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct pci_dev *, void *),
> > +			void *userdata)
> > +{
> > +	struct pci_bus *bus;
> > +
> > +	if (dev->subordinate) {
> > +		bus = dev->subordinate;
> > +		pci_walk_bus(bus, cb, userdata);
> > +	} else {
> > +		cb(dev, userdata);
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(pci_walk_below_dev);
> > +
> >   struct pci_bus *pci_bus_get(struct pci_bus *bus)
> >   {
> >   	if (bus)
> > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > index 14bb8f54723e..fa08b1cc3d96 100644
> > --- a/drivers/pci/pcie/err.c
> > +++ b/drivers/pci/pcie/err.c
> > @@ -151,33 +151,39 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >   			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
> >   {
> >   	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> > -	struct pci_bus *bus;
> >   
> >   	/*
> >   	 * Error recovery runs on all subordinates of the first downstream port.
> >   	 * If the downstream port detected the error, it is cleared at the end.
> > +	 * For RCiEPs we should reset just the RCiEP itself.
> >   	 */
> >   	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> > +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> > +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END))
> >   		dev = dev->bus->self;
> > -	bus = dev->subordinate;
> >   
> >   	pci_dbg(dev, "broadcast error_detected message\n");
> >   	if (state == pci_channel_io_frozen) {
> > -		pci_walk_bus(bus, report_frozen_detected, &status);
> > +		pci_walk_below_dev(dev, report_frozen_detected, &status);
> > +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
> > +			pci_warn(dev, "link reset not possible for RCiEP\n");
> > +			status = PCI_ERS_RESULT_NONE;
> > +			goto failed;
> > +		}
> > +
> >   		status = reset_link(dev);
> >   		if (status != PCI_ERS_RESULT_RECOVERED) {
> >   			pci_warn(dev, "link reset failed\n");
> >   			goto failed;
> >   		}
> >   	} else {
> > -		pci_walk_bus(bus, report_normal_detected, &status);
> > +		pci_walk_below_dev(dev, report_normal_detected, &status);
> >   	}
> >   
> >   	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> >   		status = PCI_ERS_RESULT_RECOVERED;
> >   		pci_dbg(dev, "broadcast mmio_enabled message\n");
> > -		pci_walk_bus(bus, report_mmio_enabled, &status);
> > +		pci_walk_below_dev(dev, report_mmio_enabled, &status);
> >   	}
> >   
> >   	if (status == PCI_ERS_RESULT_NEED_RESET) {
> > @@ -188,17 +194,20 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >   		 */
> >   		status = PCI_ERS_RESULT_RECOVERED;
> >   		pci_dbg(dev, "broadcast slot_reset message\n");
> > -		pci_walk_bus(bus, report_slot_reset, &status);
> > +		pci_walk_below_dev(dev, report_slot_reset, &status);
> >   	}
> >   
> >   	if (status != PCI_ERS_RESULT_RECOVERED)
> >   		goto failed;
> >   
> >   	pci_dbg(dev, "broadcast resume message\n");
> > -	pci_walk_bus(bus, report_resume, &status);
> > +	pci_walk_below_dev(dev, report_resume, &status);
> >   
> > -	pci_aer_clear_device_status(dev);
> > -	pci_aer_clear_nonfatal_status(dev);
> > +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> > +		pci_aer_clear_device_status(dev);
> > +		pci_aer_clear_nonfatal_status(dev);
> > +	}
> >   	pci_info(dev, "device recovery successful\n");
> >   	return status;
> >   
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index c79d83304e52..538bf0a76d33 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -1411,6 +1411,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max,
> >   
> >   void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
> >   		  void *userdata);
> > +void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct pci_dev *, void *),
> > +			void *userdata);
> >   int pci_cfg_space_size(struct pci_dev *dev);
> >   unsigned char pci_bus_max_busnr(struct pci_bus *bus);
> >   void pci_setup_bridge(struct pci_bus *bus);
> >   
>
Jonathan Cameron June 26, 2020, 6:41 p.m. UTC | #4
On Fri, 26 Jun 2020 09:29:34 -0700
Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:

> Hi,
Hi,

Thanks for taking a look.

> 
> 
> On 22 Jun 2020, at 4:44, Jonathan Cameron wrote:
> 
> > Was previously: PCI/AER: Add partial initial supprot for RCiEPs using 
> > RCEC or
> > firmware first.
> >
> > Currently the kernel does not handle AER errors for Root Complex 
> > integrated
> > End Points (RCiEPs)[0].  These devices sit on a root bus within the 
> > Root Complex
> > (RC).  AER handling is performed by a Root Complex Event Collector 
> > (RCEC) [1]
> > which is a effectively a type of RCiEP on the same root bus.
> >
> > This code will only perform the correct reset flow for the case where 
> > there
> > is no need to take any actions on the RCEC because the firmware is
> > responsible for them.   This is true where APEI [2] is used to report 
> > the AER
> > errors via a GHES[v2] HEST entry [3] and relevant AER CPER record [4] 
> > and Firmware
> > First handling is in use.  
> 
> Right, in the case of the RCEC one identifies the RCiEPs by the RCiEP 
> bitmap as a part of the RCEC Associated Endpoint Extended Capabilities.  
> This ‘search’ so to speak would make use also of the RCEC Associated 
> Bus Numbers Register to associate the devices with an RCEC when not on 
> the same bus.

Ah. I'm afraid my access to recent specs is a bit limited at the moment.
I do have a draft 5.0 spec which has that in though so I now see what you mean.

Was introduced in Root Complex Event Collector Endpoint Association Extended
Capability version 2 in PCIe 5.0 I think.

> 
> >
> > As there is no current RCEC driver support, it should not be possible 
> > to get
> > to this code via any routes other than the one above. Hence 
> > appropriate RCEC
> > handling can be added when the RCEC driver support is ready.  
> 
> 
> >
> > The error handling is different from a normal PCIe End Point because:
> >
> > 1) There is no downstream port above an RCiEP as these devices sit on 
> > a root
> >    bus.
> >
> > 2) In general, it makes little sense to reset other devices on on the 
> > same
> >    root bus.  For error handling outside the of the root complex (RC) 
> > an AER
> >    error will indicate that all the topology below the physical link, 
> > which
> >    the error is related to, will need to be reset as they share a 
> > common
> >    path to the host.  For an RCiEP there is no such defined shared 
> > path
> >    relationship with other elements on the root bus.
> >
> > A new walk function, similar to pci_bus_walk is provided that takes a 
> > pci_dev
> > instead of a bus.  If that dev corresponds to a downstream port it 
> > will walk
> > the subordinate bus of that downstream port.  If the dev does not then 
> > it
> > will call the function on that device alone.   This function allows us 
> > to
> > avoid adding special cases to the majority of the error handling.  
> 
> Then in that case the callback could add the additional checks specific 
> to identifying the associated RCiEPs.

I am afraid I don't follow what you mean here.  Could you give more info?

> 
> >
> > Open questions:
> >
> > 1. Are we better protecting against link reset for an RCiEP in here or
> >    should we put the check in the link reset functions?
> >
> > 2. If we were to get a stupid firmware record with the relevant reset 
> > flag
> >    set to trigger a link reset, what is the correct response?  For now 
> > I
> >    try to report that we haven't done anything and print a warning.
> >
> > 3. Naming of pci_walk_below_dev is rather unsatisfying. Any better 
> > ideas?
> >
> > 4. pci_walk_below_dev is perhaps not of general utility. Shall I make 
> > it local
> >    in err.c?  If not would a precursor patch for that be preferred?’  
> 
> It depends.  Is it intended as a drop in replacement where needed for 
> pci_walk_bus()? So in that case you are now passing the dev structure 
> and do the check for subordinate or is it intended as being specific to 
> say RCEC? With AER, one could either first check for RC_EC type before 
> using this one.  Or one could just drop in replace (passing the dev 
> structure instead) and the call back performs the RCEC specific checks 
> when a device is encountered.

If it is useful in aer.c that's great.   Just seemed such a weird beast
I wasn't sure it would be of use anywhere else.

> 
> >
> > Testing has been performed via error injection on a QEMU platform as 
> > that lets
> > me create a wide range of topologies and report errors at any chosen 
> > location.
> > Currently I have no plans to upstream this injection support, but am 
> > happy to
> > share if useful to others.  
> 
> I’m experimenting with it in my RCEC code in AER and will give you 
> additional feedback.

Great, thanks

Jonathan

> 
> Thanks,
> 
> Sean
> 
> 
> >
> > [0] ACPI PCI Express Base Specification 4.0 1.3.2.3 Root Complex 
> > Integrated
> >     Endpoint Rules.
> > [1] ACPI PCI Express Base Specification 4.0 6.2 Error Signalling and 
> > Logging
> > [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface 
> > (APEI)
> > [3] ACPI Sepcification 6.3 18.2.3.7 Generic Hardware Error Source
> > [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
> >
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> > Changes since v1:
> > * Separated from the largely unrelated fix so the two can move 
> > forwards separately.
> > * Instead of separate path for RCiEP handling use the method suggested 
> > by Bjorn
> >   and Sathyanarayanan with an adjusted pci_bus_walk.
> >
> > Thanks all for reviews of V1.
> >
> >  drivers/pci/bus.c      | 28 ++++++++++++++++++++++++++++
> >  drivers/pci/pcie/err.c | 29 +++++++++++++++++++----------
> >  include/linux/pci.h    |  2 ++
> >  3 files changed, 49 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > index 8e40b3e6da77..7cbe1ed2db3d 100644
> > --- a/drivers/pci/bus.c
> > +++ b/drivers/pci/bus.c
> > @@ -411,6 +411,34 @@ void pci_walk_bus(struct pci_bus *top, int 
> > (*cb)(struct pci_dev *, void *),
> >  }
> >  EXPORT_SYMBOL_GPL(pci_walk_bus);
> >
> > +/** pci_walk_below_dev - walk devices below (or on) another device
> > + *  @dev      device for which we should walk below, include device 
> > when not a port.
> > + *  @cb       callback to be called for each device found
> > + *  @userdata arbitrary pointer to be passed to callback.
> > + *
> > + *  If the device provided is a port,
> > + *  walk the subordinate bus, including any bridged devices
> > + *  on buses under this bus.  Call the provided callback
> > + *  on each device found.
> > + *
> > + *  If the device provided hs no subordinate bus, call the provided
> > + *  callback on the device itself.
> > + *
> > + */
> > +void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct pci_dev 
> > *, void *),
> > +			void *userdata)
> > +{
> > +	struct pci_bus *bus;
> > +
> > +	if (dev->subordinate) {
> > +		bus = dev->subordinate;
> > +		pci_walk_bus(bus, cb, userdata);
> > +	} else {
> > +		cb(dev, userdata);
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(pci_walk_below_dev);
> > +
> >  struct pci_bus *pci_bus_get(struct pci_bus *bus)
> >  {
> >  	if (bus)
> > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > index 14bb8f54723e..fa08b1cc3d96 100644
> > --- a/drivers/pci/pcie/err.c
> > +++ b/drivers/pci/pcie/err.c
> > @@ -151,33 +151,39 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev 
> > *dev,
> >  			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
> >  {
> >  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> > -	struct pci_bus *bus;
> >
> >  	/*
> >  	 * Error recovery runs on all subordinates of the first downstream 
> > port.
> >  	 * If the downstream port detected the error, it is cleared at the 
> > end.
> > +	 * For RCiEPs we should reset just the RCiEP itself.
> >  	 */
> >  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> > +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> > +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END))
> >  		dev = dev->bus->self;
> > -	bus = dev->subordinate;
> >
> >  	pci_dbg(dev, "broadcast error_detected message\n");
> >  	if (state == pci_channel_io_frozen) {
> > -		pci_walk_bus(bus, report_frozen_detected, &status);
> > +		pci_walk_below_dev(dev, report_frozen_detected, &status);
> > +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
> > +			pci_warn(dev, "link reset not possible for RCiEP\n");
> > +			status = PCI_ERS_RESULT_NONE;
> > +			goto failed;
> > +		}
> > +
> >  		status = reset_link(dev);
> >  		if (status != PCI_ERS_RESULT_RECOVERED) {
> >  			pci_warn(dev, "link reset failed\n");
> >  			goto failed;
> >  		}
> >  	} else {
> > -		pci_walk_bus(bus, report_normal_detected, &status);
> > +		pci_walk_below_dev(dev, report_normal_detected, &status);
> >  	}
> >
> >  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> >  		status = PCI_ERS_RESULT_RECOVERED;
> >  		pci_dbg(dev, "broadcast mmio_enabled message\n");
> > -		pci_walk_bus(bus, report_mmio_enabled, &status);
> > +		pci_walk_below_dev(dev, report_mmio_enabled, &status);
> >  	}
> >
> >  	if (status == PCI_ERS_RESULT_NEED_RESET) {
> > @@ -188,17 +194,20 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev 
> > *dev,
> >  		 */
> >  		status = PCI_ERS_RESULT_RECOVERED;
> >  		pci_dbg(dev, "broadcast slot_reset message\n");
> > -		pci_walk_bus(bus, report_slot_reset, &status);
> > +		pci_walk_below_dev(dev, report_slot_reset, &status);
> >  	}
> >
> >  	if (status != PCI_ERS_RESULT_RECOVERED)
> >  		goto failed;
> >
> >  	pci_dbg(dev, "broadcast resume message\n");
> > -	pci_walk_bus(bus, report_resume, &status);
> > +	pci_walk_below_dev(dev, report_resume, &status);
> >
> > -	pci_aer_clear_device_status(dev);
> > -	pci_aer_clear_nonfatal_status(dev);
> > +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> > +		pci_aer_clear_device_status(dev);
> > +		pci_aer_clear_nonfatal_status(dev);
> > +	}
> >  	pci_info(dev, "device recovery successful\n");
> >  	return status;
> >
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index c79d83304e52..538bf0a76d33 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -1411,6 +1411,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct 
> > pci_dev *dev, int max,
> >
> >  void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, 
> > void *),
> >  		  void *userdata);
> > +void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct pci_dev 
> > *, void *),
> > +			void *userdata);
> >  int pci_cfg_space_size(struct pci_dev *dev);
> >  unsigned char pci_bus_max_busnr(struct pci_bus *bus);
> >  void pci_setup_bridge(struct pci_bus *bus);
> > -- 
> > 2.19.1
Sean V Kelley July 2, 2020, 6:06 p.m. UTC | #5
On 26 Jun 2020, at 11:41, Jonathan Cameron wrote:

> On Fri, 26 Jun 2020 09:29:34 -0700
> Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
>
>> Hi,
> Hi,
>
> Thanks for taking a look.
>
>>
>>
>> On 22 Jun 2020, at 4:44, Jonathan Cameron wrote:
>>
>>> Was previously: PCI/AER: Add partial initial supprot for RCiEPs 
>>> using
>>> RCEC or
>>> firmware first.
>>>
>>> Currently the kernel does not handle AER errors for Root Complex
>>> integrated
>>> End Points (RCiEPs)[0].  These devices sit on a root bus within the
>>> Root Complex
>>> (RC).  AER handling is performed by a Root Complex Event Collector
>>> (RCEC) [1]
>>> which is a effectively a type of RCiEP on the same root bus.
>>>
>>> This code will only perform the correct reset flow for the case 
>>> where
>>> there
>>> is no need to take any actions on the RCEC because the firmware is
>>> responsible for them.   This is true where APEI [2] is used to 
>>> report
>>> the AER
>>> errors via a GHES[v2] HEST entry [3] and relevant AER CPER record 
>>> [4]
>>> and Firmware
>>> First handling is in use.
>>
>> Right, in the case of the RCEC one identifies the RCiEPs by the RCiEP
>> bitmap as a part of the RCEC Associated Endpoint Extended 
>> Capabilities.
>> This ‘search’ so to speak would make use also of the RCEC 
>> Associated
>> Bus Numbers Register to associate the devices with an RCEC when not 
>> on
>> the same bus.
>
> Ah. I'm afraid my access to recent specs is a bit limited at the 
> moment.
> I do have a draft 5.0 spec which has that in though so I now see what 
> you mean.
>
> Was introduced in Root Complex Event Collector Endpoint Association 
> Extended
> Capability version 2 in PCIe 5.0 I think.
>

Correct.

>>
>>>
>>> As there is no current RCEC driver support, it should not be 
>>> possible
>>> to get
>>> to this code via any routes other than the one above. Hence
>>> appropriate RCEC
>>> handling can be added when the RCEC driver support is ready.
>>
>>
>>>
>>> The error handling is different from a normal PCIe End Point 
>>> because:
>>>
>>> 1) There is no downstream port above an RCiEP as these devices sit 
>>> on
>>> a root
>>>    bus.
>>>
>>> 2) In general, it makes little sense to reset other devices on on 
>>> the
>>> same
>>>    root bus.  For error handling outside the of the root complex 
>>> (RC)
>>> an AER
>>>    error will indicate that all the topology below the physical 
>>> link,
>>> which
>>>    the error is related to, will need to be reset as they share a
>>> common
>>>    path to the host.  For an RCiEP there is no such defined shared
>>> path
>>>    relationship with other elements on the root bus.
>>>
>>> A new walk function, similar to pci_bus_walk is provided that takes 
>>> a
>>> pci_dev
>>> instead of a bus.  If that dev corresponds to a downstream port it
>>> will walk
>>> the subordinate bus of that downstream port.  If the dev does not 
>>> then
>>> it
>>> will call the function on that device alone.   This function allows 
>>> us
>>> to
>>> avoid adding special cases to the majority of the error handling.
>>
>> Then in that case the callback could add the additional checks 
>> specific
>> to identifying the associated RCiEPs.
>
> I am afraid I don't follow what you mean here.  Could you give more 
> info?

Sure, a given RCEC can be associated with multiple RCiEPs.  As a part of 
the Extended Association Cap it is possible to obtain a bitmap of the 
RCiEP device ids on the same bus number as the RCEC device itself. 
(5.0-1.0 sec 7.9.10.2).  With a Cap version of 2h or higher, it is also 
possible to get an additional range of bus numbers containing RCiEPs 
also associated with this RCEC.

So I’m wondering if this function could be used in which passing a 
dev, in this case the RCEC, triggers the call back which makes use of 
the RCiEP bitmap and associated bus ranges to return all identified 
devices in use cases such as in AER for finding sources, etc.

The alternative is to have a separate walk for RCECs that loops through 
the bitmap / ranges (if supported) triggering the callback for each 
device found.

Thanks,

Sean

>>
>>>
>>> Open questions:
>>>
>>> 1. Are we better protecting against link reset for an RCiEP in here 
>>> or
>>>    should we put the check in the link reset functions?
>>>
>>> 2. If we were to get a stupid firmware record with the relevant 
>>> reset
>>> flag
>>>    set to trigger a link reset, what is the correct response?  For 
>>> now
>>> I
>>>    try to report that we haven't done anything and print a warning.
>>>
>>> 3. Naming of pci_walk_below_dev is rather unsatisfying. Any better
>>> ideas?
>>>
>>> 4. pci_walk_below_dev is perhaps not of general utility. Shall I 
>>> make
>>> it local
>>>    in err.c?  If not would a precursor patch for that be 
>>> preferred?’
>>
>> It depends.  Is it intended as a drop in replacement where needed for
>> pci_walk_bus()? So in that case you are now passing the dev structure
>> and do the check for subordinate or is it intended as being specific 
>> to
>> say RCEC? With AER, one could either first check for RC_EC type 
>> before
>> using this one.  Or one could just drop in replace (passing the dev
>> structure instead) and the call back performs the RCEC specific 
>> checks
>> when a device is encountered.
>
> If it is useful in aer.c that's great.   Just seemed such a weird 
> beast
> I wasn't sure it would be of use anywhere else.
>
>>
>>>
>>> Testing has been performed via error injection on a QEMU platform as
>>> that lets
>>> me create a wide range of topologies and report errors at any chosen
>>> location.
>>> Currently I have no plans to upstream this injection support, but am
>>> happy to
>>> share if useful to others.
>>
>> I’m experimenting with it in my RCEC code in AER and will give you
>> additional feedback.
>
> Great, thanks
>
> Jonathan
>
>>
>> Thanks,
>>
>> Sean
>>
>>
>>>
>>> [0] ACPI PCI Express Base Specification 4.0 1.3.2.3 Root Complex
>>> Integrated
>>>     Endpoint Rules.
>>> [1] ACPI PCI Express Base Specification 4.0 6.2 Error Signalling and
>>> Logging
>>> [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface
>>> (APEI)
>>> [3] ACPI Sepcification 6.3 18.2.3.7 Generic Hardware Error Source
>>> [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
>>>
>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> ---
>>> Changes since v1:
>>> * Separated from the largely unrelated fix so the two can move
>>> forwards separately.
>>> * Instead of separate path for RCiEP handling use the method 
>>> suggested
>>> by Bjorn
>>>   and Sathyanarayanan with an adjusted pci_bus_walk.
>>>
>>> Thanks all for reviews of V1.
>>>
>>>  drivers/pci/bus.c      | 28 ++++++++++++++++++++++++++++
>>>  drivers/pci/pcie/err.c | 29 +++++++++++++++++++----------
>>>  include/linux/pci.h    |  2 ++
>>>  3 files changed, 49 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
>>> index 8e40b3e6da77..7cbe1ed2db3d 100644
>>> --- a/drivers/pci/bus.c
>>> +++ b/drivers/pci/bus.c
>>> @@ -411,6 +411,34 @@ void pci_walk_bus(struct pci_bus *top, int
>>> (*cb)(struct pci_dev *, void *),
>>>  }
>>>  EXPORT_SYMBOL_GPL(pci_walk_bus);
>>>
>>> +/** pci_walk_below_dev - walk devices below (or on) another device
>>> + *  @dev      device for which we should walk below, include device
>>> when not a port.
>>> + *  @cb       callback to be called for each device found
>>> + *  @userdata arbitrary pointer to be passed to callback.
>>> + *
>>> + *  If the device provided is a port,
>>> + *  walk the subordinate bus, including any bridged devices
>>> + *  on buses under this bus.  Call the provided callback
>>> + *  on each device found.
>>> + *
>>> + *  If the device provided hs no subordinate bus, call the provided
>>> + *  callback on the device itself.
>>> + *
>>> + */
>>> +void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct 
>>> pci_dev
>>> *, void *),
>>> +			void *userdata)
>>> +{
>>> +	struct pci_bus *bus;
>>> +
>>> +	if (dev->subordinate) {
>>> +		bus = dev->subordinate;
>>> +		pci_walk_bus(bus, cb, userdata);
>>> +	} else {
>>> +		cb(dev, userdata);
>>> +	}
>>> +}
>>> +EXPORT_SYMBOL_GPL(pci_walk_below_dev);
>>> +
>>>  struct pci_bus *pci_bus_get(struct pci_bus *bus)
>>>  {
>>>  	if (bus)
>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>>> index 14bb8f54723e..fa08b1cc3d96 100644
>>> --- a/drivers/pci/pcie/err.c
>>> +++ b/drivers/pci/pcie/err.c
>>> @@ -151,33 +151,39 @@ pci_ers_result_t pcie_do_recovery(struct 
>>> pci_dev
>>> *dev,
>>>  			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
>>>  {
>>>  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
>>> -	struct pci_bus *bus;
>>>
>>>  	/*
>>>  	 * Error recovery runs on all subordinates of the first downstream
>>> port.
>>>  	 * If the downstream port detected the error, it is cleared at the
>>> end.
>>> +	 * For RCiEPs we should reset just the RCiEP itself.
>>>  	 */
>>>  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>>> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
>>> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>>> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END))
>>>  		dev = dev->bus->self;
>>> -	bus = dev->subordinate;
>>>
>>>  	pci_dbg(dev, "broadcast error_detected message\n");
>>>  	if (state == pci_channel_io_frozen) {
>>> -		pci_walk_bus(bus, report_frozen_detected, &status);
>>> +		pci_walk_below_dev(dev, report_frozen_detected, &status);
>>> +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
>>> +			pci_warn(dev, "link reset not possible for RCiEP\n");
>>> +			status = PCI_ERS_RESULT_NONE;
>>> +			goto failed;
>>> +		}
>>> +
>>>  		status = reset_link(dev);
>>>  		if (status != PCI_ERS_RESULT_RECOVERED) {
>>>  			pci_warn(dev, "link reset failed\n");
>>>  			goto failed;
>>>  		}
>>>  	} else {
>>> -		pci_walk_bus(bus, report_normal_detected, &status);
>>> +		pci_walk_below_dev(dev, report_normal_detected, &status);
>>>  	}
>>>
>>>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>>>  		status = PCI_ERS_RESULT_RECOVERED;
>>>  		pci_dbg(dev, "broadcast mmio_enabled message\n");
>>> -		pci_walk_bus(bus, report_mmio_enabled, &status);
>>> +		pci_walk_below_dev(dev, report_mmio_enabled, &status);
>>>  	}
>>>
>>>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
>>> @@ -188,17 +194,20 @@ pci_ers_result_t pcie_do_recovery(struct 
>>> pci_dev
>>> *dev,
>>>  		 */
>>>  		status = PCI_ERS_RESULT_RECOVERED;
>>>  		pci_dbg(dev, "broadcast slot_reset message\n");
>>> -		pci_walk_bus(bus, report_slot_reset, &status);
>>> +		pci_walk_below_dev(dev, report_slot_reset, &status);
>>>  	}
>>>
>>>  	if (status != PCI_ERS_RESULT_RECOVERED)
>>>  		goto failed;
>>>
>>>  	pci_dbg(dev, "broadcast resume message\n");
>>> -	pci_walk_bus(bus, report_resume, &status);
>>> +	pci_walk_below_dev(dev, report_resume, &status);
>>>
>>> -	pci_aer_clear_device_status(dev);
>>> -	pci_aer_clear_nonfatal_status(dev);
>>> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>>> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
>>> +		pci_aer_clear_device_status(dev);
>>> +		pci_aer_clear_nonfatal_status(dev);
>>> +	}
>>>  	pci_info(dev, "device recovery successful\n");
>>>  	return status;
>>>
>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>> index c79d83304e52..538bf0a76d33 100644
>>> --- a/include/linux/pci.h
>>> +++ b/include/linux/pci.h
>>> @@ -1411,6 +1411,8 @@ int pci_scan_bridge(struct pci_bus *bus, 
>>> struct
>>> pci_dev *dev, int max,
>>>
>>>  void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *,
>>> void *),
>>>  		  void *userdata);
>>> +void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct 
>>> pci_dev
>>> *, void *),
>>> +			void *userdata);
>>>  int pci_cfg_space_size(struct pci_dev *dev);
>>>  unsigned char pci_bus_max_busnr(struct pci_bus *bus);
>>>  void pci_setup_bridge(struct pci_bus *bus);
>>> -- 
>>> 2.19.1
Jonathan Cameron July 3, 2020, 8:23 a.m. UTC | #6
On Thu, 2 Jul 2020 11:06:26 -0700
Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:

> On 26 Jun 2020, at 11:41, Jonathan Cameron wrote:
> 
> > On Fri, 26 Jun 2020 09:29:34 -0700
> > Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
> >  
> >> Hi,  
> > Hi,
> >
> > Thanks for taking a look.
> >  
> >>
> >>
> >> On 22 Jun 2020, at 4:44, Jonathan Cameron wrote:
> >>  
> >>> Was previously: PCI/AER: Add partial initial supprot for RCiEPs 
> >>> using
> >>> RCEC or
> >>> firmware first.
> >>>
> >>> Currently the kernel does not handle AER errors for Root Complex
> >>> integrated
> >>> End Points (RCiEPs)[0].  These devices sit on a root bus within the
> >>> Root Complex
> >>> (RC).  AER handling is performed by a Root Complex Event Collector
> >>> (RCEC) [1]
> >>> which is a effectively a type of RCiEP on the same root bus.
> >>>
> >>> This code will only perform the correct reset flow for the case 
> >>> where
> >>> there
> >>> is no need to take any actions on the RCEC because the firmware is
> >>> responsible for them.   This is true where APEI [2] is used to 
> >>> report
> >>> the AER
> >>> errors via a GHES[v2] HEST entry [3] and relevant AER CPER record 
> >>> [4]
> >>> and Firmware
> >>> First handling is in use.  
> >>
> >> Right, in the case of the RCEC one identifies the RCiEPs by the RCiEP
> >> bitmap as a part of the RCEC Associated Endpoint Extended 
> >> Capabilities.
> >> This ‘search’ so to speak would make use also of the RCEC 
> >> Associated
> >> Bus Numbers Register to associate the devices with an RCEC when not 
> >> on
> >> the same bus.  
> >
> > Ah. I'm afraid my access to recent specs is a bit limited at the 
> > moment.
> > I do have a draft 5.0 spec which has that in though so I now see what 
> > you mean.
> >
> > Was introduced in Root Complex Event Collector Endpoint Association 
> > Extended
> > Capability version 2 in PCIe 5.0 I think.
> >  
> 
> Correct.
> 
> >>  
> >>>
> >>> As there is no current RCEC driver support, it should not be 
> >>> possible
> >>> to get
> >>> to this code via any routes other than the one above. Hence
> >>> appropriate RCEC
> >>> handling can be added when the RCEC driver support is ready.  
> >>
> >>  
> >>>
> >>> The error handling is different from a normal PCIe End Point 
> >>> because:
> >>>
> >>> 1) There is no downstream port above an RCiEP as these devices sit 
> >>> on
> >>> a root
> >>>    bus.
> >>>
> >>> 2) In general, it makes little sense to reset other devices on on 
> >>> the
> >>> same
> >>>    root bus.  For error handling outside the of the root complex 
> >>> (RC)
> >>> an AER
> >>>    error will indicate that all the topology below the physical 
> >>> link,
> >>> which
> >>>    the error is related to, will need to be reset as they share a
> >>> common
> >>>    path to the host.  For an RCiEP there is no such defined shared
> >>> path
> >>>    relationship with other elements on the root bus.
> >>>
> >>> A new walk function, similar to pci_bus_walk is provided that takes 
> >>> a
> >>> pci_dev
> >>> instead of a bus.  If that dev corresponds to a downstream port it
> >>> will walk
> >>> the subordinate bus of that downstream port.  If the dev does not 
> >>> then
> >>> it
> >>> will call the function on that device alone.   This function allows 
> >>> us
> >>> to
> >>> avoid adding special cases to the majority of the error handling.  
> >>
> >> Then in that case the callback could add the additional checks 
> >> specific
> >> to identifying the associated RCiEPs.  
> >
> > I am afraid I don't follow what you mean here.  Could you give more 
> > info?  
> 
> Sure, a given RCEC can be associated with multiple RCiEPs.  As a part of 
> the Extended Association Cap it is possible to obtain a bitmap of the 
> RCiEP device ids on the same bus number as the RCEC device itself. 
> (5.0-1.0 sec 7.9.10.2).  With a Cap version of 2h or higher, it is also 
> possible to get an additional range of bus numbers containing RCiEPs 
> also associated with this RCEC.
> 
> So I’m wondering if this function could be used in which passing a 
> dev, in this case the RCEC, triggers the call back which makes use of 
> the RCiEP bitmap and associated bus ranges to return all identified 
> devices in use cases such as in AER for finding sources, etc.

Ah understood.

If we do this we effectively end up with 3 different types of walk and
the meaning of the walk function gets more complex again.

1) Normal bus walk - we pass the downstream port above a bus to which
the device is attached and it walks everthing below that point.

2) Case I care about RCiEP with no visible association with an RCEC as
I don't need to walk devices.  In that case just calls the callbacks for
the actual device.

3) Pass in RCiEP with RCEC asociated with it (or do a dance at the caller
to pass in the RCEC itself). Need to walk the devices that the RCEC is
handling errors for.  For handling, I'm not all the calls will be generally
applicable to other devices associated with the RCEC as some only make
sense if there is an actual PCIe bus involved and hence we need to reset
other devices on that bus.  For RCEC I don't think there is an particular
reason to assume an AER error reported at one RCiEP will have any impact
on other devices associated with the particular RCEC.
I've not found anything in the spec addressing this question but perhaps
I've missed something?

However, if the RCEC doesn't support multiple error records, you may need
to walk the bus to identify multiple simultaneous issues, very carefully
avoiding (or least minimizing) race conditions.

> 
> The alternative is to have a separate walk for RCECs that loops through 
> the bitmap / ranges (if supported) triggering the callback for each 
> device found.

Agreed. We would end up with the same splitting of handling paths that
wasn't liked in my v1 patch.   Perhaps we need  single pci_walk_aer_affected
function with a pile of documentation for what it is actually doing?

Even then we may need to have a parameter to indicate a particular callback
should be restricted to devices that share a 'real bus' or not. 

I guess the proof will as ever be in what the code looks like.

Hmm. It increasingly feels like we may need to have a go at drawing together
some coherent documentation for the different ways of handling
AER errors and specifications / assumptions for each.  Would be 'interesting'
to do given I'm fairly sure very few people actually understands all the options and
nasty corner cases!

+CC Lorenzo and James who may also be interested in this topic in general.

Thanks,

Jonathan


> 
> Thanks,
> 
> Sean
> 
> >>  
> >>>
> >>> Open questions:
> >>>
> >>> 1. Are we better protecting against link reset for an RCiEP in here 
> >>> or
> >>>    should we put the check in the link reset functions?
> >>>
> >>> 2. If we were to get a stupid firmware record with the relevant 
> >>> reset
> >>> flag
> >>>    set to trigger a link reset, what is the correct response?  For 
> >>> now
> >>> I
> >>>    try to report that we haven't done anything and print a warning.
> >>>
> >>> 3. Naming of pci_walk_below_dev is rather unsatisfying. Any better
> >>> ideas?
> >>>
> >>> 4. pci_walk_below_dev is perhaps not of general utility. Shall I 
> >>> make
> >>> it local
> >>>    in err.c?  If not would a precursor patch for that be 
> >>> preferred?’  
> >>
> >> It depends.  Is it intended as a drop in replacement where needed for
> >> pci_walk_bus()? So in that case you are now passing the dev structure
> >> and do the check for subordinate or is it intended as being specific 
> >> to
> >> say RCEC? With AER, one could either first check for RC_EC type 
> >> before
> >> using this one.  Or one could just drop in replace (passing the dev
> >> structure instead) and the call back performs the RCEC specific 
> >> checks
> >> when a device is encountered.  
> >
> > If it is useful in aer.c that's great.   Just seemed such a weird 
> > beast
> > I wasn't sure it would be of use anywhere else.
> >  
> >>  
> >>>
> >>> Testing has been performed via error injection on a QEMU platform as
> >>> that lets
> >>> me create a wide range of topologies and report errors at any chosen
> >>> location.
> >>> Currently I have no plans to upstream this injection support, but am
> >>> happy to
> >>> share if useful to others.  
> >>
> >> I’m experimenting with it in my RCEC code in AER and will give you
> >> additional feedback.  
> >
> > Great, thanks
> >
> > Jonathan
> >  
> >>
> >> Thanks,
> >>
> >> Sean
> >>
> >>  
> >>>
> >>> [0] ACPI PCI Express Base Specification 4.0 1.3.2.3 Root Complex
> >>> Integrated
> >>>     Endpoint Rules.
> >>> [1] ACPI PCI Express Base Specification 4.0 6.2 Error Signalling and
> >>> Logging
> >>> [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface
> >>> (APEI)
> >>> [3] ACPI Sepcification 6.3 18.2.3.7 Generic Hardware Error Source
> >>> [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
> >>>
> >>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >>> ---
> >>> Changes since v1:
> >>> * Separated from the largely unrelated fix so the two can move
> >>> forwards separately.
> >>> * Instead of separate path for RCiEP handling use the method 
> >>> suggested
> >>> by Bjorn
> >>>   and Sathyanarayanan with an adjusted pci_bus_walk.
> >>>
> >>> Thanks all for reviews of V1.
> >>>
> >>>  drivers/pci/bus.c      | 28 ++++++++++++++++++++++++++++
> >>>  drivers/pci/pcie/err.c | 29 +++++++++++++++++++----------
> >>>  include/linux/pci.h    |  2 ++
> >>>  3 files changed, 49 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> >>> index 8e40b3e6da77..7cbe1ed2db3d 100644
> >>> --- a/drivers/pci/bus.c
> >>> +++ b/drivers/pci/bus.c
> >>> @@ -411,6 +411,34 @@ void pci_walk_bus(struct pci_bus *top, int
> >>> (*cb)(struct pci_dev *, void *),
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(pci_walk_bus);
> >>>
> >>> +/** pci_walk_below_dev - walk devices below (or on) another device
> >>> + *  @dev      device for which we should walk below, include device
> >>> when not a port.
> >>> + *  @cb       callback to be called for each device found
> >>> + *  @userdata arbitrary pointer to be passed to callback.
> >>> + *
> >>> + *  If the device provided is a port,
> >>> + *  walk the subordinate bus, including any bridged devices
> >>> + *  on buses under this bus.  Call the provided callback
> >>> + *  on each device found.
> >>> + *
> >>> + *  If the device provided hs no subordinate bus, call the provided
> >>> + *  callback on the device itself.
> >>> + *
> >>> + */
> >>> +void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct 
> >>> pci_dev
> >>> *, void *),
> >>> +			void *userdata)
> >>> +{
> >>> +	struct pci_bus *bus;
> >>> +
> >>> +	if (dev->subordinate) {
> >>> +		bus = dev->subordinate;
> >>> +		pci_walk_bus(bus, cb, userdata);
> >>> +	} else {
> >>> +		cb(dev, userdata);
> >>> +	}
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(pci_walk_below_dev);
> >>> +
> >>>  struct pci_bus *pci_bus_get(struct pci_bus *bus)
> >>>  {
> >>>  	if (bus)
> >>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> >>> index 14bb8f54723e..fa08b1cc3d96 100644
> >>> --- a/drivers/pci/pcie/err.c
> >>> +++ b/drivers/pci/pcie/err.c
> >>> @@ -151,33 +151,39 @@ pci_ers_result_t pcie_do_recovery(struct 
> >>> pci_dev
> >>> *dev,
> >>>  			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
> >>>  {
> >>>  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> >>> -	struct pci_bus *bus;
> >>>
> >>>  	/*
> >>>  	 * Error recovery runs on all subordinates of the first downstream
> >>> port.
> >>>  	 * If the downstream port detected the error, it is cleared at the
> >>> end.
> >>> +	 * For RCiEPs we should reset just the RCiEP itself.
> >>>  	 */
> >>>  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> >>> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> >>> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> >>> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END))
> >>>  		dev = dev->bus->self;
> >>> -	bus = dev->subordinate;
> >>>
> >>>  	pci_dbg(dev, "broadcast error_detected message\n");
> >>>  	if (state == pci_channel_io_frozen) {
> >>> -		pci_walk_bus(bus, report_frozen_detected, &status);
> >>> +		pci_walk_below_dev(dev, report_frozen_detected, &status);
> >>> +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
> >>> +			pci_warn(dev, "link reset not possible for RCiEP\n");
> >>> +			status = PCI_ERS_RESULT_NONE;
> >>> +			goto failed;
> >>> +		}
> >>> +
> >>>  		status = reset_link(dev);
> >>>  		if (status != PCI_ERS_RESULT_RECOVERED) {
> >>>  			pci_warn(dev, "link reset failed\n");
> >>>  			goto failed;
> >>>  		}
> >>>  	} else {
> >>> -		pci_walk_bus(bus, report_normal_detected, &status);
> >>> +		pci_walk_below_dev(dev, report_normal_detected, &status);
> >>>  	}
> >>>
> >>>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> >>>  		status = PCI_ERS_RESULT_RECOVERED;
> >>>  		pci_dbg(dev, "broadcast mmio_enabled message\n");
> >>> -		pci_walk_bus(bus, report_mmio_enabled, &status);
> >>> +		pci_walk_below_dev(dev, report_mmio_enabled, &status);
> >>>  	}
> >>>
> >>>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
> >>> @@ -188,17 +194,20 @@ pci_ers_result_t pcie_do_recovery(struct 
> >>> pci_dev
> >>> *dev,
> >>>  		 */
> >>>  		status = PCI_ERS_RESULT_RECOVERED;
> >>>  		pci_dbg(dev, "broadcast slot_reset message\n");
> >>> -		pci_walk_bus(bus, report_slot_reset, &status);
> >>> +		pci_walk_below_dev(dev, report_slot_reset, &status);
> >>>  	}
> >>>
> >>>  	if (status != PCI_ERS_RESULT_RECOVERED)
> >>>  		goto failed;
> >>>
> >>>  	pci_dbg(dev, "broadcast resume message\n");
> >>> -	pci_walk_bus(bus, report_resume, &status);
> >>> +	pci_walk_below_dev(dev, report_resume, &status);
> >>>
> >>> -	pci_aer_clear_device_status(dev);
> >>> -	pci_aer_clear_nonfatal_status(dev);
> >>> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> >>> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> >>> +		pci_aer_clear_device_status(dev);
> >>> +		pci_aer_clear_nonfatal_status(dev);
> >>> +	}
> >>>  	pci_info(dev, "device recovery successful\n");
> >>>  	return status;
> >>>
> >>> diff --git a/include/linux/pci.h b/include/linux/pci.h
> >>> index c79d83304e52..538bf0a76d33 100644
> >>> --- a/include/linux/pci.h
> >>> +++ b/include/linux/pci.h
> >>> @@ -1411,6 +1411,8 @@ int pci_scan_bridge(struct pci_bus *bus, 
> >>> struct
> >>> pci_dev *dev, int max,
> >>>
> >>>  void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *,
> >>> void *),
> >>>  		  void *userdata);
> >>> +void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct 
> >>> pci_dev
> >>> *, void *),
> >>> +			void *userdata);
> >>>  int pci_cfg_space_size(struct pci_dev *dev);
> >>>  unsigned char pci_bus_max_busnr(struct pci_bus *bus);
> >>>  void pci_setup_bridge(struct pci_bus *bus);
> >>> -- 
> >>> 2.19.1
Sean V Kelley July 7, 2020, 4:56 p.m. UTC | #7
On 3 Jul 2020, at 1:23, Jonathan Cameron wrote:

> On Thu, 2 Jul 2020 11:06:26 -0700
> Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
> 
> > On 26 Jun 2020, at 11:41, Jonathan Cameron wrote:
> > 
> > > On Fri, 26 Jun 2020 09:29:34 -0700
> > > Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
> > > 
> > > > Hi,
> > > Hi,
> > > 
> > > Thanks for taking a look.
> > > 
> > > > 
> > > > On 22 Jun 2020, at 4:44, Jonathan Cameron wrote:
> > > > 
> > > > > Was previously: PCI/AER: Add partial initial supprot for
> > > > > RCiEPs
> > > > > using
> > > > > RCEC or
> > > > > firmware first.
> > > > > 
> > > > > Currently the kernel does not handle AER errors for Root
> > > > > Complex
> > > > > integrated
> > > > > End Points (RCiEPs)[0].  These devices sit on a root bus
> > > > > within 
> > > > > the
> > > > > Root Complex
> > > > > (RC).  AER handling is performed by a Root Complex Event
> > > > > Collector
> > > > > (RCEC) [1]
> > > > > which is a effectively a type of RCiEP on the same root bus.
> > > > > 
> > > > > This code will only perform the correct reset flow for the
> > > > > case
> > > > > where
> > > > > there
> > > > > is no need to take any actions on the RCEC because the
> > > > > firmware is
> > > > > responsible for them.   This is true where APEI [2] is used
> > > > > to
> > > > > report
> > > > > the AER
> > > > > errors via a GHES[v2] HEST entry [3] and relevant AER CPER
> > > > > record
> > > > > [4]
> > > > > and Firmware
> > > > > First handling is in use.
> > > > 
> > > > Right, in the case of the RCEC one identifies the RCiEPs by
> > > > the 
> > > > RCiEP
> > > > bitmap as a part of the RCEC Associated Endpoint Extended
> > > > Capabilities.
> > > > This ‘search’ so to speak would make use also of the RCEC
> > > > Associated
> > > > Bus Numbers Register to associate the devices with an RCEC when
> > > > not
> > > > on
> > > > the same bus.
> > > 
> > > Ah. I'm afraid my access to recent specs is a bit limited at the
> > > moment.
> > > I do have a draft 5.0 spec which has that in though so I now see 
> > > what
> > > you mean.
> > > 
> > > Was introduced in Root Complex Event Collector Endpoint
> > > Association
> > > Extended
> > > Capability version 2 in PCIe 5.0 I think.
> > > 
> > 
> > Correct.
> > 
> > > > > As there is no current RCEC driver support, it should not be
> > > > > possible
> > > > > to get
> > > > > to this code via any routes other than the one above. Hence
> > > > > appropriate RCEC
> > > > > handling can be added when the RCEC driver support is ready.
> > > > > The error handling is different from a normal PCIe End Point
> > > > > because:
> > > > > 
> > > > > 1) There is no downstream port above an RCiEP as these
> > > > > devices sit
> > > > > on
> > > > > a root
> > > > >    bus.
> > > > > 
> > > > > 2) In general, it makes little sense to reset other devices
> > > > > on on
> > > > > the
> > > > > same
> > > > >    root bus.  For error handling outside the of the root
> > > > > complex
> > > > > (RC)
> > > > > an AER
> > > > >    error will indicate that all the topology below the
> > > > > physical
> > > > > link,
> > > > > which
> > > > >    the error is related to, will need to be reset as they
> > > > > share a
> > > > > common
> > > > >    path to the host.  For an RCiEP there is no such defined
> > > > > shared
> > > > > path
> > > > >    relationship with other elements on the root bus.
> > > > > 
> > > > > A new walk function, similar to pci_bus_walk is provided
> > > > > that 
> > > > > takes
> > > > > a
> > > > > pci_dev
> > > > > instead of a bus.  If that dev corresponds to a downstream
> > > > > port it
> > > > > will walk
> > > > > the subordinate bus of that downstream port.  If the dev does
> > > > > not
> > > > > then
> > > > > it
> > > > > will call the function on that device alone.   This function 
> > > > > allows
> > > > > us
> > > > > to
> > > > > avoid adding special cases to the majority of the error
> > > > > handling.
> > > > 
> > > > Then in that case the callback could add the additional checks
> > > > specific
> > > > to identifying the associated RCiEPs.
> > > 
> > > I am afraid I don't follow what you mean here.  Could you give
> > > more
> > > info?
> > 
> > Sure, a given RCEC can be associated with multiple RCiEPs.  As a
> > part 
> > of
> > the Extended Association Cap it is possible to obtain a bitmap of
> > the
> > RCiEP device ids on the same bus number as the RCEC device itself.
> > (5.0-1.0 sec 7.9.10.2).  With a Cap version of 2h or higher, it is 
> > also
> > possible to get an additional range of bus numbers containing
> > RCiEPs
> > also associated with this RCEC.
> > 
> > So I’m wondering if this function could be used in which passing a
> > dev, in this case the RCEC, triggers the call back which makes use
> > of
> > the RCiEP bitmap and associated bus ranges to return all identified
> > devices in use cases such as in AER for finding sources, etc.
> 
> Ah understood.
> 
> If we do this we effectively end up with 3 different types of walk
> and
> the meaning of the walk function gets more complex again.
> 
> 1) Normal bus walk - we pass the downstream port above a bus to which
> the device is attached and it walks everthing below that point.
> 
> 2) Case I care about RCiEP with no visible association with an RCEC
> as
> I don't need to walk devices.  In that case just calls the callbacks 
> for
> the actual device.
> 
> 3) Pass in RCiEP with RCEC asociated with it (or do a dance at the 
> caller
> to pass in the RCEC itself). Need to walk the devices that the RCEC
> is
> handling errors for.  For handling, I'm not all the calls will be 
> generally
> applicable to other devices associated with the RCEC as some only
> make
> sense if there is an actual PCIe bus involved and hence we need to 
> reset
> other devices on that bus.  For RCEC I don't think there is an 
> particular
> reason to assume an AER error reported at one RCiEP will have any 
> impact
> on other devices associated with the particular RCEC.
> I've not found anything in the spec addressing this question but 
> perhaps
> I've missed something?

Correct.  There should be no impact to the RCEC or its associated
RCiEPs which may not happen to reside on the same bus as the collector at all.

> However, if the RCEC doesn't support multiple error records, you may 
> need
> to walk the bus to identify multiple simultaneous issues, very 
> carefully
> avoiding (or least minimizing) race conditions.
> 
> > The alternative is to have a separate walk for RCECs that loops 
> > through
> > the bitmap / ranges (if supported) triggering the callback for each
> > device found.

I’ve been testing the Associated Endpoint Bitmap and Bus Range 
handling and using my pciutils patches to help to confirm some of the 
association. This overlaps with my CXL work and CXL 1.1 based RCiEPs
are good test cards:

(Decode via:  
https://lore.kernel.org/linux-pci/20200624223940.240463-1-sean.v.kelley@linux.intel.com/
 )

Test card at 6b:00.0

Capabilities: [e00 v1] Designated Vendor-Specific: Vendor=1e98 ID=0000
Rev=0 Len=56: CXL
CXLCap: Cache- IO+ Mem+ Mem HW Init+ HDMCount 1 Viral-
CXLCtl: Cache- IO+ Mem- Cache SF Cov 0 Cache SF Gran 0 Cache Clean-
Viral-
CXLSta: Viral-
Capabilities: [e38 v1] Device Serial Number 30-91-11-78-10-00-00-00

RCEC assocated to RCiEP at 6b while residing at 6a:

Capabilities: [160 v2] Root Complex Event Collector Endpoint
Association
RCiEPBitmap: 00000000 [none]
AssociatedBusNumbers: 6b-6b
Kernel driver in use: pcieport

with dmesg:

[ 10.502543] pcieport 0000:6a:00.4: AER: enabled with IRQ 34

The trick is the walk, which is not compact.  Currently working on
error injection to test:

void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *,
void *),
+                   void *userdata)
+{
+       u32 pos, bitmap, hdr, busn;
+       u8 ver, nextbusn, lastbusn;
+       unsigned int dev, fn, bnr;
+       struct pci_bus *pbus=NULL;
+       struct pci_dev *pdev;
+       int retval;
+
+       pos = pci_find_ext_capability(rcec, PCI_EXT_CAP_ID_RCEC);
+       if (!pos)
+               return;
+
+       pci_read_config_dword(rcec, pos + PCI_RCEC_RCIEP_BITMAP,
&bitmap);
+
+       for (dev = 0; dev < 32; dev++) {
+               if (!(bitmap & (1 << dev)))
+                       continue;
+
+               for (fn = 0; fn < 8; fn++) {
+                       pdev =
pci_get_domain_bus_and_slot(pci_domain_nr(rcec->bus),
+                                                          rcec->bus-
>number,
+                                                          PCI_DEVFN(de
v, fn));
+                       if (!pdev)
+                               continue;
+
+                       retval = cb(pdev, userdata);
+                       if (retval)
+                               return;
+               }
+       }

Then continuing in the same function above, I need to also consider the
case for the Bus ranges (still wip, not tested):

+       pci_read_config_dword(rcec, pos, &hdr);
+       ver = PCI_RCEC_EP_CAP_VER(hdr);
+       if (ver < PCI_RCEC_BUSN_REG_VER)
+               return;
+
+       pci_read_config_dword(rcec, pos + PCI_RCEC_BUSN, &busn);
+       nextbusn = PCI_RCEC_BUSN_NEXT(busn);
+       lastbusn = PCI_RCEC_BUSN_LAST(busn);
+
+       if ((nextbusn == 0xff) && (lastbusn == 0x00))
+               return;
+
+       for (bnr = nextbusn; bnr < (lastbusn + 1); bnr++) {
+               pbus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
+               if (pbus) {
+                       /* find RCiEP devices on the given bus */
+                       for (dev = 0; dev < 32; dev++) {
+ etc...
+                       }
+               }
+       }
+}

Currently this lives in aer.c and it's large enough that I wonder if
due to the specifity of the assoicated spec requirments if that should
be fine?

> Agreed. We would end up with the same splitting of handling paths
> that
> wasn't liked in my v1 patch.   Perhaps we need  single 
> pci_walk_aer_affected
> function with a pile of documentation for what it is actually doing?

Perhaps this would be a good start?  I tend to agree.  I can also
submit more of the patches as RFC for further comment.

> 
> Even then we may need to have a parameter to indicate a particular 
> callback
> should be restricted to devices that share a 'real bus' or not.

In my case 'walk' through the spec options of either just bitmap or if
of sufficient version (2h), I walk through the bus ranges, calling the
callback at each encounter.

> 
> I guess the proof will as ever be in what the code looks like.

Agreed.

> 
> Hmm. It increasingly feels like we may need to have a go at drawing 
> together
> some coherent documentation for the different ways of handling
> AER errors and specifications / assumptions for each.  Would be 
> 'interesting'
> to do given I'm fairly sure very few people actually understands all 
> the options and
> nasty corner cases!
> 
> +CC Lorenzo and James who may also be interested in this topic in 
> general.

Let's do.

Best regards,

Sean

> 
> Thanks,
> 
> Jonathan
> 
> 
> > Thanks,
> > 
> > Sean
> > 
> > > > > Open questions:
> > > > > 
> > > > > 1. Are we better protecting against link reset for an RCiEP
> > > > > in 
> > > > > here
> > > > > or
> > > > >    should we put the check in the link reset functions?
> > > > > 
> > > > > 2. If we were to get a stupid firmware record with the
> > > > > relevant
> > > > > reset
> > > > > flag
> > > > >    set to trigger a link reset, what is the correct
> > > > > response?  For
> > > > > now
> > > > > I
> > > > >    try to report that we haven't done anything and print a 
> > > > > warning.
> > > > > 
> > > > > 3. Naming of pci_walk_below_dev is rather unsatisfying. Any
> > > > > better
> > > > > ideas?
> > > > > 
> > > > > 4. pci_walk_below_dev is perhaps not of general utility.
> > > > > Shall I
> > > > > make
> > > > > it local
> > > > >    in err.c?  If not would a precursor patch for that be
> > > > > preferred?’
> > > > 
> > > > It depends.  Is it intended as a drop in replacement where
> > > > needed 
> > > > for
> > > > pci_walk_bus()? So in that case you are now passing the dev 
> > > > structure
> > > > and do the check for subordinate or is it intended as being 
> > > > specific
> > > > to
> > > > say RCEC? With AER, one could either first check for RC_EC type
> > > > before
> > > > using this one.  Or one could just drop in replace (passing the
> > > > dev
> > > > structure instead) and the call back performs the RCEC specific
> > > > checks
> > > > when a device is encountered.
> > > 
> > > If it is useful in aer.c that's great.   Just seemed such a weird
> > > beast
> > > I wasn't sure it would be of use anywhere else.
> > > 
> > > > > Testing has been performed via error injection on a QEMU
> > > > > platform 
> > > > > as
> > > > > that lets
> > > > > me create a wide range of topologies and report errors at
> > > > > any 
> > > > > chosen
> > > > > location.
> > > > > Currently I have no plans to upstream this injection support,
> > > > > but 
> > > > > am
> > > > > happy to
> > > > > share if useful to others.
> > > > 
> > > > I’m experimenting with it in my RCEC code in AER and will give 
> > > > you
> > > > additional feedback.
> > > 
> > > Great, thanks
> > > 
> > > Jonathan
> > > 
> > > > Thanks,
> > > > 
> > > > Sean
> > > > 
> > > > 
> > > > > [0] ACPI PCI Express Base Specification 4.0 1.3.2.3 Root
> > > > > Complex
> > > > > Integrated
> > > > >     Endpoint Rules.
> > > > > [1] ACPI PCI Express Base Specification 4.0 6.2 Error
> > > > > Signalling 
> > > > > and
> > > > > Logging
> > > > > [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error 
> > > > > Interface
> > > > > (APEI)
> > > > > [3] ACPI Sepcification 6.3 18.2.3.7 Generic Hardware Error
> > > > > Source
> > > > > [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
> > > > > 
> > > > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > > ---
> > > > > Changes since v1:
> > > > > * Separated from the largely unrelated fix so the two can
> > > > > move
> > > > > forwards separately.
> > > > > * Instead of separate path for RCiEP handling use the method
> > > > > suggested
> > > > > by Bjorn
> > > > >   and Sathyanarayanan with an adjusted pci_bus_walk.
> > > > > 
> > > > > Thanks all for reviews of V1.
> > > > > 
> > > > >  drivers/pci/bus.c      | 28 ++++++++++++++++++++++++++++
> > > > >  drivers/pci/pcie/err.c | 29 +++++++++++++++++++----------
> > > > >  include/linux/pci.h    |  2 ++
> > > > >  3 files changed, 49 insertions(+), 10 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > > > > index 8e40b3e6da77..7cbe1ed2db3d 100644
> > > > > --- a/drivers/pci/bus.c
> > > > > +++ b/drivers/pci/bus.c
> > > > > @@ -411,6 +411,34 @@ void pci_walk_bus(struct pci_bus *top,
> > > > > int
> > > > > (*cb)(struct pci_dev *, void *),
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(pci_walk_bus);
> > > > > 
> > > > > +/** pci_walk_below_dev - walk devices below (or on) another 
> > > > > device
> > > > > + *  @dev      device for which we should walk below,
> > > > > include 
> > > > > device
> > > > > when not a port.
> > > > > + *  @cb       callback to be called for each device found
> > > > > + *  @userdata arbitrary pointer to be passed to callback.
> > > > > + *
> > > > > + *  If the device provided is a port,
> > > > > + *  walk the subordinate bus, including any bridged devices
> > > > > + *  on buses under this bus.  Call the provided callback
> > > > > + *  on each device found.
> > > > > + *
> > > > > + *  If the device provided hs no subordinate bus, call the 
> > > > > provided
> > > > > + *  callback on the device itself.
> > > > > + *
> > > > > + */
> > > > > +void pci_walk_below_dev(struct pci_dev *dev, int
> > > > > (*cb)(struct
> > > > > pci_dev
> > > > > *, void *),
> > > > > +			void *userdata)
> > > > > +{
> > > > > +	struct pci_bus *bus;
> > > > > +
> > > > > +	if (dev->subordinate) {
> > > > > +		bus = dev->subordinate;
> > > > > +		pci_walk_bus(bus, cb, userdata);
> > > > > +	} else {
> > > > > +		cb(dev, userdata);
> > > > > +	}
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(pci_walk_below_dev);
> > > > > +
> > > > >  struct pci_bus *pci_bus_get(struct pci_bus *bus)
> > > > >  {
> > > > >  	if (bus)
> > > > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > > > > index 14bb8f54723e..fa08b1cc3d96 100644
> > > > > --- a/drivers/pci/pcie/err.c
> > > > > +++ b/drivers/pci/pcie/err.c
> > > > > @@ -151,33 +151,39 @@ pci_ers_result_t
> > > > > pcie_do_recovery(struct
> > > > > pci_dev
> > > > > *dev,
> > > > >  			pci_ers_result_t (*reset_link)(struct
> > > > > pci_dev *pdev))
> > > > >  {
> > > > >  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> > > > > -	struct pci_bus *bus;
> > > > > 
> > > > >  	/*
> > > > >  	 * Error recovery runs on all subordinates of the
> > > > > first 
> > > > > downstream
> > > > > port.
> > > > >  	 * If the downstream port detected the error, it is
> > > > > cleared at 
> > > > > the
> > > > > end.
> > > > > +	 * For RCiEPs we should reset just the RCiEP itself.
> > > > >  	 */
> > > > >  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > > > > -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> > > > > +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> > > > > +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END))
> > > > >  		dev = dev->bus->self;
> > > > > -	bus = dev->subordinate;
> > > > > 
> > > > >  	pci_dbg(dev, "broadcast error_detected message\n");
> > > > >  	if (state == pci_channel_io_frozen) {
> > > > > -		pci_walk_bus(bus, report_frozen_detected,
> > > > > &status);
> > > > > +		pci_walk_below_dev(dev, report_frozen_detected,
> > > > > &status);
> > > > > +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
> > > > > {
> > > > > +			pci_warn(dev, "link reset not possible
> > > > > for RCiEP\n");
> > > > > +			status = PCI_ERS_RESULT_NONE;
> > > > > +			goto failed;
> > > > > +		}
> > > > > +
> > > > >  		status = reset_link(dev);
> > > > >  		if (status != PCI_ERS_RESULT_RECOVERED) {
> > > > >  			pci_warn(dev, "link reset failed\n");
> > > > >  			goto failed;
> > > > >  		}
> > > > >  	} else {
> > > > > -		pci_walk_bus(bus, report_normal_detected,
> > > > > &status);
> > > > > +		pci_walk_below_dev(dev, report_normal_detected,
> > > > > &status);
> > > > >  	}
> > > > > 
> > > > >  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> > > > >  		status = PCI_ERS_RESULT_RECOVERED;
> > > > >  		pci_dbg(dev, "broadcast mmio_enabled
> > > > > message\n");
> > > > > -		pci_walk_bus(bus, report_mmio_enabled,
> > > > > &status);
> > > > > +		pci_walk_below_dev(dev, report_mmio_enabled,
> > > > > &status);
> > > > >  	}
> > > > > 
> > > > >  	if (status == PCI_ERS_RESULT_NEED_RESET) {
> > > > > @@ -188,17 +194,20 @@ pci_ers_result_t
> > > > > pcie_do_recovery(struct
> > > > > pci_dev
> > > > > *dev,
> > > > >  		 */
> > > > >  		status = PCI_ERS_RESULT_RECOVERED;
> > > > >  		pci_dbg(dev, "broadcast slot_reset message\n");
> > > > > -		pci_walk_bus(bus, report_slot_reset, &status);
> > > > > +		pci_walk_below_dev(dev, report_slot_reset,
> > > > > &status);
> > > > >  	}
> > > > > 
> > > > >  	if (status != PCI_ERS_RESULT_RECOVERED)
> > > > >  		goto failed;
> > > > > 
> > > > >  	pci_dbg(dev, "broadcast resume message\n");
> > > > > -	pci_walk_bus(bus, report_resume, &status);
> > > > > +	pci_walk_below_dev(dev, report_resume, &status);
> > > > > 
> > > > > -	pci_aer_clear_device_status(dev);
> > > > > -	pci_aer_clear_nonfatal_status(dev);
> > > > > +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > > > > +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> > > > > +		pci_aer_clear_device_status(dev);
> > > > > +		pci_aer_clear_nonfatal_status(dev);
> > > > > +	}
> > > > >  	pci_info(dev, "device recovery successful\n");
> > > > >  	return status;
> > > > > 
> > > > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > > > index c79d83304e52..538bf0a76d33 100644
> > > > > --- a/include/linux/pci.h
> > > > > +++ b/include/linux/pci.h
> > > > > @@ -1411,6 +1411,8 @@ int pci_scan_bridge(struct pci_bus
> > > > > *bus,
> > > > > struct
> > > > > pci_dev *dev, int max,
> > > > > 
> > > > >  void pci_walk_bus(struct pci_bus *top, int (*cb)(struct
> > > > > pci_dev 
> > > > > *,
> > > > > void *),
> > > > >  		  void *userdata);
> > > > > +void pci_walk_below_dev(struct pci_dev *dev, int
> > > > > (*cb)(struct
> > > > > pci_dev
> > > > > *, void *),
> > > > > +			void *userdata);
> > > > >  int pci_cfg_space_size(struct pci_dev *dev);
> > > > >  unsigned char pci_bus_max_busnr(struct pci_bus *bus);
> > > > >  void pci_setup_bridge(struct pci_bus *bus);
> > > > > -- 
> > > > > 2.19.1
Sean V Kelley July 7, 2020, 6:08 p.m. UTC | #8
Minor correction:

On Tue, 2020-07-07 at 09:56 -0700, Sean V Kelley wrote:
> On 3 Jul 2020, at 1:23, Jonathan Cameron wrote:
> 
> > On Thu, 2 Jul 2020 11:06:26 -0700
> > Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
> > 
> > However, if the RCEC doesn't support multiple error records, you
> > may 
> > need
> > to walk the bus to identify multiple simultaneous issues, very 
> > carefully
> > avoiding (or least minimizing) race conditions.
> > 
> > > The alternative is to have a separate walk for RCECs that loops 
> > > through
> > > the bitmap / ranges (if supported) triggering the callback for
> > > each
> > > device found.
> 
> I’ve been testing the Associated Endpoint Bitmap and Bus Range 
> handling and using my pciutils patches to help to confirm some of
> the 
> association. This overlaps with my CXL work and CXL 1.1 based RCiEPs
> are good test cards:
> 
> (Decode via:  
> https://lore.kernel.org/linux-pci/20200624223940.240463-1-sean.v.kelley@linux.intel.com/
>  )
> 
> Test card at 6b:00.0
> 
> Capabilities: [e00 v1] Designated Vendor-Specific: Vendor=1e98
> ID=0000
> Rev=0 Len=56: CXL
> CXLCap: Cache- IO+ Mem+ Mem HW Init+ HDMCount 1 Viral-
> CXLCtl: Cache- IO+ Mem- Cache SF Cov 0 Cache SF Gran 0 Cache Clean-
> Viral-
> CXLSta: Viral-
> Capabilities: [e38 v1] Device Serial Number 30-91-11-78-10-00-00-00
> 
> RCEC assocated to RCiEP at 6b while residing at 6a:
> 
> Capabilities: [160 v2] Root Complex Event Collector Endpoint
> Association
> RCiEPBitmap: 00000000 [none]
> AssociatedBusNumbers: 6b-6b
> Kernel driver in use: pcieport
> 
> with dmesg:
> 
> [ 10.502543] pcieport 0000:6a:00.4: AER: enabled with IRQ 34
> 
> The trick is the walk, which is not compact.  Currently working on
> error injection to test:
> 
> void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *,
> void *),
> +                   void *userdata)
> +{
> +       u32 pos, bitmap, hdr, busn;
> +       u8 ver, nextbusn, lastbusn;
> +       unsigned int dev, fn, bnr;
> +       struct pci_bus *pbus=NULL;
> +       struct pci_dev *pdev;
> +       int retval;
> +
> +       pos = pci_find_ext_capability(rcec, PCI_EXT_CAP_ID_RCEC);
> +       if (!pos)
> +               return;
> +
> +       pci_read_config_dword(rcec, pos + PCI_RCEC_RCIEP_BITMAP,
> &bitmap);
> +
> +       for (dev = 0; dev < 32; dev++) {
> +               if (!(bitmap & (1 << dev)))
> +                       continue;
> +
> +               for (fn = 0; fn < 8; fn++) {
> +                       pdev =
> pci_get_domain_bus_and_slot(pci_domain_nr(rcec->bus),
> +                                                          rcec->bus-
> > number,
> +                                                          PCI_DEVFN(
> de
> v, fn));
> +                       if (!pdev)
> +                               continue;
> +
> +                       retval = cb(pdev, userdata);
> +                       if (retval)
> +                               return;
> +               }
> +       }
> 
> Then continuing in the same function above, I need to also consider
> the
> case for the Bus ranges (still wip, not tested):
> 
> +       pci_read_config_dword(rcec, pos, &hdr);
> +       ver = PCI_RCEC_EP_CAP_VER(hdr);
> +       if (ver < PCI_RCEC_BUSN_REG_VER)
> +               return;
> +
> +       pci_read_config_dword(rcec, pos + PCI_RCEC_BUSN, &busn);
> +       nextbusn = PCI_RCEC_BUSN_NEXT(busn);
> +       lastbusn = PCI_RCEC_BUSN_LAST(busn);
> +
> +       if ((nextbusn == 0xff) && (lastbusn == 0x00))
> +               return;
> +
> +       for (bnr = nextbusn; bnr < (lastbusn + 1); bnr++) {
> +               pbus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
> +               if (pbus) {
> +                       /* find RCiEP devices on the given bus */
> +                       for (dev = 0; dev < 32; dev++) {
> + etc...
> +                       }
> +               }
> +       }
> +}
> 
> Currently this lives in aer.c and it's large enough that I wonder if
> due to the specifity of the assoicated spec requirments if that
> should
> be fine?


This currently resides in portdrv_core.c due to the fact that a walk is
needed also for pme.c. 

Sean

> 
> > Agreed. We would end up with the same splitting of handling paths
> > that
> > wasn't liked in my v1 patch.   Perhaps we need  single 
> > pci_walk_aer_affected
> > function with a pile of documentation for what it is actually
> > doing?
> 
> Perhaps this would be a good start?  I tend to agree.  I can also
> submit more of the patches as RFC for further comment.
> 
> > Even then we may need to have a parameter to indicate a particular 
> > callback
> > should be restricted to devices that share a 'real bus' or not.
> 
> In my case 'walk' through the spec options of either just bitmap or
> if
> of sufficient version (2h), I walk through the bus ranges, calling
> the
> callback at each encounter.
> 
> > I guess the proof will as ever be in what the code looks like.
> 
> Agreed.
> 
> > Hmm. It increasingly feels like we may need to have a go at
> > drawing 
> > together
> > some coherent documentation for the different ways of handling
> > AER errors and specifications / assumptions for each.  Would be 
> > 'interesting'
> > to do given I'm fairly sure very few people actually understands
> > all 
> > the options and
> > nasty corner cases!
> > 
> > +CC Lorenzo and James who may also be interested in this topic in 
> > general.
> 
> Let's do.
> 
> Best regards,
> 
> Sean
> 
> > Thanks,
> > 
> > Jonathan
> > 
> > 
> > > Thanks,
> > > 
> > > Sean
> > > 
> > > > > > Open questions:
> > > > > > 
> > > > > > 1. Are we better protecting against link reset for an RCiEP
> > > > > > in 
> > > > > > here
> > > > > > or
> > > > > >    should we put the check in the link reset functions?
> > > > > > 
> > > > > > 2. If we were to get a stupid firmware record with the
> > > > > > relevant
> > > > > > reset
> > > > > > flag
> > > > > >    set to trigger a link reset, what is the correct
> > > > > > response?  For
> > > > > > now
> > > > > > I
> > > > > >    try to report that we haven't done anything and print a 
> > > > > > warning.
> > > > > > 
> > > > > > 3. Naming of pci_walk_below_dev is rather unsatisfying. Any
> > > > > > better
> > > > > > ideas?
> > > > > > 
> > > > > > 4. pci_walk_below_dev is perhaps not of general utility.
> > > > > > Shall I
> > > > > > make
> > > > > > it local
> > > > > >    in err.c?  If not would a precursor patch for that be
> > > > > > preferred?’
> > > > > 
> > > > > It depends.  Is it intended as a drop in replacement where
> > > > > needed 
> > > > > for
> > > > > pci_walk_bus()? So in that case you are now passing the dev 
> > > > > structure
> > > > > and do the check for subordinate or is it intended as being 
> > > > > specific
> > > > > to
> > > > > say RCEC? With AER, one could either first check for RC_EC
> > > > > type
> > > > > before
> > > > > using this one.  Or one could just drop in replace (passing
> > > > > the
> > > > > dev
> > > > > structure instead) and the call back performs the RCEC
> > > > > specific
> > > > > checks
> > > > > when a device is encountered.
> > > > 
> > > > If it is useful in aer.c that's great.   Just seemed such a
> > > > weird
> > > > beast
> > > > I wasn't sure it would be of use anywhere else.
> > > > 
> > > > > > Testing has been performed via error injection on a QEMU
> > > > > > platform 
> > > > > > as
> > > > > > that lets
> > > > > > me create a wide range of topologies and report errors at
> > > > > > any 
> > > > > > chosen
> > > > > > location.
> > > > > > Currently I have no plans to upstream this injection
> > > > > > support,
> > > > > > but 
> > > > > > am
> > > > > > happy to
> > > > > > share if useful to others.
> > > > > 
> > > > > I’m experimenting with it in my RCEC code in AER and will
> > > > > give 
> > > > > you
> > > > > additional feedback.
> > > > 
> > > > Great, thanks
> > > > 
> > > > Jonathan
> > > > 
> > > > > Thanks,
> > > > > 
> > > > > Sean
> > > > > 
> > > > > 
> > > > > > [0] ACPI PCI Express Base Specification 4.0 1.3.2.3 Root
> > > > > > Complex
> > > > > > Integrated
> > > > > >     Endpoint Rules.
> > > > > > [1] ACPI PCI Express Base Specification 4.0 6.2 Error
> > > > > > Signalling 
> > > > > > and
> > > > > > Logging
> > > > > > [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error 
> > > > > > Interface
> > > > > > (APEI)
> > > > > > [3] ACPI Sepcification 6.3 18.2.3.7 Generic Hardware Error
> > > > > > Source
> > > > > > [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
> > > > > > 
> > > > > > Signed-off-by: Jonathan Cameron <
> > > > > > Jonathan.Cameron@huawei.com>
> > > > > > ---
> > > > > > Changes since v1:
> > > > > > * Separated from the largely unrelated fix so the two can
> > > > > > move
> > > > > > forwards separately.
> > > > > > * Instead of separate path for RCiEP handling use the
> > > > > > method
> > > > > > suggested
> > > > > > by Bjorn
> > > > > >   and Sathyanarayanan with an adjusted pci_bus_walk.
> > > > > > 
> > > > > > Thanks all for reviews of V1.
> > > > > > 
> > > > > >  drivers/pci/bus.c      | 28 ++++++++++++++++++++++++++++
> > > > > >  drivers/pci/pcie/err.c | 29 +++++++++++++++++++----------
> > > > > >  include/linux/pci.h    |  2 ++
> > > > > >  3 files changed, 49 insertions(+), 10 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > > > > > index 8e40b3e6da77..7cbe1ed2db3d 100644
> > > > > > --- a/drivers/pci/bus.c
> > > > > > +++ b/drivers/pci/bus.c
> > > > > > @@ -411,6 +411,34 @@ void pci_walk_bus(struct pci_bus *top,
> > > > > > int
> > > > > > (*cb)(struct pci_dev *, void *),
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(pci_walk_bus);
> > > > > > 
> > > > > > +/** pci_walk_below_dev - walk devices below (or on)
> > > > > > another 
> > > > > > device
> > > > > > + *  @dev      device for which we should walk below,
> > > > > > include 
> > > > > > device
> > > > > > when not a port.
> > > > > > + *  @cb       callback to be called for each device found
> > > > > > + *  @userdata arbitrary pointer to be passed to callback.
> > > > > > + *
> > > > > > + *  If the device provided is a port,
> > > > > > + *  walk the subordinate bus, including any bridged
> > > > > > devices
> > > > > > + *  on buses under this bus.  Call the provided callback
> > > > > > + *  on each device found.
> > > > > > + *
> > > > > > + *  If the device provided hs no subordinate bus, call
> > > > > > the 
> > > > > > provided
> > > > > > + *  callback on the device itself.
> > > > > > + *
> > > > > > + */
> > > > > > +void pci_walk_below_dev(struct pci_dev *dev, int
> > > > > > (*cb)(struct
> > > > > > pci_dev
> > > > > > *, void *),
> > > > > > +			void *userdata)
> > > > > > +{
> > > > > > +	struct pci_bus *bus;
> > > > > > +
> > > > > > +	if (dev->subordinate) {
> > > > > > +		bus = dev->subordinate;
> > > > > > +		pci_walk_bus(bus, cb, userdata);
> > > > > > +	} else {
> > > > > > +		cb(dev, userdata);
> > > > > > +	}
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(pci_walk_below_dev);
> > > > > > +
> > > > > >  struct pci_bus *pci_bus_get(struct pci_bus *bus)
> > > > > >  {
> > > > > >  	if (bus)
> > > > > > diff --git a/drivers/pci/pcie/err.c
> > > > > > b/drivers/pci/pcie/err.c
> > > > > > index 14bb8f54723e..fa08b1cc3d96 100644
> > > > > > --- a/drivers/pci/pcie/err.c
> > > > > > +++ b/drivers/pci/pcie/err.c
> > > > > > @@ -151,33 +151,39 @@ pci_ers_result_t
> > > > > > pcie_do_recovery(struct
> > > > > > pci_dev
> > > > > > *dev,
> > > > > >  			pci_ers_result_t (*reset_link)(struct
> > > > > > pci_dev *pdev))
> > > > > >  {
> > > > > >  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> > > > > > -	struct pci_bus *bus;
> > > > > > 
> > > > > >  	/*
> > > > > >  	 * Error recovery runs on all subordinates of the
> > > > > > first 
> > > > > > downstream
> > > > > > port.
> > > > > >  	 * If the downstream port detected the error, it is
> > > > > > cleared at 
> > > > > > the
> > > > > > end.
> > > > > > +	 * For RCiEPs we should reset just the RCiEP itself.
> > > > > >  	 */
> > > > > >  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > > > > > -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> > > > > > +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> > > > > > +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END))
> > > > > >  		dev = dev->bus->self;
> > > > > > -	bus = dev->subordinate;
> > > > > > 
> > > > > >  	pci_dbg(dev, "broadcast error_detected message\n");
> > > > > >  	if (state == pci_channel_io_frozen) {
> > > > > > -		pci_walk_bus(bus, report_frozen_detected,
> > > > > > &status);
> > > > > > +		pci_walk_below_dev(dev, report_frozen_detected,
> > > > > > &status);
> > > > > > +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
> > > > > > {
> > > > > > +			pci_warn(dev, "link reset not possible
> > > > > > for RCiEP\n");
> > > > > > +			status = PCI_ERS_RESULT_NONE;
> > > > > > +			goto failed;
> > > > > > +		}
> > > > > > +
> > > > > >  		status = reset_link(dev);
> > > > > >  		if (status != PCI_ERS_RESULT_RECOVERED) {
> > > > > >  			pci_warn(dev, "link reset failed\n");
> > > > > >  			goto failed;
> > > > > >  		}
> > > > > >  	} else {
> > > > > > -		pci_walk_bus(bus, report_normal_detected,
> > > > > > &status);
> > > > > > +		pci_walk_below_dev(dev, report_normal_detected,
> > > > > > &status);
> > > > > >  	}
> > > > > > 
> > > > > >  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> > > > > >  		status = PCI_ERS_RESULT_RECOVERED;
> > > > > >  		pci_dbg(dev, "broadcast mmio_enabled
> > > > > > message\n");
> > > > > > -		pci_walk_bus(bus, report_mmio_enabled,
> > > > > > &status);
> > > > > > +		pci_walk_below_dev(dev, report_mmio_enabled,
> > > > > > &status);
> > > > > >  	}
> > > > > > 
> > > > > >  	if (status == PCI_ERS_RESULT_NEED_RESET) {
> > > > > > @@ -188,17 +194,20 @@ pci_ers_result_t
> > > > > > pcie_do_recovery(struct
> > > > > > pci_dev
> > > > > > *dev,
> > > > > >  		 */
> > > > > >  		status = PCI_ERS_RESULT_RECOVERED;
> > > > > >  		pci_dbg(dev, "broadcast slot_reset message\n");
> > > > > > -		pci_walk_bus(bus, report_slot_reset, &status);
> > > > > > +		pci_walk_below_dev(dev, report_slot_reset,
> > > > > > &status);
> > > > > >  	}
> > > > > > 
> > > > > >  	if (status != PCI_ERS_RESULT_RECOVERED)
> > > > > >  		goto failed;
> > > > > > 
> > > > > >  	pci_dbg(dev, "broadcast resume message\n");
> > > > > > -	pci_walk_bus(bus, report_resume, &status);
> > > > > > +	pci_walk_below_dev(dev, report_resume, &status);
> > > > > > 
> > > > > > -	pci_aer_clear_device_status(dev);
> > > > > > -	pci_aer_clear_nonfatal_status(dev);
> > > > > > +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > > > > > +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> > > > > > +		pci_aer_clear_device_status(dev);
> > > > > > +		pci_aer_clear_nonfatal_status(dev);
> > > > > > +	}
> > > > > >  	pci_info(dev, "device recovery successful\n");
> > > > > >  	return status;
> > > > > > 
> > > > > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > > > > index c79d83304e52..538bf0a76d33 100644
> > > > > > --- a/include/linux/pci.h
> > > > > > +++ b/include/linux/pci.h
> > > > > > @@ -1411,6 +1411,8 @@ int pci_scan_bridge(struct pci_bus
> > > > > > *bus,
> > > > > > struct
> > > > > > pci_dev *dev, int max,
> > > > > > 
> > > > > >  void pci_walk_bus(struct pci_bus *top, int (*cb)(struct
> > > > > > pci_dev 
> > > > > > *,
> > > > > > void *),
> > > > > >  		  void *userdata);
> > > > > > +void pci_walk_below_dev(struct pci_dev *dev, int
> > > > > > (*cb)(struct
> > > > > > pci_dev
> > > > > > *, void *),
> > > > > > +			void *userdata);
> > > > > >  int pci_cfg_space_size(struct pci_dev *dev);
> > > > > >  unsigned char pci_bus_max_busnr(struct pci_bus *bus);
> > > > > >  void pci_setup_bridge(struct pci_bus *bus);
> > > > > > -- 
> > > > > > 2.19.1
Sean V Kelley July 7, 2020, 11:40 p.m. UTC | #9
On 7 Jul 2020, at 9:56, Sean V Kelley wrote:

> On 3 Jul 2020, at 1:23, Jonathan Cameron wrote:
>
>> On Thu, 2 Jul 2020 11:06:26 -0700
>> Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
>>
>>> On 26 Jun 2020, at 11:41, Jonathan Cameron wrote:
>>>
>>>> On Fri, 26 Jun 2020 09:29:34 -0700
>>>> Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
>>>>
>>>>> Hi,
>>>> Hi,
>>>>
>>>> Thanks for taking a look.
>>>>
>>>>>
>>>>> On 22 Jun 2020, at 4:44, Jonathan Cameron wrote:
>>>>>
>>>>>> Was previously: PCI/AER: Add partial initial supprot for
>>>>>> RCiEPs
>>>>>> using
>>>>>> RCEC or
>>>>>> firmware first.
>>>>>>
>>>>>> Currently the kernel does not handle AER errors for Root
>>>>>> Complex
>>>>>> integrated
>>>>>> End Points (RCiEPs)[0].  These devices sit on a root bus
>>>>>> within
>>>>>> the
>>>>>> Root Complex
>>>>>> (RC).  AER handling is performed by a Root Complex Event
>>>>>> Collector
>>>>>> (RCEC) [1]
>>>>>> which is a effectively a type of RCiEP on the same root bus.
>>>>>>
>>>>>> This code will only perform the correct reset flow for the
>>>>>> case
>>>>>> where
>>>>>> there
>>>>>> is no need to take any actions on the RCEC because the
>>>>>> firmware is
>>>>>> responsible for them.   This is true where APEI [2] is used
>>>>>> to
>>>>>> report
>>>>>> the AER
>>>>>> errors via a GHES[v2] HEST entry [3] and relevant AER CPER
>>>>>> record
>>>>>> [4]
>>>>>> and Firmware
>>>>>> First handling is in use.
>>>>>
>>>>> Right, in the case of the RCEC one identifies the RCiEPs by
>>>>> the
>>>>> RCiEP
>>>>> bitmap as a part of the RCEC Associated Endpoint Extended
>>>>> Capabilities.
>>>>> This ‘search’ so to speak would make use also of the RCEC
>>>>> Associated
>>>>> Bus Numbers Register to associate the devices with an RCEC when
>>>>> not
>>>>> on
>>>>> the same bus.
>>>>
>>>> Ah. I'm afraid my access to recent specs is a bit limited at the
>>>> moment.
>>>> I do have a draft 5.0 spec which has that in though so I now see
>>>> what
>>>> you mean.
>>>>
>>>> Was introduced in Root Complex Event Collector Endpoint
>>>> Association
>>>> Extended
>>>> Capability version 2 in PCIe 5.0 I think.
>>>>
>>>
>>> Correct.
>>>
>>>>>> As there is no current RCEC driver support, it should not be
>>>>>> possible
>>>>>> to get
>>>>>> to this code via any routes other than the one above. Hence
>>>>>> appropriate RCEC
>>>>>> handling can be added when the RCEC driver support is ready.
>>>>>> The error handling is different from a normal PCIe End Point
>>>>>> because:
>>>>>>
>>>>>> 1) There is no downstream port above an RCiEP as these
>>>>>> devices sit
>>>>>> on
>>>>>> a root
>>>>>>    bus.
>>>>>>
>>>>>> 2) In general, it makes little sense to reset other devices
>>>>>> on on
>>>>>> the
>>>>>> same
>>>>>>    root bus.  For error handling outside the of the root
>>>>>> complex
>>>>>> (RC)
>>>>>> an AER
>>>>>>    error will indicate that all the topology below the
>>>>>> physical
>>>>>> link,
>>>>>> which
>>>>>>    the error is related to, will need to be reset as they
>>>>>> share a
>>>>>> common
>>>>>>    path to the host.  For an RCiEP there is no such defined
>>>>>> shared
>>>>>> path
>>>>>>    relationship with other elements on the root bus.
>>>>>>
>>>>>> A new walk function, similar to pci_bus_walk is provided
>>>>>> that
>>>>>> takes
>>>>>> a
>>>>>> pci_dev
>>>>>> instead of a bus.  If that dev corresponds to a downstream
>>>>>> port it
>>>>>> will walk
>>>>>> the subordinate bus of that downstream port.  If the dev does
>>>>>> not
>>>>>> then
>>>>>> it
>>>>>> will call the function on that device alone.   This function
>>>>>> allows
>>>>>> us
>>>>>> to
>>>>>> avoid adding special cases to the majority of the error
>>>>>> handling.
>>>>>
>>>>> Then in that case the callback could add the additional checks
>>>>> specific
>>>>> to identifying the associated RCiEPs.
>>>>
>>>> I am afraid I don't follow what you mean here.  Could you give
>>>> more
>>>> info?
>>>
>>> Sure, a given RCEC can be associated with multiple RCiEPs.  As a
>>> part
>>> of
>>> the Extended Association Cap it is possible to obtain a bitmap of
>>> the
>>> RCiEP device ids on the same bus number as the RCEC device itself.
>>> (5.0-1.0 sec 7.9.10.2).  With a Cap version of 2h or higher, it is
>>> also
>>> possible to get an additional range of bus numbers containing
>>> RCiEPs
>>> also associated with this RCEC.
>>>
>>> So I’m wondering if this function could be used in which passing a
>>> dev, in this case the RCEC, triggers the call back which makes use
>>> of
>>> the RCiEP bitmap and associated bus ranges to return all identified
>>> devices in use cases such as in AER for finding sources, etc.
>>
>> Ah understood.
>>
>> If we do this we effectively end up with 3 different types of walk
>> and
>> the meaning of the walk function gets more complex again.
>>
>> 1) Normal bus walk - we pass the downstream port above a bus to which
>> the device is attached and it walks everthing below that point.
>>
>> 2) Case I care about RCiEP with no visible association with an RCEC
>> as
>> I don't need to walk devices.  In that case just calls the callbacks
>> for
>> the actual device.
>>
>> 3) Pass in RCiEP with RCEC asociated with it (or do a dance at the
>> caller
>> to pass in the RCEC itself). Need to walk the devices that the RCEC
>> is
>> handling errors for.  For handling, I'm not all the calls will be
>> generally
>> applicable to other devices associated with the RCEC as some only
>> make
>> sense if there is an actual PCIe bus involved and hence we need to
>> reset
>> other devices on that bus.  For RCEC I don't think there is an
>> particular
>> reason to assume an AER error reported at one RCiEP will have any
>> impact
>> on other devices associated with the particular RCEC.
>> I've not found anything in the spec addressing this question but
>> perhaps
>> I've missed something?
>
> Correct.  There should be no impact to the RCEC or its associated
> RCiEPs which may not happen to reside on the same bus as the collector 
> at all.
>
>> However, if the RCEC doesn't support multiple error records, you may
>> need
>> to walk the bus to identify multiple simultaneous issues, very
>> carefully
>> avoiding (or least minimizing) race conditions.
>>
>>> The alternative is to have a separate walk for RCECs that loops
>>> through
>>> the bitmap / ranges (if supported) triggering the callback for each
>>> device found.
>
> I’ve been testing the Associated Endpoint Bitmap and Bus Range
> handling and using my pciutils patches to help to confirm some of the
> association. This overlaps with my CXL work and CXL 1.1 based RCiEPs
> are good test cards:
>
> (Decode via:
> https://lore.kernel.org/linux-pci/20200624223940.240463-1-sean.v.kelley@linux.intel.com/
>  )
>
> Test card at 6b:00.0
>
> Capabilities: [e00 v1] Designated Vendor-Specific: Vendor=1e98 ID=0000
> Rev=0 Len=56: CXL
> CXLCap: Cache- IO+ Mem+ Mem HW Init+ HDMCount 1 Viral-
> CXLCtl: Cache- IO+ Mem- Cache SF Cov 0 Cache SF Gran 0 Cache Clean-
> Viral-
> CXLSta: Viral-
> Capabilities: [e38 v1] Device Serial Number 30-91-11-78-10-00-00-00
>
> RCEC assocated to RCiEP at 6b while residing at 6a:
>
> Capabilities: [160 v2] Root Complex Event Collector Endpoint
> Association
> RCiEPBitmap: 00000000 [none]
> AssociatedBusNumbers: 6b-6b
> Kernel driver in use: pcieport
>
> with dmesg:
>
> [ 10.502543] pcieport 0000:6a:00.4: AER: enabled with IRQ 34
>
> The trick is the walk, which is not compact.  Currently working on
> error injection to test:
>
> void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *,
> void *),
> +                   void *userdata)
> +{
> +       u32 pos, bitmap, hdr, busn;
> +       u8 ver, nextbusn, lastbusn;
> +       unsigned int dev, fn, bnr;
> +       struct pci_bus *pbus=NULL;
> +       struct pci_dev *pdev;
> +       int retval;
> +
> +       pos = pci_find_ext_capability(rcec, PCI_EXT_CAP_ID_RCEC);
> +       if (!pos)
> +               return;
> +
> +       pci_read_config_dword(rcec, pos + PCI_RCEC_RCIEP_BITMAP,
> &bitmap);
> +
> +       for (dev = 0; dev < 32; dev++) {
> +               if (!(bitmap & (1 << dev)))
> +                       continue;
> +
> +               for (fn = 0; fn < 8; fn++) {
> +                       pdev =
> pci_get_domain_bus_and_slot(pci_domain_nr(rcec->bus),
> +                                                          rcec->bus-
>> number,
> +                                                          
> PCI_DEVFN(de
> v, fn));
> +                       if (!pdev)
> +                               continue;
> +
> +                       retval = cb(pdev, userdata);
> +                       if (retval)
> +                               return;
> +               }
> +       }
>
> Then continuing in the same function above, I need to also consider 
> the
> case for the Bus ranges (still wip, not tested):
>
> +       pci_read_config_dword(rcec, pos, &hdr);
> +       ver = PCI_RCEC_EP_CAP_VER(hdr);
> +       if (ver < PCI_RCEC_BUSN_REG_VER)
> +               return;
> +
> +       pci_read_config_dword(rcec, pos + PCI_RCEC_BUSN, &busn);
> +       nextbusn = PCI_RCEC_BUSN_NEXT(busn);
> +       lastbusn = PCI_RCEC_BUSN_LAST(busn);
> +
> +       if ((nextbusn == 0xff) && (lastbusn == 0x00))
> +               return;
> +
> +       for (bnr = nextbusn; bnr < (lastbusn + 1); bnr++) {
> +               pbus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
> +               if (pbus) {
> +                       /* find RCiEP devices on the given bus */
> +                       for (dev = 0; dev < 32; dev++) {
> + etc...
> +                       }
> +               }
> +       }
> +}
>
> Currently this lives in aer.c and it's large enough that I wonder if
> due to the specifity of the assoicated spec requirments if that should
> be fine?
>
>> Agreed. We would end up with the same splitting of handling paths
>> that
>> wasn't liked in my v1 patch.   Perhaps we need  single
>> pci_walk_aer_affected
>> function with a pile of documentation for what it is actually doing?
>
> Perhaps this would be a good start?  I tend to agree.  I can also
> submit more of the patches as RFC for further comment.

Thinking further on the matter and experimenting, I think a 
pci_walk_aer_device, or however we choose to call it, should be possible 
and cleaner when you consider the three cases.  The third case is what I 
illustrated pasted in the previous reply.  PME will need to likewise do 
a walk, in the case of RCEC.  However, the implementation will need to 
be different though it could share some common code for the third case.

Sean
>
>>
>> Even then we may need to have a parameter to indicate a particular
>> callback
>> should be restricted to devices that share a 'real bus' or not.
>
> In my case 'walk' through the spec options of either just bitmap or if
> of sufficient version (2h), I walk through the bus ranges, calling the
> callback at each encounter.
>
>>
>> I guess the proof will as ever be in what the code looks like.
>
> Agreed.
>
>>
>> Hmm. It increasingly feels like we may need to have a go at drawing
>> together
>> some coherent documentation for the different ways of handling
>> AER errors and specifications / assumptions for each.  Would be
>> 'interesting'
>> to do given I'm fairly sure very few people actually understands all
>> the options and
>> nasty corner cases!
>>
>> +CC Lorenzo and James who may also be interested in this topic in
>> general.
>
> Let's do.
>
> Best regards,
>
> Sean
>
>>
>> Thanks,
>>
>> Jonathan
>>
>>
>>> Thanks,
>>>
>>> Sean
>>>
>>>>>> Open questions:
>>>>>>
>>>>>> 1. Are we better protecting against link reset for an RCiEP
>>>>>> in
>>>>>> here
>>>>>> or
>>>>>>    should we put the check in the link reset functions?
>>>>>>
>>>>>> 2. If we were to get a stupid firmware record with the
>>>>>> relevant
>>>>>> reset
>>>>>> flag
>>>>>>    set to trigger a link reset, what is the correct
>>>>>> response?  For
>>>>>> now
>>>>>> I
>>>>>>    try to report that we haven't done anything and print a
>>>>>> warning.
>>>>>>
>>>>>> 3. Naming of pci_walk_below_dev is rather unsatisfying. Any
>>>>>> better
>>>>>> ideas?
>>>>>>
>>>>>> 4. pci_walk_below_dev is perhaps not of general utility.
>>>>>> Shall I
>>>>>> make
>>>>>> it local
>>>>>>    in err.c?  If not would a precursor patch for that be
>>>>>> preferred?’
>>>>>
>>>>> It depends.  Is it intended as a drop in replacement where
>>>>> needed
>>>>> for
>>>>> pci_walk_bus()? So in that case you are now passing the dev
>>>>> structure
>>>>> and do the check for subordinate or is it intended as being
>>>>> specific
>>>>> to
>>>>> say RCEC? With AER, one could either first check for RC_EC type
>>>>> before
>>>>> using this one.  Or one could just drop in replace (passing the
>>>>> dev
>>>>> structure instead) and the call back performs the RCEC specific
>>>>> checks
>>>>> when a device is encountered.
>>>>
>>>> If it is useful in aer.c that's great.   Just seemed such a weird
>>>> beast
>>>> I wasn't sure it would be of use anywhere else.
>>>>
>>>>>> Testing has been performed via error injection on a QEMU
>>>>>> platform
>>>>>> as
>>>>>> that lets
>>>>>> me create a wide range of topologies and report errors at
>>>>>> any
>>>>>> chosen
>>>>>> location.
>>>>>> Currently I have no plans to upstream this injection support,
>>>>>> but
>>>>>> am
>>>>>> happy to
>>>>>> share if useful to others.
>>>>>
>>>>> I’m experimenting with it in my RCEC code in AER and will give
>>>>> you
>>>>> additional feedback.
>>>>
>>>> Great, thanks
>>>>
>>>> Jonathan
>>>>
>>>>> Thanks,
>>>>>
>>>>> Sean
>>>>>
>>>>>
>>>>>> [0] ACPI PCI Express Base Specification 4.0 1.3.2.3 Root
>>>>>> Complex
>>>>>> Integrated
>>>>>>     Endpoint Rules.
>>>>>> [1] ACPI PCI Express Base Specification 4.0 6.2 Error
>>>>>> Signalling
>>>>>> and
>>>>>> Logging
>>>>>> [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error
>>>>>> Interface
>>>>>> (APEI)
>>>>>> [3] ACPI Sepcification 6.3 18.2.3.7 Generic Hardware Error
>>>>>> Source
>>>>>> [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
>>>>>>
>>>>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>>>> ---
>>>>>> Changes since v1:
>>>>>> * Separated from the largely unrelated fix so the two can
>>>>>> move
>>>>>> forwards separately.
>>>>>> * Instead of separate path for RCiEP handling use the method
>>>>>> suggested
>>>>>> by Bjorn
>>>>>>   and Sathyanarayanan with an adjusted pci_bus_walk.
>>>>>>
>>>>>> Thanks all for reviews of V1.
>>>>>>
>>>>>>  drivers/pci/bus.c      | 28 ++++++++++++++++++++++++++++
>>>>>>  drivers/pci/pcie/err.c | 29 +++++++++++++++++++----------
>>>>>>  include/linux/pci.h    |  2 ++
>>>>>>  3 files changed, 49 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
>>>>>> index 8e40b3e6da77..7cbe1ed2db3d 100644
>>>>>> --- a/drivers/pci/bus.c
>>>>>> +++ b/drivers/pci/bus.c
>>>>>> @@ -411,6 +411,34 @@ void pci_walk_bus(struct pci_bus *top,
>>>>>> int
>>>>>> (*cb)(struct pci_dev *, void *),
>>>>>>  }
>>>>>>  EXPORT_SYMBOL_GPL(pci_walk_bus);
>>>>>>
>>>>>> +/** pci_walk_below_dev - walk devices below (or on) another
>>>>>> device
>>>>>> + *  @dev      device for which we should walk below,
>>>>>> include
>>>>>> device
>>>>>> when not a port.
>>>>>> + *  @cb       callback to be called for each device found
>>>>>> + *  @userdata arbitrary pointer to be passed to callback.
>>>>>> + *
>>>>>> + *  If the device provided is a port,
>>>>>> + *  walk the subordinate bus, including any bridged devices
>>>>>> + *  on buses under this bus.  Call the provided callback
>>>>>> + *  on each device found.
>>>>>> + *
>>>>>> + *  If the device provided hs no subordinate bus, call the
>>>>>> provided
>>>>>> + *  callback on the device itself.
>>>>>> + *
>>>>>> + */
>>>>>> +void pci_walk_below_dev(struct pci_dev *dev, int
>>>>>> (*cb)(struct
>>>>>> pci_dev
>>>>>> *, void *),
>>>>>> +			void *userdata)
>>>>>> +{
>>>>>> +	struct pci_bus *bus;
>>>>>> +
>>>>>> +	if (dev->subordinate) {
>>>>>> +		bus = dev->subordinate;
>>>>>> +		pci_walk_bus(bus, cb, userdata);
>>>>>> +	} else {
>>>>>> +		cb(dev, userdata);
>>>>>> +	}
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(pci_walk_below_dev);
>>>>>> +
>>>>>>  struct pci_bus *pci_bus_get(struct pci_bus *bus)
>>>>>>  {
>>>>>>  	if (bus)
>>>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>>>>>> index 14bb8f54723e..fa08b1cc3d96 100644
>>>>>> --- a/drivers/pci/pcie/err.c
>>>>>> +++ b/drivers/pci/pcie/err.c
>>>>>> @@ -151,33 +151,39 @@ pci_ers_result_t
>>>>>> pcie_do_recovery(struct
>>>>>> pci_dev
>>>>>> *dev,
>>>>>>  			pci_ers_result_t (*reset_link)(struct
>>>>>> pci_dev *pdev))
>>>>>>  {
>>>>>>  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
>>>>>> -	struct pci_bus *bus;
>>>>>>
>>>>>>  	/*
>>>>>>  	 * Error recovery runs on all subordinates of the
>>>>>> first
>>>>>> downstream
>>>>>> port.
>>>>>>  	 * If the downstream port detected the error, it is
>>>>>> cleared at
>>>>>> the
>>>>>> end.
>>>>>> +	 * For RCiEPs we should reset just the RCiEP itself.
>>>>>>  	 */
>>>>>>  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>>>>>> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
>>>>>> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>>>>>> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END))
>>>>>>  		dev = dev->bus->self;
>>>>>> -	bus = dev->subordinate;
>>>>>>
>>>>>>  	pci_dbg(dev, "broadcast error_detected message\n");
>>>>>>  	if (state == pci_channel_io_frozen) {
>>>>>> -		pci_walk_bus(bus, report_frozen_detected,
>>>>>> &status);
>>>>>> +		pci_walk_below_dev(dev, report_frozen_detected,
>>>>>> &status);
>>>>>> +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
>>>>>> {
>>>>>> +			pci_warn(dev, "link reset not possible
>>>>>> for RCiEP\n");
>>>>>> +			status = PCI_ERS_RESULT_NONE;
>>>>>> +			goto failed;
>>>>>> +		}
>>>>>> +
>>>>>>  		status = reset_link(dev);
>>>>>>  		if (status != PCI_ERS_RESULT_RECOVERED) {
>>>>>>  			pci_warn(dev, "link reset failed\n");
>>>>>>  			goto failed;
>>>>>>  		}
>>>>>>  	} else {
>>>>>> -		pci_walk_bus(bus, report_normal_detected,
>>>>>> &status);
>>>>>> +		pci_walk_below_dev(dev, report_normal_detected,
>>>>>> &status);
>>>>>>  	}
>>>>>>
>>>>>>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>>>>>>  		status = PCI_ERS_RESULT_RECOVERED;
>>>>>>  		pci_dbg(dev, "broadcast mmio_enabled
>>>>>> message\n");
>>>>>> -		pci_walk_bus(bus, report_mmio_enabled,
>>>>>> &status);
>>>>>> +		pci_walk_below_dev(dev, report_mmio_enabled,
>>>>>> &status);
>>>>>>  	}
>>>>>>
>>>>>>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
>>>>>> @@ -188,17 +194,20 @@ pci_ers_result_t
>>>>>> pcie_do_recovery(struct
>>>>>> pci_dev
>>>>>> *dev,
>>>>>>  		 */
>>>>>>  		status = PCI_ERS_RESULT_RECOVERED;
>>>>>>  		pci_dbg(dev, "broadcast slot_reset message\n");
>>>>>> -		pci_walk_bus(bus, report_slot_reset, &status);
>>>>>> +		pci_walk_below_dev(dev, report_slot_reset,
>>>>>> &status);
>>>>>>  	}
>>>>>>
>>>>>>  	if (status != PCI_ERS_RESULT_RECOVERED)
>>>>>>  		goto failed;
>>>>>>
>>>>>>  	pci_dbg(dev, "broadcast resume message\n");
>>>>>> -	pci_walk_bus(bus, report_resume, &status);
>>>>>> +	pci_walk_below_dev(dev, report_resume, &status);
>>>>>>
>>>>>> -	pci_aer_clear_device_status(dev);
>>>>>> -	pci_aer_clear_nonfatal_status(dev);
>>>>>> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>>>>>> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
>>>>>> +		pci_aer_clear_device_status(dev);
>>>>>> +		pci_aer_clear_nonfatal_status(dev);
>>>>>> +	}
>>>>>>  	pci_info(dev, "device recovery successful\n");
>>>>>>  	return status;
>>>>>>
>>>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>>>> index c79d83304e52..538bf0a76d33 100644
>>>>>> --- a/include/linux/pci.h
>>>>>> +++ b/include/linux/pci.h
>>>>>> @@ -1411,6 +1411,8 @@ int pci_scan_bridge(struct pci_bus
>>>>>> *bus,
>>>>>> struct
>>>>>> pci_dev *dev, int max,
>>>>>>
>>>>>>  void pci_walk_bus(struct pci_bus *top, int (*cb)(struct
>>>>>> pci_dev
>>>>>> *,
>>>>>> void *),
>>>>>>  		  void *userdata);
>>>>>> +void pci_walk_below_dev(struct pci_dev *dev, int
>>>>>> (*cb)(struct
>>>>>> pci_dev
>>>>>> *, void *),
>>>>>> +			void *userdata);
>>>>>>  int pci_cfg_space_size(struct pci_dev *dev);
>>>>>>  unsigned char pci_bus_max_busnr(struct pci_bus *bus);
>>>>>>  void pci_setup_bridge(struct pci_bus *bus);
>>>>>> -- 
>>>>>> 2.19.1
Jonathan Cameron July 8, 2020, 3:39 p.m. UTC | #10
On Tue, 7 Jul 2020 09:56:26 -0700
Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:

> On 3 Jul 2020, at 1:23, Jonathan Cameron wrote:
> 
> > On Thu, 2 Jul 2020 11:06:26 -0700
> > Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
> >   
> > > On 26 Jun 2020, at 11:41, Jonathan Cameron wrote:
> > >   
> > > > On Fri, 26 Jun 2020 09:29:34 -0700
> > > > Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
> > > >   
> > > > > Hi,  
> > > > Hi,
> > > > 
> > > > Thanks for taking a look.
> > > >   
> > > > > 
> > > > > On 22 Jun 2020, at 4:44, Jonathan Cameron wrote:
> > > > >   
> > > > > > Was previously: PCI/AER: Add partial initial supprot for
> > > > > > RCiEPs
> > > > > > using
> > > > > > RCEC or
> > > > > > firmware first.
> > > > > > 
> > > > > > Currently the kernel does not handle AER errors for Root
> > > > > > Complex
> > > > > > integrated
> > > > > > End Points (RCiEPs)[0].  These devices sit on a root bus
> > > > > > within 
> > > > > > the
> > > > > > Root Complex
> > > > > > (RC).  AER handling is performed by a Root Complex Event
> > > > > > Collector
> > > > > > (RCEC) [1]
> > > > > > which is a effectively a type of RCiEP on the same root bus.
> > > > > > 
> > > > > > This code will only perform the correct reset flow for the
> > > > > > case
> > > > > > where
> > > > > > there
> > > > > > is no need to take any actions on the RCEC because the
> > > > > > firmware is
> > > > > > responsible for them.   This is true where APEI [2] is used
> > > > > > to
> > > > > > report
> > > > > > the AER
> > > > > > errors via a GHES[v2] HEST entry [3] and relevant AER CPER
> > > > > > record
> > > > > > [4]
> > > > > > and Firmware
> > > > > > First handling is in use.  
> > > > > 
> > > > > Right, in the case of the RCEC one identifies the RCiEPs by
> > > > > the 
> > > > > RCiEP
> > > > > bitmap as a part of the RCEC Associated Endpoint Extended
> > > > > Capabilities.
> > > > > This ‘search’ so to speak would make use also of the RCEC
> > > > > Associated
> > > > > Bus Numbers Register to associate the devices with an RCEC when
> > > > > not
> > > > > on
> > > > > the same bus.  
> > > > 
> > > > Ah. I'm afraid my access to recent specs is a bit limited at the
> > > > moment.
> > > > I do have a draft 5.0 spec which has that in though so I now see 
> > > > what
> > > > you mean.
> > > > 
> > > > Was introduced in Root Complex Event Collector Endpoint
> > > > Association
> > > > Extended
> > > > Capability version 2 in PCIe 5.0 I think.
> > > >   
> > > 
> > > Correct.
> > >   
> > > > > > As there is no current RCEC driver support, it should not be
> > > > > > possible
> > > > > > to get
> > > > > > to this code via any routes other than the one above. Hence
> > > > > > appropriate RCEC
> > > > > > handling can be added when the RCEC driver support is ready.
> > > > > > The error handling is different from a normal PCIe End Point
> > > > > > because:
> > > > > > 
> > > > > > 1) There is no downstream port above an RCiEP as these
> > > > > > devices sit
> > > > > > on
> > > > > > a root
> > > > > >    bus.
> > > > > > 
> > > > > > 2) In general, it makes little sense to reset other devices
> > > > > > on on
> > > > > > the
> > > > > > same
> > > > > >    root bus.  For error handling outside the of the root
> > > > > > complex
> > > > > > (RC)
> > > > > > an AER
> > > > > >    error will indicate that all the topology below the
> > > > > > physical
> > > > > > link,
> > > > > > which
> > > > > >    the error is related to, will need to be reset as they
> > > > > > share a
> > > > > > common
> > > > > >    path to the host.  For an RCiEP there is no such defined
> > > > > > shared
> > > > > > path
> > > > > >    relationship with other elements on the root bus.
> > > > > > 
> > > > > > A new walk function, similar to pci_bus_walk is provided
> > > > > > that 
> > > > > > takes
> > > > > > a
> > > > > > pci_dev
> > > > > > instead of a bus.  If that dev corresponds to a downstream
> > > > > > port it
> > > > > > will walk
> > > > > > the subordinate bus of that downstream port.  If the dev does
> > > > > > not
> > > > > > then
> > > > > > it
> > > > > > will call the function on that device alone.   This function 
> > > > > > allows
> > > > > > us
> > > > > > to
> > > > > > avoid adding special cases to the majority of the error
> > > > > > handling.  
> > > > > 
> > > > > Then in that case the callback could add the additional checks
> > > > > specific
> > > > > to identifying the associated RCiEPs.  
> > > > 
> > > > I am afraid I don't follow what you mean here.  Could you give
> > > > more
> > > > info?  
> > > 
> > > Sure, a given RCEC can be associated with multiple RCiEPs.  As a
> > > part 
> > > of
> > > the Extended Association Cap it is possible to obtain a bitmap of
> > > the
> > > RCiEP device ids on the same bus number as the RCEC device itself.
> > > (5.0-1.0 sec 7.9.10.2).  With a Cap version of 2h or higher, it is 
> > > also
> > > possible to get an additional range of bus numbers containing
> > > RCiEPs
> > > also associated with this RCEC.
> > > 
> > > So I’m wondering if this function could be used in which passing a
> > > dev, in this case the RCEC, triggers the call back which makes use
> > > of
> > > the RCiEP bitmap and associated bus ranges to return all identified
> > > devices in use cases such as in AER for finding sources, etc.  
> > 
> > Ah understood.
> > 
> > If we do this we effectively end up with 3 different types of walk
> > and
> > the meaning of the walk function gets more complex again.
> > 
> > 1) Normal bus walk - we pass the downstream port above a bus to which
> > the device is attached and it walks everthing below that point.
> > 
> > 2) Case I care about RCiEP with no visible association with an RCEC
> > as
> > I don't need to walk devices.  In that case just calls the callbacks 
> > for
> > the actual device.
> > 
> > 3) Pass in RCiEP with RCEC asociated with it (or do a dance at the 
> > caller
> > to pass in the RCEC itself). Need to walk the devices that the RCEC
> > is
> > handling errors for.  For handling, I'm not all the calls will be 
> > generally
> > applicable to other devices associated with the RCEC as some only
> > make
> > sense if there is an actual PCIe bus involved and hence we need to 
> > reset
> > other devices on that bus.  For RCEC I don't think there is an 
> > particular
> > reason to assume an AER error reported at one RCiEP will have any 
> > impact
> > on other devices associated with the particular RCEC.
> > I've not found anything in the spec addressing this question but 
> > perhaps
> > I've missed something?  
> 
> Correct.  There should be no impact to the RCEC or its associated
> RCiEPs which may not happen to reside on the same bus as the collector at all.
> 
> > However, if the RCEC doesn't support multiple error records, you may 
> > need
> > to walk the bus to identify multiple simultaneous issues, very 
> > carefully
> > avoiding (or least minimizing) race conditions.
> >   
> > > The alternative is to have a separate walk for RCECs that loops 
> > > through
> > > the bitmap / ranges (if supported) triggering the callback for each
> > > device found.  
> 
> I’ve been testing the Associated Endpoint Bitmap and Bus Range 
> handling and using my pciutils patches to help to confirm some of the 
> association. This overlaps with my CXL work and CXL 1.1 based RCiEPs
> are good test cards:
> 
> (Decode via:  
> https://lore.kernel.org/linux-pci/20200624223940.240463-1-sean.v.kelley@linux.intel.com/
>  )
> 
> Test card at 6b:00.0
> 
> Capabilities: [e00 v1] Designated Vendor-Specific: Vendor=1e98 ID=0000
> Rev=0 Len=56: CXL
> CXLCap: Cache- IO+ Mem+ Mem HW Init+ HDMCount 1 Viral-
> CXLCtl: Cache- IO+ Mem- Cache SF Cov 0 Cache SF Gran 0 Cache Clean-
> Viral-
> CXLSta: Viral-
> Capabilities: [e38 v1] Device Serial Number 30-91-11-78-10-00-00-00
> 
> RCEC assocated to RCiEP at 6b while residing at 6a:
> 
> Capabilities: [160 v2] Root Complex Event Collector Endpoint
> Association
> RCiEPBitmap: 00000000 [none]
> AssociatedBusNumbers: 6b-6b
> Kernel driver in use: pcieport
> 
> with dmesg:
> 
> [ 10.502543] pcieport 0000:6a:00.4: AER: enabled with IRQ 34
> 
> The trick is the walk, which is not compact.  Currently working on
> error injection to test:
> 
> void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *,
> void *),
> +                   void *userdata)
> +{
> +       u32 pos, bitmap, hdr, busn;
> +       u8 ver, nextbusn, lastbusn;
> +       unsigned int dev, fn, bnr;
> +       struct pci_bus *pbus=NULL;
> +       struct pci_dev *pdev;
> +       int retval;
> +
> +       pos = pci_find_ext_capability(rcec, PCI_EXT_CAP_ID_RCEC);
> +       if (!pos)
> +               return;
> +
> +       pci_read_config_dword(rcec, pos + PCI_RCEC_RCIEP_BITMAP,
> &bitmap);
> +
> +       for (dev = 0; dev < 32; dev++) {

for_each_bit_set perhaps?

> +               if (!(bitmap & (1 << dev)))
> +                       continue;
> +
> +               for (fn = 0; fn < 8; fn++) {
> +                       pdev =
> pci_get_domain_bus_and_slot(pci_domain_nr(rcec->bus),
> +                                                          rcec->bus-
> >number,  
> +                                                          PCI_DEVFN(de
> v, fn));
> +                       if (!pdev)
> +                               continue;
> +
> +                       retval = cb(pdev, userdata);
> +                       if (retval)
> +                               return;
> +               }
> +       }
> 
> Then continuing in the same function above, I need to also consider the
> case for the Bus ranges (still wip, not tested):
> 
> +       pci_read_config_dword(rcec, pos, &hdr);
> +       ver = PCI_RCEC_EP_CAP_VER(hdr);
> +       if (ver < PCI_RCEC_BUSN_REG_VER)
> +               return;
> +
> +       pci_read_config_dword(rcec, pos + PCI_RCEC_BUSN, &busn);
> +       nextbusn = PCI_RCEC_BUSN_NEXT(busn);
> +       lastbusn = PCI_RCEC_BUSN_LAST(busn);
> +
> +       if ((nextbusn == 0xff) && (lastbusn == 0x00))
> +               return;
> +
> +       for (bnr = nextbusn; bnr < (lastbusn + 1); bnr++) {
> +               pbus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
> +               if (pbus) {
> +                       /* find RCiEP devices on the given bus */
> +                       for (dev = 0; dev < 32; dev++) {
> + etc...
> +                       }
> +               }
> +       }
> +}

Makes sense.

> 
> Currently this lives in aer.c and it's large enough that I wonder if
> due to the specifity of the assoicated spec requirments if that should
> be fine?

Seems fine to me.

> 
> > Agreed. We would end up with the same splitting of handling paths
> > that
> > wasn't liked in my v1 patch.   Perhaps we need  single 
> > pci_walk_aer_affected
> > function with a pile of documentation for what it is actually doing?  
> 
> Perhaps this would be a good start?  I tend to agree.  I can also
> submit more of the patches as RFC for further comment.

When you are ready that would be great.

> 
> > 
> > Even then we may need to have a parameter to indicate a particular 
> > callback
> > should be restricted to devices that share a 'real bus' or not.  
> 
> In my case 'walk' through the spec options of either just bitmap or if
> of sufficient version (2h), I walk through the bus ranges, calling the
> callback at each encounter.

That wasn't what I meant.  When doing some of the actual handling once
we have a walk function it will get a bit fiddly.
I 'think' we need to first identify which devices associated with the RCEC
have reported an AER error and then we need to only call the callback for
those.

In a case with a real bus and EP, the assumption is that the whole bus
is going down so you have to issue it to everyone.

> 
> > 
> > I guess the proof will as ever be in what the code looks like.  
> 
> Agreed.
> 
> > 
> > Hmm. It increasingly feels like we may need to have a go at drawing 
> > together
> > some coherent documentation for the different ways of handling
> > AER errors and specifications / assumptions for each.  Would be 
> > 'interesting'
> > to do given I'm fairly sure very few people actually understands all 
> > the options and
> > nasty corner cases!
> > 
> > +CC Lorenzo and James who may also be interested in this topic in 
> > general.  
> 
> Let's do.

Great.  Whilst we only care about this particular corner for now it may
be worth working out some more comprehensive docs covering standard topology
as well. CXL may bring its own additions, but I'm not sure we actually
have a clear description of what 'should' happen on normal PCIe.

Jonathan

> 
> Best regards,
> 
> Sean
> 
> > 
> > Thanks,
> > 
> > Jonathan
> > 
> >   
> > > Thanks,
> > > 
> > > Sean
> > >   
> > > > > > Open questions:
> > > > > > 
> > > > > > 1. Are we better protecting against link reset for an RCiEP
> > > > > > in 
> > > > > > here
> > > > > > or
> > > > > >    should we put the check in the link reset functions?
> > > > > > 
> > > > > > 2. If we were to get a stupid firmware record with the
> > > > > > relevant
> > > > > > reset
> > > > > > flag
> > > > > >    set to trigger a link reset, what is the correct
> > > > > > response?  For
> > > > > > now
> > > > > > I
> > > > > >    try to report that we haven't done anything and print a 
> > > > > > warning.
> > > > > > 
> > > > > > 3. Naming of pci_walk_below_dev is rather unsatisfying. Any
> > > > > > better
> > > > > > ideas?
> > > > > > 
> > > > > > 4. pci_walk_below_dev is perhaps not of general utility.
> > > > > > Shall I
> > > > > > make
> > > > > > it local
> > > > > >    in err.c?  If not would a precursor patch for that be
> > > > > > preferred?’  
> > > > > 
> > > > > It depends.  Is it intended as a drop in replacement where
> > > > > needed 
> > > > > for
> > > > > pci_walk_bus()? So in that case you are now passing the dev 
> > > > > structure
> > > > > and do the check for subordinate or is it intended as being 
> > > > > specific
> > > > > to
> > > > > say RCEC? With AER, one could either first check for RC_EC type
> > > > > before
> > > > > using this one.  Or one could just drop in replace (passing the
> > > > > dev
> > > > > structure instead) and the call back performs the RCEC specific
> > > > > checks
> > > > > when a device is encountered.  
> > > > 
> > > > If it is useful in aer.c that's great.   Just seemed such a weird
> > > > beast
> > > > I wasn't sure it would be of use anywhere else.
> > > >   
> > > > > > Testing has been performed via error injection on a QEMU
> > > > > > platform 
> > > > > > as
> > > > > > that lets
> > > > > > me create a wide range of topologies and report errors at
> > > > > > any 
> > > > > > chosen
> > > > > > location.
> > > > > > Currently I have no plans to upstream this injection support,
> > > > > > but 
> > > > > > am
> > > > > > happy to
> > > > > > share if useful to others.  
> > > > > 
> > > > > I’m experimenting with it in my RCEC code in AER and will give 
> > > > > you
> > > > > additional feedback.  
> > > > 
> > > > Great, thanks
> > > > 
> > > > Jonathan
> > > >   
> > > > > Thanks,
> > > > > 
> > > > > Sean
> > > > > 
> > > > >   
> > > > > > [0] ACPI PCI Express Base Specification 4.0 1.3.2.3 Root
> > > > > > Complex
> > > > > > Integrated
> > > > > >     Endpoint Rules.
> > > > > > [1] ACPI PCI Express Base Specification 4.0 6.2 Error
> > > > > > Signalling 
> > > > > > and
> > > > > > Logging
> > > > > > [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error 
> > > > > > Interface
> > > > > > (APEI)
> > > > > > [3] ACPI Sepcification 6.3 18.2.3.7 Generic Hardware Error
> > > > > > Source
> > > > > > [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
> > > > > > 
> > > > > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > > > ---
> > > > > > Changes since v1:
> > > > > > * Separated from the largely unrelated fix so the two can
> > > > > > move
> > > > > > forwards separately.
> > > > > > * Instead of separate path for RCiEP handling use the method
> > > > > > suggested
> > > > > > by Bjorn
> > > > > >   and Sathyanarayanan with an adjusted pci_bus_walk.
> > > > > > 
> > > > > > Thanks all for reviews of V1.
> > > > > > 
> > > > > >  drivers/pci/bus.c      | 28 ++++++++++++++++++++++++++++
> > > > > >  drivers/pci/pcie/err.c | 29 +++++++++++++++++++----------
> > > > > >  include/linux/pci.h    |  2 ++
> > > > > >  3 files changed, 49 insertions(+), 10 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > > > > > index 8e40b3e6da77..7cbe1ed2db3d 100644
> > > > > > --- a/drivers/pci/bus.c
> > > > > > +++ b/drivers/pci/bus.c
> > > > > > @@ -411,6 +411,34 @@ void pci_walk_bus(struct pci_bus *top,
> > > > > > int
> > > > > > (*cb)(struct pci_dev *, void *),
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(pci_walk_bus);
> > > > > > 
> > > > > > +/** pci_walk_below_dev - walk devices below (or on) another 
> > > > > > device
> > > > > > + *  @dev      device for which we should walk below,
> > > > > > include 
> > > > > > device
> > > > > > when not a port.
> > > > > > + *  @cb       callback to be called for each device found
> > > > > > + *  @userdata arbitrary pointer to be passed to callback.
> > > > > > + *
> > > > > > + *  If the device provided is a port,
> > > > > > + *  walk the subordinate bus, including any bridged devices
> > > > > > + *  on buses under this bus.  Call the provided callback
> > > > > > + *  on each device found.
> > > > > > + *
> > > > > > + *  If the device provided hs no subordinate bus, call the 
> > > > > > provided
> > > > > > + *  callback on the device itself.
> > > > > > + *
> > > > > > + */
> > > > > > +void pci_walk_below_dev(struct pci_dev *dev, int
> > > > > > (*cb)(struct
> > > > > > pci_dev
> > > > > > *, void *),
> > > > > > +			void *userdata)
> > > > > > +{
> > > > > > +	struct pci_bus *bus;
> > > > > > +
> > > > > > +	if (dev->subordinate) {
> > > > > > +		bus = dev->subordinate;
> > > > > > +		pci_walk_bus(bus, cb, userdata);
> > > > > > +	} else {
> > > > > > +		cb(dev, userdata);
> > > > > > +	}
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(pci_walk_below_dev);
> > > > > > +
> > > > > >  struct pci_bus *pci_bus_get(struct pci_bus *bus)
> > > > > >  {
> > > > > >  	if (bus)
> > > > > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > > > > > index 14bb8f54723e..fa08b1cc3d96 100644
> > > > > > --- a/drivers/pci/pcie/err.c
> > > > > > +++ b/drivers/pci/pcie/err.c
> > > > > > @@ -151,33 +151,39 @@ pci_ers_result_t
> > > > > > pcie_do_recovery(struct
> > > > > > pci_dev
> > > > > > *dev,
> > > > > >  			pci_ers_result_t (*reset_link)(struct
> > > > > > pci_dev *pdev))
> > > > > >  {
> > > > > >  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> > > > > > -	struct pci_bus *bus;
> > > > > > 
> > > > > >  	/*
> > > > > >  	 * Error recovery runs on all subordinates of the
> > > > > > first 
> > > > > > downstream
> > > > > > port.
> > > > > >  	 * If the downstream port detected the error, it is
> > > > > > cleared at 
> > > > > > the
> > > > > > end.
> > > > > > +	 * For RCiEPs we should reset just the RCiEP itself.
> > > > > >  	 */
> > > > > >  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > > > > > -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> > > > > > +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> > > > > > +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END))
> > > > > >  		dev = dev->bus->self;
> > > > > > -	bus = dev->subordinate;
> > > > > > 
> > > > > >  	pci_dbg(dev, "broadcast error_detected message\n");
> > > > > >  	if (state == pci_channel_io_frozen) {
> > > > > > -		pci_walk_bus(bus, report_frozen_detected,
> > > > > > &status);
> > > > > > +		pci_walk_below_dev(dev, report_frozen_detected,
> > > > > > &status);
> > > > > > +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
> > > > > > {
> > > > > > +			pci_warn(dev, "link reset not possible
> > > > > > for RCiEP\n");
> > > > > > +			status = PCI_ERS_RESULT_NONE;
> > > > > > +			goto failed;
> > > > > > +		}
> > > > > > +
> > > > > >  		status = reset_link(dev);
> > > > > >  		if (status != PCI_ERS_RESULT_RECOVERED) {
> > > > > >  			pci_warn(dev, "link reset failed\n");
> > > > > >  			goto failed;
> > > > > >  		}
> > > > > >  	} else {
> > > > > > -		pci_walk_bus(bus, report_normal_detected,
> > > > > > &status);
> > > > > > +		pci_walk_below_dev(dev, report_normal_detected,
> > > > > > &status);
> > > > > >  	}
> > > > > > 
> > > > > >  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> > > > > >  		status = PCI_ERS_RESULT_RECOVERED;
> > > > > >  		pci_dbg(dev, "broadcast mmio_enabled
> > > > > > message\n");
> > > > > > -		pci_walk_bus(bus, report_mmio_enabled,
> > > > > > &status);
> > > > > > +		pci_walk_below_dev(dev, report_mmio_enabled,
> > > > > > &status);
> > > > > >  	}
> > > > > > 
> > > > > >  	if (status == PCI_ERS_RESULT_NEED_RESET) {
> > > > > > @@ -188,17 +194,20 @@ pci_ers_result_t
> > > > > > pcie_do_recovery(struct
> > > > > > pci_dev
> > > > > > *dev,
> > > > > >  		 */
> > > > > >  		status = PCI_ERS_RESULT_RECOVERED;
> > > > > >  		pci_dbg(dev, "broadcast slot_reset message\n");
> > > > > > -		pci_walk_bus(bus, report_slot_reset, &status);
> > > > > > +		pci_walk_below_dev(dev, report_slot_reset,
> > > > > > &status);
> > > > > >  	}
> > > > > > 
> > > > > >  	if (status != PCI_ERS_RESULT_RECOVERED)
> > > > > >  		goto failed;
> > > > > > 
> > > > > >  	pci_dbg(dev, "broadcast resume message\n");
> > > > > > -	pci_walk_bus(bus, report_resume, &status);
> > > > > > +	pci_walk_below_dev(dev, report_resume, &status);
> > > > > > 
> > > > > > -	pci_aer_clear_device_status(dev);
> > > > > > -	pci_aer_clear_nonfatal_status(dev);
> > > > > > +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > > > > > +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> > > > > > +		pci_aer_clear_device_status(dev);
> > > > > > +		pci_aer_clear_nonfatal_status(dev);
> > > > > > +	}
> > > > > >  	pci_info(dev, "device recovery successful\n");
> > > > > >  	return status;
> > > > > > 
> > > > > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > > > > index c79d83304e52..538bf0a76d33 100644
> > > > > > --- a/include/linux/pci.h
> > > > > > +++ b/include/linux/pci.h
> > > > > > @@ -1411,6 +1411,8 @@ int pci_scan_bridge(struct pci_bus
> > > > > > *bus,
> > > > > > struct
> > > > > > pci_dev *dev, int max,
> > > > > > 
> > > > > >  void pci_walk_bus(struct pci_bus *top, int (*cb)(struct
> > > > > > pci_dev 
> > > > > > *,
> > > > > > void *),
> > > > > >  		  void *userdata);
> > > > > > +void pci_walk_below_dev(struct pci_dev *dev, int
> > > > > > (*cb)(struct
> > > > > > pci_dev
> > > > > > *, void *),
> > > > > > +			void *userdata);
> > > > > >  int pci_cfg_space_size(struct pci_dev *dev);
> > > > > >  unsigned char pci_bus_max_busnr(struct pci_bus *bus);
> > > > > >  void pci_setup_bridge(struct pci_bus *bus);
> > > > > > -- 
> > > > > > 2.19.1  
>
Sean V Kelley July 8, 2020, 4:01 p.m. UTC | #11
On 8 Jul 2020, at 8:39, Jonathan Cameron wrote:

> On Tue, 7 Jul 2020 09:56:26 -0700
> Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
>
>> On 3 Jul 2020, at 1:23, Jonathan Cameron wrote:
>>
>>> On Thu, 2 Jul 2020 11:06:26 -0700
>>> Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
>>>
>>>> On 26 Jun 2020, at 11:41, Jonathan Cameron wrote:
>>>>
>>>>> On Fri, 26 Jun 2020 09:29:34 -0700
>>>>> Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
>>>>>
>>>>>> Hi,
>>>>> Hi,
>>>>>
>>>>> Thanks for taking a look.
>>>>>
>>>>>>
>>>>>> On 22 Jun 2020, at 4:44, Jonathan Cameron wrote:
>>>>>>
>>>>>>> Was previously: PCI/AER: Add partial initial supprot for
>>>>>>> RCiEPs
>>>>>>> using
>>>>>>> RCEC or
>>>>>>> firmware first.
>>>>>>>
>>>>>>> Currently the kernel does not handle AER errors for Root
>>>>>>> Complex
>>>>>>> integrated
>>>>>>> End Points (RCiEPs)[0].  These devices sit on a root bus
>>>>>>> within
>>>>>>> the
>>>>>>> Root Complex
>>>>>>> (RC).  AER handling is performed by a Root Complex Event
>>>>>>> Collector
>>>>>>> (RCEC) [1]
>>>>>>> which is a effectively a type of RCiEP on the same root bus.
>>>>>>>
>>>>>>> This code will only perform the correct reset flow for the
>>>>>>> case
>>>>>>> where
>>>>>>> there
>>>>>>> is no need to take any actions on the RCEC because the
>>>>>>> firmware is
>>>>>>> responsible for them.   This is true where APEI [2] is used
>>>>>>> to
>>>>>>> report
>>>>>>> the AER
>>>>>>> errors via a GHES[v2] HEST entry [3] and relevant AER CPER
>>>>>>> record
>>>>>>> [4]
>>>>>>> and Firmware
>>>>>>> First handling is in use.
>>>>>>
>>>>>> Right, in the case of the RCEC one identifies the RCiEPs by
>>>>>> the
>>>>>> RCiEP
>>>>>> bitmap as a part of the RCEC Associated Endpoint Extended
>>>>>> Capabilities.
>>>>>> This ‘search’ so to speak would make use also of the RCEC
>>>>>> Associated
>>>>>> Bus Numbers Register to associate the devices with an RCEC when
>>>>>> not
>>>>>> on
>>>>>> the same bus.
>>>>>
>>>>> Ah. I'm afraid my access to recent specs is a bit limited at the
>>>>> moment.
>>>>> I do have a draft 5.0 spec which has that in though so I now see
>>>>> what
>>>>> you mean.
>>>>>
>>>>> Was introduced in Root Complex Event Collector Endpoint
>>>>> Association
>>>>> Extended
>>>>> Capability version 2 in PCIe 5.0 I think.
>>>>>
>>>>
>>>> Correct.
>>>>
>>>>>>> As there is no current RCEC driver support, it should not be
>>>>>>> possible
>>>>>>> to get
>>>>>>> to this code via any routes other than the one above. Hence
>>>>>>> appropriate RCEC
>>>>>>> handling can be added when the RCEC driver support is ready.
>>>>>>> The error handling is different from a normal PCIe End Point
>>>>>>> because:
>>>>>>>
>>>>>>> 1) There is no downstream port above an RCiEP as these
>>>>>>> devices sit
>>>>>>> on
>>>>>>> a root
>>>>>>>    bus.
>>>>>>>
>>>>>>> 2) In general, it makes little sense to reset other devices
>>>>>>> on on
>>>>>>> the
>>>>>>> same
>>>>>>>    root bus.  For error handling outside the of the root
>>>>>>> complex
>>>>>>> (RC)
>>>>>>> an AER
>>>>>>>    error will indicate that all the topology below the
>>>>>>> physical
>>>>>>> link,
>>>>>>> which
>>>>>>>    the error is related to, will need to be reset as they
>>>>>>> share a
>>>>>>> common
>>>>>>>    path to the host.  For an RCiEP there is no such defined
>>>>>>> shared
>>>>>>> path
>>>>>>>    relationship with other elements on the root bus.
>>>>>>>
>>>>>>> A new walk function, similar to pci_bus_walk is provided
>>>>>>> that
>>>>>>> takes
>>>>>>> a
>>>>>>> pci_dev
>>>>>>> instead of a bus.  If that dev corresponds to a downstream
>>>>>>> port it
>>>>>>> will walk
>>>>>>> the subordinate bus of that downstream port.  If the dev does
>>>>>>> not
>>>>>>> then
>>>>>>> it
>>>>>>> will call the function on that device alone.   This function
>>>>>>> allows
>>>>>>> us
>>>>>>> to
>>>>>>> avoid adding special cases to the majority of the error
>>>>>>> handling.
>>>>>>
>>>>>> Then in that case the callback could add the additional checks
>>>>>> specific
>>>>>> to identifying the associated RCiEPs.
>>>>>
>>>>> I am afraid I don't follow what you mean here.  Could you give
>>>>> more
>>>>> info?
>>>>
>>>> Sure, a given RCEC can be associated with multiple RCiEPs.  As a
>>>> part
>>>> of
>>>> the Extended Association Cap it is possible to obtain a bitmap of
>>>> the
>>>> RCiEP device ids on the same bus number as the RCEC device itself.
>>>> (5.0-1.0 sec 7.9.10.2).  With a Cap version of 2h or higher, it is
>>>> also
>>>> possible to get an additional range of bus numbers containing
>>>> RCiEPs
>>>> also associated with this RCEC.
>>>>
>>>> So I’m wondering if this function could be used in which passing 
>>>> a
>>>> dev, in this case the RCEC, triggers the call back which makes use
>>>> of
>>>> the RCiEP bitmap and associated bus ranges to return all identified
>>>> devices in use cases such as in AER for finding sources, etc.
>>>
>>> Ah understood.
>>>
>>> If we do this we effectively end up with 3 different types of walk
>>> and
>>> the meaning of the walk function gets more complex again.
>>>
>>> 1) Normal bus walk - we pass the downstream port above a bus to 
>>> which
>>> the device is attached and it walks everthing below that point.
>>>
>>> 2) Case I care about RCiEP with no visible association with an RCEC
>>> as
>>> I don't need to walk devices.  In that case just calls the callbacks
>>> for
>>> the actual device.
>>>
>>> 3) Pass in RCiEP with RCEC asociated with it (or do a dance at the
>>> caller
>>> to pass in the RCEC itself). Need to walk the devices that the RCEC
>>> is
>>> handling errors for.  For handling, I'm not all the calls will be
>>> generally
>>> applicable to other devices associated with the RCEC as some only
>>> make
>>> sense if there is an actual PCIe bus involved and hence we need to
>>> reset
>>> other devices on that bus.  For RCEC I don't think there is an
>>> particular
>>> reason to assume an AER error reported at one RCiEP will have any
>>> impact
>>> on other devices associated with the particular RCEC.
>>> I've not found anything in the spec addressing this question but
>>> perhaps
>>> I've missed something?
>>
>> Correct.  There should be no impact to the RCEC or its associated
>> RCiEPs which may not happen to reside on the same bus as the 
>> collector at all.
>>
>>> However, if the RCEC doesn't support multiple error records, you may
>>> need
>>> to walk the bus to identify multiple simultaneous issues, very
>>> carefully
>>> avoiding (or least minimizing) race conditions.
>>>
>>>> The alternative is to have a separate walk for RCECs that loops
>>>> through
>>>> the bitmap / ranges (if supported) triggering the callback for each
>>>> device found.
>>
>> I’ve been testing the Associated Endpoint Bitmap and Bus Range
>> handling and using my pciutils patches to help to confirm some of the
>> association. This overlaps with my CXL work and CXL 1.1 based RCiEPs
>> are good test cards:
>>
>> (Decode via:
>> https://lore.kernel.org/linux-pci/20200624223940.240463-1-sean.v.kelley@linux.intel.com/
>>  )
>>
>> Test card at 6b:00.0
>>
>> Capabilities: [e00 v1] Designated Vendor-Specific: Vendor=1e98 
>> ID=0000
>> Rev=0 Len=56: CXL
>> CXLCap: Cache- IO+ Mem+ Mem HW Init+ HDMCount 1 Viral-
>> CXLCtl: Cache- IO+ Mem- Cache SF Cov 0 Cache SF Gran 0 Cache Clean-
>> Viral-
>> CXLSta: Viral-
>> Capabilities: [e38 v1] Device Serial Number 30-91-11-78-10-00-00-00
>>
>> RCEC assocated to RCiEP at 6b while residing at 6a:
>>
>> Capabilities: [160 v2] Root Complex Event Collector Endpoint
>> Association
>> RCiEPBitmap: 00000000 [none]
>> AssociatedBusNumbers: 6b-6b
>> Kernel driver in use: pcieport
>>
>> with dmesg:
>>
>> [ 10.502543] pcieport 0000:6a:00.4: AER: enabled with IRQ 34
>>
>> The trick is the walk, which is not compact.  Currently working on
>> error injection to test:
>>
>> void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *,
>> void *),
>> +                   void *userdata)
>> +{
>> +       u32 pos, bitmap, hdr, busn;
>> +       u8 ver, nextbusn, lastbusn;
>> +       unsigned int dev, fn, bnr;
>> +       struct pci_bus *pbus=NULL;
>> +       struct pci_dev *pdev;
>> +       int retval;
>> +
>> +       pos = pci_find_ext_capability(rcec, PCI_EXT_CAP_ID_RCEC);
>> +       if (!pos)
>> +               return;
>> +
>> +       pci_read_config_dword(rcec, pos + PCI_RCEC_RCIEP_BITMAP,
>> &bitmap);
>> +
>> +       for (dev = 0; dev < 32; dev++) {
>
> for_each_bit_set perhaps?
>
>> +               if (!(bitmap & (1 << dev)))
>> +                       continue;
>> +
>> +               for (fn = 0; fn < 8; fn++) {
>> +                       pdev =
>> pci_get_domain_bus_and_slot(pci_domain_nr(rcec->bus),
>> +                                                          rcec->bus-
>>> number,
>> +                                                          
>> PCI_DEVFN(de
>> v, fn));
>> +                       if (!pdev)
>> +                               continue;
>> +
>> +                       retval = cb(pdev, userdata);
>> +                       if (retval)
>> +                               return;
>> +               }
>> +       }
>>
>> Then continuing in the same function above, I need to also consider 
>> the
>> case for the Bus ranges (still wip, not tested):
>>
>> +       pci_read_config_dword(rcec, pos, &hdr);
>> +       ver = PCI_RCEC_EP_CAP_VER(hdr);
>> +       if (ver < PCI_RCEC_BUSN_REG_VER)
>> +               return;
>> +
>> +       pci_read_config_dword(rcec, pos + PCI_RCEC_BUSN, &busn);
>> +       nextbusn = PCI_RCEC_BUSN_NEXT(busn);
>> +       lastbusn = PCI_RCEC_BUSN_LAST(busn);
>> +
>> +       if ((nextbusn == 0xff) && (lastbusn == 0x00))
>> +               return;
>> +
>> +       for (bnr = nextbusn; bnr < (lastbusn + 1); bnr++) {
>> +               pbus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
>> +               if (pbus) {
>> +                       /* find RCiEP devices on the given bus */
>> +                       for (dev = 0; dev < 32; dev++) {
>> + etc...
>> +                       }
>> +               }
>> +       }
>> +}
>
> Makes sense.
>
>>
>> Currently this lives in aer.c and it's large enough that I wonder if
>> due to the specifity of the assoicated spec requirments if that 
>> should
>> be fine?
>
> Seems fine to me.
>
>>
>>> Agreed. We would end up with the same splitting of handling paths
>>> that
>>> wasn't liked in my v1 patch.   Perhaps we need  single
>>> pci_walk_aer_affected
>>> function with a pile of documentation for what it is actually doing?
>>
>> Perhaps this would be a good start?  I tend to agree.  I can also
>> submit more of the patches as RFC for further comment.
>
> When you are ready that would be great.

Sounds good.

>
>>
>>>
>>> Even then we may need to have a parameter to indicate a particular
>>> callback
>>> should be restricted to devices that share a 'real bus' or not.
>>
>> In my case 'walk' through the spec options of either just bitmap or 
>> if
>> of sufficient version (2h), I walk through the bus ranges, calling 
>> the
>> callback at each encounter.
>
> That wasn't what I meant.  When doing some of the actual handling once
> we have a walk function it will get a bit fiddly.
> I 'think' we need to first identify which devices associated with the 
> RCEC
> have reported an AER error and then we need to only call the callback 
> for
> those.

Right, I was going to do that decision making in the call back rather 
than include those details in the walk. That way the call back simply 
reports what associated devices it finds and makes no assumptions on 
which device needs to be acted upon.  I’ll experiment with 
alternatives to placement of identification code.

>
> In a case with a real bus and EP, the assumption is that the whole bus
> is going down so you have to issue it to everyone.

Correct.

>
>>
>>>
>>> I guess the proof will as ever be in what the code looks like.
>>
>> Agreed.
>>
>>>
>>> Hmm. It increasingly feels like we may need to have a go at drawing
>>> together
>>> some coherent documentation for the different ways of handling
>>> AER errors and specifications / assumptions for each.  Would be
>>> 'interesting'
>>> to do given I'm fairly sure very few people actually understands all
>>> the options and
>>> nasty corner cases!
>>>
>>> +CC Lorenzo and James who may also be interested in this topic in
>>> general.
>>
>> Let's do.
>
> Great.  Whilst we only care about this particular corner for now it 
> may
> be worth working out some more comprehensive docs covering standard 
> topology
> as well. CXL may bring its own additions, but I'm not sure we actually
> have a clear description of what 'should' happen on normal PCIe.

Agree, the focus is on base or normal PCIe behavior. CXL is one use case 
with its own exceptions.

Sean

>
> Jonathan
>
>>
>> Best regards,
>>
>> Sean
>>
>>>
>>> Thanks,
>>>
>>> Jonathan
>>>
>>>
>>>> Thanks,
>>>>
>>>> Sean
>>>>
>>>>>>> Open questions:
>>>>>>>
>>>>>>> 1. Are we better protecting against link reset for an RCiEP
>>>>>>> in
>>>>>>> here
>>>>>>> or
>>>>>>>    should we put the check in the link reset functions?
>>>>>>>
>>>>>>> 2. If we were to get a stupid firmware record with the
>>>>>>> relevant
>>>>>>> reset
>>>>>>> flag
>>>>>>>    set to trigger a link reset, what is the correct
>>>>>>> response?  For
>>>>>>> now
>>>>>>> I
>>>>>>>    try to report that we haven't done anything and print a
>>>>>>> warning.
>>>>>>>
>>>>>>> 3. Naming of pci_walk_below_dev is rather unsatisfying. Any
>>>>>>> better
>>>>>>> ideas?
>>>>>>>
>>>>>>> 4. pci_walk_below_dev is perhaps not of general utility.
>>>>>>> Shall I
>>>>>>> make
>>>>>>> it local
>>>>>>>    in err.c?  If not would a precursor patch for that be
>>>>>>> preferred?’
>>>>>>
>>>>>> It depends.  Is it intended as a drop in replacement where
>>>>>> needed
>>>>>> for
>>>>>> pci_walk_bus()? So in that case you are now passing the dev
>>>>>> structure
>>>>>> and do the check for subordinate or is it intended as being
>>>>>> specific
>>>>>> to
>>>>>> say RCEC? With AER, one could either first check for RC_EC type
>>>>>> before
>>>>>> using this one.  Or one could just drop in replace (passing the
>>>>>> dev
>>>>>> structure instead) and the call back performs the RCEC specific
>>>>>> checks
>>>>>> when a device is encountered.
>>>>>
>>>>> If it is useful in aer.c that's great.   Just seemed such a weird
>>>>> beast
>>>>> I wasn't sure it would be of use anywhere else.
>>>>>
>>>>>>> Testing has been performed via error injection on a QEMU
>>>>>>> platform
>>>>>>> as
>>>>>>> that lets
>>>>>>> me create a wide range of topologies and report errors at
>>>>>>> any
>>>>>>> chosen
>>>>>>> location.
>>>>>>> Currently I have no plans to upstream this injection support,
>>>>>>> but
>>>>>>> am
>>>>>>> happy to
>>>>>>> share if useful to others.
>>>>>>
>>>>>> I’m experimenting with it in my RCEC code in AER and will give
>>>>>> you
>>>>>> additional feedback.
>>>>>
>>>>> Great, thanks
>>>>>
>>>>> Jonathan
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Sean
>>>>>>
>>>>>>
>>>>>>> [0] ACPI PCI Express Base Specification 4.0 1.3.2.3 Root
>>>>>>> Complex
>>>>>>> Integrated
>>>>>>>     Endpoint Rules.
>>>>>>> [1] ACPI PCI Express Base Specification 4.0 6.2 Error
>>>>>>> Signalling
>>>>>>> and
>>>>>>> Logging
>>>>>>> [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error
>>>>>>> Interface
>>>>>>> (APEI)
>>>>>>> [3] ACPI Sepcification 6.3 18.2.3.7 Generic Hardware Error
>>>>>>> Source
>>>>>>> [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
>>>>>>>
>>>>>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>>>>> ---
>>>>>>> Changes since v1:
>>>>>>> * Separated from the largely unrelated fix so the two can
>>>>>>> move
>>>>>>> forwards separately.
>>>>>>> * Instead of separate path for RCiEP handling use the method
>>>>>>> suggested
>>>>>>> by Bjorn
>>>>>>>   and Sathyanarayanan with an adjusted pci_bus_walk.
>>>>>>>
>>>>>>> Thanks all for reviews of V1.
>>>>>>>
>>>>>>>  drivers/pci/bus.c      | 28 ++++++++++++++++++++++++++++
>>>>>>>  drivers/pci/pcie/err.c | 29 +++++++++++++++++++----------
>>>>>>>  include/linux/pci.h    |  2 ++
>>>>>>>  3 files changed, 49 insertions(+), 10 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
>>>>>>> index 8e40b3e6da77..7cbe1ed2db3d 100644
>>>>>>> --- a/drivers/pci/bus.c
>>>>>>> +++ b/drivers/pci/bus.c
>>>>>>> @@ -411,6 +411,34 @@ void pci_walk_bus(struct pci_bus *top,
>>>>>>> int
>>>>>>> (*cb)(struct pci_dev *, void *),
>>>>>>>  }
>>>>>>>  EXPORT_SYMBOL_GPL(pci_walk_bus);
>>>>>>>
>>>>>>> +/** pci_walk_below_dev - walk devices below (or on) another
>>>>>>> device
>>>>>>> + *  @dev      device for which we should walk below,
>>>>>>> include
>>>>>>> device
>>>>>>> when not a port.
>>>>>>> + *  @cb       callback to be called for each device found
>>>>>>> + *  @userdata arbitrary pointer to be passed to callback.
>>>>>>> + *
>>>>>>> + *  If the device provided is a port,
>>>>>>> + *  walk the subordinate bus, including any bridged devices
>>>>>>> + *  on buses under this bus.  Call the provided callback
>>>>>>> + *  on each device found.
>>>>>>> + *
>>>>>>> + *  If the device provided hs no subordinate bus, call the
>>>>>>> provided
>>>>>>> + *  callback on the device itself.
>>>>>>> + *
>>>>>>> + */
>>>>>>> +void pci_walk_below_dev(struct pci_dev *dev, int
>>>>>>> (*cb)(struct
>>>>>>> pci_dev
>>>>>>> *, void *),
>>>>>>> +			void *userdata)
>>>>>>> +{
>>>>>>> +	struct pci_bus *bus;
>>>>>>> +
>>>>>>> +	if (dev->subordinate) {
>>>>>>> +		bus = dev->subordinate;
>>>>>>> +		pci_walk_bus(bus, cb, userdata);
>>>>>>> +	} else {
>>>>>>> +		cb(dev, userdata);
>>>>>>> +	}
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(pci_walk_below_dev);
>>>>>>> +
>>>>>>>  struct pci_bus *pci_bus_get(struct pci_bus *bus)
>>>>>>>  {
>>>>>>>  	if (bus)
>>>>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>>>>>>> index 14bb8f54723e..fa08b1cc3d96 100644
>>>>>>> --- a/drivers/pci/pcie/err.c
>>>>>>> +++ b/drivers/pci/pcie/err.c
>>>>>>> @@ -151,33 +151,39 @@ pci_ers_result_t
>>>>>>> pcie_do_recovery(struct
>>>>>>> pci_dev
>>>>>>> *dev,
>>>>>>>  			pci_ers_result_t (*reset_link)(struct
>>>>>>> pci_dev *pdev))
>>>>>>>  {
>>>>>>>  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
>>>>>>> -	struct pci_bus *bus;
>>>>>>>
>>>>>>>  	/*
>>>>>>>  	 * Error recovery runs on all subordinates of the
>>>>>>> first
>>>>>>> downstream
>>>>>>> port.
>>>>>>>  	 * If the downstream port detected the error, it is
>>>>>>> cleared at
>>>>>>> the
>>>>>>> end.
>>>>>>> +	 * For RCiEPs we should reset just the RCiEP itself.
>>>>>>>  	 */
>>>>>>>  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>>>>>>> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
>>>>>>> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>>>>>>> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END))
>>>>>>>  		dev = dev->bus->self;
>>>>>>> -	bus = dev->subordinate;
>>>>>>>
>>>>>>>  	pci_dbg(dev, "broadcast error_detected message\n");
>>>>>>>  	if (state == pci_channel_io_frozen) {
>>>>>>> -		pci_walk_bus(bus, report_frozen_detected,
>>>>>>> &status);
>>>>>>> +		pci_walk_below_dev(dev, report_frozen_detected,
>>>>>>> &status);
>>>>>>> +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
>>>>>>> {
>>>>>>> +			pci_warn(dev, "link reset not possible
>>>>>>> for RCiEP\n");
>>>>>>> +			status = PCI_ERS_RESULT_NONE;
>>>>>>> +			goto failed;
>>>>>>> +		}
>>>>>>> +
>>>>>>>  		status = reset_link(dev);
>>>>>>>  		if (status != PCI_ERS_RESULT_RECOVERED) {
>>>>>>>  			pci_warn(dev, "link reset failed\n");
>>>>>>>  			goto failed;
>>>>>>>  		}
>>>>>>>  	} else {
>>>>>>> -		pci_walk_bus(bus, report_normal_detected,
>>>>>>> &status);
>>>>>>> +		pci_walk_below_dev(dev, report_normal_detected,
>>>>>>> &status);
>>>>>>>  	}
>>>>>>>
>>>>>>>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>>>>>>>  		status = PCI_ERS_RESULT_RECOVERED;
>>>>>>>  		pci_dbg(dev, "broadcast mmio_enabled
>>>>>>> message\n");
>>>>>>> -		pci_walk_bus(bus, report_mmio_enabled,
>>>>>>> &status);
>>>>>>> +		pci_walk_below_dev(dev, report_mmio_enabled,
>>>>>>> &status);
>>>>>>>  	}
>>>>>>>
>>>>>>>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
>>>>>>> @@ -188,17 +194,20 @@ pci_ers_result_t
>>>>>>> pcie_do_recovery(struct
>>>>>>> pci_dev
>>>>>>> *dev,
>>>>>>>  		 */
>>>>>>>  		status = PCI_ERS_RESULT_RECOVERED;
>>>>>>>  		pci_dbg(dev, "broadcast slot_reset message\n");
>>>>>>> -		pci_walk_bus(bus, report_slot_reset, &status);
>>>>>>> +		pci_walk_below_dev(dev, report_slot_reset,
>>>>>>> &status);
>>>>>>>  	}
>>>>>>>
>>>>>>>  	if (status != PCI_ERS_RESULT_RECOVERED)
>>>>>>>  		goto failed;
>>>>>>>
>>>>>>>  	pci_dbg(dev, "broadcast resume message\n");
>>>>>>> -	pci_walk_bus(bus, report_resume, &status);
>>>>>>> +	pci_walk_below_dev(dev, report_resume, &status);
>>>>>>>
>>>>>>> -	pci_aer_clear_device_status(dev);
>>>>>>> -	pci_aer_clear_nonfatal_status(dev);
>>>>>>> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>>>>>>> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
>>>>>>> +		pci_aer_clear_device_status(dev);
>>>>>>> +		pci_aer_clear_nonfatal_status(dev);
>>>>>>> +	}
>>>>>>>  	pci_info(dev, "device recovery successful\n");
>>>>>>>  	return status;
>>>>>>>
>>>>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>>>>> index c79d83304e52..538bf0a76d33 100644
>>>>>>> --- a/include/linux/pci.h
>>>>>>> +++ b/include/linux/pci.h
>>>>>>> @@ -1411,6 +1411,8 @@ int pci_scan_bridge(struct pci_bus
>>>>>>> *bus,
>>>>>>> struct
>>>>>>> pci_dev *dev, int max,
>>>>>>>
>>>>>>>  void pci_walk_bus(struct pci_bus *top, int (*cb)(struct
>>>>>>> pci_dev
>>>>>>> *,
>>>>>>> void *),
>>>>>>>  		  void *userdata);
>>>>>>> +void pci_walk_below_dev(struct pci_dev *dev, int
>>>>>>> (*cb)(struct
>>>>>>> pci_dev
>>>>>>> *, void *),
>>>>>>> +			void *userdata);
>>>>>>>  int pci_cfg_space_size(struct pci_dev *dev);
>>>>>>>  unsigned char pci_bus_max_busnr(struct pci_bus *bus);
>>>>>>>  void pci_setup_bridge(struct pci_bus *bus);
>>>>>>> -- 
>>>>>>> 2.19.1
>>
Jonathan Cameron July 8, 2020, 4:32 p.m. UTC | #12
On Wed, 8 Jul 2020 09:01:13 -0700
Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:

> On 8 Jul 2020, at 8:39, Jonathan Cameron wrote:
> 
> > On Tue, 7 Jul 2020 09:56:26 -0700
> > Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
> >  
> >> On 3 Jul 2020, at 1:23, Jonathan Cameron wrote:
> >>  
> >>> On Thu, 2 Jul 2020 11:06:26 -0700
> >>> Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
> >>>  
> >>>> On 26 Jun 2020, at 11:41, Jonathan Cameron wrote:
> >>>>  
> >>>>> On Fri, 26 Jun 2020 09:29:34 -0700
> >>>>> Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
> >>>>>  
> >>>>>> Hi,  
> >>>>> Hi,
> >>>>>
> >>>>> Thanks for taking a look.
> >>>>>  
> >>>>>>
> >>>>>> On 22 Jun 2020, at 4:44, Jonathan Cameron wrote:
> >>>>>>  
> >>>>>>> Was previously: PCI/AER: Add partial initial supprot for
> >>>>>>> RCiEPs
> >>>>>>> using
> >>>>>>> RCEC or
> >>>>>>> firmware first.
> >>>>>>>
> >>>>>>> Currently the kernel does not handle AER errors for Root
> >>>>>>> Complex
> >>>>>>> integrated
> >>>>>>> End Points (RCiEPs)[0].  These devices sit on a root bus
> >>>>>>> within
> >>>>>>> the
> >>>>>>> Root Complex
> >>>>>>> (RC).  AER handling is performed by a Root Complex Event
> >>>>>>> Collector
> >>>>>>> (RCEC) [1]
> >>>>>>> which is a effectively a type of RCiEP on the same root bus.
> >>>>>>>
> >>>>>>> This code will only perform the correct reset flow for the
> >>>>>>> case
> >>>>>>> where
> >>>>>>> there
> >>>>>>> is no need to take any actions on the RCEC because the
> >>>>>>> firmware is
> >>>>>>> responsible for them.   This is true where APEI [2] is used
> >>>>>>> to
> >>>>>>> report
> >>>>>>> the AER
> >>>>>>> errors via a GHES[v2] HEST entry [3] and relevant AER CPER
> >>>>>>> record
> >>>>>>> [4]
> >>>>>>> and Firmware
> >>>>>>> First handling is in use.  
> >>>>>>
> >>>>>> Right, in the case of the RCEC one identifies the RCiEPs by
> >>>>>> the
> >>>>>> RCiEP
> >>>>>> bitmap as a part of the RCEC Associated Endpoint Extended
> >>>>>> Capabilities.
> >>>>>> This ‘search’ so to speak would make use also of the RCEC
> >>>>>> Associated
> >>>>>> Bus Numbers Register to associate the devices with an RCEC when
> >>>>>> not
> >>>>>> on
> >>>>>> the same bus.  
> >>>>>
> >>>>> Ah. I'm afraid my access to recent specs is a bit limited at the
> >>>>> moment.
> >>>>> I do have a draft 5.0 spec which has that in though so I now see
> >>>>> what
> >>>>> you mean.
> >>>>>
> >>>>> Was introduced in Root Complex Event Collector Endpoint
> >>>>> Association
> >>>>> Extended
> >>>>> Capability version 2 in PCIe 5.0 I think.
> >>>>>  
> >>>>
> >>>> Correct.
> >>>>  
> >>>>>>> As there is no current RCEC driver support, it should not be
> >>>>>>> possible
> >>>>>>> to get
> >>>>>>> to this code via any routes other than the one above. Hence
> >>>>>>> appropriate RCEC
> >>>>>>> handling can be added when the RCEC driver support is ready.
> >>>>>>> The error handling is different from a normal PCIe End Point
> >>>>>>> because:
> >>>>>>>
> >>>>>>> 1) There is no downstream port above an RCiEP as these
> >>>>>>> devices sit
> >>>>>>> on
> >>>>>>> a root
> >>>>>>>    bus.
> >>>>>>>
> >>>>>>> 2) In general, it makes little sense to reset other devices
> >>>>>>> on on
> >>>>>>> the
> >>>>>>> same
> >>>>>>>    root bus.  For error handling outside the of the root
> >>>>>>> complex
> >>>>>>> (RC)
> >>>>>>> an AER
> >>>>>>>    error will indicate that all the topology below the
> >>>>>>> physical
> >>>>>>> link,
> >>>>>>> which
> >>>>>>>    the error is related to, will need to be reset as they
> >>>>>>> share a
> >>>>>>> common
> >>>>>>>    path to the host.  For an RCiEP there is no such defined
> >>>>>>> shared
> >>>>>>> path
> >>>>>>>    relationship with other elements on the root bus.
> >>>>>>>
> >>>>>>> A new walk function, similar to pci_bus_walk is provided
> >>>>>>> that
> >>>>>>> takes
> >>>>>>> a
> >>>>>>> pci_dev
> >>>>>>> instead of a bus.  If that dev corresponds to a downstream
> >>>>>>> port it
> >>>>>>> will walk
> >>>>>>> the subordinate bus of that downstream port.  If the dev does
> >>>>>>> not
> >>>>>>> then
> >>>>>>> it
> >>>>>>> will call the function on that device alone.   This function
> >>>>>>> allows
> >>>>>>> us
> >>>>>>> to
> >>>>>>> avoid adding special cases to the majority of the error
> >>>>>>> handling.  
> >>>>>>
> >>>>>> Then in that case the callback could add the additional checks
> >>>>>> specific
> >>>>>> to identifying the associated RCiEPs.  
> >>>>>
> >>>>> I am afraid I don't follow what you mean here.  Could you give
> >>>>> more
> >>>>> info?  
> >>>>
> >>>> Sure, a given RCEC can be associated with multiple RCiEPs.  As a
> >>>> part
> >>>> of
> >>>> the Extended Association Cap it is possible to obtain a bitmap of
> >>>> the
> >>>> RCiEP device ids on the same bus number as the RCEC device itself.
> >>>> (5.0-1.0 sec 7.9.10.2).  With a Cap version of 2h or higher, it is
> >>>> also
> >>>> possible to get an additional range of bus numbers containing
> >>>> RCiEPs
> >>>> also associated with this RCEC.
> >>>>
> >>>> So I’m wondering if this function could be used in which passing 
> >>>> a
> >>>> dev, in this case the RCEC, triggers the call back which makes use
> >>>> of
> >>>> the RCiEP bitmap and associated bus ranges to return all identified
> >>>> devices in use cases such as in AER for finding sources, etc.  
> >>>
> >>> Ah understood.
> >>>
> >>> If we do this we effectively end up with 3 different types of walk
> >>> and
> >>> the meaning of the walk function gets more complex again.
> >>>
> >>> 1) Normal bus walk - we pass the downstream port above a bus to 
> >>> which
> >>> the device is attached and it walks everthing below that point.
> >>>
> >>> 2) Case I care about RCiEP with no visible association with an RCEC
> >>> as
> >>> I don't need to walk devices.  In that case just calls the callbacks
> >>> for
> >>> the actual device.
> >>>
> >>> 3) Pass in RCiEP with RCEC asociated with it (or do a dance at the
> >>> caller
> >>> to pass in the RCEC itself). Need to walk the devices that the RCEC
> >>> is
> >>> handling errors for.  For handling, I'm not all the calls will be
> >>> generally
> >>> applicable to other devices associated with the RCEC as some only
> >>> make
> >>> sense if there is an actual PCIe bus involved and hence we need to
> >>> reset
> >>> other devices on that bus.  For RCEC I don't think there is an
> >>> particular
> >>> reason to assume an AER error reported at one RCiEP will have any
> >>> impact
> >>> on other devices associated with the particular RCEC.
> >>> I've not found anything in the spec addressing this question but
> >>> perhaps
> >>> I've missed something?  
> >>
> >> Correct.  There should be no impact to the RCEC or its associated
> >> RCiEPs which may not happen to reside on the same bus as the 
> >> collector at all.
> >>  
> >>> However, if the RCEC doesn't support multiple error records, you may
> >>> need
> >>> to walk the bus to identify multiple simultaneous issues, very
> >>> carefully
> >>> avoiding (or least minimizing) race conditions.
> >>>  
> >>>> The alternative is to have a separate walk for RCECs that loops
> >>>> through
> >>>> the bitmap / ranges (if supported) triggering the callback for each
> >>>> device found.  
> >>
> >> I’ve been testing the Associated Endpoint Bitmap and Bus Range
> >> handling and using my pciutils patches to help to confirm some of the
> >> association. This overlaps with my CXL work and CXL 1.1 based RCiEPs
> >> are good test cards:
> >>
> >> (Decode via:
> >> https://lore.kernel.org/linux-pci/20200624223940.240463-1-sean.v.kelley@linux.intel.com/
> >>  )
> >>
> >> Test card at 6b:00.0
> >>
> >> Capabilities: [e00 v1] Designated Vendor-Specific: Vendor=1e98 
> >> ID=0000
> >> Rev=0 Len=56: CXL
> >> CXLCap: Cache- IO+ Mem+ Mem HW Init+ HDMCount 1 Viral-
> >> CXLCtl: Cache- IO+ Mem- Cache SF Cov 0 Cache SF Gran 0 Cache Clean-
> >> Viral-
> >> CXLSta: Viral-
> >> Capabilities: [e38 v1] Device Serial Number 30-91-11-78-10-00-00-00
> >>
> >> RCEC assocated to RCiEP at 6b while residing at 6a:
> >>
> >> Capabilities: [160 v2] Root Complex Event Collector Endpoint
> >> Association
> >> RCiEPBitmap: 00000000 [none]
> >> AssociatedBusNumbers: 6b-6b
> >> Kernel driver in use: pcieport
> >>
> >> with dmesg:
> >>
> >> [ 10.502543] pcieport 0000:6a:00.4: AER: enabled with IRQ 34
> >>
> >> The trick is the walk, which is not compact.  Currently working on
> >> error injection to test:
> >>
> >> void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *,
> >> void *),
> >> +                   void *userdata)
> >> +{
> >> +       u32 pos, bitmap, hdr, busn;
> >> +       u8 ver, nextbusn, lastbusn;
> >> +       unsigned int dev, fn, bnr;
> >> +       struct pci_bus *pbus=NULL;
> >> +       struct pci_dev *pdev;
> >> +       int retval;
> >> +
> >> +       pos = pci_find_ext_capability(rcec, PCI_EXT_CAP_ID_RCEC);
> >> +       if (!pos)
> >> +               return;
> >> +
> >> +       pci_read_config_dword(rcec, pos + PCI_RCEC_RCIEP_BITMAP,
> >> &bitmap);
> >> +
> >> +       for (dev = 0; dev < 32; dev++) {  
> >
> > for_each_bit_set perhaps?
> >  
> >> +               if (!(bitmap & (1 << dev)))
> >> +                       continue;
> >> +
> >> +               for (fn = 0; fn < 8; fn++) {
> >> +                       pdev =
> >> pci_get_domain_bus_and_slot(pci_domain_nr(rcec->bus),
> >> +                                                          rcec->bus-  
> >>> number,  
> >> +                                                          
> >> PCI_DEVFN(de
> >> v, fn));
> >> +                       if (!pdev)
> >> +                               continue;
> >> +
> >> +                       retval = cb(pdev, userdata);
> >> +                       if (retval)
> >> +                               return;
> >> +               }
> >> +       }
> >>
> >> Then continuing in the same function above, I need to also consider 
> >> the
> >> case for the Bus ranges (still wip, not tested):
> >>
> >> +       pci_read_config_dword(rcec, pos, &hdr);
> >> +       ver = PCI_RCEC_EP_CAP_VER(hdr);
> >> +       if (ver < PCI_RCEC_BUSN_REG_VER)
> >> +               return;
> >> +
> >> +       pci_read_config_dword(rcec, pos + PCI_RCEC_BUSN, &busn);
> >> +       nextbusn = PCI_RCEC_BUSN_NEXT(busn);
> >> +       lastbusn = PCI_RCEC_BUSN_LAST(busn);
> >> +
> >> +       if ((nextbusn == 0xff) && (lastbusn == 0x00))
> >> +               return;
> >> +
> >> +       for (bnr = nextbusn; bnr < (lastbusn + 1); bnr++) {
> >> +               pbus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
> >> +               if (pbus) {
> >> +                       /* find RCiEP devices on the given bus */
> >> +                       for (dev = 0; dev < 32; dev++) {
> >> + etc...
> >> +                       }
> >> +               }
> >> +       }
> >> +}  
> >
> > Makes sense.
> >  
> >>
> >> Currently this lives in aer.c and it's large enough that I wonder if
> >> due to the specifity of the assoicated spec requirments if that 
> >> should
> >> be fine?  
> >
> > Seems fine to me.
> >  
> >>  
> >>> Agreed. We would end up with the same splitting of handling paths
> >>> that
> >>> wasn't liked in my v1 patch.   Perhaps we need  single
> >>> pci_walk_aer_affected
> >>> function with a pile of documentation for what it is actually doing?  
> >>
> >> Perhaps this would be a good start?  I tend to agree.  I can also
> >> submit more of the patches as RFC for further comment.  
> >
> > When you are ready that would be great.  
> 
> Sounds good.
> 
> >  
> >>  
> >>>
> >>> Even then we may need to have a parameter to indicate a particular
> >>> callback
> >>> should be restricted to devices that share a 'real bus' or not.  
> >>
> >> In my case 'walk' through the spec options of either just bitmap or 
> >> if
> >> of sufficient version (2h), I walk through the bus ranges, calling 
> >> the
> >> callback at each encounter.  
> >
> > That wasn't what I meant.  When doing some of the actual handling once
> > we have a walk function it will get a bit fiddly.
> > I 'think' we need to first identify which devices associated with the 
> > RCEC
> > have reported an AER error and then we need to only call the callback 
> > for
> > those.  
> 
> Right, I was going to do that decision making in the call back rather 
> than include those details in the walk. That way the call back simply 
> reports what associated devices it finds and makes no assumptions on 
> which device needs to be acted upon.  I’ll experiment with 
> alternatives to placement of identification code.

That may work.  Will need the existing callbacks to be a bit more clever
perhaps. Will be interesting to see what you get to work!

> 
> >
> > In a case with a real bus and EP, the assumption is that the whole bus
> > is going down so you have to issue it to everyone.  
> 
> Correct.
> 
> >  
> >>  
> >>>
> >>> I guess the proof will as ever be in what the code looks like.  
> >>
> >> Agreed.
> >>  
> >>>
> >>> Hmm. It increasingly feels like we may need to have a go at drawing
> >>> together
> >>> some coherent documentation for the different ways of handling
> >>> AER errors and specifications / assumptions for each.  Would be
> >>> 'interesting'
> >>> to do given I'm fairly sure very few people actually understands all
> >>> the options and
> >>> nasty corner cases!
> >>>
> >>> +CC Lorenzo and James who may also be interested in this topic in
> >>> general.  
> >>
> >> Let's do.  
> >
> > Great.  Whilst we only care about this particular corner for now it 
> > may
> > be worth working out some more comprehensive docs covering standard 
> > topology
> > as well. CXL may bring its own additions, but I'm not sure we actually
> > have a clear description of what 'should' happen on normal PCIe.  
> 
> Agree, the focus is on base or normal PCIe behavior. CXL is one use case 
> with its own exceptions.

:)

> 
> Sean
> 
> >
> > Jonathan
> >  
> >>
> >> Best regards,
> >>
> >> Sean
> >>  
> >>>
> >>> Thanks,
> >>>
> >>> Jonathan
> >>>
> >>>  
> >>>> Thanks,
> >>>>
> >>>> Sean
> >>>>  
> >>>>>>> Open questions:
> >>>>>>>
> >>>>>>> 1. Are we better protecting against link reset for an RCiEP
> >>>>>>> in
> >>>>>>> here
> >>>>>>> or
> >>>>>>>    should we put the check in the link reset functions?
> >>>>>>>
> >>>>>>> 2. If we were to get a stupid firmware record with the
> >>>>>>> relevant
> >>>>>>> reset
> >>>>>>> flag
> >>>>>>>    set to trigger a link reset, what is the correct
> >>>>>>> response?  For
> >>>>>>> now
> >>>>>>> I
> >>>>>>>    try to report that we haven't done anything and print a
> >>>>>>> warning.
> >>>>>>>
> >>>>>>> 3. Naming of pci_walk_below_dev is rather unsatisfying. Any
> >>>>>>> better
> >>>>>>> ideas?
> >>>>>>>
> >>>>>>> 4. pci_walk_below_dev is perhaps not of general utility.
> >>>>>>> Shall I
> >>>>>>> make
> >>>>>>> it local
> >>>>>>>    in err.c?  If not would a precursor patch for that be
> >>>>>>> preferred?’  
> >>>>>>
> >>>>>> It depends.  Is it intended as a drop in replacement where
> >>>>>> needed
> >>>>>> for
> >>>>>> pci_walk_bus()? So in that case you are now passing the dev
> >>>>>> structure
> >>>>>> and do the check for subordinate or is it intended as being
> >>>>>> specific
> >>>>>> to
> >>>>>> say RCEC? With AER, one could either first check for RC_EC type
> >>>>>> before
> >>>>>> using this one.  Or one could just drop in replace (passing the
> >>>>>> dev
> >>>>>> structure instead) and the call back performs the RCEC specific
> >>>>>> checks
> >>>>>> when a device is encountered.  
> >>>>>
> >>>>> If it is useful in aer.c that's great.   Just seemed such a weird
> >>>>> beast
> >>>>> I wasn't sure it would be of use anywhere else.
> >>>>>  
> >>>>>>> Testing has been performed via error injection on a QEMU
> >>>>>>> platform
> >>>>>>> as
> >>>>>>> that lets
> >>>>>>> me create a wide range of topologies and report errors at
> >>>>>>> any
> >>>>>>> chosen
> >>>>>>> location.
> >>>>>>> Currently I have no plans to upstream this injection support,
> >>>>>>> but
> >>>>>>> am
> >>>>>>> happy to
> >>>>>>> share if useful to others.  
> >>>>>>
> >>>>>> I’m experimenting with it in my RCEC code in AER and will give
> >>>>>> you
> >>>>>> additional feedback.  
> >>>>>
> >>>>> Great, thanks
> >>>>>
> >>>>> Jonathan
> >>>>>  
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Sean
> >>>>>>
> >>>>>>  
> >>>>>>> [0] ACPI PCI Express Base Specification 4.0 1.3.2.3 Root
> >>>>>>> Complex
> >>>>>>> Integrated
> >>>>>>>     Endpoint Rules.
> >>>>>>> [1] ACPI PCI Express Base Specification 4.0 6.2 Error
> >>>>>>> Signalling
> >>>>>>> and
> >>>>>>> Logging
> >>>>>>> [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error
> >>>>>>> Interface
> >>>>>>> (APEI)
> >>>>>>> [3] ACPI Sepcification 6.3 18.2.3.7 Generic Hardware Error
> >>>>>>> Source
> >>>>>>> [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
> >>>>>>>
> >>>>>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >>>>>>> ---
> >>>>>>> Changes since v1:
> >>>>>>> * Separated from the largely unrelated fix so the two can
> >>>>>>> move
> >>>>>>> forwards separately.
> >>>>>>> * Instead of separate path for RCiEP handling use the method
> >>>>>>> suggested
> >>>>>>> by Bjorn
> >>>>>>>   and Sathyanarayanan with an adjusted pci_bus_walk.
> >>>>>>>
> >>>>>>> Thanks all for reviews of V1.
> >>>>>>>
> >>>>>>>  drivers/pci/bus.c      | 28 ++++++++++++++++++++++++++++
> >>>>>>>  drivers/pci/pcie/err.c | 29 +++++++++++++++++++----------
> >>>>>>>  include/linux/pci.h    |  2 ++
> >>>>>>>  3 files changed, 49 insertions(+), 10 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> >>>>>>> index 8e40b3e6da77..7cbe1ed2db3d 100644
> >>>>>>> --- a/drivers/pci/bus.c
> >>>>>>> +++ b/drivers/pci/bus.c
> >>>>>>> @@ -411,6 +411,34 @@ void pci_walk_bus(struct pci_bus *top,
> >>>>>>> int
> >>>>>>> (*cb)(struct pci_dev *, void *),
> >>>>>>>  }
> >>>>>>>  EXPORT_SYMBOL_GPL(pci_walk_bus);
> >>>>>>>
> >>>>>>> +/** pci_walk_below_dev - walk devices below (or on) another
> >>>>>>> device
> >>>>>>> + *  @dev      device for which we should walk below,
> >>>>>>> include
> >>>>>>> device
> >>>>>>> when not a port.
> >>>>>>> + *  @cb       callback to be called for each device found
> >>>>>>> + *  @userdata arbitrary pointer to be passed to callback.
> >>>>>>> + *
> >>>>>>> + *  If the device provided is a port,
> >>>>>>> + *  walk the subordinate bus, including any bridged devices
> >>>>>>> + *  on buses under this bus.  Call the provided callback
> >>>>>>> + *  on each device found.
> >>>>>>> + *
> >>>>>>> + *  If the device provided hs no subordinate bus, call the
> >>>>>>> provided
> >>>>>>> + *  callback on the device itself.
> >>>>>>> + *
> >>>>>>> + */
> >>>>>>> +void pci_walk_below_dev(struct pci_dev *dev, int
> >>>>>>> (*cb)(struct
> >>>>>>> pci_dev
> >>>>>>> *, void *),
> >>>>>>> +			void *userdata)
> >>>>>>> +{
> >>>>>>> +	struct pci_bus *bus;
> >>>>>>> +
> >>>>>>> +	if (dev->subordinate) {
> >>>>>>> +		bus = dev->subordinate;
> >>>>>>> +		pci_walk_bus(bus, cb, userdata);
> >>>>>>> +	} else {
> >>>>>>> +		cb(dev, userdata);
> >>>>>>> +	}
> >>>>>>> +}
> >>>>>>> +EXPORT_SYMBOL_GPL(pci_walk_below_dev);
> >>>>>>> +
> >>>>>>>  struct pci_bus *pci_bus_get(struct pci_bus *bus)
> >>>>>>>  {
> >>>>>>>  	if (bus)
> >>>>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> >>>>>>> index 14bb8f54723e..fa08b1cc3d96 100644
> >>>>>>> --- a/drivers/pci/pcie/err.c
> >>>>>>> +++ b/drivers/pci/pcie/err.c
> >>>>>>> @@ -151,33 +151,39 @@ pci_ers_result_t
> >>>>>>> pcie_do_recovery(struct
> >>>>>>> pci_dev
> >>>>>>> *dev,
> >>>>>>>  			pci_ers_result_t (*reset_link)(struct
> >>>>>>> pci_dev *pdev))
> >>>>>>>  {
> >>>>>>>  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> >>>>>>> -	struct pci_bus *bus;
> >>>>>>>
> >>>>>>>  	/*
> >>>>>>>  	 * Error recovery runs on all subordinates of the
> >>>>>>> first
> >>>>>>> downstream
> >>>>>>> port.
> >>>>>>>  	 * If the downstream port detected the error, it is
> >>>>>>> cleared at
> >>>>>>> the
> >>>>>>> end.
> >>>>>>> +	 * For RCiEPs we should reset just the RCiEP itself.
> >>>>>>>  	 */
> >>>>>>>  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> >>>>>>> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> >>>>>>> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> >>>>>>> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END))
> >>>>>>>  		dev = dev->bus->self;
> >>>>>>> -	bus = dev->subordinate;
> >>>>>>>
> >>>>>>>  	pci_dbg(dev, "broadcast error_detected message\n");
> >>>>>>>  	if (state == pci_channel_io_frozen) {
> >>>>>>> -		pci_walk_bus(bus, report_frozen_detected,
> >>>>>>> &status);
> >>>>>>> +		pci_walk_below_dev(dev, report_frozen_detected,
> >>>>>>> &status);
> >>>>>>> +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
> >>>>>>> {
> >>>>>>> +			pci_warn(dev, "link reset not possible
> >>>>>>> for RCiEP\n");
> >>>>>>> +			status = PCI_ERS_RESULT_NONE;
> >>>>>>> +			goto failed;
> >>>>>>> +		}
> >>>>>>> +
> >>>>>>>  		status = reset_link(dev);
> >>>>>>>  		if (status != PCI_ERS_RESULT_RECOVERED) {
> >>>>>>>  			pci_warn(dev, "link reset failed\n");
> >>>>>>>  			goto failed;
> >>>>>>>  		}
> >>>>>>>  	} else {
> >>>>>>> -		pci_walk_bus(bus, report_normal_detected,
> >>>>>>> &status);
> >>>>>>> +		pci_walk_below_dev(dev, report_normal_detected,
> >>>>>>> &status);
> >>>>>>>  	}
> >>>>>>>
> >>>>>>>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> >>>>>>>  		status = PCI_ERS_RESULT_RECOVERED;
> >>>>>>>  		pci_dbg(dev, "broadcast mmio_enabled
> >>>>>>> message\n");
> >>>>>>> -		pci_walk_bus(bus, report_mmio_enabled,
> >>>>>>> &status);
> >>>>>>> +		pci_walk_below_dev(dev, report_mmio_enabled,
> >>>>>>> &status);
> >>>>>>>  	}
> >>>>>>>
> >>>>>>>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
> >>>>>>> @@ -188,17 +194,20 @@ pci_ers_result_t
> >>>>>>> pcie_do_recovery(struct
> >>>>>>> pci_dev
> >>>>>>> *dev,
> >>>>>>>  		 */
> >>>>>>>  		status = PCI_ERS_RESULT_RECOVERED;
> >>>>>>>  		pci_dbg(dev, "broadcast slot_reset message\n");
> >>>>>>> -		pci_walk_bus(bus, report_slot_reset, &status);
> >>>>>>> +		pci_walk_below_dev(dev, report_slot_reset,
> >>>>>>> &status);
> >>>>>>>  	}
> >>>>>>>
> >>>>>>>  	if (status != PCI_ERS_RESULT_RECOVERED)
> >>>>>>>  		goto failed;
> >>>>>>>
> >>>>>>>  	pci_dbg(dev, "broadcast resume message\n");
> >>>>>>> -	pci_walk_bus(bus, report_resume, &status);
> >>>>>>> +	pci_walk_below_dev(dev, report_resume, &status);
> >>>>>>>
> >>>>>>> -	pci_aer_clear_device_status(dev);
> >>>>>>> -	pci_aer_clear_nonfatal_status(dev);
> >>>>>>> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> >>>>>>> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> >>>>>>> +		pci_aer_clear_device_status(dev);
> >>>>>>> +		pci_aer_clear_nonfatal_status(dev);
> >>>>>>> +	}
> >>>>>>>  	pci_info(dev, "device recovery successful\n");
> >>>>>>>  	return status;
> >>>>>>>
> >>>>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
> >>>>>>> index c79d83304e52..538bf0a76d33 100644
> >>>>>>> --- a/include/linux/pci.h
> >>>>>>> +++ b/include/linux/pci.h
> >>>>>>> @@ -1411,6 +1411,8 @@ int pci_scan_bridge(struct pci_bus
> >>>>>>> *bus,
> >>>>>>> struct
> >>>>>>> pci_dev *dev, int max,
> >>>>>>>
> >>>>>>>  void pci_walk_bus(struct pci_bus *top, int (*cb)(struct
> >>>>>>> pci_dev
> >>>>>>> *,
> >>>>>>> void *),
> >>>>>>>  		  void *userdata);
> >>>>>>> +void pci_walk_below_dev(struct pci_dev *dev, int
> >>>>>>> (*cb)(struct
> >>>>>>> pci_dev
> >>>>>>> *, void *),
> >>>>>>> +			void *userdata);
> >>>>>>>  int pci_cfg_space_size(struct pci_dev *dev);
> >>>>>>>  unsigned char pci_bus_max_busnr(struct pci_bus *bus);
> >>>>>>>  void pci_setup_bridge(struct pci_bus *bus);
> >>>>>>> -- 
> >>>>>>> 2.19.1  
> >>
Sean V Kelley July 24, 2020, 5:29 p.m. UTC | #13
On Wed, 2020-07-08 at 17:32 +0100, Jonathan Cameron wrote:
> On Wed, 8 Jul 2020 09:01:13 -0700
> Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
> 
> > On 8 Jul 2020, at 8:39, Jonathan Cameron wrote:
> > 
> > > On Tue, 7 Jul 2020 09:56:26 -0700
> > > Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
> > >  
> > > > On 3 Jul 2020, at 1:23, Jonathan Cameron wrote:
> > > >  
> > > > > On Thu, 2 Jul 2020 11:06:26 -0700
> > > > > Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
> > > > >  
> > > > > > On 26 Jun 2020, at 11:41, Jonathan Cameron wrote:
> > > > > >  
> > > > > > > On Fri, 26 Jun 2020 09:29:34 -0700
> > > > > > > Sean V Kelley <sean.v.kelley@linux.intel.com> wrote:
> > > > > > >  
> > > > > > > > Hi,  
> > > > > > > Hi,
> > > > > > > 
> > > > > > > Thanks for taking a look.
> > > > > > >  
> > > > > > > > On 22 Jun 2020, at 4:44, Jonathan Cameron wrote:
> > > > > > > >  
> > > > > > > > > Was previously: PCI/AER: Add partial initial supprot
> > > > > > > > > for
> > > > > > > > > RCiEPs
> > > > > > > > > using
> > > > > > > > > RCEC or
> > > > > > > > > firmware first.
> > > > > > > > > 
> > > > > > > > > Currently the kernel does not handle AER errors for
> > > > > > > > > Root
> > > > > > > > > Complex
> > > > > > > > > integrated
> > > > > > > > > End Points (RCiEPs)[0].  These devices sit on a root
> > > > > > > > > bus
> > > > > > > > > within
> > > > > > > > > the
> > > > > > > > > Root Complex
> > > > > > > > > (RC).  AER handling is performed by a Root Complex
> > > > > > > > > Event
> > > > > > > > > Collector
> > > > > > > > > (RCEC) [1]
> > > > > > > > > which is a effectively a type of RCiEP on the same
> > > > > > > > > root bus.
> > > > > > > > > 
> > > > > > > > > This code will only perform the correct reset flow
> > > > > > > > > for the
> > > > > > > > > case
> > > > > > > > > where
> > > > > > > > > there
> > > > > > > > > is no need to take any actions on the RCEC because
> > > > > > > > > the
> > > > > > > > > firmware is
> > > > > > > > > responsible for them.   This is true where APEI [2]
> > > > > > > > > is used
> > > > > > > > > to
> > > > > > > > > report
> > > > > > > > > the AER
> > > > > > > > > errors via a GHES[v2] HEST entry [3] and relevant AER
> > > > > > > > > CPER
> > > > > > > > > record
> > > > > > > > > [4]
> > > > > > > > > and Firmware
> > > > > > > > > First handling is in use.  
> > > > > > > > 
> > > > > > > > Right, in the case of the RCEC one identifies the
> > > > > > > > RCiEPs by
> > > > > > > > the
> > > > > > > > RCiEP
> > > > > > > > bitmap as a part of the RCEC Associated Endpoint
> > > > > > > > Extended
> > > > > > > > Capabilities.
> > > > > > > > This ‘search’ so to speak would make use also of the
> > > > > > > > RCEC
> > > > > > > > Associated
> > > > > > > > Bus Numbers Register to associate the devices with an
> > > > > > > > RCEC when
> > > > > > > > not
> > > > > > > > on
> > > > > > > > the same bus.  
> > > > > > > 
> > > > > > > Ah. I'm afraid my access to recent specs is a bit limited
> > > > > > > at the
> > > > > > > moment.
> > > > > > > I do have a draft 5.0 spec which has that in though so I
> > > > > > > now see
> > > > > > > what
> > > > > > > you mean.
> > > > > > > 
> > > > > > > Was introduced in Root Complex Event Collector Endpoint
> > > > > > > Association
> > > > > > > Extended
> > > > > > > Capability version 2 in PCIe 5.0 I think.
> > > > > > >  
> > > > > > 
> > > > > > Correct.
> > > > > >  
> > > > > > > > > As there is no current RCEC driver support, it should
> > > > > > > > > not be
> > > > > > > > > possible
> > > > > > > > > to get
> > > > > > > > > to this code via any routes other than the one above.
> > > > > > > > > Hence
> > > > > > > > > appropriate RCEC
> > > > > > > > > handling can be added when the RCEC driver support is
> > > > > > > > > ready.
> > > > > > > > > The error handling is different from a normal PCIe
> > > > > > > > > End Point
> > > > > > > > > On Wed, 8 Jul 2020 09:01:13 -0700
> > > > > > > > > > Sean V Kelley <sean.v.kelley@linux.intel.com>
> > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > > On 8 Jul 2020, at 8:39, Jonathan Cameron wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > On Tue, 7 Jul 2020 09:56:26 -0700
> > > > > > > > > > > > Sean V Kelley <sean.v.kelley@linux.intel.com>
> > > > > > > > > wrote:
> > > > > > > > > > > >  
> > > > > > > > > > > > > On 3 Jul 2020, at 1:23, Jonathan Cameron
> > > > > > > > > wrote:
> > > > > > > > > > > > >  
> > > > > > > > > > > > > > On Thu, 2 Jul 2020 11:06:26 -0700
> > > > > > > > > > > > > > Sean V Kelley <
> > > > > > > > > sean.v.kelley@linux.intel.com> wrote:
> > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > On 26 Jun 2020, at 11:41, Jonathan
> > > > > > > > > Cameron wrote:
> > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > On Fri, 26 Jun 2020 09:29:34 -0700
> > > > > > > > > > > > > > > > Sean V Kelley <
> > > > > > > > > sean.v.kelley@linux.intel.com> wrote:
> > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > > Hi,  
> > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Thanks for taking a look.
> > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > > On 22 Jun 2020, at 4:44, Jonathan
> > > > > > > > > Cameron wrote:
> > > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > > > Was previously: PCI/AER: Add
> > > > > > > > > partial initial supprot
> > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > RCiEPs
> > > > > > > > > > > > > > > > > > using
> > > > > > > > > > > > > > > > > > RCEC or
> > > > > > > > > > > > > > > > > > firmware first.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Currently the kernel does not
> > > > > > > > > handle AER errors for
> > > > > > > > > > > > > > > > > > Root
> > > > > > > > > > > > > > > > > > Complex
> > > > > > > > > > > > > > > > > > integrated
> > > > > > > > > > > > > > > > > > End Points (RCiEPs)[0].  These
> > > > > > > > > devices sit on a root
> > > > > > > > > > > > > > > > > > bus
> > > > > > > > > > > > > > > > > > within
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > Root Complex
> > > > > > > > > > > > > > > > > > (RC).  AER handling is performed by
> > > > > > > > > a Root Complex
> > > > > > > > > > > > > > > > > > Event
> > > > > > > > > > > > > > > > > > Collector
> > > > > > > > > > > > > > > > > > (RCEC) [1]
> > > > > > > > > > > > > > > > > > which is a effectively a type of
> > > > > > > > > RCiEP on the same
> > > > > > > > > > > > > > > > > > root bus.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > This code will only perform the
> > > > > > > > > correct reset flow
> > > > > > > > > > > > > > > > > > for the
> > > > > > > > > > > > > > > > > > case
> > > > > > > > > > > > > > > > > > where
> > > > > > > > > > > > > > > > > > there
> > > > > > > > > > > > > > > > > > is no need to take any actions on
> > > > > > > > > the RCEC because
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > firmware is
> > > > > > > > > > > > > > > > > > responsible for them.   This is
> > > > > > > > > true where APEI [2]
> > > > > > > > > > > > > > > > > > is used
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > report
> > > > > > > > > > > > > > > > > > the AER
> > > > > > > > > > > > > > > > > > errors via a GHES[v2] HEST entry
> > > > > > > > > [3] and relevant AER
> > > > > > > > > > > > > > > > > > CPER
> > > > > > > > > > > > > > > > > > record
> > > > > > > > > > > > > > > > > > [4]
> > > > > > > > > > > > > > > > > > and Firmware
> > > > > > > > > > > > > > > > > > First handling is in use.  
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Right, in the case of the RCEC one
> > > > > > > > > identifies the
> > > > > > > > > > > > > > > > > RCiEPs by
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > RCiEP
> > > > > > > > > > > > > > > > > bitmap as a part of the RCEC
> > > > > > > > > Associated Endpoint
> > > > > > > > > > > > > > > > > Extended
> > > > > > > > > > > > > > > > > because:
> > > > > > > > > 
> > > > > > > > > 1) There is no downstream port above an RCiEP as
> > > > > > > > > these
> > > > > > > > > devices sit
> > > > > > > > > on
> > > > > > > > > a root
> > > > > > > > >    bus.
> > > > > > > > > 
> > > > > > > > > 2) In general, it makes little sense to reset other
> > > > > > > > > devices
> > > > > > > > > on on
> > > > > > > > > the
> > > > > > > > > same
> > > > > > > > >    root bus.  For error handling outside the of the
> > > > > > > > > root
> > > > > > > > > complex
> > > > > > > > > (RC)
> > > > > > > > > an AER
> > > > > > > > >    error will indicate that all the topology below
> > > > > > > > > the
> > > > > > > > > physical
> > > > > > > > > link,
> > > > > > > > > which
> > > > > > > > >    the error is related to, will need to be reset as
> > > > > > > > > they
> > > > > > > > > share a
> > > > > > > > > common
> > > > > > > > >    path to the host.  For an RCiEP there is no such
> > > > > > > > > defined
> > > > > > > > > shared
> > > > > > > > > path
> > > > > > > > >    relationship with other elements on the root bus.
> > > > > > > > > 
> > > > > > > > > A new walk function, similar to pci_bus_walk is
> > > > > > > > > provided
> > > > > > > > > that
> > > > > > > > > takes
> > > > > > > > > a
> > > > > > > > > pci_dev
> > > > > > > > > instead of a bus.  If that dev corresponds to a
> > > > > > > > > downstream
> > > > > > > > > port it
> > > > > > > > > will walk
> > > > > > > > > the subordinate bus of that downstream port.  If the
> > > > > > > > > dev does
> > > > > > > > > not
> > > > > > > > > then
> > > > > > > > > it
> > > > > > > > > will call the function on that device alone.   This
> > > > > > > > > function
> > > > > > > > > allows
> > > > > > > > > us
> > > > > > > > > to
> > > > > > > > > avoid adding special cases to the majority of the
> > > > > > > > > error
> > > > > > > > > handling.  
> > > > > > > > 
> > > > > > > > Then in that case the callback could add the additional
> > > > > > > > checks
> > > > > > > > specific
> > > > > > > > to identifying the associated RCiEPs.  
> > > > > > > 
> > > > > > > I am afraid I don't follow what you mean here.  Could you
> > > > > > > give
> > > > > > > more
> > > > > > > info?  
> > > > > > 
> > > > > > Sure, a given RCEC can be associated with multiple
> > > > > > RCiEPs.  As a
> > > > > > part
> > > > > > of
> > > > > > the Extended Association Cap it is possible to obtain a
> > > > > > bitmap of
> > > > > > the
> > > > > > RCiEP device ids on the same bus number as the RCEC device
> > > > > > itself.
> > > > > > (5.0-1.0 sec 7.9.10.2).  With a Cap version of 2h or
> > > > > > higher, it is
> > > > > > also
> > > > > > possible to get an additional range of bus numbers
> > > > > > containing
> > > > > > RCiEPs
> > > > > > also associated with this RCEC.
> > > > > > 
> > > > > > So I’m wondering if this function could be used in which
> > > > > > passing 
> > > > > > a
> > > > > > dev, in this case the RCEC, triggers the call back which
> > > > > > makes use
> > > > > > of
> > > > > > the RCiEP bitmap and associated bus ranges to return all
> > > > > > identified
> > > > > > devices in use cases such as in AER for finding sources,
> > > > > > etc.  
> > > > > 
> > > > > Ah understood.
> > > > > 
> > > > > If we do this we effectively end up with 3 different types of
> > > > > walk
> > > > > and
> > > > > the meaning of the walk function gets more complex again.
> > > > > 
> > > > > 1) Normal bus walk - we pass the downstream port above a bus
> > > > > to 
> > > > > which
> > > > > the device is attached and it walks everthing below that
> > > > > point.
> > > > > 
> > > > > 2) Case I care about RCiEP with no visible association with
> > > > > an RCEC
> > > > > as
> > > > > I don't need to walk devices.  In that case just calls the
> > > > > callbacks
> > > > > for
> > > > > the actual device.
> > > > > 
> > > > > 3) Pass in RCiEP with RCEC asociated with it (or do a dance
> > > > > at the
> > > > > caller
> > > > > to pass in the RCEC itself). Need to walk the devices that
> > > > > the RCEC
> > > > > is
> > > > > handling errors for.  For handling, I'm not all the calls
> > > > > will be
> > > > > generally
> > > > > applicable to other devices associated with the RCEC as some
> > > > > only
> > > > > make
> > > > > sense if there is an actual PCIe bus involved and hence we
> > > > > need to
> > > > > reset
> > > > > other devices on that bus.  For RCEC I don't think there is
> > > > > an
> > > > > particular
> > > > > reason to assume an AER error reported at one RCiEP will have
> > > > > any
> > > > > impact
> > > > > on other devices associated with the particular RCEC.
> > > > > I've not found anything in the spec addressing this question
> > > > > but
> > > > > perhaps
> > > > > I've missed something?  
> > > > 
> > > > Correct.  There should be no impact to the RCEC or its
> > > > associated
> > > > RCiEPs which may not happen to reside on the same bus as the 
> > > > collector at all.
> > > >  
> > > > > However, if the RCEC doesn't support multiple error records,
> > > > > you may
> > > > > need
> > > > > to walk the bus to identify multiple simultaneous issues,
> > > > > very
> > > > > carefully
> > > > > avoiding (or least minimizing) race conditions.
> > > > >  
> > > > > > The alternative is to have a separate walk for RCECs that
> > > > > > loops
> > > > > > through
> > > > > > the bitmap / ranges (if supported) triggering the callback
> > > > > > for each
> > > > > > device found.  
> > > > 
> > > > I’ve been testing the Associated Endpoint Bitmap and Bus Range
> > > > handling and using my pciutils patches to help to confirm some
> > > > of the
> > > > association. This overlaps with my CXL work and CXL 1.1 based
> > > > RCiEPs
> > > > are good test cards:
> > > > 
> > > > (Decode via:
> > > > https://lore.kernel.org/linux-pci/20200624223940.240463-1-sean.v.kelley@linux.intel.com/
> > > >  )
> > > > 
> > > > Test card at 6b:00.0
> > > > 
> > > > Capabilities: [e00 v1] Designated Vendor-Specific: Vendor=1e98 
> > > > ID=0000
> > > > Rev=0 Len=56: CXL
> > > > CXLCap: Cache- IO+ Mem+ Mem HW Init+ HDMCount 1 Viral-
> > > > CXLCtl: Cache- IO+ Mem- Cache SF Cov 0 Cache SF Gran 0 Cache
> > > > Clean-
> > > > Viral-
> > > > CXLSta: Viral-
> > > > Capabilities: [e38 v1] Device Serial Number 30-91-11-78-10-00-
> > > > 00-00
> > > > 
> > > > RCEC assocated to RCiEP at 6b while residing at 6a:
> > > > 
> > > > Capabilities: [160 v2] Root Complex Event Collector Endpoint
> > > > Association
> > > > RCiEPBitmap: 00000000 [none]
> > > > AssociatedBusNumbers: 6b-6b
> > > > Kernel driver in use: pcieport
> > > > 
> > > > with dmesg:
> > > > 
> > > > [ 10.502543] pcieport 0000:6a:00.4: AER: enabled with IRQ 34
> > > > 
> > > > The trick is the walk, which is not compact.  Currently working
> > > > on
> > > > error injection to test:
> > > > 
> > > > void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct
> > > > pci_dev *,
> > > > void *),
> > > > +                   void *userdata)
> > > > +{
> > > > +       u32 pos, bitmap, hdr, busn;
> > > > +       u8 ver, nextbusn, lastbusn;
> > > > +       unsigned int dev, fn, bnr;
> > > > +       struct pci_bus *pbus=NULL;
> > > > +       struct pci_dev *pdev;
> > > > +       int retval;
> > > > +
> > > > +       pos = pci_find_ext_capability(rcec,
> > > > PCI_EXT_CAP_ID_RCEC);
> > > > +       if (!pos)
> > > > +               return;
> > > > +
> > > > +       pci_read_config_dword(rcec, pos +
> > > > PCI_RCEC_RCIEP_BITMAP,
> > > > &bitmap);
> > > > +
> > > > +       for (dev = 0; dev < 32; dev++) {  
> > > 
> > > for_each_bit_set perhaps?
> > >  
> > > > +               if (!(bitmap & (1 << dev)))
> > > > +                       continue;
> > > > +
> > > > +               for (fn = 0; fn < 8; fn++) {
> > > > +                       pdev =
> > > > pci_get_domain_bus_and_slot(pci_domain_nr(rcec->bus),
> > > > +                                                          rcec
> > > > ->bus-  
> > > > > number,  
> > > > +                                                          
> > > > PCI_DEVFN(de
> > > > v, fn));
> > > > +                       if (!pdev)
> > > > +                               continue;
> > > > +
> > > > +                       retval = cb(pdev, userdata);
> > > > +                       if (retval)
> > > > +                               return;
> > > > +               }
> > > > +       }
> > > > 
> > > > Then continuing in the same function above, I need to also
> > > > consider 
> > > > the
> > > > case for the Bus ranges (still wip, not tested):
> > > > 
> > > > +       pci_read_config_dword(rcec, pos, &hdr);
> > > > +       ver = PCI_RCEC_EP_CAP_VER(hdr);
> > > > +       if (ver < PCI_RCEC_BUSN_REG_VER)
> > > > +               return;
> > > > +
> > > > +       pci_read_config_dword(rcec, pos + PCI_RCEC_BUSN,
> > > > &busn);
> > > > +       nextbusn = PCI_RCEC_BUSN_NEXT(busn);
> > > > +       lastbusn = PCI_RCEC_BUSN_LAST(busn);
> > > > +
> > > > +       if ((nextbusn == 0xff) && (lastbusn == 0x00))
> > > > +               return;
> > > > +
> > > > +       for (bnr = nextbusn; bnr < (lastbusn + 1); bnr++) {
> > > > +               pbus = pci_find_bus(pci_domain_nr(rcec->bus),
> > > > bnr);
> > > > +               if (pbus) {
> > > > +                       /* find RCiEP devices on the given bus
> > > > */
> > > > +                       for (dev = 0; dev < 32; dev++) {
> > > > + etc...
> > > > +                       }
> > > > +               }
> > > > +       }
> > > > +}  
> > > 
> > > Makes sense.
> > >  
> > > > Currently this lives in aer.c and it's large enough that I
> > > > wonder if
> > > > due to the specifity of the assoicated spec requirments if
> > > > that 
> > > > should
> > > > be fine?  
> > > 
> > > Seems fine to me.
> > >  
> > > >  
> > > > > Agreed. We would end up with the same splitting of handling
> > > > > paths
> > > > > that
> > > > > wasn't liked in my v1 patch.   Perhaps we need  single
> > > > > pci_walk_aer_affected
> > > > > function with a pile of documentation for what it is actually
> > > > > doing?  
> > > > 
> > > > Perhaps this would be a good start?  I tend to agree.  I can
> > > > also
> > > > submit more of the patches as RFC for further comment.  
> > > 
> > > When you are ready that would be great.  
> > 
> > Sounds good.
> > 
> > >  
> > > >  
> > > > > Even then we may need to have a parameter to indicate a
> > > > > particular
> > > > > callback
> > > > > should be restricted to devices that share a 'real bus' or
> > > > > not.  
> > > > 
> > > > In my case 'walk' through the spec options of either just
> > > > bitmap or 
> > > > if
> > > > of sufficient version (2h), I walk through the bus ranges,
> > > > calling 
> > > > the
> > > > callback at each encounter.  
> > > 
> > > That wasn't what I meant.  When doing some of the actual handling
> > > once
> > > we have a walk function it will get a bit fiddly.
> > > I 'think' we need to first identify which devices associated with
> > > the 
> > > RCEC
> > > have reported an AER error and then we need to only call the
> > > callback 
> > > for
> > > those.  
> > 
> > Right, I was going to do that decision making in the call back
> > rather 
> > than include those details in the walk. That way the call back
> > simply 
> > reports what associated devices it finds and makes no assumptions
> > on 
> > which device needs to be acted upon.  I’ll experiment with 
> > alternatives to placement of identification code.
> 
> That may work.  Will need the existing callbacks to be a bit more
> clever
> perhaps. Will be interesting to see what you get to work!
> 

Patches are out today, please take a look:

https://lore.kernel.org/linux-pci/20200724172223.145608-1-sean.v.kelley@intel.com/

Thanks,
Sean
diff mbox series

Patch

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 8e40b3e6da77..7cbe1ed2db3d 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -411,6 +411,34 @@  void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
 }
 EXPORT_SYMBOL_GPL(pci_walk_bus);
 
+/** pci_walk_below_dev - walk devices below (or on) another device
+ *  @dev      device for which we should walk below, include device when not a port.
+ *  @cb       callback to be called for each device found
+ *  @userdata arbitrary pointer to be passed to callback.
+ *
+ *  If the device provided is a port,
+ *  walk the subordinate bus, including any bridged devices
+ *  on buses under this bus.  Call the provided callback
+ *  on each device found.
+ *
+ *  If the device provided hs no subordinate bus, call the provided
+ *  callback on the device itself.
+ *
+ */
+void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct pci_dev *, void *),
+			void *userdata)
+{
+	struct pci_bus *bus;
+
+	if (dev->subordinate) {
+		bus = dev->subordinate;
+		pci_walk_bus(bus, cb, userdata);
+	} else {
+		cb(dev, userdata);
+	}
+}
+EXPORT_SYMBOL_GPL(pci_walk_below_dev);
+
 struct pci_bus *pci_bus_get(struct pci_bus *bus)
 {
 	if (bus)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 14bb8f54723e..fa08b1cc3d96 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -151,33 +151,39 @@  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
 {
 	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
-	struct pci_bus *bus;
 
 	/*
 	 * Error recovery runs on all subordinates of the first downstream port.
 	 * If the downstream port detected the error, it is cleared at the end.
+	 * For RCiEPs we should reset just the RCiEP itself.
 	 */
 	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
-	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
+	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
+	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END))
 		dev = dev->bus->self;
-	bus = dev->subordinate;
 
 	pci_dbg(dev, "broadcast error_detected message\n");
 	if (state == pci_channel_io_frozen) {
-		pci_walk_bus(bus, report_frozen_detected, &status);
+		pci_walk_below_dev(dev, report_frozen_detected, &status);
+		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
+			pci_warn(dev, "link reset not possible for RCiEP\n");
+			status = PCI_ERS_RESULT_NONE;
+			goto failed;
+		}
+
 		status = reset_link(dev);
 		if (status != PCI_ERS_RESULT_RECOVERED) {
 			pci_warn(dev, "link reset failed\n");
 			goto failed;
 		}
 	} else {
-		pci_walk_bus(bus, report_normal_detected, &status);
+		pci_walk_below_dev(dev, report_normal_detected, &status);
 	}
 
 	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
 		status = PCI_ERS_RESULT_RECOVERED;
 		pci_dbg(dev, "broadcast mmio_enabled message\n");
-		pci_walk_bus(bus, report_mmio_enabled, &status);
+		pci_walk_below_dev(dev, report_mmio_enabled, &status);
 	}
 
 	if (status == PCI_ERS_RESULT_NEED_RESET) {
@@ -188,17 +194,20 @@  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 		 */
 		status = PCI_ERS_RESULT_RECOVERED;
 		pci_dbg(dev, "broadcast slot_reset message\n");
-		pci_walk_bus(bus, report_slot_reset, &status);
+		pci_walk_below_dev(dev, report_slot_reset, &status);
 	}
 
 	if (status != PCI_ERS_RESULT_RECOVERED)
 		goto failed;
 
 	pci_dbg(dev, "broadcast resume message\n");
-	pci_walk_bus(bus, report_resume, &status);
+	pci_walk_below_dev(dev, report_resume, &status);
 
-	pci_aer_clear_device_status(dev);
-	pci_aer_clear_nonfatal_status(dev);
+	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
+		pci_aer_clear_device_status(dev);
+		pci_aer_clear_nonfatal_status(dev);
+	}
 	pci_info(dev, "device recovery successful\n");
 	return status;
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c79d83304e52..538bf0a76d33 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1411,6 +1411,8 @@  int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max,
 
 void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
 		  void *userdata);
+void pci_walk_below_dev(struct pci_dev *dev, int (*cb)(struct pci_dev *, void *),
+			void *userdata);
 int pci_cfg_space_size(struct pci_dev *dev);
 unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);