[v5,0/2] ACPI: APEI: Add support to notify the vendor specific HW errors

Message ID: 8aa40a48-39c9-ba6b-ea70-bcb60907a733@huawei.com

Shiju Jose March 25, 2020, 1:55 p.m. UTC
Presently, vendor drivers cannot recover from vendor-specific
recoverable HW errors that are reported to the APEI driver in
vendor-defined sections, because the APEI driver does not support
reporting them to the vendor drivers.

This patch set:
1. Adds an interface to the APEI driver that lets vendor drivers
    register event handling functions for the corresponding
    vendor-specific HW errors, and reports those errors to the vendor driver.

2. Adds a driver to handle HiSilicon hip08 PCIe controller errors,
    which is an example application of the above APEI interface.
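
To illustrate the idea of vendor drivers registering handlers that the APEI driver then calls for vendor-defined sections, here is a minimal userspace sketch of a notifier-style chain. It is not the code from this series: the types only mimic the shape of the kernel's notifier_block, there is no locking, and vendor_err, vendor_register(), vendor_notify(), demo_handler() and the section type bytes are all made-up names and values for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins that mimic the shape of <linux/notifier.h>. */
#define NOTIFY_DONE 0	/* handler did not care about this event */
#define NOTIFY_OK   1	/* handler processed the event */

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long event, void *data);
	struct notifier_block *next;
};

/* Hypothetical error record: a 16-byte section type plus a socket id. */
struct vendor_err {
	unsigned char sec_type[16];
	int socket_id;
};

static struct notifier_block *vendor_chain;

/* Add a handler at the head of the chain (no locking in this model). */
static void vendor_register(struct notifier_block *nb)
{
	assert(nb && nb->notifier_call);
	nb->next = vendor_chain;
	vendor_chain = nb;
}

/* Call every handler; return how many reported the error as handled. */
static int vendor_notify(struct vendor_err *err)
{
	int handled = 0;

	for (struct notifier_block *nb = vendor_chain; nb; nb = nb->next)
		if (nb->notifier_call(nb, 0, err) == NOTIFY_OK)
			handled++;
	return handled;
}

/* Made-up section type that the demo handler claims. */
static const unsigned char demo_sec_type[16] = { 0xB2, 0x88, 0x9F, 0xC9 };

/* Each handler checks the section type and ignores records it does not own. */
static int demo_handler(struct notifier_block *nb,
			unsigned long event, void *data)
{
	struct vendor_err *err = data;

	(void)nb;
	(void)event;
	if (memcmp(err->sec_type, demo_sec_type, sizeof(demo_sec_type)))
		return NOTIFY_DONE;
	return NOTIFY_OK;
}

static struct notifier_block demo_nb = { .notifier_call = demo_handler };
```

A vendor driver in this model would call vendor_register(&demo_nb) once at probe time, and the reporting side would call vendor_notify() for each vendor-defined record; the returned count is one possible way to derive a "handled" status.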

Changes:

V5:
1. Address comments from James Morse.
1.1 Change the notification method to use the atomic_notifier_chain.
1.2 Report the error handled status to user space.

V4:
1. Fix for the smatch warning in the PCIe error driver:
    warn: should '((((1))) << (9 + i))' be a 64 bit type?
    if (err->val_bits & BIT(HISI_PCIE_LOCAL_VALID_ERR_MISC + i))
	^^^ This should be BIT_ULL() because it goes up to 9 + 32.
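
As a small illustration of the smatch point above, the following userspace sketch tests valid bits starting at bit 9 with a 64-bit mask. BIT_ULL() is redefined locally to mirror the kernel macro, the value 9 is taken from the warning text, and count_valid_err_misc() with its nr_regs parameter is a hypothetical helper, not a function from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace equivalent of the kernel's BIT_ULL(): always a 64-bit constant. */
#define BIT_ULL(n) (1ULL << (n))

/* From the smatch report: the first ERR_MISC valid bit is bit 9. */
#define HISI_PCIE_LOCAL_VALID_ERR_MISC 9

/*
 * Count how many ERR_MISC valid bits are set in val_bits.
 * nr_regs is a hypothetical parameter; the shift can exceed 31,
 * which is why a 64-bit mask is needed.
 */
static int count_valid_err_misc(uint64_t val_bits, unsigned int nr_regs)
{
	int count = 0;

	/* Keep every shift amount below the width of the 64-bit type. */
	assert(HISI_PCIE_LOCAL_VALID_ERR_MISC + nr_regs <= 64);
	for (unsigned int i = 0; i < nr_regs; i++)
		if (val_bits & BIT_ULL(HISI_PCIE_LOCAL_VALID_ERR_MISC + i))
			count++;
	return count;
}
```

With a plain BIT() on a platform where unsigned long is 32 bits, the shifts for the upper bits would overflow the type, which is what the warning is about.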

V3:
1. Address the comments from Bjorn Helgaas.

V2:
1. Changes in the HiSilicon PCIe controller's error handling driver
    to address the comments from Bjorn Helgaas.

2. Changes in the APEI interface to support reporting the vendor error
    for a module with multiple devices that use the same section type.
    The error handler uses the socket id, sub module id, etc. to distinguish
    the device.
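
The device lookup described in item 2 could look roughly like the sketch below. This is only an assumption about the general approach: demo_dev, demo_err, their fields and demo_find_dev() are hypothetical names, and the real error record layout is defined by the driver patch, not here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device state; field names are illustrative only. */
struct demo_dev {
	unsigned int socket_id;
	unsigned int sub_module_id;
	unsigned int errors_seen;
};

/* Hypothetical subset of an error record shared by all devices. */
struct demo_err {
	unsigned int socket_id;
	unsigned int sub_module_id;
};

/*
 * All devices receive records with the same section type, so the
 * handler matches on the ids carried in the record to find its device.
 * Returns NULL when the record belongs to no known device.
 */
static struct demo_dev *demo_find_dev(struct demo_dev *devs, size_t n,
				      const struct demo_err *err)
{
	assert(devs && err);
	for (size_t i = 0; i < n; i++)
		if (devs[i].socket_id == err->socket_id &&
		    devs[i].sub_module_id == err->sub_module_id)
			return &devs[i];
	return NULL;
}
```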

V1:
1. Address comments from James Morse.

2. Add a driver to handle HiSilicon hip08 PCIe controller errors,
    which is an application of the above interface.

Shiju Jose (1):
   APEI: Add support to notify the vendor specific HW errors

Yicong Yang (1):
   PCI: HIP: Add handling of HiSilicon HIP PCIe controller errors

  drivers/acpi/apei/ghes.c                 |  35 ++-
  drivers/pci/controller/Kconfig           |   8 +
  drivers/pci/controller/Makefile          |   1 +
  drivers/pci/controller/pcie-hisi-error.c | 357 +++++++++++++++++++++++
  drivers/ras/ras.c                        |   5 +-
  include/acpi/ghes.h                      |  28 ++
  include/linux/ras.h                      |   6 +-
  include/ras/ras_event.h                  |   7 +-
  8 files changed, 440 insertions(+), 7 deletions(-)
  create mode 100644 drivers/pci/controller/pcie-hisi-error.c

Comments

Bjorn Helgaas March 25, 2020, 3:22 p.m. UTC | #1
1) If you can post things as a series, i.e., with patch 1/2 and patch
2/2 being responses to the 0/2 cover letter, that makes things easier.
It looks like you did this for the previous postings.

2) When applying these, "git am" complained (but they did apply
cleanly):

  warning: Patch sent with format=flowed; space at the end of lines might be lost.
  Applying: APEI: Add support to notify the vendor specific HW errors
  warning: Patch sent with format=flowed; space at the end of lines might be lost.
  Applying: PCI: HIP: Add handling of HiSilicon HIP PCIe controller errors

3) drivers/pci/controller/pcie-hisi-error.c should be next to
drivers/pci/controller/dwc/pcie-hisi.c, shouldn't it?

4) Your subject lines don't match the convention.  "git log --oneline
drivers/acpi/apei" says:

  011077d8fbfe ("APEI: Add support to notify the vendor specific HW errors")
  cea79e7e2f24 ("apei/ghes: Do not delay GHES polling")
  933ca4e323de ("acpi: Use pr_warn instead of pr_warning")
  6abc7622271d ("ACPI / APEI: Release resources if gen_pool_add() fails")
  bb100b64763c ("ACPI / APEI: Get rid of NULL_UUID_LE constant")
  371b86897d01 ("ACPI / APEI: Remove needless __ghes_check_estatus() calls")

and "git log --oneline --follow drivers/pci/controller/dwc/pcie-hisi*"
says:

  6e0832fa432e ("PCI: Collect all native drivers under drivers/pci/controller/")
  8cfab3cf63cf ("PCI: Add SPDX GPL-2.0 to replace GPL v2 boilerplate")
  5a4751680189 ("PCI: hisi: Constify dw_pcie_host_ops structure")
  b379d385bbaa ("PCI: hisi: Remove unused variable driver")
  a5f40e8098fe ("PCI: Don't allow unbinding host controllers that aren't prepared")
  e313a447e735 ("PCI: hisi: Update PCI config space remap function")
  b9c1153f7a9c ("PCI: hisi: Fix DT binding (hisi-pcie-almost-ecam)")

So your subject lines should be:

  ACPI / APEI: ...
  PCI: hisi: ...

Shiju Jose March 25, 2020, 4:27 p.m. UTC | #2
Hi Bjorn,

>1) If you can post things as a series, i.e., with patch 1/2 and patch
>2/2 being responses to the 0/2 cover letter, that makes things easier.
>It looks like you did this for the previous postings.
I will send the patches as a series after fixing the issues in the patch subject lines.

>
>2) When applying these, "git am" complained (but they did apply
>cleanly):
>
>  warning: Patch sent with format=flowed; space at the end of lines might be lost.
>  Applying: APEI: Add support to notify the vendor specific HW errors
>  warning: Patch sent with format=flowed; space at the end of lines might be lost.
>  Applying: PCI: HIP: Add handling of HiSilicon HIP PCIe controller errors
>
>3) drivers/pci/controller/pcie-hisi-error.c should be next to
>drivers/pci/controller/dwc/pcie-hisi.c, shouldn't it?
Our HIP PCIe controller doesn't use the DWC IP.

>
>4) Your subject lines don't match the convention.  "git log --oneline
>drivers/acpi/apei" says:
>
>  011077d8fbfe ("APEI: Add support to notify the vendor specific HW errors")
>  cea79e7e2f24 ("apei/ghes: Do not delay GHES polling")
>  933ca4e323de ("acpi: Use pr_warn instead of pr_warning")
>  6abc7622271d ("ACPI / APEI: Release resources if gen_pool_add() fails")
>  bb100b64763c ("ACPI / APEI: Get rid of NULL_UUID_LE constant")
>  371b86897d01 ("ACPI / APEI: Remove needless __ghes_check_estatus() calls")
>
>and "git log --oneline --follow drivers/pci/controller/dwc/pcie-hisi*"
>says:
>
>  6e0832fa432e ("PCI: Collect all native drivers under drivers/pci/controller/")
>  8cfab3cf63cf ("PCI: Add SPDX GPL-2.0 to replace GPL v2 boilerplate")
>  5a4751680189 ("PCI: hisi: Constify dw_pcie_host_ops structure")
>  b379d385bbaa ("PCI: hisi: Remove unused variable driver")
>  a5f40e8098fe ("PCI: Don't allow unbinding host controllers that aren't prepared")
>  e313a447e735 ("PCI: hisi: Update PCI config space remap function")
>  b9c1153f7a9c ("PCI: hisi: Fix DT binding (hisi-pcie-almost-ecam)")
>
>So your subject lines should be:
>
>  ACPI / APEI: ...
Sure. I will fix this.

>  PCI: hisi: ...
Can we use "PCI: hip" instead, since this driver is for the HIP hardware devices?

[...]

Thanks,
Shiju
Bjorn Helgaas March 25, 2020, 6:31 p.m. UTC | #3
On Wed, Mar 25, 2020 at 04:27:15PM +0000, Shiju Jose wrote:

> >3) drivers/pci/controller/pcie-hisi-error.c should be next to
> >drivers/pci/controller/dwc/pcie-hisi.c, shouldn't it?
>
> Our hip PCIe controller doesn't use DWC ip.

Ah, I was assuming this pcie-hisi-error.c driver was for the same
device claimed by pcie-hisi.c.

Error drivers like this will have some device-specific knowledge
(e.g., which registers to dump), but I guess they'll always be
used with the generic acpi/pci_root.c driver, right?

It looks like this driver has little or nothing to do with the PCI
core directly.  It does include drivers/pci/pci.h, but I'm not sure it
really needs it.

Maybe drivers/pci/controller/ is the best place for it, but I'm not
sure.  It's a little confusing because it's not really like the other
things there.

There are some vaguely similar things in drivers/acpi/apei/ and
drivers/acpi/nfit/.  And of course there are .acpi_match_table uses
all over the drivers/ tree.  Maybe we need a new subdirectory under
drivers/pci?  drivers/pci/controller/apei/?

Any thoughts, Rafael?

Bjorn