[v5] cxl: add RAS status unmasking for CXL

Message ID	167120700197.3444116.12992018879314163331.stgit@djiang5-desk3.ch.intel.com
State	Superseded
Headers	show Return-Path: <linux-cxl-owner@kernel.org> Subject: [PATCH v5] cxl: add RAS status unmasking for CXL From: Dave Jiang <dave.jiang@intel.com> To: linux-cxl@vger.kernel.org Cc: dan.j.williams@intel.com, ira.weiny@intel.com, vishal.l.verma@intel.com, alison.schofield@intel.com, Jonathan.Cameron@huawei.com Date: Fri, 16 Dec 2022 09:10:02 -0700 Message-ID: <167120700197.3444116.12992018879314163331.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/1.4 MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Precedence: bulk
Series	[v5] cxl: add RAS status unmasking for CXL \| expand [v5] cxl: add RAS status unmasking for CXL

Dave Jiang Dec. 16, 2022, 4:10 p.m. UTC

By default the CXL RAS mask registers bits are defaulted to 1's and
suppress all error reporting. If the kernel has negotiated ownership
of error handling for CXL then unmask the mask registers by writing 0s.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>

---

Based on patch posted by Ira [1] to export CXL native error reporting control.

[1]: https://lore.kernel.org/linux-cxl/20221212070627.1372402-2-ira.weiny@intel.com/

v5:
- Add single debug out to show mask changing. (Dan)

v4:
- Fix masking of RAS register. (Jonathan)

v3:
- Remove flex bus port status check. (Jonathan)
- Only unmask known mask bits. (Jonathan)

v2:
- Add definition of PCI_EXP_LNKSTA2_FLIT. (Dan)
- Return error for cxl_pci_ras_unmask(). (Dan)
- Add dev_dbg() for register bits to be cleared. (Dan)
- Check Flex Port DVSEC status. (Dan)
---
 drivers/cxl/cxl.h             |    1 +
 drivers/cxl/pci.c             |   49 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/pci_regs.h |    1 +
 3 files changed, 51 insertions(+)

Jonathan Cameron Dec. 17, 2022, 5:52 p.m. UTC | #1

On Fri, 16 Dec 2022 09:10:02 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> By default the CXL RAS mask registers bits are defaulted to 1's and
> suppress all error reporting. If the kernel has negotiated ownership
> of error handling for CXL then unmask the mask registers by writing 0s.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dave Jiang <dave.jiang@intel.
+CC Bjorn for a question on the AER control element of this.

I realized that adding this patch still only enables error because
I didn't check the PCIe spec when writing the QEMU emulation. I had changed
the value of  "Correctable Internal Error Mask" to default to unmasked.
PCIe 6.0 says it defaults to masked.  For some reason I thought these
masks were impdef (should have checked ;)

I'll drop that change in QEMU, but upshot is that if we want the correctable
ones to be delivered, we'll need to clear that bit as well as the ones you
do in this patch.

This got me thinking about why I never saw the same issue for UNC
errors.  Upshot, QEMU never sets the UNCOR mask so defaults to everything on
(it also doesn't allow anyone to write the register). Curiously it has
the code that uses the mask even though there was no means to set any
bits in it. 
I'll fix that (draft patch below) + we should update lspci to cover more of these AER
flags as it's a PITA debugging by reading the hex dumps!

Bjorn, is there any convention on drivers enabling these 'default' masked
AER errors?  Or is expectation that this is policy and userspace should
be dealing with it?

Dave, if you want to test on QEMU in meantime revert:
hw/mem/cxl_type3: Set mask to signal Correct Internal Errors.

and make this change to qemu:

From b0f37bafcc734fe15022c47a2da7f8326b1593c2 Mon Sep 17 00:00:00 2001
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Date: Sat, 17 Dec 2022 17:43:38 +0000
Subject: [PATCH] hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register

This register in AER should be both writeable and should
have a default value with a couple of the errors masked
including the Uncorrectable Internal Error used by CXL for
it's error reporting.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 hw/pci/pcie_aer.c          | 4 ++++
 include/hw/pci/pcie_regs.h | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index 00ef4b5601..935fbc57b5 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -122,6 +122,10 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,

     pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
                  PCI_ERR_UNC_SUPPORTED);
+    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
+                 PCI_ERR_UNC_MASK_DEFAULT);
+    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
+                 PCI_ERR_UNC_SUPPORTED);

     pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
                  PCI_ERR_UNC_SEVERITY_DEFAULT);
diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index 963dc2e170..6ec4785448 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -155,6 +155,9 @@ typedef enum PCIExpLinkWidth {
                                          PCI_ERR_UNC_ATOP_EBLOCKED |    \
                                          PCI_ERR_UNC_TLP_PRF_BLOCKED)

+#define PCI_ERR_UNC_MASK_DEFAULT        (PCI_ERR_UNC_INTN | \
+                                         PCI_ERR_UNC_TLP_PRF_BLOCKED)
+
 #define PCI_ERR_UNC_SEVERITY_DEFAULT    (PCI_ERR_UNC_DLP |              \
                                          PCI_ERR_UNC_SDN |              \
                                          PCI_ERR_UNC_FCP |              \
--
2.37.2

I'm testing the kernel side with:

From a5da67af2d2b4e792de2d88c74a68da3a17d2872 Mon Sep 17 00:00:00 2001
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Date: Sat, 17 Dec 2022 17:46:45 +0000
Subject: [PATCH] HACK: Enable the AER internal errors

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/cxl/pci.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index bfdf92a9aef6..ef7a6ab41eb7 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -778,7 +778,18 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
                return rc;

        if (cxlds->regs.ras) {
+               u32 mask;
+               printk("trying to unmask\n");
                pci_enable_pcie_error_reporting(pdev);
+               pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_MASK, &mask);
+               mask &= ~PCI_ERR_COR_INTERNAL;
+               printk("unamsking cor\n");
+               pci_write_config_word(pdev, pdev->aer_cap + PCI_ERR_COR_MASK, mask);
+
+               pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_MASK, &mask);
+               mask &= ~PCI_ERR_UNC_INTN;
+               printk("unmasking uncor\n");
+               pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_MASK, mask);
                cxl_pci_ras_unmask(pdev);
                rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
                if (rc)
--
2.37.2



> 
> ---
> 
> Based on patch posted by Ira [1] to export CXL native error reporting control.
> 
> [1]: https://lore.kernel.org/linux-cxl/20221212070627.1372402-2-ira.weiny@intel.com/
> 
> v5:
> - Add single debug out to show mask changing. (Dan)
> 
> v4:
> - Fix masking of RAS register. (Jonathan)
> 
> v3:
> - Remove flex bus port status check. (Jonathan)
> - Only unmask known mask bits. (Jonathan)
> 
> v2:
> - Add definition of PCI_EXP_LNKSTA2_FLIT. (Dan)
> - Return error for cxl_pci_ras_unmask(). (Dan)
> - Add dev_dbg() for register bits to be cleared. (Dan)
> - Check Flex Port DVSEC status. (Dan)
> ---
>  drivers/cxl/cxl.h             |    1 +
>  drivers/cxl/pci.c             |   49 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/pci_regs.h |    1 +
>  3 files changed, 51 insertions(+)
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1b1cf459ac77..31e795c6d537 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -130,6 +130,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>  #define   CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>  #define CXL_RAS_UNCORRECTABLE_MASK_OFFSET 0x4
>  #define   CXL_RAS_UNCORRECTABLE_MASK_MASK (GENMASK(16, 14) | GENMASK(11, 0))
> +#define   CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK BIT(8)
>  #define CXL_RAS_UNCORRECTABLE_SEVERITY_OFFSET 0x8
>  #define   CXL_RAS_UNCORRECTABLE_SEVERITY_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>  #define CXL_RAS_CORRECTABLE_STATUS_OFFSET 0xC
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 33083a522fd1..1e5fc69bcc4d 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -419,6 +419,54 @@ static void disable_aer(void *pdev)
>  	pci_disable_pcie_error_reporting(pdev);
>  }
>  
> +/*
> + * CXL v3.0 6.2.3 Table 6-4
> + * The table indicates that if PCIe Flit Mode is set, then CXL is in 256B flits
> + * mode, otherwise it's 68B flits mode.
> + */
> +static bool cxl_pci_flit_256(struct pci_dev *pdev)
> +{
> +	u32 lnksta2;
> +
> +	pcie_capability_read_dword(pdev, PCI_EXP_LNKSTA2, &lnksta2);
> +	return lnksta2 & PCI_EXP_LNKSTA2_FLIT;
> +}
> +
> +static int cxl_pci_ras_unmask(struct pci_dev *pdev)
> +{
> +	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +	void __iomem *addr;
> +	u32 orig_val, val, mask;
> +
> +	if (!cxlds->regs.ras)
> +		return -ENODEV;
> +
> +	/* BIOS has CXL error control */
> +	if (!host_bridge->native_cxl_error)
> +		return -EOPNOTSUPP;
> +
> +	addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
> +	orig_val = readl(addr);
> +
> +	mask = CXL_RAS_UNCORRECTABLE_MASK_MASK;
> +	if (!cxl_pci_flit_256(pdev))
> +		mask &= ~CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
> +	val = orig_val & ~mask;
> +	writel(val, addr);
> +	dev_dbg(&pdev->dev, "Uncorrectable RAS Errors Mask: %#x -> %#x\n",
> +		orig_val, val);
> +
> +	addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET;
> +	orig_val = readl(addr);
> +	val = orig_val & ~CXL_RAS_CORRECTABLE_MASK_MASK;
> +	writel(val, addr);
> +	dev_dbg(&pdev->dev, "Correctable RAS Errors Mask: %#x -> %#x\n",
> +		orig_val, val);
> +
> +	return 0;
> +}
> +
>  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
>  	struct cxl_register_map map;
> @@ -498,6 +546,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  
>  	if (cxlds->regs.ras) {
>  		pci_enable_pcie_error_reporting(pdev);
> +		cxl_pci_ras_unmask(pdev);
>  		rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
>  		if (rc)
>  			return rc;
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 82a03ea954af..576ee2ec973f 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -693,6 +693,7 @@
>  #define  PCI_EXP_LNKCTL2_TX_MARGIN	0x0380 /* Transmit Margin */
>  #define  PCI_EXP_LNKCTL2_HASD		0x0020 /* HW Autonomous Speed Disable */
>  #define PCI_EXP_LNKSTA2		0x32	/* Link Status 2 */
> +#define  PCI_EXP_LNKSTA2_FLIT		BIT(10) /* Flit Mode Status */
>  #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	0x32	/* end of v2 EPs w/ link */
>  #define PCI_EXP_SLTCAP2		0x34	/* Slot Capabilities 2 */
>  #define  PCI_EXP_SLTCAP2_IBPD	0x00000001 /* In-band PD Disable Supported */
> 
>

Jonathan Cameron Dec. 17, 2022, 6:05 p.m. UTC | #2

On Sat, 17 Dec 2022 17:52:04 +0000
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Fri, 16 Dec 2022 09:10:02 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
> > By default the CXL RAS mask registers bits are defaulted to 1's and
> > suppress all error reporting. If the kernel has negotiated ownership
> > of error handling for CXL then unmask the mask registers by writing 0s.
> > 
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Dave Jiang <dave.jiang@intel.
> +CC Bjorn for a question on the AER control element of this.
> 
> I realized that adding this patch still only enables error because
> I didn't check the PCIe spec when writing the QEMU emulation. I had changed
> the value of  "Correctable Internal Error Mask" to default to unmasked.
> PCIe 6.0 says it defaults to masked.  For some reason I thought these
> masks were impdef (should have checked ;)
> 
> I'll drop that change in QEMU, but upshot is that if we want the correctable
> ones to be delivered, we'll need to clear that bit as well as the ones you
> do in this patch.
> 
> This got me thinking about why I never saw the same issue for UNC
> errors.  Upshot, QEMU never sets the UNCOR mask so defaults to everything on
> (it also doesn't allow anyone to write the register). Curiously it has
> the code that uses the mask even though there was no means to set any
> bits in it. 
> I'll fix that (draft patch below) + we should update lspci to cover more of these AER
> flags as it's a PITA debugging by reading the hex dumps!
> 
> Bjorn, is there any convention on drivers enabling these 'default' masked
> AER errors?  Or is expectation that this is policy and userspace should
> be dealing with it?
> 
> Dave, if you want to test on QEMU in meantime revert:
> hw/mem/cxl_type3: Set mask to signal Correct Internal Errors.

If you want to have some pre holidays fun, it's instructive to
inject a whole bunch of errors in a row (just paste a bunch of json entries in QMP telnet settions all
at once - this might happen on single errors, but that's how I'm triggering it) I've had several
crashes in the recovery path after resets.  E.g.

pcieport 0000:0e:00.0: AER: Downstream Port link has been reset (0)
cxl_pci 0000:0f:00.0: mem1: restart CXL.mem after slot reset
cxl_pci 0000:0f:00.0: mem1: error resume successful
pcieport 0000:0e:00.0: AER: device recovery successful
pcieport 0000:0c:00.0: AER: Uncorrected (Fatal) error received: 0000:0f:00.0
cxl_pci 0000:0f:00.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Inaccessible, (Unregistered Agent ID)
cxl_pci 0000:0f:00.0: mem1: frozen state error detected, disable CXL.mem
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000358

....

[  133.722701] Call trace:
[  133.723627]  cxl_pmem_ctl+0x74/0x244 [cxl_pmem]
[  133.724974]  nvdimm_init_nsarea+0xb8/0xdc
[  133.727026]  nvdimm_probe+0xc0/0x1d0
[  133.728639]  nvdimm_bus_probe+0x90/0x200
[  133.730141]  really_probe+0xc8/0x3e0
[  133.731422]  __driver_probe_device+0x84/0x190
[  133.732616]  driver_probe_device+0x44/0x120
[  133.734395]  __device_attach_driver+0xc4/0x160
[  133.735723]  bus_for_each_drv+0x80/0xe0
[  133.736732]  __device_attach+0xa4/0x1c4
[  133.737876]  device_initial_probe+0x1c/0x30
[  133.740066]  bus_probe_device+0xa4/0xb0
[  133.742216]  device_add+0x404/0x920
[  133.743158]  nd_async_device_register+0x20/0x70
[  133.745009]  async_run_entry_fn+0x3c/0x154
[  133.746433]  process_one_work+0x200/0x474
[  133.748023]  worker_thread+0x74/0x43c
[  133.749899]  kthread+0x110/0x114
[  133.751393]  ret_from_fork+0x10/0x20

I did see something the other day that looked similar when manually probing
drivers in a particular order but haven't chased that one down yet.

Our test teams love sending bug reports on what happens when you get the following
N errors all at once.  Often reveal entertainingly complex bugs.

Jonathan

> 
> and make this change to qemu:
> 
> From b0f37bafcc734fe15022c47a2da7f8326b1593c2 Mon Sep 17 00:00:00 2001
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Date: Sat, 17 Dec 2022 17:43:38 +0000
> Subject: [PATCH] hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register
> 
> This register in AER should be both writeable and should
> have a default value with a couple of the errors masked
> including the Uncorrectable Internal Error used by CXL for
> it's error reporting.
> 
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  hw/pci/pcie_aer.c          | 4 ++++
>  include/hw/pci/pcie_regs.h | 3 +++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
> index 00ef4b5601..935fbc57b5 100644
> --- a/hw/pci/pcie_aer.c
> +++ b/hw/pci/pcie_aer.c
> @@ -122,6 +122,10 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
> 
>      pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
>                   PCI_ERR_UNC_SUPPORTED);
> +    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
> +                 PCI_ERR_UNC_MASK_DEFAULT);
> +    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
> +                 PCI_ERR_UNC_SUPPORTED);
> 
>      pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
>                   PCI_ERR_UNC_SEVERITY_DEFAULT);
> diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> index 963dc2e170..6ec4785448 100644
> --- a/include/hw/pci/pcie_regs.h
> +++ b/include/hw/pci/pcie_regs.h
> @@ -155,6 +155,9 @@ typedef enum PCIExpLinkWidth {
>                                           PCI_ERR_UNC_ATOP_EBLOCKED |    \
>                                           PCI_ERR_UNC_TLP_PRF_BLOCKED)
> 
> +#define PCI_ERR_UNC_MASK_DEFAULT        (PCI_ERR_UNC_INTN | \
> +                                         PCI_ERR_UNC_TLP_PRF_BLOCKED)
> +
>  #define PCI_ERR_UNC_SEVERITY_DEFAULT    (PCI_ERR_UNC_DLP |              \
>                                           PCI_ERR_UNC_SDN |              \
>                                           PCI_ERR_UNC_FCP |              \
> --
> 2.37.2
> 
> I'm testing the kernel side with:
> 
> From a5da67af2d2b4e792de2d88c74a68da3a17d2872 Mon Sep 17 00:00:00 2001
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Date: Sat, 17 Dec 2022 17:46:45 +0000
> Subject: [PATCH] HACK: Enable the AER internal errors
> 
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/cxl/pci.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index bfdf92a9aef6..ef7a6ab41eb7 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -778,7 +778,18 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>                 return rc;
> 
>         if (cxlds->regs.ras) {
> +               u32 mask;
> +               printk("trying to unmask\n");
>                 pci_enable_pcie_error_reporting(pdev);
> +               pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_MASK, &mask);
> +               mask &= ~PCI_ERR_COR_INTERNAL;
> +               printk("unamsking cor\n");
> +               pci_write_config_word(pdev, pdev->aer_cap + PCI_ERR_COR_MASK, mask);
> +
> +               pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_MASK, &mask);
> +               mask &= ~PCI_ERR_UNC_INTN;
> +               printk("unmasking uncor\n");
> +               pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_MASK, mask);
>                 cxl_pci_ras_unmask(pdev);
>                 rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
>                 if (rc)
> --
> 2.37.2
> 
> 
> 
> > 
> > ---
> > 
> > Based on patch posted by Ira [1] to export CXL native error reporting control.
> > 
> > [1]: https://lore.kernel.org/linux-cxl/20221212070627.1372402-2-ira.weiny@intel.com/
> > 
> > v5:
> > - Add single debug out to show mask changing. (Dan)
> > 
> > v4:
> > - Fix masking of RAS register. (Jonathan)
> > 
> > v3:
> > - Remove flex bus port status check. (Jonathan)
> > - Only unmask known mask bits. (Jonathan)
> > 
> > v2:
> > - Add definition of PCI_EXP_LNKSTA2_FLIT. (Dan)
> > - Return error for cxl_pci_ras_unmask(). (Dan)
> > - Add dev_dbg() for register bits to be cleared. (Dan)
> > - Check Flex Port DVSEC status. (Dan)
> > ---
> >  drivers/cxl/cxl.h             |    1 +
> >  drivers/cxl/pci.c             |   49 +++++++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/pci_regs.h |    1 +
> >  3 files changed, 51 insertions(+)
> > 
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 1b1cf459ac77..31e795c6d537 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -130,6 +130,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
> >  #define   CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
> >  #define CXL_RAS_UNCORRECTABLE_MASK_OFFSET 0x4
> >  #define   CXL_RAS_UNCORRECTABLE_MASK_MASK (GENMASK(16, 14) | GENMASK(11, 0))
> > +#define   CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK BIT(8)
> >  #define CXL_RAS_UNCORRECTABLE_SEVERITY_OFFSET 0x8
> >  #define   CXL_RAS_UNCORRECTABLE_SEVERITY_MASK (GENMASK(16, 14) | GENMASK(11, 0))
> >  #define CXL_RAS_CORRECTABLE_STATUS_OFFSET 0xC
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 33083a522fd1..1e5fc69bcc4d 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -419,6 +419,54 @@ static void disable_aer(void *pdev)
> >  	pci_disable_pcie_error_reporting(pdev);
> >  }
> >  
> > +/*
> > + * CXL v3.0 6.2.3 Table 6-4
> > + * The table indicates that if PCIe Flit Mode is set, then CXL is in 256B flits
> > + * mode, otherwise it's 68B flits mode.
> > + */
> > +static bool cxl_pci_flit_256(struct pci_dev *pdev)
> > +{
> > +	u32 lnksta2;
> > +
> > +	pcie_capability_read_dword(pdev, PCI_EXP_LNKSTA2, &lnksta2);
> > +	return lnksta2 & PCI_EXP_LNKSTA2_FLIT;
> > +}
> > +
> > +static int cxl_pci_ras_unmask(struct pci_dev *pdev)
> > +{
> > +	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
> > +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> > +	void __iomem *addr;
> > +	u32 orig_val, val, mask;
> > +
> > +	if (!cxlds->regs.ras)
> > +		return -ENODEV;
> > +
> > +	/* BIOS has CXL error control */
> > +	if (!host_bridge->native_cxl_error)
> > +		return -EOPNOTSUPP;
> > +
> > +	addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
> > +	orig_val = readl(addr);
> > +
> > +	mask = CXL_RAS_UNCORRECTABLE_MASK_MASK;
> > +	if (!cxl_pci_flit_256(pdev))
> > +		mask &= ~CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
> > +	val = orig_val & ~mask;
> > +	writel(val, addr);
> > +	dev_dbg(&pdev->dev, "Uncorrectable RAS Errors Mask: %#x -> %#x\n",
> > +		orig_val, val);
> > +
> > +	addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET;
> > +	orig_val = readl(addr);
> > +	val = orig_val & ~CXL_RAS_CORRECTABLE_MASK_MASK;
> > +	writel(val, addr);
> > +	dev_dbg(&pdev->dev, "Correctable RAS Errors Mask: %#x -> %#x\n",
> > +		orig_val, val);
> > +
> > +	return 0;
> > +}
> > +
> >  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >  {
> >  	struct cxl_register_map map;
> > @@ -498,6 +546,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >  
> >  	if (cxlds->regs.ras) {
> >  		pci_enable_pcie_error_reporting(pdev);
> > +		cxl_pci_ras_unmask(pdev);
> >  		rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
> >  		if (rc)
> >  			return rc;
> > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> > index 82a03ea954af..576ee2ec973f 100644
> > --- a/include/uapi/linux/pci_regs.h
> > +++ b/include/uapi/linux/pci_regs.h
> > @@ -693,6 +693,7 @@
> >  #define  PCI_EXP_LNKCTL2_TX_MARGIN	0x0380 /* Transmit Margin */
> >  #define  PCI_EXP_LNKCTL2_HASD		0x0020 /* HW Autonomous Speed Disable */
> >  #define PCI_EXP_LNKSTA2		0x32	/* Link Status 2 */
> > +#define  PCI_EXP_LNKSTA2_FLIT		BIT(10) /* Flit Mode Status */
> >  #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	0x32	/* end of v2 EPs w/ link */
> >  #define PCI_EXP_SLTCAP2		0x34	/* Slot Capabilities 2 */
> >  #define  PCI_EXP_SLTCAP2_IBPD	0x00000001 /* In-band PD Disable Supported */
> > 
> > 
>

Bjorn Helgaas Dec. 29, 2022, 5:27 p.m. UTC | #3

[+cc Stefan, Kuppuswamy]

On Sat, Dec 17, 2022 at 05:52:04PM +0000, Jonathan Cameron wrote:
> On Fri, 16 Dec 2022 09:10:02 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
> > By default the CXL RAS mask registers bits are defaulted to 1's and
> > suppress all error reporting. If the kernel has negotiated ownership
> > of error handling for CXL then unmask the mask registers by writing 0s.

> +CC Bjorn for a question on the AER control element of this.

Timely question (unlike my response, sorry, I've been on vacation).

> I realized that adding this patch still only enables error because I
> didn't check the PCIe spec when writing the QEMU emulation. I had
> changed the value of  "Correctable Internal Error Mask" to default
> to unmasked.  PCIe 6.0 says it defaults to masked.  For some reason
> I thought these masks were impdef (should have checked ;)

I assume you refer to the AER "Corrected Internal Error Mask" bit
(PCIe r6.0, sec 7.8.4.6), which indeed defaults to 1b (masked) if the
bit is implemented.

> I'll drop that change in QEMU, but upshot is that if we want the
> correctable ones to be delivered, we'll need to clear that bit as
> well as the ones you do in this patch.

I don't think Corrected Internal Error Mask affects reporting of other
Correctable Errors, e.g., Receiver Error, Bad TLP, Bad DLLP, etc.  As
I read sec 6.2.10, those would not be classified as "internal" errors.

> This got me thinking about why I never saw the same issue for UNC
> errors.  Upshot, QEMU never sets the UNCOR mask so defaults to everything on
> (it also doesn't allow anyone to write the register). Curiously it has
> the code that uses the mask even though there was no means to set any
> bits in it. 
> I'll fix that (draft patch below) + we should update lspci to cover more of these AER
> flags as it's a PITA debugging by reading the hex dumps!
> 
> Bjorn, is there any convention on drivers enabling these 'default' masked
> AER errors?  Or is expectation that this is policy and userspace should
> be dealing with it?

I think AER configuration should generally be a system policy
decision, not a driver-level choice (unless we need quirks to work
around device defects, of course).

We now have f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), which turns on error reporting in Device Control for all
devices at enumeration-time when the OS has control of AER.  But this
is only the generic device-level control; it doesn't configure any
*AER* registers.

I'm surprised to learn that the only writes to PCI_ERR_UNCOR_MASK are
some mips and powerpc arch-specific code and a few individual drivers.
It seems like maybe pci_aer_init() should do some more configuration
of the AER mask and severity registers.

Bjorn

Jonathan Cameron Jan. 5, 2023, 4:31 p.m. UTC | #4

On Thu, 29 Dec 2022 11:27:31 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc Stefan, Kuppuswamy]
> 
> On Sat, Dec 17, 2022 at 05:52:04PM +0000, Jonathan Cameron wrote:
> > On Fri, 16 Dec 2022 09:10:02 -0700
> > Dave Jiang <dave.jiang@intel.com> wrote:
> >   
> > > By default the CXL RAS mask registers bits are defaulted to 1's and
> > > suppress all error reporting. If the kernel has negotiated ownership
> > > of error handling for CXL then unmask the mask registers by writing 0s.  
> 
> > +CC Bjorn for a question on the AER control element of this.  
> 
> Timely question (unlike my response, sorry, I've been on vacation).
> 
> > I realized that adding this patch still only enables error because I
> > didn't check the PCIe spec when writing the QEMU emulation. I had
> > changed the value of  "Correctable Internal Error Mask" to default
> > to unmasked.  PCIe 6.0 says it defaults to masked.  For some reason
> > I thought these masks were impdef (should have checked ;)  
> 
> I assume you refer to the AER "Corrected Internal Error Mask" bit
> (PCIe r6.0, sec 7.8.4.6), which indeed defaults to 1b (masked) if the
> bit is implemented.

Spot on.  I keep confusing the correctable / corrected stuff in PCIe.
Made more confusing by the CXL stuff layered on top.

> 
> > I'll drop that change in QEMU, but upshot is that if we want the
> > correctable ones to be delivered, we'll need to clear that bit as
> > well as the ones you do in this patch.  
> 
> I don't think Corrected Internal Error Mask affects reporting of other
> Correctable Errors, e.g., Receiver Error, Bad TLP, Bad DLLP, etc.  As
> I read sec 6.2.10, those would not be classified as "internal" errors.

I should have been more detailed in this part. You are absolutely correct
for errors defined by the PCIe spec.

CXL handles it's Protocol errors by reporting them all as 'internal'
errors as defined by PCIe and then providing extra register to demultiplex
that into a range of different CXL protocol errors. A similar set of
mask /severity registers to those found in PCIe for PCIe defined protocol errors
are used for fine grained control.  Currently there are about 7
correctable and 7 uncorrectable errors in CXL r3.0
It was the CXL correctable errors that I meant to refer to with
the above comment.


> 
> > This got me thinking about why I never saw the same issue for UNC
> > errors.  Upshot, QEMU never sets the UNCOR mask so defaults to everything on
> > (it also doesn't allow anyone to write the register). Curiously it has
> > the code that uses the mask even though there was no means to set any
> > bits in it. 
> > I'll fix that (draft patch below) + we should update lspci to cover more of these AER
> > flags as it's a PITA debugging by reading the hex dumps!
> > 
> > Bjorn, is there any convention on drivers enabling these 'default' masked
> > AER errors?  Or is expectation that this is policy and userspace should
> > be dealing with it?  
> 
> I think AER configuration should generally be a system policy
> decision, not a driver-level choice (unless we need quirks to work
> around device defects, of course).

Makes sense, though we may need a second layer of system policy to
cover CXL errors - currently that's what this patch is pushing to the
driver.

> 
> We now have f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
> native"), which turns on error reporting in Device Control for all
> devices at enumeration-time when the OS has control of AER.  But this
> is only the generic device-level control; it doesn't configure any
> *AER* registers.
> 
> I'm surprised to learn that the only writes to PCI_ERR_UNCOR_MASK are
> some mips and powerpc arch-specific code and a few individual drivers.
> It seems like maybe pci_aer_init() should do some more configuration
> of the AER mask and severity registers.

Sounds good.  Any thoughts on where to get the policy from?
Feels like an administrator thing rather than a kernel config one
to me, so maybe pci_aer_init() is too early or we'd benefit from
a nice easy per device interface to tweak a default?

Jonathan

> 
> Bjorn

Dave Jiang Jan. 5, 2023, 4:53 p.m. UTC | #5

On 12/17/22 11:05 AM, Jonathan Cameron wrote:
> On Sat, 17 Dec 2022 17:52:04 +0000
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 
>> On Fri, 16 Dec 2022 09:10:02 -0700
>> Dave Jiang <dave.jiang@intel.com> wrote:
>>
>>> By default the CXL RAS mask registers bits are defaulted to 1's and
>>> suppress all error reporting. If the kernel has negotiated ownership
>>> of error handling for CXL then unmask the mask registers by writing 0s.
>>>
>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> Signed-off-by: Dave Jiang <dave.jiang@intel.
>> +CC Bjorn for a question on the AER control element of this.
>>
>> I realized that adding this patch still only enables error because
>> I didn't check the PCIe spec when writing the QEMU emulation. I had changed
>> the value of  "Correctable Internal Error Mask" to default to unmasked.
>> PCIe 6.0 says it defaults to masked.  For some reason I thought these
>> masks were impdef (should have checked ;)
>>
>> I'll drop that change in QEMU, but upshot is that if we want the correctable
>> ones to be delivered, we'll need to clear that bit as well as the ones you
>> do in this patch.
>>
>> This got me thinking about why I never saw the same issue for UNC
>> errors.  Upshot, QEMU never sets the UNCOR mask so defaults to everything on
>> (it also doesn't allow anyone to write the register). Curiously it has
>> the code that uses the mask even though there was no means to set any
>> bits in it.
>> I'll fix that (draft patch below) + we should update lspci to cover more of these AER
>> flags as it's a PITA debugging by reading the hex dumps!
>>
>> Bjorn, is there any convention on drivers enabling these 'default' masked
>> AER errors?  Or is expectation that this is policy and userspace should
>> be dealing with it?
>>
>> Dave, if you want to test on QEMU in meantime revert:
>> hw/mem/cxl_type3: Set mask to signal Correct Internal Errors.
> 
> If you want to have some pre holidays fun, it's instructive to
> inject a whole bunch of errors in a row (just paste a bunch of json entries in QMP telnet settions all
> at once - this might happen on single errors, but that's how I'm triggering it) I've had several
> crashes in the recovery path after resets.  E.g.

Hi Jonathan,
Just saw this now. Are you injecting a bunch of UE in a row? How many? 
Do you hit it consistently? I'll try to reproduce. Thanks!

> 
> pcieport 0000:0e:00.0: AER: Downstream Port link has been reset (0)
> cxl_pci 0000:0f:00.0: mem1: restart CXL.mem after slot reset
> cxl_pci 0000:0f:00.0: mem1: error resume successful
> pcieport 0000:0e:00.0: AER: device recovery successful
> pcieport 0000:0c:00.0: AER: Uncorrected (Fatal) error received: 0000:0f:00.0
> cxl_pci 0000:0f:00.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Inaccessible, (Unregistered Agent ID)
> cxl_pci 0000:0f:00.0: mem1: frozen state error detected, disable CXL.mem
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000358
> 
> ....
> 
> [  133.722701] Call trace:
> [  133.723627]  cxl_pmem_ctl+0x74/0x244 [cxl_pmem]
> [  133.724974]  nvdimm_init_nsarea+0xb8/0xdc
> [  133.727026]  nvdimm_probe+0xc0/0x1d0
> [  133.728639]  nvdimm_bus_probe+0x90/0x200
> [  133.730141]  really_probe+0xc8/0x3e0
> [  133.731422]  __driver_probe_device+0x84/0x190
> [  133.732616]  driver_probe_device+0x44/0x120
> [  133.734395]  __device_attach_driver+0xc4/0x160
> [  133.735723]  bus_for_each_drv+0x80/0xe0
> [  133.736732]  __device_attach+0xa4/0x1c4
> [  133.737876]  device_initial_probe+0x1c/0x30
> [  133.740066]  bus_probe_device+0xa4/0xb0
> [  133.742216]  device_add+0x404/0x920
> [  133.743158]  nd_async_device_register+0x20/0x70
> [  133.745009]  async_run_entry_fn+0x3c/0x154
> [  133.746433]  process_one_work+0x200/0x474
> [  133.748023]  worker_thread+0x74/0x43c
> [  133.749899]  kthread+0x110/0x114
> [  133.751393]  ret_from_fork+0x10/0x20
> 
> I did see something the other day that looked similar when manually probing
> drivers in a particular order but haven't chased that one down yet.
> 
> Our test teams love sending bug reports on what happens when you get the following
> N errors all at once.  Often reveal entertainingly complex bugs.
> 
> Jonathan
> 
>>
>> and make this change to qemu:
>>
>>  From b0f37bafcc734fe15022c47a2da7f8326b1593c2 Mon Sep 17 00:00:00 2001
>> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Date: Sat, 17 Dec 2022 17:43:38 +0000
>> Subject: [PATCH] hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register
>>
>> This register in AER should be both writeable and should
>> have a default value with a couple of the errors masked
>> including the Uncorrectable Internal Error used by CXL for
>> it's error reporting.
>>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>>   hw/pci/pcie_aer.c          | 4 ++++
>>   include/hw/pci/pcie_regs.h | 3 +++
>>   2 files changed, 7 insertions(+)
>>
>> diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
>> index 00ef4b5601..935fbc57b5 100644
>> --- a/hw/pci/pcie_aer.c
>> +++ b/hw/pci/pcie_aer.c
>> @@ -122,6 +122,10 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
>>
>>       pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
>>                    PCI_ERR_UNC_SUPPORTED);
>> +    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
>> +                 PCI_ERR_UNC_MASK_DEFAULT);
>> +    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
>> +                 PCI_ERR_UNC_SUPPORTED);
>>
>>       pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
>>                    PCI_ERR_UNC_SEVERITY_DEFAULT);
>> diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
>> index 963dc2e170..6ec4785448 100644
>> --- a/include/hw/pci/pcie_regs.h
>> +++ b/include/hw/pci/pcie_regs.h
>> @@ -155,6 +155,9 @@ typedef enum PCIExpLinkWidth {
>>                                            PCI_ERR_UNC_ATOP_EBLOCKED |    \
>>                                            PCI_ERR_UNC_TLP_PRF_BLOCKED)
>>
>> +#define PCI_ERR_UNC_MASK_DEFAULT        (PCI_ERR_UNC_INTN | \
>> +                                         PCI_ERR_UNC_TLP_PRF_BLOCKED)
>> +
>>   #define PCI_ERR_UNC_SEVERITY_DEFAULT    (PCI_ERR_UNC_DLP |              \
>>                                            PCI_ERR_UNC_SDN |              \
>>                                            PCI_ERR_UNC_FCP |              \
>> --
>> 2.37.2
>>
>> I'm testing the kernel side with:
>>
>>  From a5da67af2d2b4e792de2d88c74a68da3a17d2872 Mon Sep 17 00:00:00 2001
>> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Date: Sat, 17 Dec 2022 17:46:45 +0000
>> Subject: [PATCH] HACK: Enable the AER internal errors
>>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>>   drivers/cxl/pci.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index bfdf92a9aef6..ef7a6ab41eb7 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -778,7 +778,18 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>                  return rc;
>>
>>          if (cxlds->regs.ras) {
>> +               u32 mask;
>> +               printk("trying to unmask\n");
>>                  pci_enable_pcie_error_reporting(pdev);
>> +               pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_MASK, &mask);
>> +               mask &= ~PCI_ERR_COR_INTERNAL;
>> +               printk("unamsking cor\n");
>> +               pci_write_config_word(pdev, pdev->aer_cap + PCI_ERR_COR_MASK, mask);
>> +
>> +               pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_MASK, &mask);
>> +               mask &= ~PCI_ERR_UNC_INTN;
>> +               printk("unmasking uncor\n");
>> +               pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_MASK, mask);
>>                  cxl_pci_ras_unmask(pdev);
>>                  rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
>>                  if (rc)
>> --
>> 2.37.2
>>
>>
>>
>>>
>>> ---
>>>
>>> Based on patch posted by Ira [1] to export CXL native error reporting control.
>>>
>>> [1]: https://lore.kernel.org/linux-cxl/20221212070627.1372402-2-ira.weiny@intel.com/
>>>
>>> v5:
>>> - Add single debug out to show mask changing. (Dan)
>>>
>>> v4:
>>> - Fix masking of RAS register. (Jonathan)
>>>
>>> v3:
>>> - Remove flex bus port status check. (Jonathan)
>>> - Only unmask known mask bits. (Jonathan)
>>>
>>> v2:
>>> - Add definition of PCI_EXP_LNKSTA2_FLIT. (Dan)
>>> - Return error for cxl_pci_ras_unmask(). (Dan)
>>> - Add dev_dbg() for register bits to be cleared. (Dan)
>>> - Check Flex Port DVSEC status. (Dan)
>>> ---
>>>   drivers/cxl/cxl.h             |    1 +
>>>   drivers/cxl/pci.c             |   49 +++++++++++++++++++++++++++++++++++++++++
>>>   include/uapi/linux/pci_regs.h |    1 +
>>>   3 files changed, 51 insertions(+)
>>>
>>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>>> index 1b1cf459ac77..31e795c6d537 100644
>>> --- a/drivers/cxl/cxl.h
>>> +++ b/drivers/cxl/cxl.h
>>> @@ -130,6 +130,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>>>   #define   CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>>>   #define CXL_RAS_UNCORRECTABLE_MASK_OFFSET 0x4
>>>   #define   CXL_RAS_UNCORRECTABLE_MASK_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>>> +#define   CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK BIT(8)
>>>   #define CXL_RAS_UNCORRECTABLE_SEVERITY_OFFSET 0x8
>>>   #define   CXL_RAS_UNCORRECTABLE_SEVERITY_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>>>   #define CXL_RAS_CORRECTABLE_STATUS_OFFSET 0xC
>>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>>> index 33083a522fd1..1e5fc69bcc4d 100644
>>> --- a/drivers/cxl/pci.c
>>> +++ b/drivers/cxl/pci.c
>>> @@ -419,6 +419,54 @@ static void disable_aer(void *pdev)
>>>   	pci_disable_pcie_error_reporting(pdev);
>>>   }
>>>   
>>> +/*
>>> + * CXL v3.0 6.2.3 Table 6-4
>>> + * The table indicates that if PCIe Flit Mode is set, then CXL is in 256B flits
>>> + * mode, otherwise it's 68B flits mode.
>>> + */
>>> +static bool cxl_pci_flit_256(struct pci_dev *pdev)
>>> +{
>>> +	u32 lnksta2;
>>> +
>>> +	pcie_capability_read_dword(pdev, PCI_EXP_LNKSTA2, &lnksta2);
>>> +	return lnksta2 & PCI_EXP_LNKSTA2_FLIT;
>>> +}
>>> +
>>> +static int cxl_pci_ras_unmask(struct pci_dev *pdev)
>>> +{
>>> +	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
>>> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>>> +	void __iomem *addr;
>>> +	u32 orig_val, val, mask;
>>> +
>>> +	if (!cxlds->regs.ras)
>>> +		return -ENODEV;
>>> +
>>> +	/* BIOS has CXL error control */
>>> +	if (!host_bridge->native_cxl_error)
>>> +		return -EOPNOTSUPP;
>>> +
>>> +	addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
>>> +	orig_val = readl(addr);
>>> +
>>> +	mask = CXL_RAS_UNCORRECTABLE_MASK_MASK;
>>> +	if (!cxl_pci_flit_256(pdev))
>>> +		mask &= ~CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
>>> +	val = orig_val & ~mask;
>>> +	writel(val, addr);
>>> +	dev_dbg(&pdev->dev, "Uncorrectable RAS Errors Mask: %#x -> %#x\n",
>>> +		orig_val, val);
>>> +
>>> +	addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET;
>>> +	orig_val = readl(addr);
>>> +	val = orig_val & ~CXL_RAS_CORRECTABLE_MASK_MASK;
>>> +	writel(val, addr);
>>> +	dev_dbg(&pdev->dev, "Correctable RAS Errors Mask: %#x -> %#x\n",
>>> +		orig_val, val);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>   static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>>   {
>>>   	struct cxl_register_map map;
>>> @@ -498,6 +546,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>>   
>>>   	if (cxlds->regs.ras) {
>>>   		pci_enable_pcie_error_reporting(pdev);
>>> +		cxl_pci_ras_unmask(pdev);
>>>   		rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
>>>   		if (rc)
>>>   			return rc;
>>> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
>>> index 82a03ea954af..576ee2ec973f 100644
>>> --- a/include/uapi/linux/pci_regs.h
>>> +++ b/include/uapi/linux/pci_regs.h
>>> @@ -693,6 +693,7 @@
>>>   #define  PCI_EXP_LNKCTL2_TX_MARGIN	0x0380 /* Transmit Margin */
>>>   #define  PCI_EXP_LNKCTL2_HASD		0x0020 /* HW Autonomous Speed Disable */
>>>   #define PCI_EXP_LNKSTA2		0x32	/* Link Status 2 */
>>> +#define  PCI_EXP_LNKSTA2_FLIT		BIT(10) /* Flit Mode Status */
>>>   #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	0x32	/* end of v2 EPs w/ link */
>>>   #define PCI_EXP_SLTCAP2		0x34	/* Slot Capabilities 2 */
>>>   #define  PCI_EXP_SLTCAP2_IBPD	0x00000001 /* In-band PD Disable Supported */
>>>
>>>
>>
>

Bjorn Helgaas Jan. 5, 2023, 4:54 p.m. UTC | #6

On Thu, Jan 05, 2023 at 04:31:27PM +0000, Jonathan Cameron wrote:
> On Thu, 29 Dec 2022 11:27:31 -0600
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Sat, Dec 17, 2022 at 05:52:04PM +0000, Jonathan Cameron wrote:

> > > I realized that adding this patch still only enables error because I
> > > didn't check the PCIe spec when writing the QEMU emulation. I had
> > > changed the value of  "Correctable Internal Error Mask" to default
> > > to unmasked.  PCIe 6.0 says it defaults to masked.  For some reason
> > > I thought these masks were impdef (should have checked ;)  
> > 
> > I assume you refer to the AER "Corrected Internal Error Mask" bit
> > (PCIe r6.0, sec 7.8.4.6), which indeed defaults to 1b (masked) if the
> > bit is implemented.
> 
> Spot on.  I keep confusing the correctable / corrected stuff in PCIe.
> Made more confusing by the CXL stuff layered on top.

Great, it wasn't confusing enough already, so CXL rectified that
problem :)

> > We now have f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
> > native"), which turns on error reporting in Device Control for all
> > devices at enumeration-time when the OS has control of AER.  But this
> > is only the generic device-level control; it doesn't configure any
> > *AER* registers.
> > 
> > I'm surprised to learn that the only writes to PCI_ERR_UNCOR_MASK are
> > some mips and powerpc arch-specific code and a few individual drivers.
> > It seems like maybe pci_aer_init() should do some more configuration
> > of the AER mask and severity registers.
> 
> Sounds good.  Any thoughts on where to get the policy from?
> Feels like an administrator thing rather than a kernel config one
> to me, so maybe pci_aer_init() is too early or we'd benefit from
> a nice easy per device interface to tweak a default?

If we get a solid system-level policy in place and still end up
needing some kind of administrative control, that might be OK.  But we
don't have that solid system policy yet, so I'd like to push on that
before adding admin interfaces.

Bjorn

Jonathan Cameron Jan. 6, 2023, 11:22 a.m. UTC | #7

On Thu, 5 Jan 2023 09:53:54 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> On 12/17/22 11:05 AM, Jonathan Cameron wrote:
> > On Sat, 17 Dec 2022 17:52:04 +0000
> > Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> >   
> >> On Fri, 16 Dec 2022 09:10:02 -0700
> >> Dave Jiang <dave.jiang@intel.com> wrote:
> >>  
> >>> By default the CXL RAS mask registers bits are defaulted to 1's and
> >>> suppress all error reporting. If the kernel has negotiated ownership
> >>> of error handling for CXL then unmask the mask registers by writing 0s.
> >>>
> >>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >>> Signed-off-by: Dave Jiang <dave.jiang@intel.  
> >> +CC Bjorn for a question on the AER control element of this.
> >>
> >> I realized that adding this patch still only enables error because
> >> I didn't check the PCIe spec when writing the QEMU emulation. I had changed
> >> the value of  "Correctable Internal Error Mask" to default to unmasked.
> >> PCIe 6.0 says it defaults to masked.  For some reason I thought these
> >> masks were impdef (should have checked ;)
> >>
> >> I'll drop that change in QEMU, but upshot is that if we want the correctable
> >> ones to be delivered, we'll need to clear that bit as well as the ones you
> >> do in this patch.
> >>
> >> This got me thinking about why I never saw the same issue for UNC
> >> errors.  Upshot, QEMU never sets the UNCOR mask so defaults to everything on
> >> (it also doesn't allow anyone to write the register). Curiously it has
> >> the code that uses the mask even though there was no means to set any
> >> bits in it.
> >> I'll fix that (draft patch below) + we should update lspci to cover more of these AER
> >> flags as it's a PITA debugging by reading the hex dumps!
> >>
> >> Bjorn, is there any convention on drivers enabling these 'default' masked
> >> AER errors?  Or is expectation that this is policy and userspace should
> >> be dealing with it?
> >>
> >> Dave, if you want to test on QEMU in meantime revert:
> >> hw/mem/cxl_type3: Set mask to signal Correct Internal Errors.  
> > 
> > If you want to have some pre holidays fun, it's instructive to
> > inject a whole bunch of errors in a row (just paste a bunch of json entries in QMP telnet settions all
> > at once - this might happen on single errors, but that's how I'm triggering it) I've had several
> > crashes in the recovery path after resets.  E.g.  
> 
> Hi Jonathan,
> Just saw this now. Are you injecting a bunch of UE in a row? How many? 
> Do you hit it consistently? I'll try to reproduce. Thanks!

Err. I should have taken some notes.  IIRC I was indeed just pasting
in a couple of error injection QMP commands at once. 

It may well occur if single errors are injected as well, but take longer.

Jonathan

> 
> > 
> > pcieport 0000:0e:00.0: AER: Downstream Port link has been reset (0)
> > cxl_pci 0000:0f:00.0: mem1: restart CXL.mem after slot reset
> > cxl_pci 0000:0f:00.0: mem1: error resume successful
> > pcieport 0000:0e:00.0: AER: device recovery successful
> > pcieport 0000:0c:00.0: AER: Uncorrected (Fatal) error received: 0000:0f:00.0
> > cxl_pci 0000:0f:00.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Inaccessible, (Unregistered Agent ID)
> > cxl_pci 0000:0f:00.0: mem1: frozen state error detected, disable CXL.mem
> > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000358
> > 
> > ....
> > 
> > [  133.722701] Call trace:
> > [  133.723627]  cxl_pmem_ctl+0x74/0x244 [cxl_pmem]
> > [  133.724974]  nvdimm_init_nsarea+0xb8/0xdc
> > [  133.727026]  nvdimm_probe+0xc0/0x1d0
> > [  133.728639]  nvdimm_bus_probe+0x90/0x200
> > [  133.730141]  really_probe+0xc8/0x3e0
> > [  133.731422]  __driver_probe_device+0x84/0x190
> > [  133.732616]  driver_probe_device+0x44/0x120
> > [  133.734395]  __device_attach_driver+0xc4/0x160
> > [  133.735723]  bus_for_each_drv+0x80/0xe0
> > [  133.736732]  __device_attach+0xa4/0x1c4
> > [  133.737876]  device_initial_probe+0x1c/0x30
> > [  133.740066]  bus_probe_device+0xa4/0xb0
> > [  133.742216]  device_add+0x404/0x920
> > [  133.743158]  nd_async_device_register+0x20/0x70
> > [  133.745009]  async_run_entry_fn+0x3c/0x154
> > [  133.746433]  process_one_work+0x200/0x474
> > [  133.748023]  worker_thread+0x74/0x43c
> > [  133.749899]  kthread+0x110/0x114
> > [  133.751393]  ret_from_fork+0x10/0x20
> > 
> > I did see something the other day that looked similar when manually probing
> > drivers in a particular order but haven't chased that one down yet.
> > 
> > Our test teams love sending bug reports on what happens when you get the following
> > N errors all at once.  Often reveal entertainingly complex bugs.
> > 
> > Jonathan
> >   
> >>
> >> and make this change to qemu:
> >>
> >>  From b0f37bafcc734fe15022c47a2da7f8326b1593c2 Mon Sep 17 00:00:00 2001
> >> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> Date: Sat, 17 Dec 2022 17:43:38 +0000
> >> Subject: [PATCH] hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register
> >>
> >> This register in AER should be both writeable and should
> >> have a default value with a couple of the errors masked
> >> including the Uncorrectable Internal Error used by CXL for
> >> it's error reporting.
> >>
> >> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> ---
> >>   hw/pci/pcie_aer.c          | 4 ++++
> >>   include/hw/pci/pcie_regs.h | 3 +++
> >>   2 files changed, 7 insertions(+)
> >>
> >> diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
> >> index 00ef4b5601..935fbc57b5 100644
> >> --- a/hw/pci/pcie_aer.c
> >> +++ b/hw/pci/pcie_aer.c
> >> @@ -122,6 +122,10 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
> >>
> >>       pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
> >>                    PCI_ERR_UNC_SUPPORTED);
> >> +    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
> >> +                 PCI_ERR_UNC_MASK_DEFAULT);
> >> +    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
> >> +                 PCI_ERR_UNC_SUPPORTED);
> >>
> >>       pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
> >>                    PCI_ERR_UNC_SEVERITY_DEFAULT);
> >> diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> >> index 963dc2e170..6ec4785448 100644
> >> --- a/include/hw/pci/pcie_regs.h
> >> +++ b/include/hw/pci/pcie_regs.h
> >> @@ -155,6 +155,9 @@ typedef enum PCIExpLinkWidth {
> >>                                            PCI_ERR_UNC_ATOP_EBLOCKED |    \
> >>                                            PCI_ERR_UNC_TLP_PRF_BLOCKED)
> >>
> >> +#define PCI_ERR_UNC_MASK_DEFAULT        (PCI_ERR_UNC_INTN | \
> >> +                                         PCI_ERR_UNC_TLP_PRF_BLOCKED)
> >> +
> >>   #define PCI_ERR_UNC_SEVERITY_DEFAULT    (PCI_ERR_UNC_DLP |              \
> >>                                            PCI_ERR_UNC_SDN |              \
> >>                                            PCI_ERR_UNC_FCP |              \
> >> --
> >> 2.37.2
> >>
> >> I'm testing the kernel side with:
> >>
> >>  From a5da67af2d2b4e792de2d88c74a68da3a17d2872 Mon Sep 17 00:00:00 2001
> >> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> Date: Sat, 17 Dec 2022 17:46:45 +0000
> >> Subject: [PATCH] HACK: Enable the AER internal errors
> >>
> >> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> ---
> >>   drivers/cxl/pci.c | 11 +++++++++++
> >>   1 file changed, 11 insertions(+)
> >>
> >> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> >> index bfdf92a9aef6..ef7a6ab41eb7 100644
> >> --- a/drivers/cxl/pci.c
> >> +++ b/drivers/cxl/pci.c
> >> @@ -778,7 +778,18 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >>                  return rc;
> >>
> >>          if (cxlds->regs.ras) {
> >> +               u32 mask;
> >> +               printk("trying to unmask\n");
> >>                  pci_enable_pcie_error_reporting(pdev);
> >> +               pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_MASK, &mask);
> >> +               mask &= ~PCI_ERR_COR_INTERNAL;
> >> +               printk("unamsking cor\n");
> >> +               pci_write_config_word(pdev, pdev->aer_cap + PCI_ERR_COR_MASK, mask);
> >> +
> >> +               pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_MASK, &mask);
> >> +               mask &= ~PCI_ERR_UNC_INTN;
> >> +               printk("unmasking uncor\n");
> >> +               pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_MASK, mask);
> >>                  cxl_pci_ras_unmask(pdev);
> >>                  rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
> >>                  if (rc)
> >> --
> >> 2.37.2
> >>
> >>
> >>  
> >>>
> >>> ---
> >>>
> >>> Based on patch posted by Ira [1] to export CXL native error reporting control.
> >>>
> >>> [1]: https://lore.kernel.org/linux-cxl/20221212070627.1372402-2-ira.weiny@intel.com/
> >>>
> >>> v5:
> >>> - Add single debug out to show mask changing. (Dan)
> >>>
> >>> v4:
> >>> - Fix masking of RAS register. (Jonathan)
> >>>
> >>> v3:
> >>> - Remove flex bus port status check. (Jonathan)
> >>> - Only unmask known mask bits. (Jonathan)
> >>>
> >>> v2:
> >>> - Add definition of PCI_EXP_LNKSTA2_FLIT. (Dan)
> >>> - Return error for cxl_pci_ras_unmask(). (Dan)
> >>> - Add dev_dbg() for register bits to be cleared. (Dan)
> >>> - Check Flex Port DVSEC status. (Dan)
> >>> ---
> >>>   drivers/cxl/cxl.h             |    1 +
> >>>   drivers/cxl/pci.c             |   49 +++++++++++++++++++++++++++++++++++++++++
> >>>   include/uapi/linux/pci_regs.h |    1 +
> >>>   3 files changed, 51 insertions(+)
> >>>
> >>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> >>> index 1b1cf459ac77..31e795c6d537 100644
> >>> --- a/drivers/cxl/cxl.h
> >>> +++ b/drivers/cxl/cxl.h
> >>> @@ -130,6 +130,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
> >>>   #define   CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
> >>>   #define CXL_RAS_UNCORRECTABLE_MASK_OFFSET 0x4
> >>>   #define   CXL_RAS_UNCORRECTABLE_MASK_MASK (GENMASK(16, 14) | GENMASK(11, 0))
> >>> +#define   CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK BIT(8)
> >>>   #define CXL_RAS_UNCORRECTABLE_SEVERITY_OFFSET 0x8
> >>>   #define   CXL_RAS_UNCORRECTABLE_SEVERITY_MASK (GENMASK(16, 14) | GENMASK(11, 0))
> >>>   #define CXL_RAS_CORRECTABLE_STATUS_OFFSET 0xC
> >>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> >>> index 33083a522fd1..1e5fc69bcc4d 100644
> >>> --- a/drivers/cxl/pci.c
> >>> +++ b/drivers/cxl/pci.c
> >>> @@ -419,6 +419,54 @@ static void disable_aer(void *pdev)
> >>>   	pci_disable_pcie_error_reporting(pdev);
> >>>   }
> >>>   
> >>> +/*
> >>> + * CXL v3.0 6.2.3 Table 6-4
> >>> + * The table indicates that if PCIe Flit Mode is set, then CXL is in 256B flits
> >>> + * mode, otherwise it's 68B flits mode.
> >>> + */
> >>> +static bool cxl_pci_flit_256(struct pci_dev *pdev)
> >>> +{
> >>> +	u32 lnksta2;
> >>> +
> >>> +	pcie_capability_read_dword(pdev, PCI_EXP_LNKSTA2, &lnksta2);
> >>> +	return lnksta2 & PCI_EXP_LNKSTA2_FLIT;
> >>> +}
> >>> +
> >>> +static int cxl_pci_ras_unmask(struct pci_dev *pdev)
> >>> +{
> >>> +	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
> >>> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> >>> +	void __iomem *addr;
> >>> +	u32 orig_val, val, mask;
> >>> +
> >>> +	if (!cxlds->regs.ras)
> >>> +		return -ENODEV;
> >>> +
> >>> +	/* BIOS has CXL error control */
> >>> +	if (!host_bridge->native_cxl_error)
> >>> +		return -EOPNOTSUPP;
> >>> +
> >>> +	addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
> >>> +	orig_val = readl(addr);
> >>> +
> >>> +	mask = CXL_RAS_UNCORRECTABLE_MASK_MASK;
> >>> +	if (!cxl_pci_flit_256(pdev))
> >>> +		mask &= ~CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
> >>> +	val = orig_val & ~mask;
> >>> +	writel(val, addr);
> >>> +	dev_dbg(&pdev->dev, "Uncorrectable RAS Errors Mask: %#x -> %#x\n",
> >>> +		orig_val, val);
> >>> +
> >>> +	addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET;
> >>> +	orig_val = readl(addr);
> >>> +	val = orig_val & ~CXL_RAS_CORRECTABLE_MASK_MASK;
> >>> +	writel(val, addr);
> >>> +	dev_dbg(&pdev->dev, "Correctable RAS Errors Mask: %#x -> %#x\n",
> >>> +		orig_val, val);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>>   static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >>>   {
> >>>   	struct cxl_register_map map;
> >>> @@ -498,6 +546,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >>>   
> >>>   	if (cxlds->regs.ras) {
> >>>   		pci_enable_pcie_error_reporting(pdev);
> >>> +		cxl_pci_ras_unmask(pdev);
> >>>   		rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
> >>>   		if (rc)
> >>>   			return rc;
> >>> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> >>> index 82a03ea954af..576ee2ec973f 100644
> >>> --- a/include/uapi/linux/pci_regs.h
> >>> +++ b/include/uapi/linux/pci_regs.h
> >>> @@ -693,6 +693,7 @@
> >>>   #define  PCI_EXP_LNKCTL2_TX_MARGIN	0x0380 /* Transmit Margin */
> >>>   #define  PCI_EXP_LNKCTL2_HASD		0x0020 /* HW Autonomous Speed Disable */
> >>>   #define PCI_EXP_LNKSTA2		0x32	/* Link Status 2 */
> >>> +#define  PCI_EXP_LNKSTA2_FLIT		BIT(10) /* Flit Mode Status */
> >>>   #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	0x32	/* end of v2 EPs w/ link */
> >>>   #define PCI_EXP_SLTCAP2		0x34	/* Slot Capabilities 2 */
> >>>   #define  PCI_EXP_SLTCAP2_IBPD	0x00000001 /* In-band PD Disable Supported */
> >>>
> >>>  
> >>  
> >

Jonathan Cameron Jan. 6, 2023, 11:26 a.m. UTC | #8

On Thu, 5 Jan 2023 10:54:06 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Thu, Jan 05, 2023 at 04:31:27PM +0000, Jonathan Cameron wrote:
> > On Thu, 29 Dec 2022 11:27:31 -0600
> > Bjorn Helgaas <helgaas@kernel.org> wrote:  
> > > On Sat, Dec 17, 2022 at 05:52:04PM +0000, Jonathan Cameron wrote:  
> 
> > > > I realized that adding this patch still only enables error because I
> > > > didn't check the PCIe spec when writing the QEMU emulation. I had
> > > > changed the value of  "Correctable Internal Error Mask" to default
> > > > to unmasked.  PCIe 6.0 says it defaults to masked.  For some reason
> > > > I thought these masks were impdef (should have checked ;)    
> > > 
> > > I assume you refer to the AER "Corrected Internal Error Mask" bit
> > > (PCIe r6.0, sec 7.8.4.6), which indeed defaults to 1b (masked) if the
> > > bit is implemented.  
> > 
> > Spot on.  I keep confusing the correctable / corrected stuff in PCIe.
> > Made more confusing by the CXL stuff layered on top.  
> 
> Great, it wasn't confusing enough already, so CXL rectified that
> problem :)
> 
> > > We now have f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
> > > native"), which turns on error reporting in Device Control for all
> > > devices at enumeration-time when the OS has control of AER.  But this
> > > is only the generic device-level control; it doesn't configure any
> > > *AER* registers.
> > > 
> > > I'm surprised to learn that the only writes to PCI_ERR_UNCOR_MASK are
> > > some mips and powerpc arch-specific code and a few individual drivers.
> > > It seems like maybe pci_aer_init() should do some more configuration
> > > of the AER mask and severity registers.  
> > 
> > Sounds good.  Any thoughts on where to get the policy from?
> > Feels like an administrator thing rather than a kernel config one
> > to me, so maybe pci_aer_init() is too early or we'd benefit from
> > a nice easy per device interface to tweak a default?  
> 
> If we get a solid system-level policy in place and still end up
> needing some kind of administrative control, that might be OK.  But we
> don't have that solid system policy yet, so I'd like to push on that
> before adding admin interfaces.

I guess next step is a straw man proposal for people to test / shoot at.
I'll send out an RFC that enables the lot as defined in PCI r6.0
as that will be most useful for identifying quirks we need to handle.

My assumption being people will push back on some of them / we'll need
to quirk others.

Jonathan

> 
> Bjorn

Ira Weiny Feb. 10, 2023, 8:07 a.m. UTC | #9

Jonathan Cameron wrote:
> On Thu, 5 Jan 2023 10:54:06 -0600
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> > On Thu, Jan 05, 2023 at 04:31:27PM +0000, Jonathan Cameron wrote:
> > > On Thu, 29 Dec 2022 11:27:31 -0600
> > > Bjorn Helgaas <helgaas@kernel.org> wrote:  
> > > > On Sat, Dec 17, 2022 at 05:52:04PM +0000, Jonathan Cameron wrote:  
> > 
> > > > > I realized that adding this patch still only enables error because I
> > > > > didn't check the PCIe spec when writing the QEMU emulation. I had
> > > > > changed the value of  "Correctable Internal Error Mask" to default
> > > > > to unmasked.  PCIe 6.0 says it defaults to masked.  For some reason
> > > > > I thought these masks were impdef (should have checked ;)    
> > > > 
> > > > I assume you refer to the AER "Corrected Internal Error Mask" bit
> > > > (PCIe r6.0, sec 7.8.4.6), which indeed defaults to 1b (masked) if the
> > > > bit is implemented.  
> > > 
> > > Spot on.  I keep confusing the correctable / corrected stuff in PCIe.
> > > Made more confusing by the CXL stuff layered on top.  
> > 
> > Great, it wasn't confusing enough already, so CXL rectified that
> > problem :)
> > 
> > > > We now have f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
> > > > native"), which turns on error reporting in Device Control for all
> > > > devices at enumeration-time when the OS has control of AER.  But this
> > > > is only the generic device-level control; it doesn't configure any
> > > > *AER* registers.
> > > > 
> > > > I'm surprised to learn that the only writes to PCI_ERR_UNCOR_MASK are
> > > > some mips and powerpc arch-specific code and a few individual drivers.
> > > > It seems like maybe pci_aer_init() should do some more configuration
> > > > of the AER mask and severity registers.  
> > > 
> > > Sounds good.  Any thoughts on where to get the policy from?
> > > Feels like an administrator thing rather than a kernel config one
> > > to me, so maybe pci_aer_init() is too early or we'd benefit from
> > > a nice easy per device interface to tweak a default?  
> > 
> > If we get a solid system-level policy in place and still end up
> > needing some kind of administrative control, that might be OK.  But we
> > don't have that solid system policy yet, so I'd like to push on that
> > before adding admin interfaces.
> 
> I guess next step is a straw man proposal for people to test / shoot at.
> I'll send out an RFC that enables the lot as defined in PCI r6.0
> as that will be most useful for identifying quirks we need to handle.
> 
> My assumption being people will push back on some of them / we'll need
> to quirk others.

Jonathan,

Did you send something along these lines?  I was testing the AER trace
point output while cleaning up the trace points a bit.[1]  And I ran
across the need to set up these PCIe registers.  Because I did not see
anything from you I was thinking of taking a crack at fixing this.

To me, it is pretty straight forward that if a driver enables error
reporting then it would want to include internal errors both correctable
and uncorrectable.  This allows for CXL to further control the CXL
registers as an overlay.  I'm not so sure about the other PCIe AER errors
which are default masked.

PCI v6.0 defaults to a fatal uncorrectable error.  I think it is probably
a good default to make these non-fatal when enabling the _reporting_ of
the errors.  If a driver wants them to be fatal then another call could be
added later.  (or the driver should be doing that on their own now.)

If I understand this thread, the CXL spec, the PCI spec, and all the code; it
seems like unmasking the internal correctable and uncorrectable errors in
pci_enable_pcie_error_reporting() are a good system default.  With the
addition of making uncorrectable non-fatal.[2]

That said, there are approximately 51 drivers which enable error reporting
with this call.  I'm a bit concerned with such a global change.  But
perhaps it seems ok if it is only enabling then internal errors as
non-fatal.  So at worse this would simply spam logs on bad devices?

Ira

[1] https://lore.kernel.org/all/20230208-cxl-event-names-v1-0-73f0ff3a3870@intel.com/
[2]

commit 30e6a0bf308e7b8c68b4da33b505fa967f6bbf34 (HEAD -> cxl-pci-aer)
Author: Ira Weiny <ira.weiny@intel.com>
Date:   Thu Feb 9 22:26:05 2023 -0800

    PCI/AER: Enable internal AER errors by default
    
    The CXL driver expects internal error reporting to be enabled via
    pci_enable_pcie_error_reporting().  It is likely other drivers expect
    the same thing.
    
    PCIe v6.0 Uncorrectable Mask Register (7.8.4.3) and Correctable Mask
    Register (7.8.4.6) default to masking internal errors.  The
    Uncorrectable Error Severity Register (7.8.4.4) defaults internal errors
    as fatal.
    
    Change pci_enable_pcie_error_reporting() to enable both types of
    internal errors.  Ensure uncorrectable errors are set non-fatal to limit
    any impact to other drivers.
    
    Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
    Cc: Bjorn Helgaas <helgaas@kernel.org>
    Cc: Dave Jiang <dave.jiang@intel.com>
    Cc: <linux-cxl@vger.kernel.org>,
    Cc: <dan.j.williams@intel.com>
    Cc: Stefan Roese <sr@denx.de>
    Cc: "Kuppuswamy Sathyanarayanan" <sathyanarayanan.kuppuswamy@linux.intel.com>
    Signed-off-by: Ira Weiny <ira.weiny@intel.com>
    
    ---
    For all drivers other than CXL this is expected to at worse increase the
    error reporting verbosity.  Because the errors are set to non-fatal by
    default this should not adversely affect the operation of those devices.

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 625f7b2cafe4..9d3ed3a5fc23 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -229,11 +229,28 @@ int pcie_aer_is_native(struct pci_dev *dev)
 
 int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
+       int pos_cap_err;
+       u32 reg;
        int rc;
 
        if (!pcie_aer_is_native(dev))
                return -EIO;
 
+       pos_cap_err = dev->aer_cap;
+
+       /* Unmask correctable and uncorrectable (non-fatal) internal errors */
+       pci_read_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, &reg);
+       reg &= ~PCI_ERR_COR_INTERNAL;
+       pci_write_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, reg);
+
+       pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, &reg);
+       reg &= ~PCI_ERR_UNC_INTN;
+       pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, reg);
+
+       pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, &reg);
+       reg &= ~PCI_ERR_UNC_INTN;
+       pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, reg);
+
        rc = pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
        return pcibios_err_to_errno(rc);
 }

Jonathan Cameron Feb. 10, 2023, 12:29 p.m. UTC | #10

On Fri, 10 Feb 2023 00:07:47 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> Jonathan Cameron wrote:
> > On Thu, 5 Jan 2023 10:54:06 -0600
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> >   
> > > On Thu, Jan 05, 2023 at 04:31:27PM +0000, Jonathan Cameron wrote:  
> > > > On Thu, 29 Dec 2022 11:27:31 -0600
> > > > Bjorn Helgaas <helgaas@kernel.org> wrote:    
> > > > > On Sat, Dec 17, 2022 at 05:52:04PM +0000, Jonathan Cameron wrote:    
> > >   
> > > > > > I realized that adding this patch still only enables error because I
> > > > > > didn't check the PCIe spec when writing the QEMU emulation. I had
> > > > > > changed the value of  "Correctable Internal Error Mask" to default
> > > > > > to unmasked.  PCIe 6.0 says it defaults to masked.  For some reason
> > > > > > I thought these masks were impdef (should have checked ;)      
> > > > > 
> > > > > I assume you refer to the AER "Corrected Internal Error Mask" bit
> > > > > (PCIe r6.0, sec 7.8.4.6), which indeed defaults to 1b (masked) if the
> > > > > bit is implemented.    
> > > > 
> > > > Spot on.  I keep confusing the correctable / corrected stuff in PCIe.
> > > > Made more confusing by the CXL stuff layered on top.    
> > > 
> > > Great, it wasn't confusing enough already, so CXL rectified that
> > > problem :)
> > >   
> > > > > We now have f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
> > > > > native"), which turns on error reporting in Device Control for all
> > > > > devices at enumeration-time when the OS has control of AER.  But this
> > > > > is only the generic device-level control; it doesn't configure any
> > > > > *AER* registers.
> > > > > 
> > > > > I'm surprised to learn that the only writes to PCI_ERR_UNCOR_MASK are
> > > > > some mips and powerpc arch-specific code and a few individual drivers.
> > > > > It seems like maybe pci_aer_init() should do some more configuration
> > > > > of the AER mask and severity registers.    
> > > > 
> > > > Sounds good.  Any thoughts on where to get the policy from?
> > > > Feels like an administrator thing rather than a kernel config one
> > > > to me, so maybe pci_aer_init() is too early or we'd benefit from
> > > > a nice easy per device interface to tweak a default?    
> > > 
> > > If we get a solid system-level policy in place and still end up
> > > needing some kind of administrative control, that might be OK.  But we
> > > don't have that solid system policy yet, so I'd like to push on that
> > > before adding admin interfaces.  
> > 
> > I guess next step is a straw man proposal for people to test / shoot at.
> > I'll send out an RFC that enables the lot as defined in PCI r6.0
> > as that will be most useful for identifying quirks we need to handle.
> > 
> > My assumption being people will push back on some of them / we'll need
> > to quirk others.  
> 
> Jonathan,
> 
> Did you send something along these lines? 

Oops. I forgot about it :(

> I was testing the AER trace
> point output while cleaning up the trace points a bit.[1]  And I ran
> across the need to set up these PCIe registers.  Because I did not see
> anything from you I was thinking of taking a crack at fixing this.

Great. I can forget about it again ;)

> 
> To me, it is pretty straight forward that if a driver enables error
> reporting then it would want to include internal errors both correctable
> and uncorrectable.  This allows for CXL to further control the CXL
> registers as an overlay.  I'm not so sure about the other PCIe AER errors
> which are default masked.

I think the concern may be around devices that are rather over chatty wrt
things that are fairly trivial.

> 
> PCI v6.0 defaults to a fatal uncorrectable error.  I think it is probably
> a good default to make these non-fatal when enabling the _reporting_ of
> the errors.  If a driver wants them to be fatal then another call could be
> added later.  (or the driver should be doing that on their own now.)

Interesting point.  I'm not sure what to do about severity.  Definitely a
question for an RFC posting.

> 
> If I understand this thread, the CXL spec, the PCI spec, and all the code; it
> seems like unmasking the internal correctable and uncorrectable errors in
> pci_enable_pcie_error_reporting() are a good system default.  With the
> addition of making uncorrectable non-fatal.[2]
> 
> That said, there are approximately 51 drivers which enable error reporting
> with this call.  I'm a bit concerned with such a global change.  But
> perhaps it seems ok if it is only enabling then internal errors as
> non-fatal.  So at worse this would simply spam logs on bad devices?

Yes, this isn't risk free, but we probably want device drivers for problem
devices to opt out of this with the defaults as you describe.
I'm sure we'll get a bunch of reports of new log messages appearing but if
they are false positives (odd things happen during device init perhaps) then
the drivers in question ought to work around that locally.

> 
> Ira
> 
> [1] https://lore.kernel.org/all/20230208-cxl-event-names-v1-0-73f0ff3a3870@intel.com/
> [2]
> 
> commit 30e6a0bf308e7b8c68b4da33b505fa967f6bbf34 (HEAD -> cxl-pci-aer)
> Author: Ira Weiny <ira.weiny@intel.com>
> Date:   Thu Feb 9 22:26:05 2023 -0800
> 
>     PCI/AER: Enable internal AER errors by default
>     
>     The CXL driver expects internal error reporting to be enabled via
>     pci_enable_pcie_error_reporting().  It is likely other drivers expect
>     the same thing.
>     
>     PCIe v6.0 Uncorrectable Mask Register (7.8.4.3) and Correctable Mask
>     Register (7.8.4.6) default to masking internal errors.  The
>     Uncorrectable Error Severity Register (7.8.4.4) defaults internal errors
>     as fatal.
>     
>     Change pci_enable_pcie_error_reporting() to enable both types of
>     internal errors.  Ensure uncorrectable errors are set non-fatal to limit
>     any impact to other drivers.
>     
>     Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
>     Cc: Bjorn Helgaas <helgaas@kernel.org>
>     Cc: Dave Jiang <dave.jiang@intel.com>
>     Cc: <linux-cxl@vger.kernel.org>,
>     Cc: <dan.j.williams@intel.com>
>     Cc: Stefan Roese <sr@denx.de>
>     Cc: "Kuppuswamy Sathyanarayanan" <sathyanarayanan.kuppuswamy@linux.intel.com>
>     Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>     
>     ---
>     For all drivers other than CXL this is expected to at worse increase the
>     error reporting verbosity.  Because the errors are set to non-fatal by
>     default this should not adversely affect the operation of those devices.
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 625f7b2cafe4..9d3ed3a5fc23 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -229,11 +229,28 @@ int pcie_aer_is_native(struct pci_dev *dev)
>  
>  int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>  {
> +       int pos_cap_err;
> +       u32 reg;
>         int rc;
>  
>         if (!pcie_aer_is_native(dev))
>                 return -EIO;
>  
> +       pos_cap_err = dev->aer_cap;
> +
> +       /* Unmask correctable and uncorrectable (non-fatal) internal errors */
> +       pci_read_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, &reg);
> +       reg &= ~PCI_ERR_COR_INTERNAL;
> +       pci_write_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, reg);
> +
> +       pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, &reg);
> +       reg &= ~PCI_ERR_UNC_INTN;
> +       pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, reg);
> +
> +       pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, &reg);
> +       reg &= ~PCI_ERR_UNC_INTN;
> +       pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, reg);
> +
>         rc = pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
>         return pcibios_err_to_errno(rc);
>  }

[v5] cxl: add RAS status unmasking for CXL

Commit Message

Comments

Patch