[8/8] cxl/pci: Add (hopeful) error handling support

Message ID	164740406489.3912056.8334546166826246693.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State	Handled Elsewhere
Headers	show Return-Path: <linux-pci-owner@kernel.org> Subject: [PATCH 8/8] cxl/pci: Add (hopeful) error handling support From: Dan Williams <dan.j.williams@intel.com> To: linux-cxl@vger.kernel.org Cc: ben.widawsky@intel.com, vishal.l.verma@intel.com, alison.schofield@intel.com, Jonathan.Cameron@huawei.com, ira.weiny@intel.com, linux-pci@vger.kernel.org Date: Tue, 15 Mar 2022 21:14:24 -0700 Message-ID: <164740406489.3912056.8334546166826246693.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <164740402242.3912056.8303625392871313860.stgit@dwillia2-desk3.amr.corp.intel.com> References: <164740402242.3912056.8303625392871313860.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Precedence: bulk
Series	cxl/pci: Add fundamental error handling \| expand [0/8] cxl/pci: Add fundamental error handling [1/8] cxl/pci: Cleanup repeated code in cxl_probe_regs() helpers [2/8] cxl/pci: Cleanup cxl_map_device_regs() [3/8] cxl/pci: Kill cxl_map_regs() [4/8] cxl/core/regs: Make cxl_map_{component, device}_regs() device generic [5/8] cxl/port: Limit the port driver to just the HDM Decoder Capability [6/8] cxl/pci: Prepare for mapping RAS Capability Structure [7/8] cxl/pci: Find and map the RAS Capability Structure [8/8] cxl/pci: Add (hopeful) error handling support

Message ID

164740406489.3912056.8334546166826246693.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)

State

Handled Elsewhere

Headers

Subject: [PATCH 8/8] cxl/pci: Add (hopeful) error handling support
From: Dan Williams <dan.j.williams@intel.com>
To: linux-cxl@vger.kernel.org
Cc: ben.widawsky@intel.com, vishal.l.verma@intel.com,
        alison.schofield@intel.com, Jonathan.Cameron@huawei.com,
        ira.weiny@intel.com, linux-pci@vger.kernel.org
Date: Tue, 15 Mar 2022 21:14:24 -0700
Message-ID: 
 <164740406489.3912056.8334546166826246693.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: 
 <164740402242.3912056.8303625392871313860.stgit@dwillia2-desk3.amr.corp.intel.com>
References: 
 <164740402242.3912056.8303625392871313860.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-3-g996c
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Precedence: bulk

Series

cxl/pci: Add fundamental error handling | expand

Commit Message

Dan Williams March 16, 2022, 4:14 a.m. UTC

Add nominal error handling that tears down CXL.mem in response to error
notifications that imply a device reset. Given some CXL.mem may be
operating as System RAM, there is a high likelihood that these error
events are fatal. However, if the system survives the notification the
expectation is that the driver behavior is equivalent to a hot-unplug
and re-plug of an endpoint.

Note that this does not change the mask values from the default. That
awaits CXL _OSC support to determine whether platform firmware is in
control of the mask registers.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/memdev.c |    1 
 drivers/cxl/cxlmem.h      |    2 +
 drivers/cxl/pci.c         |  109 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 112 insertions(+)

Comments

Jonathan Cameron March 17, 2022, 3:16 p.m. UTC | #1

On Tue, 15 Mar 2022 21:14:24 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Add nominal error handling that tears down CXL.mem in response to error
> notifications that imply a device reset. Given some CXL.mem may be
> operating as System RAM, there is a high likelihood that these error
> events are fatal. However, if the system survives the notification the
> expectation is that the driver behavior is equivalent to a hot-unplug
> and re-plug of an endpoint.
> 
> Note that this does not change the mask values from the default. That
> awaits CXL _OSC support to determine whether platform firmware is in
> control of the mask registers.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

I'm far from an expert in PCI error handling...  I've asked one of our RAS
folk to take a quick look so he may well reply as well.

With that in mind...

> ---
>  drivers/cxl/core/memdev.c |    1 
>  drivers/cxl/cxlmem.h      |    2 +
>  drivers/cxl/pci.c         |  109 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 112 insertions(+)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 1f76b28f9826..223d512790e1 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -341,6 +341,7 @@ struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
>  	 * needed as this is ordered with cdev_add() publishing the device.
>  	 */
>  	cxlmd->cxlds = cxlds;
> +	cxlds->cxlmd = cxlmd;
>  
>  	cdev = &cxlmd->cdev;
>  	rc = cdev_device_add(cdev, dev);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 5d33ce24fe09..f58e16951414 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -117,6 +117,7 @@ struct cxl_endpoint_dvsec_info {
>   * Currently only memory devices are represented.
>   *
>   * @dev: The device associated with this CXL state
> + * @cxlmd: The device representing the CXL.mem capabilities of @dev
>   * @regs: Parsed register blocks
>   * @cxl_dvsec: Offset to the PCIe device DVSEC
>   * @payload_size: Size of space for payload
> @@ -148,6 +149,7 @@ struct cxl_endpoint_dvsec_info {
>   */
>  struct cxl_dev_state {
>  	struct device *dev;
> +	struct cxl_memdev *cxlmd;

This is only used I think to access the cxlmd->dev. Perhaps better to pass
the dev and avoid having a reference to the memdev in here?

>  
>  	struct cxl_regs regs;
>  	int cxl_dvsec;
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index bde8929450f0..823cbfa093fa 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -8,6 +8,7 @@
>  #include <linux/mutex.h>
>  #include <linux/list.h>
>  #include <linux/pci.h>
> +#include <linux/aer.h>
>  #include <linux/io.h>
>  #include "cxlmem.h"
>  #include "cxlpci.h"
> @@ -533,6 +534,11 @@ static void cxl_dvsec_ranges(struct cxl_dev_state *cxlds)
>  	info->ranges = __cxl_dvsec_ranges(cxlds, info);
>  }
>  
> +static void disable_aer(void *pdev)
> +{
> +	pci_disable_pcie_error_reporting(pdev);
> +}
> +
>  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
>  	struct cxl_register_map map;
> @@ -554,6 +560,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	cxlds = cxl_dev_state_create(&pdev->dev);
>  	if (IS_ERR(cxlds))
>  		return PTR_ERR(cxlds);
> +	pci_set_drvdata(pdev, cxlds);
>  
>  	cxlds->serial = pci_get_dsn(pdev);
>  	cxlds->cxl_dvsec = pci_find_dvsec_capability(
> @@ -610,6 +617,14 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);
>  
> +	if (cxlds->regs.ras) {
> +		pci_enable_pcie_error_reporting(pdev);
> +		rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
> +		if (rc)
> +			return rc;
> +	}
> +	pci_save_state(pdev);
> +
>  	if (range_len(&cxlds->pmem_range) && IS_ENABLED(CONFIG_CXL_PMEM))
>  		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
>  
> @@ -623,10 +638,104 @@ static const struct pci_device_id cxl_mem_pci_tbl[] = {
>  };
>  MODULE_DEVICE_TABLE(pci, cxl_mem_pci_tbl);
>  
> +/*
> + * Log the state of the RAS status registers and prepare them to log the
> + * next error status.
> + */
> +static void cxl_report_and_clear(struct cxl_dev_state *cxlds)
> +{
> +	struct cxl_memdev *cxlmd = cxlds->cxlmd;
> +	struct device *dev = &cxlmd->dev;
> +	void __iomem *addr;
> +	u32 status;
> +
> +	if (!cxlds->regs.ras)
> +		return;
> +
> +	addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
> +	status = readl(addr);
> +	if (status & CXL_RAS_UNCORRECTABLE_STATUS_MASK) {
> +		dev_warn(cxlds->dev, "%s: uncorrectable status: %#08x\n",
> +			 dev_name(dev), status);
> +		writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
> +	}
> +
> +	addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
> +	status = readl(addr);
> +	if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
> +		dev_warn(cxlds->dev, "%s: correctable status: %#08x\n",
> +			 dev_name(dev), status);
> +		writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> +	}
> +}
> +
> +static pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> +					   pci_channel_state_t state)
> +{
> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +	struct cxl_memdev *cxlmd = cxlds->cxlmd;
> +	struct device *dev = &cxlmd->dev;

Perhaps a more informative name than dev given we have several devs
involved here?

> +
> +	/*
> +	 * A frozen channel indicates an impending reset which is fatal to
> +	 * CXL.mem operation, and will likely crash the system. On the off
> +	 * chance the situation is recoverable dump the status of the RAS
> +	 * capability registers and bounce the active state of the memdev.
> +	 */
> +	cxl_report_and_clear(cxlds);
> +
> +	switch (state) {
> +	case pci_channel_io_normal:
> +		return PCI_ERS_RESULT_CAN_RECOVER;
> +	case pci_channel_io_frozen:
> +		dev_warn(&pdev->dev,
> +			 "%s: frozen state error detected, disable CXL.mem\n",
> +			 dev_name(dev));
> +		device_release_driver(dev);
> +		return PCI_ERS_RESULT_NEED_RESET;
> +	case pci_channel_io_perm_failure:
> +		dev_warn(&pdev->dev,
> +			 "failure state error detected, request disconnect\n");
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +static pci_ers_result_t cxl_slot_reset(struct pci_dev *pdev)
> +{
> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +	struct cxl_memdev *cxlmd = cxlds->cxlmd;
> +	struct device *dev = &cxlmd->dev;
> +
> +	dev_info(&pdev->dev, "%s: restart CXL.mem after slot reset\n",
> +		 dev_name(dev));
> +	pci_restore_state(pdev);
> +	if (device_attach(dev) <= 0)
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	return PCI_ERS_RESULT_RECOVERED;
> +}
> +
> +static void cxl_error_resume(struct pci_dev *pdev)
> +{
> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +	struct cxl_memdev *cxlmd = cxlds->cxlmd;
> +	struct device *dev = &cxlmd->dev;
> +
> +	dev_info(&pdev->dev, "%s: error resume %s\n", dev_name(dev),
> +		 dev->driver ? "successful" : "failed");
> +}
> +
> +static const struct pci_error_handlers cxl_error_handlers = {
> +	.error_detected	= cxl_error_detected,
> +	.slot_reset	= cxl_slot_reset,
> +	.resume		= cxl_error_resume,
> +};
> +
>  static struct pci_driver cxl_pci_driver = {
>  	.name			= KBUILD_MODNAME,
>  	.id_table		= cxl_mem_pci_tbl,
>  	.probe			= cxl_pci_probe,
> +	.err_handler		= &cxl_error_handlers,
>  	.driver	= {
>  		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
>  	},
>

Shiju Jose March 18, 2022, 9:41 a.m. UTC | #2

Hi Dan,

>-----Original Message-----
>From: Dan Williams <dan.j.williams@intel.com>
>Sent: 16 March 2022 04:14
>To: linux-cxl@vger.kernel.org
>Cc: ben.widawsky@intel.com; vishal.l.verma@intel.com;
>alison.schofield@intel.com; Jonathan Cameron
><jonathan.cameron@huawei.com>; ira.weiny@intel.com; linux-
>pci@vger.kernel.org
>Subject: [PATCH 8/8] cxl/pci: Add (hopeful) error handling support
>
>Add nominal error handling that tears down CXL.mem in response to error
>notifications that imply a device reset. Given some CXL.mem may be
>operating as System RAM, there is a high likelihood that these error events
>are fatal. However, if the system survives the notification the expectation is
>that the driver behavior is equivalent to a hot-unplug and re-plug of an
>endpoint.
>
>Note that this does not change the mask values from the default. That awaits
>CXL _OSC support to determine whether platform firmware is in control of the
>mask registers.
>
>Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>---
> drivers/cxl/core/memdev.c |    1
> drivers/cxl/cxlmem.h      |    2 +
> drivers/cxl/pci.c         |  109
>+++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 112 insertions(+)
>
>diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index
>1f76b28f9826..223d512790e1 100644
>--- a/drivers/cxl/core/memdev.c
>+++ b/drivers/cxl/core/memdev.c
>@@ -341,6 +341,7 @@ struct cxl_memdev *devm_cxl_add_memdev(struct
>cxl_dev_state *cxlds)
> 	 * needed as this is ordered with cdev_add() publishing the device.
> 	 */
> 	cxlmd->cxlds = cxlds;
>+	cxlds->cxlmd = cxlmd;
>
> 	cdev = &cxlmd->cdev;
> 	rc = cdev_device_add(cdev, dev);
>diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index
>5d33ce24fe09..f58e16951414 100644
>--- a/drivers/cxl/cxlmem.h
>+++ b/drivers/cxl/cxlmem.h
>@@ -117,6 +117,7 @@ struct cxl_endpoint_dvsec_info {
>  * Currently only memory devices are represented.
>  *
>  * @dev: The device associated with this CXL state
>+ * @cxlmd: The device representing the CXL.mem capabilities of @dev
>  * @regs: Parsed register blocks
>  * @cxl_dvsec: Offset to the PCIe device DVSEC
>  * @payload_size: Size of space for payload @@ -148,6 +149,7 @@ struct
>cxl_endpoint_dvsec_info {
>  */
> struct cxl_dev_state {
> 	struct device *dev;
>+	struct cxl_memdev *cxlmd;
>
> 	struct cxl_regs regs;
> 	int cxl_dvsec;
>diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index
>bde8929450f0..823cbfa093fa 100644
>--- a/drivers/cxl/pci.c
>+++ b/drivers/cxl/pci.c
>@@ -8,6 +8,7 @@
> #include <linux/mutex.h>
> #include <linux/list.h>
> #include <linux/pci.h>
>+#include <linux/aer.h>
> #include <linux/io.h>
> #include "cxlmem.h"
> #include "cxlpci.h"
>@@ -533,6 +534,11 @@ static void cxl_dvsec_ranges(struct cxl_dev_state
>*cxlds)
> 	info->ranges = __cxl_dvsec_ranges(cxlds, info);  }
>
>+static void disable_aer(void *pdev)
>+{
>+	pci_disable_pcie_error_reporting(pdev);
>+}
>+
> static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>{
> 	struct cxl_register_map map;
>@@ -554,6 +560,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const
>struct pci_device_id *id)
> 	cxlds = cxl_dev_state_create(&pdev->dev);
> 	if (IS_ERR(cxlds))
> 		return PTR_ERR(cxlds);
>+	pci_set_drvdata(pdev, cxlds);
>
> 	cxlds->serial = pci_get_dsn(pdev);
> 	cxlds->cxl_dvsec = pci_find_dvsec_capability( @@ -610,6 +617,14 @@
>static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> 	if (IS_ERR(cxlmd))
> 		return PTR_ERR(cxlmd);
>
>+	if (cxlds->regs.ras) {
>+		pci_enable_pcie_error_reporting(pdev);
>+		rc = devm_add_action_or_reset(&pdev->dev, disable_aer,
>pdev);
>+		if (rc)
>+			return rc;
>+	}
>+	pci_save_state(pdev);
>+
> 	if (range_len(&cxlds->pmem_range) &&
>IS_ENABLED(CONFIG_CXL_PMEM))
> 		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
>
>@@ -623,10 +638,104 @@ static const struct pci_device_id cxl_mem_pci_tbl[]
>= {  };  MODULE_DEVICE_TABLE(pci, cxl_mem_pci_tbl);
>
>+/*
>+ * Log the state of the RAS status registers and prepare them to log
>+the
>+ * next error status.
>+ */
>+static void cxl_report_and_clear(struct cxl_dev_state *cxlds) {
>+	struct cxl_memdev *cxlmd = cxlds->cxlmd;
>+	struct device *dev = &cxlmd->dev;
>+	void __iomem *addr;
>+	u32 status;
>+
>+	if (!cxlds->regs.ras)
>+		return;
In the cxl_error_detected () may need to return PCI_ERS_RESULT_NONE
for the following cases, if exist,
1. if (!cxlds->regs.ras),
2. if any errors would be reported during the dev initialization. 

>+
>+	addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
>+	status = readl(addr);
>+	if (status & CXL_RAS_UNCORRECTABLE_STATUS_MASK) {
>+		dev_warn(cxlds->dev, "%s: uncorrectable status: %#08x\n",
>+			 dev_name(dev), status);
>+		writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK,
>addr);
>+	}
For the uncorrectable non-fatal errors, if any, may need to return PCI_ERS_RESULT_NEED_RESET
to trigger the slot reset to prevent more serious issues later. For this case the state would be
"pci_channel_io_normal".
 
>+
>+	addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
>+	status = readl(addr);
>+	if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
>+		dev_warn(cxlds->dev, "%s: correctable status: %#08x\n",
>+			 dev_name(dev), status);
>+		writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK,
>addr);
>+	}
>+}
>+
>+static pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>+					   pci_channel_state_t state)
>+{
>+	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>+	struct cxl_memdev *cxlmd = cxlds->cxlmd;
>+	struct device *dev = &cxlmd->dev;
>+
>+	/*
>+	 * A frozen channel indicates an impending reset which is fatal to
>+	 * CXL.mem operation, and will likely crash the system. On the off
>+	 * chance the situation is recoverable dump the status of the RAS
>+	 * capability registers and bounce the active state of the memdev.
>+	 */
>+	cxl_report_and_clear(cxlds);
>+
>+	switch (state) {
>+	case pci_channel_io_normal:
>+		return PCI_ERS_RESULT_CAN_RECOVER;
>+	case pci_channel_io_frozen:
>+		dev_warn(&pdev->dev,
>+			 "%s: frozen state error detected, disable
>CXL.mem\n",
>+			 dev_name(dev));
>+		device_release_driver(dev);
>+		return PCI_ERS_RESULT_NEED_RESET;
>+	case pci_channel_io_perm_failure:
>+		dev_warn(&pdev->dev,
>+			 "failure state error detected, request disconnect\n");
>+		return PCI_ERS_RESULT_DISCONNECT;
>+	}
>+	return PCI_ERS_RESULT_NEED_RESET;
>+}
>+
>+static pci_ers_result_t cxl_slot_reset(struct pci_dev *pdev) {
>+	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>+	struct cxl_memdev *cxlmd = cxlds->cxlmd;
>+	struct device *dev = &cxlmd->dev;
>+
>+	dev_info(&pdev->dev, "%s: restart CXL.mem after slot reset\n",
>+		 dev_name(dev));
>+	pci_restore_state(pdev);
1. Do we need to call pci_save_state(pdev) here after the reset? though pci_save_state(pdev) is being invoked in the  
cxl_pci_probe().

>+	if (device_attach(dev) <= 0)
>+		return PCI_ERS_RESULT_DISCONNECT;
My understanding is that pci_disable_pcie_error_reporting(pdev) would be called
in the disable_aer () in the reset, 
pci_enable_pcie_error_reporting(pdev) may need to call here after the reset?
 
>+	return PCI_ERS_RESULT_RECOVERED;
>+}
>+
>+static void cxl_error_resume(struct pci_dev *pdev) {
>+	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>+	struct cxl_memdev *cxlmd = cxlds->cxlmd;
>+	struct device *dev = &cxlmd->dev;
>+
>+	dev_info(&pdev->dev, "%s: error resume %s\n", dev_name(dev),
>+		 dev->driver ? "successful" : "failed"); }
>+
>+static const struct pci_error_handlers cxl_error_handlers = {
>+	.error_detected	= cxl_error_detected,
>+	.slot_reset	= cxl_slot_reset,
>+	.resume		= cxl_error_resume,
If the FLR (Function level reset) supported, please add the corresponding callback functions
reset_prepare(..) and reset_done(..).

>+};
>+
> static struct pci_driver cxl_pci_driver = {
> 	.name			= KBUILD_MODNAME,
> 	.id_table		= cxl_mem_pci_tbl,
> 	.probe			= cxl_pci_probe,
>+	.err_handler		= &cxl_error_handlers,
> 	.driver	= {
> 		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
> 	},


Thanks,
Shiju

Dan Williams April 24, 2022, 10:15 p.m. UTC | #3

On Fri, Mar 18, 2022 at 2:42 AM Shiju Jose <shiju.jose@huawei.com> wrote:
>
> Hi Dan,

Hi, thanks for taking a look at this.

>
> >-----Original Message-----
> >From: Dan Williams <dan.j.williams@intel.com>
> >Sent: 16 March 2022 04:14
> >To: linux-cxl@vger.kernel.org
> >Cc: ben.widawsky@intel.com; vishal.l.verma@intel.com;
> >alison.schofield@intel.com; Jonathan Cameron
> ><jonathan.cameron@huawei.com>; ira.weiny@intel.com; linux-
> >pci@vger.kernel.org
> >Subject: [PATCH 8/8] cxl/pci: Add (hopeful) error handling support
> >
> >Add nominal error handling that tears down CXL.mem in response to error
> >notifications that imply a device reset. Given some CXL.mem may be
> >operating as System RAM, there is a high likelihood that these error events
> >are fatal. However, if the system survives the notification the expectation is
> >that the driver behavior is equivalent to a hot-unplug and re-plug of an
> >endpoint.
> >
> >Note that this does not change the mask values from the default. That awaits
> >CXL _OSC support to determine whether platform firmware is in control of the
> >mask registers.
> >
> >Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >---
> > drivers/cxl/core/memdev.c |    1
> > drivers/cxl/cxlmem.h      |    2 +
> > drivers/cxl/pci.c         |  109
> >+++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 112 insertions(+)
> >
> >diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index
> >1f76b28f9826..223d512790e1 100644
> >--- a/drivers/cxl/core/memdev.c
> >+++ b/drivers/cxl/core/memdev.c
> >@@ -341,6 +341,7 @@ struct cxl_memdev *devm_cxl_add_memdev(struct
> >cxl_dev_state *cxlds)
> >        * needed as this is ordered with cdev_add() publishing the device.
> >        */
> >       cxlmd->cxlds = cxlds;
> >+      cxlds->cxlmd = cxlmd;
> >
> >       cdev = &cxlmd->cdev;
> >       rc = cdev_device_add(cdev, dev);
> >diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index
> >5d33ce24fe09..f58e16951414 100644
> >--- a/drivers/cxl/cxlmem.h
> >+++ b/drivers/cxl/cxlmem.h
> >@@ -117,6 +117,7 @@ struct cxl_endpoint_dvsec_info {
> >  * Currently only memory devices are represented.
> >  *
> >  * @dev: The device associated with this CXL state
> >+ * @cxlmd: The device representing the CXL.mem capabilities of @dev
> >  * @regs: Parsed register blocks
> >  * @cxl_dvsec: Offset to the PCIe device DVSEC
> >  * @payload_size: Size of space for payload @@ -148,6 +149,7 @@ struct
> >cxl_endpoint_dvsec_info {
> >  */
> > struct cxl_dev_state {
> >       struct device *dev;
> >+      struct cxl_memdev *cxlmd;
> >
> >       struct cxl_regs regs;
> >       int cxl_dvsec;
> >diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index
> >bde8929450f0..823cbfa093fa 100644
> >--- a/drivers/cxl/pci.c
> >+++ b/drivers/cxl/pci.c
> >@@ -8,6 +8,7 @@
> > #include <linux/mutex.h>
> > #include <linux/list.h>
> > #include <linux/pci.h>
> >+#include <linux/aer.h>
> > #include <linux/io.h>
> > #include "cxlmem.h"
> > #include "cxlpci.h"
> >@@ -533,6 +534,11 @@ static void cxl_dvsec_ranges(struct cxl_dev_state
> >*cxlds)
> >       info->ranges = __cxl_dvsec_ranges(cxlds, info);  }
> >
> >+static void disable_aer(void *pdev)
> >+{
> >+      pci_disable_pcie_error_reporting(pdev);
> >+}
> >+
> > static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >{
> >       struct cxl_register_map map;
> >@@ -554,6 +560,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const
> >struct pci_device_id *id)
> >       cxlds = cxl_dev_state_create(&pdev->dev);
> >       if (IS_ERR(cxlds))
> >               return PTR_ERR(cxlds);
> >+      pci_set_drvdata(pdev, cxlds);
> >
> >       cxlds->serial = pci_get_dsn(pdev);
> >       cxlds->cxl_dvsec = pci_find_dvsec_capability( @@ -610,6 +617,14 @@
> >static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >       if (IS_ERR(cxlmd))
> >               return PTR_ERR(cxlmd);
> >
> >+      if (cxlds->regs.ras) {
> >+              pci_enable_pcie_error_reporting(pdev);
> >+              rc = devm_add_action_or_reset(&pdev->dev, disable_aer,
> >pdev);
> >+              if (rc)
> >+                      return rc;
> >+      }
> >+      pci_save_state(pdev);
> >+
> >       if (range_len(&cxlds->pmem_range) &&
> >IS_ENABLED(CONFIG_CXL_PMEM))
> >               rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
> >
> >@@ -623,10 +638,104 @@ static const struct pci_device_id cxl_mem_pci_tbl[]
> >= {  };  MODULE_DEVICE_TABLE(pci, cxl_mem_pci_tbl);
> >
> >+/*
> >+ * Log the state of the RAS status registers and prepare them to log
> >+the
> >+ * next error status.
> >+ */
> >+static void cxl_report_and_clear(struct cxl_dev_state *cxlds) {
> >+      struct cxl_memdev *cxlmd = cxlds->cxlmd;
> >+      struct device *dev = &cxlmd->dev;
> >+      void __iomem *addr;
> >+      u32 status;
> >+
> >+      if (!cxlds->regs.ras)
> >+              return;
> In the cxl_error_detected () may need to return PCI_ERS_RESULT_NONE
> for the following cases, if exist,
> 1. if (!cxlds->regs.ras),

Yes, but given that the RAS capability is mandatory for CXL devices
then I think the driver should just fail to register altogether if the
RAS register are not found / mapped.

> 2. if any errors would be reported during the dev initialization.

This can't happen. The err_handler callback process takes the
device_lock() which ensures that any initialization that has started
completes before the callback is invoked.

>
> >+
> >+      addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
> >+      status = readl(addr);
> >+      if (status & CXL_RAS_UNCORRECTABLE_STATUS_MASK) {
> >+              dev_warn(cxlds->dev, "%s: uncorrectable status: %#08x\n",
> >+                       dev_name(dev), status);
> >+              writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK,
> >addr);
> >+      }
> For the uncorrectable non-fatal errors, if any, may need to return PCI_ERS_RESULT_NEED_RESET
> to trigger the slot reset to prevent more serious issues later. For this case the state would be
> "pci_channel_io_normal".
>

Ah true, some pci_channel_io_normal recovery conditions still result
in reset, will fix.

> >+
> >+      addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
> >+      status = readl(addr);
> >+      if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
> >+              dev_warn(cxlds->dev, "%s: correctable status: %#08x\n",
> >+                       dev_name(dev), status);
> >+              writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK,
> >addr);
> >+      }
> >+}
> >+
> >+static pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> >+                                         pci_channel_state_t state)
> >+{
> >+      struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> >+      struct cxl_memdev *cxlmd = cxlds->cxlmd;
> >+      struct device *dev = &cxlmd->dev;
> >+
> >+      /*
> >+       * A frozen channel indicates an impending reset which is fatal to
> >+       * CXL.mem operation, and will likely crash the system. On the off
> >+       * chance the situation is recoverable dump the status of the RAS
> >+       * capability registers and bounce the active state of the memdev.
> >+       */
> >+      cxl_report_and_clear(cxlds);
> >+
> >+      switch (state) {
> >+      case pci_channel_io_normal:
> >+              return PCI_ERS_RESULT_CAN_RECOVER;
> >+      case pci_channel_io_frozen:
> >+              dev_warn(&pdev->dev,
> >+                       "%s: frozen state error detected, disable
> >CXL.mem\n",
> >+                       dev_name(dev));
> >+              device_release_driver(dev);
> >+              return PCI_ERS_RESULT_NEED_RESET;
> >+      case pci_channel_io_perm_failure:
> >+              dev_warn(&pdev->dev,
> >+                       "failure state error detected, request disconnect\n");
> >+              return PCI_ERS_RESULT_DISCONNECT;
> >+      }
> >+      return PCI_ERS_RESULT_NEED_RESET;
> >+}
> >+
> >+static pci_ers_result_t cxl_slot_reset(struct pci_dev *pdev) {
> >+      struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> >+      struct cxl_memdev *cxlmd = cxlds->cxlmd;
> >+      struct device *dev = &cxlmd->dev;
> >+
> >+      dev_info(&pdev->dev, "%s: restart CXL.mem after slot reset\n",
> >+               dev_name(dev));
> >+      pci_restore_state(pdev);
> 1. Do we need to call pci_save_state(pdev) here after the reset? though pci_save_state(pdev) is being invoked in the
> cxl_pci_probe().

The save state after probe is sufficient, it does not need to be
snapshotted again as far as I can see.

>
> >+      if (device_attach(dev) <= 0)
> >+              return PCI_ERS_RESULT_DISCONNECT;
> My understanding is that pci_disable_pcie_error_reporting(pdev) would be called
> in the disable_aer () in the reset,
> pci_enable_pcie_error_reporting(pdev) may need to call here after the reset?

After the device is disconnected the driver needs to be reloaded to
recover AER operation.

>
> >+      return PCI_ERS_RESULT_RECOVERED;
> >+}
> >+
> >+static void cxl_error_resume(struct pci_dev *pdev) {
> >+      struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> >+      struct cxl_memdev *cxlmd = cxlds->cxlmd;
> >+      struct device *dev = &cxlmd->dev;
> >+
> >+      dev_info(&pdev->dev, "%s: error resume %s\n", dev_name(dev),
> >+               dev->driver ? "successful" : "failed"); }
> >+
> >+static const struct pci_error_handlers cxl_error_handlers = {
> >+      .error_detected = cxl_error_detected,
> >+      .slot_reset     = cxl_slot_reset,
> >+      .resume         = cxl_error_resume,
> If the FLR (Function level reset) supported, please add the corresponding callback functions
> reset_prepare(..) and reset_done(..).

No, FLR does not recover any of the errors reported via AER.

>
> >+};
> >+
> > static struct pci_driver cxl_pci_driver = {
> >       .name                   = KBUILD_MODNAME,
> >       .id_table               = cxl_mem_pci_tbl,
> >       .probe                  = cxl_pci_probe,
> >+      .err_handler            = &cxl_error_handlers,
> >       .driver = {
> >               .probe_type     = PROBE_PREFER_ASYNCHRONOUS,
> >       },
>
>
> Thanks,
> Shiju

diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 1f76b28f9826..223d512790e1 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -341,6 +341,7 @@  struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
 	 * needed as this is ordered with cdev_add() publishing the device.
 	 */
 	cxlmd->cxlds = cxlds;
+	cxlds->cxlmd = cxlmd;
 
 	cdev = &cxlmd->cdev;
 	rc = cdev_device_add(cdev, dev);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 5d33ce24fe09..f58e16951414 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -117,6 +117,7 @@  struct cxl_endpoint_dvsec_info {
  * Currently only memory devices are represented.
  *
  * @dev: The device associated with this CXL state
+ * @cxlmd: The device representing the CXL.mem capabilities of @dev
  * @regs: Parsed register blocks
  * @cxl_dvsec: Offset to the PCIe device DVSEC
  * @payload_size: Size of space for payload
@@ -148,6 +149,7 @@  struct cxl_endpoint_dvsec_info {
  */
 struct cxl_dev_state {
 	struct device *dev;
+	struct cxl_memdev *cxlmd;
 
 	struct cxl_regs regs;
 	int cxl_dvsec;
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index bde8929450f0..823cbfa093fa 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -8,6 +8,7 @@ 
 #include <linux/mutex.h>
 #include <linux/list.h>
 #include <linux/pci.h>
+#include <linux/aer.h>
 #include <linux/io.h>
 #include "cxlmem.h"
 #include "cxlpci.h"
@@ -533,6 +534,11 @@  static void cxl_dvsec_ranges(struct cxl_dev_state *cxlds)
 	info->ranges = __cxl_dvsec_ranges(cxlds, info);
 }
 
+static void disable_aer(void *pdev)
+{
+	pci_disable_pcie_error_reporting(pdev);
+}
+
 static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct cxl_register_map map;
@@ -554,6 +560,7 @@  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	cxlds = cxl_dev_state_create(&pdev->dev);
 	if (IS_ERR(cxlds))
 		return PTR_ERR(cxlds);
+	pci_set_drvdata(pdev, cxlds);
 
 	cxlds->serial = pci_get_dsn(pdev);
 	cxlds->cxl_dvsec = pci_find_dvsec_capability(
@@ -610,6 +617,14 @@  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
+	if (cxlds->regs.ras) {
+		pci_enable_pcie_error_reporting(pdev);
+		rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
+		if (rc)
+			return rc;
+	}
+	pci_save_state(pdev);
+
 	if (range_len(&cxlds->pmem_range) && IS_ENABLED(CONFIG_CXL_PMEM))
 		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
 
@@ -623,10 +638,104 @@  static const struct pci_device_id cxl_mem_pci_tbl[] = {
 };
 MODULE_DEVICE_TABLE(pci, cxl_mem_pci_tbl);
 
+/*
+ * Log the state of the RAS status registers and prepare them to log the
+ * next error status.
+ */
+static void cxl_report_and_clear(struct cxl_dev_state *cxlds)
+{
+	struct cxl_memdev *cxlmd = cxlds->cxlmd;
+	struct device *dev = &cxlmd->dev;
+	void __iomem *addr;
+	u32 status;
+
+	if (!cxlds->regs.ras)
+		return;
+
+	addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
+	status = readl(addr);
+	if (status & CXL_RAS_UNCORRECTABLE_STATUS_MASK) {
+		dev_warn(cxlds->dev, "%s: uncorrectable status: %#08x\n",
+			 dev_name(dev), status);
+		writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
+	}
+
+	addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
+	status = readl(addr);
+	if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
+		dev_warn(cxlds->dev, "%s: correctable status: %#08x\n",
+			 dev_name(dev), status);
+		writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
+	}
+}
+
+static pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
+					   pci_channel_state_t state)
+{
+	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+	struct cxl_memdev *cxlmd = cxlds->cxlmd;
+	struct device *dev = &cxlmd->dev;
+
+	/*
+	 * A frozen channel indicates an impending reset which is fatal to
+	 * CXL.mem operation, and will likely crash the system. On the off
+	 * chance the situation is recoverable dump the status of the RAS
+	 * capability registers and bounce the active state of the memdev.
+	 */
+	cxl_report_and_clear(cxlds);
+
+	switch (state) {
+	case pci_channel_io_normal:
+		return PCI_ERS_RESULT_CAN_RECOVER;
+	case pci_channel_io_frozen:
+		dev_warn(&pdev->dev,
+			 "%s: frozen state error detected, disable CXL.mem\n",
+			 dev_name(dev));
+		device_release_driver(dev);
+		return PCI_ERS_RESULT_NEED_RESET;
+	case pci_channel_io_perm_failure:
+		dev_warn(&pdev->dev,
+			 "failure state error detected, request disconnect\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t cxl_slot_reset(struct pci_dev *pdev)
+{
+	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+	struct cxl_memdev *cxlmd = cxlds->cxlmd;
+	struct device *dev = &cxlmd->dev;
+
+	dev_info(&pdev->dev, "%s: restart CXL.mem after slot reset\n",
+		 dev_name(dev));
+	pci_restore_state(pdev);
+	if (device_attach(dev) <= 0)
+		return PCI_ERS_RESULT_DISCONNECT;
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void cxl_error_resume(struct pci_dev *pdev)
+{
+	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+	struct cxl_memdev *cxlmd = cxlds->cxlmd;
+	struct device *dev = &cxlmd->dev;
+
+	dev_info(&pdev->dev, "%s: error resume %s\n", dev_name(dev),
+		 dev->driver ? "successful" : "failed");
+}
+
+static const struct pci_error_handlers cxl_error_handlers = {
+	.error_detected	= cxl_error_detected,
+	.slot_reset	= cxl_slot_reset,
+	.resume		= cxl_error_resume,
+};
+
 static struct pci_driver cxl_pci_driver = {
 	.name			= KBUILD_MODNAME,
 	.id_table		= cxl_mem_pci_tbl,
 	.probe			= cxl_pci_probe,
+	.err_handler		= &cxl_error_handlers,
 	.driver	= {
 		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
 	},

[8/8] cxl/pci: Add (hopeful) error handling support

Commit Message

Comments

Patch