Message ID | 20240221140405.28532-4-root@hu-msarkar-hyd.qualcomm.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Series | Add support for detecting Controller Level PCIe Errors |
On 21/02/2024 15:04, root wrote:
> From: Nitesh Gupta <nitegupt@quicinc.com>
>
> Synopsys Controllers provide capabilities to detect various controller
> level errors. These can range from controller interface error to random
> PCIe configuration errors. This patch intends to add support to detect
> these errors and report it to userspace entity via sysfs, which can take
> appropriate actions to mitigate the errors.
>

...

> +
>  static void qcom_pcie_init_debugfs(struct qcom_pcie *pcie)
>  {
>  	struct dw_pcie *pci = pcie->pci;
> @@ -1496,6 +1829,21 @@ static int qcom_pcie_probe(struct platform_device *pdev)
>  		goto err_pm_runtime_put;
>  	}
>
> +	pcie->global_irq = platform_get_irq_byname(pdev, "global");
> +	if (pcie->global_irq < 0) {
> +		ret = pcie->global_irq;
> +		goto err_pm_runtime_put;

How does this work with old DTS and with all other platforms? Was it
tested?

Best regards,
Krzysztof

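One way to keep device trees that lack a "global" interrupt probing is to treat that interrupt as optional. The following is only a userspace model of that decision, not driver code: `fake_pdev`, `fake_get_irq_byname_optional()` and `probe_global_irq()` are hypothetical names, and the lookup merely imitates the kernel's `platform_get_irq_byname_optional()`, which returns -ENXIO when the named interrupt is absent.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a platform device's interrupt-names list. */
struct fake_pdev {
	const char *const *irq_names;
	const int *irqs;
	int count;
};

/* Models an optional by-name lookup: IRQ number, or -ENXIO if absent. */
int fake_get_irq_byname_optional(const struct fake_pdev *pdev, const char *name)
{
	for (int i = 0; i < pdev->count; i++)
		if (strcmp(pdev->irq_names[i], name) == 0)
			return pdev->irqs[i];
	return -ENXIO;
}

/*
 * Probe-time decision: a missing "global" IRQ disables error reporting
 * instead of failing the probe; any other negative value is a real error.
 */
int probe_global_irq(const struct fake_pdev *pdev, int *irq, bool *enabled)
{
	int ret = fake_get_irq_byname_optional(pdev, "global");

	if (ret == -ENXIO) {
		*irq = -1;
		*enabled = false;
		return 0;
	}
	if (ret < 0)
		return ret;

	*irq = ret;
	*enabled = true;
	return 0;
}
```

In the driver, the same shape would mean skipping the IRQ request and the sysfs group whenever the interrupt is not described.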
On Wed, Feb 21, 2024 at 07:34:04PM +0530, root wrote:
> From: Nitesh Gupta <nitegupt@quicinc.com>
>
> Synopsys Controllers provide capabilities to detect various controller

"Synopsys controllers"? "Synopsys" refers to the DesignWare core, but
most of this code is in the qcom driver. If it's qcom-specific, this
should say "Qualcomm controllers".

> level errors. These can range from controller interface error to random
> PCIe configuration errors. This patch intends to add support to detect
> these errors and report it to userspace entity via sysfs, which can take
> appropriate actions to mitigate the errors.

s/This patch intends to add/Add/, so the commit log says what the
patch *does*, not "what it intends to do".

> +
> +/*
> + * Error Reporting DBI register
> + */

Typical style in this file (granted, it's not 100% consistent) is to
make these single-line comments, i.e.,

  /* Error Reporting DBI register */

> +#define DBI_DEVICE_CONTROL_DEVICE_STATUS	0x78
> +#define DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG	0x8c

Most other #defines in this file use upper-case hex.

> +#define PCIE_AER_EXT_CAP_ID	0x01

Why not the existing PCI_EXT_CAP_ID_ERR? If this is the standard PCIe
AER stuff, we shouldn't make it needlessly device-specific.

> +#define PCI_EXT_CAP_RASDP_ID	0x0b

Looks like possibly PCI_EXT_CAP_ID_VNDR? Capability IDs are
definitely not device-specific. The fact that a PCI_EXT_CAP_ID_VNDR
capability in a device with Vendor ID PCI_VENDOR_ID_QCOM has a
qcom-specific meaning is obviously specific to qcom, but the
Capability ID itself is not.

> +/* DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG register fields */
> +#define PCIE_CAP_SYS_ERR_ON_CORR_ERR_EN	BIT(0)
> +#define PCIE_CAP_SYS_ERR_ON_NON_FATAL_ERR_EN	BIT(1)
> +#define PCIE_CAP_SYS_ERR_ON_FATAL_ERR_EN	BIT(2)
> +
> +/* DBI_DEVICE_CONTROL_DEVICE_STATUS register fields */
> +#define PCIE_CAP_UNSUPPORT_REQ_REP_EN	BIT(3)
> +#define PCIE_CAP_FATAL_ERR_REPORT_EN	BIT(2)
> +#define PCIE_CAP_NON_FATAL_ERR_REPORT_EN	BIT(1)
> +#define PCIE_CAP_CORR_ERR_REPORT_EN	BIT(0)

These look like alternate ways to access the generic PCIe Capability.
If that's the case, either use the existing PCI_EXP_RTCTL_SECEE,
PCI_EXP_DEVCTL_CERE, etc., or at least match the "RTCTL_SECEE" parts
of the names so we can see the connection.

> +/* DBI_ADV_ERR_CAP_CTRL_OFF register fields */
> +#define ECRC_GEN_EN	BIT(6)
> +#define ECRC_CHECK_EN	BIT(8)

Do these correspond to PCI_ERR_CAP_ECRC_GENE, PCI_ERR_CAP_ECRC_CHKE?

> +/* DBI_ROOT_ERR_CMD_OFF register fields */
> +#define CORR_ERR_REPORTING_EN	BIT(0)
> +#define NON_FATAL_ERR_REPORTING_EN	BIT(1)
> +#define FATAL_ERR_REPORTING_EN	BIT(2)

PCI_ERR_ROOT_CMD_COR_EN, etc?

> +static void qcom_pcie_enable_error_reporting_2_7_0(struct qcom_pcie *pcie)
> +{
> +	...

> +	val = readl(pci->dbi_base + DBI_DEVICE_CONTROL_DEVICE_STATUS);
> +	val |= (PCIE_CAP_CORR_ERR_REPORT_EN | PCIE_CAP_NON_FATAL_ERR_REPORT_EN |
> +		PCIE_CAP_FATAL_ERR_REPORT_EN | PCIE_CAP_UNSUPPORT_REQ_REP_EN);
> +	writel(val, pci->dbi_base + DBI_DEVICE_CONTROL_DEVICE_STATUS);

Is there any way to split the AER part (specified by the PCIe spec)
from the qcom-specific (or dwc-specific) part? This looks an awful
lot like pci_enable_pcie_error_reporting(), and we should do this in
the PCI core in a generic way if possible.

> +	val = readl(pci->dbi_base + DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG);
> +	val |= (PCIE_CAP_SYS_ERR_ON_CORR_ERR_EN | PCIE_CAP_SYS_ERR_ON_NON_FATAL_ERR_EN |
> +		PCIE_CAP_SYS_ERR_ON_FATAL_ERR_EN);
> +	writel(val, pci->dbi_base + DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG);

Bjorn

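The naming questions above can be checked mechanically. The sketch below is a standalone comparison, assuming the generic values shown match include/uapi/linux/pci_regs.h; they are written out from memory here and should be verified against the tree. `patch_masks_match_generic()` and `generic_devctl_aer_mask()` are hypothetical helper names used only for illustration.

```c
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Patch-local names, with the values given in the quoted patch. */
#define PCIE_CAP_CORR_ERR_REPORT_EN		BIT(0)
#define PCIE_CAP_NON_FATAL_ERR_REPORT_EN	BIT(1)
#define PCIE_CAP_FATAL_ERR_REPORT_EN		BIT(2)
#define PCIE_CAP_UNSUPPORT_REQ_REP_EN		BIT(3)
#define PCIE_CAP_SYS_ERR_ON_CORR_ERR_EN		BIT(0)
#define PCIE_CAP_SYS_ERR_ON_NON_FATAL_ERR_EN	BIT(1)
#define PCIE_CAP_SYS_ERR_ON_FATAL_ERR_EN	BIT(2)
#define ECRC_GEN_EN				BIT(6)
#define ECRC_CHECK_EN				BIT(8)
#define CORR_ERR_REPORTING_EN			BIT(0)
#define NON_FATAL_ERR_REPORTING_EN		BIT(1)
#define FATAL_ERR_REPORTING_EN			BIT(2)

/* Generic names; values ASSUMED to mirror pci_regs.h. */
#define PCI_EXP_DEVCTL_CERE		0x0001
#define PCI_EXP_DEVCTL_NFERE		0x0002
#define PCI_EXP_DEVCTL_FERE		0x0004
#define PCI_EXP_DEVCTL_URRE		0x0008
#define PCI_EXP_RTCTL_SECEE		0x0001
#define PCI_EXP_RTCTL_SENFEE		0x0002
#define PCI_EXP_RTCTL_SEFEE		0x0004
#define PCI_ERR_CAP_ECRC_GENE		0x00000040
#define PCI_ERR_CAP_ECRC_CHKE		0x00000100
#define PCI_ERR_ROOT_CMD_COR_EN		0x00000001
#define PCI_ERR_ROOT_CMD_NONFATAL_EN	0x00000002
#define PCI_ERR_ROOT_CMD_FATAL_EN	0x00000004

/* Device Control bits that enable error reporting, by generic name. */
uint32_t generic_devctl_aer_mask(void)
{
	return PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE |
	       PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE;
}

/* Returns 1 when every patch-local mask equals its generic counterpart. */
int patch_masks_match_generic(void)
{
	uint32_t devctl = PCIE_CAP_CORR_ERR_REPORT_EN |
			  PCIE_CAP_NON_FATAL_ERR_REPORT_EN |
			  PCIE_CAP_FATAL_ERR_REPORT_EN |
			  PCIE_CAP_UNSUPPORT_REQ_REP_EN;
	uint32_t rtctl = PCIE_CAP_SYS_ERR_ON_CORR_ERR_EN |
			 PCIE_CAP_SYS_ERR_ON_NON_FATAL_ERR_EN |
			 PCIE_CAP_SYS_ERR_ON_FATAL_ERR_EN;
	uint32_t ecrc = ECRC_GEN_EN | ECRC_CHECK_EN;
	uint32_t rootcmd = CORR_ERR_REPORTING_EN |
			   NON_FATAL_ERR_REPORTING_EN |
			   FATAL_ERR_REPORTING_EN;

	return devctl == generic_devctl_aer_mask() &&
	       rtctl == (PCI_EXP_RTCTL_SECEE | PCI_EXP_RTCTL_SENFEE |
			 PCI_EXP_RTCTL_SEFEE) &&
	       ecrc == (PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE) &&
	       rootcmd == (PCI_ERR_ROOT_CMD_COR_EN |
			   PCI_ERR_ROOT_CMD_NONFATAL_EN |
			   PCI_ERR_ROOT_CMD_FATAL_EN);
}
```

If the assumed values hold, the patch-local defines carry no information beyond the existing generic ones.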
On Wed, Feb 21, 2024 at 07:34:04PM +0530, root wrote:
> From: Nitesh Gupta <nitegupt@quicinc.com>
>
> Synopsys Controllers provide capabilities to detect various controller
> level errors. These can range from controller interface error to random
> PCIe configuration errors. This patch intends to add support to detect
> these errors and report it to userspace entity via sysfs, which can take
> appropriate actions to mitigate the errors.
>
> Signed-off-by: Nitesh Gupta <nitegupt@quicinc.com>
> Signed-off-by: Mrinmay Sarkar <quic_msarkar@quicinc.com>
> ---
>  drivers/pci/controller/dwc/pcie-designware.h |  26 ++
>  drivers/pci/controller/dwc/pcie-qcom.c       | 350 +++++++++++++++++++
>  2 files changed, 376 insertions(+)
>
> diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> index 26dae4837462..cd45f9a2f9bc 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.h
> +++ b/drivers/pci/controller/dwc/pcie-designware.h
> @@ -223,6 +223,32 @@
>
>  #define PCIE_RAS_DES_EVENT_COUNTER_DATA	0xc
>
> +
> +/*
> + * Error Reporting DBI register
> + */
> +#define DBI_DEVICE_CONTROL_DEVICE_STATUS	0x78
> +#define DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG	0x8c
> +#define DBI_INTERFACE_TIMER_STATUS	0x938
> +#define DBI_SAFETY_MASK_OFF	0x960
> +#define DBI_SAFETY_STATUS	0x964
> +
> +#define DBI_ADV_ERR_CAP_CTRL_OFF	0x18
> +#define DBI_ROOT_ERR_CMD_OFF	0x2c
> +
> +/*
> + * RAS-DP register
> + */
> +#define PCIE_RASDP_ERROR_MODE_EN_REG	0x28
> +#define RASDP_ERROR_MODE_EN	BIT(0)
> +
> +/*
> + * Interface Timer register
> + */
> +#define PCIE_INTERFACE_TIMER_CONTROL	0x930
> +#define INTERFACE_TIMER_EN	BIT(0)
> +#define INTERFACE_TIMER_AER_EN	BIT(1)
> +
>  /*
>   * The default address offset between dbi_base and atu_base. Root controller
>   * drivers are not required to initialize atu_base if the offset matches this
> diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
> index 10f2d0bb86be..138e3b08d4b9 100644
> --- a/drivers/pci/controller/dwc/pcie-qcom.c
> +++ b/drivers/pci/controller/dwc/pcie-qcom.c
> @@ -20,6 +20,7 @@
>  #include <linux/kernel.h>
>  #include <linux/init.h>
>  #include <linux/of.h>
> +#include <linux/device.h>
>  #include <linux/of_gpio.h>
>  #include <linux/pci.h>
>  #include <linux/pm_runtime.h>
> @@ -68,6 +69,73 @@
>  #define PARF_DEBUG_CNT_AUX_CLK_IN_L1SUB_L1	0xc84
>  #define PARF_DEBUG_CNT_AUX_CLK_IN_L1SUB_L2	0xc88
>
> +/* Error Reporting Parf Registers */
> +#define PARF_INT_ALL_STATUS	0x224
> +#define PARF_INT_ALL_CLEAR	0x228
> +#define PARF_INT_CLEAR	0x21c
> +#define PARF_INT_STATUS	0x220
> +#define PARF_INT_ALL_MASK	0x22c
> +#define PARF_INT_ALL_2_CLEAR	0x504
> +#define PARF_INT_ALL_2_STATUS	0x500
> +#define PARF_INT_ALL_3_CLEAR	0x2e14
> +#define PARF_INT_ALL_3_STATUS	0x2e10
> +#define PARF_INT_ALL_4_CLEAR	0x2dd8
> +#define PARF_INT_ALL_4_STATUS	0x2dd0
> +#define PARF_INT_ALL_5_CLEAR	0x2ddc
> +#define PARF_INT_ALL_5_STATUS	0x2dd4
> +#define PARF_CFG_SAFETY_INT_MASK_CTRL	0x2c60
> +
> +
> +#define PCIE_AER_EXT_CAP_ID	0x01
> +#define PCI_EXT_CAP_RASDP_ID	0x0b
> +
> +/* Interrupt Masks */
> +#define CFGPCIE_INT_ALL_STATUS_MASK	0x3ff3e
> +#define CFGPCIE_PARF_INT_STATUS_MASK	0x1b
> +#define CFGPCIE_INTERFACE_TIMER_STATUS_MASK	0xe7b
> +#define CFGPCIE_INT_ALL_2_STATUS_MASK	GENMASK(24, 0)
> +#define CFGPCIE_INT_ALL_3_STATUS_MASK	GENMASK(31, 0)
> +#define CFGPCIE_INT_ALL_4_STATUS_MASK	GENMASK(31, 0)
> +#define CFGPCIE_INT_ALL_5_STATUS_MASK	GENMASK(31, 0)
> +
> +/* PCI_INTERRUPT_LINE register field */
> +#define SERR_EN	BIT(17)
> +
> +/* DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG register fields */
> +#define PCIE_CAP_SYS_ERR_ON_CORR_ERR_EN	BIT(0)
> +#define PCIE_CAP_SYS_ERR_ON_NON_FATAL_ERR_EN	BIT(1)
> +#define PCIE_CAP_SYS_ERR_ON_FATAL_ERR_EN	BIT(2)
> +
> +/* DBI_DEVICE_CONTROL_DEVICE_STATUS register fields */
> +#define PCIE_CAP_UNSUPPORT_REQ_REP_EN	BIT(3)
> +#define PCIE_CAP_FATAL_ERR_REPORT_EN	BIT(2)
> +#define PCIE_CAP_NON_FATAL_ERR_REPORT_EN	BIT(1)
> +#define PCIE_CAP_CORR_ERR_REPORT_EN	BIT(0)
> +
> +/* PARF_CFG_SAFETY_INT_MASK_CTRL register fields */
> +#define CFG_SAFETY_UNCORR_INT_MASK	BIT(0)
> +#define CFG_SAFETY_CORR_INT_MASK	BIT(1)
> +
> +/* DBI_ADV_ERR_CAP_CTRL_OFF register fields */
> +#define ECRC_GEN_EN	BIT(6)
> +#define ECRC_CHECK_EN	BIT(8)
> +
> +/* DBI_ROOT_ERR_CMD_OFF register fields */
> +#define CORR_ERR_REPORTING_EN	BIT(0)
> +#define NON_FATAL_ERR_REPORTING_EN	BIT(1)
> +#define FATAL_ERR_REPORTING_EN	BIT(2)
> +
> +/* DBI_SAFETY_MASK_OFF register fields */
> +#define SAFETY_INT_MASK	GENMASK(5, 0)
> +
> +/* DBI_SAFETY_STATUS register fields */
> +#define PCIE_RASDP_UNCORR_ERR	BIT(0)
> +#define PCIE_IFACE_TMR_ERR	BIT(1)
> +#define PCIE_CDM_CHK_ERR	BIT(2)
> +#define PCIE_AER_UNCORR_ERR	BIT(3)
> +#define PCIE_AER_CORR_ERR	BIT(4)
> +#define PCIE_RASDP_CORR_ERR	BIT(5)
> +
>  /* PARF_SYS_CTRL register fields */
>  #define MAC_PHY_POWERDOWN_IN_P2_D_MUX_EN	BIT(29)
>  #define MST_WAKEUP_EN	BIT(13)
> @@ -231,6 +299,24 @@ struct qcom_pcie_cfg {
>  	const struct qcom_pcie_ops *ops;
>  };
>
> +enum qcom_pcie_fault_code {
> +	RASDP_UNCORR_ERROR, /* RASDP uncorrectable error */
> +	RASDP_CORR_ERROR, /* RASDP correctable error */
> +	CDM_REG_CHK_ERROR, /* CDM register check error */
> +	INTERFACE_TIMER_ERROR, /* PCIe local bus interface timer error */
> +	PCIE_SPURIOUS_INT, /* Spurious Interrupt received */
> +	MAX_PCIE_SAFETY_FAULT /* Maximum PCIe fault source code supported */
> +};
> +
> +static const char * const pcie_fault_string[] = {
> +	"RASDP_Uncorr_Error",
> +	"RASDP_Corr_Error",
> +	"CDM_Reg_Chk_Error",
> +	"Interface_Timer_Error",
> +	"PCIe_Spurious_Interrupt",
> +	"TOTAL_PCIE_FAULTS",
> +};
> +
>  struct qcom_pcie {
>  	struct dw_pcie *pci;
>  	void __iomem *parf; /* DT parf */
> @@ -243,6 +329,10 @@ struct qcom_pcie {
>  	const struct qcom_pcie_cfg *cfg;
>  	struct dentry *debugfs;
>  	bool suspended;
> +	int global_irq;
> +	spinlock_t safety_lock;
> +	u32 pcie_fault[MAX_PCIE_SAFETY_FAULT];
> +	u32 pcie_fault_total;
>  };
>
>  #define to_qcom_pcie(x) dev_get_drvdata((x)->dev)
> @@ -959,9 +1049,94 @@ static int qcom_pcie_init_2_7_0(struct qcom_pcie *pcie)
>  	return ret;
>  }
>
> +static void qcom_pcie_enable_error_reporting_2_7_0(struct qcom_pcie *pcie)
> +{
> +	struct dw_pcie *pci = pcie->pci;
> +	u32 val, offset;
> +
> +	/* Clear all the interrupts before we enable it */
> +	writel(0, pci->dbi_base + DBI_SAFETY_STATUS);
> +	writel(0, pci->dbi_base + DBI_INTERFACE_TIMER_STATUS);
> +
> +	/* Enable interrupts which are aggregated using GLOBAL_INT */
> +	writel(CFGPCIE_INT_ALL_STATUS_MASK, pcie->parf + PARF_INT_ALL_CLEAR);
> +	writel(CFGPCIE_PARF_INT_STATUS_MASK, pcie->parf + PARF_INT_CLEAR);
> +	writel(CFGPCIE_INT_ALL_2_STATUS_MASK, pcie->parf + PARF_INT_ALL_2_CLEAR);
> +	writel(CFGPCIE_INT_ALL_3_STATUS_MASK, pcie->parf + PARF_INT_ALL_3_CLEAR);
> +	writel(CFGPCIE_INT_ALL_4_STATUS_MASK, pcie->parf + PARF_INT_ALL_4_CLEAR);
> +	writel(CFGPCIE_INT_ALL_5_STATUS_MASK, pcie->parf + PARF_INT_ALL_5_CLEAR);
> +
> +	dw_pcie_dbi_ro_wr_en(pci);
> +
> +	val = readl(pci->dbi_base + PCI_INTERRUPT_LINE);
> +	val |= SERR_EN;
> +	writel(val, pci->dbi_base + PCI_INTERRUPT_LINE);
> +
> +	val = readl(pci->dbi_base + DBI_DEVICE_CONTROL_DEVICE_STATUS);
> +	val |= (PCIE_CAP_CORR_ERR_REPORT_EN | PCIE_CAP_NON_FATAL_ERR_REPORT_EN |
> +		PCIE_CAP_FATAL_ERR_REPORT_EN | PCIE_CAP_UNSUPPORT_REQ_REP_EN);
> +	writel(val, pci->dbi_base + DBI_DEVICE_CONTROL_DEVICE_STATUS);
> +
> +	val = readl(pci->dbi_base + DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG);
> +	val |= (PCIE_CAP_SYS_ERR_ON_CORR_ERR_EN | PCIE_CAP_SYS_ERR_ON_NON_FATAL_ERR_EN |
> +		PCIE_CAP_SYS_ERR_ON_FATAL_ERR_EN);
> +	writel(val, pci->dbi_base + DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG);
> +
> +	dw_pcie_dbi_ro_wr_dis(pci);
> +
> +	/* Enable RAS-DP Interrupts */
> +	offset = dw_pcie_find_ext_capability(pci, PCI_EXT_CAP_RASDP_ID);
> +	val = readl(pci->dbi_base + offset + PCIE_RASDP_ERROR_MODE_EN_REG);
> +	val |= RASDP_ERROR_MODE_EN;
> +	writel(val, pci->dbi_base + PCIE_RASDP_ERROR_MODE_EN_REG);
> +
> +	/* Enable CDM Check */
> +	val = readl(pci->dbi_base + PCIE_PL_CHK_REG_CONTROL_STATUS);
> +	/* Enable continuous CMD register check mode */
> +	val |= PCIE_PL_CHK_REG_CHK_REG_CONTINUOUS;
> +	/* Start the CDM register check */
> +	val |= PCIE_PL_CHK_REG_CHK_REG_START;
> +	/* Enable comparison CDM register check mode */
> +	val |= PCIE_PL_CHK_REG_CHK_REG_COMPARISON_ERROR;
> +	/* Enable logic CDM register check mode */
> +	val |= PCIE_PL_CHK_REG_CHK_REG_LOGIC_ERROR;
> +	writel(val, pci->dbi_base + PCIE_PL_CHK_REG_CONTROL_STATUS);
> +
> +	/* Interface Timer Enable */
> +	val = readl(pci->dbi_base + PCIE_INTERFACE_TIMER_CONTROL);
> +	val |= (INTERFACE_TIMER_EN | INTERFACE_TIMER_AER_EN);
> +	writel(val, pci->dbi_base + PCIE_INTERFACE_TIMER_CONTROL);

You defined DBI_* at pcie-designware.h. Supposed it should be same for
all chips that use dwc controller. So I think you should put common code
into pcie-designware.c to avoid duplicate the same logic at every chip
vendor's driver.

Frank

> +
> +	/* Enable safety correctable and uncorrectable error reporting */
> +	val = readl(pcie->parf + PARF_CFG_SAFETY_INT_MASK_CTRL);
> +	val |= (CFG_SAFETY_UNCORR_INT_MASK | CFG_SAFETY_CORR_INT_MASK);
> +	writel(val, pcie->parf + PARF_CFG_SAFETY_INT_MASK_CTRL);
> +
> +	/* Enable CRC check and generation */
> +	offset = dw_pcie_find_ext_capability(pci, PCIE_AER_EXT_CAP_ID);
> +	val = readl(pci->dbi_base + offset + DBI_ADV_ERR_CAP_CTRL_OFF);
> +	val |= (ECRC_GEN_EN | ECRC_CHECK_EN);
> +	writel(val, pci->dbi_base + offset + DBI_ADV_ERR_CAP_CTRL_OFF);
> +
> +	/* Enable AER */
> +	val = readl(pci->dbi_base + offset + DBI_ROOT_ERR_CMD_OFF);
> +	val |= (CORR_ERR_REPORTING_EN | NON_FATAL_ERR_REPORTING_EN
> +		| FATAL_ERR_REPORTING_EN);
> +	writel(val, pci->dbi_base + offset + DBI_ROOT_ERR_CMD_OFF);
> +
> +	/* Enable interrupts */
> +	val = readl(pci->dbi_base + DBI_SAFETY_MASK_OFF);
> +	val &= ~(SAFETY_INT_MASK);
> +	writel(val, pci->dbi_base + DBI_SAFETY_MASK_OFF);
> +
> +	/* Disable Legacy Interrupts */
> +	writel(0, pcie->parf + PARF_INT_ALL_MASK);
> +}
> +
>  static int qcom_pcie_post_init_2_7_0(struct qcom_pcie *pcie)
>  {
>  	qcom_pcie_clear_hpc(pcie->pci);
> +	qcom_pcie_enable_error_reporting_2_7_0(pcie);
>
>  	return 0;
>  }
> @@ -1416,6 +1591,130 @@ static void qcom_pcie_icc_update(struct qcom_pcie *pcie)
>  	}
>  }
>
> +static void qcom_pcie_check_spurious_int(struct qcom_pcie *pcie)
> +{
> +	struct dw_pcie *pci = pcie->pci;
> +	u32 *pcie_fault = pcie->pcie_fault;
> +	struct device *dev = pci->dev;
> +	struct kobject *kobj_ref = &dev->kobj;
> +	u32 val;
> +
> +	val = readl(pci->dbi_base + DBI_INTERFACE_TIMER_STATUS);
> +	if (val & CFGPCIE_INTERFACE_TIMER_STATUS_MASK)
> +		return;
> +
> +	val = readl(pcie->parf + PARF_INT_ALL_STATUS);
> +	if (val & CFGPCIE_INT_ALL_STATUS_MASK)
> +		return;
> +
> +	val = readl(pcie->parf + PARF_INT_STATUS);
> +	if (val & CFGPCIE_PARF_INT_STATUS_MASK)
> +		return;
> +
> +	val = readl(pcie->parf + PARF_INT_ALL_2_STATUS);
> +	if (val & CFGPCIE_INT_ALL_2_STATUS_MASK)
> +		return;
> +
> +	val = readl(pcie->parf + PARF_INT_ALL_3_STATUS);
> +	if (val & CFGPCIE_INT_ALL_3_STATUS_MASK)
> +		return;
> +
> +	val = readl(pcie->parf + PARF_INT_ALL_4_STATUS);
> +	if (val & CFGPCIE_INT_ALL_4_STATUS_MASK)
> +		return;
> +
> +	val = readl(pcie->parf + PARF_INT_ALL_5_STATUS);
> +	if (val & CFGPCIE_INT_ALL_5_STATUS_MASK)
> +		return;
> +
> +	dev_err(pci->dev, "PCIe Spurious Interrupt");
> +	pcie_fault[PCIE_SPURIOUS_INT]++;
> +	pcie->pcie_fault_total++;
> +	sysfs_notify(kobj_ref, NULL, "qcom_pcie_error_report");
> +}
> +
> +static irqreturn_t qcom_pcie_global_irq_thread(int irq, void *data)
> +{
> +	struct qcom_pcie *pcie = data;
> +	struct dw_pcie *pci = pcie->pci;
> +	u32 *pcie_fault = pcie->pcie_fault;
> +	struct device *dev = pci->dev;
> +	struct kobject *kobj_ref = &dev->kobj;
> +	unsigned long irqsave_flags;
> +	u32 val, int_status;
> +
> +	spin_lock_irqsave(&pcie->safety_lock, irqsave_flags);
> +
> +	int_status = readl(pci->dbi_base + DBI_SAFETY_STATUS);
> +	writel(0, pci->dbi_base + DBI_SAFETY_STATUS);
> +
> +	if (int_status) {
> +		dev_err(pci->dev, "global interrupt fired status: %u", int_status);
> +
> +		if (int_status & PCIE_RASDP_UNCORR_ERR) {
> +			dev_err(pci->dev, "RASDP uncorrectable error triggered");
> +			pcie_fault[RASDP_UNCORR_ERROR]++;
> +			pcie->pcie_fault_total++;
> +			sysfs_notify(kobj_ref, NULL, "qcom_pcie_error_report");
> +
> +			/*
> +			 * rasdp_uncorr_err ends up triggering a
> +			 * pcie_uncorr error continuously. So masking
> +			 * pcie_uncorr interrupts .
> +			 */
> +			val = readl(pci->dbi_base + DBI_SAFETY_MASK_OFF);
> +			val |= PCIE_AER_UNCORR_ERR;
> +			writel(val, pci->dbi_base + DBI_SAFETY_MASK_OFF);
> +		}
> +
> +		if (int_status & PCIE_CDM_CHK_ERR) {
> +			dev_err(pci->dev, "CDM error triggered");
> +			val = readl(pci->dbi_base + PCIE_PL_CHK_REG_CONTROL_STATUS);
> +
> +			if (val & PCIE_PL_CHK_REG_CHK_REG_COMPARISON_ERROR) {
> +				pcie_fault[CDM_REG_CHK_ERROR]++;
> +				pcie->pcie_fault_total++;
> +				sysfs_notify(kobj_ref, NULL, "qcom_pcie_error_report");
> +
> +				/*
> +				 * cdm_chk_err injection results in a continuous
> +				 * interrupt storm on certain targets, so masking it.
> +				 */
> +				val = readl(pci->dbi_base + DBI_SAFETY_MASK_OFF);
> +				val |= (PCIE_CDM_CHK_ERR | PCIE_AER_UNCORR_ERR);
> +				writel(val, pci->dbi_base + DBI_SAFETY_MASK_OFF);
> +			}
> +		}
> +
> +		if (int_status & PCIE_IFACE_TMR_ERR) {
> +			dev_err(pci->dev, "Iface Timeout error triggered");
> +			pcie_fault[INTERFACE_TIMER_ERROR]++;
> +			pcie->pcie_fault_total++;
> +			sysfs_notify(kobj_ref, NULL, "qcom_pcie_error_report");
> +
> +			/*
> +			 * interface_timer_err injection results in a continuous
> +			 * interrupt storm on certain targets, so masking it.
> +			 */
> +			val = readl(pci->dbi_base + DBI_SAFETY_MASK_OFF);
> +			val |= (PCIE_IFACE_TMR_ERR | PCIE_AER_UNCORR_ERR);
> +			writel(val, pci->dbi_base + DBI_SAFETY_MASK_OFF);
> +		}
> +
> +		if (int_status & PCIE_RASDP_CORR_ERR) {
> +			dev_err(pci->dev, "RASDP correctable error triggered");
> +			pcie_fault[RASDP_CORR_ERROR]++;
> +			pcie->pcie_fault_total++;
> +			sysfs_notify(kobj_ref, NULL, "qcom_pcie_error_report");
> +		}
> +	} else {
> +		qcom_pcie_check_spurious_int(pcie);
> +	}
> +
> +	spin_unlock_irqrestore(&pcie->safety_lock, irqsave_flags);
> +	return IRQ_HANDLED;
> +}
> +
>  static int qcom_pcie_link_transition_count(struct seq_file *s, void *data)
>  {
>  	struct qcom_pcie *pcie = (struct qcom_pcie *)dev_get_drvdata(s->private);
> @@ -1438,6 +1737,40 @@ static int qcom_pcie_link_transition_count(struct seq_file *s, void *data)
>  	return 0;
>  }
>
> +static ssize_t qcom_pcie_error_report_show(struct device *dev,
> +					   struct device_attribute *attr,
> +					   char *buf)
> +{
> +	unsigned int i;
> +	struct qcom_pcie *pcie = (struct qcom_pcie *)dev_get_drvdata(dev);
> +	u32 *pcie_fault = pcie->pcie_fault;
> +	size_t len = 0;
> +
> +	for (i = 0; i < MAX_PCIE_SAFETY_FAULT; i++) {
> +		if (pcie_fault_string[i])
> +			len += sysfs_emit_at(buf, len, "%s: %lu\n",
> +					     pcie_fault_string[i],
> +					     pcie_fault[i]);
> +	}
> +
> +	len += sysfs_emit_at(buf, len, "%s: %lu\n",
> +			     pcie_fault_string[i],
> +			     pcie->pcie_fault_total);
> +
> +	return len;
> +}
> +static DEVICE_ATTR_RO(qcom_pcie_error_report);
> +
> +static struct attribute *qcom_pcie_attrs[] = {
> +	&dev_attr_qcom_pcie_error_report.attr,
> +	NULL,
> +};
> +
> +static const struct attribute_group qcom_pcie_attribute_group = {
> +	.attrs = qcom_pcie_attrs,
> +	.name = "qcom_pcie"
> +};
> +
>  static void qcom_pcie_init_debugfs(struct qcom_pcie *pcie)
>  {
>  	struct dw_pcie *pci = pcie->pci;
> @@ -1496,6 +1829,21 @@ static int qcom_pcie_probe(struct platform_device *pdev)
>  		goto err_pm_runtime_put;
>  	}
>
> +	pcie->global_irq = platform_get_irq_byname(pdev, "global");
> +	if (pcie->global_irq < 0) {
> +		ret = pcie->global_irq;
> +		goto err_pm_runtime_put;
> +	}
> +
> +	ret = devm_request_threaded_irq(dev, pcie->global_irq, NULL,
> +					qcom_pcie_global_irq_thread,
> +					IRQF_ONESHOT,
> +					"global_irq", pcie);
> +	if (ret) {
> +		dev_err(dev, "Failed to request Global IRQ\n");
> +		goto err_pm_runtime_put;
> +	}
> +
>  	pcie->parf = devm_platform_ioremap_resource_byname(pdev, "parf");
>  	if (IS_ERR(pcie->parf)) {
>  		ret = PTR_ERR(pcie->parf);
> @@ -1551,6 +1899,8 @@ static int qcom_pcie_probe(struct platform_device *pdev)
>  	if (pcie->mhi)
>  		qcom_pcie_init_debugfs(pcie);
>
> +	sysfs_create_group(&pdev->dev.kobj, &qcom_pcie_attribute_group);
> +
>  	return 0;
>
>  err_phy_exit:
> --
> 2.40.1
>

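Frank's suggestion amounts to moving the DBI-only steps into a helper that any DesignWare-based driver could call. The sketch below models that in userspace, assuming the register offset and bits from the quoted patch; `fake_dbi`, `fake_readl()`, `fake_writel()` and `dw_model_enable_interface_timer()` are hypothetical names, and the array stands in for the mapped DBI region.

```c
#include <stdint.h>

/* Offset and bits copied from the quoted patch. */
#define PCIE_INTERFACE_TIMER_CONTROL	0x930
#define INTERFACE_TIMER_EN		(UINT32_C(1) << 0)
#define INTERFACE_TIMER_AER_EN		(UINT32_C(1) << 1)

/* Hypothetical stand-in for dbi_base: one 32-bit word per 4 bytes. */
struct fake_dbi {
	uint32_t regs[0x1000 / 4];
};

static uint32_t fake_readl(const struct fake_dbi *dbi, uint32_t off)
{
	return dbi->regs[off / 4];
}

static void fake_writel(struct fake_dbi *dbi, uint32_t off, uint32_t val)
{
	dbi->regs[off / 4] = val;
}

/*
 * Hypothetical common helper: read-modify-write that only touches DBI
 * space, so it needs nothing vendor-specific and could be shared.
 */
void dw_model_enable_interface_timer(struct fake_dbi *dbi)
{
	uint32_t val = fake_readl(dbi, PCIE_INTERFACE_TIMER_CONTROL);

	val |= INTERFACE_TIMER_EN | INTERFACE_TIMER_AER_EN;
	fake_writel(dbi, PCIE_INTERFACE_TIMER_CONTROL, val);
}
```

The PARF writes in the patch would stay in the qcom driver, since they have no DesignWare-generic equivalent.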
Hi root,

kernel test robot noticed the following build warnings:

[auto build test WARNING on pci/next]
[also build test WARNING on pci/for-linus robh/for-next mani-mhi/mhi-next linus/master v6.8-rc5 next-20240221]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/root/dt-bindings-PCI-qcom-Add-global-irq-support-for-SA8775p/20240221-220722
base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link: https://lore.kernel.org/r/20240221140405.28532-4-root%40hu-msarkar-hyd.qualcomm.com
patch subject: [PATCH v1 3/3] PCI: qcom: Add support for detecting controller level PCIe errors
config: microblaze-allmodconfig (https://download.01.org/0day-ci/archive/20240222/202402221838.5n7vo0Jo-lkp@intel.com/config)
compiler: microblaze-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240222/202402221838.5n7vo0Jo-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202402221838.5n7vo0Jo-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/pci/controller/dwc/pcie-qcom.c: In function 'qcom_pcie_error_report_show':
>> drivers/pci/controller/dwc/pcie-qcom.c:1751:63: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'u32' {aka 'unsigned int'} [-Wformat=]
    1751 |                         len += sysfs_emit_at(buf, len, "%s: %lu\n",
         |                                                             ~~^
         |                                                               |
         |                                                               long unsigned int
         |                                                             %u
    1752 |                                              pcie_fault_string[i],
    1753 |                                              pcie_fault[i]);
         |                                              ~~~~~~~~~~~~~
         |                                              |
         |                                              u32 {aka unsigned int}
   drivers/pci/controller/dwc/pcie-qcom.c:1756:47: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'u32' {aka 'unsigned int'} [-Wformat=]
    1756 |         len += sysfs_emit_at(buf, len, "%s: %lu\n",
         |                                             ~~^
         |                                               |
         |                                               long unsigned int
         |                                             %u
    1757 |                              pcie_fault_string[i],
    1758 |                              pcie->pcie_fault_total);
         |                              ~~~~~~~~~~~~~~~~~~~~~~
         |                              |
         |                              u32 {aka unsigned int}
   drivers/pci/controller/dwc/pcie-qcom.c: In function 'qcom_pcie_probe':
>> drivers/pci/controller/dwc/pcie-qcom.c:1902:9: warning: ignoring return value of 'sysfs_create_group' declared with attribute 'warn_unused_result' [-Wunused-result]
    1902 |         sysfs_create_group(&pdev->dev.kobj, &qcom_pcie_attribute_group);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vim +1751 drivers/pci/controller/dwc/pcie-qcom.c

  1739
  1740	static ssize_t qcom_pcie_error_report_show(struct device *dev,
  1741						   struct device_attribute *attr,
  1742						   char *buf)
  1743	{
  1744		unsigned int i;
  1745		struct qcom_pcie *pcie = (struct qcom_pcie *)dev_get_drvdata(dev);
  1746		u32 *pcie_fault = pcie->pcie_fault;
  1747		size_t len = 0;
  1748
  1749		for (i = 0; i < MAX_PCIE_SAFETY_FAULT; i++) {
  1750			if (pcie_fault_string[i])
> 1751				len += sysfs_emit_at(buf, len, "%s: %lu\n",
  1752						     pcie_fault_string[i],
  1753						     pcie_fault[i]);
  1754		}
  1755
  1756		len += sysfs_emit_at(buf, len, "%s: %lu\n",
  1757				     pcie_fault_string[i],
  1758				     pcie->pcie_fault_total);
  1759
  1760		return len;
  1761	}
  1762	static DEVICE_ATTR_RO(qcom_pcie_error_report);
  1763
  1764	static struct attribute *qcom_pcie_attrs[] = {
  1765		&dev_attr_qcom_pcie_error_report.attr,
  1766		NULL,
  1767	};
  1768
  1769	static const struct attribute_group qcom_pcie_attribute_group = {
  1770		.attrs = qcom_pcie_attrs,
  1771		.name = "qcom_pcie"
  1772	};
  1773
  1774	static void qcom_pcie_init_debugfs(struct qcom_pcie *pcie)
  1775	{
  1776		struct dw_pcie *pci = pcie->pci;
  1777		struct device *dev = pci->dev;
  1778		char *name;
  1779
  1780		name = devm_kasprintf(dev, GFP_KERNEL, "%pOFP", dev->of_node);
  1781		if (!name)
  1782			return;
  1783
  1784		pcie->debugfs = debugfs_create_dir(name, NULL);
  1785		debugfs_create_devm_seqfile(dev, "link_transition_count", pcie->debugfs,
  1786					    qcom_pcie_link_transition_count);
  1787	}
  1788
  1789	static int qcom_pcie_probe(struct platform_device *pdev)
  1790	{
  1791		const struct qcom_pcie_cfg *pcie_cfg;
  1792		struct device *dev = &pdev->dev;
  1793		struct qcom_pcie *pcie;
  1794		struct dw_pcie_rp *pp;
  1795		struct resource *res;
  1796		struct dw_pcie *pci;
  1797		int ret;
  1798
  1799		pcie_cfg = of_device_get_match_data(dev);
  1800		if (!pcie_cfg || !pcie_cfg->ops) {
  1801			dev_err(dev, "Invalid platform data\n");
  1802			return -EINVAL;
  1803		}
  1804
  1805		pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
  1806		if (!pcie)
  1807			return -ENOMEM;
  1808
  1809		pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
  1810		if (!pci)
  1811			return -ENOMEM;
  1812
  1813		pm_runtime_enable(dev);
  1814		ret = pm_runtime_get_sync(dev);
  1815		if (ret < 0)
  1816			goto err_pm_runtime_put;
  1817
  1818		pci->dev = dev;
  1819		pci->ops = &dw_pcie_ops;
  1820		pp = &pci->pp;
  1821
  1822		pcie->pci = pci;
  1823
  1824		pcie->cfg = pcie_cfg;
  1825
  1826		pcie->reset = devm_gpiod_get_optional(dev, "perst", GPIOD_OUT_HIGH);
  1827		if (IS_ERR(pcie->reset)) {
  1828			ret = PTR_ERR(pcie->reset);
  1829			goto err_pm_runtime_put;
  1830		}
  1831
  1832		pcie->global_irq = platform_get_irq_byname(pdev, "global");
  1833		if (pcie->global_irq < 0) {
  1834			ret = pcie->global_irq;
  1835			goto err_pm_runtime_put;
  1836		}
  1837
  1838		ret = devm_request_threaded_irq(dev, pcie->global_irq, NULL,
  1839						qcom_pcie_global_irq_thread,
  1840						IRQF_ONESHOT,
  1841						"global_irq", pcie);
  1842		if (ret) {
  1843			dev_err(dev, "Failed to request Global IRQ\n");
  1844			goto err_pm_runtime_put;
  1845		}
  1846
  1847		pcie->parf = devm_platform_ioremap_resource_byname(pdev, "parf");
  1848		if (IS_ERR(pcie->parf)) {
  1849			ret = PTR_ERR(pcie->parf);
  1850			goto err_pm_runtime_put;
  1851		}
  1852
  1853		pcie->elbi = devm_platform_ioremap_resource_byname(pdev, "elbi");
  1854		if (IS_ERR(pcie->elbi)) {
  1855			ret = PTR_ERR(pcie->elbi);
  1856			goto err_pm_runtime_put;
  1857		}
  1858
  1859		/* MHI region is optional */
  1860		res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "mhi");
  1861		if (res) {
  1862			pcie->mhi = devm_ioremap_resource(dev, res);
  1863			if (IS_ERR(pcie->mhi)) {
  1864				ret = PTR_ERR(pcie->mhi);
  1865				goto err_pm_runtime_put;
  1866			}
  1867		}
  1868
  1869		pcie->phy = devm_phy_optional_get(dev, "pciephy");
  1870		if (IS_ERR(pcie->phy)) {
  1871			ret = PTR_ERR(pcie->phy);
  1872			goto err_pm_runtime_put;
  1873		}
  1874
  1875		ret = qcom_pcie_icc_init(pcie);
  1876		if (ret)
  1877			goto err_pm_runtime_put;
  1878
  1879		ret = pcie->cfg->ops->get_resources(pcie);
  1880		if (ret)
  1881			goto err_pm_runtime_put;
  1882
  1883		pp->ops = &qcom_pcie_dw_ops;
  1884
  1885		ret = phy_init(pcie->phy);
  1886		if (ret)
  1887			goto err_pm_runtime_put;
  1888
  1889		platform_set_drvdata(pdev, pcie);
  1890
  1891		ret = dw_pcie_host_init(pp);
  1892		if (ret) {
  1893			dev_err(dev, "cannot initialize host\n");
  1894			goto err_phy_exit;
  1895		}
  1896
  1897		qcom_pcie_icc_update(pcie);
  1898
  1899		if (pcie->mhi)
  1900			qcom_pcie_init_debugfs(pcie);
  1901
> 1902		sysfs_create_group(&pdev->dev.kobj, &qcom_pcie_attribute_group);
  1903
  1904		return 0;
  1905
  1906	err_phy_exit:
  1907		phy_exit(pcie->phy);
  1908	err_pm_runtime_put:
  1909		pm_runtime_put(dev);
  1910		pm_runtime_disable(dev);
  1911
  1912		return ret;
  1913	}
  1914

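The format warnings above come down to printing a u32 with %lu. The sketch below is a userspace model of the show routine with the specifier changed to %u, assuming `snprintf()` as a stand-in for `sysfs_emit_at()`; `emit_line()` and `format_fault_report()` are hypothetical names, and the labels are copied from `pcie_fault_string[]` in the patch.

```c
#include <stddef.h>
#include <stdio.h>

typedef unsigned int u32;	/* same underlying type as the kernel's u32 */

enum { FAULT_KINDS = 5 };

/* Labels copied from pcie_fault_string[] in the quoted patch. */
static const char *const fault_names[FAULT_KINDS + 1] = {
	"RASDP_Uncorr_Error",
	"RASDP_Corr_Error",
	"CDM_Reg_Chk_Error",
	"Interface_Timer_Error",
	"PCIe_Spurious_Interrupt",
	"TOTAL_PCIE_FAULTS",
};

/* Appends one "name: count" line at buf + len; returns the new length. */
static size_t emit_line(char *buf, size_t size, size_t len,
			const char *name, u32 count)
{
	int n;

	if (len >= size)
		return len;

	/* %u matches unsigned int; %lu would expect unsigned long. */
	n = snprintf(buf + len, size - len, "%s: %u\n", name, count);
	if (n < 0 || (size_t)n >= size - len) {
		buf[len] = '\0';	/* drop a line that did not fit */
		return len;
	}
	return len + (size_t)n;
}

/* Formats the per-fault counters followed by the total. */
size_t format_fault_report(char *buf, size_t size, const u32 *counts, u32 total)
{
	size_t len = 0;
	int i;

	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < FAULT_KINDS; i++)
		len = emit_line(buf, size, len, fault_names[i], counts[i]);

	return emit_line(buf, size, len, fault_names[FAULT_KINDS], total);
}
```

The -Wunused-result warning is a separate matter: the return value of sysfs_create_group() would need to be checked in the probe path.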
On Wed, Feb 21, 2024 at 12:50:17PM -0600, Bjorn Helgaas wrote:
> On Wed, Feb 21, 2024 at 07:34:04PM +0530, root wrote:
> > From: Nitesh Gupta <nitegupt@quicinc.com>
> >
> > Synopsys Controllers provide capabilities to detect various controller
> > level errors. These can range from controller interface error to random
> > PCIe configuration errors. This patch intends to add support to detect
> > these errors and report it to userspace entity via sysfs, which can take
> > appropriate actions to mitigate the errors.

> > +static void qcom_pcie_enable_error_reporting_2_7_0(struct qcom_pcie *pcie)
> > +{
> > +	...
>
> > +	val = readl(pci->dbi_base + DBI_DEVICE_CONTROL_DEVICE_STATUS);
> > +	val |= (PCIE_CAP_CORR_ERR_REPORT_EN | PCIE_CAP_NON_FATAL_ERR_REPORT_EN |
> > +		PCIE_CAP_FATAL_ERR_REPORT_EN | PCIE_CAP_UNSUPPORT_REQ_REP_EN);
> > +	writel(val, pci->dbi_base + DBI_DEVICE_CONTROL_DEVICE_STATUS);
>
> Is there any way to split the AER part (specified by the PCIe spec)
> from the qcom-specific (or dwc-specific) part? This looks an awful
> lot like pci_enable_pcie_error_reporting(), and we should do this in
> the PCI core in a generic way if possible.
>
> > +	val = readl(pci->dbi_base + DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG);
> > +	val |= (PCIE_CAP_SYS_ERR_ON_CORR_ERR_EN | PCIE_CAP_SYS_ERR_ON_NON_FATAL_ERR_EN |
> > +		PCIE_CAP_SYS_ERR_ON_FATAL_ERR_EN);
> > +	writel(val, pci->dbi_base + DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG);

More to the point: why do we need to do this in the qcom driver at
all? Why is pci_enable_pcie_error_reporting() not enough?

Bjorn

Hi root, kernel test robot noticed the following build warnings: [auto build test WARNING on pci/next] [also build test WARNING on pci/for-linus robh/for-next mani-mhi/mhi-next linus/master v6.8-rc5 next-20240221] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/root/dt-bindings-PCI-qcom-Add-global-irq-support-for-SA8775p/20240221-220722 base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next patch link: https://lore.kernel.org/r/20240221140405.28532-4-root%40hu-msarkar-hyd.qualcomm.com patch subject: [PATCH v1 3/3] PCI: qcom: Add support for detecting controller level PCIe errors config: i386-buildonly-randconfig-005-20240222 (https://download.01.org/0day-ci/archive/20240222/202402222044.JXTN6nr0-lkp@intel.com/config) compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240222/202402222044.JXTN6nr0-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202402222044.JXTN6nr0-lkp@intel.com/ All warnings (new ones prefixed by >>): >> drivers/pci/controller/dwc/pcie-qcom.c:1753:6: warning: format specifies type 'unsigned long' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] 1751 | len += sysfs_emit_at(buf, len, "%s: %lu\n", | ~~~ | %u 1752 | pcie_fault_string[i], 1753 | pcie_fault[i]); | ^~~~~~~~~~~~~ drivers/pci/controller/dwc/pcie-qcom.c:1758:6: warning: format specifies type 'unsigned long' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] 1756 | len += sysfs_emit_at(buf, len, "%s: %lu\n", | ~~~ | %u 1757 | pcie_fault_string[i], 1758 | pcie->pcie_fault_total); | ^~~~~~~~~~~~~~~~~~~~~~ >> drivers/pci/controller/dwc/pcie-qcom.c:1902:2: warning: ignoring return value of function declared with 'warn_unused_result' attribute [-Wunused-result] 1902 | sysfs_create_group(&pdev->dev.kobj, &qcom_pcie_attribute_group); | ^~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 3 warnings generated. 
vim +1753 drivers/pci/controller/dwc/pcie-qcom.c

  1739
  1740	static ssize_t qcom_pcie_error_report_show(struct device *dev,
  1741						   struct device_attribute *attr,
  1742						   char *buf)
  1743	{
  1744		unsigned int i;
  1745		struct qcom_pcie *pcie = (struct qcom_pcie *)dev_get_drvdata(dev);
  1746		u32 *pcie_fault = pcie->pcie_fault;
  1747		size_t len = 0;
  1748
  1749		for (i = 0; i < MAX_PCIE_SAFETY_FAULT; i++) {
  1750			if (pcie_fault_string[i])
  1751				len += sysfs_emit_at(buf, len, "%s: %lu\n",
  1752						pcie_fault_string[i],
> 1753						pcie_fault[i]);
  1754		}
  1755
  1756		len += sysfs_emit_at(buf, len, "%s: %lu\n",
  1757						pcie_fault_string[i],
> 1758						pcie->pcie_fault_total);
  1759
  1760		return len;
  1761	}
  1762	static DEVICE_ATTR_RO(qcom_pcie_error_report);
  1763
  1764	static struct attribute *qcom_pcie_attrs[] = {
  1765		&dev_attr_qcom_pcie_error_report.attr,
  1766		NULL,
  1767	};
  1768
  1769	static const struct attribute_group qcom_pcie_attribute_group = {
  1770		.attrs = qcom_pcie_attrs,
  1771		.name = "qcom_pcie"
  1772	};
  1773
  1774	static void qcom_pcie_init_debugfs(struct qcom_pcie *pcie)
  1775	{
  1776		struct dw_pcie *pci = pcie->pci;
  1777		struct device *dev = pci->dev;
  1778		char *name;
  1779
  1780		name = devm_kasprintf(dev, GFP_KERNEL, "%pOFP", dev->of_node);
  1781		if (!name)
  1782			return;
  1783
  1784		pcie->debugfs = debugfs_create_dir(name, NULL);
  1785		debugfs_create_devm_seqfile(dev, "link_transition_count", pcie->debugfs,
  1786					    qcom_pcie_link_transition_count);
  1787	}
  1788
  1789	static int qcom_pcie_probe(struct platform_device *pdev)
  1790	{
  1791		const struct qcom_pcie_cfg *pcie_cfg;
  1792		struct device *dev = &pdev->dev;
  1793		struct qcom_pcie *pcie;
  1794		struct dw_pcie_rp *pp;
  1795		struct resource *res;
  1796		struct dw_pcie *pci;
  1797		int ret;
  1798
  1799		pcie_cfg = of_device_get_match_data(dev);
  1800		if (!pcie_cfg || !pcie_cfg->ops) {
  1801			dev_err(dev, "Invalid platform data\n");
  1802			return -EINVAL;
  1803		}
  1804
  1805		pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
  1806		if (!pcie)
  1807			return -ENOMEM;
  1808
  1809		pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
  1810		if (!pci)
  1811			return -ENOMEM;
  1812
  1813		pm_runtime_enable(dev);
  1814		ret = pm_runtime_get_sync(dev);
  1815		if (ret < 0)
  1816			goto err_pm_runtime_put;
  1817
  1818		pci->dev = dev;
  1819		pci->ops = &dw_pcie_ops;
  1820		pp = &pci->pp;
  1821
  1822		pcie->pci = pci;
  1823
  1824		pcie->cfg = pcie_cfg;
  1825
  1826		pcie->reset = devm_gpiod_get_optional(dev, "perst", GPIOD_OUT_HIGH);
  1827		if (IS_ERR(pcie->reset)) {
  1828			ret = PTR_ERR(pcie->reset);
  1829			goto err_pm_runtime_put;
  1830		}
  1831
  1832		pcie->global_irq = platform_get_irq_byname(pdev, "global");
  1833		if (pcie->global_irq < 0) {
  1834			ret = pcie->global_irq;
  1835			goto err_pm_runtime_put;
  1836		}
  1837
  1838		ret = devm_request_threaded_irq(dev, pcie->global_irq, NULL,
  1839						qcom_pcie_global_irq_thread,
  1840						IRQF_ONESHOT,
  1841						"global_irq", pcie);
  1842		if (ret) {
  1843			dev_err(dev, "Failed to request Global IRQ\n");
  1844			goto err_pm_runtime_put;
  1845		}
  1846
  1847		pcie->parf = devm_platform_ioremap_resource_byname(pdev, "parf");
  1848		if (IS_ERR(pcie->parf)) {
  1849			ret = PTR_ERR(pcie->parf);
  1850			goto err_pm_runtime_put;
  1851		}
  1852
  1853		pcie->elbi = devm_platform_ioremap_resource_byname(pdev, "elbi");
  1854		if (IS_ERR(pcie->elbi)) {
  1855			ret = PTR_ERR(pcie->elbi);
  1856			goto err_pm_runtime_put;
  1857		}
  1858
  1859		/* MHI region is optional */
  1860		res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "mhi");
  1861		if (res) {
  1862			pcie->mhi = devm_ioremap_resource(dev, res);
  1863			if (IS_ERR(pcie->mhi)) {
  1864				ret = PTR_ERR(pcie->mhi);
  1865				goto err_pm_runtime_put;
  1866			}
  1867		}
  1868
  1869		pcie->phy = devm_phy_optional_get(dev, "pciephy");
  1870		if (IS_ERR(pcie->phy)) {
  1871			ret = PTR_ERR(pcie->phy);
  1872			goto err_pm_runtime_put;
  1873		}
  1874
  1875		ret = qcom_pcie_icc_init(pcie);
  1876		if (ret)
  1877			goto err_pm_runtime_put;
  1878
  1879		ret = pcie->cfg->ops->get_resources(pcie);
  1880		if (ret)
  1881			goto err_pm_runtime_put;
  1882
  1883		pp->ops = &qcom_pcie_dw_ops;
  1884
  1885		ret = phy_init(pcie->phy);
  1886		if (ret)
  1887			goto err_pm_runtime_put;
  1888
  1889		platform_set_drvdata(pdev, pcie);
  1890
  1891		ret = dw_pcie_host_init(pp);
  1892		if (ret) {
  1893			dev_err(dev, "cannot initialize host\n");
  1894			goto err_phy_exit;
  1895		}
  1896
  1897		qcom_pcie_icc_update(pcie);
  1898
  1899		if (pcie->mhi)
  1900			qcom_pcie_init_debugfs(pcie);
  1901
> 1902		sysfs_create_group(&pdev->dev.kobj, &qcom_pcie_attribute_group);
  1903
  1904		return 0;
  1905
  1906	err_phy_exit:
  1907		phy_exit(pcie->phy);
  1908	err_pm_runtime_put:
  1909		pm_runtime_put(dev);
  1910		pm_runtime_disable(dev);
  1911
  1912		return ret;
  1913	}
  1914

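
The -Wunused-result warning at line 1902 is about a return value that is dropped at the end of probe. A minimal userspace sketch of the usual shape of a fix follows: consume the result and unwind on failure. Every name in it (`fake_dev`, `fake_create_group()`, `fake_host_init()`, `fake_host_deinit()`, `fake_probe()`) is hypothetical and only models the control flow; none of it is kernel API.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state, used only to observe the unwind path. */
struct fake_dev {
	bool group_fails;	/* test knob: force group creation to fail */
	bool host_up;
	bool group_up;
};

/* Marked like sysfs_create_group(): callers must look at the result. */
__attribute__((warn_unused_result))
static int fake_create_group(struct fake_dev *dev)
{
	if (dev->group_fails)
		return -ENOMEM;
	dev->group_up = true;
	return 0;
}

static int fake_host_init(struct fake_dev *dev)
{
	dev->host_up = true;
	return 0;
}

static void fake_host_deinit(struct fake_dev *dev)
{
	dev->host_up = false;
}

static int fake_probe(struct fake_dev *dev)
{
	int ret;

	ret = fake_host_init(dev);
	if (ret)
		return ret;

	/* Keep the return value and undo earlier setup if it is an error. */
	ret = fake_create_group(dev);
	if (ret)
		goto err_host_deinit;

	return 0;

err_host_deinit:
	fake_host_deinit(dev);
	return ret;
}
```

In the driver the same pattern would mean assigning the result of `sysfs_create_group()` to `ret` and jumping to an unwind label. Which label is appropriate there, and what else needs undoing after `dw_pcie_host_init()` has succeeded, is left open in this sketch.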
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index 26dae4837462..cd45f9a2f9bc 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -223,6 +223,32 @@
 
 #define PCIE_RAS_DES_EVENT_COUNTER_DATA	0xc
 
+
+/*
+ * Error Reporting DBI register
+ */
+#define DBI_DEVICE_CONTROL_DEVICE_STATUS	0x78
+#define DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG	0x8c
+#define DBI_INTERFACE_TIMER_STATUS	0x938
+#define DBI_SAFETY_MASK_OFF	0x960
+#define DBI_SAFETY_STATUS	0x964
+
+#define DBI_ADV_ERR_CAP_CTRL_OFF	0x18
+#define DBI_ROOT_ERR_CMD_OFF	0x2c
+
+/*
+ * RAS-DP register
+ */
+#define PCIE_RASDP_ERROR_MODE_EN_REG	0x28
+#define RASDP_ERROR_MODE_EN	BIT(0)
+
+/*
+ * Interface Timer register
+ */
+#define PCIE_INTERFACE_TIMER_CONTROL	0x930
+#define INTERFACE_TIMER_EN	BIT(0)
+#define INTERFACE_TIMER_AER_EN	BIT(1)
+
 /*
  * The default address offset between dbi_base and atu_base. Root controller
  * drivers are not required to initialize atu_base if the offset matches this
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 10f2d0bb86be..138e3b08d4b9 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -20,6 +20,7 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/of.h>
+#include <linux/device.h>
 #include <linux/of_gpio.h>
 #include <linux/pci.h>
 #include <linux/pm_runtime.h>
@@ -68,6 +69,73 @@
 #define PARF_DEBUG_CNT_AUX_CLK_IN_L1SUB_L1	0xc84
 #define PARF_DEBUG_CNT_AUX_CLK_IN_L1SUB_L2	0xc88
 
+/* Error Reporting Parf Registers */
+#define PARF_INT_ALL_STATUS	0x224
+#define PARF_INT_ALL_CLEAR	0x228
+#define PARF_INT_CLEAR	0x21c
+#define PARF_INT_STATUS	0x220
+#define PARF_INT_ALL_MASK	0x22c
+#define PARF_INT_ALL_2_CLEAR	0x504
+#define PARF_INT_ALL_2_STATUS	0x500
+#define PARF_INT_ALL_3_CLEAR	0x2e14
+#define PARF_INT_ALL_3_STATUS	0x2e10
+#define PARF_INT_ALL_4_CLEAR	0x2dd8
+#define PARF_INT_ALL_4_STATUS	0x2dd0
+#define PARF_INT_ALL_5_CLEAR	0x2ddc
+#define PARF_INT_ALL_5_STATUS	0x2dd4
+#define PARF_CFG_SAFETY_INT_MASK_CTRL	0x2c60
+
+
+#define PCIE_AER_EXT_CAP_ID	0x01
+#define PCI_EXT_CAP_RASDP_ID	0x0b
+
+/* Interrupt Masks */
+#define CFGPCIE_INT_ALL_STATUS_MASK	0x3ff3e
+#define CFGPCIE_PARF_INT_STATUS_MASK	0x1b
+#define CFGPCIE_INTERFACE_TIMER_STATUS_MASK	0xe7b
+#define CFGPCIE_INT_ALL_2_STATUS_MASK	GENMASK(24, 0)
+#define CFGPCIE_INT_ALL_3_STATUS_MASK	GENMASK(31, 0)
+#define CFGPCIE_INT_ALL_4_STATUS_MASK	GENMASK(31, 0)
+#define CFGPCIE_INT_ALL_5_STATUS_MASK	GENMASK(31, 0)
+
+/* PCI_INTERRUPT_LINE register field */
+#define SERR_EN	BIT(17)
+
+/* DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG register fields */
+#define PCIE_CAP_SYS_ERR_ON_CORR_ERR_EN	BIT(0)
+#define PCIE_CAP_SYS_ERR_ON_NON_FATAL_ERR_EN	BIT(1)
+#define PCIE_CAP_SYS_ERR_ON_FATAL_ERR_EN	BIT(2)
+
+/* DBI_DEVICE_CONTROL_DEVICE_STATUS register fields */
+#define PCIE_CAP_UNSUPPORT_REQ_REP_EN	BIT(3)
+#define PCIE_CAP_FATAL_ERR_REPORT_EN	BIT(2)
+#define PCIE_CAP_NON_FATAL_ERR_REPORT_EN	BIT(1)
+#define PCIE_CAP_CORR_ERR_REPORT_EN	BIT(0)
+
+/* PARF_CFG_SAFETY_INT_MASK_CTRL register fields */
+#define CFG_SAFETY_UNCORR_INT_MASK	BIT(0)
+#define CFG_SAFETY_CORR_INT_MASK	BIT(1)
+
+/* DBI_ADV_ERR_CAP_CTRL_OFF register fields */
+#define ECRC_GEN_EN	BIT(6)
+#define ECRC_CHECK_EN	BIT(8)
+
+/* DBI_ROOT_ERR_CMD_OFF register fields */
+#define CORR_ERR_REPORTING_EN	BIT(0)
+#define NON_FATAL_ERR_REPORTING_EN	BIT(1)
+#define FATAL_ERR_REPORTING_EN	BIT(2)
+
+/* DBI_SAFETY_MASK_OFF register fields */
+#define SAFETY_INT_MASK	GENMASK(5, 0)
+
+/* DBI_SAFETY_STATUS register fields */
+#define PCIE_RASDP_UNCORR_ERR	BIT(0)
+#define PCIE_IFACE_TMR_ERR	BIT(1)
+#define PCIE_CDM_CHK_ERR	BIT(2)
+#define PCIE_AER_UNCORR_ERR	BIT(3)
+#define PCIE_AER_CORR_ERR	BIT(4)
+#define PCIE_RASDP_CORR_ERR	BIT(5)
+
 /* PARF_SYS_CTRL register fields */
 #define MAC_PHY_POWERDOWN_IN_P2_D_MUX_EN	BIT(29)
 #define MST_WAKEUP_EN	BIT(13)
@@ -231,6 +299,24 @@ struct qcom_pcie_cfg {
 	const struct qcom_pcie_ops *ops;
 };
 
+enum qcom_pcie_fault_code {
+	RASDP_UNCORR_ERROR,	/* RASDP uncorrectable error */
+	RASDP_CORR_ERROR,	/* RASDP correctable error */
+	CDM_REG_CHK_ERROR,	/* CDM register check error */
+	INTERFACE_TIMER_ERROR,	/* PCIe local bus interface timer error */
+	PCIE_SPURIOUS_INT,	/* Spurious Interrupt received */
+	MAX_PCIE_SAFETY_FAULT	/* Maximum PCIe fault source code supported */
+};
+
+static const char * const pcie_fault_string[] = {
+	"RASDP_Uncorr_Error",
+	"RASDP_Corr_Error",
+	"CDM_Reg_Chk_Error",
+	"Interface_Timer_Error",
+	"PCIe_Spurious_Interrupt",
+	"TOTAL_PCIE_FAULTS",
+};
+
 struct qcom_pcie {
 	struct dw_pcie *pci;
 	void __iomem *parf;	/* DT parf */
@@ -243,6 +329,10 @@ struct qcom_pcie {
 	const struct qcom_pcie_cfg *cfg;
 	struct dentry *debugfs;
 	bool suspended;
+	int global_irq;
+	spinlock_t safety_lock;
+	u32 pcie_fault[MAX_PCIE_SAFETY_FAULT];
+	u32 pcie_fault_total;
 };
 
 #define to_qcom_pcie(x)	dev_get_drvdata((x)->dev)
@@ -959,9 +1049,94 @@ static int qcom_pcie_init_2_7_0(struct qcom_pcie *pcie)
 	return ret;
 }
 
+static void qcom_pcie_enable_error_reporting_2_7_0(struct qcom_pcie *pcie)
+{
+	struct dw_pcie *pci = pcie->pci;
+	u32 val, offset;
+
+	/* Clear all the interrupts before we enable it */
+	writel(0, pci->dbi_base + DBI_SAFETY_STATUS);
+	writel(0, pci->dbi_base + DBI_INTERFACE_TIMER_STATUS);
+
+	/* Enable interrupts which are aggregated using GLOBAL_INT */
+	writel(CFGPCIE_INT_ALL_STATUS_MASK, pcie->parf + PARF_INT_ALL_CLEAR);
+	writel(CFGPCIE_PARF_INT_STATUS_MASK, pcie->parf + PARF_INT_CLEAR);
+	writel(CFGPCIE_INT_ALL_2_STATUS_MASK, pcie->parf + PARF_INT_ALL_2_CLEAR);
+	writel(CFGPCIE_INT_ALL_3_STATUS_MASK, pcie->parf + PARF_INT_ALL_3_CLEAR);
+	writel(CFGPCIE_INT_ALL_4_STATUS_MASK, pcie->parf + PARF_INT_ALL_4_CLEAR);
+	writel(CFGPCIE_INT_ALL_5_STATUS_MASK, pcie->parf + PARF_INT_ALL_5_CLEAR);
+
+	dw_pcie_dbi_ro_wr_en(pci);
+
+	val = readl(pci->dbi_base + PCI_INTERRUPT_LINE);
+	val |= SERR_EN;
+	writel(val, pci->dbi_base + PCI_INTERRUPT_LINE);
+
+	val = readl(pci->dbi_base + DBI_DEVICE_CONTROL_DEVICE_STATUS);
+	val |= (PCIE_CAP_CORR_ERR_REPORT_EN | PCIE_CAP_NON_FATAL_ERR_REPORT_EN |
+		PCIE_CAP_FATAL_ERR_REPORT_EN | PCIE_CAP_UNSUPPORT_REQ_REP_EN);
+	writel(val, pci->dbi_base + DBI_DEVICE_CONTROL_DEVICE_STATUS);
+
+	val = readl(pci->dbi_base + DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG);
+	val |= (PCIE_CAP_SYS_ERR_ON_CORR_ERR_EN | PCIE_CAP_SYS_ERR_ON_NON_FATAL_ERR_EN |
+		PCIE_CAP_SYS_ERR_ON_FATAL_ERR_EN);
+	writel(val, pci->dbi_base + DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG);
+
+	dw_pcie_dbi_ro_wr_dis(pci);
+
+	/* Enable RAS-DP Interrupts */
+	offset = dw_pcie_find_ext_capability(pci, PCI_EXT_CAP_RASDP_ID);
+	val = readl(pci->dbi_base + offset + PCIE_RASDP_ERROR_MODE_EN_REG);
+	val |= RASDP_ERROR_MODE_EN;
+	writel(val, pci->dbi_base + PCIE_RASDP_ERROR_MODE_EN_REG);
+
+	/* Enable CDM Check */
+	val = readl(pci->dbi_base + PCIE_PL_CHK_REG_CONTROL_STATUS);
+	/* Enable continuous CMD register check mode */
+	val |= PCIE_PL_CHK_REG_CHK_REG_CONTINUOUS;
+	/* Start the CDM register check */
+	val |= PCIE_PL_CHK_REG_CHK_REG_START;
+	/* Enable comparison CDM register check mode */
+	val |= PCIE_PL_CHK_REG_CHK_REG_COMPARISON_ERROR;
+	/* Enable logic CDM register check mode */
+	val |= PCIE_PL_CHK_REG_CHK_REG_LOGIC_ERROR;
+	writel(val, pci->dbi_base + PCIE_PL_CHK_REG_CONTROL_STATUS);
+
+	/* Interface Timer Enable */
+	val = readl(pci->dbi_base + PCIE_INTERFACE_TIMER_CONTROL);
+	val |= (INTERFACE_TIMER_EN | INTERFACE_TIMER_AER_EN);
+	writel(val, pci->dbi_base + PCIE_INTERFACE_TIMER_CONTROL);
+
+	/* Enable safety correctable and uncorrectable error reporting */
+	val = readl(pcie->parf + PARF_CFG_SAFETY_INT_MASK_CTRL);
+	val |= (CFG_SAFETY_UNCORR_INT_MASK | CFG_SAFETY_CORR_INT_MASK);
+	writel(val, pcie->parf + PARF_CFG_SAFETY_INT_MASK_CTRL);
+
+	/* Enable CRC check and generation */
+	offset = dw_pcie_find_ext_capability(pci, PCIE_AER_EXT_CAP_ID);
+	val = readl(pci->dbi_base + offset + DBI_ADV_ERR_CAP_CTRL_OFF);
+	val |= (ECRC_GEN_EN | ECRC_CHECK_EN);
+	writel(val, pci->dbi_base + offset + DBI_ADV_ERR_CAP_CTRL_OFF);
+
+	/* Enable AER */
+	val = readl(pci->dbi_base + offset + DBI_ROOT_ERR_CMD_OFF);
+	val |= (CORR_ERR_REPORTING_EN | NON_FATAL_ERR_REPORTING_EN
+		| FATAL_ERR_REPORTING_EN);
+	writel(val, pci->dbi_base + offset + DBI_ROOT_ERR_CMD_OFF);
+
+	/* Enable interrupts */
+	val = readl(pci->dbi_base + DBI_SAFETY_MASK_OFF);
+	val &= ~(SAFETY_INT_MASK);
+	writel(val, pci->dbi_base + DBI_SAFETY_MASK_OFF);
+
+	/* Disable Legacy Interrupts */
+	writel(0, pcie->parf + PARF_INT_ALL_MASK);
+}
+
 static int qcom_pcie_post_init_2_7_0(struct qcom_pcie *pcie)
 {
 	qcom_pcie_clear_hpc(pcie->pci);
+	qcom_pcie_enable_error_reporting_2_7_0(pcie);
 
 	return 0;
 }
@@ -1416,6 +1591,130 @@ static void qcom_pcie_icc_update(struct qcom_pcie *pcie)
 	}
 }
 
+static void qcom_pcie_check_spurious_int(struct qcom_pcie *pcie)
+{
+	struct dw_pcie *pci = pcie->pci;
+	u32 *pcie_fault = pcie->pcie_fault;
+	struct device *dev = pci->dev;
+	struct kobject *kobj_ref = &dev->kobj;
+	u32 val;
+
+	val = readl(pci->dbi_base + DBI_INTERFACE_TIMER_STATUS);
+	if (val & CFGPCIE_INTERFACE_TIMER_STATUS_MASK)
+		return;
+
+	val = readl(pcie->parf + PARF_INT_ALL_STATUS);
+	if (val & CFGPCIE_INT_ALL_STATUS_MASK)
+		return;
+
+	val = readl(pcie->parf + PARF_INT_STATUS);
+	if (val & CFGPCIE_PARF_INT_STATUS_MASK)
+		return;
+
+	val = readl(pcie->parf + PARF_INT_ALL_2_STATUS);
+	if (val & CFGPCIE_INT_ALL_2_STATUS_MASK)
+		return;
+
+	val = readl(pcie->parf + PARF_INT_ALL_3_STATUS);
+	if (val & CFGPCIE_INT_ALL_3_STATUS_MASK)
+		return;
+
+	val = readl(pcie->parf + PARF_INT_ALL_4_STATUS);
+	if (val & CFGPCIE_INT_ALL_4_STATUS_MASK)
+		return;
+
+	val = readl(pcie->parf + PARF_INT_ALL_5_STATUS);
+	if (val & CFGPCIE_INT_ALL_5_STATUS_MASK)
+		return;
+
+	dev_err(pci->dev, "PCIe Spurious Interrupt");
+	pcie_fault[PCIE_SPURIOUS_INT]++;
+	pcie->pcie_fault_total++;
+	sysfs_notify(kobj_ref, NULL, "qcom_pcie_error_report");
+}
+
+static irqreturn_t qcom_pcie_global_irq_thread(int irq, void *data)
+{
+	struct qcom_pcie *pcie = data;
+	struct dw_pcie *pci = pcie->pci;
+	u32 *pcie_fault = pcie->pcie_fault;
+	struct device *dev = pci->dev;
+	struct kobject *kobj_ref = &dev->kobj;
+	unsigned long irqsave_flags;
+	u32 val, int_status;
+
+	spin_lock_irqsave(&pcie->safety_lock, irqsave_flags);
+
+	int_status = readl(pci->dbi_base + DBI_SAFETY_STATUS);
+	writel(0, pci->dbi_base + DBI_SAFETY_STATUS);
+
+	if (int_status) {
+		dev_err(pci->dev, "global interrupt fired status: %u", int_status);
+
+		if (int_status & PCIE_RASDP_UNCORR_ERR) {
+			dev_err(pci->dev, "RASDP uncorrectable error triggered");
+			pcie_fault[RASDP_UNCORR_ERROR]++;
+			pcie->pcie_fault_total++;
+			sysfs_notify(kobj_ref, NULL, "qcom_pcie_error_report");
+
+			/*
+			 * rasdp_uncorr_err ends up triggering a
+			 * pcie_uncorr error continuously. So masking
+			 * pcie_uncorr interrupts .
+			 */
+			val = readl(pci->dbi_base + DBI_SAFETY_MASK_OFF);
+			val |= PCIE_AER_UNCORR_ERR;
+			writel(val, pci->dbi_base + DBI_SAFETY_MASK_OFF);
+		}
+
+		if (int_status & PCIE_CDM_CHK_ERR) {
+			dev_err(pci->dev, "CDM error triggered");
+			val = readl(pci->dbi_base + PCIE_PL_CHK_REG_CONTROL_STATUS);
+
+			if (val & PCIE_PL_CHK_REG_CHK_REG_COMPARISON_ERROR) {
+				pcie_fault[CDM_REG_CHK_ERROR]++;
+				pcie->pcie_fault_total++;
+				sysfs_notify(kobj_ref, NULL, "qcom_pcie_error_report");
+
+				/*
+				 * cdm_chk_err injection results in a continuous
+				 * interrupt storm on certain targets, so masking it.
+				 */
+				val = readl(pci->dbi_base + DBI_SAFETY_MASK_OFF);
+				val |= (PCIE_CDM_CHK_ERR | PCIE_AER_UNCORR_ERR);
+				writel(val, pci->dbi_base + DBI_SAFETY_MASK_OFF);
+			}
+		}
+
+		if (int_status & PCIE_IFACE_TMR_ERR) {
+			dev_err(pci->dev, "Iface Timeout error triggered");
+			pcie_fault[INTERFACE_TIMER_ERROR]++;
+			pcie->pcie_fault_total++;
+			sysfs_notify(kobj_ref, NULL, "qcom_pcie_error_report");
+
+			/*
+			 * interface_timer_err injection results in a continuous
+			 * interrupt storm on certain targets, so masking it.
+			 */
+			val = readl(pci->dbi_base + DBI_SAFETY_MASK_OFF);
+			val |= (PCIE_IFACE_TMR_ERR | PCIE_AER_UNCORR_ERR);
+			writel(val, pci->dbi_base + DBI_SAFETY_MASK_OFF);
+		}
+
+		if (int_status & PCIE_RASDP_CORR_ERR) {
+			dev_err(pci->dev, "RASDP correctable error triggered");
+			pcie_fault[RASDP_CORR_ERROR]++;
+			pcie->pcie_fault_total++;
+			sysfs_notify(kobj_ref, NULL, "qcom_pcie_error_report");
+		}
+	} else {
+		qcom_pcie_check_spurious_int(pcie);
+	}
+
+	spin_unlock_irqrestore(&pcie->safety_lock, irqsave_flags);
+	return IRQ_HANDLED;
+}
+
 static int qcom_pcie_link_transition_count(struct seq_file *s, void *data)
 {
 	struct qcom_pcie *pcie = (struct qcom_pcie *)dev_get_drvdata(s->private);
@@ -1438,6 +1737,40 @@ static int qcom_pcie_link_transition_count(struct seq_file *s, void *data)
 	return 0;
 }
 
+static ssize_t qcom_pcie_error_report_show(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	unsigned int i;
+	struct qcom_pcie *pcie = (struct qcom_pcie *)dev_get_drvdata(dev);
+	u32 *pcie_fault = pcie->pcie_fault;
+	size_t len = 0;
+
+	for (i = 0; i < MAX_PCIE_SAFETY_FAULT; i++) {
+		if (pcie_fault_string[i])
+			len += sysfs_emit_at(buf, len, "%s: %lu\n",
+					pcie_fault_string[i],
+					pcie_fault[i]);
+	}
+
+	len += sysfs_emit_at(buf, len, "%s: %lu\n",
+					pcie_fault_string[i],
+					pcie->pcie_fault_total);
+
+	return len;
+}
+static DEVICE_ATTR_RO(qcom_pcie_error_report);
+
+static struct attribute *qcom_pcie_attrs[] = {
+	&dev_attr_qcom_pcie_error_report.attr,
+	NULL,
+};
+
+static const struct attribute_group qcom_pcie_attribute_group = {
+	.attrs = qcom_pcie_attrs,
+	.name = "qcom_pcie"
+};
+
 static void qcom_pcie_init_debugfs(struct qcom_pcie *pcie)
 {
 	struct dw_pcie *pci = pcie->pci;
@@ -1496,6 +1829,21 @@ static int qcom_pcie_probe(struct platform_device *pdev)
 		goto err_pm_runtime_put;
 	}
 
+	pcie->global_irq = platform_get_irq_byname(pdev, "global");
+	if (pcie->global_irq < 0) {
+		ret = pcie->global_irq;
+		goto err_pm_runtime_put;
+	}
+
+	ret = devm_request_threaded_irq(dev, pcie->global_irq, NULL,
+					qcom_pcie_global_irq_thread,
+					IRQF_ONESHOT,
+					"global_irq", pcie);
+	if (ret) {
+		dev_err(dev, "Failed to request Global IRQ\n");
+		goto err_pm_runtime_put;
+	}
+
 	pcie->parf = devm_platform_ioremap_resource_byname(pdev, "parf");
 	if (IS_ERR(pcie->parf)) {
 		ret = PTR_ERR(pcie->parf);
@@ -1551,6 +1899,8 @@ static int qcom_pcie_probe(struct platform_device *pdev)
 	if (pcie->mhi)
 		qcom_pcie_init_debugfs(pcie);
 
+	sysfs_create_group(&pdev->dev.kobj, &qcom_pcie_attribute_group);
+
 	return 0;
 
 err_phy_exit:
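
On the question raised in review about older device trees: the probe hunk returns an error whenever no "global" interrupt is described. One possible shape, sketched here as a userspace model and assuming the intent is to keep such platforms probing, is to treat a missing interrupt as "feature off" and only propagate other errors. The kernel has `platform_get_irq_byname_optional()` for lookups of this kind. Everything below (`fake_node`, `fake_get_irq_byname_optional()`, `fake_pcie`, `fake_probe_irq()`) is hypothetical and only imitates that behaviour.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the named interrupts a device node provides. */
struct fake_node {
	const char *irq_names[4];
	int irq_numbers[4];
	int count;
};

/* Stand-in for a by-name lookup: the IRQ number, or -ENXIO if absent. */
static int fake_get_irq_byname_optional(const struct fake_node *node,
					const char *name)
{
	int i;

	for (i = 0; i < node->count; i++)
		if (strcmp(node->irq_names[i], name) == 0)
			return node->irq_numbers[i];
	return -ENXIO;
}

/* Hypothetical subset of the driver state touched by the IRQ setup. */
struct fake_pcie {
	int global_irq;
	bool error_reporting;
};

static int fake_probe_irq(struct fake_pcie *pcie, const struct fake_node *node)
{
	int irq = fake_get_irq_byname_optional(node, "global");

	if (irq == -ENXIO) {
		/* Older description without the interrupt: keep probing. */
		pcie->global_irq = -1;
		pcie->error_reporting = false;
		return 0;
	}
	if (irq < 0)
		return irq;	/* any other failure still aborts probe */

	pcie->global_irq = irq;
	pcie->error_reporting = true;
	return 0;
}
```

Whether error reporting should be optional at all, or required only on platforms whose binding lists the interrupt, is a decision for the binding and the driver maintainers; the sketch only shows the control flow.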