From patchwork Thu Jan 11 07:32:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Wang, Qingshun" X-Patchwork-Id: 13516850 X-Patchwork-Delegate: bhelgaas@google.com Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.31]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 95D52D271 for ; Thu, 11 Jan 2024 07:32:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="KNA5Occb" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1704958374; x=1736494374; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=px+YmM7db+0whU6yrIVWVlXts5+CVSHJVQOH1bviGhc=; b=KNA5Occb7wRTMhXuyKss4n0SdN5YhA/X5xpF8ElcSesPIyDwz55K42Pc vc20cd78v1WeLkauHTSdkxQQsO3ES4igdgNii17bN1e8SOFkgMPeMzLs8 LDSfBUKyzoKb9wKCd04Hp0usXSpez+KFdv3OLQoIcjEZXGMMvYtjeRP+X kYcjSMhKWoYzCAedu+0zu4K4UjCKsmKv+WV7qMW7QztsLTAOekXmIfrEN Pj1MNjj/gUrNFg3GmxCw8QgDd9UwHJqv/lL+z8P+H/2umfKVT/Ks8JTx2 jNmvqk7/6nkAYzqMuGY9LN8LsJVYtSeM8AHEVsheUXnJ9g7fxRPoxu1bp Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10949"; a="463055533" X-IronPort-AV: E=Sophos;i="6.04,185,1695711600"; d="scan'208";a="463055533" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Jan 2024 23:32:53 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.04,185,1695711600"; d="scan'208";a="30906888" Received: from shijiel-mobl.ccr.corp.intel.com (HELO localhost) ([10.254.211.188]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Jan 2024 23:32:51 -0800 From: "Wang, Qingshun" To: linux-pci@vger.kernel.org Cc: chao.p.peng@linux.intel.com, chao.p.peng@intel.com, erwin.tsaur@intel.com, feiting.wanyan@intel.com, qingshun.wang@intel.com, "Wang, Qingshun" , "Andy Shevchenko" , "Luck, Tony" , =?utf-8?q?Ilpo_J=C3=A4rvinen?= Subject: [PATCH 1/4] pci/aer: Store more information in aer_err_info Date: Thu, 11 Jan 2024 15:32:16 +0800 Message-ID: <20240111073227.31488-2-qingshun.wang@linux.intel.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240111073227.31488-1-qingshun.wang@linux.intel.com> References: <20240111073227.31488-1-qingshun.wang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Store status and mask of both correctable and uncorrectable errors in aer_err_info. Severity of uncorrectable errors and the values of Device Status register is also recorded in order to filter out errors that cannot cause Advisory Non-Fatal error. Refactor rest of the code to use cor/uncor_status and cor/uncor_mask fields instead of status and mask fields. Reviewed-by: "Andy Shevchenko" Reviewed-by: "Luck, Tony" Reviewed-by: "Ilpo Järvinen" Signed-off-by: "Wang, Qingshun" --- drivers/acpi/apei/ghes.c | 10 ++++- drivers/cxl/core/pci.c | 6 ++- drivers/pci/pci.h | 8 +++- drivers/pci/pcie/aer.c | 93 ++++++++++++++++++++++++++-------------- include/linux/aer.h | 4 +- include/linux/pci.h | 27 ++++++++++++ 6 files changed, 111 insertions(+), 37 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 63ad0541db38..0e7ae84a5d3f 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -576,6 +576,8 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata) if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID && pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) { + struct pcie_capability_regs *pcie_caps; + u16 device_status = 0; unsigned int devfn; int aer_severity; u8 *aer_info; @@ -598,11 +600,17 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata) return; memcpy(aer_info, pcie_err->aer_info, sizeof(struct aer_capability_regs)); + if (pcie_err->validation_bits & CPER_PCIE_VALID_CAPABILITY) { + pcie_caps = (struct pcie_capability_regs *) &pcie_err->capability; + device_status = pcie_caps->device_status; + } + aer_recover_queue(pcie_err->device_id.segment, pcie_err->device_id.bus, devfn, aer_severity, (struct aer_capability_regs *) - aer_info); + aer_info, + device_status); } #endif } diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c index 37e1652afbc7..2d099c6463b4 100644 --- a/drivers/cxl/core/pci.c +++ b/drivers/cxl/core/pci.c @@ -902,6 +902,7 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) struct aer_capability_regs aer_regs; struct cxl_dport *dport; struct cxl_port *port; + u16 device_status; int severity; port = cxl_pci_find_port(pdev, &dport); @@ -916,7 +917,10 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) if (!cxl_rch_get_aer_severity(&aer_regs, &severity)) return; - pci_print_aer(pdev, severity, &aer_regs); + if (pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, &device_status)) + return; + + pci_print_aer(pdev, severity, &aer_regs, device_status); if (severity == AER_CORRECTABLE) cxl_handle_rdport_cor_ras(cxlds, dport); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 5ecbcf041179..1e28adfafc06 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -405,8 +405,12 @@ struct aer_err_info { unsigned int __pad2:2; unsigned int tlp_header_valid:1; - unsigned int status; /* COR/UNCOR Error Status */ - unsigned int mask; /* COR/UNCOR Error Mask */ + u32 cor_mask; /* COR Error Mask */ + u32 cor_status; /* COR Error Status */ + u32 uncor_mask; /* UNCOR Error Mask */ + u32 uncor_status; /* UNCOR Error Status */ + u32 uncor_severity; /* UNCOR Error Severity */ + u16 device_status; struct aer_header_log_regs tlp; /* TLP Header */ }; diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 42a3bd35a3e1..9311323a2391 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -615,7 +615,7 @@ const struct attribute_group aer_stats_attr_group = { static void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info) { - unsigned long status = info->status & ~info->mask; + unsigned long status; int i, max = -1; u64 *counter = NULL; struct aer_stats *aer_stats = pdev->aer_stats; @@ -625,16 +625,19 @@ static void pci_dev_aer_stats_incr(struct pci_dev *pdev, switch (info->severity) { case AER_CORRECTABLE: + status = info->cor_status & ~info->cor_mask; aer_stats->dev_total_cor_errs++; counter = &aer_stats->dev_cor_errs[0]; max = AER_MAX_TYPEOF_COR_ERRS; break; case AER_NONFATAL: + status = info->uncor_status & ~info->uncor_mask; aer_stats->dev_total_nonfatal_errs++; counter = &aer_stats->dev_nonfatal_errs[0]; max = AER_MAX_TYPEOF_UNCOR_ERRS; break; case AER_FATAL: + status = info->uncor_status & ~info->uncor_mask; aer_stats->dev_total_fatal_errs++; counter = &aer_stats->dev_fatal_errs[0]; max = AER_MAX_TYPEOF_UNCOR_ERRS; @@ -674,15 +677,17 @@ static void __print_tlp_header(struct pci_dev *dev, static void __aer_print_error(struct pci_dev *dev, struct aer_err_info *info) { + unsigned long status; const char **strings; - unsigned long status = info->status & ~info->mask; const char *level, *errmsg; int i; if (info->severity == AER_CORRECTABLE) { + status = info->cor_status & ~info->cor_mask; strings = aer_correctable_error_string; level = KERN_WARNING; } else { + status = info->uncor_status & ~info->uncor_mask; strings = aer_uncorrectable_error_string; level = KERN_ERR; } @@ -700,18 +705,27 @@ static void __aer_print_error(struct pci_dev *dev, void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) { + u32 status, mask; int layer, agent; int id = pci_dev_id(dev); const char *level; - if (!info->status) { + if (info->severity == AER_CORRECTABLE) { + status = info->cor_status; + mask = info->cor_mask; + } else { + status = info->uncor_status; + mask = info->uncor_mask; + } + + if (!status) { pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n", aer_error_severity_string[info->severity]); goto out; } - layer = AER_GET_LAYER_ERROR(info->severity, info->status); - agent = AER_GET_AGENT(info->severity, info->status); + layer = AER_GET_LAYER_ERROR(info->severity, status); + agent = AER_GET_AGENT(info->severity, status); level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR; @@ -720,7 +734,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) aer_error_layer[layer], aer_agent_string[agent]); pci_printk(level, dev, " device [%04x:%04x] error status/mask=%08x/%08x\n", - dev->vendor, dev->device, info->status, info->mask); + dev->vendor, dev->device, status, mask); __aer_print_error(dev, info); @@ -731,7 +745,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) if (info->id && info->error_dev_num > 1 && info->id == id) pci_err(dev, " Error of this Agent is reported first\n"); - trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask), + trace_aer_event(dev_name(&dev->dev), (status & ~mask), info->severity, info->tlp_header_valid, &info->tlp); } @@ -763,7 +777,7 @@ EXPORT_SYMBOL_GPL(cper_severity_to_aer); #endif void pci_print_aer(struct pci_dev *dev, int aer_severity, - struct aer_capability_regs *aer) + struct aer_capability_regs *aer, u16 device_status) { int layer, agent, tlp_header_valid = 0; u32 status, mask; @@ -783,8 +797,12 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity, memset(&info, 0, sizeof(info)); info.severity = aer_severity; - info.status = status; - info.mask = mask; + info.cor_status = aer->cor_status; + info.cor_mask = aer->cor_mask; + info.uncor_status = aer->uncor_status; + info.uncor_severity = aer->uncor_severity; + info.uncor_mask = aer->uncor_mask; + info.device_status = device_status; info.first_error = PCI_ERR_CAP_FEP(aer->cap_control); pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask); @@ -991,9 +1009,9 @@ static bool cxl_error_is_native(struct pci_dev *dev) static bool is_internal_error(struct aer_err_info *info) { if (info->severity == AER_CORRECTABLE) - return info->status & PCI_ERR_COR_INTERNAL; + return info->cor_status & PCI_ERR_COR_INTERNAL; - return info->status & PCI_ERR_UNC_INTN; + return info->uncor_status & PCI_ERR_UNC_INTN; } static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data) @@ -1092,7 +1110,7 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info) */ if (aer) pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, - info->status); + info->cor_status); if (pcie_aer_is_native(dev)) { struct pci_driver *pdrv = dev->driver; @@ -1123,6 +1141,7 @@ struct aer_recover_entry { u8 devfn; u16 domain; int severity; + u16 device_status; struct aer_capability_regs *regs; }; @@ -1143,7 +1162,7 @@ static void aer_recover_work_func(struct work_struct *work) PCI_SLOT(entry.devfn), PCI_FUNC(entry.devfn)); continue; } - pci_print_aer(pdev, entry.severity, entry.regs); + pci_print_aer(pdev, entry.severity, entry.regs, entry.device_status); /* * Memory for aer_capability_regs(entry.regs) is being allocated from the * ghes_estatus_pool to protect it from overwriting when multiple sections @@ -1172,7 +1191,7 @@ static DEFINE_SPINLOCK(aer_recover_ring_lock); static DECLARE_WORK(aer_recover_work, aer_recover_work_func); void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn, - int severity, struct aer_capability_regs *aer_regs) + int severity, struct aer_capability_regs *aer_regs, u16 device_status) { struct aer_recover_entry entry = { .bus = bus, @@ -1180,6 +1199,7 @@ void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn, .domain = domain, .severity = severity, .regs = aer_regs, + .device_status = device_status, }; if (kfifo_in_spinlocked(&aer_recover_ring, &entry, 1, @@ -1208,38 +1228,49 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) int temp; /* Must reset in this function */ - info->status = 0; + info->cor_status = 0; + info->uncor_status = 0; + info->uncor_severity = 0; info->tlp_header_valid = 0; /* The device might not support AER */ if (!aer) return 0; - if (info->severity == AER_CORRECTABLE) { + if (info->severity == AER_CORRECTABLE || + info->severity == AER_NONFATAL || + type == PCI_EXP_TYPE_ROOT_PORT || + type == PCI_EXP_TYPE_RC_EC || + type == PCI_EXP_TYPE_DOWNSTREAM) { + /* Link is healthy for IO reads */ pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS, - &info->status); + &info->cor_status); pci_read_config_dword(dev, aer + PCI_ERR_COR_MASK, - &info->mask); - if (!(info->status & ~info->mask)) - return 0; - } else if (type == PCI_EXP_TYPE_ROOT_PORT || - type == PCI_EXP_TYPE_RC_EC || - type == PCI_EXP_TYPE_DOWNSTREAM || - info->severity == AER_NONFATAL) { - - /* Link is still healthy for IO reads */ + &info->cor_mask); pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, - &info->status); + &info->uncor_status); + pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_SEVER, + &info->uncor_severity); pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, - &info->mask); - if (!(info->status & ~info->mask)) + &info->uncor_mask); + pcie_capability_read_word(dev, PCI_EXP_DEVSTA, + &info->device_status); + } else { + return 1; + } + + if (info->severity == AER_CORRECTABLE) { + if (!(info->cor_status & ~info->cor_mask)) + return 0; + } else { + if (!(info->uncor_status & ~info->uncor_mask)) return 0; /* Get First Error Pointer */ pci_read_config_dword(dev, aer + PCI_ERR_CAP, &temp); info->first_error = PCI_ERR_CAP_FEP(temp); - if (info->status & AER_LOG_TLP_MASKS) { + if (info->uncor_status & AER_LOG_TLP_MASKS) { info->tlp_header_valid = 1; pci_read_config_dword(dev, aer + PCI_ERR_HEADER_LOG, &info->tlp.dw0); diff --git a/include/linux/aer.h b/include/linux/aer.h index f6ea2f57d808..8332731b7212 100644 --- a/include/linux/aer.h +++ b/include/linux/aer.h @@ -52,9 +52,9 @@ static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; } #endif void pci_print_aer(struct pci_dev *dev, int aer_severity, - struct aer_capability_regs *aer); + struct aer_capability_regs *aer, u16 device_status); int cper_severity_to_aer(int cper_severity); void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn, - int severity, struct aer_capability_regs *aer_regs); + int severity, struct aer_capability_regs *aer_regs, u16 device_status); #endif //_AER_H_ diff --git a/include/linux/pci.h b/include/linux/pci.h index dea043bc1e38..0a4f679bca90 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -318,6 +318,33 @@ struct pci_sriov; struct pci_p2pdma; struct rcec_ea; +struct pcie_capability_regs { + u8 pcie_cap_id; + u8 next_cap_ptr; + u16 pcie_caps; + u32 device_caps; + u16 device_control; + u16 device_status; + u32 link_caps; + u16 link_control; + u16 link_status; + u32 slot_caps; + u16 slot_control; + u16 slot_status; + u16 root_control; + u16 root_caps; + u32 root_status; + u32 device_caps_2; + u16 device_control_2; + u16 device_status_2; + u32 link_caps_2; + u16 link_control_2; + u16 link_status_2; + u32 slot_caps_2; + u16 slot_control_2; + u16 slot_status_2; +}; + /* The pci_dev structure describes PCI devices */ struct pci_dev { struct list_head bus_list; /* Node in per-bus list */ From patchwork Thu Jan 11 07:32:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Qingshun" X-Patchwork-Id: 13516851 X-Patchwork-Delegate: bhelgaas@google.com Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6CD99D271 for ; Thu, 11 Jan 2024 07:33:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="A4xbJEsv" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1704958385; x=1736494385; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=v6a7LKb5QccWMOWs5LoVpMfaIp9ZTysxI+Wa69SW7kE=; b=A4xbJEsvLoK6+Qs2W1PJ+EGiSBbKuIwkJov3y7IUhaRplTvwmgsZMNTP iFNPaBNDzS8v9Sfw83r3vUOyRIj+Pez5dX6XJXZbs3vgc3q+mu+5Crrdo SRwXahF/D+VBHLAw6+fk4R7nimUB8/OPbFydsCJJOxQuf6hrWPsdFo170 uDIR7YCx/8hw15okxABQoj1YL77E44spG4a+ifslV31iCzGdnaNJ8wga1 nnEg3yvvlalqPwQIf7fA427cbgW5w06vl1HztO66z3vDfi6b09bm4qAHQ oTLVMveBW4KXfFcMh/lhayiEjW3Mtc6hZR80AlxdmPnEmgf9BYr/iFBoQ Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10949"; a="6126005" X-IronPort-AV: E=Sophos;i="6.04,185,1695711600"; d="scan'208";a="6126005" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Jan 2024 23:33:05 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10949"; a="905855976" X-IronPort-AV: E=Sophos;i="6.04,185,1695711600"; d="scan'208";a="905855976" Received: from shijiel-mobl.ccr.corp.intel.com (HELO localhost) ([10.254.211.188]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Jan 2024 23:33:02 -0800 From: "Wang, Qingshun" To: linux-pci@vger.kernel.org Cc: chao.p.peng@linux.intel.com, chao.p.peng@intel.com, erwin.tsaur@intel.com, feiting.wanyan@intel.com, qingshun.wang@intel.com, "Wang, Qingshun" Subject: [PATCH 2/4] pci/aer: Handle Advisory Non-Fatal properly Date: Thu, 11 Jan 2024 15:32:17 +0800 Message-ID: <20240111073227.31488-3-qingshun.wang@linux.intel.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240111073227.31488-1-qingshun.wang@linux.intel.com> References: <20240111073227.31488-1-qingshun.wang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 If we are processing an Advisory Non-Fatal Error, first check the Device Status. If any of Fatal/Non-Fatal Error Detected bits is set, leave it to uncorrectable error handler to clear the UE status bit, which should be executed right after the CE handler in this case. Otherwise, filter out uncorrectable errors that is not possible to trigger an Advisory Non-Fatal Error, then clear all the rest status bits. Reviewed-by: "Tsaur, Erwin" Signed-off-by: "Wang, Qingshun" --- drivers/pci/pcie/aer.c | 58 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 57 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 9311323a2391..86e7cfd71f23 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -107,6 +107,12 @@ struct aer_stats { PCI_ERR_ROOT_MULTI_COR_RCV | \ PCI_ERR_ROOT_MULTI_UNCOR_RCV) +#define AER_ERR_ANFE_UNC_MASK (PCI_ERR_UNC_POISON_TLP | \ + PCI_ERR_UNC_COMP_TIME | \ + PCI_ERR_UNC_COMP_ABORT | \ + PCI_ERR_UNC_UNX_COMP | \ + PCI_ERR_UNC_UNSUP) + static int pcie_aer_disable; static pci_ers_result_t aer_root_reset(struct pci_dev *dev); @@ -612,6 +618,29 @@ const struct attribute_group aer_stats_attr_group = { .is_visible = aer_stats_attrs_are_visible, }; +static int anfe_get_related_err(struct aer_err_info *info) +{ + /* + * Take the most conservative route here. If there are + * Non-Fatal/Fatal errors detected, do not assume any + * bit in uncor_status is set by ANFE. + */ + if (info->device_status & (PCI_EXP_DEVSTA_NFED | PCI_EXP_DEVSTA_FED)) + return 0; + /* + * An UNCOR error may cause Advisory Non-Fatal error if: + * a. The severity of the error is Non-Fatal. + * b. The error is one of the following: + * 1. Poisoned TLP + * 2. Completion Timeout + * 3. Completer Abort + * 4. Unexpected Completion + * 5. Unsupported Request + */ + return info->uncor_status & ~info->uncor_mask + & AER_ERR_ANFE_UNC_MASK & ~info->severity; +} + static void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info) { @@ -678,6 +707,7 @@ static void __aer_print_error(struct pci_dev *dev, struct aer_err_info *info) { unsigned long status; + unsigned long anfe_status; const char **strings; const char *level, *errmsg; int i; @@ -700,6 +730,21 @@ static void __aer_print_error(struct pci_dev *dev, pci_printk(level, dev, " [%2d] %-22s%s\n", i, errmsg, info->first_error == i ? " (First)" : ""); } + + if (info->severity == AER_CORRECTABLE && (status & PCI_ERR_COR_ADV_NFAT)) { + anfe_status = anfe_get_related_err(info); + if (anfe_status) { + pci_printk(level, dev, "Uncorrectable errors that may cause Advisory Non-Fatal:"); + for_each_set_bit(i, &anfe_status, 32) { + errmsg = aer_uncorrectable_error_string[i]; + if (!errmsg) + errmsg = "Unknown Error Bit"; + + pci_printk(level, dev, " [%2d] %-22s\n", i, errmsg); + } + } + } + pci_dev_aer_stats_incr(dev, info); } @@ -1092,6 +1137,14 @@ static inline void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info) { } #endif +static void handle_advisory_nonfatal(struct pci_dev *dev, struct aer_err_info *info) +{ + int aer = dev->aer_cap; + + pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, + anfe_get_related_err(info)); +} + /** * pci_aer_handle_error - handle logging error into an event log * @dev: pointer to pci_dev data structure of error source device @@ -1108,9 +1161,12 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info) * Correctable error does not need software intervention. * No need to go through error recovery process. */ - if (aer) + if (aer) { pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, info->cor_status); + if (info->cor_status & PCI_ERR_COR_ADV_NFAT) + handle_advisory_nonfatal(dev, info); + } if (pcie_aer_is_native(dev)) { struct pci_driver *pdrv = dev->driver; From patchwork Thu Jan 11 07:32:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Qingshun" X-Patchwork-Id: 13516852 X-Patchwork-Delegate: bhelgaas@google.com Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4D7BFD271 for ; Thu, 11 Jan 2024 07:33:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="RPUiTJxA" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1704958396; x=1736494396; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=bFDkuEpdSE40qU6/BBnY/phecIamne3XJKJCDfySlsA=; b=RPUiTJxATNLQLYZE/nNCjIlE1JnI4AB2smUz6vDIP1/o+hze9uw6UEXC mqANTEQ0CrbLh4VGUxpy5M93EUS+xegA40BQSX8mMZSiB664EWCJ7RwkO fzO6RT01D/oBWk36E1ev5qaB8nXnfB1gVeDxgpULyRWQkiRRuRDy47iZP Ft6imLPuOc29WUTX1DwjOOxka22q2Y2N5Dv2YZrSu4GLpmrxmBA36+In/ V2upgqs0hPEP9ydJ4OrCFibDBmETY4HKr5D4wCG+PV+wvQguba1HN+v83 hFudBSD9oBYBJT6iMayEj67k1i4N29kUy77ivegaf49qIqpGxf44e/r5A w==; X-IronPort-AV: E=McAfee;i="6600,9927,10949"; a="6126028" X-IronPort-AV: E=Sophos;i="6.04,185,1695711600"; d="scan'208";a="6126028" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Jan 2024 23:33:15 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10949"; a="905856002" X-IronPort-AV: E=Sophos;i="6.04,185,1695711600"; d="scan'208";a="905856002" Received: from shijiel-mobl.ccr.corp.intel.com (HELO localhost) ([10.254.211.188]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Jan 2024 23:33:13 -0800 From: "Wang, Qingshun" To: linux-pci@vger.kernel.org Cc: chao.p.peng@linux.intel.com, chao.p.peng@intel.com, erwin.tsaur@intel.com, feiting.wanyan@intel.com, qingshun.wang@intel.com, "Wang, Qingshun" Subject: [PATCH 3/4] pci/aer: Fetch information for FTrace Date: Thu, 11 Jan 2024 15:32:18 +0800 Message-ID: <20240111073227.31488-4-qingshun.wang@linux.intel.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240111073227.31488-1-qingshun.wang@linux.intel.com> References: <20240111073227.31488-1-qingshun.wang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Fetch and store the data of 3 more registers: `Link Status`, `Device Control 2` and `Advanced Error Capabilities and Control`. This data will be traced in FTrace in the following patch. Signed-off-by: "Wang, Qingshun" --- drivers/acpi/apei/ghes.c | 8 +++++++- drivers/cxl/core/pci.c | 11 ++++++++++- drivers/pci/pci.h | 4 ++++ drivers/pci/pcie/aer.c | 26 ++++++++++++++++++++------ include/linux/aer.h | 6 ++++-- 5 files changed, 45 insertions(+), 10 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 0e7ae84a5d3f..a15bb93c6904 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -577,7 +577,9 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata) if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID && pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) { struct pcie_capability_regs *pcie_caps; + u16 device_control_2 = 0; u16 device_status = 0; + u16 link_status = 0; unsigned int devfn; int aer_severity; u8 *aer_info; @@ -602,7 +604,9 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata) if (pcie_err->validation_bits & CPER_PCIE_VALID_CAPABILITY) { pcie_caps = (struct pcie_capability_regs *) &pcie_err->capability; + device_control_2 = pcie_caps->device_control_2; device_status = pcie_caps->device_status; + link_status = pcie_caps->link_status; } aer_recover_queue(pcie_err->device_id.segment, @@ -610,7 +614,9 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata) devfn, aer_severity, (struct aer_capability_regs *) aer_info, - device_status); + device_status, + link_status, + device_control_2); } #endif } diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c index 2d099c6463b4..c1ea77057341 100644 --- a/drivers/cxl/core/pci.c +++ b/drivers/cxl/core/pci.c @@ -902,7 +902,9 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) struct aer_capability_regs aer_regs; struct cxl_dport *dport; struct cxl_port *port; + u16 device_control_2; u16 device_status; + u16 link_status; int severity; port = cxl_pci_find_port(pdev, &dport); @@ -917,10 +919,17 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) if (!cxl_rch_get_aer_severity(&aer_regs, &severity)) return; + if (pcie_capability_read_word(pdev, PCI_EXP_DEVCTL2, &device_control_2)) + return; + if (pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, &device_status)) return; - pci_print_aer(pdev, severity, &aer_regs, device_status); + if (pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &link_status)) + return; + + pci_print_aer(pdev, severity, &aer_regs, device_status, + link_status, device_control_2); if (severity == AER_CORRECTABLE) cxl_handle_rdport_cor_ras(cxlds, dport); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 1e28adfafc06..333f2194a360 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -410,7 +410,11 @@ struct aer_err_info { u32 uncor_mask; /* UNCOR Error Mask */ u32 uncor_status; /* UNCOR Error Status */ u32 uncor_severity; /* UNCOR Error Severity */ + + u16 link_status; + u32 aer_cap_ctrl; /* AER Capabilities and Control */ u16 device_status; + u16 device_control_2; struct aer_header_log_regs tlp; /* TLP Header */ }; diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 86e7cfd71f23..b5d4ea8e5f8d 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -822,7 +822,8 @@ EXPORT_SYMBOL_GPL(cper_severity_to_aer); #endif void pci_print_aer(struct pci_dev *dev, int aer_severity, - struct aer_capability_regs *aer, u16 device_status) + struct aer_capability_regs *aer, u16 device_status, + u16 link_status, u16 device_control_2) { int layer, agent, tlp_header_valid = 0; u32 status, mask; @@ -847,7 +848,10 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity, info.uncor_status = aer->uncor_status; info.uncor_severity = aer->uncor_severity; info.uncor_mask = aer->uncor_mask; + info.link_status = link_status; + info.aer_cap_ctrl = aer->cap_control; info.device_status = device_status; + info.device_control_2 = device_control_2; info.first_error = PCI_ERR_CAP_FEP(aer->cap_control); pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask); @@ -1197,7 +1201,9 @@ struct aer_recover_entry { u8 devfn; u16 domain; int severity; + u16 link_status; u16 device_status; + u16 device_control_2; struct aer_capability_regs *regs; }; @@ -1218,7 +1224,8 @@ static void aer_recover_work_func(struct work_struct *work) PCI_SLOT(entry.devfn), PCI_FUNC(entry.devfn)); continue; } - pci_print_aer(pdev, entry.severity, entry.regs, entry.device_status); + pci_print_aer(pdev, entry.severity, entry.regs, entry.device_status, + entry.link_status, entry.device_control_2); /* * Memory for aer_capability_regs(entry.regs) is being allocated from the * ghes_estatus_pool to protect it from overwriting when multiple sections @@ -1247,7 +1254,8 @@ static DEFINE_SPINLOCK(aer_recover_ring_lock); static DECLARE_WORK(aer_recover_work, aer_recover_work_func); void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn, - int severity, struct aer_capability_regs *aer_regs, u16 device_status) + int severity, struct aer_capability_regs *aer_regs, u16 device_status, + u16 link_status, u16 device_control_2) { struct aer_recover_entry entry = { .bus = bus, @@ -1255,7 +1263,9 @@ void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn, .domain = domain, .severity = severity, .regs = aer_regs, + .link_status = link_status, .device_status = device_status, + .device_control_2 = device_control_2, }; if (kfifo_in_spinlocked(&aer_recover_ring, &entry, 1, @@ -1281,7 +1291,6 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) { int type = pci_pcie_type(dev); int aer = dev->aer_cap; - int temp; /* Must reset in this function */ info->cor_status = 0; @@ -1309,8 +1318,14 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) &info->uncor_severity); pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, &info->uncor_mask); + pci_read_config_dword(dev, aer + PCI_ERR_CAP, + &info->aer_cap_ctrl); + pcie_capability_read_word(dev, PCI_EXP_LNKSTA, + &info->link_status); pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &info->device_status); + pcie_capability_read_word(dev, PCI_EXP_DEVCTL2, + &info->device_control_2); } else { return 1; } @@ -1323,8 +1338,7 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) return 0; /* Get First Error Pointer */ - pci_read_config_dword(dev, aer + PCI_ERR_CAP, &temp); - info->first_error = PCI_ERR_CAP_FEP(temp); + info->first_error = PCI_ERR_CAP_FEP(info->aer_cap_ctrl); if (info->uncor_status & AER_LOG_TLP_MASKS) { info->tlp_header_valid = 1; diff --git a/include/linux/aer.h b/include/linux/aer.h index 8332731b7212..333b72c19c60 100644 --- a/include/linux/aer.h +++ b/include/linux/aer.h @@ -52,9 +52,11 @@ static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; } #endif void pci_print_aer(struct pci_dev *dev, int aer_severity, - struct aer_capability_regs *aer, u16 device_status); + struct aer_capability_regs *aer, u16 device_status, + u16 link_status, u16 device_control_2); int cper_severity_to_aer(int cper_severity); void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn, - int severity, struct aer_capability_regs *aer_regs, u16 device_status); + int severity, struct aer_capability_regs *aer_regs, u16 device_status, + u16 link_status, u16 device_control_2); #endif //_AER_H_ From patchwork Thu Jan 11 07:32:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Qingshun" X-Patchwork-Id: 13516853 X-Patchwork-Delegate: bhelgaas@google.com Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C99F5D2F3 for ; Thu, 11 Jan 2024 07:33:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="b6HeL22L" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1704958407; x=1736494407; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=nqK6Cw62H3exvcqS+8v+d8bFV2q3iUZMa7UqCuF15OM=; b=b6HeL22L2Dud4Fk6fz82sgHU5r6CsDXvBf0B+JH2zyKYK0rIEFFQ/UeM gm4eLDTDxPzw2r1cSNB5BSv/4i81eMU1PcLu3ROUy/6Mas/DjCeKPbd+L DqSGqg2N7dST4OYso4m5BJLGXtWJHZ2MdEgLvcv2WrWTy/kfoadKnirFr Pi8wliucJkdoOEvoq8Iqcc3Fk1Zf1IwG/C+pGaRKgUwxtpGhwziVd4eFI jEytleeydP8kzCjjp9HmtmwxQYRQd33zKd+dt+8DMDmsd7exjXCGPNQkN h31D+H2VR4zhHQYd144F3zeUj3hqvcqGBQPTTYPrNQrseaUm1+nw/cY2g Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10949"; a="6126061" X-IronPort-AV: E=Sophos;i="6.04,185,1695711600"; d="scan'208";a="6126061" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Jan 2024 23:33:26 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10949"; a="905856031" X-IronPort-AV: E=Sophos;i="6.04,185,1695711600"; d="scan'208";a="905856031" Received: from shijiel-mobl.ccr.corp.intel.com (HELO localhost) ([10.254.211.188]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Jan 2024 23:33:23 -0800 From: "Wang, Qingshun" To: linux-pci@vger.kernel.org Cc: chao.p.peng@linux.intel.com, chao.p.peng@intel.com, erwin.tsaur@intel.com, feiting.wanyan@intel.com, qingshun.wang@intel.com, "Wang, Qingshun" Subject: [PATCH 4/4] ras: Trace more information in aer_event Date: Thu, 11 Jan 2024 15:32:19 +0800 Message-ID: <20240111073227.31488-5-qingshun.wang@linux.intel.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240111073227.31488-1-qingshun.wang@linux.intel.com> References: <20240111073227.31488-1-qingshun.wang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Add following fields in aer_event to better understand Advisory Non-Fatal and other errors for external observation: - cor_status (Correctable Error Status) - cor_mask (Correctable Error Mask) - uncor_status (Uncorrectable Error Status) - uncor_severity (Uncorrectable Error Severity) - uncor_mask (Uncorrectable Error Mask) - aer_cap_ctrl (AER Capabilities and Control) - link_status (Link Status) - device_status (Device Status) - device_control_2 (Device Control 2) In addition to the raw register value, value of following fields are extracted and logged for better observability: - "First Error Pointer" and "Completion Timeout Prefix/Header Log Capable" from "AER Capabilities and Control" - "Completion Timeout Value" and "Completion Timeout Disable" from "Device Control 2" Reviewed-by: "Tsaur, Erwin" Signed-off-by: "Wang, Qingshun" --- drivers/pci/pcie/aer.c | 17 +++++++++++-- include/ras/ras_event.h | 48 ++++++++++++++++++++++++++++++++--- include/uapi/linux/pci_regs.h | 1 + 3 files changed, 60 insertions(+), 6 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index b5d4ea8e5f8d..52ca9bf26e84 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -754,6 +754,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) int layer, agent; int id = pci_dev_id(dev); const char *level; + struct aer_capability_regs aer_caps; if (info->severity == AER_CORRECTABLE) { status = info->cor_status; @@ -790,8 +791,18 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) if (info->id && info->error_dev_num > 1 && info->id == id) pci_err(dev, " Error of this Agent is reported first\n"); + aer_caps = (struct aer_capability_regs) { + .cor_status = info->cor_status, + .cor_mask = info->cor_mask, + .uncor_status = info->uncor_status, + .uncor_severity = info->uncor_severity, + .uncor_mask = info->uncor_mask, + .cap_control = info->aer_cap_ctrl + }; trace_aer_event(dev_name(&dev->dev), (status & ~mask), - info->severity, info->tlp_header_valid, &info->tlp); + info->severity, info->tlp_header_valid, &info->tlp, + &aer_caps, info->link_status, + info->device_status, info->device_control_2); } static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info) @@ -867,7 +878,9 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity, __print_tlp_header(dev, &aer->header_log); trace_aer_event(dev_name(&dev->dev), (status & ~mask), - aer_severity, tlp_header_valid, &aer->header_log); + aer_severity, tlp_header_valid, &aer->header_log, + aer, info.link_status, + info.device_status, info.device_control_2); } EXPORT_SYMBOL_NS_GPL(pci_print_aer, CXL); diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h index cbd3ddd7c33d..9bece6edec28 100644 --- a/include/ras/ras_event.h +++ b/include/ras/ras_event.h @@ -300,9 +300,14 @@ TRACE_EVENT(aer_event, const u32 status, const u8 severity, const u8 tlp_header_valid, - struct aer_header_log_regs *tlp), + struct aer_header_log_regs *tlp, + struct aer_capability_regs *aer_caps, + const u16 link_status, + const u16 device_status, + const u16 device_control_2), - TP_ARGS(dev_name, status, severity, tlp_header_valid, tlp), + TP_ARGS(dev_name, status, severity, tlp_header_valid, tlp, + aer_caps, link_status, device_status, device_control_2), TP_STRUCT__entry( __string( dev_name, dev_name ) @@ -310,6 +315,10 @@ TRACE_EVENT(aer_event, __field( u8, severity ) __field( u8, tlp_header_valid) __array( u32, tlp_header, 4 ) + __field_struct(struct aer_capability_regs, aer_caps) + __field( u16, link_status ) + __field( u16, device_status ) + __field( u16, device_control_2) ), TP_fast_assign( @@ -317,6 +326,10 @@ TRACE_EVENT(aer_event, __entry->status = status; __entry->severity = severity; __entry->tlp_header_valid = tlp_header_valid; + __entry->aer_caps = *aer_caps; + __entry->link_status = link_status; + __entry->device_status = device_status; + __entry->device_control_2 = device_control_2; if (tlp_header_valid) { __entry->tlp_header[0] = tlp->dw0; __entry->tlp_header[1] = tlp->dw1; @@ -325,7 +338,20 @@ TRACE_EVENT(aer_event, } ), - TP_printk("%s PCIe Bus Error: severity=%s, %s, TLP Header=%s\n", + TP_printk("%s PCIe Bus Error: severity=%s, %s, TLP Header=%s, " + "Correctable Error Status=0x%08x, " + "Correctable Error Mask=0x%08x, " + "Uncorrectable Error Status=0x%08x, " + "Uncorrectable Error Severity=0x%08x, " + "Uncorrectable Error Mask=0x%08x, " + "AER Capability and Control=0x%08x, " + "First Error Pointer=0x%x, " + "Completion Timeout Prefix/Header Log Capable=%s, " + "Link Status=0x%04x, " + "Device Status=0x%04x, " + "Device Control 2=0x%04x, " + "Completion Timeout Value=0x%x, " + "Completion Timeout Disable=%s\n", __get_str(dev_name), __entry->severity == AER_CORRECTABLE ? "Corrected" : __entry->severity == AER_FATAL ? @@ -335,7 +361,21 @@ TRACE_EVENT(aer_event, __print_flags(__entry->status, "|", aer_uncorrectable_errors), __entry->tlp_header_valid ? __print_array(__entry->tlp_header, 4, 4) : - "Not available") + "Not available", + __entry->aer_caps.cor_status, + __entry->aer_caps.cor_mask, + __entry->aer_caps.uncor_status, + __entry->aer_caps.uncor_severity, + __entry->aer_caps.uncor_mask, + __entry->aer_caps.cap_control, + PCI_ERR_CAP_FEP(__entry->aer_caps.cap_control), + __entry->aer_caps.cap_control & PCI_ERR_CAP_CTO_LOGC ? "True" : "False", + __entry->link_status, + __entry->device_status, + __entry->device_control_2, + __entry->device_control_2 & PCI_EXP_DEVCTL2_COMP_TIMEOUT, + __entry->device_control_2 & PCI_EXP_DEVCTL2_COMP_TMOUT_DIS ? + "True" : "False") ); /* diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index a39193213ff2..54160ed2a8c9 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -787,6 +787,7 @@ #define PCI_ERR_CAP_ECRC_GENE 0x00000040 /* ECRC Generation Enable */ #define PCI_ERR_CAP_ECRC_CHKC 0x00000080 /* ECRC Check Capable */ #define PCI_ERR_CAP_ECRC_CHKE 0x00000100 /* ECRC Check Enable */ +#define PCI_ERR_CAP_CTO_LOGC 0x00001000 /* Completion Timeout Prefix/Header Log Capable */ #define PCI_ERR_HEADER_LOG 0x1c /* Header Log Register (16 bytes) */ #define PCI_ERR_ROOT_COMMAND 0x2c /* Root Error Command */ #define PCI_ERR_ROOT_CMD_COR_EN 0x00000001 /* Correctable Err Reporting Enable */