From patchwork Mon Oct 25 17:01:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naveen Naidu X-Patchwork-Id: 12582407 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 70992C4332F for ; Mon, 25 Oct 2021 17:08:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5B43060EE3 for ; Mon, 25 Oct 2021 17:08:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234817AbhJYRKy (ORCPT ); Mon, 25 Oct 2021 13:10:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48666 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235248AbhJYRJQ (ORCPT ); Mon, 25 Oct 2021 13:09:16 -0400 Received: from mail-pg1-x531.google.com (mail-pg1-x531.google.com [IPv6:2607:f8b0:4864:20::531]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 68705C0432F0; Mon, 25 Oct 2021 10:01:53 -0700 (PDT) Received: by mail-pg1-x531.google.com with SMTP id e65so11558728pgc.5; Mon, 25 Oct 2021 10:01:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=TzDmm1CSO5ys2u/cUDBI2TYr2rN5kWaOTVprB8Ko1Uk=; b=ZHdhSeGLURPzXaswSnZGMuZ4LcxM/+rFjEy7pPEM4C8GYgNJCYn1yIid35ljsILwQh U67jlu8fWxehMhq7EiQY5KE11JOQg4hoZwWbNbk61OPyY4blm1uk4YXpwJbYlOSQEGVL LyFNcjlOnkQHLDXAwhFbQ5tKmV62LSf0LwtAaIcxKFXkXXKeDJLLBvLoLItgpm/b1lE1 nU+jaOi2JRN5m7b2SBmQ1HhRYU4l2RwPPDrsDxaXNfxqkUhoVDn8tN3pMYujXUAbuNuN cBnR7a6eqlbNs8opvXdJoT7Pe5DChrLafLFE2RQwHrcG3/HjMruSaPOds39rtNr5agiu ZeNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; 
s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=TzDmm1CSO5ys2u/cUDBI2TYr2rN5kWaOTVprB8Ko1Uk=; b=3vUesO9FExqBSuz7WbHTaJwmhE8Pu1F5qk5bxd0A4nkpcE4FCXqkwQy0PNgXwqawaO rnYiHShZezi9/O3+OxGhssP6eWR9SFna6byJ0T6mdWO8lSworktioJNtjpNfj3c3gZZP lqGcJiA53WFvs3CBK9QaOxITtve10dSvmSbu+RLS+VZi6Bu7HtGdNVWfvrbQ9R6z/xU7 gd8pRB8KhwMT3eiEeX3QvQsQ5h0q2Imvu9QtFhnpgqm/r3TIomqAkAIH5fafJPIE/+yh r/IdCiBhNaRiksSorCn9eaImAoARO1+rVBMcfxfqPaZH+TKMI1c9V4Slvl2NV5ssXJq1 +9dA== X-Gm-Message-State: AOAM532VbiOxsYkJILBizayF8OjOi5IKWPjfBe8sZj0d6Kf6+kPAiKHB Mec6q5EhVq8UDeuMQ0l+1Bc= X-Google-Smtp-Source: ABdhPJyH7J/iuk4Y6eYOW7ZypxU9kq4dQSkqOm8XsvftCh8hBtHp15bTqEuxROXH/SA1ZGpJoWeNDA== X-Received: by 2002:a05:6a00:88e:b0:44c:c40:9279 with SMTP id q14-20020a056a00088e00b0044c0c409279mr19791608pfj.85.1635181312767; Mon, 25 Oct 2021 10:01:52 -0700 (PDT) Received: from localhost.localdomain ([2406:7400:63:df8b:7255:8580:2394:764c]) by smtp.gmail.com with ESMTPSA id g18sm5100858pfj.67.2021.10.25.10.01.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Oct 2021 10:01:52 -0700 (PDT) From: Naveen Naidu To: bhelgaas@google.com, ruscur@russell.cc, oohall@gmail.com Cc: Naveen Naidu , linux-kernel-mentees@lists.linuxfoundation.org, skhan@linuxfoundation.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH v5 1/5] PCI/AER: Remove ID from aer_agent_string[] Date: Mon, 25 Oct 2021 22:31:00 +0530 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Currently, we do not print the "id" field in the AER error logs. Yet the aer_agent_string[] has the word "id" in it. 

The AER error log looks like:

  pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)

Without the "id" field in the error log, the aer_agent_string[] (e.g.
"Receiver ID") does not make sense. A user reading the aer_agent_string[]
in the log might look for an "id" field and be confused when it is
missing.

Remove the "ID" from the aer_agent_string[].

It is easy to reproduce this by using aer-inject:

  $ aer-inject -s 00:03:0 corr-err-file

The content of the corr-err-file file is as below:

  AER
  COR_STATUS BAD_TLP
  HEADER_LOG 0 1 2 3

The following are sample dummy errors injected via aer-inject.

Before
=======

In 010caed4ccb6 ("PCI/AER: Decode Error Source Requester ID"), the "id"
field was removed from the AER error logs, so currently AER logs look
like:

  pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03:0
  pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID) <--- no id field
  pcieport 0000:00:03.0:   device [1b36:000c] error status/mask=00000040/0000e000
  pcieport 0000:00:03.0:    [ 6] BadTLP

After
======

  pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
  pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver)
  pcieport 0000:00:03.0:   device [1b36:000c] error status/mask=00000040/0000e000
  pcieport 0000:00:03.0:    [ 6] BadTLP

Link: https://lore.kernel.org/linux-pci/20211021170317.GA2700910@bhelgaas/T/#m618bda4e54042d95a1a83fccc01cdb423f7590dc
Signed-off-by: Naveen Naidu
---
 drivers/pci/pcie/aer.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 9784fdcf3006..241ff361b43c 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -516,10 +516,10 @@ static const char *aer_uncorrectable_error_string[] = {
 };

 static const char *aer_agent_string[] = {
-	"Receiver ID",
-	"Requester ID",
-	"Completer ID",
-	"Transmitter ID"
+	"Receiver",
+	"Requester",
+	"Completer",
+	"Transmitter"
 };

 #define aer_stats_dev_attr(name, stats_array, strings_array,		\
@@ -703,7 +703,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 	const char *level;

 	if (!info->status) {
-		pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
+		pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent)\n",
 			aer_error_severity_string[info->severity]);
 		goto out;
 	}

From patchwork Mon Oct 25 17:01:01 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Naidu
X-Patchwork-Id: 12582409
X-Patchwork-Delegate: bhelgaas@google.com
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 62430C433EF for ; Mon, 25 Oct 2021 17:08:34 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4C8DD60EE3 for ; Mon, 25 Oct 2021 17:08:34 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234832AbhJYRKz (ORCPT ); Mon, 25 Oct 2021 13:10:55 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48584 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235393AbhJYRJ2 (ORCPT ); Mon, 25 Oct 2021 13:09:28 -0400
Received: from mail-pj1-x1030.google.com (mail-pj1-x1030.google.com [IPv6:2607:f8b0:4864:20::1030]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 616C5C04318A; Mon, 25 Oct 2021 10:02:17 -0700 (PDT)
Received: by mail-pj1-x1030.google.com with SMTP id n36-20020a17090a5aa700b0019fa884ab85so11934765pji.5; Mon, 25 Oct 2021 10:02:17 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references
:mime-version:content-transfer-encoding; bh=FR6K5YUYcCv1FDH4y5DK0PhIoCQEDWlRf94cnPNCwds=; b=N4v/A7+QJCcAbm+CvFlnlwM7C67m7YvUb7uDA62Ea6M2zy9mFn7L6qlzyseS/M/hvG tSJbimFBdC4p5RReqa2wpV6Qqjd4P24XzwxeLbj00xU7hcRRw5znumqH2s/fOHMfjSX9 o5nd/AIPR73SqODv5t3sS3lZf5iijuK80MsyWyIw5o0FWX9YKrLpr3FlXwhqmXj9ObM3 vPpoGZ6V4Jv4wDgtx2wXumflnIweN5T/qLoY7NBdAFL6pkNwLJgP04wR0lIKTa5RLweC RqPKrKht1tvRKs6JR6I7M00dsZt0Vx5DzJDUHy6IhLUn99V93sH7dP35x73mQfuyLXxC urFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=FR6K5YUYcCv1FDH4y5DK0PhIoCQEDWlRf94cnPNCwds=; b=MFpHAdeurAEKW2EVweZpcP5VUcJdcBMyypn7siBqcwV7xscgnkfqE0SlGTe55+Q2wq /HviQ7WdYt304CCUULBKgt+P2NBCm/u8mp4zvbXnJBRVVC2I937L6lyTuzJccr8xc7k8 Ubo8mNtJK+4UDEAmxhXvVZm6FVkM7wPl7SEbZN2yhy/JTpMYhX2TkoVTtOoNf42G37MW 7VAo6YtpAgQtI1/sKGuQNBsAL0sXoHbzDpv+KjSEsDEg6xRVmtGDAU6LF67CGsCYRBaC RqqHQDnArX5ZEn5C9omRTSsZh6qrZzzgJC3W9FaB3UDsbFrTp+VxNGfRRVFpnj2UAvu5 18Zw== X-Gm-Message-State: AOAM530kIxnMOm4uCae57PV1J7R7Zzr3PdwbPNluLvdtvGqW5hAHZ93Z gLBNW0B/2w0d2ZE5i9Rxy6o= X-Google-Smtp-Source: ABdhPJxPfl0ou0aD/UXHIs1Nj72QAYSTX4fl/euKCb4hAOQJajxyiu9l4DhyC9t/Ht92fYZ9ThIeJg== X-Received: by 2002:a17:90a:fe16:: with SMTP id ck22mr12789728pjb.186.1635181336216; Mon, 25 Oct 2021 10:02:16 -0700 (PDT) Received: from localhost.localdomain ([2406:7400:63:df8b:7255:8580:2394:764c]) by smtp.gmail.com with ESMTPSA id g18sm5100858pfj.67.2021.10.25.10.02.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Oct 2021 10:02:15 -0700 (PDT) From: Naveen Naidu To: bhelgaas@google.com, ruscur@russell.cc, oohall@gmail.com Cc: Naveen Naidu , linux-kernel-mentees@lists.linuxfoundation.org, skhan@linuxfoundation.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH v5 2/5] PCI: Cleanup struct aer_err_info Date: Mon, 25 Oct 2021 22:31:01 +0530 Message-Id: 
<10d354c32e7517ad16e9f37bd4595de83dd7ccbc.1635179600.git.naveennaidu479@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-pci@vger.kernel.org

The id, status and mask fields of struct aer_err_info come directly from
registers, so their sizes should be explicit.

The widths of these registers are:
  - id: 16 bits - Represents the Error Source Requester ID
  - status: 32 bits - COR/UNCOR Error Status
  - mask: 32 bits - COR/UNCOR Error Mask

Since these widths map directly onto fixed-size integer types, use u16
and u32 to represent their values. Also remove the __pad fields.

"pahole" was run on the modified struct aer_err_info and the size
remains unchanged.

Signed-off-by: Naveen Naidu
---
 drivers/pci/pci.h | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 1cce56c2aea0..9be7a966fda7 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -427,18 +427,16 @@ struct aer_err_info {
 	struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
 	int error_dev_num;

-	unsigned int id:16;
+	u16 id;

 	unsigned int severity:2;	/* 0:NONFATAL | 1:FATAL | 2:COR */
-	unsigned int __pad1:5;
 	unsigned int multi_error_valid:1;

 	unsigned int first_error:5;
-	unsigned int __pad2:2;
 	unsigned int tlp_header_valid:1;

-	unsigned int status;		/* COR/UNCOR Error Status */
-	unsigned int mask;		/* COR/UNCOR Error Mask */
+	u32 status;			/* COR/UNCOR Error Status */
+	u32 mask;			/* COR/UNCOR Error Mask */
 	struct aer_header_log_regs tlp;	/* TLP Header */
 };

From patchwork Mon Oct 25 17:01:02 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Naidu
X-Patchwork-Id: 12582411
X-Patchwork-Delegate: bhelgaas@google.com
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by
smtp.lore.kernel.org (Postfix) with ESMTP id BE1C4C433FE for ; Mon, 25 Oct 2021 17:08:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A57CB60EE3 for ; Mon, 25 Oct 2021 17:08:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234399AbhJYRK4 (ORCPT ); Mon, 25 Oct 2021 13:10:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48738 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235428AbhJYRJc (ORCPT ); Mon, 25 Oct 2021 13:09:32 -0400 Received: from mail-pg1-x52a.google.com (mail-pg1-x52a.google.com [IPv6:2607:f8b0:4864:20::52a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3A39DC043193; Mon, 25 Oct 2021 10:02:22 -0700 (PDT) Received: by mail-pg1-x52a.google.com with SMTP id f5so11525418pgc.12; Mon, 25 Oct 2021 10:02:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=IP7bg1XxqK5dXg//yE9NcJiabkbw2qPuKs1z85ZWNME=; b=A406sfTrXh8GMkSFquNuEPUPbjXrxLZbOVCF4bYpKdMv8ShllAqdbJdQb04lVxhhHC NKkoeX83bwle+wOS3SOVstoXdO16KEasww9oDGJfCUsjGE+DjaeiNTYWvT74nieEka1N K9m4PrHtdH1pmfWFdizj9JYhifV5yhsxqg5B2nEfXZTlzb1FIiswefawc0O2iMM8q21n 2VdIMqT36zQmkVdUYJLMaFyxVljy8JP/WNfT9KkTRKj1UPdytBECAF68jGLrlBj6FB57 L5pp6txUxHnZJtWsBWKRJm57CWn0g/mevx9sY+YHInLJ4anBMRAhda4Zyo/IHjmIQ/dr 23eA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=IP7bg1XxqK5dXg//yE9NcJiabkbw2qPuKs1z85ZWNME=; b=JJWs1DU+uYzZlCNxj1CZjg+wJitRlri9adYakJMffMqO/nS8DAg2vA25pjHvKm1jHZ WWyJk0NoYYv9ZhBm+Cky9rB+yGXcPuaZQbjXnIqH8WIk6UwkJLKOS9k3nPJ8dVP9DlmR rjMFYEChGoIJ4U6qlzsn56eyrHHjKUNr5+RN8qeti3KpYZCkHdxL9Gss+0QibVvI5FNz 
I0Z9vHc47eDDb/nS2b92tkZ29WBhmiLTnb5WgfCCdYzw3XQ9wvvEWJMLPYIb+oNBxrd1 v65ZylI7OyeTcBMaouybdxTr/uQJH/qn1NP524kZCZUAoSGu6TDYetmIvLx1ujYsS3HT EOyw== X-Gm-Message-State: AOAM532LNORAPrR+Ao0gjeuh1T3mLjnpV5THjDkEZUOLScNBzmjDIMkY NuaCNHqWR8i9Y6DFFigBgSY= X-Google-Smtp-Source: ABdhPJwVXGCGM66S42ZGCsnkfum3D4CcEQwiQH5P2pomnEuWRSCqLNJFiZU0F8DC/4n6/2A7JXHjoA== X-Received: by 2002:a63:2a88:: with SMTP id q130mr14468449pgq.169.1635181341692; Mon, 25 Oct 2021 10:02:21 -0700 (PDT) Received: from localhost.localdomain ([2406:7400:63:df8b:7255:8580:2394:764c]) by smtp.gmail.com with ESMTPSA id g18sm5100858pfj.67.2021.10.25.10.02.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Oct 2021 10:02:21 -0700 (PDT) From: Naveen Naidu To: bhelgaas@google.com, ruscur@russell.cc, oohall@gmail.com Cc: Naveen Naidu , linux-kernel-mentees@lists.linuxfoundation.org, skhan@linuxfoundation.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Keith Busch , Oza Pawandeep , Sinan Kaya Subject: [PATCH v5 3/5] PCI/DPC: Initialize info.id in dpc_process_error() Date: Mon, 25 Oct 2021 22:31:02 +0530 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org In the dpc_process_error() path, info.id isn't initialized before being passed to aer_print_error(). In the corresponding AER path, it is initialized in aer_isr_one_error(). The error message shown during Coverity Scan is: Coverity #1461602 CID 1461602 (#1 of 1): Uninitialized scalar variable (UNINIT) 8. uninit_use_in_call: Using uninitialized value info.id when calling aer_print_error. Also Per PCIe r5.0, sec 7.9.15.5, the Source ID is defined only when the Trigger Reason indicates ERR_NONFATAL or ERR_FATAL. 
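
As a rough illustration of the rule above (not part of the patch), the following userspace sketch decodes the Trigger Reason from a DPC Status value and decides whether the Source ID can be trusted. The mask value mirrors PCI_EXP_DPC_STATUS_TRIGGER_RSN from include/uapi/linux/pci_regs.h; the enum and the helper names are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as PCI_EXP_DPC_STATUS_TRIGGER_RSN in pci_regs.h (bits 2:1). */
#define DPC_STATUS_TRIGGER_RSN	0x0006

/* Hypothetical names for the four Trigger Reason encodings. */
enum dpc_trigger_reason {
	DPC_REASON_UNMASKED_UNCOR = 0,	/* unmasked uncorrectable error */
	DPC_REASON_ERR_NONFATAL   = 1,	/* ERR_NONFATAL received */
	DPC_REASON_ERR_FATAL      = 2,	/* ERR_FATAL received */
	DPC_REASON_EXTENDED       = 3,	/* see Trigger Reason Extension */
};

/* Extract the Trigger Reason field from a raw DPC Status value. */
static inline unsigned int dpc_trigger_reason(uint16_t status)
{
	return (status & DPC_STATUS_TRIGGER_RSN) >> 1;
}

/* The Source ID is defined only for ERR_NONFATAL and ERR_FATAL triggers. */
static inline bool dpc_source_id_valid(uint16_t status)
{
	unsigned int reason = dpc_trigger_reason(status);

	return reason == DPC_REASON_ERR_NONFATAL ||
	       reason == DPC_REASON_ERR_FATAL;
}

/* Return the raw Source ID when it is defined, and 0 otherwise. */
static inline uint16_t dpc_effective_source_id(uint16_t status, uint16_t raw_id)
{
	return dpc_source_id_valid(status) ? raw_id : 0;
}
```

With these helpers, a status of 0x0003 (Trigger Reason 1) keeps the raw Source ID, while a status of 0x0001 (Trigger Reason 0) yields 0, which is the value the patch stores in info.id for that case.
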

Initialize "info.id" based on the trigger reason before passing it to
aer_print_error().

Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling")
Signed-off-by: Naveen Naidu
---
 drivers/pci/pcie/dpc.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index c556e7beafe3..6fa1b1eb4671 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -262,16 +262,24 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,

 void dpc_process_error(struct pci_dev *pdev)
 {
-	u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
+	u16 cap = pdev->dpc_cap, status, reason, ext_reason;
 	struct aer_err_info info;

 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
-	pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
+	reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) >> 1;
+
+	/*
+	 * Per PCIe r5.0, sec 7.9.15.5, the Source ID is defined only when the
+	 * Trigger Reason indicates ERR_NONFATAL or ERR_FATAL.
+	 */
+	if (reason == 1 || reason == 2)
+		pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &info.id);
+	else
+		info.id = 0;

 	pci_info(pdev, "containment event, status:%#06x source:%#06x\n",
-		 status, source);
+		 status, info.id);

-	reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) >> 1;
 	ext_reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT) >> 5;
 	pci_warn(pdev, "%s detected\n",
 		 (reason == 0) ?
"unmasked uncorrectable error" : From patchwork Mon Oct 25 17:01:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naveen Naidu X-Patchwork-Id: 12582415 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7F3F8C43217 for ; Mon, 25 Oct 2021 17:08:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 63B0960F70 for ; Mon, 25 Oct 2021 17:08:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234894AbhJYRK6 (ORCPT ); Mon, 25 Oct 2021 13:10:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48654 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235464AbhJYRJi (ORCPT ); Mon, 25 Oct 2021 13:09:38 -0400 Received: from mail-pl1-x635.google.com (mail-pl1-x635.google.com [IPv6:2607:f8b0:4864:20::635]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6B016C04319C; Mon, 25 Oct 2021 10:02:27 -0700 (PDT) Received: by mail-pl1-x635.google.com with SMTP id y1so78148plg.3; Mon, 25 Oct 2021 10:02:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=CbcvGhiCynD7mHKxH0tIRuOHs7YtG8M/upRN3Jxhw2Y=; b=TahbiPRwxdYlB8R32FE/ddOj6TV7g+S93Y7bTWay/UDtjPuuYqjcWDNO4Sc9LKvBHd 2jw1FLr4XqNY0ELdMcwIrP3v2ufwrqBNZPQrILXAwoDtEyMHt2xGhPJXEUqbMLpAfvGR ejOIVWmmMK41FPj+Xk+iAuFG6bSVgpbQwVkzsr/tf52hJVCXFo8cli9EInVxDoUW5PJI 505Ftg07t5gC2JyGzTBNOG8ZHvbhFTpRc1Hmc2+xjcWLaENRda6bp18El48esZR0hSHi /m3a0ng63KFrzj5XmrOtxAB6GW/Ov4tBod+ny/Mn0oGgvenannexJmSSMJB0v7bky8Ed esCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=CbcvGhiCynD7mHKxH0tIRuOHs7YtG8M/upRN3Jxhw2Y=; b=DsVjpRWzcIFa0jHGWyXdccRqU0erfhKPkswsXLOXXdcV/xcUnJ34aYv5ND24HehSiE fZISwkPA23mujZ5ssItZucXYmV6bhwt1NYHMdcXa03cgKwpsxi6a/HXWpkMOxXHWUZKB ez9JBjz6dcYGT9Jikq0572arEo6NiSJHZphpeA9VDee2iMdmHrtG+Cg66/VxPNdXT+47 tAoTMqkEys1nHX2DPT5rRZpOaW0afyZNmTu5Pn/F5MXD/q8l737sXuK6ErHJ9AqBOnYg WLgqpjhcn7kOD4d4UGHM+T4W4nn5Y7yh6LDIH7+0ist2d8VO2P9Bg5cGRJVHhveP9bby 1qqA== X-Gm-Message-State: AOAM531zpJ/WvcJMyfny2dUi93l/mNhU9a7ndjD2Re1eMLuqlQl7E6jz giMIJftsBJj3nhdchenCce8= X-Google-Smtp-Source: ABdhPJwj8P8g8u1L2b0v66cPyjErA4hDEmCe9jIMnClKK9DH2N52p87iPGMedaCd4zLpwEn+C4np6Q== X-Received: by 2002:a17:902:da90:b0:140:55f8:ca63 with SMTP id j16-20020a170902da9000b0014055f8ca63mr6770301plx.72.1635181346805; Mon, 25 Oct 2021 10:02:26 -0700 (PDT) Received: from localhost.localdomain ([2406:7400:63:df8b:7255:8580:2394:764c]) by smtp.gmail.com with ESMTPSA id g18sm5100858pfj.67.2021.10.25.10.02.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Oct 2021 10:02:26 -0700 (PDT) From: Naveen Naidu To: bhelgaas@google.com, ruscur@russell.cc, oohall@gmail.com Cc: Naveen Naidu , linux-kernel-mentees@lists.linuxfoundation.org, skhan@linuxfoundation.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: [PATCH v5 4/5] PCI/AER: Clear error device AER registers in aer_irq() Date: Mon, 25 Oct 2021 22:31:03 +0530 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Converge the APEI path and native AER path of clearing the AER registers of the error device. In APEI path, the system firmware clears the AER registers before handing off the record to OS. 
But in the "native AER" path, the code path that clears the AER
registers is as follows:

  aer_isr_one_error
    aer_print_port_info
    if (find_source_device())
      aer_process_err_devices
        handle_error_source
          pci_write_config_dword(dev, PCI_ERR_COR_STATUS, ...)

The above path has a bug: if find_source_device() fails, the AER
registers of the error device are not cleared. The error device then
keeps reporting the same error again and again, which leads to message
spew.

Related bug reports:
https://lore.kernel.org/linux-pci/20151229155822.GA17321@localhost/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173

The above bug could be avoided if the AER registers are cleared in the
AER IRQ handler aer_irq(), which would guarantee that the AER error
registers are always cleared. This is similar to how APEI handles these
errors.

The main aim is that:

When an interrupt handler deals with an interrupt, it must *always*
clear the source of the interrupt.

Signed-off-by: Naveen Naidu
---
 drivers/pci/pci.h      |  13 ++-
 drivers/pci/pcie/aer.c | 249 ++++++++++++++++++++++++++++-------------
 2 files changed, 184 insertions(+), 78 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9be7a966fda7..eb88d8bfeaf7 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -424,7 +424,6 @@ static inline bool pci_dev_is_added(const struct pci_dev *dev)
 #define AER_MAX_MULTI_ERR_DEVICES	5	/* Not likely to have more */

 struct aer_err_info {
-	struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
 	int error_dev_num;

 	u16 id;
@@ -440,6 +439,18 @@ struct aer_err_info {
 	struct aer_header_log_regs tlp;	/* TLP Header */
 };

+/* Preliminary AER error information processed from Root port */
+struct aer_devices_err_info {
+	struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
+	struct aer_err_info err_info;
+};
+
+/* AER information associated with each error device */
+struct aer_dev_err_info {
+	struct pci_dev *dev;
+	struct aer_err_info err_info;
+};
+
 int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
 #endif	/* CONFIG_PCIEAER */
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 241ff361b43c..d3937f5384e4 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -36,6 +36,18 @@

 #define AER_ERROR_SOURCES_MAX		128

+/*
+ * There can be 128 maximum error sources (AER_ERROR_SOURCES_MAX) and each
+ * error source can have maximum of 5 error devices (AER_MAX_MULTI_ERR_DEVICES)
+ * so the maximum error devices we can report is:
+ *
+ * AER_ERROR_DEVICES_MAX = AER_ERROR_SOURCES_MAX * AER_MAX_MULTI_ERR_DEVICES == (128 * 5) == 640
+ *
+ * But since, the size in KFIFO should be a power of two, the closest value
+ * to 640 is 1024
+ */
+# define AER_ERROR_DEVICES_MAX		1024
+
 #define AER_MAX_TYPEOF_COR_ERRS		16	/* as per PCI_ERR_COR_STATUS */
 #define AER_MAX_TYPEOF_UNCOR_ERRS	27	/* as per PCI_ERR_UNCOR_STATUS*/

@@ -46,7 +58,7 @@ struct aer_err_source {

 struct aer_rpc {
 	struct pci_dev *rpd;		/* Root Port device */
-	DECLARE_KFIFO(aer_fifo, struct aer_err_source, AER_ERROR_SOURCES_MAX);
+	DECLARE_KFIFO(aer_fifo, struct aer_dev_err_info, AER_ERROR_DEVICES_MAX);
 };

 /* AER stats for the device */
@@ -803,14 +815,14 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity,

 /**
  * add_error_device - list device to be handled
- * @e_info: pointer to error info
+ * @e_dev: pointer to error info
  * @dev: pointer to pci_dev to be added
  */
-static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev)
+static int add_error_device(struct aer_devices_err_info *e_dev, struct pci_dev *dev)
 {
-	if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
-		e_info->dev[e_info->error_dev_num] = pci_dev_get(dev);
-		e_info->error_dev_num++;
+	if (e_dev->err_info.error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
+		e_dev->dev[e_dev->err_info.error_dev_num] = pci_dev_get(dev);
+		e_dev->err_info.error_dev_num++;
 		return 0;
 	}
 	return -ENOSPC;
@@ -877,18 +889,18 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)

 static int find_device_iter(struct pci_dev *dev, void *data)
 {
-	struct aer_err_info *e_info = (struct aer_err_info *)data;
+	struct aer_devices_err_info *e_dev = (struct aer_devices_err_info *)data;

-	if (is_error_source(dev, e_info)) {
+	if (is_error_source(dev, &e_dev->err_info)) {
 		/* List this device */
-		if (add_error_device(e_info, dev)) {
+		if (add_error_device(e_dev, dev)) {
 			/* We cannot handle more... Stop iteration */
 			/* TODO: Should print error message here? */
 			return 1;
 		}

 		/* If there is only a single error, stop iteration */
-		if (!e_info->multi_error_valid)
+		if (!e_dev->err_info.multi_error_valid)
 			return 1;
 	}
 	return 0;
@@ -897,7 +909,7 @@ static int find_device_iter(struct pci_dev *dev, void *data)
 /**
  * find_source_device - search through device hierarchy for source device
  * @parent: pointer to Root Port pci_dev data structure
- * @e_info: including detailed error information such like id
+ * @e_dev: including detailed error information such like id
  *
  * Return true if found.
  *
@@ -907,26 +919,26 @@ static int find_device_iter(struct pci_dev *dev, void *data)
  * e_info->error_dev_num and e_info->dev[], based on the given information.
  */
 static bool find_source_device(struct pci_dev *parent,
-		struct aer_err_info *e_info)
+		struct aer_devices_err_info *e_dev)
 {
 	struct pci_dev *dev = parent;
 	int result;

 	/* Must reset in this function */
-	e_info->error_dev_num = 0;
+	e_dev->err_info.error_dev_num = 0;

 	/* Is Root Port an agent that sends error message? */
-	result = find_device_iter(dev, e_info);
+	result = find_device_iter(dev, e_dev);
 	if (result)
 		return true;

 	if (pci_pcie_type(parent) == PCI_EXP_TYPE_RC_EC)
-		pcie_walk_rcec(parent, find_device_iter, e_info);
+		pcie_walk_rcec(parent, find_device_iter, e_dev);
 	else
-		pci_walk_bus(parent->subordinate, find_device_iter, e_info);
+		pci_walk_bus(parent->subordinate, find_device_iter, e_dev);

-	if (!e_info->error_dev_num) {
-		pci_info(parent, "can't find device of ID%04x\n", e_info->id);
+	if (!e_dev->err_info.error_dev_num) {
+		pci_info(parent, "can't find device of ID%04x\n", e_dev->err_info.id);
 		return false;
 	}
 	return true;
@@ -940,24 +952,42 @@ static bool find_source_device(struct pci_dev *parent,
  * Invoked when an error being detected by Root Port.
  */
 static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
+{
+	/*
+	 * Correctable error does not need software intervention.
+	 * No need to go through error recovery process.
+	 */
+	if (info->severity == AER_NONFATAL)
+		pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
+	else if (info->severity == AER_FATAL)
+		pcie_do_recovery(dev, pci_channel_io_frozen, aer_root_reset);
+	pci_dev_put(dev);
+}
+
+/**
+ * clear_error_source_aer_registers - clear AER registers of the error source device
+ * @dev: pointer to pci_dev data structure of error source device
+ * @info: comprehensive error information
+ *
+ * Invoked when an error being detected by Root Port but before we handle the
+ * error.
+ */
+static void clear_error_source_aer_registers(struct pci_dev *dev, struct aer_err_info info)
 {
 	int aer = dev->aer_cap;

-	if (info->severity == AER_CORRECTABLE) {
-		/*
-		 * Correctable error does not need software intervention.
-		 * No need to go through error recovery process.
-		 */
+	if (info.severity == AER_CORRECTABLE) {
 		if (aer)
 			pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS,
-					info->status);
+					info.status);
 		if (pcie_aer_is_native(dev))
 			pcie_clear_device_status(dev);
-	} else if (info->severity == AER_NONFATAL)
-		pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
-	else if (info->severity == AER_FATAL)
-		pcie_do_recovery(dev, pci_channel_io_frozen, aer_root_reset);
-	pci_dev_put(dev);
+	} else if (info.severity == AER_NONFATAL) {
+		pci_aer_clear_nonfatal_status(dev);
+	} else if (info.severity == AER_FATAL) {
+		pci_aer_clear_fatal_status(dev);
+	}
+
 }

 #ifdef CONFIG_ACPI_APEI_PCIEAER
@@ -1093,70 +1123,112 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 	return 1;
 }

-static inline void aer_process_err_devices(struct aer_err_info *e_info)
-{
-	int i;
-
-	/* Report all before handle them, not to lost records by reset etc. */
-	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
-		if (aer_get_device_error_info(e_info->dev[i], e_info))
-			aer_print_error(e_info->dev[i], e_info);
-	}
-	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
-		if (aer_get_device_error_info(e_info->dev[i], e_info))
-			handle_error_source(e_info->dev[i], e_info);
-	}
-}
-
 /**
- * aer_isr_one_error - consume an error detected by root port
- * @rpc: pointer to the root port which holds an error
+ * aer_find_corr_error_source_device - find the error source which detected the corrected error
+ * @rp: pointer to Root Port pci_dev data structure
  * @e_src: pointer to an error source
+ * @e_info: including detailed error information such like id
+ *
+ * Return true if found.
+ *
+ * Process the error information received at the Root Port, set these values
+ * in the aer_devices_err_info and find all the devices that are related to
+ * the error.
  */
-static void aer_isr_one_error(struct aer_rpc *rpc,
-		struct aer_err_source *e_src)
+static bool aer_find_corr_error_source_device(struct pci_dev *rp,
+					      struct aer_err_source *e_src,
+					      struct aer_devices_err_info *e_info)
 {
-	struct pci_dev *pdev = rpc->rpd;
-	struct aer_err_info e_info;
-
-	pci_rootport_aer_stats_incr(pdev, e_src);
-
-	/*
-	 * There is a possibility that both correctable error and
-	 * uncorrectable error being logged. Report correctable error first.
-	 */
 	if (e_src->status & PCI_ERR_ROOT_COR_RCV) {
-		e_info.id = ERR_COR_ID(e_src->id);
-		e_info.severity = AER_CORRECTABLE;
+		e_info->err_info.id = ERR_COR_ID(e_src->id);
+		e_info->err_info.severity = AER_CORRECTABLE;

 		if (e_src->status & PCI_ERR_ROOT_MULTI_COR_RCV)
-			e_info.multi_error_valid = 1;
+			e_info->err_info.multi_error_valid = 1;
 		else
-			e_info.multi_error_valid = 0;
-		aer_print_port_info(pdev, &e_info);
+			e_info->err_info.multi_error_valid = 0;

-		if (find_source_device(pdev, &e_info))
-			aer_process_err_devices(&e_info);
+		if (!find_source_device(rp, e_info))
+			return false;
 	}
+	return true;
+}

+/**
+ * aer_find_uncorr_error_source_device - find the error source which detected the uncorrected error
+ * @rp: pointer to Root Port pci_dev data structure
+ * @e_src: pointer to an error source
+ * @e_info: including detailed error information such like id
+ *
+ * Return true if found.
+ *
+ * Process the error information received at the Root Port, set these values
+ * in the aer_devices_err_info and find all the devices that are related to
+ * the error.
+ */
+static bool aer_find_uncorr_error_source_device(struct pci_dev *rp,
+						struct aer_err_source *e_src,
+						struct aer_devices_err_info *e_info)
+{
 	if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
-		e_info.id = ERR_UNCOR_ID(e_src->id);
+		e_info->err_info.id = ERR_UNCOR_ID(e_src->id);

 		if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
-			e_info.severity = AER_FATAL;
+			e_info->err_info.severity = AER_FATAL;
 		else
-			e_info.severity = AER_NONFATAL;
+			e_info->err_info.severity = AER_NONFATAL;

 		if (e_src->status & PCI_ERR_ROOT_MULTI_UNCOR_RCV)
-			e_info.multi_error_valid = 1;
+			e_info->err_info.multi_error_valid = 1;
 		else
-			e_info.multi_error_valid = 0;
+			e_info->err_info.multi_error_valid = 0;
+
+		if (!find_source_device(rp, e_info))
+			return false;
+	}

-		aer_print_port_info(pdev, &e_info);
+	return true;
+}

-		if (find_source_device(pdev, &e_info))
-			aer_process_err_devices(&e_info);
+/**
+ * aer_isr_one_error - consume an error detected by root port
+ * @rp: pointer to Root Port pci_dev data structure
+ * @e_dev: pointer to an error device
+ */
+static void aer_isr_one_error(struct pci_dev *rp, struct aer_dev_err_info *e_dev)
+{
+	aer_print_port_info(rp, &e_dev->err_info);
+	aer_print_error(e_dev->dev, &e_dev->err_info);
+	handle_error_source(e_dev->dev, &e_dev->err_info);
+}
+
+static bool aer_add_err_devices_to_queue(struct aer_rpc *rpc,
+					 struct aer_devices_err_info *e_info)
+{
+	int i;
+	struct aer_dev_err_info *e_dev;
+
+	e_dev = kzalloc(sizeof(*e_dev), GFP_ATOMIC);
+	if (!e_dev)
+		return false;
+
+	for (i = 0; i < e_info->err_info.error_dev_num && e_info->dev[i]; i++) {
+		e_dev->err_info = e_info->err_info;
+		e_dev->dev = e_info->dev[i];
+
+		/*
+		 * Store the AER register information for each error device on
+		 * the queue
+		 */
+		if (aer_get_device_error_info(e_dev->dev, &e_dev->err_info)) {
+			if (!kfifo_put(&rpc->aer_fifo, *e_dev))
+				return false;
+
+			clear_error_source_aer_registers(e_dev->dev, e_dev->err_info);
+		}
 	}
+
+	return true;
 }

 /**
@@ -1170,13 +1242,13 @@ static irqreturn_t aer_isr(int irq, void *context)
 {
 	struct pcie_device *dev = (struct pcie_device *)context;
 	struct aer_rpc *rpc = get_service_data(dev);
-	struct aer_err_source e_src;
+	struct aer_dev_err_info e_dev;

 	if (kfifo_is_empty(&rpc->aer_fifo))
 		return IRQ_NONE;

-	while (kfifo_get(&rpc->aer_fifo, &e_src))
-		aer_isr_one_error(rpc, &e_src);
+	while (kfifo_get(&rpc->aer_fifo, &e_dev))
+		aer_isr_one_error(rpc->rpd, &e_dev);
 	return IRQ_HANDLED;
 }

@@ -1194,6 +1266,11 @@ static irqreturn_t aer_irq(int irq, void *context)
 	struct pci_dev *rp = rpc->rpd;
 	int aer = rp->aer_cap;
 	struct aer_err_source e_src = {};
+	struct aer_devices_err_info *e_info;
+
+	e_info = kzalloc(sizeof(*e_info), GFP_ATOMIC);
+	if (!e_info)
+		return IRQ_NONE;

 	pci_read_config_dword(rp, aer + PCI_ERR_ROOT_STATUS, &e_src.status);
 	if (!(e_src.status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV)))
@@ -1202,8 +1279,26 @@ static irqreturn_t aer_irq(int irq, void *context)
 	pci_read_config_dword(rp, aer + PCI_ERR_ROOT_ERR_SRC, &e_src.id);
 	pci_write_config_dword(rp, aer + PCI_ERR_ROOT_STATUS, e_src.status);

-	if (!kfifo_put(&rpc->aer_fifo, e_src))
-		return IRQ_HANDLED;
+	pci_rootport_aer_stats_incr(rp, &e_src);
+
+	/*
+	 * There is a possibility that both correctable error and
+	 * uncorrectable error are being logged. Find the devices which caused
+	 * correctable errors first so that they can be added to the queue first
+	 * and will be reported first.
+	 *
+	 * Before adding the error device to the queue to be handled, clear the
+	 * AER status registers.
+	 */
+	if (aer_find_corr_error_source_device(rp, &e_src, e_info)) {
+		if (!aer_add_err_devices_to_queue(rpc, e_info))
+			return IRQ_NONE;
+	}
+
+	if (aer_find_uncorr_error_source_device(rp, &e_src, e_info)) {
+		if (!aer_add_err_devices_to_queue(rpc, e_info))
+			return IRQ_NONE;
+	}
 
 	return IRQ_WAKE_THREAD;
 }

From patchwork Mon Oct 25 17:01:04 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Naidu
X-Patchwork-Id: 12582413
X-Patchwork-Delegate: bhelgaas@google.com
From: Naveen Naidu
To: bhelgaas@google.com, ruscur@russell.cc, oohall@gmail.com
Cc: Naveen Naidu, linux-kernel-mentees@lists.linuxfoundation.org,
 skhan@linuxfoundation.org, linux-kernel@vger.kernel.org,
 linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v5 5/5] PCI/AER: Include DEVCTL in aer_print_error()
Date: Mon, 25 Oct 2021 22:31:04 +0530
Message-Id: <656b4eab7fae68de86bb0a52568fb93822833828.1635179600.git.naveennaidu479@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-pci@vger.kernel.org

Print the contents of Device Control Register of the device which detected
the error. This might help in faster error diagnosis.

It is easy to test this by using aer-inject:

  $ aer-inject -s 00:03:0 corr-err-file

The content of the corr-err-file is as below:

  AER
  COR_STATUS BAD_TLP
  HEADER_LOG 0 1 2 3

Sample output from dummy error injected by aer-inject:

  pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
  pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver)
  pcieport 0000:00:03.0: device [1b36:000c] error status/mask=00000040/0000e000, devctl=0x000f <-- devctl added to the error log
  pcieport 0000:00:03.0: [ 6] BadTLP

Signed-off-by: Naveen Naidu
---
 drivers/pci/pci.h      |  2 ++
 drivers/pci/pcie/aer.c | 10 ++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index eb88d8bfeaf7..48ed7f91113b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -437,6 +437,8 @@ struct aer_err_info {
 	u32 status;		/* COR/UNCOR Error Status */
 	u32 mask;		/* COR/UNCOR Error Mask */
 	struct aer_header_log_regs tlp;	/* TLP Header */
+
+	u16 devctl;
 };
 
 /* Preliminary AER error information processed from Root port */
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d3937f5384e4..fdeef9deb016 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -729,8 +729,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 		   aer_error_severity_string[info->severity],
 		   aer_error_layer[layer], aer_agent_string[agent]);
 
-	pci_printk(level, dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
-		   dev->vendor, dev->device, info->status, info->mask);
+	pci_printk(level, dev, " device [%04x:%04x] error status/mask=%08x/%08x, devctl=%#06x\n",
+		   dev->vendor, dev->device, info->status, info->mask, info->devctl);
 
 	__aer_print_error(dev, info);
 
@@ -1083,6 +1083,12 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 	if (!aer)
 		return 0;
 
+	/*
+	 * Cache the value of Device Control Register now, because later the
+	 * device might not be available
+	 */
+	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &info->devctl);
+
 	if (info->severity == AER_CORRECTABLE) {
 		pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS,
 			&info->status);
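
Note on reading the new devctl field: the sketch below is userspace C, not
kernel code, and only illustrates how a 16-bit Device Control value could be
rendered with the same %#06x conversion and checked for the four
error-reporting enable bits. format_devctl() and devctl_reports_all_errors()
are hypothetical helper names that do not exist in the kernel; the bit values
are assumed to match the PCI_EXP_DEVCTL_* definitions in
include/uapi/linux/pci_regs.h.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Error-reporting enable bits of the PCIe Device Control register.
 * Assumption: these mirror the definitions in include/uapi/linux/pci_regs.h.
 */
#define PCI_EXP_DEVCTL_CERE	0x0001	/* Correctable Error Reporting Enable */
#define PCI_EXP_DEVCTL_NFERE	0x0002	/* Non-Fatal Error Reporting Enable */
#define PCI_EXP_DEVCTL_FERE	0x0004	/* Fatal Error Reporting Enable */
#define PCI_EXP_DEVCTL_URRE	0x0008	/* Unsupported Request Reporting Enable */

/*
 * Hypothetical helper: render devctl with the conversion used in the patched
 * log line. "%#06x" asks for a "0x" prefix and zero padding to a total width
 * of six characters.
 */
static int format_devctl(char *buf, size_t len, uint16_t devctl)
{
	return snprintf(buf, len, "devctl=%#06x", (unsigned int)devctl);
}

/*
 * Hypothetical helper: return 1 when all four error-reporting enables are
 * set in devctl, 0 otherwise.
 */
static int devctl_reports_all_errors(uint16_t devctl)
{
	const uint16_t all = PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE |
			     PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE;

	return (devctl & all) == all;
}
```

With the value from the sample log, format_devctl(buf, sizeof(buf), 0x000f)
produces "devctl=0x000f", and devctl_reports_all_errors(0x000f) returns 1:
correctable, non-fatal, fatal and unsupported-request reporting are all
enabled on that port.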