From: Alexandru Gagniuc
To: linux-acpi@vger.kernel.org
Cc: rjw@rjwysocki.net, lenb@kernel.org, tony.luck@intel.com, bp@alien8.de,
    tbaicar@codeaurora.org, will.deacon@arm.com, james.morse@arm.com,
    shiju.jose@huawei.com, zjzhang@codeaurora.org, gengdongjiu@huawei.com,
    linux-kernel@vger.kernel.org, alex_gagniuc@dellteam.com,
    austin_bolen@dell.com, shyam_iyer@dell.com, Alexandru Gagniuc
Subject: [RFC PATCH 3/4] acpi: apei: Do not panic() in NMI because of GHES messages
Date: Tue, 3 Apr 2018 12:08:29 -0500
Message-Id: <20180403170830.29282-4-mr.nuke.me@gmail.com>
In-Reply-To: <20180403170830.29282-1-mr.nuke.me@gmail.com>
References: <20180403170830.29282-1-mr.nuke.me@gmail.com>

BIOSes like to send NMIs for a number of silly reasons often deemed to
be "fatal". For example, pin bounce during a PCIe hotplug/unplug might
cause the link to go down and retrain, with fatal PCIe errors being
generated while the link is retraining.

Instead of panic()ing in NMI context, pass fatal errors down to IRQ
context, where ghes_do_proc() gets a chance to resolve them, and panic
only if the severity it returns is still GHES_SEV_PANIC or higher.

With this change, PCIe errors are handled by AER. Other, far less common
errors, such as machine check exceptions, still cause a panic() in their
respective handlers.
Signed-off-by: Alexandru Gagniuc
---
 drivers/acpi/apei/ghes.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 2c998125b1d5..7243a99ea57e 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -428,8 +428,7 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
  * GHES_SEV_RECOVERABLE -> AER_NONFATAL
  * GHES_SEV_RECOVERABLE && CPER_SEC_RESET -> AER_FATAL
  *     These both need to be reported and recovered from by the AER driver.
- * GHES_SEV_PANIC does not make it to this handling since the kernel must
- *     panic.
+ * GHES_SEV_PANIC -> AER_FATAL
  */
 static bool ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 {
@@ -899,6 +898,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
 	struct ghes_estatus_node *estatus_node;
 	struct acpi_hest_generic *generic;
 	struct acpi_hest_generic_status *estatus;
+	int corrected_sev;
 	u32 len, node_len;
 
 	llnode = llist_del_all(&ghes_estatus_llist);
@@ -914,7 +914,14 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
 		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
 		len = cper_estatus_len(estatus);
 		node_len = GHES_ESTATUS_NODE_LEN(len);
-		ghes_do_proc(estatus_node->ghes, estatus);
+		corrected_sev = ghes_do_proc(estatus_node->ghes, estatus);
+
+		if (corrected_sev >= GHES_SEV_PANIC) {
+			oops_begin();
+			ghes_print_queued_estatus();
+			__ghes_panic(estatus_node->ghes);
+		}
+
 		if (!ghes_estatus_cached(estatus)) {
 			generic = estatus_node->generic;
 			if (ghes_print_estatus(NULL, generic, estatus))
@@ -955,7 +962,7 @@ static void __process_error(struct ghes *ghes)
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
 	struct ghes *ghes;
-	int sev, ret = NMI_DONE;
+	int ret = NMI_DONE;
 
 	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
 		return ret;
@@ -968,13 +975,6 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 			ret = NMI_HANDLED;
 		}
 
-		sev = ghes_severity(ghes->estatus->error_severity);
-		if (sev >= GHES_SEV_PANIC) {
-			oops_begin();
-			ghes_print_queued_estatus();
-			__ghes_panic(ghes);
-		}
-
 		if (!(ghes->flags & GHES_TO_CLEAR))
 			continue;