From patchwork Tue Nov 29 18:43:59 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Prarit Bhargava X-Patchwork-Id: 9452969 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 685466071C for ; Tue, 29 Nov 2016 18:44:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5AD1E283D3 for ; Tue, 29 Nov 2016 18:44:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4F83E283ED; Tue, 29 Nov 2016 18:44:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B0CC4283E6 for ; Tue, 29 Nov 2016 18:44:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933657AbcK2SoJ (ORCPT ); Tue, 29 Nov 2016 13:44:09 -0500 Received: from mx1.redhat.com ([209.132.183.28]:48628 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933703AbcK2SoG (ORCPT ); Tue, 29 Nov 2016 13:44:06 -0500 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A31BC61E53; Tue, 29 Nov 2016 18:44:05 +0000 (UTC) Received: from praritdesktop.bos.redhat.com (prarit-guest.khw.lab.eng.bos.redhat.com [10.16.186.145]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id uATIi3kB001491; Tue, 29 Nov 2016 13:44:04 -0500 From: Prarit Bhargava To: linux-kernel@vger.kernel.org Cc: Prarit Bhargava , Borislav Petkov , "Rafael J. Wysocki" , Len Brown , Paul Gortmaker , Tyler Baicar , Punit Agrawal , Don Zickus , linux-acpi@vger.kernel.org Subject: [PATCH] ACPI / APEI: Fix NMI notification handling Date: Tue, 29 Nov 2016 13:43:59 -0500 Message-Id: <1480445039-3434-1-git-send-email-prarit@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Tue, 29 Nov 2016 18:44:05 +0000 (UTC) Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When removing and adding cpu 0 on a system with GHES NMI the following stack trace is seen when re-adding the cpu: WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+ Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc5+ #59 Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.01.00.0 ffffffff81c03e78 ffffffff81337905 0000000000000000 0000000000000000 ffffffff81c03eb8 ffffffff8107d9c1 00000545810aac4a 0000000000000000 00000000000000f0 0000000000000000 000081cb6440f1d0 0000000000000001 Call Trace: [] dump_stack+0x63/0x8e [] __warn+0xd1/0xf0 [] warn_slowpath_null+0x1d/0x20 [] setup_local_APIC+0x275/0x370 [] apic_ap_setup+0xe/0x20 [] start_secondary+0x48/0x180 [] ? set_init_arg+0x55/0x55 [] ? early_idt_handler_array+0x120/0x120 [] ? x86_64_start_reservations+0x2a/0x2c [] ? x86_64_start_kernel+0x13d/0x14c ---[ end trace 7b6555b6343ef9ee ]--- During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an NMI on CPU 0. The GHES NMI handler, ghes_notify_nmi() runs the ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR (0xf6). The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is also 0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms that something has set the IRQ_WORK_VECTOR line prior to the APIC being initialized. Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") incorrectly modified the behavior such that the handler returns NMI_HANDLED only if an error was processed, and incorrectly runs the ghes work queue for every NMI. This patch modifies the ghes_proc_irq_work() to run as it did prior to 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by properly returning NMI_HANDLED and only calling the work queue if NMI_HANDLED has been set. Fixes: 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") Signed-off-by: Prarit Bhargava Cc: Borislav Petkov Cc: Rafael J. Wysocki Cc: Len Brown Cc: Paul Gortmaker Cc: Tyler Baicar Cc: Punit Agrawal Cc: Don Zickus Cc: linux-acpi@vger.kernel.org --- drivers/acpi/apei/ghes.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 0d099a24f776..39c45efbcb3d 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -858,17 +858,18 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs) if (sev >= GHES_SEV_PANIC) __ghes_panic(ghes); + ret = NMI_HANDLED; + if (!(ghes->flags & GHES_TO_CLEAR)) continue; __process_error(ghes); ghes_clear_estatus(ghes); - - ret = NMI_HANDLED; } #ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG - irq_work_queue(&ghes_proc_irq_work); + if (ret == NMI_HANDLED) + irq_work_queue(&ghes_proc_irq_work); #endif atomic_dec(&ghes_in_nmi); return ret;