pci: Only disable MSI/X and enable INTx if shutdown function has been called

Bjorn,

We have seen this at Red Hat on various drivers: nouveau, ahci, mei_me, and
pcieport (so far).  A Google search for "unhandled irq 16" yields many reports
of similar behavior during shutdown, indicating that this problem is
widespread.  I can cause this to happen on a "stable" system by adding a 3
second delay in pci_device_shutdown(), which causes the number of spurious
interrupts to exceed the 100000 limit and display the warning below, primarily
for the nouveau driver and occasionally for the other mentioned drivers.
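
For readers unfamiliar with the 100000 limit, here is a simplified user-space
model of how the warning is reached.  It loosely follows the kernel's
spurious-interrupt accounting, but the struct, the function name note_irq(),
and the omission of the kernel's timing window are assumptions made for this
sketch; it is not the kernel code itself.

```c
#include <stdbool.h>

/* Simplified model; the struct and field names are illustrative only. */
struct irq_stats {
	unsigned int count;	/* interrupts seen in the current window */
	unsigned int unhandled;	/* of those, how many no handler claimed */
};

/*
 * Account for one interrupt. Returns true when the window of 100000
 * interrupts closes with more than 99900 of them unhandled, which is
 * the point where "nobody cared" would be reported in this model.
 */
bool note_irq(struct irq_stats *s, bool handled)
{
	bool warn;

	if (!handled)
		s->unhandled++;
	if (++s->count < 100000)
		return false;

	/* Window is full: evaluate it, then start a new one. */
	warn = s->unhandled > 99900;
	s->count = 0;
	s->unhandled = 0;
	return warn;
}
```

In this model a device that keeps an unclaimed INTx line asserted fills the
window quickly, which matches the observation that a longer shutdown delay
makes the warning more likely to appear.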

A patch for this was proposed and rejected here for being too risky:

https://patchwork.kernel.org/patch/5990701/

I also originally posted a patch to resolve this here:

http://marc.info/?l=linux-pci&m=147705209308588&w=2

and several other patch suggestions were made.  The problem with all of these
solutions is that there is some risk associated with them (kdump, kvm, etc.)
and they are papering over the real issue that the PCI shutdown should not
blindly switch to INTx for all devices.

I am re-proposing the original suggested patch.  There is some risk associated
with this, but I don't think it is any more or less risky than the other
patches, and the other patches seem to only apply band-aids to the problem.

[Aside: Lukas Wunner asked why this always happens on IRQ 16 (even when lspci
shows IRQ 32 for the legacy device).

The PCI irq pins A, B, C, and D are routed according to the ACPI _PRT table for
the device.  _In general_, I have noted a consistent pattern for PCI irq pins
such that

	irq pin A is IRQ 0x10 (16)
	irq pin B is IRQ 0x11 (17)
	irq pin C is IRQ 0x12 (18)
	irq pin D is IRQ 0x13 (19)

Since the device's IRQ is hooked up to pin A we're seeing the unhandled
interrupt on IRQ 16.]
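
The pattern in the aside amounts to a fixed offset from IRQ 16.  A minimal
sketch, assuming the commonly observed routing described there (the real
mapping comes from the platform's ACPI _PRT and can differ);
intx_pin_to_irq() is a hypothetical helper, not a kernel function:

```c
/*
 * Hypothetical helper: map a PCI INTx pin letter ('A'..'D') to the IRQ
 * number commonly observed for it. Real routing is platform-specific.
 */
int intx_pin_to_irq(char pin)
{
	if (pin < 'A' || pin > 'D')
		return -1;	/* not a valid INTx pin */
	return 0x10 + (pin - 'A');
}
```

With this mapping, a device wired to pin A lands on IRQ 16 regardless of the
MSI vector it was using before the switch to INTx.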

I have tested this on various systems with KVM and kdump (and kdump on
KVM) and didn't see any issues.

NOTE: In my testing this resolves the problem with PCI based serial ports
cutting off their output during shutdown.  Again, this can be tracked to the
PCI shutdown path switching between MSI & INTx independently of the driver.

----8<----

The following unhandled IRQ warning is seen during shutdown:

irq 16: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.2-1.el7_UNSUPPORTED.x86_64 #1
Hardware name: Hewlett-Packard HP Z820 Workstation/158B, BIOS J63 v03.90 06/01/2016
 0000000000000000 ffff88041f803e70 ffffffff81333bd5 ffff88041cb78200
 ffff88041cb7829c ffff88041f803e98 ffffffff810d9465 ffff88041cb78200
 0000000000000000 0000000000000028 ffff88041f803ed0 ffffffff810d97bf
Call Trace:
 <IRQ>  [<ffffffff81333bd5>] dump_stack+0x63/0x8e
 [<ffffffff810d9465>] __report_bad_irq+0x35/0xd0
 [<ffffffff810d97bf>] note_interrupt+0x20f/0x260
 [<ffffffff810d6b35>] handle_irq_event_percpu+0x45/0x60
 [<ffffffff810d6b7c>] handle_irq_event+0x2c/0x50
 [<ffffffff810da31a>] handle_fasteoi_irq+0x8a/0x150
 [<ffffffff8102edfb>] handle_irq+0xab/0x130
 [<ffffffff81082391>] ? _local_bh_enable+0x21/0x50
 [<ffffffff817064ad>] do_IRQ+0x4d/0xd0
 [<ffffffff81704502>] common_interrupt+0x82/0x82
 <EOI>  [<ffffffff815d0181>] ? cpuidle_enter_state+0xc1/0x280
 [<ffffffff815d0174>] ? cpuidle_enter_state+0xb4/0x280
 [<ffffffff815d0377>] cpuidle_enter+0x17/0x20
 [<ffffffff810bf660>] cpu_startup_entry+0x220/0x3a0
 [<ffffffff816f6da7>] rest_init+0x77/0x80
 [<ffffffff81d8e147>] start_kernel+0x495/0x4a2
 [<ffffffff81d8daa0>] ? set_init_arg+0x55/0x55
 [<ffffffff81d8d120>] ? early_idt_handler_array+0x120/0x120
 [<ffffffff81d8d5d6>] x86_64_start_reservations+0x2a/0x2c
 [<ffffffff81d8d715>] x86_64_start_kernel+0x13d/0x14c

pci_device_shutdown() is called on each PCI device, and does

        if (drv && drv->shutdown)
                drv->shutdown(pci_dev);
        pci_msi_shutdown(pci_dev);
        pci_msix_shutdown(pci_dev);

The pci_msi_shutdown() and pci_msix_shutdown() functions both call
pci_intx_for_msi(), which enables the INTx interrupt without any involvement
from the driver.

The problem is that the driver may not have a shutdown function, in which case
the device remains active.  The driver continues to operate the PCI device and
the device raises INTx interrupts.  The driver, however, has not registered a
handler for INTx, so the interrupt line remains asserted, which leads to an
unhandled IRQ warning.
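
The difference between the existing behavior and the behavior named in the
subject line can be illustrated with a small user-space model.  This is a
sketch and not the patch itself: every mock_* name, the struct layouts, and
the two shutdown_*() functions are hypothetical stand-ins for the kernel
structures and for pci_device_shutdown().

```c
#include <stdbool.h>
#include <stddef.h>

struct mock_dev {
	bool msi_enabled;
	bool intx_enabled;
	bool active;		/* device may still raise interrupts */
};

struct mock_drv {
	void (*shutdown)(struct mock_dev *dev);
};

/* Stand-in for a driver shutdown hook that stops the device. */
void mock_quiesce(struct mock_dev *dev)
{
	dev->active = false;
}

/* Stand-in for pci_msi_shutdown()/pci_msix_shutdown(): MSI off, INTx on. */
void mock_msi_shutdown(struct mock_dev *dev)
{
	dev->msi_enabled = false;
	dev->intx_enabled = true;
}

/* Existing behavior: always switch to INTx. */
void shutdown_current(struct mock_dev *dev, const struct mock_drv *drv)
{
	if (drv && drv->shutdown)
		drv->shutdown(dev);
	mock_msi_shutdown(dev);
}

/* Behavior per the subject line: switch only if a shutdown hook ran. */
void shutdown_proposed(struct mock_dev *dev, const struct mock_drv *drv)
{
	if (drv && drv->shutdown) {
		drv->shutdown(dev);
		mock_msi_shutdown(dev);
	}
}

/* An active device signalling INTx with no INTx handler registered. */
bool unhandled_intx_possible(const struct mock_dev *dev)
{
	return dev->active && dev->intx_enabled;
}
```

In the model, a driver without a shutdown hook leaves the device active with
INTx enabled under shutdown_current(), while shutdown_proposed() leaves the
device on MSI, where the driver's existing handler still applies.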

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: alex.williamson@redhat.com
Cc: darcari@redhat.com
Cc: mstowe@redhat.com
Cc: bhelgaas@google.com
Cc: lukas@wunner.de
Cc: keith.busch@intel.com
Cc: mika.westerberg@linux.intel.com
---
 drivers/pci/pci-driver.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Message ID	1478627867-28795-1-git-send-email-prarit@redhat.com (mailing list archive)
State	New, archived
Delegated to:	Bjorn Helgaas
From	Prarit Bhargava <prarit@redhat.com>
To	linux-pci@vger.kernel.org
Date	Tue, 8 Nov 2016 12:57:47 -0500
