From patchwork Fri Oct 17 09:13:27 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Li, Zhen-Hua"
X-Patchwork-Id: 5095951
X-Patchwork-Delegate: bhelgaas@google.com
From: "Li, Zhen-Hua"
To: Bjorn Helgaas, Eric Biederman
Cc: "Li, Zhen-Hua", Randy Wright
Subject: [PATCH 1/1] pci: reset all pci endpoints to stop ongoing DMA
Date: Fri, 17 Oct 2014 17:13:27 +0800
Message-Id: <1413537207-16514-1-git-send-email-zhen-hual@hp.com>
X-Mailer: git-send-email 2.0.0-rc0
Sender: linux-pci-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-pci@vger.kernel.org

This is an update of the patch
    https://lkml.org/lkml/2014/10/10/37
This version performs the reset before the kdump kernel boots.

On a Linux system with IOMMU support and many PCI devices, when the
kernel crashes and the kdump kernel boots with intel_iommu=on, a device
may still issue unexpected DMA requests, which cause DMA remapping
faults like:

    dmar: DRHD: handling fault status reg 102
    dmar: DMAR:[DMA Read] Request device [41:00.0] fault addr fff81000
    DMAR:[fault reason 01] Present bit in root entry is clear

This bug may happen on *any* PCI device.

Analysis of this bug:

The present bit is set in this function:

    static struct context_entry * device_to_context_entry(
                    struct intel_iommu *iommu, u8 bus, u8 devfn)
    {
        ......
        set_root_present(root);
        ......
    }

Calling tree:

    device driver
        intel_alloc_coherent
            __intel_map_single
                domain_context_mapping
                    domain_context_mapping_one
                        device_to_context_entry

This means that the present bit in the root entry is not set until the
device driver is loaded.

In the kdump kernel, however, hardware devices are not aware that
control has transferred to the second kernel, and their drivers must
initialize again. Consequently, device activity initiated in the first
kernel may produce unexpected DMA requests, leading to DMA remapping
errors in the second kernel.

To fix this DMAR fault, we need to reset the bus that the device is on.
Resetting the device itself does not work.

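To make the failure mode in the analysis above concrete, here is a small userspace sketch of the present-bit check. It is only a toy model under stated assumptions: the names toy_root_table, toy_kdump_boot, toy_driver_maps_dma and toy_dma_request are hypothetical and do not exist in the kernel, and the model keeps one boolean per bus number in place of the real root and context tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_ROOT_ENTRIES		256	/* one root entry per PCI bus number */
#define TOY_FAULT_NONE			0
#define TOY_FAULT_ROOT_NOT_PRESENT	1	/* "fault reason 01" in the log above */

/* Hypothetical stand-in for the IOMMU root table; not a kernel structure. */
struct toy_root_table {
	bool present[TOY_ROOT_ENTRIES];
};

/* Assumption: the kdump kernel starts with no root entry marked present. */
static void toy_kdump_boot(struct toy_root_table *rt)
{
	assert(rt != NULL);
	memset(rt, 0, sizeof(*rt));
}

/* Models a driver's first DMA mapping, which marks the bus's root entry present. */
static void toy_driver_maps_dma(struct toy_root_table *rt, uint8_t bus)
{
	assert(rt != NULL);
	rt->present[bus] = true;
}

/* Models the IOMMU check applied to a DMA request arriving from the given bus. */
static int toy_dma_request(const struct toy_root_table *rt, uint8_t bus)
{
	assert(rt != NULL);
	return rt->present[bus] ? TOY_FAULT_NONE : TOY_FAULT_ROOT_NOT_PRESENT;
}
```

In this model, a request from bus 0x41 issued right after toy_kdump_boot() is reported as TOY_FAULT_ROOT_NOT_PRESENT, and the same request succeeds only after toy_driver_maps_dma() has run for that bus. That window is what a device left running by the first kernel can hit.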
A patch for this bug was sent earlier:
    https://lkml.org/lkml/2014/9/30/55

As noted in that discussion, this bug may happen on *any* device, so we
need to reset all PCI devices.

There was an original version (by Takao Indoh) that resets the PCIe
devices:
    https://lkml.org/lkml/2013/5/14/9

According to the previous discussion, on sparc the IOMMU is initialized
before PCI devices are enumerated. Because this patch performs the
reset before the kdump kernel boots, it can also fix the problem on
sparc.

Changes in this version, compared with Takao Indoh's version:
  - Add support for legacy PCI devices.
  - Use pci_try_reset_bus instead of do_downstream_device_reset.
  - Reset all PCI/PCIe devices in the first kernel, before the kdump
    kernel boots.

Randy Wright corrected some misunderstandings in this description.

Signed-off-by: Li, Zhen-Hua
Signed-off-by: Takao Indoh
Signed-off-by: Randy Wright
Nacked-by: "Eric W. Biederman"
---
 drivers/pci/pci.c   | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h |  6 ++++
 kernel/kexec.c      |  2 ++
 3 files changed, 90 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..aa9192a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include "pci.h"
@@ -4466,6 +4467,87 @@ void __weak pci_fixup_cardbus(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_fixup_cardbus);

+/*
+ * Return true if dev is PCI root port or downstream port whose child is PCI
+ * endpoint except VGA device.
+ */
+static int __pci_dev_need_reset(struct pci_dev *dev)
+{
+	struct pci_bus *subordinate;
+	struct pci_dev *child;
+
+	if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
+		return 0;
+
+	if (pci_is_pcie(dev)) {
+		if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
+		    (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM))
+			return 0;
+	}
+
+	subordinate = dev->subordinate;
+	list_for_each_entry(child, &subordinate->devices, bus_list) {
+		/* Don't reset switch, bridge, VGA device */
+		if ((child->hdr_type == PCI_HEADER_TYPE_BRIDGE) ||
+		    ((child->class >> 16) == PCI_BASE_CLASS_BRIDGE) ||
+		    ((child->class >> 16) == PCI_BASE_CLASS_DISPLAY))
+			return 0;
+
+		if (pci_is_pcie(child)) {
+			if ((pci_pcie_type(child) == PCI_EXP_TYPE_UPSTREAM) ||
+			    (pci_pcie_type(child) == PCI_EXP_TYPE_PCI_BRIDGE))
+				return 0;
+		}
+	}
+
+	return 1;
+}
+
+struct pci_dev_reset_entry {
+	struct list_head list;
+	struct pci_dev *dev;
+};
+int pci_reset_endpoints(void)
+{
+	struct pci_dev *dev = NULL;
+	struct pci_dev_reset_entry *pdev_entry, *tmp;
+	struct pci_bus *subordinate = NULL;
+	int has_it;
+
+	LIST_HEAD(pdev_list);
+
+
+	for_each_pci_dev(dev) {
+		subordinate = dev->subordinate;
+		if (!subordinate || list_empty(&subordinate->devices))
+			continue;
+
+		has_it = 0;
+		list_for_each_entry(pdev_entry, &pdev_list, list) {
+			if (dev == pdev_entry->dev) {
+				has_it = 1;
+				break;
+			}
+		}
+		if (has_it)
+			continue;
+
+		if (__pci_dev_need_reset(dev)) {
+			pdev_entry = kmalloc(sizeof(*pdev_entry), GFP_KERNEL);
+			pdev_entry->dev = dev;
+			list_add(&pdev_entry->list, &pdev_list);
+		}
+	}
+
+	list_for_each_entry_safe(pdev_entry, tmp, &pdev_list, list) {
+		pci_try_reset_bus(pdev_entry->dev->subordinate);
+		kfree(pdev_entry);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_reset_endpoints);
+
 static int __init pci_setup(char *str)
 {
 	while (str) {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5be8db4..1cf7207 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1869,4 +1869,10 @@ static inline bool pci_is_dev_assigned(struct pci_dev *pdev)
 {
 	return (pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED) == PCI_DEV_FLAGS_ASSIGNED;
 }
+
+/*
+ * Reset all pci devices by resetting the buses.
+ */
+int pci_reset_endpoints(void);
+
 #endif /* LINUX_PCI_H */
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 2abf9f6..986e8f7 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -1474,6 +1475,7 @@ void crash_kexec(struct pt_regs *regs)
 	if (kexec_crash_image) {
 		struct pt_regs fixed_regs;

+		pci_reset_endpoints();
 		crash_setup_regs(&fixed_regs, regs);
 		crash_save_vmcoreinfo();
 		machine_crash_shutdown(&fixed_regs);
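
The selection rule in __pci_dev_need_reset() can be tried outside the kernel with a small userspace model. This is a sketch only and not part of the patch: struct toy_dev and toy_bridge_needs_reset are hypothetical names, the device tree is a plain array instead of the kernel's bus lists, and the TOY_* constants are assumed to mirror the corresponding PCI_* values in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values assumed to mirror PCI_HEADER_TYPE_*, PCI_BASE_CLASS_* and PCI_EXP_TYPE_*. */
#define TOY_HEADER_TYPE_NORMAL	0
#define TOY_HEADER_TYPE_BRIDGE	1
#define TOY_BASE_CLASS_DISPLAY	0x03
#define TOY_BASE_CLASS_BRIDGE	0x06
#define TOY_EXP_TYPE_ENDPOINT	0x0
#define TOY_EXP_TYPE_ROOT_PORT	0x4
#define TOY_EXP_TYPE_UPSTREAM	0x5
#define TOY_EXP_TYPE_DOWNSTREAM	0x6
#define TOY_EXP_TYPE_PCI_BRIDGE	0x7

/* Hypothetical, simplified stand-in for struct pci_dev. */
struct toy_dev {
	uint8_t hdr_type;
	uint32_t class_code;		/* 24-bit class code, base class in bits 23:16 */
	bool is_pcie;
	int pcie_type;			/* only meaningful when is_pcie is true */
	const struct toy_dev *children;	/* devices on the secondary bus */
	size_t nr_children;
};

/*
 * Same decision as the patch: the device must be a bridge (for PCIe, a
 * root port or downstream port), and none of its children may be a
 * bridge, switch port, or display device.
 */
static bool toy_bridge_needs_reset(const struct toy_dev *dev)
{
	size_t i;

	assert(dev != NULL);
	if (dev->hdr_type != TOY_HEADER_TYPE_BRIDGE)
		return false;
	if (dev->is_pcie &&
	    dev->pcie_type != TOY_EXP_TYPE_ROOT_PORT &&
	    dev->pcie_type != TOY_EXP_TYPE_DOWNSTREAM)
		return false;

	for (i = 0; i < dev->nr_children; i++) {
		const struct toy_dev *child = &dev->children[i];
		uint32_t base_class = child->class_code >> 16;

		if (child->hdr_type == TOY_HEADER_TYPE_BRIDGE ||
		    base_class == TOY_BASE_CLASS_BRIDGE ||
		    base_class == TOY_BASE_CLASS_DISPLAY)
			return false;
		if (child->is_pcie &&
		    (child->pcie_type == TOY_EXP_TYPE_UPSTREAM ||
		     child->pcie_type == TOY_EXP_TYPE_PCI_BRIDGE))
			return false;
	}
	return true;
}
```

With this model, a root port whose only child is a network endpoint (class code 0x020000) is selected, while a root port with a VGA child (0x030000) or with a switch upstream port below it is skipped. A legacy PCI-to-PCI bridge with is_pcie false is selected as long as its children are plain endpoints, which corresponds to the "legacy PCI devices" item in the change list. In the patch itself, bridges with an empty secondary bus are filtered out by the caller before this check runs.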