From patchwork Wed Sep 5 20:35:36 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10589433 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3F4FC5A4 for ; Wed, 5 Sep 2018 20:36:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 321292AB9C for ; Wed, 5 Sep 2018 20:36:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 25A622ABB3; Wed, 5 Sep 2018 20:36:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BA9E12AB9C for ; Wed, 5 Sep 2018 20:36:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727875AbeIFBI0 (ORCPT ); Wed, 5 Sep 2018 21:08:26 -0400 Received: from mga06.intel.com ([134.134.136.31]:58562 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727645AbeIFBIZ (ORCPT ); Wed, 5 Sep 2018 21:08:25 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Sep 2018 13:36:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,334,1531810800"; d="scan'208";a="260165084" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by fmsmga005.fm.intel.com with ESMTP; 05 Sep 2018 13:36:33 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Keith Busch Subject: [PATCHv2 10/20] PCI/ERR: Remove devices on recovery failure Date: Wed, 5 Sep 2018 14:35:36 -0600 Message-Id: <20180905203546.21921-11-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180905203546.21921-1-keith.busch@intel.com> References: <20180905203546.21921-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch removes devices connected through a bus that can't recover from an error. After removing everything, one final enumeration from scratch will be attempted if the bridge and its downstream link appear accessible. Signed-off-by: Keith Busch --- drivers/pci/pcie/err.c | 42 ++++++++++++++++++++++++++++++++++++++---- 1 file changed, 38 insertions(+), 4 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 31e8a4314384..2264001f695b 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -145,6 +145,41 @@ static int report_resume(struct pci_dev *dev, void *data) return 0; } +static int report_disconnect(struct pci_dev *dev, void *data) +{ + device_lock(&dev->dev); + pci_dev_set_disconnected(dev, NULL); + pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); + device_unlock(&dev->dev); + return 0; +} + +/** + * pcie_disconnect- Called when error handling ends with +* PCI_ERS_RESULT_DISCONNECT status. + * + * Reaching here means error handling has irrevocably failed. This function + * will ungracefully disconnect all the devices below the bus that has + * experienced the unrecoverable error. + * + * If the link is active after the removing all devices on the bus, this will + * attempt to re-enumerate the bus from scratch. + */ +static void pcie_disconnect(struct pci_dev *dev) +{ + struct pci_bus *bus = dev->subordinate; + struct pci_dev *child, *tmp; + + pci_lock_rescan_remove(); + list_for_each_entry_safe(child, tmp, &bus->devices, bus_list) + pci_stop_and_remove_bus_device(child); + + if (pci_bridge_secondary_bus_reset(dev) == 0 && + pcie_wait_for_link(dev, true)) + pci_rescan_bus(bus); + pci_unlock_rescan_remove(); +} + /** * default_reset_link - default reset function * @dev: pointer to pci_dev data structure @@ -238,10 +273,9 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, pci_cleanup_aer_uncorrect_error_status(dev); pci_info(dev, "AER: Device recovery successful\n"); return; - failed: - pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); - - /* TODO: Should kernel panic here? */ pci_info(dev, "AER: Device recovery failed\n"); + pci_dbg(dev, "broadcast disconnect message\n"); + pci_walk_bus(bus, report_disconnect, &status); + pcie_disconnect(dev); }