From patchwork Tue Feb 18 03:48:58 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Tang
X-Patchwork-Id: 13978869
X-Patchwork-Delegate: bhelgaas@google.com
Received: from out30-131.freemail.mail.aliyun.com (out30-131.freemail.mail.aliyun.com [115.124.30.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 43204101DE;
	Tue, 18 Feb 2025 03:49:02 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.131
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739850548; cv=none;
	b=Je2wboOogvliPcBVU4rI5SCBDwo2/kjp0UintRQb+fQiZddHTsbJPmruXs+l4fd04BBsi5V7aef1h+Xi0tE7xqhKd+v+eUOKJaH4VgoYLg1LP13H/u2y6F/Hg3/Kepzc0hWmd4YQptuV5XiqwvT9eOrcGnAJLpWvVMMtZes5j6s=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739850548; c=relaxed/simple;
	bh=wVEnDf4lSSrgMuIGeSF8XBgLigTgg5U7vF7uH4Hnirk=;
	h=From:To:Cc:Subject:Date:Message-Id:MIME-Version;
	b=VGGVW+Qn9VUbex3oMrpvvhJbhyYEP2r9WRU+wMCpuYsju4qgXLKDub/KRjOPqKOkDbYGjn22mhhhkf2fjxdzIFIsQWOCuJAV/S9fvSQCOTx6Q9LkCz2S+fkDA6Su8O55lJxyJK0j9RgdSsgGs4Hka7xwY5XT6HcIsmiPCIAqFF4=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dmarc=pass (p=none dis=none) header.from=linux.alibaba.com;
	spf=pass smtp.mailfrom=linux.alibaba.com;
	dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=pIfYGPJ/;
	arc=none smtp.client-ip=115.124.30.131
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="pIfYGPJ/"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=linux.alibaba.com; s=default;
	t=1739850540; h=From:To:Subject:Date:Message-Id:MIME-Version;
	bh=HKDik8WCTElul+esXzBqNF8f077PEfJuEM3XOG9Jfw8=;
	b=pIfYGPJ/KlIFIX5J7yEhE+eS37cSRSj8Hg/v3bJHrE8JBWyRwl9C1Umgw61gZ95/d6qKFKUXidrgPnfOUx9IHcl41r/z4PwmHtNRg5UecEu5/Lvs3PZD8wEMRpqxsNYXTWKlt+uzhLWggVmintxIOcftfPTcyyL0uGRZO/4l9nU=
Received: from localhost(mailfrom:feng.tang@linux.alibaba.com fp:SMTPD_---0WPkD5UU_1739850539 cluster:ay36)
	by smtp.aliyun-inc.com;
	Tue, 18 Feb 2025 11:49:00 +0800
From: Feng Tang
To: Bjorn Helgaas,
	Lukas Wunner,
	Sathyanarayanan Kuppuswamy,
	Liguang Zhang,
	Guanghui Feng,
	rafael@kernel.org
Cc: Markus Elfring,
	Jonathan Cameron,
	ilpo.jarvinen@linux.intel.com,
	linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Feng Tang
Subject: [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling hotplug events
Date: Tue, 18 Feb 2025 11:48:58 +0800
Message-Id: <20250218034859.40397-1-feng.tang@linux.alibaba.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
Precedence: bulk
X-Mailing-List: linux-pci@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

Firmware developers reported that, on an ARM server, they received two
PCIe link control commands within a very short interval. This does not
comply with the PCIe spec, and it broke their state machine and
workflow.

According to the PCIe 6.1 spec, section 6.7.3.2, software needs to wait
for the command-complete event, for up to 1 second, before resending
the command or sending a new one.

The first link control command the firmware received comes from
get_port_device_capability(), which sends a command to disable PCIe
hotplug interrupts without waiting for its completion.

Fix it by adding the necessary wait to comply with the PCIe spec,
following the approach of pcie_poll_cmd().

Also make the interrupt disabling independent of whether the pciehp
service driver will be loaded, as suggested by Lukas.
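As a rough illustration of the wait described above (not the kernel code
in this patch), the poll-then-acknowledge pattern can be sketched in
plain C against a mock register. Everything here is hypothetical:
struct mock_port, mock_read_sltsta(), mock_write_sltsta() and
wait_cmd_complete() are made-up names, the sleep is replaced by a
counter, and only the 0x0010 bit value is taken from the kernel's
PCI_EXP_SLTSTA_CC definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command Completed bit in Slot Status (same value as PCI_EXP_SLTSTA_CC). */
#define SLTSTA_CC 0x0010u

/* Hypothetical stand-in for a hotplug port; not a kernel structure. */
struct mock_port {
	uint16_t slot_status;
	unsigned int polls_until_done;	/* reads left before CC is set; 0 = never */
};

/* Mock register read: the device "completes" after a fixed number of reads. */
static uint16_t mock_read_sltsta(struct mock_port *p)
{
	if (p->polls_until_done > 0 && --p->polls_until_done == 0)
		p->slot_status |= SLTSTA_CC;
	return p->slot_status;
}

/* Mock write-1-to-clear behaviour of the Slot Status register. */
static void mock_write_sltsta(struct mock_port *p, uint16_t val)
{
	p->slot_status &= (uint16_t)~val;
}

/*
 * Poll until Command Completed is set or the time budget is used up.
 * Returns 0 on completion (and acknowledges the bit), -1 on timeout.
 * Assumption: time is only simulated by adding interval_us per loop.
 */
int wait_cmd_complete(struct mock_port *p, unsigned int interval_us,
		      unsigned int timeout_us)
{
	unsigned int waited_us = 0;

	assert(p != NULL && interval_us > 0);

	for (;;) {
		if (mock_read_sltsta(p) & SLTSTA_CC) {
			mock_write_sltsta(p, SLTSTA_CC);
			return 0;
		}
		if (waited_us >= timeout_us)
			return -1;
		waited_us += interval_us;	/* real code would sleep here */
	}
}
```

With a mock port whose Command Completed bit shows up on the third
read, wait_cmd_complete(&port, 10000, 1000000) returns 0 and leaves
slot_status cleared. If the bit never shows up, it returns -1 once the
simulated 1 second budget is spent.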

Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
Originally-by: Liguang Zhang
Suggested-by: Sathyanarayanan Kuppuswamy
Signed-off-by: Feng Tang
---
Changelog:

  since v1:
    * Add the Originally-by for Liguang. The issue was found on a 5.10
      kernel, then 6.6. I was initially given a 5.10 kernel tarball
      without git info to debug the issue, and made the patch. Thanks
      to Guanghui, who recently pointed me to the tree
      https://gitee.com/anolis/cloud-kernel, which shows the wait logic
      in 5.10 was originally from Liguang and never hit mainline.
    * Make the irq disabling not dependent on whether the pciehp service
      driver will be loaded (Lukas Wunner)
    * Use the read_poll_timeout() API to simplify the waiting logic
      (Sathyanarayanan Kuppuswamy)
    * Add logic to skip irq disabling if it is already disabled.

 drivers/pci/pci.h          |  2 ++
 drivers/pci/pcie/portdrv.c | 44 +++++++++++++++++++++++++++++++++-----
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 01e51db8d285..c1e234d1b81d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -759,12 +759,14 @@ static inline void pcie_ecrc_get_policy(char *str) { }
 #ifdef CONFIG_PCIEPORTBUS
 void pcie_reset_lbms_count(struct pci_dev *port);
 int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
+void pcie_disable_hp_interrupts_early(struct pci_dev *dev);
 #else
 static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
 static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
 {
 	return -EOPNOTSUPP;
 }
+static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {}
 #endif
 
 struct pci_dev_reset_methods {
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 02e73099bad0..2470333bba2f 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 
 #include "../pci.h"
 #include "portdrv.h"
@@ -205,6 +206,40 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
 	return 0;
 }
 
+static int pcie_wait_sltctl_cmd_raw(struct pci_dev *dev)
+{
+	u16 slot_status = 0;
+	int ret, ret1, timeout_us;
+
+	/* 1 second, according to PCIe spec 6.1, section 6.7.3.2 */
+	timeout_us = 1000000;
+	ret = read_poll_timeout(pcie_capability_read_word, ret1,
+				(slot_status & PCI_EXP_SLTSTA_CC), 10000,
+				timeout_us, true, dev, PCI_EXP_SLTSTA,
+				&slot_status);
+	if (!ret)
+		pcie_capability_write_word(dev, PCI_EXP_SLTSTA,
+					   PCI_EXP_SLTSTA_CC);
+
+	return ret;
+}
+
+void pcie_disable_hp_interrupts_early(struct pci_dev *dev)
+{
+	u16 slot_ctrl = 0;
+
+	pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl);
+	/* Bail out early if it is already disabled */
+	if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE)))
+		return;
+
+	pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
+				   PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
+
+	if (pcie_wait_sltctl_cmd_raw(dev))
+		pci_info(dev, "Timeout on disabling PCIE hot-plug interrupt\n");
+}
+
 /**
  * get_port_device_capability - discover capabilities of a PCI Express port
  * @dev: PCI Express port to examine
@@ -222,16 +257,15 @@ static int get_port_device_capability(struct pci_dev *dev)
 
 	if (dev->is_hotplug_bridge &&
 	    (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
-	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM) &&
-	    (pcie_ports_native || host->native_pcie_hotplug)) {
-		services |= PCIE_PORT_SERVICE_HP;
+	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
+		if (pcie_ports_native || host->native_pcie_hotplug)
+			services |= PCIE_PORT_SERVICE_HP;
 
 		/*
 		 * Disable hot-plug interrupts in case they have been enabled
 		 * by the BIOS and the hot-plug service driver is not loaded.
 		 */
-		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
-			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
+		pcie_disable_hp_interrupts_early(dev);
 	}
 
 #ifdef CONFIG_PCIEAER

From patchwork Tue Feb 18 03:48:59 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Tang
X-Patchwork-Id: 13978870
X-Patchwork-Delegate: bhelgaas@google.com
Received: from out30-97.freemail.mail.aliyun.com (out30-97.freemail.mail.aliyun.com [115.124.30.97])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id CA0D91A08DB;
	Tue, 18 Feb 2025 03:49:09 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.97
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739850553; cv=none;
	b=cbjmwSmkcF3KY3YaXSnnC0U1E6B/jmOj++5GIDAVKaUMN/SWbCTrsaRTRwgZYIFTYnaOeF3BD3gFKQJKENuoG+c/Y4fF52ZU9JaSN6+6sg0MQO9HZWTdEHoDibwLuGqtRCIdM+mVtBrL9RTNL5Fwsud9Ry1ywSq+Pub7O5hKwjw=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739850553; c=relaxed/simple;
	bh=gZDwF2MK3WnQqkLHRxze35VFVWxiEl3ROncmqxTo7qE=;
	h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:
	 MIME-Version;
	b=p6cJTUwZTsfu6pznoXUL2OaJmKkaI5BIYF8dZI+aYjHH+64dT3YfGPl1ffhFsW+KZX2YutEUfISwJRNtIEAEc+XdhBmAWOTMDO9jbe4Hs9kIw/E+a236sM2k2c5RwyOmlsrxA6xlAbMrFO1Sam5yVkdoAC0JGbVRI+i1vHl3rT0=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dmarc=pass (p=none dis=none) header.from=linux.alibaba.com;
	spf=pass smtp.mailfrom=linux.alibaba.com;
	dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=vQx3JXi8;
	arc=none smtp.client-ip=115.124.30.97
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com
Authentication-Results:
	smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="vQx3JXi8"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=linux.alibaba.com; s=default;
	t=1739850541; h=From:To:Subject:Date:Message-Id:MIME-Version;
	bh=RoTkR/BoWRWXPU8Qu+KO1xpOJocUTVSiHDA/GEB6aOg=;
	b=vQx3JXi8+76mez0C0X/qcA02ZbsxwJdNUWZr1FSjziJk+7YfhrfD/t8ayZ3ilo2Dl30NK1S4GWqyHv+vaWQiwkDdRc7k7rxHrjn1sLHwws87wS06w0oVquHoAov1FfdXxYDTLNZgdyDZCcR4OOO8l+UCYwUXlrO+vTRA8n7kLbY=
Received: from localhost(mailfrom:feng.tang@linux.alibaba.com fp:SMTPD_---0WPkGLoe_1739850540 cluster:ay36)
	by smtp.aliyun-inc.com;
	Tue, 18 Feb 2025 11:49:00 +0800
From: Feng Tang
To: Bjorn Helgaas,
	Lukas Wunner,
	Sathyanarayanan Kuppuswamy,
	Liguang Zhang,
	Guanghui Feng,
	rafael@kernel.org
Cc: Markus Elfring,
	Jonathan Cameron,
	ilpo.jarvinen@linux.intel.com,
	linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Feng Tang
Subject: [PATCH v2 2/2] PCI: Disable PCIE hotplug interrupts early when msi is disabled
Date: Tue, 18 Feb 2025 11:48:59 +0800
Message-Id: <20250218034859.40397-2-feng.tang@linux.alibaba.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To: <20250218034859.40397-1-feng.tang@linux.alibaba.com>
References: <20250218034859.40397-1-feng.tang@linux.alibaba.com>
Precedence: bulk
X-Mailing-List: linux-pci@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

An IRQ storm was seen when testing the "pci=nomsi" case. The root cause
is that 'nomsi' disables MSI and makes devices and root ports use the
legacy INTx interrupt, which likely makes several devices/ports share
one interrupt line.

In the failure case, the BIOS doesn't disable the PCIe hotplug
interrupts and actually leaves the command-complete interrupt asserted.
So the timeline is:

  1. pciehp's CCIE/HPIE are enabled and the command-complete interrupt
     is asserted
  2. the interrupt is shared by the PCIe root port and an nvme/nic device
  3. the nvme/nic driver's probe function enables the interrupt line
  4. the pciehp driver is loaded later, or not loaded at all

The "nobody cared" IRQ storm happens between 3 and 4.

This is not an issue in the normal MSI case, as each interrupt is
controlled by its own driver. When the driver is not loaded, the
interrupt won't be delivered to the kernel even if it is physically
asserted.

So disable the PCIe hotplug CCIE/HPIE interrupts in the early boot phase
when MSI is not enabled.

Signed-off-by: Feng Tang
---
Changelog:

  since v1:
    * Modify the commit log

 drivers/pci/probe.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index b6536ed599c3..10d72156da9a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1664,6 +1664,15 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev)
 	pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &reg32);
 	if (reg32 & PCI_EXP_SLTCAP_HPC)
 		pdev->is_hotplug_bridge = 1;
+
+	/*
+	 * When MSI is disabled, the root port uses legacy INTx and likely
+	 * shares the INTx line with other devices such as NIC/NVMe. In a
+	 * real-world case the CCIE IRQ was asserted after boot but was not
+	 * handled, causing an IRQ storm. So disable it early.
+	 */
+	if (!pci_msi_enabled())
+		pcie_disable_hp_interrupts_early(pdev);
 }
 
 static void set_pcie_thunderbolt(struct pci_dev *dev)
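
For readers who want to see the decision made at this call site in
isolation, here is a small userspace sketch with a mock Slot Control
register. It is an illustration under stated assumptions, not the kernel
implementation: struct mock_slot and disable_hp_irqs_early() are
hypothetical names, the MSI check is folded into the helper for brevity
(in the patch it lives in set_pcie_hotplug_bridge()), and the wait for
command completion is left out. Only the two bit values are taken from
the kernel's PCI_EXP_SLTCTL_CCIE and PCI_EXP_SLTCTL_HPIE definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Slot Control bits (same values as PCI_EXP_SLTCTL_CCIE / _HPIE). */
#define SLTCTL_CCIE 0x0010u	/* Command Completed Interrupt Enable */
#define SLTCTL_HPIE 0x0020u	/* Hot-Plug Interrupt Enable */

/* Hypothetical stand-in for a port's Slot Control register. */
struct mock_slot {
	uint16_t slot_ctrl;
	unsigned int writes;	/* number of Slot Control writes issued */
};

/*
 * Clear CCIE/HPIE when MSI is disabled and at least one of them is set.
 * Returns true if a Slot Control write was issued, i.e. the caller would
 * then have to wait for the Command Completed event.
 */
bool disable_hp_irqs_early(struct mock_slot *s, bool msi_enabled)
{
	const uint16_t mask = SLTCTL_CCIE | SLTCTL_HPIE;

	assert(s != NULL);

	/* With MSI, an unclaimed interrupt is not delivered; nothing to do. */
	if (msi_enabled)
		return false;

	/* Bail out early if both enables are already clear. */
	if (!(s->slot_ctrl & mask))
		return false;

	s->slot_ctrl &= (uint16_t)~mask;
	s->writes++;
	return true;
}
```

Calling the helper twice on the same mock slot issues a single write:
the second call sees both enable bits clear and returns false, which is
the point of the bail-out added in patch 1/2, since a redundant Slot
Control write would start another command that has to be waited for.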