From patchwork Wed Oct 11 23:33:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jacob Keller X-Patchwork-Id: 13418075 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D174C2B74E for ; Wed, 11 Oct 2023 23:33:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="hMXCXzRR" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94A2FA9 for ; Wed, 11 Oct 2023 16:33:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1697067228; x=1728603228; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xwxuWhk/VcZXZGTwlxdMH4oYJGSOcB0LdUFkcvXEoCI=; b=hMXCXzRRDVRCEn8G61DV85JdVvqJGbXZJf2MlwdDJP0ChbyY9jodC4j6 HQstetraszeJPzMXXJEZGAvuQ9FuPilqTTGt7b0zmFz0MUg47Oz0IrEjd bkzwIoN1LJzXeWfAUOS3WCKzU/kMaiFYI1IJ9GZulHuRYruCIY/L5AZqO mmEB/WE/wFb0aWWV1vLVb8h7T1rek2rmLxk4sRWviBMwQeRIUOGcha7mZ P0LqFl1cZ7Q73H7AwDFg0HnfCdmSEIHidVr0akH1R3/ZcUOybu+JR6gbE ZTb336nDrxznMVKPzFnMWP0VmKM/0a6OJP+AyhB5TAMiLydsspTRBuYU2 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10860"; a="388652586" X-IronPort-AV: E=Sophos;i="6.03,217,1694761200"; d="scan'208";a="388652586" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Oct 2023 16:33:47 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10860"; a="824359287" X-IronPort-AV: E=Sophos;i="6.03,217,1694761200"; d="scan'208";a="824359287" Received: from jekeller-desk.amr.corp.intel.com (HELO jekeller-desk.jekeller.internal) ([10.166.241.1]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Oct 2023 16:33:46 -0700 From: Jacob Keller To: netdev@vger.kernel.org, David Miller , Jakub Kicinski Cc: Michal Schmidt , Simon Horman , Pucha Himasekhar Reddy Subject: [PATCH net 1/3] i40e: prevent crash on probe if hw registers have invalid values Date: Wed, 11 Oct 2023 16:33:32 -0700 Message-ID: <20231011233334.336092-2-jacob.e.keller@intel.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231011233334.336092-1-jacob.e.keller@intel.com> References: <20231011233334.336092-1-jacob.e.keller@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.8 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW, SPF_HELO_NONE,SPF_NONE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org From: Michal Schmidt The hardware provides the indexes of the first and the last available queue and VF. From the indexes, the driver calculates the numbers of queues and VFs. In theory, a faulty device might say the last index is smaller than the first index. In that case, the driver's calculation would underflow, it would attempt to write to non-existent registers outside of the ioremapped range and crash. I ran into this not by having a faulty device, but by an operator error. I accidentally ran a QE test meant for i40e devices on an ice device. The test used 'echo i40e > /sys/...ice PCI device.../driver_override', bound the driver to the device and crashed in one of the wr32 calls in i40e_clear_hw. Add checks to prevent underflows in the calculations of num_queues and num_vfs. With this fix, the wrong device probing reports errors and returns a failure without crashing. Fixes: 838d41d92a90 ("i40e: clear all queues and interrupts") Signed-off-by: Michal Schmidt Reviewed-by: Simon Horman Tested-by: Pucha Himasekhar Reddy (A Contingent worker at Intel) --- drivers/net/ethernet/intel/i40e/i40e_common.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c index eeef20f77106..1b493854f522 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_common.c +++ b/drivers/net/ethernet/intel/i40e/i40e_common.c @@ -1082,7 +1082,7 @@ void i40e_clear_hw(struct i40e_hw *hw) I40E_PFLAN_QALLOC_FIRSTQ_SHIFT; j = (val & I40E_PFLAN_QALLOC_LASTQ_MASK) >> I40E_PFLAN_QALLOC_LASTQ_SHIFT; - if (val & I40E_PFLAN_QALLOC_VALID_MASK) + if (val & I40E_PFLAN_QALLOC_VALID_MASK && j >= base_queue) num_queues = (j - base_queue) + 1; else num_queues = 0; @@ -1092,7 +1092,7 @@ void i40e_clear_hw(struct i40e_hw *hw) I40E_PF_VT_PFALLOC_FIRSTVF_SHIFT; j = (val & I40E_PF_VT_PFALLOC_LASTVF_MASK) >> I40E_PF_VT_PFALLOC_LASTVF_SHIFT; - if (val & I40E_PF_VT_PFALLOC_VALID_MASK) + if (val & I40E_PF_VT_PFALLOC_VALID_MASK && j >= i) num_vfs = (j - i) + 1; else num_vfs = 0; From patchwork Wed Oct 11 23:33:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jacob Keller X-Patchwork-Id: 13418076 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7BD852B752 for ; Wed, 11 Oct 2023 23:33:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="XkWMEreY" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BFEB6AF for ; Wed, 11 Oct 2023 16:33:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1697067228; x=1728603228; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yQeuXrOg6mfRlLAComFksMcXgag/yuNEpfQigUYthuY=; b=XkWMEreYRomA+AgyPNIL4uNrnx8vmxFSgG9ZoA402L4uHCwW0rnqbN3e 1dAKg2npuZ1ltUlreO/Gy0GsIuKwcZ+3UWsiN7PDMyNmOtC5B3bur5/Rz 76G/DRWi9NP3CpWsFTi6+7VujKXZoLzUdv9lsTMbTlHEm3NPV///1VyQO 1McF+KfnRoCjQgYL5/HTCidbmDH+FuuQBX91Zw6ViyBm4OrJk7eBavJlA RYK1h2RtqjCYC81Kpd6RdBqAaYHb2r1EGLgvzuo/NnjEmWyDWsiIj8ijv Sh6fsww5DcYD4xnLxIwZbL2Q4FC2SEpDYmT5uo2oZrVhKdJ1aqEdYqzcr w==; X-IronPort-AV: E=McAfee;i="6600,9927,10860"; a="388652590" X-IronPort-AV: E=Sophos;i="6.03,217,1694761200"; d="scan'208";a="388652590" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Oct 2023 16:33:47 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10860"; a="824359291" X-IronPort-AV: E=Sophos;i="6.03,217,1694761200"; d="scan'208";a="824359291" Received: from jekeller-desk.amr.corp.intel.com (HELO jekeller-desk.jekeller.internal) ([10.166.241.1]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Oct 2023 16:33:46 -0700 From: Jacob Keller To: netdev@vger.kernel.org, David Miller , Jakub Kicinski Cc: Jesse Brandeburg , Vishal Agrawal , Jay Vosburgh , Przemek Kitszel , Pucha Himasekhar Reddy Subject: [PATCH net 2/3] ice: reset first in crash dump kernels Date: Wed, 11 Oct 2023 16:33:33 -0700 Message-ID: <20231011233334.336092-3-jacob.e.keller@intel.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231011233334.336092-1-jacob.e.keller@intel.com> References: <20231011233334.336092-1-jacob.e.keller@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.8 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW, SPF_HELO_NONE,SPF_NONE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org From: Jesse Brandeburg When the system boots into the crash dump kernel after a panic, the ice networking device may still have pending transactions that can cause errors or machine checks when the device is re-enabled. This can prevent the crash dump kernel from loading the driver or collecting the crash data. To avoid this issue, perform a function level reset (FLR) on the ice device via PCIe config space before enabling it on the crash kernel. This will clear any outstanding transactions and stop all queues and interrupts. Restore the config space after the FLR, otherwise it was found in testing that the driver wouldn't load successfully. The following sequence causes the original issue: - Load the ice driver with modprobe ice - Enable SR-IOV with 2 VFs: echo 2 > /sys/class/net/eth0/device/sriov_num_vfs - Trigger a crash with echo c > /proc/sysrq-trigger - Load the ice driver again (or let it load automatically) with modprobe ice - The system crashes again during pcim_enable_device() Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series") Reported-by: Vishal Agrawal Reviewed-by: Jay Vosburgh Reviewed-by: Przemek Kitszel Signed-off-by: Jesse Brandeburg Tested-by: Pucha Himasekhar Reddy (A Contingent worker at Intel) --- drivers/net/ethernet/intel/ice/ice_main.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index c8286adae946..6550c46e4e36 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -6,6 +6,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include +#include #include "ice.h" #include "ice_base.h" #include "ice_lib.h" @@ -5014,6 +5015,20 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent) return -EINVAL; } + /* when under a kdump kernel initiate a reset before enabling the + * device in order to clear out any pending DMA transactions. These + * transactions can cause some systems to machine check when doing + * the pcim_enable_device() below. + */ + if (is_kdump_kernel()) { + pci_save_state(pdev); + pci_clear_master(pdev); + err = pcie_flr(pdev); + if (err) + return err; + pci_restore_state(pdev); + } + /* this driver uses devres, see * Documentation/driver-api/driver-model/devres.rst */ From patchwork Wed Oct 11 23:33:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jacob Keller X-Patchwork-Id: 13418077 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 793722B751 for ; Wed, 11 Oct 2023 23:33:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="nISzVozb" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5E1D9B6 for ; Wed, 11 Oct 2023 16:33:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1697067229; x=1728603229; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=XxddgiUZYaLGKapb24U7IedxHY+YMZo8RACpD0zdQY0=; b=nISzVozbrFstEDWxj2ACM3aGE7rL4OzScBpDh4CWQwxm4Zv69kkV5hzr NUR6AI6OTuXlO2Mb1hMlWh4AhfytY0tO5ULlTiuyQaldmgBaa+/5D/U5x kP7+gaV1vG35rUg6/wEzPBPgOvCUc2Dvx6mAYTqlhPZwpt0oj9d43tix2 Q+98Ng46ZKxrrKI3zdmp3gl9cBbQKxCNEeBprY2KWV5AlrjBJg+FD2TTw RFefmgVYgJmRiSmLa6HqNmr5tq7N+mqvgSTv4qfyX7QcumLSGfbMRltOQ hfem86tHnoZniGgIYp+AGTreLHe1fMaplkEDilt/3LJBKfxSsFEkHLsi2 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10860"; a="388652591" X-IronPort-AV: E=Sophos;i="6.03,217,1694761200"; d="scan'208";a="388652591" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Oct 2023 16:33:47 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10860"; a="824359295" X-IronPort-AV: E=Sophos;i="6.03,217,1694761200"; d="scan'208";a="824359295" Received: from jekeller-desk.amr.corp.intel.com (HELO jekeller-desk.jekeller.internal) ([10.166.241.1]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Oct 2023 16:33:46 -0700 From: Jacob Keller To: netdev@vger.kernel.org, David Miller , Jakub Kicinski Cc: Mateusz Pacuszka , Jan Sokolowski , Przemek Kitszel , Pucha Himasekhar Reddy Subject: [PATCH net 3/3] ice: Fix safe mode when DDP is missing Date: Wed, 11 Oct 2023 16:33:34 -0700 Message-ID: <20231011233334.336092-4-jacob.e.keller@intel.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231011233334.336092-1-jacob.e.keller@intel.com> References: <20231011233334.336092-1-jacob.e.keller@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.8 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW, SPF_HELO_NONE,SPF_NONE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org From: Mateusz Pacuszka One thing is broken in the safe mode, that is ice_deinit_features() is being executed even that ice_init_features() was not causing stack trace during pci_unregister_driver(). Add check on the top of the function. Fixes: 5b246e533d01 ("ice: split probe into smaller functions") Signed-off-by: Mateusz Pacuszka Signed-off-by: Jan Sokolowski Reviewed-by: Przemek Kitszel Tested-by: Pucha Himasekhar Reddy (A Contingent worker at Intel) --- drivers/net/ethernet/intel/ice/ice_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 6550c46e4e36..7784135160fd 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -4684,6 +4684,9 @@ static void ice_init_features(struct ice_pf *pf) static void ice_deinit_features(struct ice_pf *pf) { + if (ice_is_safe_mode(pf)) + return; + ice_deinit_lag(pf); if (test_bit(ICE_FLAG_DCB_CAPABLE, pf->flags)) ice_cfg_lldp_mib_change(&pf->hw, false);