Message ID | 20231004170214.474792-1-jesse.brandeburg@intel.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | [iwl-net,v3] ice: reset first in crash dump kernels | expand |
> -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Jesse Brandeburg > Sent: Wednesday, October 4, 2023 10:32 PM > To: intel-wired-lan@lists.osuosl.org > Cc: pmenzel@molgen.mpg.de; Vishal Agrawal <vagrawal@redhat.com>; linux-pci@vger.kernel.org; Brandeburg, Jesse <jesse.brandeburg@intel.com>; netdev@vger.kernel.org; jkc@redhat.com; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com> > Subject: [Intel-wired-lan] [PATCH iwl-net v3] ice: reset first in crash dump kernels > > When the system boots into the crash dump kernel after a panic, the ice > networking device may still have pending transactions that can cause errors > or machine checks when the device is re-enabled. This can prevent the crash > dump kernel from loading the driver or collecting the crash data. > > To avoid this issue, perform a function level reset (FLR) on the ice device > via PCIe config space before enabling it on the crash kernel. This will > clear any outstanding transactions and stop all queues and interrupts. > Restore the config space after the FLR, otherwise it was found in testing > that the driver wouldn't load successfully. > > The following sequence causes the original issue: > - Load the ice driver with modprobe ice > - Enable SR-IOV with 2 VFs: echo 2 > /sys/class/net/eth0/device/sriov_num_vfs > - Trigger a crash with echo c > /proc/sysrq-trigger > - Load the ice driver again (or let it load automatically) with modprobe ice > - The system crashes again during pcim_enable_device() > > Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series") > > Reported-by: Vishal Agrawal <vagrawal@redhat.com> > Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> > --- > v3: add Fixes tag as approximate, added Jay's RB tag > v2: respond to list comments and update commit message > v1: initial version > --- > drivers/net/ethernet/intel/ice/ice_main.c | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) > Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index c8286adae946..6550c46e4e36 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -6,6 +6,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <generated/utsrelease.h> +#include <linux/crash_dump.h> #include "ice.h" #include "ice_base.h" #include "ice_lib.h" @@ -5014,6 +5015,20 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent) return -EINVAL; } + /* when under a kdump kernel initiate a reset before enabling the + * device in order to clear out any pending DMA transactions. These + * transactions can cause some systems to machine check when doing + * the pcim_enable_device() below. + */ + if (is_kdump_kernel()) { + pci_save_state(pdev); + pci_clear_master(pdev); + err = pcie_flr(pdev); + if (err) + return err; + pci_restore_state(pdev); + } + /* this driver uses devres, see * Documentation/driver-api/driver-model/devres.rst */