Message ID | 20240220231720.14836-1-jesse.brandeburg@intel.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [iwl-net,v1] ice: fix NULL pointer access during resume | expand |
On 2/20/2024 3:17 PM, Jesse Brandeburg wrote: > The ice_suspend/ice_resume cycle was not updated when refactoring was > done to the init path and I suspect this allowed a bug to creep in where > the driver was not correctly reinitialized during resume. > > I was able to test against 6.1.77 kernel and that ice driver works fine > for suspend/resume with no panic. > > Instead of tearing down interrupts and freeing a bunch of memory during > suspend, just begin an internal reset event, which takes care of all the > correct steps during suspend. Likewise during resume we'll just let the > reset complete and the driver comes right back to life. This mirrors the > behavior of other suspend/resume code in drivers like fm10k. > > Older kernel commits were made to this driver and to the i40e driver to > try to fix "disk" or hibernate suspend events with many CPUs. The PM > subsystem was updated since then but the drivers kept the old flows. > Testing with rtcwake -m [disk | mem] -s 10 - passes but my system won't > hibernate due to too much RAM, not enough swap. > > The code is slightly refactored during this change in order to share a > common "prep" path between suspend and the pci error handler functions > which all do the same thing, so introduce ice_quiesce_before_reset(). > > While doing all this and compile testing I ran across the pm.h changes > to get rid of compilation problems when CONFIG_PM=n etc, so those small > changes are included here as well. > ... > > Fixes: 5b246e533d01 ("ice: split probe into smaller functions") > Reported-by: Robert Elliott <elliott@hpe.com> > Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> > --- > NOTE: > Requires Amritha's patch: > https://patchwork.ozlabs.org/project/intel-wired-lan/patch/170785373072.3325.9129916579186572531.stgit@anambiarhost.jf.intel.com/ > to be applied before this will pass testing cleanly. I think this may be the other way around? It looks to be clean for netdev (doesn't have Amritha's patch), but it's not applying to net-queue (has Amritha's patch). > base-commit: 23f9c2c066e7e5052406fb8f04a115d3d0260b22 Base commit also seems to be a netdev commit. Since Amritha's patch is pending to netdev [1], I think we need a version that will apply with Amritha's changes. Thanks, Tony [1] https://lore.kernel.org/netdev/20240220214444.1039759-7-anthony.l.nguyen@intel.com/
> -----Original Message----- > From: Nguyen, Anthony L <anthony.l.nguyen@intel.com> > Sent: Thursday, February 22, 2024 2:10 PM > To: Brandeburg, Jesse <jesse.brandeburg@intel.com>; intel-wired- > lan@lists.osuosl.org > Cc: netdev@vger.kernel.org; Elliott, Rob <elliott@hpe.com>; Keller, Jacob E > <jacob.e.keller@intel.com>; David S. Miller <davem@davemloft.net>; Eric Dumazet > <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni > <pabeni@redhat.com>; Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > Subject: Re: [PATCH iwl-net v1] ice: fix NULL pointer access during resume > > On 2/20/2024 3:17 PM, Jesse Brandeburg wrote: > > The ice_suspend/ice_resume cycle was not updated when refactoring was > > done to the init path and I suspect this allowed a bug to creep in where > > the driver was not correctly reinitialized during resume. > > > > I was able to test against 6.1.77 kernel and that ice driver works fine > > for suspend/resume with no panic. > > > > Instead of tearing down interrupts and freeing a bunch of memory during > > suspend, just begin an internal reset event, which takes care of all the > > correct steps during suspend. Likewise during resume we'll just let the > > reset complete and the driver comes right back to life. This mirrors the > > behavior of other suspend/resume code in drivers like fm10k. > > > > Older kernel commits were made to this driver and to the i40e driver to > > try to fix "disk" or hibernate suspend events with many CPUs. The PM > > subsystem was updated since then but the drivers kept the old flows. > > Testing with rtcwake -m [disk | mem] -s 10 - passes but my system won't > > hibernate due to too much RAM, not enough swap. > > > > The code is slightly refactored during this change in order to share a > > common "prep" path between suspend and the pci error handler functions > > which all do the same thing, so introduce ice_quiesce_before_reset(). > > > > While doing all this and compile testing I ran across the pm.h changes > > to get rid of compilation problems when CONFIG_PM=n etc, so those small > > changes are included here as well. > > > > ... > > > > > Fixes: 5b246e533d01 ("ice: split probe into smaller functions") > > Reported-by: Robert Elliott <elliott@hpe.com> > > Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> > > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> > > --- > > NOTE: > > Requires Amritha's patch: > > https://patchwork.ozlabs.org/project/intel-wired- > lan/patch/170785373072.3325.9129916579186572531.stgit@anambiarhost.jf. > intel.com/ > > to be applied before this will pass testing cleanly. > > I think this may be the other way around? It looks to be clean for > netdev (doesn't have Amritha's patch), but it's not applying to > net-queue (has Amritha's patch). > > > base-commit: 23f9c2c066e7e5052406fb8f04a115d3d0260b22 > > Base commit also seems to be a netdev commit. > > Since Amritha's patch is pending to netdev [1], I think we need a > version that will apply with Amritha's changes. > > Thanks, > Tony > > > [1] > https://lore.kernel.org/netdev/20240220214444.1039759-7- > anthony.l.nguyen@intel.com/ I think he means that the tests won't pass without Amritha's patch because we'll hit the ASSERT_RTNL in the suspend flow otherwise. Thanks, Jake
On 2/22/2024 2:19 PM, Keller, Jacob E wrote: >> -----Original Message----- >> From: Nguyen, Anthony L <anthony.l.nguyen@intel.com> ... >>> NOTE: >>> Requires Amritha's patch: >>> https://patchwork.ozlabs.org/project/intel-wired- >> lan/patch/170785373072.3325.9129916579186572531.stgit@anambiarhost.jf. >> intel.com/ >>> to be applied before this will pass testing cleanly. >> >> I think this may be the other way around? It looks to be clean for >> netdev (doesn't have Amritha's patch), but it's not applying to >> net-queue (has Amritha's patch). >> >> > base-commit: 23f9c2c066e7e5052406fb8f04a115d3d0260b22 >> >> Base commit also seems to be a netdev commit. >> >> Since Amritha's patch is pending to netdev [1], I think we need a >> version that will apply with Amritha's changes. >> >> Thanks, >> Tony >> >> >> [1] >> https://lore.kernel.org/netdev/20240220214444.1039759-7- >> anthony.l.nguyen@intel.com/ > > I think he means that the tests won't pass without Amritha's patch because we'll hit the ASSERT_RTNL in the suspend flow otherwise. Got it. Anyways, this doesn't apply to net-queue, could we rebase or base it off of that tree? Thanks, Tony > Thanks, > Jake
On 2/22/2024 2:25 PM, Tony Nguyen wrote: >> I think he means that the tests won't pass without Amritha's patch >> because we'll hit the ASSERT_RTNL in the suspend flow otherwise. > > Got it. Anyways, this doesn't apply to net-queue, could we rebase or > base it off of that tree? Ok, will send v2 off -net.
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index dd4a9bc0dfdc..2a16b4475d29 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -5385,84 +5385,15 @@ static void ice_shutdown(struct pci_dev *pdev) } } -#ifdef CONFIG_PM -/** - * ice_prepare_for_shutdown - prep for PCI shutdown - * @pf: board private structure - * - * Inform or close all dependent features in prep for PCI device shutdown - */ -static void ice_prepare_for_shutdown(struct ice_pf *pf) -{ - struct ice_hw *hw = &pf->hw; - u32 v; - - /* Notify VFs of impending reset */ - if (ice_check_sq_alive(hw, &hw->mailboxq)) - ice_vc_notify_reset(pf); - - dev_dbg(ice_pf_to_dev(pf), "Tearing down internal switch for shutdown\n"); - - /* disable the VSIs and their queues that are not already DOWN */ - ice_pf_dis_all_vsi(pf, false); - - ice_for_each_vsi(pf, v) - if (pf->vsi[v]) - pf->vsi[v]->vsi_num = 0; - - ice_shutdown_all_ctrlq(hw); -} - -/** - * ice_reinit_interrupt_scheme - Reinitialize interrupt scheme - * @pf: board private structure to reinitialize - * - * This routine reinitialize interrupt scheme that was cleared during - * power management suspend callback. - * - * This should be called during resume routine to re-allocate the q_vectors - * and reacquire interrupts. - */ -static int ice_reinit_interrupt_scheme(struct ice_pf *pf) +static int ice_quiesce_before_reset(struct ice_pf *pf) { - struct device *dev = ice_pf_to_dev(pf); - int ret, v; - - /* Since we clear MSIX flag during suspend, we need to - * set it back during resume... - */ - - ret = ice_init_interrupt_scheme(pf); - if (ret) { - dev_err(dev, "Failed to re-initialize interrupt %d\n", ret); - return ret; - } - - /* Remap vectors and rings, after successful re-init interrupts */ - ice_for_each_vsi(pf, v) { - if (!pf->vsi[v]) - continue; + int ret = ice_service_task_stop(pf); - ret = ice_vsi_alloc_q_vectors(pf->vsi[v]); - if (ret) - goto err_reinit; - ice_vsi_map_rings_to_vectors(pf->vsi[v]); - } - - ret = ice_req_irq_msix_misc(pf); - if (ret) { - dev_err(dev, "Setting up misc vector failed after device suspend %d\n", - ret); - goto err_reinit; + if (!test_bit(ICE_PREPARED_FOR_RESET, pf->state)) { + set_bit(ICE_PFR_REQ, pf->state); + ice_prepare_for_reset(pf, ICE_RESET_PFR); } - return 0; - -err_reinit: - while (v--) - if (pf->vsi[v]) - ice_vsi_free_q_vectors(pf->vsi[v]); - return ret; } @@ -5473,66 +5404,29 @@ static int ice_reinit_interrupt_scheme(struct ice_pf *pf) * Power Management callback to quiesce the device and prepare * for D3 transition. */ -static int __maybe_unused ice_suspend(struct device *dev) +static int ice_suspend(struct device *dev) { struct pci_dev *pdev = to_pci_dev(dev); struct ice_pf *pf; - int disabled, v; pf = pci_get_drvdata(pdev); if (!ice_pf_state_is_nominal(pf)) { - dev_err(dev, "Device is not ready, no need to suspend it\n"); + dev_err(dev, "Device is not ready for suspend.\n"); return -EBUSY; } - /* Stop watchdog tasks until resume completion. - * Even though it is most likely that the service task is - * disabled if the device is suspended or down, the service task's - * state is controlled by a different state bit, and we should - * store and honor whatever state that bit is in at this point. - */ - disabled = ice_service_task_stop(pf); - - ice_unplug_aux_dev(pf); - - /* Already suspended?, then there is nothing to do */ - if (test_and_set_bit(ICE_SUSPENDED, pf->state)) { - if (!disabled) - ice_service_task_restart(pf); - return 0; - } - - if (test_bit(ICE_DOWN, pf->state) || - ice_is_reset_in_progress(pf->state)) { - dev_err(dev, "can't suspend device in reset or already down\n"); - if (!disabled) - ice_service_task_restart(pf); - return 0; - } + /* Stop watchdog tasks until resume completion */ + ice_quiesce_before_reset(pf); + set_bit(ICE_SUSPENDED, pf->state); ice_setup_mc_magic_wake(pf); - - ice_prepare_for_shutdown(pf); - ice_set_wake(pf); - /* Free vectors, clear the interrupt scheme and release IRQs - * for proper hibernation, especially with large number of CPUs. - * Otherwise hibernation might fail when mapping all the vectors back - * to CPU0. - */ - ice_free_irq_msix_misc(pf); - ice_for_each_vsi(pf, v) { - if (!pf->vsi[v]) - continue; - ice_vsi_free_q_vectors(pf->vsi[v]); - } - ice_clear_interrupt_scheme(pf); - pci_save_state(pdev); pci_wake_from_d3(pdev, pf->wol_ena); pci_set_power_state(pdev, PCI_D3hot); + return 0; } @@ -5540,10 +5434,9 @@ static int __maybe_unused ice_suspend(struct device *dev) * ice_resume - PM callback for waking up from D3 * @dev: generic device information structure */ -static int __maybe_unused ice_resume(struct device *dev) +static int ice_resume(struct device *dev) { struct pci_dev *pdev = to_pci_dev(dev); - enum ice_reset_req reset_type; struct ice_pf *pf; struct ice_hw *hw; int ret; @@ -5566,32 +5459,24 @@ static int __maybe_unused ice_resume(struct device *dev) pf->wakeup_reason = rd32(hw, PFPM_WUS); ice_print_wake_reason(pf); - - /* We cleared the interrupt scheme when we suspended, so we need to - * restore it now to resume device functionality. - */ - ret = ice_reinit_interrupt_scheme(pf); - if (ret) - dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret); + pci_wake_from_d3(pdev, false); clear_bit(ICE_DOWN, pf->state); - /* Now perform PF reset and rebuild */ - reset_type = ICE_RESET_PFR; - /* re-enable service task for reset, but allow reset to schedule it */ clear_bit(ICE_SERVICE_DIS, pf->state); + clear_bit(ICE_SUSPENDED, pf->state); - if (ice_schedule_reset(pf, reset_type)) - dev_err(dev, "Reset during resume failed.\n"); + /* force a reset, but it may already have been scheduled in + * ice_suspend, but either way the reset will execute + * once the service task is restarted + */ + ice_schedule_reset(pf, ICE_RESET_PFR); - clear_bit(ICE_SUSPENDED, pf->state); ice_service_task_restart(pf); - /* Restart the service task */ mod_timer(&pf->serv_tmr, round_jiffies(jiffies + pf->serv_tmr_period)); return 0; } -#endif /* CONFIG_PM */ /** * ice_pci_err_detected - warning that PCI error has been detected @@ -5612,14 +5497,8 @@ ice_pci_err_detected(struct pci_dev *pdev, pci_channel_state_t err) return PCI_ERS_RESULT_DISCONNECT; } - if (!test_bit(ICE_SUSPENDED, pf->state)) { - ice_service_task_stop(pf); - - if (!test_bit(ICE_PREPARED_FOR_RESET, pf->state)) { - set_bit(ICE_PFR_REQ, pf->state); - ice_prepare_for_reset(pf, ICE_RESET_PFR); - } - } + if (!test_bit(ICE_SUSPENDED, pf->state)) + ice_quiesce_before_reset(pf); return PCI_ERS_RESULT_NEED_RESET; } @@ -5698,14 +5577,8 @@ static void ice_pci_err_reset_prepare(struct pci_dev *pdev) { struct ice_pf *pf = pci_get_drvdata(pdev); - if (!test_bit(ICE_SUSPENDED, pf->state)) { - ice_service_task_stop(pf); - - if (!test_bit(ICE_PREPARED_FOR_RESET, pf->state)) { - set_bit(ICE_PFR_REQ, pf->state); - ice_prepare_for_reset(pf, ICE_RESET_PFR); - } - } + if (!test_bit(ICE_SUSPENDED, pf->state)) + ice_quiesce_before_reset(pf); } /** @@ -5761,7 +5634,7 @@ static const struct pci_device_id ice_pci_tbl[] = { }; MODULE_DEVICE_TABLE(pci, ice_pci_tbl); -static __maybe_unused SIMPLE_DEV_PM_OPS(ice_pm_ops, ice_suspend, ice_resume); +static DEFINE_SIMPLE_DEV_PM_OPS(ice_pm_ops, ice_suspend, ice_resume); static const struct pci_error_handlers ice_pci_err_handler = { .error_detected = ice_pci_err_detected, @@ -5776,9 +5649,7 @@ static struct pci_driver ice_driver = { .id_table = ice_pci_tbl, .probe = ice_probe, .remove = ice_remove, -#ifdef CONFIG_PM - .driver.pm = &ice_pm_ops, -#endif /* CONFIG_PM */ + .driver.pm = pm_sleep_ptr(&ice_pm_ops), .shutdown = ice_shutdown, .sriov_configure = ice_sriov_configure, .sriov_get_vf_total_msix = ice_sriov_get_vf_total_msix,