Message ID | 20240530142131.26741-1-en-wei.wu@canonical.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] ice: avoid IRQ collision to fix init failure on ACPI S3 resume | expand |
On 30.05.2024 16:21, Ricky Wu wrote: > A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes > that irdma would break and report hardware initialization failed after > suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5). > > The problem is caused due to the collision between the irq numbers > requested in irdma and the irq numbers requested in other drivers > after suspend/resume. > > The irq numbers used by irdma are derived from ice's ice_pf->msix_entries > which stores mappings between MSI-X index and Linux interrupt number. > It's supposed to be cleaned up when suspend and rebuilt in resume but > it's not, causing irdma using the old irq numbers stored in the old > ice_pf->msix_entries to request_irq() when resume. And eventually > collide with other drivers. > > This patch fixes this problem. On suspend, we call ice_deinit_rdma() to > clean up the ice_pf->msix_entries (and free the MSI-X vectors used by > irdma if we've dynamically allocated them). On resume, we call > ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the > MSI-X vectors if we would like to dynamically allocate them). > > Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") > Tested-by: Cyrus Lien <cyrus.lien@canonical.com> > Signed-off-by: Ricky Wu <en-wei.wu@canonical.com> > --- Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> > Changes in v2: > - Change title > - Add Fixes and Tested-by tags > - Fix typo > --- > drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c > index f60c022f7960..ec3cbadaa162 100644 > --- a/drivers/net/ethernet/intel/ice/ice_main.c > +++ b/drivers/net/ethernet/intel/ice/ice_main.c > @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev) > */ > disabled = ice_service_task_stop(pf); > > - ice_unplug_aux_dev(pf); > + ice_deinit_rdma(pf); > > /* Already suspended?, then there is nothing to do */ > if (test_and_set_bit(ICE_SUSPENDED, pf->state)) { > @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev) > if (ret) > dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret); > > + ret = ice_init_rdma(pf); > + if (ret) > + dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret); > + > clear_bit(ICE_DOWN, pf->state); > /* Now perform PF reset and rebuild */ > reset_type = ICE_RESET_PFR;
On Thu, May 30, 2024 at 10:21:31PM +0800, Ricky Wu wrote: > A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes > that irdma would break and report hardware initialization failed after > suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5). > > The problem is caused due to the collision between the irq numbers > requested in irdma and the irq numbers requested in other drivers > after suspend/resume. > > The irq numbers used by irdma are derived from ice's ice_pf->msix_entries > which stores mappings between MSI-X index and Linux interrupt number. > It's supposed to be cleaned up when suspend and rebuilt in resume but > it's not, causing irdma using the old irq numbers stored in the old > ice_pf->msix_entries to request_irq() when resume. And eventually > collide with other drivers. > > This patch fixes this problem. On suspend, we call ice_deinit_rdma() to > clean up the ice_pf->msix_entries (and free the MSI-X vectors used by > irdma if we've dynamically allocated them). On resume, we call > ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the > MSI-X vectors if we would like to dynamically allocate them). > > Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") > Tested-by: Cyrus Lien <cyrus.lien@canonical.com> > Signed-off-by: Ricky Wu <en-wei.wu@canonical.com> > --- > Changes in v2: > - Change title > - Add Fixes and Tested-by tags > - Fix typo > --- > drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c > index f60c022f7960..ec3cbadaa162 100644 > --- a/drivers/net/ethernet/intel/ice/ice_main.c > +++ b/drivers/net/ethernet/intel/ice/ice_main.c > @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev) > */ > disabled = ice_service_task_stop(pf); > > - ice_unplug_aux_dev(pf); > + ice_deinit_rdma(pf); > > /* Already suspended?, then there is nothing to do */ > if (test_and_set_bit(ICE_SUSPENDED, pf->state)) { > @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev) > if (ret) > dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret); > > + ret = ice_init_rdma(pf); > + if (ret) > + dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret); > + nit: The line above could trivially be wrapped to fit within 80 columns, as is preferred for Networking code. Flagged by checkpatch.pl --max-line-length=80 > clear_bit(ICE_DOWN, pf->state); > /* Now perform PF reset and rebuild */ > reset_type = ICE_RESET_PFR; > -- > 2.43.0 > >
> -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Ricky Wu > Sent: Thursday, May 30, 2024 7:52 PM > To: Brandeburg, Jesse <jesse.brandeburg@intel.com> > Cc: pmenzel@molgen.mpg.de; michal.swiatkowski@linux.intel.com; Drewek, Wojciech <wojciech.drewek@intel.com>; intel-wired-lan@lists.osuosl.org; rickywu0421@gmail.com; linux-kernel@vger.kernel.org; en-wei.wu@canonical.com; edumazet@google.com; Lien, Cyrus > <cyrus.lien@canonical.com>; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; netdev@vger.kernel.org; kuba@kernel.org; pabeni@redhat.com; davem@davemloft.net > Subject: [Intel-wired-lan] [PATCH net, v2] ice: avoid IRQ collision to fix init failure on ACPI S3 resume > > A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes that irdma would break and report hardware initialization failed after suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5). > > The problem is caused due to the collision between the irq numbers requested in irdma and the irq numbers requested in other drivers after suspend/resume. > > The irq numbers used by irdma are derived from ice's ice_pf->msix_entries which stores mappings between MSI-X index and Linux interrupt number. > It's supposed to be cleaned up when suspend and rebuilt in resume but it's not, causing irdma using the old irq numbers stored in the old ice_pf->msix_entries to request_irq() when resume. And eventually collide with other drivers. > > This patch fixes this problem. On suspend, we call ice_deinit_rdma() to clean up the ice_pf->msix_entries (and free the MSI-X vectors used by irdma if we've dynamically allocated them). On resume, we call > ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the MSI-X vectors if we would like to dynamically allocate them). > > Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") > Tested-by: Cyrus Lien <cyrus.lien@canonical.com> > Signed-off-by: Ricky Wu <en-wei.wu@canonical.com> > --- > Changes in v2: > - Change title > - Add Fixes and Tested-by tags > - Fix typo > --- > drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index f60c022f7960..ec3cbadaa162 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev) */ disabled = ice_service_task_stop(pf); - ice_unplug_aux_dev(pf); + ice_deinit_rdma(pf); /* Already suspended?, then there is nothing to do */ if (test_and_set_bit(ICE_SUSPENDED, pf->state)) { @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev) if (ret) dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret); + ret = ice_init_rdma(pf); + if (ret) + dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret); + clear_bit(ICE_DOWN, pf->state); /* Now perform PF reset and rebuild */ reset_type = ICE_RESET_PFR;