Message ID: 20240528100315.24290-1-en-wei.wu@canonical.com
State: Awaiting Upstream
Delegated to: Netdev Maintainers
Series: ice: irdma hardware init failed after suspend/resume
On Tue, May 28, 2024 at 06:03:15PM +0800, Ricky Wu wrote:
> A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
> that irdma would break and report hardware initialization failed after
> suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).
>
> The problem is caused due to the collision between the irq numbers
> requested in irdma and the irq numbers requested in other drivers
> after suspend/resume.
>
> The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
> which stores mappings between MSI-X index and Linux interrupt number.
> It's supposed to be cleaned up when suspend and rebuilt in resume but
> it's not, causing irdma using the old irq numbers stored in the old
> ice_pf->msix_entries to request_irq() when resume. And eventually
> collide with other drivers.
>
> This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
> clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
> irdma if we've dynamically allocated them). On Resume, we call
> ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
> MSI-X vectors if we would like to dynamically allocate them).
>
> Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index f60c022f7960..ec3cbadaa162 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
>  	 */
>  	disabled = ice_service_task_stop(pf);
>
> -	ice_unplug_aux_dev(pf);
> +	ice_deinit_rdma(pf);
>
>  	/* Already suspended?, then there is nothing to do */
>  	if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
> @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
>  	if (ret)
>  		dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
>
> +	ret = ice_init_rdma(pf);
> +	if (ret)
> +		dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);
> +
>  	clear_bit(ICE_DOWN, pf->state);
>  	/* Now perform PF reset and rebuild */
>  	reset_type = ICE_RESET_PFR;

Looks fine, thanks for the fix. You can add a Fixes: tag and target it
to the net tree.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

> --
> 2.43.0
>
On 28.05.2024 12:03, Ricky Wu wrote:
> A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
> that irdma would break and report hardware initialization failed after
> suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).
>
> The problem is caused due to the collision between the irq numbers
> requested in irdma and the irq numbers requested in other drivers
> after suspend/resume.
>
> The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
> which stores mappings between MSI-X index and Linux interrupt number.
> It's supposed to be cleaned up when suspend and rebuilt in resume but
> it's not, causing irdma using the old irq numbers stored in the old
> ice_pf->msix_entries to request_irq() when resume. And eventually
> collide with other drivers.
>
> This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
> clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
> irdma if we've dynamically allocated them). On Resume, we call
> ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
> MSI-X vectors if we would like to dynamically allocate them).
>
> Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
> ---

Thanks for the patch!

>  drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index f60c022f7960..ec3cbadaa162 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
>  	 */
>  	disabled = ice_service_task_stop(pf);
>
> -	ice_unplug_aux_dev(pf);
> +	ice_deinit_rdma(pf);

I think this should be called later, in the reset path. You should call
ice_deinit_rdma() in ice_prepare_for_reset() (replacing
ice_unplug_aux_dev()),

>
>  	/* Already suspended?, then there is nothing to do */
>  	if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
> @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
>  	if (ret)
>  		dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
>
> +	ret = ice_init_rdma(pf);
> +	if (ret)
> +		dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);

and ice_init_rdma() should be moved to ice_rebuild() (replacing
ice_plug_aux_dev()).

> +
>  	clear_bit(ICE_DOWN, pf->state);
>  	/* Now perform PF reset and rebuild */
>  	reset_type = ICE_RESET_PFR;
Dear Ricky,

Thank you for your patch. Some minor nits. It’d be great if you made the
summary about the action and not an issue description. Maybe:

Avoid IRQ collision to fix init failure on ACPI S3 resume

On 28.05.24 at 12:03, Ricky Wu wrote:
> A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
> that irdma would break and report hardware initialization failed after
> suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).
>
> The problem is caused due to the collision between the irq numbers
> requested in irdma and the irq numbers requested in other drivers
> after suspend/resume.
>
> The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
> which stores mappings between MSI-X index and Linux interrupt number.
> It's supposed to be cleaned up when suspend and rebuilt in resume but
> it's not, causing irdma using the old irq numbers stored in the old
> ice_pf->msix_entries to request_irq() when resume. And eventually
> collide with other drivers.
>
> This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
> clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
> irdma if we've dynamically allocated them). On Resume, we call

resume

> ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
> MSI-X vectors if we would like to dynamically allocate them).
>
> Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>

Please add a Link: tag.

If this was tested by somebody else, please also add a Tested-by: line.

> ---
>  drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index f60c022f7960..ec3cbadaa162 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
>  	 */
>  	disabled = ice_service_task_stop(pf);
>
> -	ice_unplug_aux_dev(pf);
> +	ice_deinit_rdma(pf);
>
>  	/* Already suspended?, then there is nothing to do */
>  	if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
> @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
>  	if (ret)
>  		dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
>
> +	ret = ice_init_rdma(pf);
> +	if (ret)
> +		dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);
> +
>  	clear_bit(ICE_DOWN, pf->state);
>  	/* Now perform PF reset and rebuild */
>  	reset_type = ICE_RESET_PFR;

What effect does this have on resume time?


Kind regards,

Paul
Thanks for your kind and quick reply.

> I think this should be called later in the reset path IMO.
> You should call ice_deinit_rdma in ice_prepare_for_reset (replace ice_unplug_aux_dev),

I'm afraid this would break the existing code, because ice_deinit_rdma()
removes some entries in pf->irq_tracker.entries, and in
ice_reinit_interrupt_scheme() (which is called before
ice_prepare_for_reset) some entries have already been allocated for
other IRQ usage.

> What effect does this have on resume time?

When we call ice_init_rdma() at resume time, it allocates entries in
pf->irq_tracker.entries and updates pf->msix_entries for later use
(request_irq) by irdma.

On Tue, 28 May 2024 at 19:12, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Ricky,
>
>
> Thank you for your patch. Some minor nits. It’d be great if you made the
> summary about the action and not an issue description. Maybe:
>
> Avoid IRQ collision to fix init failure on ACPI S3 resume
>
> On 28.05.24 at 12:03, Ricky Wu wrote:
> > A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
> > that irdma would break and report hardware initialization failed after
> > suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).
> >
> > The problem is caused due to the collision between the irq numbers
> > requested in irdma and the irq numbers requested in other drivers
> > after suspend/resume.
> >
> > The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
> > which stores mappings between MSI-X index and Linux interrupt number.
> > It's supposed to be cleaned up when suspend and rebuilt in resume but
> > it's not, causing irdma using the old irq numbers stored in the old
> > ice_pf->msix_entries to request_irq() when resume. And eventually
> > collide with other drivers.
> >
> > This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
> > clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
> > irdma if we've dynamically allocated them). On Resume, we call
>
> resume
>
> > ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
> > MSI-X vectors if we would like to dynamically allocate them).
> >
> > Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
>
> Please add a Link: tag.
>
> If this was tested by somebody else, please also add a Tested-by: line.
>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> > index f60c022f7960..ec3cbadaa162 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
> >  	 */
> >  	disabled = ice_service_task_stop(pf);
> >
> > -	ice_unplug_aux_dev(pf);
> > +	ice_deinit_rdma(pf);
> >
> >  	/* Already suspended?, then there is nothing to do */
> >  	if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
> > @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
> >  	if (ret)
> >  		dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
> >
> > +	ret = ice_init_rdma(pf);
> > +	if (ret)
> > +		dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);
> > +
> >  	clear_bit(ICE_DOWN, pf->state);
> >  	/* Now perform PF reset and rebuild */
> >  	reset_type = ICE_RESET_PFR;
>
> What effect does this have on resume time?
>
>
> Kind regards,
>
> Paul
Dear En-Wei,

Thank you for responding so quickly.

On 29.05.24 at 05:17, En-Wei WU wrote:

[…]

>> What effect does this have on resume time?
>
> When we call ice_init_rdma() at resume time, it will allocate entries
> at pf->irq_tracker.entries and update pf->msix_entries for later use
> (request_irq) by irdma.

Sorry for being unclear. I meant, does resuming the system take longer
now? (initcall_debug might give a clue.)


Kind regards,

Paul
Thank you for your reply.

> Sorry for being unclear. I meant, does resuming the system take longer
> now? (initcall_debug might give a clue.)

I've tested S3 suspend/resume with the initcall_debug kernel command-line
option, and it shows no clear difference between having and not having
ice_init_rdma() in ice_resume().

Without ice_init_rdma:
```
[  104.241129] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned 0 after 9415 usecs
[  104.241206] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned 0 after 9443 usecs
```
With ice_init_rdma:
```
[  122.749022] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned 0 after 9485 usecs
[  122.749068] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned 0 after 9532 usecs
```

> And ice_init_rdma should be moved to ice_rebuild (replace ice_plug_aux_dev)

We can defer ice_init_rdma() to the later service task by adopting this.

> You should call ice_deinit_rdma in ice_prepare_for_reset (replace ice_unplug_aux_dev),

It seems we must call ice_deinit_rdma() in ice_suspend(). If we call it
in the later service task, it will:
1. break some existing code set up by ice_resume();
2. leave us no way to release the MSI-X vectors used by rdma (if we had
allocated them dynamically), since the MSI-X vector table is flushed at
the end of ice_suspend().

The second point is important: the original ice_suspend() did not
release the dynamically allocated MSI-X vectors for rdma, which is
effectively a leak in the original code.

Best regards,
Ricky.

On Thu, 30 May 2024 at 04:19, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear En-Wei,
>
>
> Thank you for responding so quickly.
>
> On 29.05.24 at 05:17, En-Wei WU wrote:
>
> […]
>
> >> What effect does this have on resume time?
> >
> > When we call ice_init_rdma() at resume time, it will allocate entries
> > at pf->irq_tracker.entries and update pf->msix_entries for later use
> > (request_irq) by irdma.
>
> Sorry for being unclear. I meant, does resuming the system take longer
> now? (initcall_debug might give a clue.)
>
>
> Kind regards,
>
> Paul
Dear En-Wei,

Thank you for your quick reply.

On 30.05.24 at 09:08, En-Wei WU wrote:
>> Sorry for being unclear. I meant, does resuming the system take longer
>> now? (initcall_debug might give a clue.)
> I've tested the S3 suspend/resume with the initcall_debug kernel
> command option, and it shows no clear difference between having or not
> having the ice_init_rdma in ice_resume:
> Without ice_init_rdma:
> ```
> [  104.241129] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned 0 after 9415 usecs
> [  104.241206] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned 0 after 9443 usecs
> ```
> With ice_init_rdma:
> ```
> [  122.749022] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned 0 after 9485 usecs
> [  122.749068] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned 0 after 9532 usecs
> ```

Thank you for verifying this. Nice to hear it has no impact.

[…]


Kind regards,

Paul
On 30.05.2024 09:08, En-Wei WU wrote:
> Thank you for your reply.
>
>> Sorry for being unclear. I meant, does resuming the system take longer
>> now? (initcall_debug might give a clue.)
> I've tested the S3 suspend/resume with the initcall_debug kernel
> command option, and it shows no clear difference between having or not
> having the ice_init_rdma in ice_resume:
> Without ice_init_rdma:
> ```
> [  104.241129] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned 0 after 9415 usecs
> [  104.241206] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned 0 after 9443 usecs
> ```
> With ice_init_rdma:
> ```
> [  122.749022] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned 0 after 9485 usecs
> [  122.749068] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned 0 after 9532 usecs
> ```
>
>> And ice_init_rdma should be moved to ice_rebuild (replace ice_plug_aux_dev)
> We can defer the ice_init_rdma to the later service task by adopting this.
>
>> You should call ice_deinit_rdma in ice_prepare_for_reset (replace ice_unplug_aux_dev),
> It seems like we must call ice_deinit_rdma in ice_suspend. If we call
> it in the later service task, it will:
> 1. break some existing code setup by ice_resume
> 2. Since the MSI-X vector table is flushed at the end of ice_suspend,
> we have no way to release MSI-X vectors for rdma if we had allocated
> them dynamically
> The second point is important since we didn't release the MSI-X
> vectors for rdma (if we allocated them dynamically) in the original
> ice_suspend, and it's somewhat like a leak in the original code.
>
> Best regards,
> Ricky.

Makes sense, let's keep ice_deinit_rdma() in ice_suspend().

> On Thu, 30 May 2024 at 04:19, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>>
>> Dear En-Wei,
>>
>>
>> Thank you for responding so quickly.
>>
>> On 29.05.24 at 05:17, En-Wei WU wrote:
>>
>> […]
>>
>>>> What effect does this have on resume time?
>>>
>>> When we call ice_init_rdma() at resume time, it will allocate entries
>>> at pf->irq_tracker.entries and update pf->msix_entries for later use
>>> (request_irq) by irdma.
>>
>> Sorry for being unclear. I meant, does resuming the system take longer
>> now? (initcall_debug might give a clue.)
>>
>>
>> Kind regards,
>>
>> Paul
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f60c022f7960..ec3cbadaa162 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
 	 */
 	disabled = ice_service_task_stop(pf);
 
-	ice_unplug_aux_dev(pf);
+	ice_deinit_rdma(pf);
 
 	/* Already suspended?, then there is nothing to do */
 	if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
@@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
 	if (ret)
 		dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
 
+	ret = ice_init_rdma(pf);
+	if (ret)
+		dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);
+
 	clear_bit(ICE_DOWN, pf->state);
 	/* Now perform PF reset and rebuild */
 	reset_type = ICE_RESET_PFR;
A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
irdma breaking and reporting a hardware initialization failure after
suspend/resume with an Intel E810 NIC (tested on 6.9.0-rc5).

The problem is caused by a collision between the IRQ numbers requested
by irdma and the IRQ numbers requested by other drivers after
suspend/resume.

The IRQ numbers used by irdma are derived from ice's
ice_pf->msix_entries, which stores the mappings between MSI-X indices
and Linux interrupt numbers. This table is supposed to be cleaned up on
suspend and rebuilt on resume, but it is not, so on resume irdma passes
the stale IRQ numbers kept in the old ice_pf->msix_entries to
request_irq() and eventually collides with other drivers.

This patch fixes the problem. On suspend, call ice_deinit_rdma() to
clean up ice_pf->msix_entries (and free the MSI-X vectors used by irdma
if they were dynamically allocated). On resume, call ice_init_rdma() to
rebuild ice_pf->msix_entries (and reallocate the MSI-X vectors if
dynamic allocation is used).

Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)