ice: irdma hardware init failed after suspend/resume

Message ID	20240528100315.24290-1-en-wei.wu@canonical.com (mailing list archive)
State	Awaiting Upstream
Delegated to:	Netdev Maintainers
Headers	show Received: from smtp-relay-internal-0.canonical.com (smtp-relay-internal-0.canonical.com [185.125.188.122]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8CAC817E8F4 for <netdev@vger.kernel.org>; Tue, 28 May 2024 10:03:25 +0000 (UTC) From: Ricky Wu <en-wei.wu@canonical.com> To: jesse.brandeburg@intel.com Cc: anthony.l.nguyen@intel.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, rickywu0421@gmail.com, en-wei.wu@canonical.com Subject: [PATCH] ice: irdma hardware init failed after suspend/resume Date: Tue, 28 May 2024 18:03:15 +0800 Message-ID: <20240528100315.24290-1-en-wei.wu@canonical.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	ice: irdma hardware init failed after suspend/resume \| expand ice: irdma hardware init failed after suspend/resume

Context	Check	Description
netdev/series_format	warning	Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection	success	Guessed tree name to be net-next
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 902 this patch: 902
netdev/build_tools	success	No tools touched, skip
netdev/cc_maintainers	success	CCed 7 of 7 maintainers
netdev/build_clang	success	Errors and warnings before: 906 this patch: 906
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 906 this patch: 906
netdev/checkpatch	warning	WARNING: line length of 82 exceeds 80 columns
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 116 this patch: 116
netdev/source_inline	success	Was 0 now: 0
netdev/contest	success	net-next-2024-05-28--15-00 (tests: 1037)

En-Wei Wu May 28, 2024, 10:03 a.m. UTC

A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
that irdma would break and report hardware initialization failed after
suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).

The problem is caused due to the collision between the irq numbers
requested in irdma and the irq numbers requested in other drivers
after suspend/resume.

The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
which stores mappings between MSI-X index and Linux interrupt number.
It's supposed to be cleaned up when suspend and rebuilt in resume but
it's not, causing irdma using the old irq numbers stored in the old
ice_pf->msix_entries to request_irq() when resume. And eventually
collide with other drivers.

This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
irdma if we've dynamically allocated them). On Resume, we call
ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
MSI-X vectors if we would like to dynamically allocate them).

Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Michal Swiatkowski May 28, 2024, 10:43 a.m. UTC | #1

On Tue, May 28, 2024 at 06:03:15PM +0800, Ricky Wu wrote:
> A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
> that irdma would break and report hardware initialization failed after
> suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).
> 
> The problem is caused due to the collision between the irq numbers
> requested in irdma and the irq numbers requested in other drivers
> after suspend/resume.
> 
> The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
> which stores mappings between MSI-X index and Linux interrupt number.
> It's supposed to be cleaned up when suspend and rebuilt in resume but
> it's not, causing irdma using the old irq numbers stored in the old
> ice_pf->msix_entries to request_irq() when resume. And eventually
> collide with other drivers.
> 
> This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
> clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
> irdma if we've dynamically allocated them). On Resume, we call
> ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
> MSI-X vectors if we would like to dynamically allocate them).
> 
> Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index f60c022f7960..ec3cbadaa162 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
>  	 */
>  	disabled = ice_service_task_stop(pf);
>  
> -	ice_unplug_aux_dev(pf);
> +	ice_deinit_rdma(pf);
>  
>  	/* Already suspended?, then there is nothing to do */
>  	if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
> @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
>  	if (ret)
>  		dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
>  
> +	ret = ice_init_rdma(pf);
> +	if (ret)
> +		dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);
> +
>  	clear_bit(ICE_DOWN, pf->state);
>  	/* Now perform PF reset and rebuild */
>  	reset_type = ICE_RESET_PFR;

Looks fine, thanks for the fix. You can add fixes tag and target it to a
net tree.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

> -- 
> 2.43.0
>

Wojciech Drewek May 28, 2024, 10:43 a.m. UTC | #2

On 28.05.2024 12:03, Ricky Wu wrote:
> A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
> that irdma would break and report hardware initialization failed after
> suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).
> 
> The problem is caused due to the collision between the irq numbers
> requested in irdma and the irq numbers requested in other drivers
> after suspend/resume.
> 
> The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
> which stores mappings between MSI-X index and Linux interrupt number.
> It's supposed to be cleaned up when suspend and rebuilt in resume but
> it's not, causing irdma using the old irq numbers stored in the old
> ice_pf->msix_entries to request_irq() when resume. And eventually
> collide with other drivers.
> 
> This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
> clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
> irdma if we've dynamically allocated them). On Resume, we call
> ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
> MSI-X vectors if we would like to dynamically allocate them).
> 
> Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
> ---

Thanks for the patch!

>  drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index f60c022f7960..ec3cbadaa162 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
>  	 */
>  	disabled = ice_service_task_stop(pf);
>  
> -	ice_unplug_aux_dev(pf);
> +	ice_deinit_rdma(pf);

I think this should be called later in the reset path IMO.
You should call ice_deinit_rdma in ice_prepare_for_reset (replace ice_unplug_aux_dev),

>  
>  	/* Already suspended?, then there is nothing to do */
>  	if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
> @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
>  	if (ret)
>  		dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
>  
> +	ret = ice_init_rdma(pf);
> +	if (ret)
> +		dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);

And ice_init_rdma should be moved to ice_rebuild (replace ice_plug_aux_dev)

> +
>  	clear_bit(ICE_DOWN, pf->state);
>  	/* Now perform PF reset and rebuild */
>  	reset_type = ICE_RESET_PFR;

Paul Menzel May 28, 2024, 11:12 a.m. UTC | #3

Dear Ricky,


Thank you for your patch. Some minor nits. It’d be great if you made the 
summary about the action and not an issue description. Maybe:

Avoid IRQ collision to fix init failure on ACPI S3 resume

Am 28.05.24 um 12:03 schrieb Ricky Wu:
> A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
> that irdma would break and report hardware initialization failed after
> suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).
> 
> The problem is caused due to the collision between the irq numbers
> requested in irdma and the irq numbers requested in other drivers
> after suspend/resume.
> 
> The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
> which stores mappings between MSI-X index and Linux interrupt number.
> It's supposed to be cleaned up when suspend and rebuilt in resume but
> it's not, causing irdma using the old irq numbers stored in the old
> ice_pf->msix_entries to request_irq() when resume. And eventually
> collide with other drivers.
> 
> This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
> clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
> irdma if we've dynamically allocated them). On Resume, we call

resume

> ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
> MSI-X vectors if we would like to dynamically allocate them).
> 
> Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>

Please add a Link: tag.

If this was tested by somebody else, please also add a Tested-by: line.

> ---
>   drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index f60c022f7960..ec3cbadaa162 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
>   	 */
>   	disabled = ice_service_task_stop(pf);
>   
> -	ice_unplug_aux_dev(pf);
> +	ice_deinit_rdma(pf);
>   
>   	/* Already suspended?, then there is nothing to do */
>   	if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
> @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
>   	if (ret)
>   		dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
>   
> +	ret = ice_init_rdma(pf);
> +	if (ret)
> +		dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);
> +
>   	clear_bit(ICE_DOWN, pf->state);
>   	/* Now perform PF reset and rebuild */
>   	reset_type = ICE_RESET_PFR;

What effect does this have on resume time?


Kind regards,

Paul

En-Wei Wu May 29, 2024, 3:17 a.m. UTC | #4

Thanks for your kind and quick reply.

> I think this should be called later in the reset path IMO.
> You should call ice_deinit_rdma in ice_prepare_for_reset (replace ice_unplug_aux_dev),
I'm afraid this would break the existing code because in
ice_deinit_rdma(), it will remove some entries in
pf->irq_tracker.entries. And in ice_reinit_interrupt_scheme() (which
is called before ice_prepare_for_reset), some entries have been
allocated for other irq usage.

> What effect does this have on resume time?
When we call ice_init_rdma() at resume time, it will allocate entries
at pf->irq_tracker.entries and update pf->msix_entries for later use
(request_irq) by irdma.

On Tue, 28 May 2024 at 19:12, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Ricky,
>
>
> Thank you for your patch. Some minor nits. It’d be great if you made the
> summary about the action and not an issue description. Maybe:
>
> Avoid IRQ collision to fix init failure on ACPI S3 resume
>
> Am 28.05.24 um 12:03 schrieb Ricky Wu:
> > A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
> > that irdma would break and report hardware initialization failed after
> > suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).
> >
> > The problem is caused due to the collision between the irq numbers
> > requested in irdma and the irq numbers requested in other drivers
> > after suspend/resume.
> >
> > The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
> > which stores mappings between MSI-X index and Linux interrupt number.
> > It's supposed to be cleaned up when suspend and rebuilt in resume but
> > it's not, causing irdma using the old irq numbers stored in the old
> > ice_pf->msix_entries to request_irq() when resume. And eventually
> > collide with other drivers.
> >
> > This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
> > clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
> > irdma if we've dynamically allocated them). On Resume, we call
>
> resume
>
> > ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
> > MSI-X vectors if we would like to dynamically allocate them).
> >
> > Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
>
> Please add a Link: tag.
>
> If this was tested by somebody else, please also add a Tested-by: line.
>
> > ---
> >   drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++-
> >   1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> > index f60c022f7960..ec3cbadaa162 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev)
> >        */
> >       disabled = ice_service_task_stop(pf);
> >
> > -     ice_unplug_aux_dev(pf);
> > +     ice_deinit_rdma(pf);
> >
> >       /* Already suspended?, then there is nothing to do */
> >       if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
> > @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev)
> >       if (ret)
> >               dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
> >
> > +     ret = ice_init_rdma(pf);
> > +     if (ret)
> > +             dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", ret);
> > +
> >       clear_bit(ICE_DOWN, pf->state);
> >       /* Now perform PF reset and rebuild */
> >       reset_type = ICE_RESET_PFR;
>
> What effect does this have on resume time?
>
>
> Kind regards,
>
> Paul

Paul Menzel May 29, 2024, 8:18 p.m. UTC | #5

Dear En-Wei,


Thank you for responding so quickly.

Am 29.05.24 um 05:17 schrieb En-Wei WU:

[…]

>> What effect does this have on resume time?
> 
> When we call ice_init_rdma() at resume time, it will allocate entries
> at pf->irq_tracker.entries and update pf->msix_entries for later use
> (request_irq) by irdma.

Sorry for being unclear. I meant, does resuming the system take longer 
now? (initcall_debug might give a clue.)


Kind regards,

Paul

En-Wei Wu May 30, 2024, 7:08 a.m. UTC | #6

Thank you for your reply.

> Sorry for being unclear. I meant, does resuming the system take longer
> now? (initcall_debug might give a clue.)
I've tested the S3 suspend/resume with the initcall_debug kernel
command option, and it shows no clear difference between having or not
having the ice_init_rdma in ice_resume:
Without ice_init_rdma:
```
[  104.241129] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned
0 after 9415 usecs
[  104.241206] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned
0 after 9443 usecs
```
With ice_init_rdma:
```
[  122.749022] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned
0 after 9485 usecs
[  122.749068] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned
0 after 9532 usecs
```

> And ice_init_rdma should be moved to ice_rebuild (replace ice_plug_aux_dev)
We can defer the ice_init_rdma to the later service task by adopting this.

> You should call ice_deinit_rdma in ice_prepare_for_reset (replace ice_unplug_aux_dev),
It seems like we must call ice_deinit_rdma in ice_suspend. If we call
it in the later service task, it will:
1. break some existing code setup by ice_resume
2. Since the PCI-X vector table is flushed at the end of ice_suspend,
we have no way to release PCI-X vectors for rdma if we had allocated
it dynamically
The second point is important since we didn't release the PCI-X
vectors for rdma (if we allocated it dynamically) in the original
ice_suspend, and it's somewhat like a leak in the original code.

Best regards,
Ricky.

On Thu, 30 May 2024 at 04:19, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear En-Wei,
>
>
> Thank you for responding so quickly.
>
> Am 29.05.24 um 05:17 schrieb En-Wei WU:
>
> […]
>
> >> What effect does this have on resume time?
> >
> > When we call ice_init_rdma() at resume time, it will allocate entries
> > at pf->irq_tracker.entries and update pf->msix_entries for later use
> > (request_irq) by irdma.
>
> Sorry for being unclear. I meant, does resuming the system take longer
> now? (initcall_debug might give a clue.)
>
>
> Kind regards,
>
> Paul

Paul Menzel May 30, 2024, 7:55 a.m. UTC | #7

Dear En-Wei,


Thank you for your quick reply.

Am 30.05.24 um 09:08 schrieb En-Wei WU:

>> Sorry for being unclear. I meant, does resuming the system take longer
>> now? (initcall_debug might give a clue.)
> I've tested the S3 suspend/resume with the initcall_debug kernel
> command option, and it shows no clear difference between having or not
> having the ice_init_rdma in ice_resume:
> Without ice_init_rdma:
> ```
> [  104.241129] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned 0 after 9415 usecs
> [  104.241206] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned 0 after 9443 usecs
> ```
> With ice_init_rdma:
> ```
> [  122.749022] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned 0 after 9485 usecs
> [  122.749068] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned 0 after 9532 usecs
> ```

Thank you for verifying this. Nice to hear, it has no impact.

[…]


Kind regards,

Paul

Wojciech Drewek May 31, 2024, 9:37 a.m. UTC | #8

On 30.05.2024 09:08, En-Wei WU wrote:
> Thank you for your reply.
> 
>> Sorry for being unclear. I meant, does resuming the system take longer
>> now? (initcall_debug might give a clue.)
> I've tested the S3 suspend/resume with the initcall_debug kernel
> command option, and it shows no clear difference between having or not
> having the ice_init_rdma in ice_resume:
> Without ice_init_rdma:
> ```
> [  104.241129] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned
> 0 after 9415 usecs
> [  104.241206] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned
> 0 after 9443 usecs
> ```
> With ice_init_rdma:
> ```
> [  122.749022] ice 0000:86:00.1: PM: pci_pm_resume+0x0/0x110 returned
> 0 after 9485 usecs
> [  122.749068] ice 0000:86:00.0: PM: pci_pm_resume+0x0/0x110 returned
> 0 after 9532 usecs
> ```
> 
>> And ice_init_rdma should be moved to ice_rebuild (replace ice_plug_aux_dev)
> We can defer the ice_init_rdma to the later service task by adopting this.
> 
>> You should call ice_deinit_rdma in ice_prepare_for_reset (replace ice_unplug_aux_dev),
> It seems like we must call ice_deinit_rdma in ice_suspend. If we call
> it in the later service task, it will:
> 1. break some existing code setup by ice_resume
> 2. Since the PCI-X vector table is flushed at the end of ice_suspend,
> we have no way to release PCI-X vectors for rdma if we had allocated
> it dynamically
> The second point is important since we didn't release the PCI-X
> vectors for rdma (if we allocated it dynamically) in the original
> ice_suspend, and it's somewhat like a leak in the original code.
> 
> Best regards,
> Ricky.

Makes sense, let's keep ice_deinit_rdma in ice_suspend.

> 
> On Thu, 30 May 2024 at 04:19, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>>
>> Dear En-Wei,
>>
>>
>> Thank you for responding so quickly.
>>
>> Am 29.05.24 um 05:17 schrieb En-Wei WU:
>>
>> […]
>>
>>>> What effect does this have on resume time?
>>>
>>> When we call ice_init_rdma() at resume time, it will allocate entries
>>> at pf->irq_tracker.entries and update pf->msix_entries for later use
>>> (request_irq) by irdma.
>>
>> Sorry for being unclear. I meant, does resuming the system take longer
>> now? (initcall_debug might give a clue.)
>>
>>
>> Kind regards,
>>
>> Paul

ice: irdma hardware init failed after suspend/resume

Checks

Commit Message

Comments

Patch