scsi: mpt3sas: avoid watchdog issue while releasing chain buffers

Message ID 20220112140113.26560-1-mwilck@suse.com (mailing list archive)
State Deferred

Commit Message

Martin Wilck Jan. 12, 2022, 2:01 p.m. UTC
From: Martin Wilck <mwilck@suse.com>

I observe the watchdog timer being triggered while unloading the
mpt3sas driver:

Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: mpt3sas_base_detach
Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: mpt3sas_base_free_resources
Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: mpt3sas_base_make_ioc_ready
Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: sending message unit reset !!
Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: message unit reset: SUCCESS
Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: mpt3sas_base_unmap_resources
Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: _base_release_memory_pools
Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: request_pool(0x00000000144b1531): free
Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: sense_pool(0x000000009665c238): free
Jan 12 12:25:52 tegmen kernel: mpt2sas_cm0: reply_pool(0x000000005c5e0fa5): free
Jan 12 12:25:52 tegmen kernel: mpt2sas_cm0: reply_free_pool(0x000000006f897f6c): free
Jan 12 12:25:52 tegmen kernel: mpt2sas_cm0: reply_post_free_pool(0x00000000d1edc4aa): free
Jan 12 12:25:52 tegmen kernel: mpt2sas_cm0: config_page(0x000000009f651842): free
Jan 12 12:26:23 tegmen kernel: watchdog: BUG: soft lockup - CPU#27 stuck for 26s! [rmmod:2594]
Jan 12 12:26:23 tegmen kernel: Hardware name: HP ProLiant DL560 Gen8, BIOS P77 05/24/2019
Jan 12 12:26:23 tegmen kernel: RIP: 0010:_raw_spin_unlock_irqrestore+0x26/0x2e
Jan 12 12:26:23 tegmen kernel: Code: 1f 44 00 00 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 f7 c6 00 02 00 00 75 0b 65 ff 0d 05 ce a1 5f 74 0>
Jan 12 12:26:23 tegmen kernel: RSP: 0018:ffffab1546bdfcc8 EFLAGS: 00000206
Jan 12 12:26:23 tegmen kernel: RAX: 0000000000000c80 RBX: ffff8d82b0f16700 RCX: 0000000000000d00
Jan 12 12:26:23 tegmen kernel: RDX: 0000000453642d00 RSI: 0000000000000282 RDI: ffff8d8292075f90
Jan 12 12:26:23 tegmen kernel: RBP: ffff8d8292075f80 R08: 0000000000000000 R09: 0000000000000001
Jan 12 12:26:23 tegmen kernel: R10: 0000000000000003 R11: ffff8d8284256a00 R12: ffff8d8293642d00
Jan 12 12:26:23 tegmen kernel: R13: ffff8d8292075f90 R14: 0000000000000282 R15: 0000000000000d00
Jan 12 12:26:23 tegmen kernel: FS:  00007fbd96388740(0000) GS:ffff8d8e7f6c0000(0000) knlGS:0000000000000000
Jan 12 12:26:23 tegmen kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 12:26:23 tegmen kernel: CR2: 000055bbd50f9918 CR3: 0000000c80b0c001 CR4: 00000000000606e0
Jan 12 12:26:23 tegmen kernel: Call Trace:
Jan 12 12:26:23 tegmen kernel:  <TASK>
Jan 12 12:26:23 tegmen kernel:  dma_pool_free+0xc1/0x100
Jan 12 12:26:23 tegmen kernel:  _base_release_memory_pools+0x343/0x4c0 [mpt3sas 6ff0715b1f6f07c16051cb2772836069b2821b01]
Jan 12 12:26:23 tegmen kernel:  mpt3sas_base_detach+0x2e/0x130 [mpt3sas 6ff0715b1f6f07c16051cb2772836069b2821b01]

When the driver is unloaded during system shutdown, this may actually cause a
kernel panic triggered by the watchdog.

The problem is that, on the hardware in question, the driver allocates a very
large number of DMA buffers for chain lookup (scsiio_depth = 29868,
chains_needed_per_io = 15, total number of buffers = 448020). The loop that
frees all of these buffers takes ~30s to execute. Adding a cond_resched() call
to the loop avoids triggering the watchdog.

Note: This is the second issue I have seen with this controller and its
reported can_queue value, following
https://lore.kernel.org/linux-scsi/Ydug9nWg4loEVkJw@T590/T/

Fixes: 93204b782a88 ("scsi: mpt3sas: Lockless access for chain buffers.")
Signed-off-by: Martin Wilck <mwilck@suse.com>
Cc: Sathya Prakash <sathya.prakash@broadcom.com>
Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
Cc: MPT-FusionLinux.pdl@broadcom.com
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Martin Wilck Jan. 25, 2022, 9:49 p.m. UTC | #1

Patch

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 81dab9b82f79..943ea7e0fef0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -5715,6 +5715,7 @@  _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
 						ct->chain_buffer_dma);
 			}
 			kfree(ioc->chain_lookup[i].chains_per_smid);
+			cond_resched();
 		}
 		dma_pool_destroy(ioc->chain_dma_pool);
 		kfree(ioc->chain_lookup);