diff mbox

[14/22] qla2xxx: Fix nested spinlock

Message ID 20171128193503.13695-15-himanshu.madhani@cavium.com (mailing list archive)
State Superseded
Headers show

Commit Message

Madhani, Himanshu Nov. 28, 2017, 7:34 p.m. UTC
From: Quinn Tran <quinn.tran@cavium.com>

Fixes: 6eb54715b54bb ("qla2xxx: Added interface to send explicit LOGO.")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
---
 drivers/scsi/qla2xxx/qla_iocb.c | 3 ---
 1 file changed, 3 deletions(-)

Comments

Bart Van Assche Nov. 28, 2017, 7:45 p.m. UTC | #1
On Tue, 2017-11-28 at 11:34 -0800, Himanshu Madhani wrote:
> From: Quinn Tran <quinn.tran@cavium.com>


Nesting spinlocks is allowed so I think a more detailed description
is required for this patch. It would help e.g. to explain why this
patch is a bug fix and also what it fixes.

Thanks,

Bart.
Madhani, Himanshu Nov. 28, 2017, 9:38 p.m. UTC | #2
Hi Bart, 

> On Nov 28, 2017, at 11:45 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> 
> On Tue, 2017-11-28 at 11:34 -0800, Himanshu Madhani wrote:
>> From: Quinn Tran <quinn.tran@cavium.com>
> 
> Nesting spinlocks is allowed so I think a more detailed description
> is required for this patch. It would help e.g. to explain why this
> patch is a bug fix and also what it fixes.
> 
> Thanks,
> 
> Bart.

Apology for using wrong subject for this patch. 

we ran into this deadlock issue while unloading driver and stack trace shows following

#14 [ffff9f2e21e03db8] native_queued_spin_lock_slowpath at ffffffffad0d8802
#15 [ffff9f2e21e03dc0] do_raw_spin_lock at ffffffffad0d99e4
#16 [ffff9f2e21e03dd8] _raw_spin_lock_irqsave at ffffffffad652471
#17 [ffff9f2e21e03e00] qla2x00_els_dcmd_iocb_timeout at ffffffffc070cd63 [qla2xxx]
#18 [ffff9f2e21e03e40] qla2x00_sp_timeout at ffffffffc06f06d3 [qla2xxx]
#19 [ffff9f2e21e03e68] call_timer_fn at ffffffffad0f97d8
#20 [ffff9f2e21e03ed8] run_timer_softirq at ffffffffad0faf47
#21 [ffff9f2e21e03f68] __softirqentry_text_start at ffffffffad655f32

There is patch #19 (https://patchwork.kernel.org/patch/10080967/) later in this series which ultimately removes
this block of code protected by these spin locks. if you prefer i can merge this patch with the other patch and 
add more description

Thanks,
- Himanshu
Bart Van Assche Nov. 28, 2017, 10:09 p.m. UTC | #3
On Tue, 2017-11-28 at 21:38 +0000, Madhani, Himanshu wrote:
> Hi Bart, 

> 

> > On Nov 28, 2017, at 11:45 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote:

> > 

> > On Tue, 2017-11-28 at 11:34 -0800, Himanshu Madhani wrote:

> > > From: Quinn Tran <quinn.tran@cavium.com>

> > 

> > Nesting spinlocks is allowed so I think a more detailed description

> > is required for this patch. It would help e.g. to explain why this

> > patch is a bug fix and also what it fixes.

> > 

> Apology for using wrong subject for this patch. 

> 

> we ran into this deadlock issue while unloading driver and stack trace shows following

> 

> #14 [ffff9f2e21e03db8] native_queued_spin_lock_slowpath at ffffffffad0d8802

> #15 [ffff9f2e21e03dc0] do_raw_spin_lock at ffffffffad0d99e4

> #16 [ffff9f2e21e03dd8] _raw_spin_lock_irqsave at ffffffffad652471

> #17 [ffff9f2e21e03e00] qla2x00_els_dcmd_iocb_timeout at ffffffffc070cd63 [qla2xxx]

> #18 [ffff9f2e21e03e40] qla2x00_sp_timeout at ffffffffc06f06d3 [qla2xxx]

> #19 [ffff9f2e21e03e68] call_timer_fn at ffffffffad0f97d8

> #20 [ffff9f2e21e03ed8] run_timer_softirq at ffffffffad0faf47

> #21 [ffff9f2e21e03f68] __softirqentry_text_start at ffffffffad655f32

> 

> There is patch #19 (https://patchwork.kernel.org/patch/10080967/) later in this series which ultimately removes

> this block of code protected by these spin locks. if you prefer i can merge this patch with the other patch and 

> add more description


Hello Himanshu,

Do you enable CONFIG_PROVE_LOCKING in any of your tests? If this bug could
be reproduced with CONFIG_PROVE_LOCKING then that would provide very valuable
information about the root cause of the deadlock.

Thanks,

Bart.
Madhani, Himanshu Nov. 28, 2017, 10:22 p.m. UTC | #4
Hi Bart, 

> On Nov 28, 2017, at 2:09 PM, Bart Van Assche <bart.vanassche@wdc.com> wrote:

> 

> Hello Himanshu,

> 

> Do you enable CONFIG_PROVE_LOCKING in any of your tests? If this bug could

> be reproduced with CONFIG_PROVE_LOCKING then that would provide very valuable

> information about the root cause of the deadlock.

> 

> Thanks,

> 

> Bart.


This was reported by customer so i am not very confident they would have CONFIG_PROVE_LOCKING enabled in
their setup. I’ll enable this on our setup and we can try to reproduce this issue. 

Let me know if you want me to drop this patch until we get more details.

Thanks,
- Himanshu
Bart Van Assche Nov. 28, 2017, 10:27 p.m. UTC | #5
On Tue, 2017-11-28 at 22:22 +0000, Madhani, Himanshu wrote:
> This was reported by customer so i am not very confident they would have CONFIG_PROVE_LOCKING enabled in

> their setup. I’ll enable this on our setup and we can try to reproduce this issue. 

> 

> Let me know if you want me to drop this patch until we get more details.


Hello Himanshu,

What I think is that at least an explanation should be provided about why it
is safe to leave out the spin_lock() and spin_unlock() calls. What does the
hardware lock protect and why is it safe to leave it out from
qla2x00_els_dcmd_iocb_timeout()?

Thanks,

Bart.
Madhani, Himanshu Nov. 28, 2017, 10:55 p.m. UTC | #6
Hi Bart, 

On Nov 28, 2017, at 2:27 PM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> 

> On Tue, 2017-11-28 at 22:22 +0000, Madhani, Himanshu wrote:

>> This was reported by customer so i am not very confident they would have CONFIG_PROVE_LOCKING enabled in

>> their setup. I’ll enable this on our setup and we can try to reproduce this issue. 

>> 

>> Let me know if you want me to drop this patch until we get more details.

> 

> Hello Himanshu,

> 

> What I think is that at least an explanation should be provided about why it

> is safe to leave out the spin_lock() and spin_unlock() calls. What does the

> hardware lock protect and why is it safe to leave it out from

> qla2x00_els_dcmd_iocb_timeout()?

> 


Okay will update patch with the description and submit v2 of this patch.

> Thanks,

> 

> Bart.


Thanks,
- Himanshu
diff mbox

Patch

diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index d810a447cb4a..106f4ac4f733 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -2394,7 +2394,6 @@  qla2x00_els_dcmd_iocb_timeout(void *data)
 	struct scsi_qla_host *vha = sp->vha;
 	struct qla_hw_data *ha = vha->hw;
 	struct srb_iocb *lio = &sp->u.iocb_cmd;
-	unsigned long flags = 0;
 
 	ql_dbg(ql_dbg_io, vha, 0x3069,
 	    "%s Timeout, hdl=%x, portid=%02x%02x%02x\n",
@@ -2402,7 +2401,6 @@  qla2x00_els_dcmd_iocb_timeout(void *data)
 	    fcport->d_id.b.al_pa);
 
 	/* Abort the exchange */
-	spin_lock_irqsave(&ha->hardware_lock, flags);
 	if (ha->isp_ops->abort_command(sp)) {
 		ql_dbg(ql_dbg_io, vha, 0x3070,
 		    "mbx abort_command failed.\n");
@@ -2410,7 +2408,6 @@  qla2x00_els_dcmd_iocb_timeout(void *data)
 		ql_dbg(ql_dbg_io, vha, 0x3071,
 		    "mbx abort_command success.\n");
 	}
-	spin_unlock_irqrestore(&ha->hardware_lock, flags);
 
 	complete(&lio->u.els_logo.comp);
 }