Message ID | 20171128193503.13695-15-himanshu.madhani@cavium.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Tue, 2017-11-28 at 11:34 -0800, Himanshu Madhani wrote:
> From: Quinn Tran <quinn.tran@cavium.com>
Nesting spinlocks is allowed so I think a more detailed description
is required for this patch. It would help e.g. to explain why this
patch is a bug fix and also what it fixes.
Thanks,
Bart.
Hi Bart, > On Nov 28, 2017, at 11:45 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote: > > On Tue, 2017-11-28 at 11:34 -0800, Himanshu Madhani wrote: >> From: Quinn Tran <quinn.tran@cavium.com> > > Nesting spinlocks is allowed so I think a more detailed description > is required for this patch. It would help e.g. to explain why this > patch is a bug fix and also what it fixes. > > Thanks, > > Bart. Apology for using wrong subject for this patch. we ran into this deadlock issue while unloading driver and stack trace shows following #14 [ffff9f2e21e03db8] native_queued_spin_lock_slowpath at ffffffffad0d8802 #15 [ffff9f2e21e03dc0] do_raw_spin_lock at ffffffffad0d99e4 #16 [ffff9f2e21e03dd8] _raw_spin_lock_irqsave at ffffffffad652471 #17 [ffff9f2e21e03e00] qla2x00_els_dcmd_iocb_timeout at ffffffffc070cd63 [qla2xxx] #18 [ffff9f2e21e03e40] qla2x00_sp_timeout at ffffffffc06f06d3 [qla2xxx] #19 [ffff9f2e21e03e68] call_timer_fn at ffffffffad0f97d8 #20 [ffff9f2e21e03ed8] run_timer_softirq at ffffffffad0faf47 #21 [ffff9f2e21e03f68] __softirqentry_text_start at ffffffffad655f32 There is patch #19 (https://patchwork.kernel.org/patch/10080967/) later in this series which ultimately removes this block of code protected by these spin locks. if you prefer i can merge this patch with the other patch and add more description Thanks, - Himanshu
On Tue, 2017-11-28 at 21:38 +0000, Madhani, Himanshu wrote: > Hi Bart, > > > On Nov 28, 2017, at 11:45 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote: > > > > On Tue, 2017-11-28 at 11:34 -0800, Himanshu Madhani wrote: > > > From: Quinn Tran <quinn.tran@cavium.com> > > > > Nesting spinlocks is allowed so I think a more detailed description > > is required for this patch. It would help e.g. to explain why this > > patch is a bug fix and also what it fixes. > > > Apology for using wrong subject for this patch. > > we ran into this deadlock issue while unloading driver and stack trace shows following > > #14 [ffff9f2e21e03db8] native_queued_spin_lock_slowpath at ffffffffad0d8802 > #15 [ffff9f2e21e03dc0] do_raw_spin_lock at ffffffffad0d99e4 > #16 [ffff9f2e21e03dd8] _raw_spin_lock_irqsave at ffffffffad652471 > #17 [ffff9f2e21e03e00] qla2x00_els_dcmd_iocb_timeout at ffffffffc070cd63 [qla2xxx] > #18 [ffff9f2e21e03e40] qla2x00_sp_timeout at ffffffffc06f06d3 [qla2xxx] > #19 [ffff9f2e21e03e68] call_timer_fn at ffffffffad0f97d8 > #20 [ffff9f2e21e03ed8] run_timer_softirq at ffffffffad0faf47 > #21 [ffff9f2e21e03f68] __softirqentry_text_start at ffffffffad655f32 > > There is patch #19 (https://patchwork.kernel.org/patch/10080967/) later in this series which ultimately removes > this block of code protected by these spin locks. if you prefer i can merge this patch with the other patch and > add more description Hello Himanshu, Do you enable CONFIG_PROVE_LOCKING in any of your tests? If this bug could be reproduced with CONFIG_PROVE_LOCKING then that would provide very valuable information about the root cause of the deadlock. Thanks, Bart.
Hi Bart, > On Nov 28, 2017, at 2:09 PM, Bart Van Assche <bart.vanassche@wdc.com> wrote: > > Hello Himanshu, > > Do you enable CONFIG_PROVE_LOCKING in any of your tests? If this bug could > be reproduced with CONFIG_PROVE_LOCKING then that would provide very valuable > information about the root cause of the deadlock. > > Thanks, > > Bart. This was reported by customer so i am not very confident they would have CONFIG_PROVE_LOCKING enabled in their setup. I’ll enable this on our setup and we can try to reproduce this issue. Let me know if you want me to drop this patch until we get more details. Thanks, - Himanshu
On Tue, 2017-11-28 at 22:22 +0000, Madhani, Himanshu wrote: > This was reported by customer so i am not very confident they would have CONFIG_PROVE_LOCKING enabled in > their setup. I’ll enable this on our setup and we can try to reproduce this issue. > > Let me know if you want me to drop this patch until we get more details. Hello Himanshu, What I think is that at least an explanation should be provided about why it is safe to leave out the spin_lock() and spin_unlock() calls. What does the hardware lock protect and why is it safe to leave it out from qla2x00_els_dcmd_iocb_timeout()? Thanks, Bart.
Hi Bart, On Nov 28, 2017, at 2:27 PM, Bart Van Assche <bart.vanassche@wdc.com> wrote: > > On Tue, 2017-11-28 at 22:22 +0000, Madhani, Himanshu wrote: >> This was reported by customer so i am not very confident they would have CONFIG_PROVE_LOCKING enabled in >> their setup. I’ll enable this on our setup and we can try to reproduce this issue. >> >> Let me know if you want me to drop this patch until we get more details. > > Hello Himanshu, > > What I think is that at least an explanation should be provided about why it > is safe to leave out the spin_lock() and spin_unlock() calls. What does the > hardware lock protect and why is it safe to leave it out from > qla2x00_els_dcmd_iocb_timeout()? > Okay will update patch with the description and submit v2 of this patch. > Thanks, > > Bart. Thanks, - Himanshu
diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c index d810a447cb4a..106f4ac4f733 100644 --- a/drivers/scsi/qla2xxx/qla_iocb.c +++ b/drivers/scsi/qla2xxx/qla_iocb.c @@ -2394,7 +2394,6 @@ qla2x00_els_dcmd_iocb_timeout(void *data) struct scsi_qla_host *vha = sp->vha; struct qla_hw_data *ha = vha->hw; struct srb_iocb *lio = &sp->u.iocb_cmd; - unsigned long flags = 0; ql_dbg(ql_dbg_io, vha, 0x3069, "%s Timeout, hdl=%x, portid=%02x%02x%02x\n", @@ -2402,7 +2401,6 @@ qla2x00_els_dcmd_iocb_timeout(void *data) fcport->d_id.b.al_pa); /* Abort the exchange */ - spin_lock_irqsave(&ha->hardware_lock, flags); if (ha->isp_ops->abort_command(sp)) { ql_dbg(ql_dbg_io, vha, 0x3070, "mbx abort_command failed.\n"); @@ -2410,7 +2408,6 @@ qla2x00_els_dcmd_iocb_timeout(void *data) ql_dbg(ql_dbg_io, vha, 0x3071, "mbx abort_command success.\n"); } - spin_unlock_irqrestore(&ha->hardware_lock, flags); complete(&lio->u.els_logo.comp); }