Message ID | 20191129202627.19624-2-martin.wilck@suse.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | [1/2] scsi: qla2xxx: avoid sending mailbox commands if firmware is stopped | expand |
On Fri, Nov 29, 2019 at 08:26:36PM +0000, Martin Wilck wrote: > From: Martin Wilck <mwilck@suse.com> > > Since 45235022da99, the firmware is shut down early in the controller > shutdown process. This causes commands sent to the firmware (such as LOGO) > to hang forever. Eventually one or more timeouts will be triggered. > Move the stopping of the firmware until after sessions have terminated. > > Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip") > Signed-off-by: Martin Wilck <mwilck@suse.com> > --- > drivers/scsi/qla2xxx/qla_os.c | 21 ++++++++++----------- > 1 file changed, 10 insertions(+), 11 deletions(-) > > diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c > index 43d0aa0..0cc127d 100644 > --- a/drivers/scsi/qla2xxx/qla_os.c > +++ b/drivers/scsi/qla2xxx/qla_os.c > @@ -3710,6 +3710,16 @@ qla2x00_remove_one(struct pci_dev *pdev) > } > qla2x00_wait_for_hba_ready(base_vha); > > + qla2x00_wait_for_sess_deletion(base_vha); > + > + /* > + * if UNLOAD flag is already set, then continue unload, > + * where it was set first. > + */ > + if (test_bit(UNLOADING, &base_vha->dpc_flags)) > + return; > + > + set_bit(UNLOADING, &base_vha->dpc_flags); Hi Martin, Moving qla2x00_wait_for_sess_deletion up ensures hw->wq is flushed before shutdown is done. But I think we need to set UNLOADING bit earlier to break-up async discovery chain. The comment just above qla2x00_wait_for_sess_deletion definition mentions that. Thanks, Roman
On 11/29/19 9:26 PM, Martin Wilck wrote: > From: Martin Wilck <mwilck@suse.com> > > Since 45235022da99, the firmware is shut down early in the controller > shutdown process. This causes commands sent to the firmware (such as LOGO) > to hang forever. Eventually one or more timeouts will be triggered. > Move the stopping of the firmware until after sessions have terminated. > > Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip") > Signed-off-by: Martin Wilck <mwilck@suse.com> > --- > drivers/scsi/qla2xxx/qla_os.c | 21 ++++++++++----------- > 1 file changed, 10 insertions(+), 11 deletions(-) > > diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c > index 43d0aa0..0cc127d 100644 > --- a/drivers/scsi/qla2xxx/qla_os.c > +++ b/drivers/scsi/qla2xxx/qla_os.c > @@ -3710,6 +3710,16 @@ qla2x00_remove_one(struct pci_dev *pdev) > } > qla2x00_wait_for_hba_ready(base_vha); > > + qla2x00_wait_for_sess_deletion(base_vha); > + > + /* > + * if UNLOAD flag is already set, then continue unload, > + * where it was set first. > + */ > + if (test_bit(UNLOADING, &base_vha->dpc_flags)) > + return; > + > + set_bit(UNLOADING, &base_vha->dpc_flags); > if (IS_QLA25XX(ha) || IS_QLA2031(ha) || IS_QLA27XX(ha) || > IS_QLA28XX(ha)) { > if (ha->flags.fw_started) test_and_set_bit(), maybe? Cheers, Hannes
On 11/29/19 3:26 PM, Martin Wilck wrote: > From: Martin Wilck <mwilck@suse.com> > > Since 45235022da99, the firmware is shut down early in the controller > shutdown process. This causes commands sent to the firmware (such as LOGO) > to hang forever. Eventually one or more timeouts will be triggered. > Move the stopping of the firmware until after sessions have terminated. > > Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip") > Signed-off-by: Martin Wilck <mwilck@suse.com> > --- > drivers/scsi/qla2xxx/qla_os.c | 21 ++++++++++----------- > 1 file changed, 10 insertions(+), 11 deletions(-) > > diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c > index 43d0aa0..0cc127d 100644 > --- a/drivers/scsi/qla2xxx/qla_os.c > +++ b/drivers/scsi/qla2xxx/qla_os.c > @@ -3710,6 +3710,16 @@ qla2x00_remove_one(struct pci_dev *pdev) > } > qla2x00_wait_for_hba_ready(base_vha); > > + qla2x00_wait_for_sess_deletion(base_vha); > + > + /* > + * if UNLOAD flag is already set, then continue unload, > + * where it was set first. > + */ > + if (test_bit(UNLOADING, &base_vha->dpc_flags)) > + return; > + > + set_bit(UNLOADING, &base_vha->dpc_flags); > if (IS_QLA25XX(ha) || IS_QLA2031(ha) || IS_QLA27XX(ha) || > IS_QLA28XX(ha)) { > if (ha->flags.fw_started) > @@ -3726,17 +3736,6 @@ qla2x00_remove_one(struct pci_dev *pdev) > qla2x00_try_to_stop_firmware(base_vha); > } > > - qla2x00_wait_for_sess_deletion(base_vha); > - > - /* > - * if UNLOAD flag is already set, then continue unload, > - * where it was set first. > - */ > - if (test_bit(UNLOADING, &base_vha->dpc_flags)) > - return; > - > - set_bit(UNLOADING, &base_vha->dpc_flags); > - > qla_nvme_delete(base_vha); > > dma_free_coherent(&ha->pdev->dev, Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Hello Roman, sorry for the late reply. I had temporary issues with email from linux- scsi, and then was busy with other stuff. On Sat, 2019-11-30 at 13:23 +0300, Roman Bolshakov wrote: > On Fri, Nov 29, 2019 at 08:26:36PM +0000, Martin Wilck wrote: > > From: Martin Wilck <mwilck@suse.com> > > > > Since 45235022da99, the firmware is shut down early in the > > controller > > shutdown process. This causes commands sent to the firmware (such > > as LOGO) > > to hang forever. Eventually one or more timeouts will be triggered. > > Move the stopping of the firmware until after sessions have > > terminated. > > > > Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting > > down chip") > > Signed-off-by: Martin Wilck <mwilck@suse.com> > > --- > > drivers/scsi/qla2xxx/qla_os.c | 21 ++++++++++----------- > > 1 file changed, 10 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/scsi/qla2xxx/qla_os.c > > b/drivers/scsi/qla2xxx/qla_os.c > > index 43d0aa0..0cc127d 100644 > > --- a/drivers/scsi/qla2xxx/qla_os.c > > +++ b/drivers/scsi/qla2xxx/qla_os.c > > @@ -3710,6 +3710,16 @@ qla2x00_remove_one(struct pci_dev *pdev) > > } > > qla2x00_wait_for_hba_ready(base_vha); > > > > + qla2x00_wait_for_sess_deletion(base_vha); > > + > > + /* > > + * if UNLOAD flag is already set, then continue unload, > > + * where it was set first. > > + */ > > + if (test_bit(UNLOADING, &base_vha->dpc_flags)) > > + return; > > + > > + set_bit(UNLOADING, &base_vha->dpc_flags); > > Hi Martin, > > Moving qla2x00_wait_for_sess_deletion up ensures hw->wq is flushed > before shutdown is done. > > But I think we need to set UNLOADING bit earlier to break-up async > discovery chain. The comment just above > qla2x00_wait_for_sess_deletion > definition mentions that. I was unsure about this, because fc_terminate_rport_io() will not send LOGO any more if UNLOADING is set, which I thought might cause issues in the SAN. But I suppose it's ok, because after setting flag UNLOADING we will roughly proceed as follows, sending LOGO if required: qla2x00_wait_for_sess_deletion() qla2x00_mark_all_devices_lost() qlt_schedule_sess_for_deletion() -> set off sess->del_work qla24xx_delete_sess_fn() (from sess->del_work) qlt_unreg_sess -> set off sess->free_work qlt_free_session_done (from sess->free_work) This will send LOGO if sess->logout_on_delete is set. So, I'll add this to the series, plus Hannes' suggestion to use test_and_set_bit(). Thanks, Martin PS: the qla2xxx driver has multiple flags and tests for "can I access the controller now?" with (at least for my mind) blurry semantics and unclear mutual dependencies: - dpc_flags: UNLOADING - hw->flags.fw_started - vha->pci_flags.PFLG_DRIVER_REMOVING - qla2x00_chip_is_down() (tests fw_started and the dpc_flags ISP_ABORT_NEEDED, ABORT_ISP_ACTIVE, ISP_ABORT_RETRY) I've tried to review how these different flags are used - I don't see obvious rules as for which flag is tested in what situation. Anyway, as argued above, I think for the issue at hand using UNLOADING the way you suggested is correct.
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index 43d0aa0..0cc127d 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -3710,6 +3710,16 @@ qla2x00_remove_one(struct pci_dev *pdev) } qla2x00_wait_for_hba_ready(base_vha); + qla2x00_wait_for_sess_deletion(base_vha); + + /* + * if UNLOAD flag is already set, then continue unload, + * where it was set first. + */ + if (test_bit(UNLOADING, &base_vha->dpc_flags)) + return; + + set_bit(UNLOADING, &base_vha->dpc_flags); if (IS_QLA25XX(ha) || IS_QLA2031(ha) || IS_QLA27XX(ha) || IS_QLA28XX(ha)) { if (ha->flags.fw_started) @@ -3726,17 +3736,6 @@ qla2x00_remove_one(struct pci_dev *pdev) qla2x00_try_to_stop_firmware(base_vha); } - qla2x00_wait_for_sess_deletion(base_vha); - - /* - * if UNLOAD flag is already set, then continue unload, - * where it was set first. - */ - if (test_bit(UNLOADING, &base_vha->dpc_flags)) - return; - - set_bit(UNLOADING, &base_vha->dpc_flags); - qla_nvme_delete(base_vha); dma_free_coherent(&ha->pdev->dev,