Message ID | 1522402644-3016-4-git-send-email-chaitra.basappa@broadcom.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
On Fri, Mar 30, 2018 at 03:07:12PM +0530, Chaitra P B wrote: > Check scsi tracker for NULL before accessing it. > And in some places there are possibilities for getting valid st > but still other fields are not set. Please explain how that could ever happen. You should never see a scsi_cmnd without the device pointer.
On Fri, 2018-03-30 at 15:07 +0530, Chaitra P B wrote: > diff --git a/drivers/scsi/mpt3sas/mpt3sas_ctl.c b/drivers/scsi/mpt3sas/mpt3sas_ctl.c > index c1b17d6..2f27d5c 100644 > --- a/drivers/scsi/mpt3sas/mpt3sas_ctl.c > +++ b/drivers/scsi/mpt3sas/mpt3sas_ctl.c > @@ -590,7 +590,8 @@ _ctl_set_task_mid(struct MPT3SAS_ADAPTER *ioc, struct mpt3_ioctl_command *karg, > struct scsiio_tracker *st; > > scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid); > - if (!scmd) > + if (scmd == NULL || scmd->device == NULL || > + scmd->device->hostdata == NULL) > continue; > if (lun != scmd->device->lun) > continue; Is _ctl_set_task_mid() always called from the I/O completion path? As Christoph already wrote, these checks do not make sense in the completion path. > @@ -600,6 +601,8 @@ _ctl_set_task_mid(struct MPT3SAS_ADAPTER *ioc, struct mpt3_ioctl_command *karg, > if (priv_data->sas_target->handle != handle) > continue; > st = scsi_cmd_priv(scmd); > + if ((!st) || (st->smid == 0)) > + continue; > tm_request->TaskMID = cpu_to_le16(st->smid); > found = 1; > } Since the I/O submission path guarantees that st->smid != 0, how could st->smid ever be zero in the I/O completion path? > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > index c9cce65..6b1aaa0 100644 > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > @@ -1465,7 +1465,7 @@ mpt3sas_scsih_scsi_lookup_get(struct MPT3SAS_ADAPTER *ioc, u16 smid) > scmd = scsi_host_find_tag(ioc->shost, unique_tag); > if (scmd) { > st = scsi_cmd_priv(scmd); > - if (st->cb_idx == 0xFF) > + if ((!st) || (st->cb_idx == 0xFF) || (st->smid == 0)) > scmd = NULL; > } > } > @@ -4451,6 +4451,13 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc) > count++; > _scsih_set_satl_pending(scmd, false); > st = scsi_cmd_priv(scmd); > + /* > + * It may be possible that SCSI scmd got prepared by SML > + * but it has not issued to the driver, for these type of > + * scmd's don't do anything" > + */ > + if (st && st->smid == 0) > + continue; This seems wrong to me. If a SCSI command has not been submitted to the firmware skipping it in this function will introduce a delay because the command will only be completed after it has timed out and after the SCSI error handler has finished its processing. I think it's better to complete the command from this function instead of waiting until for the SCSI error handler. Bart.
On Fri, Mar 30, 2018 at 5:29 PM, Christoph Hellwig <hch@infradead.org> wrote: > On Fri, Mar 30, 2018 at 03:07:12PM +0530, Chaitra P B wrote: >> Check scsi tracker for NULL before accessing it. >> And in some places there are possibilities for getting valid st >> but still other fields are not set. > > Please explain how that could ever happen. You should never see > a scsi_cmnd without the device pointer. Chris, Here is one example, During Host reset operation time driver will flush out all the outstanding IO's to the SML in below function, static void _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc) { struct scsi_cmnd *scmd; struct scsiio_tracker *st; u16 smid; int count = 0; [SR] Here driver is looping starting from smid one to HBA queue depth. for (smid = 1; smid <= ioc->scsiio_depth; smid++) { [SR] Some times it is possible that scsi_cmnd might have created at SML but it might not be issued to the driver as host reset operation is going on, So here we will get non-NULL scmd. scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid); if (!scmd) continue; count++; _scsih_set_satl_pending(scmd, false); [SR] Here we are trying to get the scsi tracker 'st' for the above scmd (which is not received by the driver) and so fields of this 'st' are uninitialized. st = scsi_cmd_priv(scmd); [SR] And here we are trying to clear the scsi tracker 'st' which is not yet all initialized by the driver, in other terms we are trying to flush out the scmd command which is not yet all received by the driver. mpt3sas_base_clear_st(ioc, st); scsi_dma_unmap(scmd); if (ioc->pci_error_recovery || ioc->remove_host) scmd->result = DID_NO_CONNECT << 16; else scmd->result = DID_RESET << 16; scmd->scsi_done(scmd); } dtmprintk(ioc, pr_info(MPT3SAS_FMT "completing %d cmds\n", ioc->name, count)); } Thanks, Sreekanth
On Mon, 2018-04-02 at 11:53 +0530, Sreekanth Reddy wrote: > On Fri, Mar 30, 2018 at 5:29 PM, Christoph Hellwig <hch@infradead.org> wrote: > > On Fri, Mar 30, 2018 at 03:07:12PM +0530, Chaitra P B wrote: > > > Check scsi tracker for NULL before accessing it. > > > And in some places there are possibilities for getting valid st > > > but still other fields are not set. > > > > Please explain how that could ever happen. You should never see > > a scsi_cmnd without the device pointer. > > Chris, > > Here is one example, During Host reset operation time driver will > flush out all the outstanding IO's to the SML in below function, > > static void > _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc) > { > struct scsi_cmnd *scmd; > struct scsiio_tracker *st; > u16 smid; > int count = 0; > > [SR] Here driver is looping starting from smid one to HBA queue depth. > for (smid = 1; smid <= ioc->scsiio_depth; smid++) { > > [SR] Some times it is possible that scsi_cmnd might have created at > SML but it might not be issued to the driver as host reset operation > is going on, So here we will get non-NULL scmd. > scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid); > if (!scmd) > continue; > count++; > _scsih_set_satl_pending(scmd, false); > [SR] Here we are trying to get the scsi tracker 'st' for the above > scmd (which is not received by the driver) and so fields of this 'st' > are uninitialized. > st = scsi_cmd_priv(scmd); > [SR] And here we are trying to clear the scsi tracker 'st' which is > not yet all initialized by the driver, in other terms we are trying to > flush out the scmd command which is not yet all received by the > driver. > mpt3sas_base_clear_st(ioc, st); > scsi_dma_unmap(scmd); > if (ioc->pci_error_recovery || ioc->remove_host) > scmd->result = DID_NO_CONNECT << 16; > else > scmd->result = DID_RESET << 16; > scmd->scsi_done(scmd); > } > dtmprintk(ioc, pr_info(MPT3SAS_FMT "completing %d cmds\n", > ioc->name, count)); > } Hello Sreekanth, From mpt3sas_scsih_scsi_lookup_get(): st = scsi_cmd_priv(scmd); if (st->cb_idx == 0xFF) scmd = NULL; From mpt3sas_base_get_smid(), mpt3sas_base_get_smid_scsiio() and mpt3sas_base_get_smid_hpr(): request->cb_idx = cb_idx; Can _scsih_flush_running_cmds() be executed concurrently with scsih_qcmd()? In other words, can it happen that mpt3sas_scsih_scsi_lookup_get() sees that st->cb_idx == 0xff just before scsih_qcmd() assigns a value to .cb_idx? Can this cause _scsih_flush_running_cmds() to skip commands that it shouldn't skip? Can it happen that _scsih_flush_running_cmds() calls .scsi_done() for a SCSI command just after scsih_qcmd() transferred control for that command to the firmware? Can that cause .scsi_done() to be called twice for the same command? Thanks, Bart.
On Mon, Apr 2, 2018 at 8:55 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Mon, 2018-04-02 at 11:53 +0530, Sreekanth Reddy wrote: >> On Fri, Mar 30, 2018 at 5:29 PM, Christoph Hellwig <hch@infradead.org> wrote: >> > On Fri, Mar 30, 2018 at 03:07:12PM +0530, Chaitra P B wrote: >> > > Check scsi tracker for NULL before accessing it. >> > > And in some places there are possibilities for getting valid st >> > > but still other fields are not set. >> > >> > Please explain how that could ever happen. You should never see >> > a scsi_cmnd without the device pointer. >> >> Chris, >> >> Here is one example, During Host reset operation time driver will >> flush out all the outstanding IO's to the SML in below function, >> >> static void >> _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc) >> { >> struct scsi_cmnd *scmd; >> struct scsiio_tracker *st; >> u16 smid; >> int count = 0; >> >> [SR] Here driver is looping starting from smid one to HBA queue depth. >> for (smid = 1; smid <= ioc->scsiio_depth; smid++) { >> >> [SR] Some times it is possible that scsi_cmnd might have created at >> SML but it might not be issued to the driver as host reset operation >> is going on, So here we will get non-NULL scmd. >> scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid); >> if (!scmd) >> continue; >> count++; >> _scsih_set_satl_pending(scmd, false); >> [SR] Here we are trying to get the scsi tracker 'st' for the above >> scmd (which is not received by the driver) and so fields of this 'st' >> are uninitialized. >> st = scsi_cmd_priv(scmd); >> [SR] And here we are trying to clear the scsi tracker 'st' which is >> not yet all initialized by the driver, in other terms we are trying to >> flush out the scmd command which is not yet all received by the >> driver. >> mpt3sas_base_clear_st(ioc, st); >> scsi_dma_unmap(scmd); >> if (ioc->pci_error_recovery || ioc->remove_host) >> scmd->result = DID_NO_CONNECT << 16; >> else >> scmd->result = DID_RESET << 16; >> scmd->scsi_done(scmd); >> } >> dtmprintk(ioc, pr_info(MPT3SAS_FMT "completing %d cmds\n", >> ioc->name, count)); >> } > > Hello Sreekanth, > > From mpt3sas_scsih_scsi_lookup_get(): > > st = scsi_cmd_priv(scmd); > if (st->cb_idx == 0xFF) > scmd = NULL; > > From mpt3sas_base_get_smid(), mpt3sas_base_get_smid_scsiio() and > mpt3sas_base_get_smid_hpr(): > > request->cb_idx = cb_idx; > > Can _scsih_flush_running_cmds() be executed concurrently with scsih_qcmd()? [SR] No, driver calls _scsih_flush_running_cmds during Host reset time and during host reset time driver will set 'ioc->shost_recovery' flag. So in the scsih_qcmd() driver will return the incoming SCSI cmds with "SCSI_MLQUEUE_HOST_BUSY" whenever 'ioc->shost_recovery' flag is set as shown below, /* host recovery or link resets sent via IOCTLs */ if (ioc->shost_recovery || ioc->ioc_link_reset_in_progress) return SCSI_MLQUEUE_HOST_BUSY; > In other words, can it happen that mpt3sas_scsih_scsi_lookup_get() sees that > st->cb_idx == 0xff just before scsih_qcmd() assigns a value to .cb_idx? Can > this cause _scsih_flush_running_cmds() to skip commands that it shouldn't > skip? [SR] No, it won't happen. as I explained above during host reset time driver return the incoming SCSI commands with host busy status and _scsih_flush_running_cmds called during the host reset time. > > Can it happen that _scsih_flush_running_cmds() calls .scsi_done() for a SCSI > command just after scsih_qcmd() transferred control for that command to the > firmware? Can that cause .scsi_done() to be called twice for the same command? [SR] No, while _scsih_flush_running_cmds() function is getting executed no SCSI commands are issued to the firmware. > > Thanks, > > Bart. > >
On Tue, 2018-04-03 at 10:11 +0530, Sreekanth Reddy wrote: > [SR] No, driver calls _scsih_flush_running_cmds during Host reset time > and during host reset time driver will set 'ioc->shost_recovery' flag. > So in the scsih_qcmd() driver will return the incoming SCSI cmds with > "SCSI_MLQUEUE_HOST_BUSY" whenever 'ioc->shost_recovery' flag is set as > shown below, > > /* host recovery or link resets sent via IOCTLs */ > if (ioc->shost_recovery || ioc->ioc_link_reset_in_progress) > return SCSI_MLQUEUE_HOST_BUSY; The ioc->shost_recovery flag is involved in at least the following race: * From one context a SCSI command is submitted and scsih_qcmd() gets called. * At the same time sg_reset is invoked from a shell and triggers a call to scsih_host_reset(). That function in turn will call mpt3sas_base_hard_reset_handler(). I think this scenario can cause ioc->shost_recovery to be set by the mpt3sas driver after it has been checked by the scsih_qcmd() function. Anyway, let's get back to patch 03/15 at the start of this e-mail thread. That patch looks to me like an incomplete attempt to work around a race condition in the mpt3sas driver. I don't expect that anyone will trust that patch without further explanation. Which race condition does that patch address? And why do the mpt3sas maintainers believe that this patch is sufficient to address that race condition? Thanks, Bart.
On Tue, Apr 3, 2018 at 9:26 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Tue, 2018-04-03 at 10:11 +0530, Sreekanth Reddy wrote: >> [SR] No, driver calls _scsih_flush_running_cmds during Host reset time >> and during host reset time driver will set 'ioc->shost_recovery' flag. >> So in the scsih_qcmd() driver will return the incoming SCSI cmds with >> "SCSI_MLQUEUE_HOST_BUSY" whenever 'ioc->shost_recovery' flag is set as >> shown below, >> >> /* host recovery or link resets sent via IOCTLs */ >> if (ioc->shost_recovery || ioc->ioc_link_reset_in_progress) >> return SCSI_MLQUEUE_HOST_BUSY; > > The ioc->shost_recovery flag is involved in at least the following race: > * From one context a SCSI command is submitted and scsih_qcmd() gets called. > * At the same time sg_reset is invoked from a shell and triggers a call to > scsih_host_reset(). That function in turn will call > mpt3sas_base_hard_reset_handler(). > > I think this scenario can cause ioc->shost_recovery to be set by the mpt3sas > driver after it has been checked by the scsih_qcmd() function. Then these outstanding commands will be get flush by the driver in _scsih_flush_running_cmds() function which we call as a part of host reset operation. > > Anyway, let's get back to patch 03/15 at the start of this e-mail thread. > That patch looks to me like an incomplete attempt to work around a race > condition in the mpt3sas driver. I don't expect that anyone will trust that > patch without further explanation. Which race condition does that patch > address? And why do the mpt3sas maintainers believe that this patch is > sufficient to address that race condition? Sure Bart, we will add proper description with the information which I explained in this mail thread on how this patch will fix below issue, During Host reset operation time driver will flush out all the outstanding IO's to the SML in below function, static void _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc) { struct scsi_cmnd *scmd; struct scsiio_tracker *st; u16 smid; int count = 0; [SR] Here driver is looping starting from smid one to HBA queue depth. for (smid = 1; smid <= ioc->scsiio_depth; smid++) { [SR] Some times it is possible that scsi_cmnd might have created at SML but it might not be issued to the driver as host reset operation is going on, So here we will get non-NULL scmd. scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid); if (!scmd) continue; count++; _scsih_set_satl_pending(scmd, false); [SR] Here we are trying to get the scsi tracker 'st' for the above scmd (which is not received by the driver) and so fields of this 'st' are uninitialized. st = scsi_cmd_priv(scmd); [SR] And here we are trying to clear the scsi tracker 'st' which is not yet all initialized by the driver, in other terms we are trying to flush out the scmd command which is not yet all received by the driver. mpt3sas_base_clear_st(ioc, st); scsi_dma_unmap(scmd); if (ioc->pci_error_recovery || ioc->remove_host) scmd->result = DID_NO_CONNECT << 16; else scmd->result = DID_RESET << 16; scmd->scsi_done(scmd); } dtmprintk(ioc, pr_info(MPT3SAS_FMT "completing %d cmds\n", ioc->name, count)); } > > Thanks, > > Bart. >
diff --git a/drivers/scsi/mpt3sas/mpt3sas_ctl.c b/drivers/scsi/mpt3sas/mpt3sas_ctl.c index c1b17d6..2f27d5c 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_ctl.c +++ b/drivers/scsi/mpt3sas/mpt3sas_ctl.c @@ -590,7 +590,8 @@ _ctl_set_task_mid(struct MPT3SAS_ADAPTER *ioc, struct mpt3_ioctl_command *karg, struct scsiio_tracker *st; scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid); - if (!scmd) + if (scmd == NULL || scmd->device == NULL || + scmd->device->hostdata == NULL) continue; if (lun != scmd->device->lun) continue; @@ -600,6 +601,8 @@ _ctl_set_task_mid(struct MPT3SAS_ADAPTER *ioc, struct mpt3_ioctl_command *karg, if (priv_data->sas_target->handle != handle) continue; st = scsi_cmd_priv(scmd); + if ((!st) || (st->smid == 0)) + continue; tm_request->TaskMID = cpu_to_le16(st->smid); found = 1; } diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index c9cce65..6b1aaa0 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -1465,7 +1465,7 @@ mpt3sas_scsih_scsi_lookup_get(struct MPT3SAS_ADAPTER *ioc, u16 smid) scmd = scsi_host_find_tag(ioc->shost, unique_tag); if (scmd) { st = scsi_cmd_priv(scmd); - if (st->cb_idx == 0xFF) + if ((!st) || (st->cb_idx == 0xFF) || (st->smid == 0)) scmd = NULL; } } @@ -4451,6 +4451,13 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc) count++; _scsih_set_satl_pending(scmd, false); st = scsi_cmd_priv(scmd); + /* + * It may be possible that SCSI scmd got prepared by SML + * but it has not issued to the driver, for these type of + * scmd's don't do anything" + */ + if (st && st->smid == 0) + continue; mpt3sas_base_clear_st(ioc, st); scsi_dma_unmap(scmd); if (ioc->pci_error_recovery)