| Message ID | 1528971921-15216-1-git-send-email-chaitra.basappa@broadcom.com (mailing list archive) |
|---|---|
| State | Changes Requested |
On Thu, 2018-06-14 at 06:25 -0400, Chaitra P B wrote:
> As a part of the host reset operation, the driver flushes out all I/Os
> outstanding at the driver level with the "DID_RESET" result.
> To find which commands are outstanding at the driver level, the driver
> loops over smid from one to the HBA queue depth and calls
> mpt3sas_scsih_scsi_lookup_get() to get the scmd, as shown below:
>
> 	for (smid = 1; smid <= ioc->scsiio_depth; smid++) {
> 		scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
> 		if (!scmd)
> 			continue;
>
> But in the mpt3sas_scsih_scsi_lookup_get() function, the driver returns
> some SCSI commands which are not outstanding at the driver level (possibly
> the request was constructed at the block layer, since QUEUE_FLAG_QUIESCED
> is not set; even if the driver uses scsi_block_requests() and
> scsi_unblock_requests(), the issue still persists, as they only block
> further I/O from the SCSI layer, not from the block layer), and these
> commands are flushed with the DID_RESET host byte, thus resulting in the
> above kernel BUG.

Have you considered calling scsi_target_block() before flushing out pending
I/O and calling scsi_target_unblock() after flushing out pending I/O has
finished? I think that would be a much cleaner solution than the proposed
patch. The effect of scsi_target_block() is explained above
scsi_internal_device_block():

/**
 * scsi_internal_device_block - try to transition to the SDEV_BLOCK state
 * @sdev: device to block
 *
 * Pause SCSI command processing on the specified device and wait until all
 * ongoing scsi_request_fn() / scsi_queue_rq() calls have finished. May sleep.
 *
 * Returns zero if successful or a negative error code upon failure.
 *
 * Note:
 * This routine transitions the device to the SDEV_BLOCK state (which must be
 * a legal transition). When the device is in this state, command processing
 * is paused until the device leaves the SDEV_BLOCK state. See also
 * scsi_internal_device_unblock().
 *
 * To do: avoid that scsi_send_eh_cmnd() calls queuecommand() after
 * scsi_internal_device_block() has blocked a SCSI device and also
 * remove the rport mutex lock and unlock calls from srp_queuecommand().
 */

Thanks,

Bart.
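For illustration, a minimal sketch of what this suggestion could look like
in the reset path (an assumption for discussion, not the driver's actual
code; scsi_target_block() walks the children of the device it is given and
blocks every SCSI target it finds there):

	/* Hypothetical sketch: block all targets of the host before the
	 * DID_RESET flush, then unblock them. SCSI targets are children of
	 * shost_gendev, so passing it blocks every target on the host.
	 * _scsih_flush_running_cmds() is the driver's existing flush routine. */
	static void _scsih_flush_with_targets_blocked(struct MPT3SAS_ADAPTER *ioc)
	{
		struct Scsi_Host *shost = ioc->shost;

		scsi_target_block(&shost->shost_gendev);
		_scsih_flush_running_cmds(ioc);	/* completes I/O with DID_RESET */
		scsi_target_unblock(&shost->shost_gendev, SDEV_RUNNING);
	}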
On Wed, 2018-06-20 at 09:18 +0530, Chaitra Basappa wrote:
> We have tried calling the scsi_internal_device_block_nowait() API before
> doing the IOC reset (i.e. host reset) and calling
> scsi_internal_device_unblock_nowait() after performing the IOC reset.
> We have tested this code change with various test cases, such as
> adding/removing target drives or expanders during diag reset with and
> without I/Os, and at a high level everything works, but we observe the
> below error messages while performing the hibernation operation:
>
> sd 1:0:0:0: device_block, handle(0x0028)
> BRCM Debug: sdev->sdev_state: 5 before device_block_nowait
> BRCM Debug: sdev->sdev_state: 5 after_device_block_nowait
> sd 1:0:0:0: device_block failed with return(-22) for handle(0x0028)
> .
> .
> sd 0:0:0:0: device_unblock and setting to running, handle(0x0028)
> sd 0:0:0:0: device_unblock failed with return(-22) for handle(0x0028)
> performing a block followed by an unblock
> sd 0:0:0:0: retried device_block failed with return(-22) for handle(0x0028)
> sd 0:0:0:0: retried device_unblock failed with return(-22) for
> handle(0x0028)
>
> We observe these messages at system resume time, during which the driver
> issues the IOC reset operation in the .resume() callback function.
> In the above error messages we see that the drives are in the SDEV_QUIESCE
> state. When drives are in the SDEV_QUIESCE state, moving them to the
> SDEV_BLOCK state is not allowed, and hence we observe the above error
> messages.
>
> The SDEV_QUIESCE state means the device is quiescent: no block commands
> will be accepted, only specials (which originate in the midlayer).

Neither scsi_internal_device_block_nowait() nor
scsi_internal_device_unblock_nowait() should ever have been changed from
static into exported functions. But that's another discussion. Regarding the
adverse interaction of scsi_internal_device_block_nowait() and
scsi_internal_device_unblock_nowait() with the power management code, have
you considered surrounding the code that blocks and unblocks SCSI devices
with lock_system_sleep() / unlock_system_sleep() to avoid that these
functions fail with error code -22?

Thanks,

Bart.
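A sketch of the shape this suggestion takes (the loop and its placement are
assumptions for illustration only; as the rest of the thread shows, the
interaction with hibernation turned out to be the sticking point):

	/* Illustrative sketch: keep a system sleep transition from starting
	 * while devices are blocked around the IOC reset. lock_system_sleep()
	 * acquires a mutex, so this must run in process context. */
	struct scsi_device *sdev;

	lock_system_sleep();

	shost_for_each_device(sdev, ioc->shost)
		scsi_internal_device_block_nowait(sdev);

	mpt3sas_base_hard_reset_handler(ioc, FORCE_BIG_HAMMER);

	shost_for_each_device(sdev, ioc->shost)
		scsi_internal_device_unblock_nowait(sdev, SDEV_RUNNING);

	unlock_system_sleep();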
On Wed, Jun 20, 2018 at 11:43 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Wed, 2018-06-20 at 09:18 +0530, Chaitra Basappa wrote:
>> [...]
>
> Neither scsi_internal_device_block_nowait() nor
> scsi_internal_device_unblock_nowait() should ever have been changed from
> static into exported functions. But that's another discussion. Regarding the
> adverse interaction of scsi_internal_device_block_nowait() and
> scsi_internal_device_unblock_nowait() with the power management code, have
> you considered surrounding the code that blocks and unblocks SCSI devices
> with lock_system_sleep() / unlock_system_sleep() to avoid that these
> functions fail with error code -22?

Bart, we tried using lock_system_sleep() before calling the IOC reset
operation in the .resume() callback function and unlock_system_sleep() after
the IOC reset. With this code change we see the system go into a hung state
during hibernation, and we just see the below messages:

[ 625.788598] PM: hibernation entry
Jun 21 05:37:33 localhost kernel: PM: hibernation entry
[ 627.428159] PM: Syncing filesystems ...
Jun 21 05:37:34 localhost kernel: PM: Syncing filesystems ...
[ 628.756119] PM: done.
[ 628.758707] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 628.768340] OOM killer disabled.
[ 628.772010] PM: Preallocating image memory... done (allocated 197704 pages)
[ 632.554470] PM: Allocated 790816 kbytes in 3.77 seconds (209.76 MB/s)
[ 632.561664] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
[ 632.572269] Suspending console(s) (use no_console_suspend to debug)

The fix we have posted looks simple and we don't see any side effects from
it. We have done complete regression testing on our fix and we don't see any
issue with it. So please consider the fix we have posted.

Thanks,
Sreekanth
On Thu, 2018-06-21 at 15:41 +0530, Sreekanth Reddy wrote:
> Bart, we tried using lock_system_sleep() before calling the IOC reset
> operation in the .resume() callback function and unlock_system_sleep()
> after the IOC reset. With this code change we see the system go into a
> hung state during hibernation, and we just see the below messages:
>
> [...]
>
> The fix we have posted looks simple and we don't see any side effects from
> it. We have done complete regression testing on our fix and we don't see
> any issue with it. So please consider the fix we have posted.

scsi_internal_device_block_nowait() can be called by the mpt3sas driver from
interrupt context. lock_system_sleep() locks a mutex and hence must not be
called from interrupt context. Does the above mean that (a) a call to
lock_system_sleep() was inserted in an interrupt handler and (b) that the
above test was performed with a kernel in which lockdep was disabled?

Bart.
On Thu, Jun 21, 2018 at 7:49 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Thu, 2018-06-21 at 15:41 +0530, Sreekanth Reddy wrote:
>> [...]
>
> scsi_internal_device_block_nowait() can be called by the mpt3sas driver
> from interrupt context. lock_system_sleep() locks a mutex and hence must
> not be called from interrupt context. Does the above mean that (a) a call
> to lock_system_sleep() was inserted in an interrupt handler and

No, lock_system_sleep() is not inserted in interrupt context. We have
inserted it in the .resume() callback function, just before issuing the IOC
reset.

> (b) that the above test was performed with a kernel in which lockdep was
> disabled?

No, lockdep is enabled in the kernel.

~Sreekanth
On 06/21/18 22:35, Sreekanth Reddy wrote:
> No, lock_system_sleep() is not inserted in interrupt context. We have
> inserted it in the .resume() callback function, just before issuing the
> IOC reset.

That's the wrong place to insert a lock_system_sleep() call. Please have a
look at drivers/scsi/scsi_transport_spi.c for an example of how to use that
function.

Thanks,

Bart.
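The pattern in spi_dv_device() in drivers/scsi/scsi_transport_spi.c looks
roughly like the following (a paraphrase, not a verbatim excerpt): the sleep
lock is taken from ordinary process context around the whole quiesce/resume
window, so a hibernation transition cannot begin while the device is
quiesced.

	/* Paraphrased from spi_dv_device(): domain validation and the power
	 * management code both quiesce the device, so the system-sleep lock
	 * is held across the entire window. */
	lock_system_sleep();
	/* ... */
	scsi_target_quiesce(starget);
	/* ... domain validation I/O ... */
	scsi_target_resume(starget);
	/* ... */
	unlock_system_sleep();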
On Fri, Jun 22, 2018 at 8:22 PM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> On 06/21/18 22:35, Sreekanth Reddy wrote:
>> No, lock_system_sleep() is not inserted in interrupt context. We have
>> inserted it in the .resume() callback function, just before issuing the
>> IOC reset.
>
> That's the wrong place to insert a lock_system_sleep() call. Please have a
> look at drivers/scsi/scsi_transport_spi.c for an example of how to use
> that function.

I am totally confused.

In the driver's .resume() callback function, the driver performs the IOC
reset operation. As per your suggestion, we tried using
scsi_internal_device_block_nowait() to block all the devices attached to the
HBA before going for the IOC reset operation. At system resume time the
drives are in the quiesce state, so calling
scsi_internal_device_block_nowait() returns with error status -22.

So you suggested that we try using the lock_system_sleep() API before
calling scsi_internal_device_block_nowait(), so that we don't get a -22
return status when we call the scsi_internal_device_block_nowait() API at
resume time. We tried the same and we see the system hang during
hibernation. Please correct me if I am wrong or if I have misunderstood your
suggestions.

I feel that the fix we have posted is a better fix than using the
scsi_internal_device_block_nowait() /
scsi_internal_device_unblock_nowait() APIs.

Thanks,
Sreekanth
On 06/22/18 09:38, Sreekanth Reddy wrote:
> In the driver's .resume() callback function, the driver performs the IOC
> reset operation. As per your suggestion, we tried using
> scsi_internal_device_block_nowait() to block all the devices attached to
> the HBA before going for the IOC reset operation. At system resume time
> the drives are in the quiesce state, so calling
> scsi_internal_device_block_nowait() returns with error status -22.
>
> So you suggested that we try using the lock_system_sleep() API before
> calling scsi_internal_device_block_nowait(), so that we don't get a -22
> return status when we call the scsi_internal_device_block_nowait() API at
> resume time. We tried the same and we see the system hang during
> hibernation. Please correct me if I am wrong or if I have misunderstood
> your suggestions.
>
> I feel that the fix we have posted is a better fix than using the
> scsi_internal_device_block_nowait() /
> scsi_internal_device_unblock_nowait() APIs.

Hello Sreekanth,

It seems like there is a misunderstanding: what I proposed was to use
lock_system_sleep() to delay hibernation until unlock_system_sleep() has
been called. I had not realized that the mpt3sas driver can call
scsi_internal_device_block_nowait() during system resume.

There is another issue with the scsi_internal_device_block_nowait() calls by
the mpt3sas driver: callers like _scsih_sas_broadcast_primitive_event() seem
to assume that all scsih_qcmd() calls have finished as soon as
scsi_internal_device_block_nowait() has returned. However, that assumption
is not correct, especially when using scsi-mq.

If the patch "mpt3sas: Fix calltrace observed while running IO & host reset"
is allowed upstream, will the issues mentioned above ever be addressed?

Bart.
On Sat, Jun 23, 2018 at 2:56 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> On 06/22/18 09:38, Sreekanth Reddy wrote:
>> [...]
>
> Hello Sreekanth,
>
> It seems like there is a misunderstanding: what I proposed was to use
> lock_system_sleep() to delay hibernation until unlock_system_sleep() has
> been called. I had not realized that the mpt3sas driver can call
> scsi_internal_device_block_nowait() during system resume.
>
> There is another issue with the scsi_internal_device_block_nowait() calls
> by the mpt3sas driver: callers like _scsih_sas_broadcast_primitive_event()
> seem to assume that all scsih_qcmd() calls have finished as soon as
> scsi_internal_device_block_nowait() has returned. However, that assumption
> is not correct, especially when using scsi-mq.

Hello Bart,

Before calling the scsi_internal_device_block_nowait() API, the driver sets
the sas_device_priv_data->block flag to one. In scsih_qcmd() the driver
checks this flag, as shown below, and returns commands with a host busy
status:

	} else if (sas_target_priv_data->tm_busy ||
	    sas_device_priv_data->block)
		return SCSI_MLQUEUE_DEVICE_BUSY;

Also, as a part of processing the broadcast primitive event, the driver
issues a task query TM to determine where each I/O command is, and only
aborts those I/Os (by issuing a task abort TM) which are queued in the
target devices.

Hope I have clarified the issue which you have pointed out.

> If the patch "mpt3sas: Fix calltrace observed while running IO & host
> reset" is allowed upstream, will the issues mentioned above ever be
> addressed?

If there are any issues, we are always available to address them.

Thanks,
Sreekanth
On 06/24/18 23:10, Sreekanth Reddy wrote:
> Before calling the scsi_internal_device_block_nowait() API, the driver
> sets the sas_device_priv_data->block flag to one. In scsih_qcmd() the
> driver checks this flag, as shown below, and returns commands with a host
> busy status:
>
> 	} else if (sas_target_priv_data->tm_busy ||
> 	    sas_device_priv_data->block)
> 		return SCSI_MLQUEUE_DEVICE_BUSY;

That's exactly the kind of construct that should occur in the SCSI core or
block layer core and not in a SCSI LLD. Additionally, as explained before,
the construct you described above is racy.

Bart.
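To spell out the race being referred to (an assumed interleaving, for
illustration): scsi_internal_device_block_nowait() returns without waiting
for in-flight queuecommand calls, so a command that has already passed the
flag check can still reach the firmware after the caller believes I/O has
stopped.

	CPU 0 (event/reset path)              CPU 1 (scsih_qcmd())
	------------------------              ------------------------------------
	                                      reads sas_device_priv_data->block == 0
	sas_device_priv_data->block = 1
	scsi_internal_device_block_nowait()
	  returns without waiting for
	  in-flight queuecommand calls
	proceeds as if no I/O is in flight    issues the command to the firmware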
On Thu, Jun 28, 2018 at 3:48 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> On 06/24/18 23:10, Sreekanth Reddy wrote:
>> [...]
>
> That's exactly the kind of construct that should occur in the SCSI core or
> block layer core and not in a SCSI LLD. Additionally, as explained before,
> the construct you described above is racy.

Can you please elaborate in more detail on the race condition you have in
mind? If there is any race condition here, we are ready to work on it
separately, so please consider the fix we have posted.

Thanks,
Sreekanth
On Thu, Jun 28, 2018 at 10:56 AM, Sreekanth Reddy <sreekanth.reddy@broadcom.com> wrote:
> [...]
>
> Can you please elaborate in more detail on the race condition you have in
> mind? If there is any race condition here, we are ready to work on it
> separately, so please consider the fix we have posted.

Any update regarding this patch?

Thanks,
Sreekanth
On Mon, Jul 9, 2018 at 5:14 PM, Sreekanth Reddy <sreekanth.reddy@broadcom.com> wrote:
> [...]
>
> Any update regarding this patch?

Pinging once again! Please consider this patch fix. It is a very simple fix
for the issue reported and doesn't have any side effects.

Regards,
Sreekanth
On Tue, Jul 10, 2018 at 10:58 PM, Sreekanth Reddy <sreekanth.reddy@broadcom.com> wrote:
> [...]
>
> Pinging once again! Please consider this patch fix. It is a very simple
> fix for the issue reported and doesn't have any side effects.

Any update on the acceptance of this patch? This is a very simple fix for
the issue described in the patch header; please consider this patch. We will
also review the complete driver code path once again, and we are ready to
fix any race conditions found while reviewing in the coming days.

Thanks,
Sreekanth
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 668f350..49d92d0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3309,6 +3309,7 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc,
 		return;
 	st->cb_idx = 0xFF;
 	st->direct_io = 0;
 	atomic_set(&ioc->chain_lookup[st->smid - 1].chain_offset, 0);
+	st->smid = 0;
 }
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 23902ad..96e523a 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -1489,7 +1489,7 @@ struct scsi_cmnd *
 	scmd = scsi_host_find_tag(ioc->shost, unique_tag);
 	if (scmd) {
 		st = scsi_cmd_priv(scmd);
-		if (st->cb_idx == 0xFF)
+		if (st->cb_idx == 0xFF || st->smid == 0)
 			scmd = NULL;
 	}
 }
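For context, this is how the change interacts with the flush loop quoted at
the top of the thread (paraphrased; the DID_RESET completion details are
elided):

	/* Once mpt3sas_base_clear_st() zeroes st->smid, the lookup below
	 * returns NULL for trackers that are no longer outstanding at the
	 * driver level, so only genuinely outstanding commands are completed
	 * with DID_RESET during the host-reset flush. */
	for (smid = 1; smid <= ioc->scsiio_depth; smid++) {
		scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
		if (!scmd)
			continue;	/* free tracker or recycled request */
		/* ... complete scmd with DID_RESET ... */
	}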