Message ID | 20180110181817.25166-4-bart.vanassche@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jan 10, 2018 at 10:18:16AM -0800, Bart Van Assche wrote: > Several SCSI transport and LLD drivers surround code that does not > tolerate concurrent calls of .queuecommand() with scsi_target_block() / > scsi_target_unblock(). These last two functions use > blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request > queues to prevent concurrent .queuecommand() calls. However, that is Actually blk_mq_quiesce_queue() is supposed to disable and drain dispatch, not for prevent concurrent .queuecommand() calls. > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd(). Given it is error handling, do we need to prevent the .queuecommand() call in scsi_send_eh_cmnd()? Could you share us what the actual issue observed is from user view? > Hence surround the .queuecommand() call from the SCSI error handler with > blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced(). > > Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into > code that calls blk_get_request(), e.g. scsi_execute_req(), is not an > option since scsi_send_eh_cmnd() can be called if all requests are > allocated and if no requests will make progress without aborting any > of these requests. If we need to prevent the .queuecommand() in scsi_send_eh_cmnd(), what do you think of the approach by requeuing the EH command via scsi_mq_requeue_cmd()/scsi_requeue_cmd()? And the EH request will be dispatched finally when the queue becomes unquiesced or the STOPPED is cleared.
On Thu, 2018-01-11 at 10:23 +0800, Ming Lei wrote: > > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd(). > > Given it is error handling, do we need to prevent the .queuecommand() call > in scsi_send_eh_cmnd()? Could you share us what the actual issue > observed is from user view? Please have a look at the kernel bug report in the description of patch 4/4 of this series. > > Hence surround the .queuecommand() call from the SCSI error handler with > > blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced(). > > > > Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into > > code that calls blk_get_request(), e.g. scsi_execute_req(), is not an > > option since scsi_send_eh_cmnd() can be called if all requests are > > allocated and if no requests will make progress without aborting any > > of these requests. > > If we need to prevent the .queuecommand() in scsi_send_eh_cmnd(), what > do you think of the approach by requeuing the EH command via > scsi_mq_requeue_cmd()/scsi_requeue_cmd()? And the EH request will be > dispatched finally when the queue becomes unquiesced or the STOPPED > is cleared. Calling scsi_mq_requeue_cmd() would trigger scsi_mq_uninit_cmd(), blk_mq_put_driver_tag(), blk_mq_get_driver_tag() and scsi_mq_prep_fn(). That could cause scsi_eh_scmd_add() to be called multiple times for the same SCSI command. However, that would break one of the assumptions scsi_eh_scmd_add() is based on, namely that that function gets called only once per SCSI command that is in progress. Bart.
On Fri, Jan 12, 2018 at 10:45:57PM +0000, Bart Van Assche wrote: > On Thu, 2018-01-11 at 10:23 +0800, Ming Lei wrote: > > > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd(). > > > > Given it is error handling, do we need to prevent the .queuecommand() call > > in scsi_send_eh_cmnd()? Could you share us what the actual issue > > observed is from user view? > > Please have a look at the kernel bug report in the description of patch 4/4 of > this series. Thanks for your mentioning, then I can find the following comment in srp_queuecommand(): /* * The SCSI EH thread is the only context from which srp_queuecommand() * can get invoked for blocked devices (SDEV_BLOCK / * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by * locking the rport mutex if invoked from inside the SCSI EH. */ That means EH request is allowed to send to blocked device. I also replied in patch 4/4, looks there is one simple one line change which should address the issue of 'sleep in atomic context', please discuss that in patch 4/4 thread.
On Sat, 2018-01-13 at 23:54 +0800, Ming Lei wrote: > Thanks for your mentioning, then I can find the following comment in > srp_queuecommand(): > > /* > * The SCSI EH thread is the only context from which srp_queuecommand() > * can get invoked for blocked devices (SDEV_BLOCK / > * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by > * locking the rport mutex if invoked from inside the SCSI EH. > */ > > That means EH request is allowed to send to blocked device. No way. That's a bug and a bug that needs to be fixed. Bart.
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 62b56de38ae8..f7154ea86715 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1016,6 +1016,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, { struct scsi_device *sdev = scmd->device; struct Scsi_Host *shost = sdev->host; + struct request_queue *q = sdev->request_queue; DECLARE_COMPLETION_ONSTACK(done); unsigned long timeleft = timeout; struct scsi_eh_save ses; @@ -1028,7 +1029,9 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, scsi_log_send(scmd); scmd->scsi_done = scsi_eh_done; + blk_start_wait_if_quiesced(q); rtn = shost->hostt->queuecommand(shost, scmd); + blk_finish_wait_if_quiesced(q); if (rtn) { if (timeleft > stall_for) { scsi_eh_restore_cmnd(scmd, &ses);