diff mbox

[v2,3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device

Message ID 20180110181817.25166-4-bart.vanassche@wdc.com (mailing list archive)
State Deferred
Headers show

Commit Message

Bart Van Assche Jan. 10, 2018, 6:18 p.m. UTC
Several SCSI transport and LLD drivers surround code that does not
tolerate concurrent calls of .queuecommand() with scsi_target_block() /
scsi_target_unblock(). These last two functions use
blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request
queues to prevent concurrent .queuecommand() calls. However, that is
not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().
Hence surround the .queuecommand() call from the SCSI error handler with
blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced().

Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into
code that calls blk_get_request(), e.g. scsi_execute_req(), is not an
option since scsi_send_eh_cmnd() can be called if all requests are
allocated and if no requests will make progress without aborting any
of these requests.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_error.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Ming Lei Jan. 11, 2018, 2:23 a.m. UTC | #1
On Wed, Jan 10, 2018 at 10:18:16AM -0800, Bart Van Assche wrote:
> Several SCSI transport and LLD drivers surround code that does not
> tolerate concurrent calls of .queuecommand() with scsi_target_block() /
> scsi_target_unblock(). These last two functions use
> blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request
> queues to prevent concurrent .queuecommand() calls. However, that is

Actually blk_mq_quiesce_queue() is supposed to disable and drain dispatch,
not for prevent concurrent .queuecommand() calls.

> not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().

Given it is error handling, do we need to prevent the .queuecommand() call
in scsi_send_eh_cmnd()? Could you share us what the actual issue
observed is from user view?

> Hence surround the .queuecommand() call from the SCSI error handler with
> blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced().
> 
> Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into
> code that calls blk_get_request(), e.g. scsi_execute_req(), is not an
> option since scsi_send_eh_cmnd() can be called if all requests are
> allocated and if no requests will make progress without aborting any
> of these requests.

If we need to prevent the .queuecommand() in scsi_send_eh_cmnd(), what
do you think of the approach by requeuing the EH command via
scsi_mq_requeue_cmd()/scsi_requeue_cmd()? And the EH request will be
dispatched finally when the queue becomes unquiesced or the STOPPED
is cleared.
Bart Van Assche Jan. 12, 2018, 10:45 p.m. UTC | #2
On Thu, 2018-01-11 at 10:23 +0800, Ming Lei wrote:
> > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().

> 

> Given it is error handling, do we need to prevent the .queuecommand() call

> in scsi_send_eh_cmnd()? Could you share us what the actual issue

> observed is from user view?


Please have a look at the kernel bug report in the description of patch 4/4 of
this series.

> > Hence surround the .queuecommand() call from the SCSI error handler with

> > blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced().

> > 

> > Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into

> > code that calls blk_get_request(), e.g. scsi_execute_req(), is not an

> > option since scsi_send_eh_cmnd() can be called if all requests are

> > allocated and if no requests will make progress without aborting any

> > of these requests.

> 

> If we need to prevent the .queuecommand() in scsi_send_eh_cmnd(), what

> do you think of the approach by requeuing the EH command via

> scsi_mq_requeue_cmd()/scsi_requeue_cmd()? And the EH request will be

> dispatched finally when the queue becomes unquiesced or the STOPPED

> is cleared.


Calling scsi_mq_requeue_cmd() would trigger scsi_mq_uninit_cmd(),
blk_mq_put_driver_tag(), blk_mq_get_driver_tag() and scsi_mq_prep_fn().
That could cause scsi_eh_scmd_add() to be called multiple times for the
same SCSI command. However, that would break one of the assumptions
scsi_eh_scmd_add() is based on, namely that that function gets called
only once per SCSI command that is in progress.

Bart.
Ming Lei Jan. 13, 2018, 3:54 p.m. UTC | #3
On Fri, Jan 12, 2018 at 10:45:57PM +0000, Bart Van Assche wrote:
> On Thu, 2018-01-11 at 10:23 +0800, Ming Lei wrote:
> > > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().
> > 
> > Given it is error handling, do we need to prevent the .queuecommand() call
> > in scsi_send_eh_cmnd()? Could you share us what the actual issue
> > observed is from user view?
> 
> Please have a look at the kernel bug report in the description of patch 4/4 of
> this series.

Thanks for your mentioning, then I can find the following comment in
srp_queuecommand():

		/*
		 * The SCSI EH thread is the only context from which srp_queuecommand()
		 * can get invoked for blocked devices (SDEV_BLOCK /
		 * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by
		 * locking the rport mutex if invoked from inside the SCSI EH.
		 */

That means EH request is allowed to send to blocked device.

I also replied in patch 4/4, looks there is one simple one line change
which should address the issue of 'sleep in atomic context', please discuss that
in patch 4/4 thread.
Bart Van Assche Jan. 25, 2018, 5:07 p.m. UTC | #4
On Sat, 2018-01-13 at 23:54 +0800, Ming Lei wrote:
> Thanks for your mentioning, then I can find the following comment in

> srp_queuecommand():

> 

> 		/*

> 		 * The SCSI EH thread is the only context from which srp_queuecommand()

> 		 * can get invoked for blocked devices (SDEV_BLOCK /

> 		 * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by

> 		 * locking the rport mutex if invoked from inside the SCSI EH.

> 		 */

> 

> That means EH request is allowed to send to blocked device.


No way. That's a bug and a bug that needs to be fixed.

Bart.
diff mbox

Patch

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 62b56de38ae8..f7154ea86715 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1016,6 +1016,7 @@  static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 {
 	struct scsi_device *sdev = scmd->device;
 	struct Scsi_Host *shost = sdev->host;
+	struct request_queue *q = sdev->request_queue;
 	DECLARE_COMPLETION_ONSTACK(done);
 	unsigned long timeleft = timeout;
 	struct scsi_eh_save ses;
@@ -1028,7 +1029,9 @@  static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 
 	scsi_log_send(scmd);
 	scmd->scsi_done = scsi_eh_done;
+	blk_start_wait_if_quiesced(q);
 	rtn = shost->hostt->queuecommand(shost, scmd);
+	blk_finish_wait_if_quiesced(q);
 	if (rtn) {
 		if (timeleft > stall_for) {
 			scsi_eh_restore_cmnd(scmd, &ses);