diff mbox series

[v3,4/8] scsi: call scsi_stop_queue() without state_mutex held

Message ID 20230607182249.22623-5-mwilck@suse.com (mailing list archive)
State Superseded
Headers show
Series scsi: fixes for targets with many LUNs, and scsi_target_block rework | expand

Commit Message

Martin Wilck June 7, 2023, 6:22 p.m. UTC
From: Martin Wilck <mwilck@suse.com>

sdev->state_mutex protects only sdev->sdev_state. There's no reason
to keep it held while calling scsi_stop_queue().

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/scsi_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Bart Van Assche June 7, 2023, 7:16 p.m. UTC | #1
On 6/7/23 11:22, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> sdev->state_mutex protects only sdev->sdev_state. There's no reason
> to keep it held while calling scsi_stop_queue().
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>   drivers/scsi/scsi_lib.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index ce5788643011..26e7ce25fa05 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2795,9 +2795,9 @@ static void scsi_device_block(struct scsi_device *sdev, void *data)
>   
>   	mutex_lock(&sdev->state_mutex);
>   	err = __scsi_internal_device_block_nowait(sdev);
> +	mutex_unlock(&sdev->state_mutex);
>   	if (err == 0)
>   		scsi_stop_queue(sdev, false);
> -	mutex_unlock(&sdev->state_mutex);
>   
>   	WARN_ONCE(err, "__scsi_internal_device_block_nowait(%s) failed: err = %d\n",
>   		  dev_name(&sdev->sdev_gendev), err);

There is a reason why scsi_stop_queue() is called with the sdev state 
mutex held: if this mutex is not held, unblocking of a SCSI device can 
start before the scsi_stop_queue() call has finished. It is not allowed 
to swap the order of the blk_mq_quiesce_queue() and 
blk_mq_unquiesce_queue() calls.

Bart.
Martin Wilck June 7, 2023, 7:37 p.m. UTC | #2
On Wed, 2023-06-07 at 12:16 -0700, Bart Van Assche wrote:
> On 6/7/23 11:22, mwilck@suse.com wrote:
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > sdev->state_mutex protects only sdev->sdev_state. There's no reason
> > to keep it held while calling scsi_stop_queue().
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >   drivers/scsi/scsi_lib.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index ce5788643011..26e7ce25fa05 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -2795,9 +2795,9 @@ static void scsi_device_block(struct
> > scsi_device *sdev, void *data)
> >   
> >         mutex_lock(&sdev->state_mutex);
> >         err = __scsi_internal_device_block_nowait(sdev);
> > +       mutex_unlock(&sdev->state_mutex);
> >         if (err == 0)
> >                 scsi_stop_queue(sdev, false);
> > -       mutex_unlock(&sdev->state_mutex);
> >   
> >         WARN_ONCE(err, "__scsi_internal_device_block_nowait(%s)
> > failed: err = %d\n",
> >                   dev_name(&sdev->sdev_gendev), err);
> 
> There is a reason why scsi_stop_queue() is called with the sdev state
> mutex held: if this mutex is not held, unblocking of a SCSI device
> can 
> start before the scsi_stop_queue() call has finished. It is not
> allowed 
> to swap the order of the blk_mq_quiesce_queue() and 
> blk_mq_unquiesce_queue() calls.

Thanks. This wasn't obvious to me from the current code. I'll add a
comment in the next version.

Regards
Martin
Martin Wilck June 7, 2023, 8:07 p.m. UTC | #3
Bart,

On Wed, 2023-06-07 at 21:37 +0200, Martin Wilck wrote:
> On Wed, 2023-06-07 at 12:16 -0700, Bart Van Assche wrote:
> > On 6/7/23 11:22, mwilck@suse.com wrote:
> > > From: Martin Wilck <mwilck@suse.com>
> > > 
> > > sdev->state_mutex protects only sdev->sdev_state. There's no
> > > reason
> > > to keep it held while calling scsi_stop_queue().
> > > 
> > > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > > ---
> > >   drivers/scsi/scsi_lib.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index ce5788643011..26e7ce25fa05 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -2795,9 +2795,9 @@ static void scsi_device_block(struct
> > > scsi_device *sdev, void *data)
> > >   
> > >         mutex_lock(&sdev->state_mutex);
> > >         err = __scsi_internal_device_block_nowait(sdev);
> > > +       mutex_unlock(&sdev->state_mutex);
> > >         if (err == 0)
> > >                 scsi_stop_queue(sdev, false);
> > > -       mutex_unlock(&sdev->state_mutex);
> > >   
> > >         WARN_ONCE(err, "__scsi_internal_device_block_nowait(%s)
> > > failed: err = %d\n",
> > >                   dev_name(&sdev->sdev_gendev), err);
> > 
> > There is a reason why scsi_stop_queue() is called with the sdev
> > state
> > mutex held: if this mutex is not held, unblocking of a SCSI device
> > can 
> > start before the scsi_stop_queue() call has finished. It is not
> > allowed 
> > to swap the order of the blk_mq_quiesce_queue() and 
> > blk_mq_unquiesce_queue() calls.
> 
> Thanks. This wasn't obvious to me from the current code. I'll add a
> comment in the next version.

The crucial question is now, is it sufficient to call
blk_mq_quiesce_queue_nowait() under the mutex, or does the call to
blk_mq_wait_quiesce_done() have to be under the mutex, too?
The latter would actually kill off our attempt to fix the delay
in fc_remote_port_delete() that was caused by repeated
synchronize_rcu() calls.

But if I understand you correctly, moving the wait out of the mutex
should be ok. I'll update the series accordingly.

Thanks,
Martin
Christoph Hellwig June 8, 2023, 5:44 a.m. UTC | #4
> > Thanks. This wasn't obvious to me from the current code. I'll add a
> > comment in the next version.
> 
> The crucial question is now, is it sufficient to call
> blk_mq_quiesce_queue_nowait() under the mutex, or does the call to
> blk_mq_wait_quiesce_done() have to be under the mutex, too?
> The latter would actually kill off our attempt to fix the delay
> in fc_remote_port_delete() that was caused by repeated
> synchronize_rcu() calls.
> 
> But if I understand you correctly, moving the wait out of the mutex
> should be ok. I'll update the series accordingly.

I can't think of a reason we'd want to lock over the wait, but Bart
knows this code way better than I do.
Bart Van Assche June 8, 2023, 2:12 p.m. UTC | #5
On 6/7/23 22:44, Christoph Hellwig wrote:
>>> Thanks. This wasn't obvious to me from the current code. I'll add a
>>> comment in the next version.
>>
>> The crucial question is now, is it sufficient to call
>> blk_mq_quiesce_queue_nowait() under the mutex, or does the call to
>> blk_mq_wait_quiesce_done() have to be under the mutex, too?
>> The latter would actually kill off our attempt to fix the delay
>> in fc_remote_port_delete() that was caused by repeated
>> synchronize_rcu() calls.
>>
>> But if I understand you correctly, moving the wait out of the mutex
>> should be ok. I'll update the series accordingly.
> 
> I can't think of a reason we'd want to lock over the wait, but Bart
> knows this code way better than I do.

Unless deep changes would be made in the block layer 
blk_mq_quiesce_queue_nowait() and/or blk_mq_wait_quiesce_done() 
functions, moving blk_mq_wait_quiesce_done() outside the critical 
section seems fine to me.

Thanks,

Bart.
Mike Christie June 8, 2023, 6:54 p.m. UTC | #6
On 6/8/23 9:12 AM, Bart Van Assche wrote:
> On 6/7/23 22:44, Christoph Hellwig wrote:
>>>> Thanks. This wasn't obvious to me from the current code. I'll add a
>>>> comment in the next version.
>>>
>>> The crucial question is now, is it sufficient to call
>>> blk_mq_quiesce_queue_nowait() under the mutex, or does the call to
>>> blk_mq_wait_quiesce_done() have to be under the mutex, too?
>>> The latter would actually kill off our attempt to fix the delay
>>> in fc_remote_port_delete() that was caused by repeated
>>> synchronize_rcu() calls.
>>>
>>> But if I understand you correctly, moving the wait out of the mutex
>>> should be ok. I'll update the series accordingly.
>>
>> I can't think of a reason we'd want to lock over the wait, but Bart
>> knows this code way better than I do.
> 
> Unless deep changes would be made in the block layer blk_mq_quiesce_queue_nowait() and/or blk_mq_wait_quiesce_done() functions, moving blk_mq_wait_quiesce_done() outside the critical section seems fine to me.

If it helps, I tested with iscsi and the mutex changes discussed above
and it worked ok for me.

Also thanks for fixing this Martin.
Martin Wilck June 12, 2023, 11:15 a.m. UTC | #7
On Thu, 2023-06-08 at 13:54 -0500, Mike Christie wrote:
> On 6/8/23 9:12 AM, Bart Van Assche wrote:
> > On 6/7/23 22:44, Christoph Hellwig wrote:
> > > > > Thanks. This wasn't obvious to me from the current code. I'll
> > > > > add a
> > > > > comment in the next version.
> > > > 
> > > > The crucial question is now, is it sufficient to call
> > > > blk_mq_quiesce_queue_nowait() under the mutex, or does the call
> > > > to
> > > > blk_mq_wait_quiesce_done() have to be under the mutex, too?
> > > > The latter would actually kill off our attempt to fix the delay
> > > > in fc_remote_port_delete() that was caused by repeated
> > > > synchronize_rcu() calls.
> > > > 
> > > > But if I understand you correctly, moving the wait out of the
> > > > mutex
> > > > should be ok. I'll update the series accordingly.
> > > 
> > > I can't think of a reason we'd want to lock over the wait, but
> > > Bart
> > > knows this code way better than I do.
> > 
> > Unless deep changes would be made in the block layer
> > blk_mq_quiesce_queue_nowait() and/or blk_mq_wait_quiesce_done()
> > functions, moving blk_mq_wait_quiesce_done() outside the critical
> > section seems fine to me.
> 
> If it helps, I tested with iscsi and the mutex changes discussed
> above
> and it worked ok for me.

I guess the race that Bart was hinting at is hard to trigger.

I would like to remark that the fact that we need to hold the SCSI
state_mutex while calling blk_mq_quiesce_queue_nowait() looks like a
layering issue to me. Not sure if, and how, this could be avoided,
though.

Regards
Martin
Bart Van Assche June 12, 2023, 1:41 p.m. UTC | #8
On 6/12/23 04:15, Martin Wilck wrote:
> I guess the race that Bart was hinting at is hard to trigger.

Are you sure about this? I think this scenario can be triggered by 
writing into the sysfs attribute that changes the SCSI device state 
while a scsi_target_block() call is in progress. See also 
store_state_field().

> I would like to remark that the fact that we need to hold the SCSI
> state_mutex while calling blk_mq_quiesce_queue_nowait() looks like a
> layering issue to me. Not sure if, and how, this could be avoided,
> though.

I do not agree that this is a layering issue. Is holding a mutex around 
a call of a function in a lower layer ever a layering issue?

What matters is to be very careful with locks while invoking callback 
functions. See also slide 7 in Ousterhout's presentation "Why Threads 
Are A Bad Idea (for most purposes)" from 1996 
(https://web.stanford.edu/~ouster/cgi-bin/papers/threads.pdf).

Bart.
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ce5788643011..26e7ce25fa05 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2795,9 +2795,9 @@  static void scsi_device_block(struct scsi_device *sdev, void *data)
 
 	mutex_lock(&sdev->state_mutex);
 	err = __scsi_internal_device_block_nowait(sdev);
+	mutex_unlock(&sdev->state_mutex);
 	if (err == 0)
 		scsi_stop_queue(sdev, false);
-	mutex_unlock(&sdev->state_mutex);
 
 	WARN_ONCE(err, "__scsi_internal_device_block_nowait(%s) failed: err = %d\n",
 		  dev_name(&sdev->sdev_gendev), err);