diff mbox series

[v2,3/3] scsi: simplify scsi_stop_queue()

Message ID 20230606193845.9627-4-mwilck@suse.com (mailing list archive)
State Superseded
Headers show
Series scsi: fixes for targets with many LUNs | expand

Commit Message

Martin Wilck June 6, 2023, 7:38 p.m. UTC
From: Hannes Reinecke <hare@suse.de>

scsi_target_block() calls scsi_stop_queue() for each scsi_device
and scsi_stop_queue() calls blk_mq_wait_quiesce_done() for each LUN.
As blk_mq_wait_quiesce_done() comes down to synchronize_rcu() for
SCSI queues, this can cause substantial delay for scsi_target_block()
on a target with a lot of logical units (we measured more than 100s
delay for blocking a FC rport with 2048 LUNs).

Simplify scsi_stop_queue(), which is only called in this code path, to never
wait for the quiescing to finish. Rather call blk_mq_wait_quiesce_done()
from scsi_target_block() after iterating over all devices.

Also, move the call to scsi_stop_queue() in scsi_internal_device_block()
out of the code section where the state_mutex is held.

This patch uses the same basic idea as f983622ae605 ("scsi: core: Avoid
calling synchronize_rcu() for each device in scsi_host_block()").

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/scsi_lib.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

Comments

Christoph Hellwig June 7, 2023, 5:27 a.m. UTC | #1
On Tue, Jun 06, 2023 at 09:38:45PM +0200, mwilck@suse.com wrote:
> Simplify scsi_stop_queue(), which is only called in this code path, to never
> wait for the quiescing to finish. Rather call blk_mq_wait_quiesce_done()
> from scsi_target_block() after iterating over all devices.

I don't think simplify is the right word here.  The code isn't in any
way simpler, it just is more efficient an shifts work from
scsi_stop_queue to scsi_internal_device_block and scsi_target_block.

But the whole transformation is very confusing to me even if it looks
correct in the end, and it took me quite a while to understand it.

I'd suggest to further split this up and include some additional
cleanups:

  1) remove scsi_internal_device_block and fold it into device_block
  2) move the scsi_internal_device_block in what was
     scsi_internal_device_block and now is device_block out
     of state_mutex (and document in the commit log why this is safe)
  3) remove scsi_stop_queue and open code it in the two callers, one
     of which currently wants nowait semantics, and one that doesn't.
  4) move the quiesce wait to scsi_target_block and make it per-tagset

>  scsi_target_block(struct device *dev)
>  {
> +	struct Scsi_Host *shost = dev_to_shost(dev);
> +
>  	if (scsi_is_target_device(dev))
>  		starget_for_each_device(to_scsi_target(dev), NULL,
>  					device_block);
>  	else
>  		device_for_each_child(dev, NULL, target_block);
> +
> +	/* Wait for ongoing scsi_queue_rq() calls to finish. */
> +	if (!WARN_ON_ONCE(!shost))

How could host ever be NULL here?  I can't see why we'd want this
check.

Btw, as far as I can tell scsi_target_block is never called for
a device that is a target device.  It might be worth throwing in
another patch to remove support for that case and simplify things
further.
Martin Wilck June 7, 2023, 9:26 a.m. UTC | #2
On Wed, 2023-06-07 at 07:27 +0200, Christoph Hellwig wrote:
> On Tue, Jun 06, 2023 at 09:38:45PM +0200, mwilck@suse.com wrote:
> > Simplify scsi_stop_queue(), which is only called in this code path,
> > to never
> > wait for the quiescing to finish. Rather call
> > blk_mq_wait_quiesce_done()
> > from scsi_target_block() after iterating over all devices.
> 
> I don't think simplify is the right word here.  The code isn't in any
> way simpler, it just is more efficient an shifts work from
> scsi_stop_queue to scsi_internal_device_block and scsi_target_block.
> 
> But the whole transformation is very confusing to me even if it looks
> correct in the end, and it took me quite a while to understand it.
> 
> I'd suggest to further split this up and include some additional
> cleanups:
> 
>   1) remove scsi_internal_device_block and fold it into device_block

ok

>   2) move the scsi_internal_device_block in what was

You mean scsi_stop_queue() here, right?

>      scsi_internal_device_block and now is device_block out
>      of state_mutex (and document in the commit log why this is safe)
>   3) remove scsi_stop_queue and open code it in the two callers, one
>      of which currently wants nowait semantics, and one that doesn't.
ok

>   4) move the quiesce wait to scsi_target_block and make it per-
> tagset

ok

> 
> >  scsi_target_block(struct device *dev)
> >  {
> > +       struct Scsi_Host *shost = dev_to_shost(dev);
> > +
> >         if (scsi_is_target_device(dev))
> >                 starget_for_each_device(to_scsi_target(dev), NULL,
> >                                         device_block);
> >         else
> >                 device_for_each_child(dev, NULL, target_block);
> > +
> > +       /* Wait for ongoing scsi_queue_rq() calls to finish. */
> > +       if (!WARN_ON_ONCE(!shost))
> 
> How could host ever be NULL here?  I can't see why we'd want this
> check.
> 

The reason is simple: I wasn't certain if dev_to_shost() could return
NULL, and preferred skipping the wait over an Oops. I hear you say that
dev_to_shost() can't go wrong, so I'll remove the NULL test.

> Btw, as far as I can tell scsi_target_block is never called for
> a device that is a target device.  It might be worth throwing in
> another patch to remove support for that case and simplify things
> further.

Can we do this separately, maybe? 

Thanks,
Martin
Martin Wilck June 7, 2023, 9:36 a.m. UTC | #3
On Wed, 2023-06-07 at 11:26 +0200, Martin Wilck wrote:
> On Wed, 2023-06-07 at 07:27 +0200, Christoph Hellwig wrote:
> > 
> >   3) remove scsi_stop_queue and open code it in the two callers,
> > one
> >      of which currently wants nowait semantics, and one that
> > doesn't.
> ok

Hm, scsi_stop_queue() pairs with scsi_start_queue(), do we really want
to open-code it?

Martin
Christoph Hellwig June 7, 2023, 1:34 p.m. UTC | #4
On Wed, Jun 07, 2023 at 11:26:04AM +0200, Martin Wilck wrote:
> > cleanups:
> > 
> >   1) remove scsi_internal_device_block and fold it into device_block
> 
> ok
> 
> >   2) move the scsi_internal_device_block in what was
> 
> You mean scsi_stop_queue() here, right?

Yes.

> The reason is simple: I wasn't certain if dev_to_shost() could return
> NULL, and preferred skipping the wait over an Oops. I hear you say that
> dev_to_shost() can't go wrong, so I'll remove the NULL test.

Well, the parent device can't really go away while we have a reference
to a child.  So the only case where it can return NULL is if the
passed in device isn't the child of a SCSI host, which would be a grave
programming error.

> 
> > Btw, as far as I can tell scsi_target_block is never called for
> > a device that is a target device.  It might be worth throwing in
> > another patch to remove support for that case and simplify things
> > further.
> 
> Can we do this separately, maybe? 

Sure.  Would be nice to just tack into onto the end of this series
if you touch the area, though.

> On Wed, 2023-06-07 at 11:26 +0200, Martin Wilck wrote:
> > On Wed, 2023-06-07 at 07:27 +0200, Christoph Hellwig wrote:
> > > 
> > >   3) remove scsi_stop_queue and open code it in the two callers,
> > > one
> > >      of which currently wants nowait semantics, and one that
> > > doesn't.
> > ok
> 
> Hm, scsi_stop_queue() pairs with scsi_start_queue(), do we really want
> to open-code it?

Oh well, feel free to keep it if you prefer it that way.
Bart Van Assche June 7, 2023, 2:05 p.m. UTC | #5
On 6/7/23 02:26, Martin Wilck wrote:
> On Wed, 2023-06-07 at 07:27 +0200, Christoph Hellwig wrote:
>> On Tue, Jun 06, 2023 at 09:38:45PM +0200, mwilck@suse.com wrote:
>>>   scsi_target_block(struct device *dev)
>>>   {
>>> +       struct Scsi_Host *shost = dev_to_shost(dev);
>>> +
>>>          if (scsi_is_target_device(dev))
>>>                  starget_for_each_device(to_scsi_target(dev), NULL,
>>>                                          device_block);
>>>          else
>>>                  device_for_each_child(dev, NULL, target_block);
>>> +
>>> +       /* Wait for ongoing scsi_queue_rq() calls to finish. */
>>> +       if (!WARN_ON_ONCE(!shost))
>>
>> How could host ever be NULL here?  I can't see why we'd want this
>> check.
> 
> The reason is simple: I wasn't certain if dev_to_shost() could return
> NULL, and preferred skipping the wait over an Oops. I hear you say that
> dev_to_shost() can't go wrong, so I'll remove the NULL test.

I propose to pass shost as the first argument to scsi_target_block() 
instead of using dev_to_shost() inside scsi_target_block(). Except in 
__iscsi_block_session(), shost is already available as a local variable.

Bart.
Christoph Hellwig June 7, 2023, 2:08 p.m. UTC | #6
On Wed, Jun 07, 2023 at 07:05:34AM -0700, Bart Van Assche wrote:
>> The reason is simple: I wasn't certain if dev_to_shost() could return
>> NULL, and preferred skipping the wait over an Oops. I hear you say that
>> dev_to_shost() can't go wrong, so I'll remove the NULL test.
>
> I propose to pass shost as the first argument to scsi_target_block() 
> instead of using dev_to_shost() inside scsi_target_block(). Except in 
> __iscsi_block_session(), shost is already available as a local variable.

That sounds like a good idea to me.
Martin Wilck June 7, 2023, 3:38 p.m. UTC | #7
On Wed, 2023-06-07 at 07:05 -0700, Bart Van Assche wrote:
> On 6/7/23 02:26, Martin Wilck wrote:
> > On Wed, 2023-06-07 at 07:27 +0200, Christoph Hellwig wrote:
> > > On Tue, Jun 06, 2023 at 09:38:45PM +0200, mwilck@suse.com wrote:
> > > >   scsi_target_block(struct device *dev)
> > > >   {
> > > > +       struct Scsi_Host *shost = dev_to_shost(dev);
> > > > +
> > > >          if (scsi_is_target_device(dev))
> > > >                  starget_for_each_device(to_scsi_target(dev),
> > > > NULL,
> > > >                                          device_block);
> > > >          else
> > > >                  device_for_each_child(dev, NULL,
> > > > target_block);
> > > > +
> > > > +       /* Wait for ongoing scsi_queue_rq() calls to finish. */
> > > > +       if (!WARN_ON_ONCE(!shost))
> > > 
> > > How could host ever be NULL here?  I can't see why we'd want this
> > > check.
> > 
> > The reason is simple: I wasn't certain if dev_to_shost() could
> > return
> > NULL, and preferred skipping the wait over an Oops. I hear you say
> > that
> > dev_to_shost() can't go wrong, so I'll remove the NULL test.
> 
> I propose to pass shost as the first argument to scsi_target_block() 
> instead of using dev_to_shost() inside scsi_target_block(). Except in
> __iscsi_block_session(), shost is already available as a local
> variable.

If we do this, it might actually be cleaner to just pass the tag set to
wait for.

Martin
Bart Van Assche June 7, 2023, 4:39 p.m. UTC | #8
On 6/7/23 08:38, Martin Wilck wrote:
> On Wed, 2023-06-07 at 07:05 -0700, Bart Van Assche wrote:
>> On 6/7/23 02:26, Martin Wilck wrote:
>>> On Wed, 2023-06-07 at 07:27 +0200, Christoph Hellwig wrote:
>>>> On Tue, Jun 06, 2023 at 09:38:45PM +0200, mwilck@suse.com wrote:
>>>>>    scsi_target_block(struct device *dev)
>>>>>    {
>>>>> +       struct Scsi_Host *shost = dev_to_shost(dev);
>>>>> +
>>>>>           if (scsi_is_target_device(dev))
>>>>>                   starget_for_each_device(to_scsi_target(dev),
>>>>> NULL,
>>>>>                                           device_block);
>>>>>           else
>>>>>                   device_for_each_child(dev, NULL,
>>>>> target_block);
>>>>> +
>>>>> +       /* Wait for ongoing scsi_queue_rq() calls to finish. */
>>>>> +       if (!WARN_ON_ONCE(!shost))
>>>>
>>>> How could host ever be NULL here?  I can't see why we'd want this
>>>> check.
>>>
>>> The reason is simple: I wasn't certain if dev_to_shost() could
>>> return
>>> NULL, and preferred skipping the wait over an Oops. I hear you say
>>> that
>>> dev_to_shost() can't go wrong, so I'll remove the NULL test.
>>
>> I propose to pass shost as the first argument to scsi_target_block()
>> instead of using dev_to_shost() inside scsi_target_block(). Except in
>> __iscsi_block_session(), shost is already available as a local
>> variable.
> 
> If we do this, it might actually be cleaner to just pass the tag set to
> wait for.

Wouldn't that be close to a layering violation? Shouldn't SCSI APIs accept
pointers to SCSI objects instead of pointers to block layer abstractions?

Thanks,

Bart.
Martin Wilck June 7, 2023, 5:56 p.m. UTC | #9
On Wed, 2023-06-07 at 09:39 -0700, Bart Van Assche wrote:
> On 6/7/23 08:38, Martin Wilck wrote:
> > On Wed, 2023-06-07 at 07:05 -0700, Bart Van Assche wrote:
> > > On 6/7/23 02:26, Martin Wilck wrote:
> > > > On Wed, 2023-06-07 at 07:27 +0200, Christoph Hellwig wrote:
> > > > > On Tue, Jun 06, 2023 at 09:38:45PM +0200,
> > > > > mwilck@suse.com wrote:
> > > > > >    scsi_target_block(struct device *dev)
> > > > > >    {
> > > > > > +       struct Scsi_Host *shost = dev_to_shost(dev);
> > > > > > +
> > > > > >           if (scsi_is_target_device(dev))
> > > > > >                   starget_for_each_device(to_scsi_target(de
> > > > > > v),
> > > > > > NULL,
> > > > > >                                           device_block);
> > > > > >           else
> > > > > >                   device_for_each_child(dev, NULL,
> > > > > > target_block);
> > > > > > +
> > > > > > +       /* Wait for ongoing scsi_queue_rq() calls to
> > > > > > finish. */
> > > > > > +       if (!WARN_ON_ONCE(!shost))
> > > > > 
> > > > > How could host ever be NULL here?  I can't see why we'd want
> > > > > this
> > > > > check.
> > > > 
> > > > The reason is simple: I wasn't certain if dev_to_shost() could
> > > > return
> > > > NULL, and preferred skipping the wait over an Oops. I hear you
> > > > say
> > > > that
> > > > dev_to_shost() can't go wrong, so I'll remove the NULL test.
> > > 
> > > I propose to pass shost as the first argument to
> > > scsi_target_block()
> > > instead of using dev_to_shost() inside scsi_target_block().
> > > Except in
> > > __iscsi_block_session(), shost is already available as a local
> > > variable.
> > 
> > If we do this, it might actually be cleaner to just pass the tag
> > set to
> > wait for.
> 
> Wouldn't that be close to a layering violation? Shouldn't SCSI APIs
> accept
> pointers to SCSI objects instead of pointers to block layer
> abstractions?

My thought was that quiescing is based on tag sets in current kernels,
and passing in the tag set to scsi_target_block() would make that
explicit.

But you've got a point. I'll resubmit the with a Scsi_Host argument and
see how it goes.

Thanks,
Martin
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 25489fbd94c6..bc78bea62755 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2726,24 +2726,18 @@  void scsi_start_queue(struct scsi_device *sdev)
 		blk_mq_unquiesce_queue(sdev->request_queue);
 }
 
-static void scsi_stop_queue(struct scsi_device *sdev, bool nowait)
+static void scsi_stop_queue(struct scsi_device *sdev)
 {
 	/*
 	 * The atomic variable of ->queue_stopped covers that
 	 * blk_mq_quiesce_queue* is balanced with blk_mq_unquiesce_queue.
 	 *
 	 * However, we still need to wait until quiesce is done
-	 * in case that queue has been stopped.
+	 * in case that queue has been stopped. This is done in
+	 * scsi_target_block() for all devices of the target.
 	 */
-	if (!cmpxchg(&sdev->queue_stopped, 0, 1)) {
-		if (nowait)
-			blk_mq_quiesce_queue_nowait(sdev->request_queue);
-		else
-			blk_mq_quiesce_queue(sdev->request_queue);
-	} else {
-		if (!nowait)
-			blk_mq_wait_quiesce_done(sdev->request_queue->tag_set);
-	}
+	if (!cmpxchg(&sdev->queue_stopped, 0, 1))
+		blk_mq_quiesce_queue_nowait(sdev->request_queue);
 }
 
 /**
@@ -2770,7 +2764,7 @@  int scsi_internal_device_block_nowait(struct scsi_device *sdev)
 	 * request queue.
 	 */
 	if (!ret)
-		scsi_stop_queue(sdev, true);
+		scsi_stop_queue(sdev);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(scsi_internal_device_block_nowait);
@@ -2796,9 +2790,9 @@  static int scsi_internal_device_block(struct scsi_device *sdev)
 
 	mutex_lock(&sdev->state_mutex);
 	err = __scsi_internal_device_block_nowait(sdev);
-	if (err == 0)
-		scsi_stop_queue(sdev, false);
 	mutex_unlock(&sdev->state_mutex);
+	if (err == 0)
+		scsi_stop_queue(sdev);
 
 	return err;
 }
@@ -2906,11 +2900,17 @@  target_block(struct device *dev, void *data)
 void
 scsi_target_block(struct device *dev)
 {
+	struct Scsi_Host *shost = dev_to_shost(dev);
+
 	if (scsi_is_target_device(dev))
 		starget_for_each_device(to_scsi_target(dev), NULL,
 					device_block);
 	else
 		device_for_each_child(dev, NULL, target_block);
+
+	/* Wait for ongoing scsi_queue_rq() calls to finish. */
+	if (!WARN_ON_ONCE(!shost))
+		blk_mq_wait_quiesce_done(&shost->tag_set);
 }
 EXPORT_SYMBOL_GPL(scsi_target_block);