Message ID | 20230607182249.22623-5-mwilck@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | scsi: fixes for targets with many LUNs, and scsi_target_block rework | expand |
On 6/7/23 11:22, mwilck@suse.com wrote: > From: Martin Wilck <mwilck@suse.com> > > sdev->state_mutex protects only sdev->sdev_state. There's no reason > to keep it held while calling scsi_stop_queue(). > > Signed-off-by: Martin Wilck <mwilck@suse.com> > --- > drivers/scsi/scsi_lib.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index ce5788643011..26e7ce25fa05 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -2795,9 +2795,9 @@ static void scsi_device_block(struct scsi_device *sdev, void *data) > > mutex_lock(&sdev->state_mutex); > err = __scsi_internal_device_block_nowait(sdev); > + mutex_unlock(&sdev->state_mutex); > if (err == 0) > scsi_stop_queue(sdev, false); > - mutex_unlock(&sdev->state_mutex); > > WARN_ONCE(err, "__scsi_internal_device_block_nowait(%s) failed: err = %d\n", > dev_name(&sdev->sdev_gendev), err); There is a reason why scsi_stop_queue() is called with the sdev state mutex held: if this mutex is not held, unblocking of a SCSI device can start before the scsi_stop_queue() call has finished. It is not allowed to swap the order of the blk_mq_quiesce_queue() and blk_mq_unquiesce_queue() calls. Bart.
On Wed, 2023-06-07 at 12:16 -0700, Bart Van Assche wrote: > On 6/7/23 11:22, mwilck@suse.com wrote: > > From: Martin Wilck <mwilck@suse.com> > > > > sdev->state_mutex protects only sdev->sdev_state. There's no reason > > to keep it held while calling scsi_stop_queue(). > > > > Signed-off-by: Martin Wilck <mwilck@suse.com> > > --- > > drivers/scsi/scsi_lib.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > index ce5788643011..26e7ce25fa05 100644 > > --- a/drivers/scsi/scsi_lib.c > > +++ b/drivers/scsi/scsi_lib.c > > @@ -2795,9 +2795,9 @@ static void scsi_device_block(struct > > scsi_device *sdev, void *data) > > > > mutex_lock(&sdev->state_mutex); > > err = __scsi_internal_device_block_nowait(sdev); > > + mutex_unlock(&sdev->state_mutex); > > if (err == 0) > > scsi_stop_queue(sdev, false); > > - mutex_unlock(&sdev->state_mutex); > > > > WARN_ONCE(err, "__scsi_internal_device_block_nowait(%s) > > failed: err = %d\n", > > dev_name(&sdev->sdev_gendev), err); > > There is a reason why scsi_stop_queue() is called with the sdev state > mutex held: if this mutex is not held, unblocking of a SCSI device > can > start before the scsi_stop_queue() call has finished. It is not > allowed > to swap the order of the blk_mq_quiesce_queue() and > blk_mq_unquiesce_queue() calls. Thanks. This wasn't obvious to me from the current code. I'll add a comment in the next version. Regards Martin
Bart, On Wed, 2023-06-07 at 21:37 +0200, Martin Wilck wrote: > On Wed, 2023-06-07 at 12:16 -0700, Bart Van Assche wrote: > > On 6/7/23 11:22, mwilck@suse.com wrote: > > > From: Martin Wilck <mwilck@suse.com> > > > > > > sdev->state_mutex protects only sdev->sdev_state. There's no > > > reason > > > to keep it held while calling scsi_stop_queue(). > > > > > > Signed-off-by: Martin Wilck <mwilck@suse.com> > > > --- > > > drivers/scsi/scsi_lib.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > > index ce5788643011..26e7ce25fa05 100644 > > > --- a/drivers/scsi/scsi_lib.c > > > +++ b/drivers/scsi/scsi_lib.c > > > @@ -2795,9 +2795,9 @@ static void scsi_device_block(struct > > > scsi_device *sdev, void *data) > > > > > > mutex_lock(&sdev->state_mutex); > > > err = __scsi_internal_device_block_nowait(sdev); > > > + mutex_unlock(&sdev->state_mutex); > > > if (err == 0) > > > scsi_stop_queue(sdev, false); > > > - mutex_unlock(&sdev->state_mutex); > > > > > > WARN_ONCE(err, "__scsi_internal_device_block_nowait(%s) > > > failed: err = %d\n", > > > dev_name(&sdev->sdev_gendev), err); > > > > There is a reason why scsi_stop_queue() is called with the sdev > > state > > mutex held: if this mutex is not held, unblocking of a SCSI device > > can > > start before the scsi_stop_queue() call has finished. It is not > > allowed > > to swap the order of the blk_mq_quiesce_queue() and > > blk_mq_unquiesce_queue() calls. > > Thanks. This wasn't obvious to me from the current code. I'll add a > comment in the next version. The crucial question is now, is it sufficient to call blk_mq_quiesce_queue_nowait() under the mutex, or does the call to blk_mq_wait_quiesce_done() have to be under the mutex, too? The latter would actually kill off our attempt to fix the delay in fc_remote_port_delete() that was caused by repeated synchronize_rcu() calls. But if I understand you correctly, moving the wait out of the mutex should be ok. I'll update the series accordingly. Thanks, Martin
> > Thanks. This wasn't obvious to me from the current code. I'll add a > > comment in the next version. > > The crucial question is now, is it sufficient to call > blk_mq_quiesce_queue_nowait() under the mutex, or does the call to > blk_mq_wait_quiesce_done() have to be under the mutex, too? > The latter would actually kill off our attempt to fix the delay > in fc_remote_port_delete() that was caused by repeated > synchronize_rcu() calls. > > But if I understand you correctly, moving the wait out of the mutex > should be ok. I'll update the series accordingly. I can't think of a reason we'd want to lock over the wait, but Bart knows this code way better than I do.
On 6/7/23 22:44, Christoph Hellwig wrote: >>> Thanks. This wasn't obvious to me from the current code. I'll add a >>> comment in the next version. >> >> The crucial question is now, is it sufficient to call >> blk_mq_quiesce_queue_nowait() under the mutex, or does the call to >> blk_mq_wait_quiesce_done() have to be under the mutex, too? >> The latter would actually kill off our attempt to fix the delay >> in fc_remote_port_delete() that was caused by repeated >> synchronize_rcu() calls. >> >> But if I understand you correctly, moving the wait out of the mutex >> should be ok. I'll update the series accordingly. > > I can't think of a reason we'd want to lock over the wait, but Bart > knows this code way better than I do. Unless deep changes would be made in the block layer blk_mq_quiesce_queue_nowait() and/or blk_mq_wait_quiesce_done() functions, moving blk_mq_wait_quiesce_done() outside the critical section seems fine to me. Thanks, Bart.
On 6/8/23 9:12 AM, Bart Van Assche wrote: > On 6/7/23 22:44, Christoph Hellwig wrote: >>>> Thanks. This wasn't obvious to me from the current code. I'll add a >>>> comment in the next version. >>> >>> The crucial question is now, is it sufficient to call >>> blk_mq_quiesce_queue_nowait() under the mutex, or does the call to >>> blk_mq_wait_quiesce_done() have to be under the mutex, too? >>> The latter would actually kill off our attempt to fix the delay >>> in fc_remote_port_delete() that was caused by repeated >>> synchronize_rcu() calls. >>> >>> But if I understand you correctly, moving the wait out of the mutex >>> should be ok. I'll update the series accordingly. >> >> I can't think of a reason we'd want to lock over the wait, but Bart >> knows this code way better than I do. > > Unless deep changes would be made in the block layer blk_mq_quiesce_queue_nowait() and/or blk_mq_wait_quiesce_done() functions, moving blk_mq_wait_quiesce_done() outside the critical section seems fine to me. If it helps, I tested with iscsi and the mutex changes discussed above and it worked ok for me. Also thanks for fixing this Martin.
On Thu, 2023-06-08 at 13:54 -0500, Mike Christie wrote: > On 6/8/23 9:12 AM, Bart Van Assche wrote: > > On 6/7/23 22:44, Christoph Hellwig wrote: > > > > > Thanks. This wasn't obvious to me from the current code. I'll > > > > > add a > > > > > comment in the next version. > > > > > > > > The crucial question is now, is it sufficient to call > > > > blk_mq_quiesce_queue_nowait() under the mutex, or does the call > > > > to > > > > blk_mq_wait_quiesce_done() have to be under the mutex, too? > > > > The latter would actually kill off our attempt to fix the delay > > > > in fc_remote_port_delete() that was caused by repeated > > > > synchronize_rcu() calls. > > > > > > > > But if I understand you correctly, moving the wait out of the > > > > mutex > > > > should be ok. I'll update the series accordingly. > > > > > > I can't think of a reason we'd want to lock over the wait, but > > > Bart > > > knows this code way better than I do. > > > > Unless deep changes would be made in the block layer > > blk_mq_quiesce_queue_nowait() and/or blk_mq_wait_quiesce_done() > > functions, moving blk_mq_wait_quiesce_done() outside the critical > > section seems fine to me. > > If it helps, I tested with iscsi and the mutex changes discussed > above > and it worked ok for me. I guess the race that Bart was hinting at is hard to trigger. I would like to remark that the fact that we need to hold the SCSI state_mutex while calling blk_mq_quiesce_queue_nowait() looks like a layering issue to me. Not sure if, and how, this could be avoided, though. Regards Martin
On 6/12/23 04:15, Martin Wilck wrote: > I guess the race that Bart was hinting at is hard to trigger. Are you sure about this? I think this scenario can be triggered by writing into the sysfs attribute that changes the SCSI device state while a scsi_target_block() call is in progress. See also store_state_field(). > I would like to remark that the fact that we need to hold the SCSI > state_mutex while calling blk_mq_quiesce_queue_nowait() looks like a > layering issue to me. Not sure if, and how, this could be avoided, > though. I do not agree that this is a layering issue. Is holding a mutex around a call of a function in a lower layer ever a layering issue? What matters is to be very careful with locks while invoking callback functions. See also slide 7 in Ousterhout's presentation "Why Threads Are A Bad Idea (for most purposes)" from 1996 (https://web.stanford.edu/~ouster/cgi-bin/papers/threads.pdf). Bart.
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index ce5788643011..26e7ce25fa05 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2795,9 +2795,9 @@ static void scsi_device_block(struct scsi_device *sdev, void *data) mutex_lock(&sdev->state_mutex); err = __scsi_internal_device_block_nowait(sdev); + mutex_unlock(&sdev->state_mutex); if (err == 0) scsi_stop_queue(sdev, false); - mutex_unlock(&sdev->state_mutex); WARN_ONCE(err, "__scsi_internal_device_block_nowait(%s) failed: err = %d\n", dev_name(&sdev->sdev_gendev), err);