diff mbox

Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

Message ID 20180316173516.3048-1-bart.vanassche@wdc.com (mailing list archive)
State Changes Requested
Headers show

Commit Message

Bart Van Assche March 16, 2018, 5:35 p.m. UTC
Since blk_queue_enter() uses rcu_read_lock_sched() scsi_device_quiesce()
must use synchronize_sched().

Reported-by: Tejun Heo <tj@kernel.org>
Fixes: 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Martin Steigerwald <martin@lichtvoll.de>
Cc: stable@vger.kernel.org # v4.15
---
 drivers/scsi/scsi_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Martin Steigerwald March 16, 2018, 9:42 p.m. UTC | #1
Hello Bart.

What is this one about?

Fix for the regression I (and others?) reported?¹

[1] [Bug 199077] [Possible REGRESSION, 4.16-rc4] Error updating SMART data 
during runtime and boot failures with blk_mq_terminate_expired in backtrace

https://bugzilla.kernel.org/show_bug.cgi?id=199077

Thanks,
Martin

Bart Van Assche - 16.03.18, 18:35:
> Since blk_queue_enter() uses rcu_read_lock_sched() scsi_device_quiesce()
> must use synchronize_sched().
> 
> Reported-by: Tejun Heo <tj@kernel.org>
> Fixes: 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work
> reliably") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
> Cc: Martin Steigerwald <martin@lichtvoll.de>
> Cc: stable@vger.kernel.org # v4.15
> ---
>  drivers/scsi/scsi_lib.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 1d83f29aee74..0b99ee2fbbb5 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -3014,7 +3014,7 @@ scsi_device_quiesce(struct scsi_device *sdev)
>  	 * unfreeze even if the queue was already frozen before this function
>  	 * was called. See also https://lwn.net/Articles/573497/.
>  	 */
> -	synchronize_rcu();
> +	synchronize_sched();
>  	blk_mq_unfreeze_queue(q);
> 
>  	mutex_lock(&sdev->state_mutex);
Bart Van Assche March 16, 2018, 9:51 p.m. UTC | #2
On 03/16/18 14:42, Martin Steigerwald wrote:
> What is this one about?
> 
> Fix for the regression I (and others?) reported?¹
> 
> [1] [Bug 199077] [Possible REGRESSION, 4.16-rc4] Error updating SMART data
> during runtime and boot failures with blk_mq_terminate_expired in backtrace
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=199077

Hello Martin,

This patch is a fix for the fix for the bug that you and others had 
reported. See also "I/O hangs after resuming from suspend-to-ram" 
(https://marc.info/?l=linux-block&m=150340235201348).

This patch fixes an API violation. For those users for which suspend and 
resume works fine with commit 3a0a529971ec applied it should still work 
fine with this patch applied since this patch may cause 
scsi_device_quiesce() to wait longer.

Bart.
Martin Steigerwald March 19, 2018, 9:02 a.m. UTC | #3
Hi Bart.

Bart Van Assche - 16.03.18, 22:51:
> On 03/16/18 14:42, Martin Steigerwald wrote:
> > What is this one about?
> > 
> > Fix for the regression I (and others?) reported?¹
> > 
> > [1] [Bug 199077] [Possible REGRESSION, 4.16-rc4] Error updating SMART data
> > during runtime and boot failures with blk_mq_terminate_expired in
> > backtrace
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=199077
[…]
> This patch is a fix for the fix for the bug that you and others had
> reported. See also "I/O hangs after resuming from suspend-to-ram"
> (https://marc.info/?l=linux-block&m=150340235201348).
> 
> This patch fixes an API violation. For those users for which suspend and
> resume works fine with commit 3a0a529971ec applied it should still work
> fine with this patch applied since this patch may cause
> scsi_device_quiesce() to wait longer.

Okay, so if I understand you correctly, this is not related to the regression 
I mentioned above.

Testing anyway.

Thanks,
Tejun Heo March 19, 2018, 2:31 p.m. UTC | #4
On Fri, Mar 16, 2018 at 10:35:16AM -0700, Bart Van Assche wrote:
> Since blk_queue_enter() uses rcu_read_lock_sched() scsi_device_quiesce()
> must use synchronize_sched().

Is there a reason to use sched-RCU here instead of the regular one?
If not, it'd be better to switch to regular RCU than the other way
around.

Thanks.
Bart Van Assche March 19, 2018, 3:16 p.m. UTC | #5
On Mon, 2018-03-19 at 07:31 -0700, Tejun Heo wrote:
> On Fri, Mar 16, 2018 at 10:35:16AM -0700, Bart Van Assche wrote:

> > Since blk_queue_enter() uses rcu_read_lock_sched() scsi_device_quiesce()

> > must use synchronize_sched().

> 

> Is there a reason to use sched-RCU here instead of the regular one?

> If not, it'd be better to switch to regular RCU than the other way

> around.


Hello Tejun,

As explained in the comment in scsi_device_quiesce(), the effect of
blk_set_preempt_only() must be visible for percpu_ref_tryget() calls that
occur after the queue unfreeze triggered by scsi_device_quiesce(). Hence the
RCU read lock calls in blk_queue_enter(). Since these RCU read lock calls
surround a function call that uses rcu_read_lock_sched(), namely
percpu_ref_tryget_live(), we have to use sched-RCU in both blk_queue_enter()
and scsi_device_quiesce().

Bart.
Tejun Heo March 19, 2018, 3:21 p.m. UTC | #6
Hello,

On Mon, Mar 19, 2018 at 03:16:07PM +0000, Bart Van Assche wrote:
> As explained in the comment in scsi_device_quiesce(), the effect of
> blk_set_preempt_only() must be visible for percpu_ref_tryget() calls that
> occur after the queue unfreeze triggered by scsi_device_quiesce(). Hence the
> RCU read lock calls in blk_queue_enter(). Since these RCU read lock calls
> surround a function call that uses rcu_read_lock_sched(), namely
> percpu_ref_tryget_live(), we have to use sched-RCU in both blk_queue_enter()
> and scsi_device_quiesce().

Let's please not depend on rcu_read_lock_sched() in
percpu_ref_tryget_live().  That's an implementation detail.  If the
code needs RCU based synchronization, it should perform them
explicitly.

Thanks.
Bart Van Assche March 19, 2018, 4:18 p.m. UTC | #7
On Mon, 2018-03-19 at 08:21 -0700, tj@kernel.org wrote:
> On Mon, Mar 19, 2018 at 03:16:07PM +0000, Bart Van Assche wrote:

> > As explained in the comment in scsi_device_quiesce(), the effect of

> > blk_set_preempt_only() must be visible for percpu_ref_tryget() calls that

> > occur after the queue unfreeze triggered by scsi_device_quiesce(). Hence the

> > RCU read lock calls in blk_queue_enter(). Since these RCU read lock calls

> > surround a function call that uses rcu_read_lock_sched(), namely

> > percpu_ref_tryget_live(), we have to use sched-RCU in both blk_queue_enter()

> > and scsi_device_quiesce().

> 

> Let's please not depend on rcu_read_lock_sched() in

> percpu_ref_tryget_live().  That's an implementation detail.  If the

> code needs RCU based synchronization, it should perform them

> explicitly.


Hello Tejun,

The algorithm explained above does not depend on sched-rcu. But because
percpu_ref_tryget_live() uses sched-rcu and because we need to add an RCU lock
around that call we are forced to use sched-rcu. I hope this makes it clear.

Bart.
Tejun Heo March 19, 2018, 4:29 p.m. UTC | #8
Hello,

On Mon, Mar 19, 2018 at 04:18:54PM +0000, Bart Van Assche wrote:
> The algorithm explained above does not depend on sched-rcu. But because
> percpu_ref_tryget_live() uses sched-rcu and because we need to add an RCU lock
> around that call we are forced to use sched-rcu. I hope this makes it clear.

This could be me being slow but can you explain how what
percpu_ref_tryget_live() uses internally affects whether we can use
regular RCU around it?

Thanks.
Bart Van Assche March 19, 2018, 4:57 p.m. UTC | #9
On Mon, 2018-03-19 at 09:29 -0700, tj@kernel.org wrote:
> This could be me being slow but can you explain how what

> percpu_ref_tryget_live() uses internally affects whether we can use

> regular RCU around it?


Hello Tejun,

For synchronization primitives that wait having a stronger synchronization
primitive nested inside a more relaxed one can lead to a deadlock. But since
the rcu read lock primitives do not wait it could be safe to use that kind
of nesting with RCU. Do you perhaps know whether any documentation is
available about that kind of nesting or whether it is already used elsewhere
in the kernel?

Thanks,

Bart.
Tejun Heo March 19, 2018, 5:02 p.m. UTC | #10
Hey,

On Mon, Mar 19, 2018 at 04:57:45PM +0000, Bart Van Assche wrote:
> For synchronization primitives that wait having a stronger synchronization
> primitive nested inside a more relaxed one can lead to a deadlock. But since
> the rcu read lock primitives do not wait it could be safe to use that kind
> of nesting with RCU. Do you perhaps know whether any documentation is
> available about that kind of nesting or whether it is already used elsewhere
> in the kernel?

Oh, we nest them all the time.  They're like (and sometimes literally
are) preempt_disable() and don't care about nest ordering.

Thanks.
Bart Van Assche March 19, 2018, 8:19 p.m. UTC | #11
On Mon, 2018-03-19 at 10:02 -0700, tj@kernel.org wrote:
> On Mon, Mar 19, 2018 at 04:57:45PM +0000, Bart Van Assche wrote:

> > For synchronization primitives that wait having a stronger synchronization

> > primitive nested inside a more relaxed one can lead to a deadlock. But since

> > the rcu read lock primitives do not wait it could be safe to use that kind

> > of nesting with RCU. Do you perhaps know whether any documentation is

> > available about that kind of nesting or whether it is already used elsewhere

> > in the kernel?

> 

> Oh, we nest them all the time.  They're like (and sometimes literally

> are) preempt_disable() and don't care about nest ordering.


Hello Martin,

This was probably already clear to you, but anyway: please drop the patch at the
start of this thread.

Thanks,

Bart.
diff mbox

Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1d83f29aee74..0b99ee2fbbb5 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3014,7 +3014,7 @@  scsi_device_quiesce(struct scsi_device *sdev)
 	 * unfreeze even if the queue was already frozen before this function
 	 * was called. See also https://lwn.net/Articles/573497/.
 	 */
-	synchronize_rcu();
+	synchronize_sched();
 	blk_mq_unfreeze_queue(q);
 
 	mutex_lock(&sdev->state_mutex);