
[v4,1/7] md: Make md resync and reshape threads freezable

Message ID 20170925202924.16603-2-bart.vanassche@wdc.com (mailing list archive)
State New, archived

Commit Message

Bart Van Assche Sept. 25, 2017, 8:29 p.m. UTC
Some people use the md driver on laptops and use the suspend and
resume functionality. Since it is essential that submitting of
new I/O requests stops before device quiescing starts, make the
md resync and reshape threads freezable.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/md/md.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
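
For reference, a minimal sketch of the freezable-kthread pattern this patch
applies (illustrative only; the thread function and wait queue below are made
up, the real changes are in the diff further down):

	#include <linux/freezer.h>
	#include <linux/kthread.h>
	#include <linux/wait.h>

	/* Sketch only: a kthread loop that cooperates with the freezer. */
	static int example_md_like_thread(void *arg)
	{
		wait_queue_head_t *wq = arg;	/* hypothetical wait queue */

		set_freezable();		/* opt in to being frozen */
		while (!kthread_should_stop()) {
			/*
			 * Sleeps interruptibly and enters the refrigerator
			 * when tasks are frozen for suspend.
			 */
			wait_event_freezable_timeout(*wq,
					kthread_should_stop(), HZ);
			/* ... do the actual resync/reshape work here ... */
		}
		return 0;
	}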

Comments

Ming Lei Sept. 25, 2017, 11:04 p.m. UTC | #1
On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before device quiescing starts, make the
> md resync and reshape threads freezable.

As I explained, if SCSI quiesce is safe, this patch shouldn't
be needed.

The issue isn't MD specific, and in theory can be triggered
on all devices. And you can see the I/O hang report on BTRFS(RAID)
without MD involved:

	https://marc.info/?l=linux-block&m=150634883816965&w=2

So please remove this patch from the patchset.

--
Ming
Bart Van Assche Sept. 25, 2017, 11:09 p.m. UTC | #2
On Tue, 2017-09-26 at 07:04 +0800, Ming Lei wrote:
> On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> > Some people use the md driver on laptops and use the suspend and
> > resume functionality. Since it is essential that submitting of
> > new I/O requests stops before device quiescing starts, make the
> > md resync and reshape threads freezable.
> 
> As I explained, if SCSI quiesce is safe, this patch shouldn't
> be needed.
> 
> The issue isn't MD specific, and in theory can be triggered
> on all devices. And you can see the I/O hang report on BTRFS(RAID)
> without MD involved:
> 
> 	https://marc.info/?l=linux-block&m=150634883816965&w=2

What makes you think that this patch is not necessary once SCSI quiesce
has been made safe? Does this mean that you have not tested suspend and
resume while md RAID 1 resync was in progress? This patch is necessary
to prevent suspend from locking up while an md RAID 1 resync is in progress.

Bart.
Ming Lei Sept. 26, 2017, 4:01 a.m. UTC | #3
On Mon, Sep 25, 2017 at 11:09:15PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 07:04 +0800, Ming Lei wrote:
> > On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> > > Some people use the md driver on laptops and use the suspend and
> > > resume functionality. Since it is essential that submitting of
> > > new I/O requests stops before device quiescing starts, make the
> > > md resync and reshape threads freezable.
> > 
> > As I explained, if SCSI quiesce is safe, this patch shouldn't
> > be needed.
> > 
> > The issue isn't MD specific, and in theory can be triggered
> > on all devices. And you can see the I/O hang report on BTRFS(RAID)
> > without MD involved:
> > 
> > 	https://marc.info/?l=linux-block&m=150634883816965&w=2
> 
> What makes you think that this patch is not necessary once SCSI quiesce
> has been made safe? Does this mean that you have not tested suspend and

If we want to make SCSI quiesce safe, we have to drain all submitted
I/O and prevent new I/O from being submitted; that is enough to deal
with MD's resync too.
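
As a rough illustration of that drain-then-block idea in terms of existing
block-layer helpers (a sketch only, not the actual mechanism used by the
patchset under discussion):

	#include <linux/blkdev.h>
	#include <linux/blk-mq.h>

	/* Sketch only: drain in-flight I/O and keep new I/O from being
	 * submitted, then undo both on resume. */
	static void example_stop_io(struct request_queue *q)
	{
		blk_mq_freeze_queue(q);		/* wait out in-flight requests,
						 * block new submitters */
		blk_mq_quiesce_queue(q);	/* stop dispatching to the driver */
	}

	static void example_restart_io(struct request_queue *q)
	{
		blk_mq_unquiesce_queue(q);
		blk_mq_unfreeze_queue(q);
	}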

> resume while md RAID 1 resync was in progress? This patch is necessary
> to prevent suspend from locking up while an md RAID 1 resync is in progress.

I tested my patchset on RAID10 with a resync in progress and did not
see any issue during suspend/resume, without any MD changes. I will
test RAID1 later, but I don't think there is a difference compared
with RAID10 because my patchset quiesces the queue completely.

The upstream report did not mention that a resync was in progress:

	https://marc.info/?l=linux-block&m=150402198315951&w=2

I want to keep the patchset (making SCSI quiesce safe) focused on the
SCSI quiesce issue.

With this MD patch the original report may become quite difficult to
reproduce, but that does not mean this MD patch fixes this kind of
issue. What I really care about is that your changes to BLOCK/SCSI
fix this issue.
Hannes Reinecke Sept. 26, 2017, 6:06 a.m. UTC | #4
On 09/25/2017 10:29 PM, Bart Van Assche wrote:
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before device quiescing starts, make the
> md resync and reshape threads freezable.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: linux-raid@vger.kernel.org
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/md/md.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
Ming Lei Sept. 26, 2017, 8:13 a.m. UTC | #5
On Tue, Sep 26, 2017 at 12:01:03PM +0800, Ming Lei wrote:
> On Mon, Sep 25, 2017 at 11:09:15PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-09-26 at 07:04 +0800, Ming Lei wrote:
> > > On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> > > > Some people use the md driver on laptops and use the suspend and
> > > > resume functionality. Since it is essential that submitting of
> > > > new I/O requests stops before device quiescing starts, make the
> > > > md resync and reshape threads freezable.
> > > 
> > > As I explained, if SCSI quiesce is safe, this patch shouldn't
> > > be needed.
> > > 
> > > The issue isn't MD specific, and in theory can be triggered
> > > on all devices. And you can see the I/O hang report on BTRFS(RAID)
> > > without MD involved:
> > > 
> > > 	https://marc.info/?l=linux-block&m=150634883816965&w=2
> > 
> > What makes you think that this patch is not necessary once SCSI quiesce
> > has been made safe? Does this mean that you have not tested suspend and
> 
> If we want to make SCSI quiesce safe, we have to drain all submitted
> I/O and prevent new I/O from being submitted; that is enough to deal
> with MD's resync too.
> 
> > resume while md RAID 1 resync was in progress? This patch is necessary
> > to prevent suspend from locking up while an md RAID 1 resync is in progress.
> 
> I tested my patchset on RAID10 with a resync in progress and did not
> see any issue during suspend/resume, without any MD changes. I will
> test RAID1 later, but I don't think there is a difference compared
> with RAID10 because my patchset quiesces the queue completely.

I am pretty sure that suspend/resume can survive while a resync is in
progress with my patchset applied on RAID1, without any MD changes.

There are also reports about suspend/resume on btrfs (RAID) and about the
revalidate path of a scsi_transport_spi device, so again the issue isn't
MD specific.

If your patchset depends on this MD change, something should be wrong
in the following patches. Now I need to take a close look.
Ming Lei Sept. 26, 2017, 11:17 a.m. UTC | #6
On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before device quiescing starts, make the
> md resync and reshape threads freezable.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: linux-raid@vger.kernel.org
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/md/md.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 08fcaebc61bd..26a12bd0db65 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -66,6 +66,7 @@
>  #include <linux/raid/md_u.h>
>  #include <linux/slab.h>
>  #include <linux/percpu-refcount.h>
> +#include <linux/freezer.h>
>  
>  #include <trace/events/block.h>
>  #include "md.h"
> @@ -7424,6 +7425,7 @@ static int md_thread(void *arg)
>  	 */
>  
>  	allow_signal(SIGKILL);
> +	set_freezable();
>  	while (!kthread_should_stop()) {
>  
>  		/* We need to wait INTERRUPTIBLE so that
> @@ -7434,7 +7436,7 @@ static int md_thread(void *arg)
>  		if (signal_pending(current))
>  			flush_signals(current);
>  
> -		wait_event_interruptible_timeout
> +		wait_event_freezable_timeout
>  			(thread->wqueue,
>  			 test_bit(THREAD_WAKEUP, &thread->flags)
>  			 || kthread_should_stop() || kthread_should_park(),
> @@ -8133,6 +8135,8 @@ void md_do_sync(struct md_thread *thread)
>  		return;
>  	}
>  
> +	set_freezable();
> +
>  	if (mddev_is_clustered(mddev)) {
>  		ret = md_cluster_ops->resync_start(mddev);
>  		if (ret)
> @@ -8324,7 +8328,7 @@ void md_do_sync(struct md_thread *thread)
>  		     mddev->curr_resync_completed > mddev->resync_max
>  			    )) {
>  			/* time to update curr_resync_completed */
> -			wait_event(mddev->recovery_wait,
> +			wait_event_freezable(mddev->recovery_wait,
>  				   atomic_read(&mddev->recovery_active) == 0);
>  			mddev->curr_resync_completed = j;
>  			if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) &&
> @@ -8342,10 +8346,10 @@ void md_do_sync(struct md_thread *thread)
>  			 * to avoid triggering warnings.
>  			 */
>  			flush_signals(current); /* just in case */
> -			wait_event_interruptible(mddev->recovery_wait,
> -						 mddev->resync_max > j
> -						 || test_bit(MD_RECOVERY_INTR,
> -							     &mddev->recovery));
> +			wait_event_freezable(mddev->recovery_wait,
> +					     mddev->resync_max > j ||
> +					     test_bit(MD_RECOVERY_INTR,
> +						      &mddev->recovery));
>  		}
>  
>  		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
> @@ -8421,7 +8425,7 @@ void md_do_sync(struct md_thread *thread)
>  				 * Give other IO more of a chance.
>  				 * The faster the devices, the less we wait.
>  				 */
> -				wait_event(mddev->recovery_wait,
> +				wait_event_freezable(mddev->recovery_wait,
>  					   !atomic_read(&mddev->recovery_active));
>  			}
>  		}
> @@ -8433,7 +8437,8 @@ void md_do_sync(struct md_thread *thread)
>  	 * this also signals 'finished resyncing' to md_stop
>  	 */
>  	blk_finish_plug(&plug);
> -	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
> +	wait_event_freezable(mddev->recovery_wait,
> +			     !atomic_read(&mddev->recovery_active));
>  
>  	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
>  	    !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
> -- 
> 2.14.1
> 

I just tested this patch a bit and the following task-freezing failure
is triggered during suspend:

[   38.903513] PM: suspend entry (deep)
[   38.904443] PM: Syncing filesystems ... done.
[   38.983591] Freezing user space processes ... (elapsed 0.002 seconds) done.
[   38.987522] OOM killer disabled.
[   38.987962] Freezing remaining freezable tasks ...
[   58.998872] Freezing of tasks failed after 20.008 seconds (1 tasks refusing to freeze, wq_busy=0):
[   59.002539] md127_resync    D    0  1618      2 0x80000000
[   59.004954] Call Trace:
[   59.006162]  __schedule+0x41f/0xa50
[   59.007704]  schedule+0x3d/0x90
[   59.009305]  raid1_sync_request+0x2da/0xd10 [raid1]
[   59.011505]  ? remove_wait_queue+0x70/0x70
[   59.013352]  md_do_sync+0xdfa/0x12c0
[   59.014955]  ? remove_wait_queue+0x70/0x70
[   59.016336]  md_thread+0x1a8/0x1e0
[   59.016770]  ? md_thread+0x1a8/0x1e0
[   59.017250]  kthread+0x155/0x190
[   59.017662]  ? sync_speed_show+0xa0/0xa0
[   59.018217]  ? kthread_create_on_node+0x70/0x70
[   59.018858]  ret_from_fork+0x2a/0x40
[   59.019403] Restarting kernel threads ... done.
[   59.024586] OOM killer enabled.
[   59.025124] Restarting tasks ... done.
[   59.045906] PM: suspend exit
[   97.919428] systemd-journald[227]: Sent WATCHDOG=1 notification.
[  101.002695] md: md127: data-check done.
Bart Van Assche Sept. 26, 2017, 2:40 p.m. UTC | #7
On Tue, 2017-09-26 at 16:13 +0800, Ming Lei wrote:
> I am pretty sure that suspend/resume can survive while a resync is in
> progress with my patchset applied on RAID1, without any MD changes.

The above shows that you do not understand how suspend and resume works.
In the documents in the directory Documentation/power it is explained clearly
that I/O must be stopped before the hibernation image is generated to avoid
hard-to-repair filesystem corruption. Although md is not a filesystem, I think
this also applies to md, since md keeps some state information in RAM and some
state information on disk. It is essential that all of that state information
remains consistent.

> If your patchset depends on this MD change, something should be wrong
> in the following patches. Now I need to take a close look.

The later patches that make SCSI quiesce and resume safe do not depend on
this change.

Bart.
Bart Van Assche Sept. 26, 2017, 2:42 p.m. UTC | #8
On Tue, 2017-09-26 at 19:17 +0800, Ming Lei wrote:
> I just tested this patch a bit and the following task-freezing failure
> is triggered during suspend: [ ... ]

What kernel version did you start from and which patches were applied on top of
that kernel? Only patch 1/7 or all seven patches? What storage configuration did
you use in your test and what command(s) did you use to trigger suspend?

Bart.
Ming Lei Sept. 26, 2017, 2:59 p.m. UTC | #9
On Tue, Sep 26, 2017 at 02:42:07PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 19:17 +0800, Ming Lei wrote:
> > I just tested this patch a bit and the following task-freezing failure
> > is triggered during suspend: [ ... ]
> 
> What kernel version did you start from and which patches were applied on top of
> that kernel? Only patch 1/7 or all seven patches? What storage configuration did

It is v4.14-rc1+, and top commit is 8d93c7a43157, with all your 7 patches
applied.

> you use in your test and what command(s) did you use to trigger suspend?

My PM test script follows:

	#!/bin/sh
	
	echo check > /sys/block/md127/md/sync_action
	
	mkfs.ext4 -F /dev/md127
	
	mount /dev/md127 /mnt/data
	
	dd if=/dev/zero of=/mnt/data/d1.img bs=4k count=128k&
	
	echo 9 > /proc/sys/kernel/printk
	echo devices > /sys/power/pm_test
	echo mem > /sys/power/state
	
	wait
	umount /mnt/data

Storage setting:

	sudo mdadm --create /dev/md/test /dev/sda /dev/sdb --level=1 --raid-devices=2
	Both /dev/sda and /dev/sdb are virtio-scsi disks.
Ming Lei Sept. 26, 2017, 3:02 p.m. UTC | #10
On Tue, Sep 26, 2017 at 02:40:36PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 16:13 +0800, Ming Lei wrote:
> > I am pretty sure that suspend/resume can survive when resync in progress
> > with my patchset applied on RAID1, without any MD change.
> 
> The above shows that you do not understand how suspend and resume works.
> In the documents in the directory Documentation/power it is explained clearly
> that I/O must be stopped before the hibernation image is generated to avoid

No, I don't use hibernation, just suspend/resume (s2r).

> hard-to-repair filesystem corruption. Although md is not a filesystem, I think
> this also applies to md, since md keeps some state information in RAM and some
> state information on disk. It is essential that all of that state information
> remains consistent.
> 
> > If your patchset depends on this MD change, something should be wrong
> > in the following patches. Now I need to take a close look.
> 
> The later patches that make SCSI quiesce and resume safe do not depend on
> this change.

Are you sure?

If I remove the 1st patch, system suspend/resume will hang even with all
your other 6 patches applied.
Christoph Hellwig Oct. 2, 2017, 1:26 p.m. UTC | #11
Independent of whether this should be required to make SCSI quiesce
safe or not, making the md thread freezer-aware is the right thing
to do, and this patch looks correct to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Patch

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 08fcaebc61bd..26a12bd0db65 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -66,6 +66,7 @@ 
 #include <linux/raid/md_u.h>
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
+#include <linux/freezer.h>
 
 #include <trace/events/block.h>
 #include "md.h"
@@ -7424,6 +7425,7 @@  static int md_thread(void *arg)
 	 */
 
 	allow_signal(SIGKILL);
+	set_freezable();
 	while (!kthread_should_stop()) {
 
 		/* We need to wait INTERRUPTIBLE so that
@@ -7434,7 +7436,7 @@  static int md_thread(void *arg)
 		if (signal_pending(current))
 			flush_signals(current);
 
-		wait_event_interruptible_timeout
+		wait_event_freezable_timeout
 			(thread->wqueue,
 			 test_bit(THREAD_WAKEUP, &thread->flags)
 			 || kthread_should_stop() || kthread_should_park(),
@@ -8133,6 +8135,8 @@  void md_do_sync(struct md_thread *thread)
 		return;
 	}
 
+	set_freezable();
+
 	if (mddev_is_clustered(mddev)) {
 		ret = md_cluster_ops->resync_start(mddev);
 		if (ret)
@@ -8324,7 +8328,7 @@  void md_do_sync(struct md_thread *thread)
 		     mddev->curr_resync_completed > mddev->resync_max
 			    )) {
 			/* time to update curr_resync_completed */
-			wait_event(mddev->recovery_wait,
+			wait_event_freezable(mddev->recovery_wait,
 				   atomic_read(&mddev->recovery_active) == 0);
 			mddev->curr_resync_completed = j;
 			if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) &&
@@ -8342,10 +8346,10 @@  void md_do_sync(struct md_thread *thread)
 			 * to avoid triggering warnings.
 			 */
 			flush_signals(current); /* just in case */
-			wait_event_interruptible(mddev->recovery_wait,
-						 mddev->resync_max > j
-						 || test_bit(MD_RECOVERY_INTR,
-							     &mddev->recovery));
+			wait_event_freezable(mddev->recovery_wait,
+					     mddev->resync_max > j ||
+					     test_bit(MD_RECOVERY_INTR,
+						      &mddev->recovery));
 		}
 
 		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
@@ -8421,7 +8425,7 @@  void md_do_sync(struct md_thread *thread)
 				 * Give other IO more of a chance.
 				 * The faster the devices, the less we wait.
 				 */
-				wait_event(mddev->recovery_wait,
+				wait_event_freezable(mddev->recovery_wait,
 					   !atomic_read(&mddev->recovery_active));
 			}
 		}
@@ -8433,7 +8437,8 @@  void md_do_sync(struct md_thread *thread)
 	 * this also signals 'finished resyncing' to md_stop
 	 */
 	blk_finish_plug(&plug);
-	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
+	wait_event_freezable(mddev->recovery_wait,
+			     !atomic_read(&mddev->recovery_active));
 
 	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&