| Message ID | 20170925202924.16603-2-bart.vanassche@wdc.com (mailing list archive) |
|---|---|
| State | New, archived |
On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before device quiescing starts, make the
> md resync and reshape threads freezable.

As I explained, if SCSI quiesce is safe, this patch shouldn't
be needed.

The issue isn't MD specific, and in theory can be triggered
on all devices. And you can see the I/O hang report on BTRFS(RAID)
without MD involved:

https://marc.info/?l=linux-block&m=150634883816965&w=2

So please remove this patch from the patchset.

--
Ming
On Tue, 2017-09-26 at 07:04 +0800, Ming Lei wrote:
> On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> > Some people use the md driver on laptops and use the suspend and
> > resume functionality. Since it is essential that submitting of
> > new I/O requests stops before device quiescing starts, make the
> > md resync and reshape threads freezable.
>
> As I explained, if SCSI quiesce is safe, this patch shouldn't
> be needed.
>
> The issue isn't MD specific, and in theory can be triggered
> on all devices. And you can see the I/O hang report on BTRFS(RAID)
> without MD involved:
>
> https://marc.info/?l=linux-block&m=150634883816965&w=2

What makes you think that this patch is not necessary once SCSI quiesce
has been made safe? Does this mean that you have not tested suspend and
resume while md RAID 1 resync was in progress? This patch is necessary
to avoid that suspend locks up while md RAID 1 resync is in progress.

Bart.
On Mon, Sep 25, 2017 at 11:09:15PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 07:04 +0800, Ming Lei wrote:
> > On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> > > Some people use the md driver on laptops and use the suspend and
> > > resume functionality. Since it is essential that submitting of
> > > new I/O requests stops before device quiescing starts, make the
> > > md resync and reshape threads freezable.
> >
> > As I explained, if SCSI quiesce is safe, this patch shouldn't
> > be needed.
> >
> > The issue isn't MD specific, and in theory can be triggered
> > on all devices. And you can see the I/O hang report on BTRFS(RAID)
> > without MD involved:
> >
> > https://marc.info/?l=linux-block&m=150634883816965&w=2
>
> What makes you think that this patch is not necessary once SCSI quiesce
> has been made safe? Does this mean that you have not tested suspend and

If we want to make SCSI quiesce safe, we have to drain up all submitted
I/O and prevent new I/O from being submitted, that is enough to deal
with MD's resync too.

> resume while md RAID 1 resync was in progress? This patch is necessary
> to avoid that suspend locks up while md RAID 1 resync is in progress.

I tested my patchset on RAID10 when resync in progress, not see any
issue during suspend/resume, without any MD's change. I will test
RAID1's later, but I don't think there is difference compared with
RAID10 because my patchset can make the queue being quiesced totally.

The upstream report did not mention that the resync is in progress:

https://marc.info/?l=linux-block&m=150402198315951&w=2

I want to keep the patchset (make SCSI quiesce safe) focused on this
SCSI quiesce issue. Maybe with this MD patch the original report becomes
quite difficult to reproduce, but that doesn't mean this MD patch fixes
this kind of issue. What I really care about is whether your changes on
BLOCK/SCSI can fix this issue.
On 09/25/2017 10:29 PM, Bart Van Assche wrote:
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before device quiescing starts, make the
> md resync and reshape threads freezable.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: linux-raid@vger.kernel.org
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/md/md.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
On Tue, Sep 26, 2017 at 12:01:03PM +0800, Ming Lei wrote:
> On Mon, Sep 25, 2017 at 11:09:15PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-09-26 at 07:04 +0800, Ming Lei wrote:
> > > On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> > > > Some people use the md driver on laptops and use the suspend and
> > > > resume functionality. Since it is essential that submitting of
> > > > new I/O requests stops before device quiescing starts, make the
> > > > md resync and reshape threads freezable.
> > >
> > > As I explained, if SCSI quiesce is safe, this patch shouldn't
> > > be needed.
> > >
> > > The issue isn't MD specific, and in theory can be triggered
> > > on all devices. And you can see the I/O hang report on BTRFS(RAID)
> > > without MD involved:
> > >
> > > https://marc.info/?l=linux-block&m=150634883816965&w=2
> >
> > What makes you think that this patch is not necessary once SCSI quiesce
> > has been made safe? Does this mean that you have not tested suspend and
>
> If we want to make SCSI quiesce safe, we have to drain up all submitted
> I/O and prevent new I/O from being submitted, that is enough to deal
> with MD's resync too.
>
> > resume while md RAID 1 resync was in progress? This patch is necessary
> > to avoid that suspend locks up while md RAID 1 resync is in progress.
>
> I tested my patchset on RAID10 when resync in progress, not see any
> issue during suspend/resume, without any MD's change. I will test
> RAID1's later, but I don't think there is difference compared with
> RAID10 because my patchset can make the queue being quiesced totally.

I am pretty sure that suspend/resume can survive when resync in progress
with my patchset applied on RAID1, without any MD change.

There are reports about suspend/resume on BTRFS (RAID) and about the
revalidate path in the scsi_transport_spi device, so again, the issue
isn't MD specific.

If your patchset depends on this MD change, something should be wrong
in the following patches. Now I need to take a close look.
On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before device quiescing starts, make the
> md resync and reshape threads freezable.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: linux-raid@vger.kernel.org
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/md/md.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 08fcaebc61bd..26a12bd0db65 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -66,6 +66,7 @@
>  #include <linux/raid/md_u.h>
>  #include <linux/slab.h>
>  #include <linux/percpu-refcount.h>
> +#include <linux/freezer.h>
>
>  #include <trace/events/block.h>
>  #include "md.h"
> @@ -7424,6 +7425,7 @@ static int md_thread(void *arg)
>  	 */
>
>  	allow_signal(SIGKILL);
> +	set_freezable();
>  	while (!kthread_should_stop()) {
>
>  		/* We need to wait INTERRUPTIBLE so that
> @@ -7434,7 +7436,7 @@ static int md_thread(void *arg)
>  		if (signal_pending(current))
>  			flush_signals(current);
>
> -		wait_event_interruptible_timeout
> +		wait_event_freezable_timeout
>  			(thread->wqueue,
>  			 test_bit(THREAD_WAKEUP, &thread->flags)
>  			 || kthread_should_stop() || kthread_should_park(),
> @@ -8133,6 +8135,8 @@ void md_do_sync(struct md_thread *thread)
>  		return;
>  	}
>
> +	set_freezable();
> +
>  	if (mddev_is_clustered(mddev)) {
>  		ret = md_cluster_ops->resync_start(mddev);
>  		if (ret)
> @@ -8324,7 +8328,7 @@ void md_do_sync(struct md_thread *thread)
>  		     mddev->curr_resync_completed > mddev->resync_max
>  			    )) {
>  			/* time to update curr_resync_completed */
> -			wait_event(mddev->recovery_wait,
> +			wait_event_freezable(mddev->recovery_wait,
>  				   atomic_read(&mddev->recovery_active) == 0);
>  			mddev->curr_resync_completed = j;
>  			if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) &&
> @@ -8342,10 +8346,10 @@ void md_do_sync(struct md_thread *thread)
>  			 * to avoid triggering warnings.
>  			 */
>  			flush_signals(current); /* just in case */
> -			wait_event_interruptible(mddev->recovery_wait,
> -						 mddev->resync_max > j
> -						 || test_bit(MD_RECOVERY_INTR,
> -							     &mddev->recovery));
> +			wait_event_freezable(mddev->recovery_wait,
> +					     mddev->resync_max > j ||
> +					     test_bit(MD_RECOVERY_INTR,
> +						      &mddev->recovery));
>  		}
>
>  		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
> @@ -8421,7 +8425,7 @@ void md_do_sync(struct md_thread *thread)
>  			 * Give other IO more of a chance.
>  			 * The faster the devices, the less we wait.
>  			 */
> -			wait_event(mddev->recovery_wait,
> +			wait_event_freezable(mddev->recovery_wait,
>  				   !atomic_read(&mddev->recovery_active));
>  		}
>  	}
> @@ -8433,7 +8437,8 @@ void md_do_sync(struct md_thread *thread)
>  	 * this also signals 'finished resyncing' to md_stop
>  	 */
>  	blk_finish_plug(&plug);
> -	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
> +	wait_event_freezable(mddev->recovery_wait,
> +			     !atomic_read(&mddev->recovery_active));
>
>  	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
>  	    !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
> --
> 2.14.1
>

Just test this patch a bit and the following failure of freezing task
is triggered during suspend:

[ 38.903513] PM: suspend entry (deep)
[ 38.904443] PM: Syncing filesystems ... done.
[ 38.983591] Freezing user space processes ... (elapsed 0.002 seconds) done.
[ 38.987522] OOM killer disabled.
[ 38.987962] Freezing remaining freezable tasks ...
[ 58.998872] Freezing of tasks failed after 20.008 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 59.002539] md127_resync    D    0  1618      2 0x80000000
[ 59.004954] Call Trace:
[ 59.006162]  __schedule+0x41f/0xa50
[ 59.007704]  schedule+0x3d/0x90
[ 59.009305]  raid1_sync_request+0x2da/0xd10 [raid1]
[ 59.011505]  ? remove_wait_queue+0x70/0x70
[ 59.013352]  md_do_sync+0xdfa/0x12c0
[ 59.014955]  ? remove_wait_queue+0x70/0x70
[ 59.016336]  md_thread+0x1a8/0x1e0
[ 59.016770]  ? md_thread+0x1a8/0x1e0
[ 59.017250]  kthread+0x155/0x190
[ 59.017662]  ? sync_speed_show+0xa0/0xa0
[ 59.018217]  ? kthread_create_on_node+0x70/0x70
[ 59.018858]  ret_from_fork+0x2a/0x40
[ 59.019403] Restarting kernel threads ... done.
[ 59.024586] OOM killer enabled.
[ 59.025124] Restarting tasks ... done.
[ 59.045906] PM: suspend exit
[ 97.919428] systemd-journald[227]: Sent WATCHDOG=1 notification.
[ 101.002695] md: md127: data-check done.
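
The call trace above suggests the resync thread is sleeping inside
raid1_sync_request() itself, below the md.c wait sites this patch
converts, which would be consistent with the freezer still timing out.
As background: a kernel thread only freezes if it opts in with
set_freezable() and then sleeps in a freezable wait (or periodically
calls try_to_freeze()); a task blocked in a plain wait_event() never
reaches that point and is reported as refusing to freeze once the
20-second timeout expires. A minimal sketch of the two wait styles,
with made-up names (demo_wait, demo_pending) rather than raid1 or md
code:

#include <linux/wait.h>
#include <linux/freezer.h>
#include <linux/atomic.h>

static DECLARE_WAIT_QUEUE_HEAD(demo_wait);
static atomic_t demo_pending = ATOMIC_INIT(0);

/*
 * Sleeps in TASK_UNINTERRUPTIBLE and never calls try_to_freeze(), so
 * the freezer cannot make progress while the task waits here.
 */
static void demo_wait_nonfreezable(void)
{
	wait_event(demo_wait, atomic_read(&demo_pending) == 0);
}

/*
 * Freezable variant: if a freeze is requested while the condition is
 * still false, the task enters the refrigerator and re-checks the
 * condition after thaw.  For a kernel thread this also requires an
 * earlier set_freezable(), which the patch adds in md_thread() and
 * md_do_sync().
 */
static void demo_wait_freezable(void)
{
	wait_event_freezable(demo_wait, atomic_read(&demo_pending) == 0);
}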
On Tue, 2017-09-26 at 16:13 +0800, Ming Lei wrote:
> I am pretty sure that suspend/resume can survive when resync in progress
> with my patchset applied on RAID1, without any MD change.

The above shows that you do not understand how suspend and resume works.
In the documents in the directory Documentation/power it is explained clearly
that I/O must be stopped before the hibernation image is generated to avoid
hard-to-repair filesystem corruption. Although md is not a filesystem I think
this also applies to md since md keeps some state information in RAM and some
state information on disk. It is essential that all that state information is
consistent.

> If your patchset depends on this MD change, something should be wrong
> in the following patches. Now I need to take a close look.

The later patches that make SCSI quiesce and resume safe do not depend on
this change.

Bart.
On Tue, 2017-09-26 at 19:17 +0800, Ming Lei wrote:
> Just test this patch a bit and the following failure of freezing task
> is triggered during suspend: [ ... ]

What kernel version did you start from and which patches were applied on top of
that kernel? Only patch 1/7 or all seven patches? What storage configuration did
you use in your test and what command(s) did you use to trigger suspend?

Bart.
On Tue, Sep 26, 2017 at 02:42:07PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 19:17 +0800, Ming Lei wrote:
> > Just test this patch a bit and the following failure of freezing task
> > is triggered during suspend: [ ... ]
>
> What kernel version did you start from and which patches were applied on top of
> that kernel? Only patch 1/7 or all seven patches? What storage configuration did

It is v4.14-rc1+, and top commit is 8d93c7a43157, with all your 7 patches
applied.

> you use in your test and what command(s) did you use to trigger suspend?

Follows my pm test script:

#!/bin/sh
echo check > /sys/block/md127/md/sync_action
mkfs.ext4 -F /dev/md127
mount /dev/md0 /mnt/data
dd if=/dev/zero of=/mnt/data/d1.img bs=4k count=128k&
echo 9 > /proc/sys/kernel/printk
echo devices > /sys/power/pm_test
echo mem > /sys/power/state
wait
umount /mnt/data

Storage setting:

sudo mdadm --create /dev/md/test /dev/sda /dev/sdb --level=1 --raid-devices=2

both /dev/sda and /dev/sdb are virtio-scsi.
On Tue, Sep 26, 2017 at 02:40:36PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 16:13 +0800, Ming Lei wrote:
> > I am pretty sure that suspend/resume can survive when resync in progress
> > with my patchset applied on RAID1, without any MD change.
>
> The above shows that you do not understand how suspend and resume works.
> In the documents in the directory Documentation/power it is explained clearly
> that I/O must be stopped before the hibernation image is generated to avoid

No, I don't use hibernation, I just use suspend/resume (s2r).

> hard-to-repair filesystem corruption. Although md is not a filesystem I think
> this also applies to md since md keeps some state information in RAM and some
> state information on disk. It is essential that all that state information is
> consistent.
>
> > If your patchset depends on this MD change, something should be wrong
> > in the following patches. Now I need to take a close look.
>
> The later patches that make SCSI quiesce and resume safe do not depend on
> this change.

Are you sure? If I remove the 1st patch, system suspend/resume will hang
with all your other 6 patches applied.
Independent of whether this should be required to make SCSI quiesce
safe or not, making the md thread freeze-aware is the right thing
to do, and this patch looks correct to me:
Reviewed-by: Christoph Hellwig <hch@lst.de>
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 08fcaebc61bd..26a12bd0db65 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -66,6 +66,7 @@
 #include <linux/raid/md_u.h>
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
+#include <linux/freezer.h>
 
 #include <trace/events/block.h>
 #include "md.h"
@@ -7424,6 +7425,7 @@ static int md_thread(void *arg)
 	 */
 
 	allow_signal(SIGKILL);
+	set_freezable();
 	while (!kthread_should_stop()) {
 
 		/* We need to wait INTERRUPTIBLE so that
@@ -7434,7 +7436,7 @@ static int md_thread(void *arg)
 		if (signal_pending(current))
 			flush_signals(current);
 
-		wait_event_interruptible_timeout
+		wait_event_freezable_timeout
 			(thread->wqueue,
 			 test_bit(THREAD_WAKEUP, &thread->flags)
 			 || kthread_should_stop() || kthread_should_park(),
@@ -8133,6 +8135,8 @@ void md_do_sync(struct md_thread *thread)
 		return;
 	}
 
+	set_freezable();
+
 	if (mddev_is_clustered(mddev)) {
 		ret = md_cluster_ops->resync_start(mddev);
 		if (ret)
@@ -8324,7 +8328,7 @@ void md_do_sync(struct md_thread *thread)
 		     mddev->curr_resync_completed > mddev->resync_max
 			    )) {
 			/* time to update curr_resync_completed */
-			wait_event(mddev->recovery_wait,
+			wait_event_freezable(mddev->recovery_wait,
 				   atomic_read(&mddev->recovery_active) == 0);
 			mddev->curr_resync_completed = j;
 			if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) &&
@@ -8342,10 +8346,10 @@ void md_do_sync(struct md_thread *thread)
 			 * to avoid triggering warnings.
 			 */
 			flush_signals(current); /* just in case */
-			wait_event_interruptible(mddev->recovery_wait,
-						 mddev->resync_max > j
-						 || test_bit(MD_RECOVERY_INTR,
-							     &mddev->recovery));
+			wait_event_freezable(mddev->recovery_wait,
+					     mddev->resync_max > j ||
+					     test_bit(MD_RECOVERY_INTR,
+						      &mddev->recovery));
 		}
 
 		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
@@ -8421,7 +8425,7 @@ void md_do_sync(struct md_thread *thread)
 			 * Give other IO more of a chance.
 			 * The faster the devices, the less we wait.
 			 */
-			wait_event(mddev->recovery_wait,
+			wait_event_freezable(mddev->recovery_wait,
 				   !atomic_read(&mddev->recovery_active));
 		}
 	}
@@ -8433,7 +8437,8 @@ void md_do_sync(struct md_thread *thread)
 	 * this also signals 'finished resyncing' to md_stop
 	 */
 	blk_finish_plug(&plug);
-	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
+	wait_event_freezable(mddev->recovery_wait,
+			     !atomic_read(&mddev->recovery_active));
 
 	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
Some people use the md driver on laptops and use the suspend and
resume functionality. Since it is essential that submitting of
new I/O requests stops before device quiescing starts, make the
md resync and reshape threads freezable.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/md/md.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
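
The conversion in the diff above is an instance of the generic
freezable-kthread pattern: opt the thread into the freezer once with
set_freezable(), then sleep only in freezable wait primitives so the
thread parks in the refrigerator during suspend instead of holding up
the freezer. A minimal sketch of that pattern, with made-up names
(worker_thread, work_wq, WORK_PENDING) rather than the md code:

#include <linux/kthread.h>
#include <linux/freezer.h>
#include <linux/wait.h>
#include <linux/bitops.h>
#include <linux/jiffies.h>

static DECLARE_WAIT_QUEUE_HEAD(work_wq);
static unsigned long work_flags;
#define WORK_PENDING	0

static int worker_thread(void *arg)
{
	/* Kernel threads are not freezable by default; opt in once. */
	set_freezable();

	while (!kthread_should_stop()) {
		/*
		 * Freezable counterpart of wait_event_interruptible_timeout():
		 * if the freezer runs while we sleep here, the thread is
		 * frozen and the wait resumes after thaw.
		 */
		wait_event_freezable_timeout(work_wq,
					     test_bit(WORK_PENDING, &work_flags) ||
					     kthread_should_stop(),
					     5 * HZ);

		if (test_and_clear_bit(WORK_PENDING, &work_flags)) {
			/* ... do one unit of work ... */
		}
	}
	return 0;
}

/* Producer side: mark work pending and wake the thread. */
static void worker_kick(void)
{
	set_bit(WORK_PENDING, &work_flags);
	wake_up(&work_wq);
}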