Message ID | ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Mike Snitzer |
Series | MD fixes for the LVM2 testsuite |
On Wed, Jan 17, 2024 at 10:21 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> This commit fixes a deadlock in the LVM2 test
> shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh

I can reproduce this issue on the 6.7 kernel. However, 4/7 and 5/7 (without 1/7-3/7) cannot fix it. I will run more tests.

Thanks,
Song
Hi,

On 2024/01/18 2:21, Mikulas Patocka wrote:
> This commit fixes a deadlock in the LVM2 test
> shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh
>
> When MD_RECOVERY_WAIT is set or when md_is_rdwr(mddev) is true, the
> function md_do_sync would not set MD_RECOVERY_DONE. Thus, stop_sync_thread
> would wait for the flag MD_RECOVERY_DONE indefinitely.
>
> Also, md_wakeup_thread_directly does nothing if the thread is waiting in
> md_thread on thread->wqueue (it wakes the thread up, the thread would
> check THREAD_WAKEUP and go to sleep again without doing anything). So,
> this commit introduces a call to md_wakeup_thread from
> md_wakeup_thread_directly.
>
> task:lvm state:D stack:0 pid:46322 tgid:46322 ppid:46079 flags:0x00004002
> Call Trace:
> <TASK>
> __schedule+0x228/0x570
> schedule+0x29/0xa0
> schedule_timeout+0x6a/0xd0
> ? timer_shutdown_sync+0x10/0x10
> stop_sync_thread+0x197/0x1c0 [md_mod]
> ? housekeeping_test_cpu+0x30/0x30
> ? table_deps+0x1b0/0x1b0 [dm_mod]
> __md_stop_writes+0x10/0xd0 [md_mod]
> md_stop_writes+0x18/0x30 [md_mod]
> raid_postsuspend+0x32/0x40 [dm_raid]
> dm_table_postsuspend_targets+0x34/0x50 [dm_mod]
> dm_suspend+0xc4/0xd0 [dm_mod]
> dev_suspend+0x186/0x2d0 [dm_mod]
> ? table_deps+0x1b0/0x1b0 [dm_mod]
> ctl_ioctl+0x2e1/0x570 [dm_mod]
> dm_ctl_ioctl+0x5/0x10 [dm_mod]
> __x64_sys_ioctl+0x85/0xa0
> do_syscall_64+0x5d/0x1a0
> entry_SYSCALL_64_after_hwframe+0x46/0x4e
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> Cc: stable@vger.kernel.org # v6.7
>
> ---
>  drivers/md/md.c    |    8 +++++++-
>  drivers/md/raid5.c |    4 ++++
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/drivers/md/md.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/md.c
> +++ linux-2.6/drivers/md/md.c
> @@ -8029,6 +8029,8 @@ static void md_wakeup_thread_directly(st
>  	if (t)
>  		wake_up_process(t->tsk);
>  	rcu_read_unlock();
> +
> +	md_wakeup_thread(thread);

This is not correct. I have already explained (it is in the code comments) what md_wakeup_thread_directly() is supposed to do.

>  }
>
>  void md_wakeup_thread(struct md_thread __rcu *thread)
> @@ -8777,10 +8779,14 @@ void md_do_sync(struct md_thread *thread
>
>  	/* just incase thread restarts... */
>  	if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
> -	    test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
> +	    test_bit(MD_RECOVERY_WAIT, &mddev->recovery)) {
> +		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
> +			set_bit(MD_RECOVERY_DONE, &mddev->recovery);

If you set MD_RECOVERY_DONE here, sync_thread will be unregistered; I don't think this is the expected behaviour. Only dm-raid uses this flag, and rs_start_reshape() already explains that it wants sync_thread to keep working later, until the table gets reloaded.

>  		return;
> +	}
>  	if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
>  		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> +		set_bit(MD_RECOVERY_DONE, &mddev->recovery);

This change looks reasonable.

Thanks,
Kuai

>  		return;
>  	}
>
>
> .
>
Index: linux-2.6/drivers/md/md.c
===================================================================
--- linux-2.6.orig/drivers/md/md.c
+++ linux-2.6/drivers/md/md.c
@@ -8029,6 +8029,8 @@ static void md_wakeup_thread_directly(st
 	if (t)
 		wake_up_process(t->tsk);
 	rcu_read_unlock();
+
+	md_wakeup_thread(thread);
 }
 
 void md_wakeup_thread(struct md_thread __rcu *thread)
@@ -8777,10 +8779,14 @@ void md_do_sync(struct md_thread *thread
 
 	/* just incase thread restarts... */
 	if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
-	    test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
+	    test_bit(MD_RECOVERY_WAIT, &mddev->recovery)) {
+		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+			set_bit(MD_RECOVERY_DONE, &mddev->recovery);
 		return;
+	}
 	if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
 		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
+		set_bit(MD_RECOVERY_DONE, &mddev->recovery);
 		return;
 	}
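To make the effect of the md_do_sync() hunk easier to follow, here is a minimal userspace sketch of just the early-return checks, before and after the patch. It is not kernel code: the model_* names, the bit numbering, and the `patched` switch are assumptions made for illustration, and the "waiter" helper only reflects the commit message's statement that stop_sync_thread waits for MD_RECOVERY_DONE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit numbers for this model only; the kernel's values differ. */
enum model_recovery_bit {
	MODEL_RECOVERY_INTR,
	MODEL_RECOVERY_DONE,
	MODEL_RECOVERY_WAIT,
	MODEL_RECOVERY_NBITS
};

static bool model_test_bit(int nr, const unsigned long *word)
{
	assert(nr >= 0 && nr < MODEL_RECOVERY_NBITS);
	return (*word >> nr) & 1UL;
}

static void model_set_bit(int nr, unsigned long *word)
{
	assert(nr >= 0 && nr < MODEL_RECOVERY_NBITS);
	*word |= 1UL << nr;
}

/*
 * Models only the early-return checks at the top of md_do_sync().
 * Returns true if the function returns early, false if it would go on
 * to do the actual sync work. 'patched' selects the behaviour proposed
 * in the diff above.
 */
static bool model_md_do_sync_entry(unsigned long *recovery, bool is_rdwr,
				   bool patched)
{
	if (model_test_bit(MODEL_RECOVERY_DONE, recovery) ||
	    model_test_bit(MODEL_RECOVERY_WAIT, recovery)) {
		if (patched && model_test_bit(MODEL_RECOVERY_INTR, recovery))
			model_set_bit(MODEL_RECOVERY_DONE, recovery);
		return true;
	}
	if (!is_rdwr) {	/* never try to sync a read-only array */
		model_set_bit(MODEL_RECOVERY_INTR, recovery);
		if (patched)
			model_set_bit(MODEL_RECOVERY_DONE, recovery);
		return true;
	}
	return false;
}

/* Per the commit message, the stopping side waits for DONE to be set. */
static bool model_waiter_can_finish(const unsigned long *recovery)
{
	return model_test_bit(MODEL_RECOVERY_DONE, recovery);
}
```

With WAIT and INTR both set, the unpatched path returns early and leaves DONE clear, so a waiter keyed on DONE never finishes; the patched path sets DONE. Note that the review comments on this patch question whether setting DONE in the WAIT case is the intended behaviour for dm-raid.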
This commit fixes a deadlock in the LVM2 test
shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh

When MD_RECOVERY_WAIT is set or when md_is_rdwr(mddev) is true, the
function md_do_sync would not set MD_RECOVERY_DONE. Thus, stop_sync_thread
would wait for the flag MD_RECOVERY_DONE indefinitely.

Also, md_wakeup_thread_directly does nothing if the thread is waiting in
md_thread on thread->wqueue (it wakes the thread up, the thread would
check THREAD_WAKEUP and go to sleep again without doing anything). So,
this commit introduces a call to md_wakeup_thread from
md_wakeup_thread_directly.

task:lvm state:D stack:0 pid:46322 tgid:46322 ppid:46079 flags:0x00004002
Call Trace:
<TASK>
__schedule+0x228/0x570
schedule+0x29/0xa0
schedule_timeout+0x6a/0xd0
? timer_shutdown_sync+0x10/0x10
stop_sync_thread+0x197/0x1c0 [md_mod]
? housekeeping_test_cpu+0x30/0x30
? table_deps+0x1b0/0x1b0 [dm_mod]
__md_stop_writes+0x10/0xd0 [md_mod]
md_stop_writes+0x18/0x30 [md_mod]
raid_postsuspend+0x32/0x40 [dm_raid]
dm_table_postsuspend_targets+0x34/0x50 [dm_mod]
dm_suspend+0xc4/0xd0 [dm_mod]
dev_suspend+0x186/0x2d0 [dm_mod]
? table_deps+0x1b0/0x1b0 [dm_mod]
ctl_ioctl+0x2e1/0x570 [dm_mod]
dm_ctl_ioctl+0x5/0x10 [dm_mod]
__x64_sys_ioctl+0x85/0xa0
do_syscall_64+0x5d/0x1a0
entry_SYSCALL_64_after_hwframe+0x46/0x4e

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Cc: stable@vger.kernel.org # v6.7

---
 drivers/md/md.c    |    8 +++++++-
 drivers/md/raid5.c |    4 ++++
 2 files changed, 11 insertions(+), 1 deletion(-)
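The wake-up problem described in the commit message can be pictured with a small single-threaded model. This is only a sketch of the behaviour as the commit message states it: struct model_thread and the model_* functions are hypothetical stand-ins, not the kernel's struct md_thread, wake_up_process(), or md_wakeup_thread(), and no real scheduling or wait queue is involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the md thread state; model use only. */
struct model_thread {
	bool wakeup_flag;	/* stands in for THREAD_WAKEUP */
	int handler_runs;	/* how many times the thread body ran */
};

/* Models a bare wake-up: the task becomes runnable, but no flag is set. */
static void model_wake_directly(struct model_thread *t)
{
	assert(t);
	/* intentionally leaves wakeup_flag untouched */
}

/* Models a wake-up that first sets the flag the wait loop checks. */
static void model_wake_with_flag(struct model_thread *t)
{
	assert(t);
	t->wakeup_flag = true;
}

/*
 * One pass of the wait loop after the thread has been woken.
 * Returns true if the handler ran, false if the thread just went
 * back to sleep because the flag was not set.
 */
static bool model_thread_pass(struct model_thread *t)
{
	assert(t);
	if (!t->wakeup_flag)
		return false;		/* condition not met: sleep again */
	t->wakeup_flag = false;		/* test-and-clear, then do the work */
	t->handler_runs++;
	return true;
}
```

In this model, model_wake_directly() followed by model_thread_pass() does no work, while model_wake_with_flag() followed by model_thread_pass() runs the handler once. Whether md_wakeup_thread_directly() should behave this way in the kernel is disputed in the review of this patch.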