Message ID | 1682985701-19914-1-git-send-email-zhouzhouyi@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [V2] rcu/torture replace wait_event with wait_event_idle | expand |
On Tue, May 02, 2023 at 08:01:41AM +0800, zhouzhouyi@gmail.com wrote: > From: Zhouyi Zhou <zhouzhouyi@gmail.com> > > In kfree_rcu_test, kfree_scale_shutdown will be detected as hung task > if kfree_loops is too big. Replace wait_event with wait_event_idle > to avoid false positive. > > Tested in the PPC VM of Open Source Lab of Oregon State University. > > Suggested-by: Paul E. McKenney <paulmck@kernel.org> > Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Good catch, thank you! However, this commit beat you to it: ef1ef3d47677 ("rcuscale: Move shutdown from wait_event() to wait_event_idle()") Thanx, Paul > --- > kernel/rcu/rcuscale.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c > index 91fb5905a008..d99c586939d1 100644 > --- a/kernel/rcu/rcuscale.c > +++ b/kernel/rcu/rcuscale.c > @@ -771,7 +771,7 @@ kfree_scale_cleanup(void) > static int > kfree_scale_shutdown(void *arg) > { > - wait_event(shutdown_wq, > + wait_event_idle(shutdown_wq, > atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads); > > smp_mb(); /* Wake before output. */ > -- > 2.34.1 >
On Tue, May 2, 2023 at 9:40 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, May 02, 2023 at 08:01:41AM +0800, zhouzhouyi@gmail.com wrote: > > From: Zhouyi Zhou <zhouzhouyi@gmail.com> > > > > In kfree_rcu_test, kfree_scale_shutdown will be detected as hung task > > if kfree_loops is too big. Replace wait_event with wait_event_idle > > to avoid false positive. > > > > Tested in the PPC VM of Open Source Lab of Oregon State University. > > > > Suggested-by: Paul E. McKenney <paulmck@kernel.org> > > Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> > > Good catch, thank you! > > However, this commit beat you to it: > > ef1ef3d47677 ("rcuscale: Move shutdown from wait_event() to wait_event_idle()") You are very welcome ;-) Still, this is a very fruitful learning process for me ;-) Cheers Zhouyi > > Thanx, Paul > > > --- > > kernel/rcu/rcuscale.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c > > index 91fb5905a008..d99c586939d1 100644 > > --- a/kernel/rcu/rcuscale.c > > +++ b/kernel/rcu/rcuscale.c > > @@ -771,7 +771,7 @@ kfree_scale_cleanup(void) > > static int > > kfree_scale_shutdown(void *arg) > > { > > - wait_event(shutdown_wq, > > + wait_event_idle(shutdown_wq, > > atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads); > > > > smp_mb(); /* Wake before output. */ > > -- > > 2.34.1 > >
> On May 2, 2023, at 9:50 AM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote: > > On Tue, May 2, 2023 at 9:40 PM Paul E. McKenney <paulmck@kernel.org> wrote: >> >>> On Tue, May 02, 2023 at 08:01:41AM +0800, zhouzhouyi@gmail.com wrote: >>> From: Zhouyi Zhou <zhouzhouyi@gmail.com> >>> >>> In kfree_rcu_test, kfree_scale_shutdown will be detected as hung task >>> if kfree_loops is too big. Replace wait_event with wait_event_idle >>> to avoid false positive. >>> >>> Tested in the PPC VM of Open Source Lab of Oregon State University. >>> >>> Suggested-by: Paul E. McKenney <paulmck@kernel.org> >>> Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> >> >> Good catch, thank you! >> >> However, this commit beat you to it: >> >> ef1ef3d47677 ("rcuscale: Move shutdown from wait_event() to wait_event_idle()") > You are very welcome ;-) Still, this is a very fruitful learning > process for me ;-) Speaking of learning, can you explain why it fixes the issue? ;-) Your change log lacked the real reason but that is Ok since change logs can only tell you so much. I admit I myself did not know the reason till I read some code. Quiz: why exactly does this all work out in the end even though the hung task detector saw it hung? Thanks, - Joel > > Cheers > Zhouyi >> >> Thanx, Paul >> >>> --- >>> kernel/rcu/rcuscale.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c >>> index 91fb5905a008..d99c586939d1 100644 >>> --- a/kernel/rcu/rcuscale.c >>> +++ b/kernel/rcu/rcuscale.c >>> @@ -771,7 +771,7 @@ kfree_scale_cleanup(void) >>> static int >>> kfree_scale_shutdown(void *arg) >>> { >>> - wait_event(shutdown_wq, >>> + wait_event_idle(shutdown_wq, >>> atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads); >>> >>> smp_mb(); /* Wake before output. */ >>> -- >>> 2.34.1 >>>
Thank Joel for your guidance ;-) On Tue, May 2, 2023 at 11:53 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > > > > On May 2, 2023, at 9:50 AM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote: > > > > On Tue, May 2, 2023 at 9:40 PM Paul E. McKenney <paulmck@kernel.org> wrote: > >> > >>> On Tue, May 02, 2023 at 08:01:41AM +0800, zhouzhouyi@gmail.com wrote: > >>> From: Zhouyi Zhou <zhouzhouyi@gmail.com> > >>> > >>> In kfree_rcu_test, kfree_scale_shutdown will be detected as hung task > >>> if kfree_loops is too big. Replace wait_event with wait_event_idle > >>> to avoid false positive. > >>> > >>> Tested in the PPC VM of Open Source Lab of Oregon State University. > >>> > >>> Suggested-by: Paul E. McKenney <paulmck@kernel.org> > >>> Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> > >> > >> Good catch, thank you! > >> > >> However, this commit beat you to it: > >> > >> ef1ef3d47677 ("rcuscale: Move shutdown from wait_event() to wait_event_idle()") > > You are very welcome ;-) Still, this is a very fruitful learning > > process for me ;-) > > Speaking of learning, can you explain why it fixes the issue? ;-) > > Your change log lacked the real reason but that is Ok since change logs can only tell you so much. I admit I myself did not know the reason till I read some code. When I found "blocked for more than" report in console.log, I did a grep in kernel source tree, and found check_hung_task will only be invoked from check_hung_uninterruptible_tasks when (state & TASK_UNINTERRUPTIBLE) and !(state & TASK_WAKEKILL) and !(state & TASK_NOLOAD), so I send my first patch [1], and Paul tell me wait_event_idle is the right answer, then I do a grep for recursively expansion of macro wait_event_idle, which finally set current->state to TASK_IDLE ((TASK_UNINTERRUPTIBLE | TASK_NOLOAD)), then I conclude that check_hung_task will not be called. Yes, _interruptible() variant indicates that wakeups due to things like POSIX signals are permitted, which are not our case. I should investigate more on what wait_event_idle really means, instead of merely grep and debug what wait_event_idle does. I should write my change log more clearly and explain why wait_event_idle works. I discovered [1] during the testing of the patch, this is also a very fruitful learning process. I will write my change log more indicative next time, and do better work to our RCU community (and thanks for remove linux-kernel from CC list ;-)) Thanks Zhouyi [1] Link: https://lore.kernel.org/lkml/CAABZP2yS5=ZUwEZQ7iHkV0wDm_HgO8K-TeAhyJrZhavzKDa44Q@mail.gmail.com/T/ On Tue, May 2, 2023 at 11:53 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > > > > On May 2, 2023, at 9:50 AM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote: > > > > On Tue, May 2, 2023 at 9:40 PM Paul E. McKenney <paulmck@kernel.org> wrote: > >> > >>> On Tue, May 02, 2023 at 08:01:41AM +0800, zhouzhouyi@gmail.com wrote: > >>> From: Zhouyi Zhou <zhouzhouyi@gmail.com> > >>> > >>> In kfree_rcu_test, kfree_scale_shutdown will be detected as hung task > >>> if kfree_loops is too big. Replace wait_event with wait_event_idle > >>> to avoid false positive. > >>> > >>> Tested in the PPC VM of Open Source Lab of Oregon State University. > >>> > >>> Suggested-by: Paul E. McKenney <paulmck@kernel.org> > >>> Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> > >> > >> Good catch, thank you! > >> > >> However, this commit beat you to it: > >> > >> ef1ef3d47677 ("rcuscale: Move shutdown from wait_event() to wait_event_idle()") > > You are very welcome ;-) Still, this is a very fruitful learning > > process for me ;-) > > Speaking of learning, can you explain why it fixes the issue? ;-) > > Your change log lacked the real reason but that is Ok since change logs can only tell you so much. I admit I myself did not know the reason till I read some code. When I found "task xxx blocked for more than" report in console.log, I did a grep in kernel source tree, and found check_hung_task will only be invoked from check_hung_uninterruptible_tasks when (state & TASK_UNINTERRUPTIBLE) and !(state & TASK_NOLOAD), so I send my first patch [1], and Paul tell me wait_event_idle is the right answer, then I do a grep for recursively expansion of macro wait_event_idle, which finally set current->state to TASK_IDLE ((TASK_UNINTERRUPTIBLE | TASK_NOLOAD)), then I conclude that [1] https://lore.kernel.org/lkml/1681264483-5208-1-git-send-email-zhouzhouyi@gmail.com/T/ > > Quiz: why exactly does this all work out in the end even though the hung task detector saw it hung? > > Thanks, > > - Joel > > > > > > Cheers > > Zhouyi > >> > >> Thanx, Paul > >> > >>> --- > >>> kernel/rcu/rcuscale.c | 2 +- > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>> > >>> diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c > >>> index 91fb5905a008..d99c586939d1 100644 > >>> --- a/kernel/rcu/rcuscale.c > >>> +++ b/kernel/rcu/rcuscale.c > >>> @@ -771,7 +771,7 @@ kfree_scale_cleanup(void) > >>> static int > >>> kfree_scale_shutdown(void *arg) > >>> { > >>> - wait_event(shutdown_wq, > >>> + wait_event_idle(shutdown_wq, > >>> atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads); > >>> > >>> smp_mb(); /* Wake before output. */ > >>> -- > >>> 2.34.1 > >>> [1] https://lore.kernel.org/lkml/1681264483-5208-1-git-send-email-zhouzhouyi@gmail.com/T/ > > Quiz: why exactly does this all work out in the end even though the hung task detector saw it hung? > > Thanks, > > - Joel > > > > > > Cheers > > Zhouyi > >> > >> Thanx, Paul > >> > >>> --- > >>> kernel/rcu/rcuscale.c | 2 +- > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>> > >>> diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c > >>> index 91fb5905a008..d99c586939d1 100644 > >>> --- a/kernel/rcu/rcuscale.c > >>> +++ b/kernel/rcu/rcuscale.c > >>> @@ -771,7 +771,7 @@ kfree_scale_cleanup(void) > >>> static int > >>> kfree_scale_shutdown(void *arg) > >>> { > >>> - wait_event(shutdown_wq, > >>> + wait_event_idle(shutdown_wq, > >>> atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads); > >>> > >>> smp_mb(); /* Wake before output. */ > >>> -- > >>> 2.34.1 > >>>
Sorry for the bad format of my previous email, there is always something wrong when I press Ctrl+C and Ctrl+V in my browser. Sorry for the inconvenience that I bring. On Tue, May 2, 2023 at 11:53 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > > > > On May 2, 2023, at 9:50 AM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote: > > > > On Tue, May 2, 2023 at 9:40 PM Paul E. McKenney <paulmck@kernel.org> wrote: > >> > >>> On Tue, May 02, 2023 at 08:01:41AM +0800, zhouzhouyi@gmail.com wrote: > >>> From: Zhouyi Zhou <zhouzhouyi@gmail.com> > >>> > >>> In kfree_rcu_test, kfree_scale_shutdown will be detected as hung task > >>> if kfree_loops is too big. Replace wait_event with wait_event_idle > >>> to avoid false positive. > >>> > >>> Tested in the PPC VM of Open Source Lab of Oregon State University. > >>> > >>> Suggested-by: Paul E. McKenney <paulmck@kernel.org> > >>> Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> > >> > >> Good catch, thank you! > >> > >> However, this commit beat you to it: > >> > >> ef1ef3d47677 ("rcuscale: Move shutdown from wait_event() to wait_event_idle()") > > You are very welcome ;-) Still, this is a very fruitful learning > > process for me ;-) > > Speaking of learning, can you explain why it fixes the issue? ;-) > > Your change log lacked the real reason but that is Ok since change logs can only tell you so much. I admit I myself did not know the reason till I read some code. > > Quiz: why exactly does this all work out in the end even though the hung task detector saw it hung? When I found "blocked for more than" report in console.log, I did a grep in kernel source tree, and found check_hung_task will only be invoked from check_hung_uninterruptible_tas ks when (state & TASK_UNINTERRUPTIBLE) and !(state & TASK_WAKEKILL) and !(state & TASK_NOLOAD), so I send my first patch [1], and Paul tell me wait_event_idle is the right answer, then I do a grep for recursively expansion of macro wait_event_idle, which finally set current->state to TASK_IDLE ((TASK_UNINTERRUPTIBLE | TASK_NOLOAD)), then I conclude that check_hung_task will not be called. Yes, _interruptible() variant indicates that wakeups due to things like POSIX signals are permitted, which are not our case. I should investigate more on what wait_event_idle really means, instead of merely grep and debug what wait_event_idle does. I should write my change log more clearly and explain why wait_event_idle works. I discovered [2] during the testing of the patch, this is also a very fruitful learning process. I will write my change log more indicative next time, and do better work to our RCU community (and thanks for remove linux-kernel from CC list ;-)) Thanks Zhouyi [1] https://lore.kernel.org/lkml/1681264483-5208-1-git-send-email-zhouzhouyi@gmail.com/T/ [2] https://lore.kernel.org/lkml/CAABZP2yS5=ZUwEZQ7iHkV0wDm_HgO8K-TeAhyJrZhavzKDa44Q@mail.gmail.com/T/ > Thanks, > > - Joel > > > > > > Cheers > > Zhouyi > >> > >> Thanx, Paul > >> > >>> --- > >>> kernel/rcu/rcuscale.c | 2 +- > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>> > >>> diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c > >>> index 91fb5905a008..d99c586939d1 100644 > >>> --- a/kernel/rcu/rcuscale.c > >>> +++ b/kernel/rcu/rcuscale.c > >>> @@ -771,7 +771,7 @@ kfree_scale_cleanup(void) > >>> static int > >>> kfree_scale_shutdown(void *arg) > >>> { > >>> - wait_event(shutdown_wq, > >>> + wait_event_idle(shutdown_wq, > >>> atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads); > >>> > >>> smp_mb(); /* Wake before output. */ > >>> -- > >>> 2.34.1 > >>>
On Tue, May 2, 2023 at 12:30 PM Zhouyi Zhou <zhouzhouyi@gmail.com> wrote: > > Sorry for the bad format of my previous email, there is always > something wrong when I press Ctrl+C and Ctrl+V in my browser. > Sorry for the inconvenience that I bring. > > On Tue, May 2, 2023 at 11:53 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > > > > > > > > On May 2, 2023, at 9:50 AM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote: > > > > > > On Tue, May 2, 2023 at 9:40 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > >> > > >>> On Tue, May 02, 2023 at 08:01:41AM +0800, zhouzhouyi@gmail.com wrote: > > >>> From: Zhouyi Zhou <zhouzhouyi@gmail.com> > > >>> > > >>> In kfree_rcu_test, kfree_scale_shutdown will be detected as hung task > > >>> if kfree_loops is too big. Replace wait_event with wait_event_idle > > >>> to avoid false positive. > > >>> > > >>> Tested in the PPC VM of Open Source Lab of Oregon State University. > > >>> > > >>> Suggested-by: Paul E. McKenney <paulmck@kernel.org> > > >>> Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> > > >> > > >> Good catch, thank you! > > >> > > >> However, this commit beat you to it: > > >> > > >> ef1ef3d47677 ("rcuscale: Move shutdown from wait_event() to wait_event_idle()") > > > You are very welcome ;-) Still, this is a very fruitful learning > > > process for me ;-) > > > > Speaking of learning, can you explain why it fixes the issue? ;-) > > > > Your change log lacked the real reason but that is Ok since change logs can only tell you so much. I admit I myself did not know the reason till I read some code. > > > > Quiz: why exactly does this all work out in the end even though the hung task detector saw it hung? > When I found "blocked for more than" report in console.log, I did a > grep in kernel source tree, and found check_hung_task will only be > invoked > from check_hung_uninterruptible_tas > ks when (state & TASK_UNINTERRUPTIBLE) and !(state & TASK_WAKEKILL) > and !(state & > TASK_NOLOAD), so I send my first patch [1], and Paul tell me > wait_event_idle is the right answer, then I do a grep for recursively > expansion of macro wait_event_idle, which finally set current->state > to TASK_IDLE ((TASK_UNINTERRUPTIBLE | TASK_NOLOAD)), then I conclude > that check_hung_task will not be called. Yes, _interruptible() > variant indicates that wakeups due to things like POSIX signals are > permitted, which are not our case. That sounds right! > I should investigate more on what wait_event_idle really means, > instead of merely grep and debug what wait_event_idle does. > I should write my change log more clearly and explain why wait_event_idle works. Sounds good, the header comments around those APIs should also help. > > I discovered [2] during the testing of the patch, this is also a very > fruitful learning process. > > I will write my change log more indicative next time, and do better > work to our RCU community (and thanks for remove linux-kernel from CC > list ;-)) Sounds good, have fun! :) - Joel
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c index 91fb5905a008..d99c586939d1 100644 --- a/kernel/rcu/rcuscale.c +++ b/kernel/rcu/rcuscale.c @@ -771,7 +771,7 @@ kfree_scale_cleanup(void) static int kfree_scale_shutdown(void *arg) { - wait_event(shutdown_wq, + wait_event_idle(shutdown_wq, atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads); smp_mb(); /* Wake before output. */