Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)

Message ID 20160629164415.GG4650@linux.vnet.ibm.com (mailing list archive)
State New, archived

Commit Message

Paul E. McKenney June 29, 2016, 4:44 p.m. UTC
On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
> Hi Paul,
> 
> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:

[ . . . ]

> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> >                         pr_info(" ");
> >                         level = rnp->level;
> >                 }
> > -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
> > +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
> > +                       rnp->qsmask,
> > +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
> >         }
> >         pr_cont("\n");
> >  }
> 
> For me it always crashes during the 37th call of synchronize_sched() in
> setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
> With your and my debug code, I get:
> 
>   CPU: Testing write buffer coherency: ok
>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
>   Setting up static identity map for 0x40100000 - 0x40100058
>   cnt = 36, sync
>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
>   Brought up 2 CPUs
>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
>   CPU: All CPU(s) started in SVC mode.
>   rcu_node tree layout dump
>    0:1/0x0/0x3 ^0

Thank you for running this!

OK, so RCU knows about both CPUs (the "0x3"), and the previous
grace period has seen quiescent states from both of them (the "0x0").
That would indicate that your synchronize_sched() showed up when RCU was
idle, so it had to start a new grace period.  It also rules out failure
modes where RCU thinks that there are more CPUs than really exist.
(Don't laugh, such things have really happened.)

>   devtmpfs: initialized
>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> max_idle_ns: 19112604462750000 ns
> 
> I hope it helps. Thanks!

I am going to guess that this was the first grace period since the second
CPU came online.  When there is only one CPU online, synchronize_sched()
is a no-op.
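
(For reference, the single-CPU short-circuit looks roughly like this in
kernel/rcu/tree.c of that era; a paraphrase, not an exact quote:)

	/* Paraphrased: with at most one CPU online, every point in time is
	 * a grace period, so the synchronous wait can be skipped entirely. */
	static inline int rcu_blocking_is_gp(void)
	{
		might_sleep();  /* Catch illegal blocking contexts. */
		return num_online_cpus() <= 1;
	}

	void synchronize_sched(void)
	{
		if (rcu_blocking_is_gp())
			return;  /* No-op until a second CPU comes online. */
		/* ... otherwise wait for a full grace period ... */
	}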

OK, this showed some things that aren't a problem.  What might the
problem be?

o	The grace-period kthread has not yet started.  It -should- start
	at early_initcall() time, but who knows?  Adding code to print
	out that kthread's task_struct address.

o	The grace-period kthread might not be responding to wakeups.
	Checking this requires that a grace period be in progress,
	so please put a call_rcu_sched() just before the call to
	rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
	to my patch to print out more GP-kthread state as well.

o	One of the CPUs might not be responding to RCU.  That -should-
	result in an RCU CPU stall warning, so I will ignore this
	possibility for the moment.

	That said, do you have some way to determine whether scheduling
	clock interrupts are really happening?  Without these interrupts,
	no RCU CPU stall warnings.
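
(One hypothetical quick check, not part of any patch in this thread:
mdelay() busy-waits on the calibrated delay loop rather than sleeping, so
it works even without a functioning tick; if jiffies fails to advance
across it, scheduling-clock interrupts are not happening:)

	{
		unsigned long j = jiffies;

		mdelay(100);  /* Busy-wait; needs no working tick. */
		pr_info("jiffies advanced by %lu over ~100ms\n", jiffies - j);
	}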

OK, that should be enough for the next phase, please see the end for the
patch.  This patch applies on top of my previous one.

Could you please set this up as follows?

	struct rcu_head rh;

	rcu_dump_rcu_node_tree(&rcu_sched_state);  /* Initial state. */
	call_rcu(&rh, do_nothing_cb);
	schedule_timeout_uninterruptible(5 * HZ);  /* Or whatever delay. */
	rcu_dump_rcu_node_tree(&rcu_sched_state);  /* GP state. */
	synchronize_sched();  /* Probably hangs. */
	rcu_barrier();  /* Drop RCU's references to rh before return. */
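
(The snippet above assumes an empty do_nothing_cb(); Geert supplies
exactly that in his reply below:)

	static void do_nothing_cb(struct rcu_head *rcu_head)
	{
		/* Empty: queued only so that a grace period must start. */
	}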

							Thanx, Paul

------------------------------------------------------------------------

commit 82829ec76c2c0de18874a60ebd7ff8ee80f244d1
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed Jun 29 09:42:13 2016 -0700

    rcu: More diagnostics
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Comments

Geert Uytterhoeven June 29, 2016, 5:52 p.m. UTC | #1
Hi Paul,

On Wed, Jun 29, 2016 at 6:44 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
>> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
>
> [ . . . ]
>
>> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
>> >                         pr_info(" ");
>> >                         level = rnp->level;
>> >                 }
>> > -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
>> > +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
>> > +                       rnp->qsmask,
>> > +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
>> >         }
>> >         pr_cont("\n");
>> >  }
>>
>> For me it always crashes during the 37th call of synchronize_sched() in
>> setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
>> With your and my debug code, I get:
>>
>>   CPU: Testing write buffer coherency: ok
>>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
>>   Setting up static identity map for 0x40100000 - 0x40100058
>>   cnt = 36, sync
>>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
>>   Brought up 2 CPUs
>>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
>>   CPU: All CPU(s) started in SVC mode.
>>   rcu_node tree layout dump
>>    0:1/0x0/0x3 ^0
>
> Thank you for running this!
>
> OK, so RCU knows about both CPUs (the "0x3"), and the previous
> grace period has seen quiescent states from both of them (the "0x0").
> That would indicate that your synchronize_sched() showed up when RCU was
> idle, so it had to start a new grace period.  It also rules out failure
> modes where RCU thinks that there are more CPUs than really exist.
> (Don't laugh, such things have really happened.)
>
>>   devtmpfs: initialized
>>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
>>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
>> max_idle_ns: 19112604462750000 ns
>>
>> I hope it helps. Thanks!
>
> I am going to guess that this was the first grace period since the second
> CPU came online.  When there is only one CPU online, synchronize_sched()
> is a no-op.
>
> OK, this showed some things that aren't a problem.  What might the
> problem be?
>
> o       The grace-period kthread has not yet started.  It -should- start
>         at early_initcall() time, but who knows?  Adding code to print
>         out that kthread's task_struct address.
>
> o       The grace-period kthread might not be responding to wakeups.
>         Checking this requires that a grace period be in progress,
>         so please put a call_rcu_sched() just before the call to
>         rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
>         to my patch to print out more GP-kthread state as well.
>
> o       One of the CPUs might not be responding to RCU.  That -should-
>         result in an RCU CPU stall warning, so I will ignore this
>         possibility for the moment.
>
>         That said, do you have some way to determine whether scheduling
>         clock interrupts are really happening?  Without these interrupts,
>         no RCU CPU stall warnings.

I believe there are no clocksources yet. The jiffies clocksource is the first
clocksource found, and that happens after the first call to
synchronize_sched(), cf. my dmesg snippet above.

In a working boot:
# cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
e0180000.timer jiffies
# cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
e0180000.timer
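
(Consistent with that ordering: the jiffies fallback clocksource is
registered from a core_initcall() in kernel/time/jiffies.c, and initcalls
run only after secondary-CPU bring-up; paraphrased:)

	static int __init init_jiffies_clocksource(void)
	{
		return __clocksource_register(&clocksource_jiffies);
	}
	core_initcall(init_jiffies_clocksource);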

> OK, that should be enough for the next phase, please see the end for the
> patch.  This patch applies on top of my previous one.
>
> Could you please set this up as follows?
>
>         struct rcu_head rh;
>
>         rcu_dump_rcu_node_tree(&rcu_sched_state);  /* Initial state. */
>         call_rcu(&rh, do_nothing_cb);

I added an empty do_nothing_cb() for this:

    static void do_nothing_cb(struct rcu_head *rcu_head)
    {
    }

According to the debugging technique "comment everything out until it boots",
it now hangs in call_rcu().

>         schedule_timeout_uninterruptible(5 * HZ);  /* Or whatever delay. */
>         rcu_dump_rcu_node_tree(&rcu_sched_state);  /* GP state. */
>         synchronize_sched();  /* Probably hangs. */
>         rcu_barrier();  /* Drop RCU's references to rh before return. */

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Paul E. McKenney June 29, 2016, 6:12 p.m. UTC | #2
On Wed, Jun 29, 2016 at 07:52:06PM +0200, Geert Uytterhoeven wrote:
> Hi Paul,
> 
> On Wed, Jun 29, 2016 at 6:44 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
> >> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
> >
> > [ . . . ]
> >
> >> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> >> >                         pr_info(" ");
> >> >                         level = rnp->level;
> >> >                 }
> >> > -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
> >> > +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
> >> > +                       rnp->qsmask,
> >> > +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
> >> >         }
> >> >         pr_cont("\n");
> >> >  }
> >>
> >> For me it always crashes during the 37th call of synchronize_sched() in
> >> setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
> >> With your and my debug code, I get:
> >>
> >>   CPU: Testing write buffer coherency: ok
> >>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> >>   Setting up static identity map for 0x40100000 - 0x40100058
> >>   cnt = 36, sync
> >>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> >>   Brought up 2 CPUs
> >>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> >>   CPU: All CPU(s) started in SVC mode.
> >>   rcu_node tree layout dump
> >>    0:1/0x0/0x3 ^0
> >
> > Thank you for running this!
> >
> > OK, so RCU knows about both CPUs (the "0x3"), and the previous
> > grace period has seen quiescent states from both of them (the "0x0").
> > That would indicate that your synchronize_sched() showed up when RCU was
> > idle, so it had to start a new grace period.  It also rules out failure
> > modes where RCU thinks that there are more CPUs than really exist.
> > (Don't laugh, such things have really happened.)
> >
> >>   devtmpfs: initialized
> >>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> >>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> >> max_idle_ns: 19112604462750000 ns
> >>
> >> I hope it helps. Thanks!
> >
> > I am going to guess that this was the first grace period since the second
> > CPU came online.  When there is only one CPU online, synchronize_sched()
> > is a no-op.
> >
> > OK, this showed some things that aren't a problem.  What might the
> > problem be?
> >
> > o       The grace-period kthread has not yet started.  It -should- start
> >         at early_initcall() time, but who knows?  Adding code to print
> >         out that kthread's task_struct address.
> >
> > o       The grace-period kthread might not be responding to wakeups.
> >         Checking this requires that a grace period be in progress,
> >         so please put a call_rcu_sched() just before the call to
> >         rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
> >         to my patch to print out more GP-kthread state as well.
> >
> > o       One of the CPUs might not be responding to RCU.  That -should-
> >         result in an RCU CPU stall warning, so I will ignore this
> >         possibility for the moment.
> >
> >         That said, do you have some way to determine whether scheduling
> >         clock interrupts are really happening?  Without these interrupts,
> >         no RCU CPU stall warnings.
> 
> I believe there are no clocksources yet. The jiffies clocksource is the first
> clocksource found, and that happens after the first call to
> synchronize_sched(), cf. my dmesg snippet above.
> 
> In a working boot:
> # cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
> e0180000.timer jiffies
> # cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
> e0180000.timer

Ah!  But if there is no jiffies clocksource, then schedule_timeout()
and friends will never return, correct?  If so, I guarantee you that
synchronize_sched() will unconditionally hang.

So if I understand correctly, the fix is to get the jiffies clocksource
running before the first call to synchronize_sched().
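
(To make the dependency concrete: synchronize_sched() blocks on a
completion that fires only when the grace-period kthread finishes a grace
period, and that kthread drives its state machine with jiffies-based
timed waits.  Caller side, paraphrased from wait_rcu_gp() in
kernel/rcu/update.c of that era:)

	struct rcu_synchronize rs;

	init_rcu_head_on_stack(&rs.head);
	init_completion(&rs.completion);
	call_rcu_sched(&rs.head, wakeme_after_rcu);  /* completes rs.completion */
	wait_for_completion(&rs.completion);  /* blocks until the GP ends */
	destroy_rcu_head_on_stack(&rs.head);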

							Thanx, Paul

> > OK, that should be enough for the next phase, please see the end for the
> > patch.  This patch applies on top of my previous one.
> >
> > Could you please set this up as follows?
> >
> >         struct rcu_head rh;
> >
> >         rcu_dump_rcu_node_tree(&rcu_sched_state);  /* Initial state. */
> >         call_rcu(&rh, do_nothing_cb);
> 
> I added an empty do_nothing_cb() for this:
> 
>     static void do_nothing_cb(struct rcu_head *rcu_head)
>     {
>     }
> 
> According to the debugging technique "comment everything out until it boots",
> it now hangs in call_rcu().
> 
> >         schedule_timeout_uninterruptible(5 * HZ);  /* Or whatever delay. */
> >         rcu_dump_rcu_node_tree(&rcu_sched_state);  /* GP state. */
> >         synchronize_sched();  /* Probably hangs. */
> >         rcu_barrier();  /* Drop RCU's references to rh before return. */
> 
> Thanks!
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
>

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2eda7bece401..ff55c569473c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4712,6 +4712,11 @@ static void rcu_dump_rcu_node_tree(struct rcu_state *rsp)
 	int level = 0;
 	struct rcu_node *rnp;
 
+	pr_info("RCU: %s GP kthread: %p state: %d flags: %#x g:%ld c:%ld\n",
+		rsp->name, rsp->gp_kthread, rsp->gp_state, rsp->gp_flags,
+		(long)rsp->gpnum, (long)rsp->completed);
+	pr_info("     jiffies: %#lx  GP start: %#lx Last GP activity: %#lx\n",
+		jiffies, rsp->gp_start, rsp->gp_activity);
 	pr_info("rcu_node tree layout dump\n");
 	pr_info(" ");
 	rcu_for_each_node_breadth_first(rsp, rnp) {