Message ID | 1401045323-7116-1-git-send-email-minyard@acm.org (mailing list archive) |
---|---|
State | New, archived |
On Sunday 25 May 2014 14:15:23 minyard@acm.org wrote:
> From: Corey Minyard <cminyard@mvista.com>
>
> The CPU will go to idle with interrupts off, but the interrupts
> will wake up the idle. This was causing very long irqsoff trace
> values because, basically, the whole idle time was traces with
> irqs off, even though they weren't really off. Rework the idle
> code to turn hardirq tracing to on befor calling idle.
>
> Signed-off-by: Corey Minyard <cminyard@mvista.com>
> ---
> arch/arm/kernel/process.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> I'm not sure this is correct for all ARM boards, but it fixes the
> issue for the Vexpress board I have. No more 4 second irqsoff
> times.

If this patch is correct for ARM, I wonder if it should be done
in architecture independent code instead. Do you see the same
thing on x86? If not, do you know what's different?

It seems like the right thing to do, I just don't understand
why nobody hit this before.

	Arnd
On 05/26/2014 04:26 AM, Arnd Bergmann wrote:
> On Sunday 25 May 2014 14:15:23 minyard@acm.org wrote:
>> From: Corey Minyard <cminyard@mvista.com>
>>
>> The CPU will go to idle with interrupts off, but the interrupts
>> will wake up the idle. This was causing very long irqsoff trace
>> values because, basically, the whole idle time was traces with
>> irqs off, even though they weren't really off. Rework the idle
>> code to turn hardirq tracing to on befor calling idle.
>>
>> Signed-off-by: Corey Minyard <cminyard@mvista.com>
>> ---
>> arch/arm/kernel/process.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> I'm not sure this is correct for all ARM boards, but it fixes the
>> issue for the Vexpress board I have. No more 4 second irqsoff
>> times.
> If this patch is correct for ARM, I wonder if it should be done
> in architecture independent code instead. Do you see the same
> thing on x86? If not, do you know what's different?

MIPS doesn't idle with IRQs disabled.

x86 calls trace_cpu_idle_rcuidle, but it doesn't re-enable interrupts at
the end of the idle routine. Adding those trace calls to ARM doesn't
seem to make any difference, though.

> It seems like the right thing to do, I just don't understand
> why nobody hit this before.

Yeah, I'm a little confused by that, too. The RT guys use ARM
extensively, why haven't they seen this? Of course, if you run your RT
tests under heavy load, you won't see this since you will never go idle.

-corey
On Tuesday 27 May 2014, Corey Minyard wrote:
> On 05/26/2014 04:26 AM, Arnd Bergmann wrote:
> > On Sunday 25 May 2014 14:15:23 minyard@acm.org wrote:
> >> From: Corey Minyard <cminyard@mvista.com>
> >>
> >> The CPU will go to idle with interrupts off, but the interrupts
> >> will wake up the idle. This was causing very long irqsoff trace
> >> values because, basically, the whole idle time was traces with
> >> irqs off, even though they weren't really off. Rework the idle
> >> code to turn hardirq tracing to on befor calling idle.
> >>
> >> Signed-off-by: Corey Minyard <cminyard@mvista.com>
> >> ---
> >> arch/arm/kernel/process.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> I'm not sure this is correct for all ARM boards, but it fixes the
> >> issue for the Vexpress board I have. No more 4 second irqsoff
> >> times.
> > If this patch is correct for ARM, I wonder if it should be done
> > in architecture independent code instead. Do you see the same
> > thing on x86? If not, do you know what's different?
>
> MIPS doesn't idle with IRQs disabled.

Looking at the mips code, I see a couple of variants of the idle function:

* r3081_wait(), r39xx_wait(), and rm7k_wait_irqoff() correctly sleep
  before enabling the IRQs

* r4k_wait first enables the IRQs and then waits, while a second
  r4k_wait_irqoff() variant does the opposite, and a comment
  explains that the first is wrong but the second may be broken
  on some implementations.

* can't tell what au1k_wait() does, but it also seems to sleep first.

> x86 calls trace_cpu_idle_rcuidle, but it doesn't re-enable interrupts at
> the end of the idle routine. Adding those trace calls to ARM doesn't
> seem to make any difference, though.

trace_cpu_idle_rcuidle() is unrelated, I also checked that.

The comment in x86 arch_safe_halt() says "Used in the idle loop; sti
takes one instruction cycle" and the implementation is "sti; hlt",
where "sti" enables the interrupts at the same time as "hlt"
puts the cpu in low-power mode.

I don't understand what happens when ACPI is used, but I would expect
it also ends up calling mwait or hlt with IRQs off. Can you try
booting an x86 machine with acpi=off?

> > It seems like the right thing to do, I just don't understand
> > why nobody hit this before.
>
> Yeah, I'm a little confused by that, too. The RT guys use ARM
> extensively, why haven't they seen this? Of course, if you run your RT
> tests under heavy load, you won't see this since you will never go idle.

How exactly do you see this manifest? If it's clear that the
trace comes from the idle function, maybe everybody just ignores it?

	Arnd
On 26.05.2014 11:26, Arnd Bergmann wrote:

> It seems like the right thing to do, I just don't understand
> why nobody hit this before.

Looks like this is what I did hit a month ago and
was not able to find the culprit:
http://www.spinics.net/lists/linux-rt-users/msg11656.html

> How exactly do you see this manifest? If it's clear
> that the trace comes from the idle function, maybe
> everybody just ignores it?

That was definitely what I did ;)
On Tuesday 27 May 2014 18:38:33 Stanislav Meduna wrote:
> On 26.05.2014 11:26, Arnd Bergmann wrote:
>
> > It seems like the right thing to do, I just don't understand
> > why nobody hit this before.
>
> Looks like this is what I did hit a month ago and
> was not able to find the culprit:
> http://www.spinics.net/lists/linux-rt-users/msg11656.html

Ok, so you get a 2.2ms latency on i.mx28 here, which comes from
exactly the same code location, and is quite alarming by itself, but
much less so than the 4 second latency that Corey is seeing. This
could just be an artifact of i.mx28 not staying idle for as long as
vexpress.

> > How exactly do you see this manifest? If it's clear
> > that the trace comes from the idle function, maybe
> > everybody just ignores it?
>
> That was definitely what I did

You also commented in that thread about stop_critical_timings()/
start_critical_timings(). Corey, can you look at that, too? I
think it's designed to avoid the issue you are seeing but
for some reason doesn't. Maybe the problem is that the tracing
infrastructure gets confused when the irq_disable() happens in a
different function from irq_enable().

	Arnd
On 05/27/2014 11:16 AM, Arnd Bergmann wrote:
> On Tuesday 27 May 2014, Corey Minyard wrote:
>> On 05/26/2014 04:26 AM, Arnd Bergmann wrote:
>>> On Sunday 25 May 2014 14:15:23 minyard@acm.org wrote:
>>>> From: Corey Minyard <cminyard@mvista.com>
>>>>
>>>> The CPU will go to idle with interrupts off, but the interrupts
>>>> will wake up the idle. This was causing very long irqsoff trace
>>>> values because, basically, the whole idle time was traces with
>>>> irqs off, even though they weren't really off. Rework the idle
>>>> code to turn hardirq tracing to on befor calling idle.
>>>>
>>>> Signed-off-by: Corey Minyard <cminyard@mvista.com>
>>>> ---
>>>> arch/arm/kernel/process.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> I'm not sure this is correct for all ARM boards, but it fixes the
>>>> issue for the Vexpress board I have. No more 4 second irqsoff
>>>> times.
>>> If this patch is correct for ARM, I wonder if it should be done
>>> in architecture independent code instead. Do you see the same
>>> thing on x86? If not, do you know what's different?
>> MIPS doesn't idle with IRQs disabled.
> Looking at the mips code, I see a couple of variants of the idle function:
>
> * r3081_wait(), r39xx_wait(), and rm7k_wait_irqoff() correctly sleep
>   before enabling the IRQs
>
> * r4k_wait first enables the IRQs and then waits, while a second
>   r4k_wait_irqoff() variant does the opposite, and a comment
>   explains that the first is wrong but the second may be broken
>   on some implementations.
>
> * can't tell what au1k_wait() does, but it also seems to sleep first.

Yes, it seems so. The system I have uses the r4k variant that has to
enable interrupts first. That would seem to me to lead to other issues,
but neither my testers nor I have seen any problems. All the RT
scheduling tests we have come back clean on MIPS.

>
>> x86 calls trace_cpu_idle_rcuidle, but it doesn't re-enable interrupts at
>> the end of the idle routine. Adding those trace calls to ARM doesn't
>> seem to make any difference, though.
> trace_cpu_idle_rcuidle() is unrelated, I also checked that.
>
> The comment in x86 arch_safe_halt() says "Used in the idle loop; sti
> takes one instruction cycle" and the implementation is "sti; hlt",
> where "sti" enables the interrupts at the same time as "hlt"
> puts the cpu in low-power mode.
>
> I don't understand what happens when ACPI is used, but I would expect
> it also ends up calling mwait or hlt with IRQs off. Can you try
> booting an x86 machine with acpi=off?

I can't get my system to boot with acpi=off. It crashes setting up the
IOAPIC. This is current 3.10 stable with the RT patch applied. But I
obviously don't see the issue on x86 with acpi on.

>>> It seems like the right thing to do, I just don't understand
>>> why nobody hit this before.
>> Yeah, I'm a little confused by that, too. The RT guys use ARM
>> extensively, why haven't they seen this? Of course, if you run your RT
>> tests under heavy load, you won't see this since you will never go idle.
> How exactly do you see this manifest? If it's clear that the
> trace comes from the idle function, maybe everybody just ignores it?

It is clear that the trace comes from the idle function, but it will
quickly mask any other real irqsoff problem.

-corey
On 05/27/14 11:49, Arnd Bergmann wrote:
>
> You also commented in that thread about stop_critical_timings()/
> start_critical_timings(). Corey, can you look at that, too? I
> think it's designed to avoid the issue you are seeing but
> for some reason doesn't.

I sent a patch last week to "solve" this problem. I'm not sure if it's
right but it works for me.

https://lkml.org/lkml/2014/5/19/607
On Tuesday 27 May 2014 11:53:59 Stephen Boyd wrote:
> On 05/27/14 11:49, Arnd Bergmann wrote:
> >
> > You also commented in that thread about stop_critical_timings()/
> > start_critical_timings(). Corey, can you look at that, too? I
> > think it's designed to avoid the issue you are seeing but
> > for some reason doesn't.
>
> I sent a patch last week to "solve" this problem. I'm not sure if it's
> right but it works for me.
>
> https://lkml.org/lkml/2014/5/19/607

I think that one was also wrong, as the intention of the existing
stop_critical_timings() function is already to do the same that
Corey's patch does, i.e. stop the trace before we go to idle as
if we were turning IRQs on.

Corey, does it work for you if you replace the new trace_hardirqs_on()
you added with time_hardirqs_on() or stop_critical_timing()?

	Arnd
On 05/27/14 12:27, Arnd Bergmann wrote:
> On Tuesday 27 May 2014 11:53:59 Stephen Boyd wrote:
>> On 05/27/14 11:49, Arnd Bergmann wrote:
>>> You also commented in that thread about stop_critical_timings()/
>>> start_critical_timings(). Corey, can you look at that, too? I
>>> think it's designed to avoid the issue you are seeing but
>>> for some reason doesn't.
>> I sent a patch last week to "solve" this problem. I'm not sure if it's
>> right but it works for me.
>>
>> https://lkml.org/lkml/2014/5/19/607
> I think that one was also wrong, as the intention of the existing
> stop_critical_timings() function is already to do the same that
> Corey's patch does, i.e. stop the trace before we go to idle as
> if we were turning IRQs on.

stop_critical_timings() is called in the generic idle loop. It looks
like stop_critical_timings() just isn't written correctly. All it does
is turn off the tracer, but it doesn't prevent future calls to
spinlocks, etc. from causing the tracer to turn on again between calls
to stop/start_critical_timings(). It seems better to prevent any more
tracing from happening between a call to stop_critical_timings() and
start_critical_timings() so we don't have to litter calls to that
function throughout the idle path.
On Tuesday 27 May 2014 12:33:38 Stephen Boyd wrote:
> On 05/27/14 12:27, Arnd Bergmann wrote:
> > On Tuesday 27 May 2014 11:53:59 Stephen Boyd wrote:
> >> On 05/27/14 11:49, Arnd Bergmann wrote:
> >>> You also commented in that thread about stop_critical_timings()/
> >>> start_critical_timings(). Corey, can you look at that, too? I
> >>> think it's designed to avoid the issue you are seeing but
> >>> for some reason doesn't.
> >> I sent a patch last week to "solve" this problem. I'm not sure if it's
> >> right but it works for me.
> >>
> >> https://lkml.org/lkml/2014/5/19/607
> > I think that one was also wrong, as the intention of the existing
> > stop_critical_timings() function is already to do the same that
> > Corey's patch does, i.e. stop the trace before we go to idle as
> > if we were turning IRQs on.
>
> stop_critical_timings() is called in the generic idle loop. It looks
> like stop_critical_timings() just isn't written correctly. All it does
> is turn off the tracer, but it doesn't prevent future calls to
> spinlocks, etc. from causing the tracer to turn on again between calls
> to stop/start_critical_timings(). It seems better to prevent any more
> tracing from happening between a call to stop_critical_timings() and
> start_critical_timings() so we don't have to litter calls to that
> function throughout the idle path.

But are there any such calls in the idle function? I understand what
you are doing in your patch, but I don't see why you have to actually
do it.

	Arnd
On 05/27/14 12:39, Arnd Bergmann wrote:
> On Tuesday 27 May 2014 12:33:38 Stephen Boyd wrote:
>> On 05/27/14 12:27, Arnd Bergmann wrote:
>>> On Tuesday 27 May 2014 11:53:59 Stephen Boyd wrote:
>>>> On 05/27/14 11:49, Arnd Bergmann wrote:
>>>>> You also commented in that thread about stop_critical_timings()/
>>>>> start_critical_timings(). Corey, can you look at that, too? I
>>>>> think it's designed to avoid the issue you are seeing but
>>>>> for some reason doesn't.
>>>> I sent a patch last week to "solve" this problem. I'm not sure if it's
>>>> right but it works for me.
>>>>
>>>> https://lkml.org/lkml/2014/5/19/607
>>> I think that one was also wrong, as the intention of the existing
>>> stop_critical_timings() function is already to do the same that
>>> Corey's patch does, i.e. stop the trace before we go to idle as
>>> if we were turning IRQs on.
>> stop_critical_timings() is called in the generic idle loop. It looks
>> like stop_critical_timings() just isn't written correctly. All it does
>> is turn off the tracer, but it doesn't prevent future calls to
>> spinlocks, etc. from causing the tracer to turn on again between calls
>> to stop/start_critical_timings(). It seems better to prevent any more
>> tracing from happening between a call to stop_critical_timings() and
>> start_critical_timings() so we don't have to litter calls to that
>> function throughout the idle path.
> But are there any such calls in the idle function? I understand what
> you are doing in your patch, but I don't see why you have to actually
> do it.
>

Yes, in my build it happens immediately after in rcu_idle_enter(). It
would also happen if you had a cpuidle driver that notified the core
that the timers stop during idle (FEAT_C3_STOP). In that case we would
call clockevents_notify() which will call raw_spin_lock_irqsave().
On 05/27/2014 02:27 PM, Arnd Bergmann wrote:
> On Tuesday 27 May 2014 11:53:59 Stephen Boyd wrote:
>> On 05/27/14 11:49, Arnd Bergmann wrote:
>>> You also commented in that thread about stop_critical_timings()/
>>> start_critical_timings(). Corey, can you look at that, too? I
>>> think it's designed to avoid the issue you are seeing but
>>> for some reason doesn't.
>> I sent a patch last week to "solve" this problem. I'm not sure if it's
>> right but it works for me.
>>
>> https://lkml.org/lkml/2014/5/19/607
> I think that one was also wrong, as the intention of the existing
> stop_critical_timings() function is already to do the same that
> Corey's patch does, i.e. stop the trace before we go to idle as
> if we were turning IRQs on.
>
> Corey, does it work for you if you replace the new trace_hardirqs_on()
> you added with time_hardirqs_on() or stop_critical_timing()?

Well, more information on this. It turns out that the generic idle loop
calls stop_critical_timings() and start_critical_timings(), so the
arch_cpu_idle() shouldn't have to.

However, the idle loop calls rcu_idle_enter() after it calls
stop_critical_timings(), and that is resetting the critical timing, it
appears. It's disabling/enabling interrupts in rcu_idle_enter(). If I
switch the order of the rcu_idle and critical timing calls, the issue
goes away.

Stephen's patch does not seem to be necessary for my issue. I tried with
the patch applied, too. It doesn't seem to hurt, at least. It did not
fix the problem by itself, though.

-corey
On Tuesday 27 May 2014 19:28:06 Corey Minyard wrote:
> On 05/27/2014 02:27 PM, Arnd Bergmann wrote:
> > On Tuesday 27 May 2014 11:53:59 Stephen Boyd wrote:
> >> On 05/27/14 11:49, Arnd Bergmann wrote:
> >>> You also commented in that thread about stop_critical_timings()/
> >>> start_critical_timings(). Corey, can you look at that, too? I
> >>> think it's designed to avoid the issue you are seeing but
> >>> for some reason doesn't.
> >> I sent a patch last week to "solve" this problem. I'm not sure if it's
> >> right but it works for me.
> >>
> >> https://lkml.org/lkml/2014/5/19/607
> > I think that one was also wrong, as the intention of the existing
> > stop_critical_timings() function is already to do the same that
> > Corey's patch does, i.e. stop the trace before we go to idle as
> > if we were turning IRQs on.
> >
> > Corey, does it work for you if you replace the new trace_hardirqs_on()
> > you added with time_hardirqs_on() or stop_critical_timing()?
>
> Well, more information on this. It turns out that the generic idle loop
> calls stop_critical_timings() and start_critical_timings(), so the
> arch_cpu_idle() shouldn't have to.
>
> However, the idle loop calls rcu_idle_enter() after it calls
> stop_critical_timings(), and that is resetting the critical timing, it
> appears. It's disabling/enabling interrupts in rcu_idle_enter(). If I
> switch the order of the rcu_idle and critical timing calls, the issue
> goes away.
>
> Stephen's patch does not seem to be necessary for my issue. I tried with
> the patch applied, too. It doesn't seem to hurt, at least. It did not
> fix the problem by itself, though.

Interesting, it looked like the "big hammer" solution to me (compared
to yours) that should deal with the problem in any case.

	Arnd
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 1ca6cf1..92413af 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -133,11 +133,12 @@ void (*arm_pm_idle)(void);
 
 static void default_idle(void)
 {
+	trace_hardirqs_on();
 	if (arm_pm_idle)
 		arm_pm_idle();
 	else
 		cpu_do_idle();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_prepare(void)