ARM: Don't ever downscale loops_per_jiffy in SMP systems

Message ID	alpine.LFD.2.11.1405082058130.980@knanqh.ubzr (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> Date: Thu, 8 May 2014 21:37:15 -0400 (EDT) From: Nicolas Pitre <nicolas.pitre@linaro.org> To: Russell King - ARM Linux <linux@arm.linux.org.uk> Subject: Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems In-Reply-To: <20140508205223.GI3693@n2100.arm.linux.org.uk> Message-ID: <alpine.LFD.2.11.1405082058130.980@knanqh.ubzr> References: <1399504982-31181-1-git-send-email-dianders@chromium.org> <CAOh2x=kwaoSdj8JO0Vo08Y0dX2UfKFiF3pBV-qtp-YTyN=8H8Q@mail.gmail.com> <CAD=FV=UEGJGHrdcXgs3EZ_wFXT8GnNeCOMt7CgdiiUzBzZ5Y0A@mail.gmail.com> <alpine.LFD.2.11.1405081149190.980@knanqh.ubzr> <CAD=FV=VtcLOh51LCmafzRMdqtDrwKXCqK6gE6V7OOOUe4gjwbA@mail.gmail.com> <alpine.LFD.2.11.1405081329010.980@knanqh.ubzr> <20140508192209.GH3693@n2100.arm.linux.org.uk> <alpine.LFD.2.11.1405081600260.980@knanqh.ubzr> <20140508205223.GI3693@n2100.arm.linux.org.uk> User-Agent: Alpine 2.11 (LFD 23 2013-08-11) MIME-Version: 1.0 Cc: Stephen Boyd <sboyd@codeaurora.org>, David Riley <davidriley@chromium.org>, Stephen Warren <swarren@nvidia.com>, Marc Zyngier <marc.zyngier@arm.com>, Viresh Kumar <viresh.kumar@linaro.org>, "Rafael J. Wysocki" <rjw@rjwysocki.net>, Doug Anderson <dianders@chromium.org>, Will Deacon <will.deacon@arm.com>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, Paul Gortmaker <paul.gortmaker@windriver.com>, John Stultz <john.stultz@linaro.org>, "olof@lixom.net" <olof@lixom.net>, Santosh Shilimkar <santosh.shilimkar@ti.com>, Shawn Guo <shawn.guo@linaro.org>, Sonny Rao <sonnyrao@chromium.org>, Richard Zhao <richard.zhao@linaro.org>, "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org> Precedence: list Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Message ID

alpine.LFD.2.11.1405082058130.980@knanqh.ubzr (mailing list archive)

State

New, archived

Headers

Date: Thu, 8 May 2014 21:37:15 -0400 (EDT)
From: Nicolas Pitre <nicolas.pitre@linaro.org>
To: Russell King - ARM Linux <linux@arm.linux.org.uk>
Subject: Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
In-Reply-To: <20140508205223.GI3693@n2100.arm.linux.org.uk>
Message-ID: <alpine.LFD.2.11.1405082058130.980@knanqh.ubzr>
References: <1399504982-31181-1-git-send-email-dianders@chromium.org>
	<CAOh2x=kwaoSdj8JO0Vo08Y0dX2UfKFiF3pBV-qtp-YTyN=8H8Q@mail.gmail.com>
	<CAD=FV=UEGJGHrdcXgs3EZ_wFXT8GnNeCOMt7CgdiiUzBzZ5Y0A@mail.gmail.com>
	<alpine.LFD.2.11.1405081149190.980@knanqh.ubzr>
	<CAD=FV=VtcLOh51LCmafzRMdqtDrwKXCqK6gE6V7OOOUe4gjwbA@mail.gmail.com>
	<alpine.LFD.2.11.1405081329010.980@knanqh.ubzr>
	<20140508192209.GH3693@n2100.arm.linux.org.uk>
	<alpine.LFD.2.11.1405081600260.980@knanqh.ubzr>
	<20140508205223.GI3693@n2100.arm.linux.org.uk>
User-Agent: Alpine 2.11 (LFD 23 2013-08-11)
MIME-Version: 1.0
Cc: Stephen Boyd <sboyd@codeaurora.org>,
	David Riley <davidriley@chromium.org>, 
	Stephen Warren <swarren@nvidia.com>,
	Marc Zyngier <marc.zyngier@arm.com>, 
	Viresh Kumar <viresh.kumar@linaro.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Doug Anderson <dianders@chromium.org>, 
	Will Deacon <will.deacon@arm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Paul Gortmaker <paul.gortmaker@windriver.com>,
	John Stultz <john.stultz@linaro.org>, "olof@lixom.net" <olof@lixom.net>, 
	Santosh Shilimkar <santosh.shilimkar@ti.com>,
	Shawn Guo <shawn.guo@linaro.org>, Sonny Rao <sonnyrao@chromium.org>,
	Richard Zhao <richard.zhao@linaro.org>, 
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Precedence: list
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Commit Message

Nicolas Pitre May 9, 2014, 1:37 a.m. UTC

On Thu, 8 May 2014, Russell King - ARM Linux wrote:

> If you're in a preempt or SMP environment, provide a timer for udelay().
> IF you're in an environment with IRQs which can take a long time, use
> a timer for udelay().  If you're in an environment where the CPU clock
> can change unexpectedly, use a timer for udelay().

Longer delays are normally not a problem.  If they are, then simply 
disabling IRQs may solve it if absolutely required.  With much shorter 
delays than expected this is another story.

What about the following:

Nicolas

Comments

Doug Anderson May 9, 2014, 4:43 a.m. UTC | #1

Nicolas,

On Thu, May 8, 2014 at 6:37 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Thu, 8 May 2014, Russell King - ARM Linux wrote:
>
>> If you're in a preempt or SMP environment, provide a timer for udelay().
>> IF you're in an environment with IRQs which can take a long time, use
>> a timer for udelay().  If you're in an environment where the CPU clock
>> can change unexpectedly, use a timer for udelay().
>
> Longer delays are normally not a problem.  If they are, then simply
> disabling IRQs may solve it if absolutely required.  With much shorter
> delays than expected this is another story.
>
> What about the following:
>
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 7c4fada440..10030cc5a0 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -682,6 +682,15 @@ static int cpufreq_callback(struct notifier_block *nb,
>                         cpufreq_scale(per_cpu(l_p_j_ref, cpu),
>                                         per_cpu(l_p_j_ref_freq, cpu),
>                                         freq->new);
> +               /*
> +                * Another CPU might have called udelay() just before LPJ
> +                * and a shared CPU clock is increased.  That other CPU still
> +                * looping on the old LPJ value would return significantly
> +                * sooner than expected.  The actual fix is to provide a
> +                * timer based udelay() implementation instead.
> +                */
> +               if (freq->old < freq->new)
> +                       pr_warn_once("*** udelay() on SMP is racy and may be much shorter than expected ***\n");

I would be OK with that.  At least someone would have a clue what to do.

-Doug

Russell King - ARM Linux May 9, 2014, 9:18 a.m. UTC | #2

On Thu, May 08, 2014 at 09:37:15PM -0400, Nicolas Pitre wrote:
> On Thu, 8 May 2014, Russell King - ARM Linux wrote:
> 
> > If you're in a preempt or SMP environment, provide a timer for udelay().
> > IF you're in an environment with IRQs which can take a long time, use
> > a timer for udelay().  If you're in an environment where the CPU clock
> > can change unexpectedly, use a timer for udelay().
> 
> Longer delays are normally not a problem.  If they are, then simply 
> disabling IRQs may solve it if absolutely required.  With much shorter 
> delays than expected this is another story.
> 
> What about the following:
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 7c4fada440..10030cc5a0 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -682,6 +682,15 @@ static int cpufreq_callback(struct notifier_block *nb,
>  			cpufreq_scale(per_cpu(l_p_j_ref, cpu),
>  					per_cpu(l_p_j_ref_freq, cpu),
>  					freq->new);
> +		/*
> +		 * Another CPU might have called udelay() just before LPJ
> +		 * and a shared CPU clock is increased.  That other CPU still
> +		 * looping on the old LPJ value would return significantly
> +		 * sooner than expected.  The actual fix is to provide a
> +		 * timer based udelay() implementation instead.
> +		 */
> +		if (freq->old < freq->new)
> +			pr_warn_once("*** udelay() on SMP is racy and may be much shorter than expected ***\n");
>  	}
>  	return NOTIFY_OK;
>  }

No, because you're assuming this is just a SMP problem.  What about
preempt, where you could preempt away from a udelay loop to change
the CPU frequency, and then back again, possibly resulting in the
CPU clock rate increasing and maybe a shorter delay if the switch
from-change-clock-and-back is fast enough?  Remember that udelay()
can be used for up to 2ms delays.

Nicolas Pitre May 9, 2014, 6 p.m. UTC | #3

On Fri, 9 May 2014, Russell King - ARM Linux wrote:

> On Thu, May 08, 2014 at 09:37:15PM -0400, Nicolas Pitre wrote:
> > On Thu, 8 May 2014, Russell King - ARM Linux wrote:
> > 
> > > If you're in a preempt or SMP environment, provide a timer for udelay().
> > > IF you're in an environment with IRQs which can take a long time, use
> > > a timer for udelay().  If you're in an environment where the CPU clock
> > > can change unexpectedly, use a timer for udelay().
> > 
> > Longer delays are normally not a problem.  If they are, then simply 
> > disabling IRQs may solve it if absolutely required.  With much shorter 
> > delays than expected this is another story.
> > 
> > What about the following:
> > 
> > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> > index 7c4fada440..10030cc5a0 100644
> > --- a/arch/arm/kernel/smp.c
> > +++ b/arch/arm/kernel/smp.c
> > @@ -682,6 +682,15 @@ static int cpufreq_callback(struct notifier_block *nb,
> >  			cpufreq_scale(per_cpu(l_p_j_ref, cpu),
> >  					per_cpu(l_p_j_ref_freq, cpu),
> >  					freq->new);
> > +		/*
> > +		 * Another CPU might have called udelay() just before LPJ
> > +		 * and a shared CPU clock is increased.  That other CPU still
> > +		 * looping on the old LPJ value would return significantly
> > +		 * sooner than expected.  The actual fix is to provide a
> > +		 * timer based udelay() implementation instead.
> > +		 */
> > +		if (freq->old < freq->new)
> > +			pr_warn_once("*** udelay() on SMP is racy and may be much shorter than expected ***\n");
> >  	}
> >  	return NOTIFY_OK;
> >  }
> 
> No, because you're assuming this is just a SMP problem.  What about
> preempt, where you could preempt away from a udelay loop to change
> the CPU frequency, and then back again, possibly resulting in the
> CPU clock rate increasing and maybe a shorter delay if the switch
> from-change-clock-and-back is fast enough?  Remember that udelay()
> can be used for up to 2ms delays.

Well... that would be somewhat less likely but still possible yes.

So the only way to "solve" this might look similar in spirit to what 
Doug alluded to earlier i.e. increase a sequence number on 
CPUFREQ_PRECHANGE and increase it again on CPUFREQ_POSTCHANGE, and have 
udelay() compare the count sampled before reading lpj and after 
returning from the loop code.  When the sequence count doesn't match 
then suffice to perform some arbitrarily large extra loops.


Nicolas

Russell King - ARM Linux May 9, 2014, 6:22 p.m. UTC | #4

On Fri, May 09, 2014 at 02:00:54PM -0400, Nicolas Pitre wrote:
> On Fri, 9 May 2014, Russell King - ARM Linux wrote:
> 
> > On Thu, May 08, 2014 at 09:37:15PM -0400, Nicolas Pitre wrote:
> > > On Thu, 8 May 2014, Russell King - ARM Linux wrote:
> > > 
> > > > If you're in a preempt or SMP environment, provide a timer for udelay().
> > > > IF you're in an environment with IRQs which can take a long time, use
> > > > a timer for udelay().  If you're in an environment where the CPU clock
> > > > can change unexpectedly, use a timer for udelay().
> > > 
> > > Longer delays are normally not a problem.  If they are, then simply 
> > > disabling IRQs may solve it if absolutely required.  With much shorter 
> > > delays than expected this is another story.
> > > 
> > > What about the following:
> > > 
> > > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> > > index 7c4fada440..10030cc5a0 100644
> > > --- a/arch/arm/kernel/smp.c
> > > +++ b/arch/arm/kernel/smp.c
> > > @@ -682,6 +682,15 @@ static int cpufreq_callback(struct notifier_block *nb,
> > >  			cpufreq_scale(per_cpu(l_p_j_ref, cpu),
> > >  					per_cpu(l_p_j_ref_freq, cpu),
> > >  					freq->new);
> > > +		/*
> > > +		 * Another CPU might have called udelay() just before LPJ
> > > +		 * and a shared CPU clock is increased.  That other CPU still
> > > +		 * looping on the old LPJ value would return significantly
> > > +		 * sooner than expected.  The actual fix is to provide a
> > > +		 * timer based udelay() implementation instead.
> > > +		 */
> > > +		if (freq->old < freq->new)
> > > +			pr_warn_once("*** udelay() on SMP is racy and may be much shorter than expected ***\n");
> > >  	}
> > >  	return NOTIFY_OK;
> > >  }
> > 
> > No, because you're assuming this is just a SMP problem.  What about
> > preempt, where you could preempt away from a udelay loop to change
> > the CPU frequency, and then back again, possibly resulting in the
> > CPU clock rate increasing and maybe a shorter delay if the switch
> > from-change-clock-and-back is fast enough?  Remember that udelay()
> > can be used for up to 2ms delays.
> 
> Well... that would be somewhat less likely but still possible yes.
> 
> So the only way to "solve" this might look similar in spirit to what 
> Doug alluded to earlier i.e. increase a sequence number on 
> CPUFREQ_PRECHANGE and increase it again on CPUFREQ_POSTCHANGE, and have 
> udelay() compare the count sampled before reading lpj and after 
> returning from the loop code.  When the sequence count doesn't match 
> then suffice to perform some arbitrarily large extra loops.

I'd much prefer just printing a warning at kernel boot time to report
that the kernel is running with features which would make udelay() less
than accurate.

Remember, it should be usable for _short_ delays on slow machines as
well as other stuff, and if we're going to start throwing stuff like
the above at it, it's going to become very inefficient.

And... I go back to what I've been saying all along: use a timer in
this situation, don't rely on the loops-based udelay if you have
preempt, USB interrupts, SMP etc.

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 7c4fada440..10030cc5a0 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -682,6 +682,15 @@  static int cpufreq_callback(struct notifier_block *nb,
 			cpufreq_scale(per_cpu(l_p_j_ref, cpu),
 					per_cpu(l_p_j_ref_freq, cpu),
 					freq->new);
+		/*
+		 * Another CPU might have called udelay() just before LPJ
+		 * and a shared CPU clock is increased.  That other CPU still
+		 * looping on the old LPJ value would return significantly
+		 * sooner than expected.  The actual fix is to provide a
+		 * timer based udelay() implementation instead.
+		 */
+		if (freq->old < freq->new)
+			pr_warn_once("*** udelay() on SMP is racy and may be much shorter than expected ***\n");
 	}
 	return NOTIFY_OK;
 }

ARM: Don't ever downscale loops_per_jiffy in SMP systems

Commit Message

Comments

Patch