| Message ID | 20110620142338.GL2082@n2100.arm.linux.org.uk (mailing list archive) |
|---|---|
| State | New, archived |

On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 12:40:19PM +0100, Russell King - ARM Linux wrote:
>> Ok. So loops_per_jiffy must be too small. My guess is you're using an
>> older kernel without 71c696b1 (calibrate: extract fall-back calculation
>> into own helper).
>
> Right, this commit above helps show the problem - and it's fairly subtle.
>
> It's a race condition. Let's first look at the spinlock debugging code.
> It does this:
>
> static void __spin_lock_debug(raw_spinlock_t *lock)
> {
> 	u64 i;
> 	u64 loops = loops_per_jiffy * HZ;
>
> 	for (;;) {
> 		for (i = 0; i < loops; i++) {
> 			if (arch_spin_trylock(&lock->raw_lock))
> 				return;
> 			__delay(1);
> 		}
> 		/* print warning */
> 	}
> }
>
> If loops_per_jiffy is zero, we never try to grab the spinlock, because
> we never enter the inner for loop. We immediately print a warning,
> and re-execute the outer loop for ever, resulting in the CPU locking up
> in this condition.
>
> In theory, we should never see a zero loops_per_jiffy value, because it
> represents the number of loops __delay() needs to delay by one jiffy and
> clearly zero makes no sense.
>
> However, calibrate_delay() does this (which x86 and ARM call on secondary
> CPU startup):
>
> calibrate_delay()
> {
> ...
> 	if (preset_lpj) {
> 	} else if ((!printed) && lpj_fine) {
> 	} else if ((loops_per_jiffy = calibrate_delay_direct()) != 0) {
> 	} else {
> 		/* approximation/convergence stuff */
> 	}
> }
>
> Now, before 71c696b, this used to be:
>
> 	} else {
> 		loops_per_jiffy = (1<<12);
>
> So the window between calibrate_delay_direct() returning and setting
> loops_per_jiffy to zero, and the re-initialization of loops_per_jiffy,
> was relatively short (maybe even the compiler optimized away the zero
> write.)
>
> However, after 71c696b, this now does:
>
> 	} else {
> 		if (!printed)
> 			pr_info("Calibrating delay loop... ");
> +		loops_per_jiffy = calibrate_delay_converge();
>
> So, as loops_per_jiffy is not local to this function, the compiler has
> to write out that zero value before calling calibrate_delay_converge(),
> and loops_per_jiffy only becomes non-zero _after_ calibrate_delay_converge()
> has returned. This opens the window and allows the spinlock debugging
> code to explode.
>
> This patch closes the window completely, by writing to loops_per_jiffy
> only when we have a real value for it.
>
> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
> without this it fails with spinlock lockup and rcu problems.
>
>  init/calibrate.c |   14 ++++++++------
>  1 files changed, 8 insertions(+), 6 deletions(-)
>
I am away from my board now. Will test this change.
btw, the online-active race is still open even with this patch
and should be fixed.

Regards
Santosh

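To make the failure mode described above concrete, here is a minimal user-space C sketch (not kernel code) of the __spin_lock_debug() pattern. The names spin_lock_debug_sketch() and arch_spin_trylock_stub() are invented stand-ins for illustration: with loops_per_jiffy still zero, the inner loop body never runs, so the lock is never even attempted and the outer loop would spin printing lockup warnings forever (the demo stops after three passes).

```c
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's global and primitives. */
static unsigned long loops_per_jiffy;	/* zero during the race window */
#define HZ 100

static int arch_spin_trylock_stub(void)
{
	return 0;			/* pretend another CPU holds the lock */
}

/* Mirrors the shape of __spin_lock_debug(): if loops == 0, the inner
 * loop never executes, so the lock is never tried and every pass of
 * the outer loop just prints a lockup warning. */
static void spin_lock_debug_sketch(void)
{
	unsigned long long i, loops = (unsigned long long)loops_per_jiffy * HZ;
	int passes = 0;

	for (;;) {
		for (i = 0; i < loops; i++)
			if (arch_spin_trylock_stub())
				return;
		printf("BUG: spinlock lockup suspected (loops=%llu)\n", loops);
		if (++passes == 3)	/* the kernel would spin here forever */
			return;
	}
}

int main(void)
{
	loops_per_jiffy = 0;		/* the window opened by calibrate_delay() */
	spin_lock_debug_sketch();
	return 0;
}
```
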
On Mon, Jun 20, 2011 at 08:24:33PM +0530, Santosh Shilimkar wrote:
> I am away from my board now. Will test this change.
> btw, the online-active race is still open even with this patch
> and should be fixed.

I have yet to see any evidence of that race - I've been running your
test loop for about an hour so far on Versatile Express and nothing yet.

That's not to say that we shouldn't wait for the active mask to become
true before calling schedule(), but I don't think it's as big a deal as
you're suggesting it is.

On 6/20/2011 8:31 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 08:24:33PM +0530, Santosh Shilimkar wrote:
>> I am away from my board now. Will test this change.
>> btw, the online-active race is still open even with this patch
>> and should be fixed.
>
> I have yet to see any evidence of that race - I've been running your
> test loop for about an hour so far on Versatile Express and nothing
> yet.
>
In that case my script was just exposing the race condition in the
calibrate() code.

> That's not to say that we shouldn't wait for the active mask to become
> true before calling schedule(), but I don't think it's as big a deal as
> you're suggesting it is.

I am not expert enough to trigger that exact online-to-active race
myself, but I am sure it needs to be fixed. Maybe Thomas can suggest a
test case to expose it. From his fix for x86, it appeared that the race
was indeed hit in some sequence.

Regards,
Santosh

Russell,

On 6/20/2011 8:24 PM, Santosh Shilimkar wrote:
> On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 12:40:19PM +0100, Russell King - ARM Linux wrote:

[....]

>> So, as loops_per_jiffy is not local to this function, the compiler has
>> to write out that zero value before calling calibrate_delay_converge(),
>> and loops_per_jiffy only becomes non-zero _after_
>> calibrate_delay_converge() has returned. This opens the window and
>> allows the spinlock debugging code to explode.
>>
>> This patch closes the window completely, by writing to loops_per_jiffy
>> only when we have a real value for it.
>>
>> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
>> without this it fails with spinlock lockup and rcu problems.
>>
>>  init/calibrate.c |   14 ++++++++------
>>  1 files changed, 8 insertions(+), 6 deletions(-)
>>
> I am away from my board now. Will test this change.

Have tested your change and it seems to fix the crash I was observing.
Are you planning to send this fix for rc5?

> btw, the online-active race is still open even with this patch
> and should be fixed.
>
The only problem that remains is waiting for the active mask before
marking the CPU online. Shall I refresh my patch with only this change
then?

Regards
Santosh

On Tue, Jun 21, 2011 at 02:38:34PM +0530, Santosh Shilimkar wrote:
> Russell,
>
> On 6/20/2011 8:24 PM, Santosh Shilimkar wrote:
>> On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
>>> So, as loops_per_jiffy is not local to this function, the compiler has
>>> to write out that zero value before calling calibrate_delay_converge(),
>>> and loops_per_jiffy only becomes non-zero _after_
>>> calibrate_delay_converge() has returned. This opens the window and
>>> allows the spinlock debugging code to explode.
>>>
>>> This patch closes the window completely, by writing to loops_per_jiffy
>>> only when we have a real value for it.
>>>
>>> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
>>> without this it fails with spinlock lockup and rcu problems.
>>>
>>>  init/calibrate.c |   14 ++++++++------
>>>  1 files changed, 8 insertions(+), 6 deletions(-)
>>>
>> I am away from my board now. Will test this change.
> Have tested your change and it seems to fix the crash I
> was observing. Are you planning to send this fix for rc5?

Yes. I think sending CPUs into infinite loops in the spinlock code is
definitely sufficiently serious that it needs to go to Linus ASAP.
It'd be nice to have a tested-by line though.

>> btw, the online-active race is still open even with this patch
>> and should be fixed.
>>
> The only problem that remains is waiting for the active mask before
> marking the CPU online. Shall I refresh my patch with only this change
> then?

I already have that as a separate change.

On 6/21/2011 3:30 PM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 02:38:34PM +0530, Santosh Shilimkar wrote:
>> Russell,
>>
>> On 6/20/2011 8:24 PM, Santosh Shilimkar wrote:
>>> On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
>>>> So, as loops_per_jiffy is not local to this function, the compiler has
>>>> to write out that zero value before calling calibrate_delay_converge(),
>>>> and loops_per_jiffy only becomes non-zero _after_
>>>> calibrate_delay_converge() has returned. This opens the window and
>>>> allows the spinlock debugging code to explode.
>>>>
>>>> This patch closes the window completely, by writing to loops_per_jiffy
>>>> only when we have a real value for it.
>>>>
>>>> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
>>>> without this it fails with spinlock lockup and rcu problems.
>>>>
>>>>  init/calibrate.c |   14 ++++++++------
>>>>  1 files changed, 8 insertions(+), 6 deletions(-)
>>>>
>>> I am away from my board now. Will test this change.
>> Have tested your change and it seems to fix the crash I
>> was observing. Are you planning to send this fix for rc5?
>
> Yes. I think sending CPUs into infinite loops in the spinlock code is
> definitely sufficiently serious that it needs to go to Linus ASAP.
> It'd be nice to have a tested-by line though.
>
Sure.

>>> btw, the online-active race is still open even with this patch
>>> and should be fixed.
>>>
>> The only problem that remains is waiting for the active mask before
>> marking the CPU online. Shall I refresh my patch with only this change
>> then?
>
> I already have that as a separate change.

Can you point me to both of these commits so that I have them in my
tree for testing. Thanks for the help.

Regards
Santosh

On Tue, Jun 21, 2011 at 03:47:04PM +0530, Santosh Shilimkar wrote:
> On 6/21/2011 3:30 PM, Russell King - ARM Linux wrote:
>> On Tue, Jun 21, 2011 at 02:38:34PM +0530, Santosh Shilimkar wrote:
>>> Russell,
>>>
>>> On 6/20/2011 8:24 PM, Santosh Shilimkar wrote:
>>>> On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
>>>>> So, as loops_per_jiffy is not local to this function, the compiler has
>>>>> to write out that zero value before calling calibrate_delay_converge(),
>>>>> and loops_per_jiffy only becomes non-zero _after_
>>>>> calibrate_delay_converge() has returned. This opens the window and
>>>>> allows the spinlock debugging code to explode.
>>>>>
>>>>> This patch closes the window completely, by writing to loops_per_jiffy
>>>>> only when we have a real value for it.
>>>>>
>>>>> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
>>>>> without this it fails with spinlock lockup and rcu problems.
>>>>>
>>>>>  init/calibrate.c |   14 ++++++++------
>>>>>  1 files changed, 8 insertions(+), 6 deletions(-)
>>>>>
>>>> I am away from my board now. Will test this change.
>>> Have tested your change and it seems to fix the crash I
>>> was observing. Are you planning to send this fix for rc5?
>>
>> Yes. I think sending CPUs into infinite loops in the spinlock code is
>> definitely sufficiently serious that it needs to go to Linus ASAP.
>> It'd be nice to have a tested-by line though.
>>
> Sure.
>
>>>> btw, the online-active race is still open even with this patch
>>>> and should be fixed.
>>>>
>>> The only problem that remains is waiting for the active mask before
>>> marking the CPU online. Shall I refresh my patch with only this change
>>> then?
>>
>> I already have that as a separate change.
> Can you point me to both of these commits so that I have
> them in my tree for testing.

I won't be committing the init/calibrate.c change to a git tree - it
isn't ARM stuff so it goes in patch form.

On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 03:47:04PM +0530, Santosh Shilimkar wrote:

[...]

>>> I already have that as a separate change.
>> Can you point me to both of these commits so that I have
>> them in my tree for testing.
>
> I won't be committing the init/calibrate.c change to a git tree - it
> isn't ARM stuff so it goes in patch form.

Patches with a change log would be fine as well.

Regards
Santosh

On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
>> On Tue, Jun 21, 2011 at 03:47:04PM +0530, Santosh Shilimkar wrote:
>
> [...]
>
>>>> I already have that as a separate change.
>>> Can you point me to both of these commits so that I have
>>> them in my tree for testing.
>>
>> I won't be committing the init/calibrate.c change to a git tree - it
>> isn't ARM stuff so it goes in patch form.
>
> Patches with a change log would be fine as well.

The answer is not at the moment, but maybe soon.

On Tue, Jun 21, 2011 at 01:16:47PM -0700, Stephen Boyd wrote:
> On 06/21/2011 03:26 AM, Russell King - ARM Linux wrote:
> > On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
> >> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
> >>> I won't be committing the init/calibrate.c change to a git tree - it
> >>> isn't ARM stuff so it goes in patch form.
> >> Patches with a change log would be fine as well.
> > The answer is not at the moment, but maybe soon.
>
> Should we send those two patches to the stable trees as well? They seem
> to fix issues with cpu onlining that have existed for a long time.

Looks to me like the problem was introduced for 2.6.39-rc1, so we
should probably get the fix into the 2.6.39-stable tree too.

On 06/21/2011 04:10 PM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 01:16:47PM -0700, Stephen Boyd wrote:
>> On 06/21/2011 03:26 AM, Russell King - ARM Linux wrote:
>>> On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
>>>> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
>>>>> I won't be committing the init/calibrate.c change to a git tree - it
>>>>> isn't ARM stuff so it goes in patch form.
>>>> Patches with a change log would be fine as well.
>>> The answer is not at the moment, but maybe soon.
>> Should we send those two patches to the stable trees as well? They seem
>> to fix issues with cpu onlining that have existed for a long time.
> Looks to me like the problem was introduced for 2.6.39-rc1, so we
> should probably get the fix into the 2.6.39-stable tree too.

Are we talking about the loops_per_jiffy problem or the cpu_active
problem? I would think the cpu_active problem has been there since SMP
support was added to ARM, and the loops_per_jiffy problem has been there
(depending on the compiler) since 8a9e1b0 ([PATCH] Platform SMIs and
their interferance with tsc based delay calibration, 2005-06-23). So
pretty much every stable tree would want both of these patches.

On Tue, Jun 21, 2011 at 05:06:58PM -0700, Stephen Boyd wrote:
> On 06/21/2011 04:10 PM, Russell King - ARM Linux wrote:
> > On Tue, Jun 21, 2011 at 01:16:47PM -0700, Stephen Boyd wrote:
> >> On 06/21/2011 03:26 AM, Russell King - ARM Linux wrote:
> >>> On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
> >>>> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
> >>>>> I won't be committing the init/calibrate.c change to a git tree - it
> >>>>> isn't ARM stuff so it goes in patch form.
> >>>> Patches with a change log would be fine as well.
> >>> The answer is not at the moment, but maybe soon.
> >> Should we send those two patches to the stable trees as well? They seem
> >> to fix issues with cpu onlining that have existed for a long time.
> > Looks to me like the problem was introduced for 2.6.39-rc1, so we
> > should probably get the fix into the 2.6.39-stable tree too.
>
> Are we talking about the loops_per_jiffy problem or the cpu_active
> problem? I would think the cpu_active problem has been there since SMP
> support was added to ARM, and the loops_per_jiffy problem has been there
> (depending on the compiler) since 8a9e1b0 ([PATCH] Platform SMIs and
> their interferance with tsc based delay calibration, 2005-06-23).

The cpu_active problem hasn't actually caused any symptoms on ARM, so
it's low priority. It's only a problem which should be sorted in
-stable _if_ someone reports that it has caused a problem. Up until
Santosh's patch, no one has done so, and I've not seen any problems on
any of my ARM SMP platforms coming from it.

As for loops_per_jiffy, it isn't a problem before the commit ID I
pointed out - I've checked the assembly, and the compiler optimizes
away the initialization of loops_per_jiffy to zero - the first write
is when it's set to (1<<12).

Take a moment to think about this:

	if ((loops_per_jiffy = 0) == 0) {
	} else {
		loops_per_jiffy = 1<<12;
		...
	}

Any compiler worth talking about is going to optimize away the initial
constant write to loops_per_jiffy there, provided loops_per_jiffy is
not volatile.

So, although it's not desirable for older kernels to have their lpj
overwritten in this way, it doesn't cause the spinlock debugging code
to explode. This can be shown to be correct because there hasn't been
any problem with ARM secondary CPU bringup until recently. Plus, the
previous version of the code would require significant changes to sort
the problem out.

So, the lpj patch will only sensibly apply to 2.6.39-rc1 and later, and
so it's only going to be submitted for 2.6.39-stable. For previous
kernels, the risk of changing it outweighs by several orders of
magnitude any benefit coming from the change.

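Russell's dead-store argument can be seen with a small compile-only C sketch, to be inspected with `gcc -O2 -S`. This is not kernel code: old_fallback(), new_fallback() and extern_calibrate() are hypothetical names, and extern_calibrate() stands in for calibrate_delay_converge(), deliberately left as an external declaration so the compiler must assume it may read the global.

```c
/* dead_store.c - compile-only demo: gcc -O2 -S dead_store.c */
unsigned long loops_per_jiffy;		/* non-volatile global, as in the kernel */

/* Hypothetical stand-in for calibrate_delay_converge(); left external so
 * the compiler cannot prove it never reads loops_per_jiffy. */
unsigned long extern_calibrate(void);

void old_fallback(void)
{
	/* The zero store is immediately overwritten by a constant, with no
	 * intervening point where anything could observe the global, so an
	 * optimizing compiler may drop it: only the 1<<12 store is emitted. */
	loops_per_jiffy = 0;
	loops_per_jiffy = 1 << 12;
}

void new_fallback(void)
{
	/* Here the zero store precedes a call to an external function that
	 * might read the global, so it has to be emitted - the global stays
	 * zero until the call returns and the result is written back. */
	loops_per_jiffy = 0;
	loops_per_jiffy = extern_calibrate();
}
```

Comparing the assembly of the two functions shows the window in question: the second form must publish the zero value before the call, which is exactly what the patch below avoids by calibrating into a local variable first.
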
```diff
diff --git a/init/calibrate.c b/init/calibrate.c
index 2568d22..aae2f40 100644
--- a/init/calibrate.c
+++ b/init/calibrate.c
@@ -245,30 +245,32 @@ static unsigned long __cpuinit calibrate_delay_converge(void)
 
 void __cpuinit calibrate_delay(void)
 {
+	unsigned long lpj;
 	static bool printed;
 
 	if (preset_lpj) {
-		loops_per_jiffy = preset_lpj;
+		lpj = preset_lpj;
 		if (!printed)
 			pr_info("Calibrating delay loop (skipped) "
 				"preset value.. ");
 	} else if ((!printed) && lpj_fine) {
-		loops_per_jiffy = lpj_fine;
+		lpj = lpj_fine;
 		pr_info("Calibrating delay loop (skipped), "
 			"value calculated using timer frequency.. ");
-	} else if ((loops_per_jiffy = calibrate_delay_direct()) != 0) {
+	} else if ((lpj = calibrate_delay_direct()) != 0) {
 		if (!printed)
 			pr_info("Calibrating delay using timer "
 				"specific routine.. ");
 	} else {
 		if (!printed)
 			pr_info("Calibrating delay loop... ");
-		loops_per_jiffy = calibrate_delay_converge();
+		lpj = calibrate_delay_converge();
 	}
 	if (!printed)
 		pr_cont("%lu.%02lu BogoMIPS (lpj=%lu)\n",
-			loops_per_jiffy/(500000/HZ),
-			(loops_per_jiffy/(5000/HZ)) % 100, loops_per_jiffy);
+			lpj/(500000/HZ),
+			(lpj/(5000/HZ)) % 100, lpj);
+	loops_per_jiffy = lpj;
 	printed = true;
 }
```