Message ID | 20161027192006.GF4617@intel.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Thu, 27 Oct 2016, Ville Syrjälä wrote: > On Thu, Oct 27, 2016 at 08:48:57PM +0200, Thomas Gleixner wrote: > > What that old patch did, was: > > > > 1) Make sure that the broadcast device is actually armed at resume. > > > > That might cause the HPET to resume proper. > > > > 2) Force a max. 3 seconds rearm when the targeted expiry time is > than 10 > > seconds > > > > That might make sure that lower C-States are never entered. > > Doh. I lost the other hunk somewhere. Let's try that again... And indeed > with the other hunk in tow the machine would appear to resume properly. So it would be interesting whether that hunk in resume_broadcast() is sufficient. > > What's the lowest C-State with acpi-idle and what's the lowest one with > > intel_idle? > > acpi_idle > /sys/devices/system/cpu/cpu0/cpuidle/state3/desc:ACPI FFH INTEL MWAIT 0x30 > /sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0 > /sys/devices/system/cpu/cpu0/cpuidle/state3/latency:100 > /sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3 > /sys/devices/system/cpu/cpu0/cpuidle/state3/power:0 > /sys/devices/system/cpu/cpu0/cpuidle/state3/residency:200 > /sys/devices/system/cpu/cpu0/cpuidle/state3/time:5677316 > /sys/devices/system/cpu/cpu0/cpuidle/state3/usage:5920 > > intel_idle: > /sys/devices/system/cpu/cpu0/cpuidle/state3/desc:MWAIT 0x30 > /sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0 > /sys/devices/system/cpu/cpu0/cpuidle/state3/latency:100 > /sys/devices/system/cpu/cpu0/cpuidle/state3/name:C4-ATM > /sys/devices/system/cpu/cpu0/cpuidle/state3/power:0 > /sys/devices/system/cpu/cpu0/cpuidle/state3/residency:400 > /sys/devices/system/cpu/cpu0/cpuidle/state3/time:7146705 > /sys/devices/system/cpu/cpu0/cpuidle/state3/usage:6826 Does the machine work, when you limit intel idle to C3, which would then match acpi idle ? Thanks, tglx
On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote: > On Thu, 27 Oct 2016, Ville Syrjälä wrote: > > On Thu, Oct 27, 2016 at 08:48:57PM +0200, Thomas Gleixner wrote: > > > What that old patch did, was: > > > > > > 1) Make sure that the broadcast device is actually armed at resume. > > > > > > That might cause the HPET to resume proper. > > > > > > 2) Force a max. 3 seconds rearm when the targeted expiry time is > than 10 > > > seconds > > > > > > That might make sure that lower C-States are never entered. > > > > Doh. I lost the other hunk somewhere. Let's try that again... And indeed > > with the other hunk in tow the machine would appear to resume properly. > > So it would be interesting whether that hunk in resume_broadcast() is > sufficient. So far it looks like the answer is yes. Looks to be about 5 seconds slower than acpi-idle in resuming, but I suppose that's not all that surprising ;) > > > > What's the lowest C-State with acpi-idle and what's the lowest one with > > > intel_idle? > > > > acpi_idle > > /sys/devices/system/cpu/cpu0/cpuidle/state3/desc:ACPI FFH INTEL MWAIT 0x30 > > /sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0 > > /sys/devices/system/cpu/cpu0/cpuidle/state3/latency:100 > > /sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3 > > /sys/devices/system/cpu/cpu0/cpuidle/state3/power:0 > > /sys/devices/system/cpu/cpu0/cpuidle/state3/residency:200 > > /sys/devices/system/cpu/cpu0/cpuidle/state3/time:5677316 > > /sys/devices/system/cpu/cpu0/cpuidle/state3/usage:5920 > > > > intel_idle: > > /sys/devices/system/cpu/cpu0/cpuidle/state3/desc:MWAIT 0x30 > > /sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0 > > /sys/devices/system/cpu/cpu0/cpuidle/state3/latency:100 > > /sys/devices/system/cpu/cpu0/cpuidle/state3/name:C4-ATM > > /sys/devices/system/cpu/cpu0/cpuidle/state3/power:0 > > /sys/devices/system/cpu/cpu0/cpuidle/state3/residency:400 > > /sys/devices/system/cpu/cpu0/cpuidle/state3/time:7146705 > > /sys/devices/system/cpu/cpu0/cpuidle/state3/usage:6826 > > Does the machine work, when you limit intel idle to C3, which would then > match acpi idle ? I'm pretty sure I had tested all of these, but I just double checked to make sure. There's no C3 with intel_idle so I limited to C2, but that did not help. Isn't it possible that ACPI C3 is in fact C4? I thought ACPI C-states are always numbered non-sparsely, and in this case ACPI C3 could be anything from C3 to C11 (if the processor actually supported such states obviously). Actually now that I look at the descriptions for the states in sysfs, it says "MWAIT 0x30" for state3 on both drivers, which I presume means it's in fact C4 for both.
On Thu, 27 Oct 2016, Ville Syrjälä wrote: > On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote: > > So it would be interesting whether that hunk in resume_broadcast() is > > sufficient. > > So far it looks like the answer is yes. > > Looks to be about 5 seconds slower than acpi-idle in resuming, but > I suppose that's not all that surprising ;) Well, set it to 1msec then. If that works reliably then we really can do that unconditionally. There is no harm in firing a useless timer during resume once. > > Does the machine work, when you limit intel idle to C3, which would then > > match acpi idle ? > > I'm pretty sure I had tested all of these, but I just double checked > to make sure. There's no C3 with intel_idle so I limited to C2, but > that did not help. > > Isn't it possible that ACPI C3 is in fact C4? I thought ACPI C-states > are always numbered non-sparsely, and in this case ACPI C3 could be > anything from C3 to C11 (if the processor actually supported such > states obviously). Actually now that I look at the descriptions for > the states in sysfs, it says "MWAIT 0x30" for state3 on both drivers, > which I presume means it's in fact C4 for both. Indeed. Thanks, tglx
On Thu, Oct 27, 2016 at 10:41:18PM +0200, Thomas Gleixner wrote: > On Thu, 27 Oct 2016, Ville Syrjälä wrote: > > On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote: > > > So it would be interesting whether that hunk in resume_broadcast() is > > > sufficient. > > > > So far it looks like the answer is yes. > > > > Looks to be about 5 seconds slower than acpi-idle in resuming, but > > I suppose that's not all that surprising ;) > > Well, set it to 1msec then. If that works reliably then we really can do > that unconditionally. There is no harm in firing a useless timer during > resume once. I narrowed down the required timeout, and looks like 25ms is the minimum that works. With 24ms I already started to have failures. So maybe just bump it up by an order of magnitude to 250ms for some safety margin? In any case I think I'll leave the machine running S3 cycles over the weekend with the 25ms timeout just to see if it will eventually fail.
On Fri, 28 Oct 2016, Ville Syrjälä wrote: > On Thu, Oct 27, 2016 at 10:41:18PM +0200, Thomas Gleixner wrote: > > On Thu, 27 Oct 2016, Ville Syrjälä wrote: > > > On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote: > > > > So it would be interesting whether that hunk in resume_broadcast() is > > > > sufficient. > > > > > > So far it looks like the answer is yes. > > > > > > Looks to be about 5 seconds slower than acpi-idle in resuming, but > > > I suppose that's not all that surprising ;) > > > > Well, set it to 1msec then. If that works reliably then we really can do > > that unconditionally. There is no harm in firing a useless timer during > > resume once. > > I narrowed down the required timeout, and looks like 25ms is the > minimum that works. With 24ms I already started to have failures. So > maybe just bump it up by an order of magnitude to 250ms for some > safety margin? Sure, but what puzzles me is that we need a timeout that big. What happens between broadcast_resume() and broadcast_resume() + 25ms? IOW, what is the event/resume function which we need to bridge. We should really try to track than down. You might try to enable function tracing and do a tracing_off() when that 25ms timeout fires. Something like stop_trace = true; in broadcast_resume() and then in the broadcast timer function: if (stop_trace) { stop_trace = false; tracing_off(); } Then when the machine is up read the trace, compress and upload it somewhere or send it in private mail if it's not that big. Thanks, tglx
On Fri, Oct 28, 2016 at 08:58:41PM +0200, Thomas Gleixner wrote: > On Fri, 28 Oct 2016, Ville Syrjälä wrote: > > On Thu, Oct 27, 2016 at 10:41:18PM +0200, Thomas Gleixner wrote: > > > On Thu, 27 Oct 2016, Ville Syrjälä wrote: > > > > On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote: > > > > > So it would be interesting whether that hunk in resume_broadcast() is > > > > > sufficient. > > > > > > > > So far it looks like the answer is yes. > > > > > > > > Looks to be about 5 seconds slower than acpi-idle in resuming, but > > > > I suppose that's not all that surprising ;) > > > > > > Well, set it to 1msec then. If that works reliably then we really can do > > > that unconditionally. There is no harm in firing a useless timer during > > > resume once. > > > > I narrowed down the required timeout, and looks like 25ms is the > > minimum that works. With 24ms I already started to have failures. So > > maybe just bump it up by an order of magnitude to 250ms for some > > safety margin? I left the thing running for the weekend and it failed 26 out of 16057 times with the 25ms timeout. Looks like it takes ~5 minutes to resume when it fails, but eventually it does come back. > > Sure, but what puzzles me is that we need a timeout that big. What happens > between broadcast_resume() and broadcast_resume() + 25ms? > > IOW, what is the event/resume function which we need to bridge. We should > really try to track than down. My hunch would be that SMM trap in the DSDT/SSDT since that's where things ended up last time I was tracing these resume problems. Though I can't recall if that was just with acpi-idle or if intel_idle landed in the same spot as well. I guess I can try to repeat that test tomorrow, or I'll try your function tracer method if the other thing fails. > > You might try to enable function tracing and do a tracing_off() when that > 25ms timeout fires. > > Something like > > stop_trace = true; > > in broadcast_resume() and then in the broadcast timer function: > > if (stop_trace) { > stop_trace = false; > tracing_off(); > } > > Then when the machine is up read the trace, compress and upload it > somewhere or send it in private mail if it's not that big. > > Thanks, > > tglx
On Tue, Nov 01, 2016 at 10:47:37PM +0200, Ville Syrjälä wrote: > On Fri, Oct 28, 2016 at 08:58:41PM +0200, Thomas Gleixner wrote: > > On Fri, 28 Oct 2016, Ville Syrjälä wrote: > > > On Thu, Oct 27, 2016 at 10:41:18PM +0200, Thomas Gleixner wrote: > > > > On Thu, 27 Oct 2016, Ville Syrjälä wrote: > > > > > On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote: > > > > > > So it would be interesting whether that hunk in resume_broadcast() is > > > > > > sufficient. > > > > > > > > > > So far it looks like the answer is yes. > > > > > > > > > > Looks to be about 5 seconds slower than acpi-idle in resuming, but > > > > > I suppose that's not all that surprising ;) > > > > > > > > Well, set it to 1msec then. If that works reliably then we really can do > > > > that unconditionally. There is no harm in firing a useless timer during > > > > resume once. > > > > > > I narrowed down the required timeout, and looks like 25ms is the > > > minimum that works. With 24ms I already started to have failures. So > > > maybe just bump it up by an order of magnitude to 250ms for some > > > safety margin? > > I left the thing running for the weekend and it failed 26 out of 16057 > times with the 25ms timeout. Looks like it takes ~5 minutes to resume > when it fails, but eventually it does come back. > > > > > Sure, but what puzzles me is that we need a timeout that big. What happens > > between broadcast_resume() and broadcast_resume() + 25ms? > > > > IOW, what is the event/resume function which we need to bridge. We should > > really try to track than down. > > My hunch would be that SMM trap in the DSDT/SSDT since that's where > things ended up last time I was tracing these resume problems. Though I > can't recall if that was just with acpi-idle or if intel_idle landed in > the same spot as well. > > I guess I can try to repeat that test tomorrow, or I'll try your function > tracer method if the other thing fails. I didn't manage to find a lot of time to play around with this, but it definitely looks like the SMM trap is the problem here. I repeated my pm_trace experiemnts and when it gets stuck it is trying to execute the _WAK ACPI method which is where the SMM trap happens. Maybe the SMM code was written with the expectation of a periodic tick or something like that? > > > > > You might try to enable function tracing and do a tracing_off() when that > > 25ms timeout fires. > > > > Something like > > > > stop_trace = true; > > > > in broadcast_resume() and then in the broadcast timer function: > > > > if (stop_trace) { > > stop_trace = false; > > tracing_off(); > > } > > > > Then when the machine is up read the trace, compress and upload it > > somewhere or send it in private mail if it's not that big. > > > > Thanks, > > > > tglx > > > -- > Ville Syrjälä > Intel OTC
On Wed, Nov 02, 2016 at 04:47:37AM +0800, Ville Syrjälä wrote: > On Fri, Oct 28, 2016 at 08:58:41PM +0200, Thomas Gleixner wrote: > > On Fri, 28 Oct 2016, Ville Syrjälä wrote: > > > On Thu, Oct 27, 2016 at 10:41:18PM +0200, Thomas Gleixner wrote: > > > > On Thu, 27 Oct 2016, Ville Syrjälä wrote: > > > > > On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote: > > > > > > So it would be interesting whether that hunk in resume_broadcast() is > > > > > > sufficient. > > > > > > > > > > So far it looks like the answer is yes. > > > > > > > > > > Looks to be about 5 seconds slower than acpi-idle in resuming, but > > > > > I suppose that's not all that surprising ;) > > > > > > > > Well, set it to 1msec then. If that works reliably then we really can do > > > > that unconditionally. There is no harm in firing a useless timer during > > > > resume once. > > > > > > I narrowed down the required timeout, and looks like 25ms is the > > > minimum that works. With 24ms I already started to have failures. So > > > maybe just bump it up by an order of magnitude to 250ms for some > > > safety margin? > > I left the thing running for the weekend and it failed 26 out of 16057 > times with the 25ms timeout. Looks like it takes ~5 minutes to resume > when it fails, but eventually it does come back. > Just came back from a travel. Yes, the 5 minutes delay may be due to the expiration of the HPET timer, counting from 0 to 0xffffffff for a 13M frequencey HPET takes about 300 seconds. After resume, it seems nobody arms it so my old patch forces to arm one event. Thanks, Feng -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 8, 2016 at 7:54 PM, Feng Tang <feng.tang@intel.com> wrote: > On Wed, Nov 02, 2016 at 04:47:37AM +0800, Ville Syrjälä wrote: >> >> I left the thing running for the weekend and it failed 26 out of 16057 >> times with the 25ms timeout. Looks like it takes ~5 minutes to resume >> when it fails, but eventually it does come back. > > Just came back from a travel. Yes, the 5 minutes delay may be due to the > expiration of the HPET timer, counting from 0 to 0xffffffff for a 13M > frequencey HPET takes about 300 seconds. After resume, it seems nobody > arms it so my old patch forces to arm one event. Ville, what happens if you disable HPET? Can you force the TSC with "clocksource=tsc" or "tsc=reliable". Does resume work reliably then? Or is this one of the CPU's where tsc just doesn't work? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 08, 2016 at 10:08:37PM -0800, Linus Torvalds wrote: > On Tue, Nov 8, 2016 at 7:54 PM, Feng Tang <feng.tang@intel.com> wrote: > > On Wed, Nov 02, 2016 at 04:47:37AM +0800, Ville Syrjälä wrote: > >> > >> I left the thing running for the weekend and it failed 26 out of 16057 > >> times with the 25ms timeout. Looks like it takes ~5 minutes to resume > >> when it fails, but eventually it does come back. > > > > Just came back from a travel. Yes, the 5 minutes delay may be due to the > > expiration of the HPET timer, counting from 0 to 0xffffffff for a 13M > > frequencey HPET takes about 300 seconds. After resume, it seems nobody > > arms it so my old patch forces to arm one event. > > Ville, what happens if you disable HPET? Can you force the TSC with > "clocksource=tsc" or "tsc=reliable". Does resume work reliably then? > > Or is this one of the CPU's where tsc just doesn't work? tsc=reliable allows use of the tsc it seems. Doesn't seem to help with resuming though.
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c index f6aae7977824..e2173aeeb00c 100644 --- a/kernel/time/tick-broadcast.c +++ b/kernel/time/tick-broadcast.c @@ -507,8 +507,12 @@ void tick_resume_broadcast(void) tick_broadcast_start_periodic(bc); break; case TICKDEV_MODE_ONESHOT: - if (!cpumask_empty(tick_broadcast_mask)) + if (!cpumask_empty(tick_broadcast_mask)) { tick_resume_broadcast_oneshot(bc); + clockevents_program_event(bc, + ktime_add_ns(ktime_get(), 5 * NSEC_PER_SEC), + 1); + } break; } } @@ -657,8 +661,16 @@ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev) * - There are pending events on sleeping CPUs which were not * in the event mask */ - if (next_event.tv64 != KTIME_MAX) + if (next_event.tv64 != KTIME_MAX) { + s64 delta = next_event.tv64 - now.tv64; + + if (delta >= 10000000000) { + printk(KERN_CRIT "%s(): The delta is big: %lld\n", __func__, delta); + next_event.tv64 = now.tv64 + 3000000000; + } + tick_broadcast_set_event(dev, next_cpu, next_event); + } raw_spin_unlock(&tick_broadcast_lock);