Message ID | 1459259051-4943-4-git-send-email-joao.m.martins@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> static void tsc_check_slave(void *unused) > @@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void) > } > } > > + if ( !strcmp(opt_clocksource, "tsc") ) Pls also update xen-command-line.markdown > + { > + if ( try_platform_timer(&plt_tsc) > 0 ) > + { > + printk(XENLOG_INFO "Switched to Platform timer %s TSC\n", > + freq_string(plt_src.frequency)); > + > + init_percpu_time(); > + > + init_timer(&calibration_timer, time_calibration, NULL, 0); > + set_timer(&calibration_timer, NOW() + EPOCH); > + } > + } > + > return 0; > } > __initcall(verify_tsc_reliability); > @@ -1476,6 +1568,7 @@ void __init early_time_init(void) > struct cpu_time *t = &this_cpu(cpu_time); > u64 tmp = init_pit_and_calibrate_tsc(); > > + tsc_freq = tmp; > set_time_scale(&t->tsc_scale, tmp); > t->local_tsc_stamp = boot_tsc_stamp; > > -- > 2.1.4 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
On 03/29/2016 06:39 PM, Konrad Rzeszutek Wilk wrote: >> static void tsc_check_slave(void *unused) >> @@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void) >> } >> } >> >> + if ( !strcmp(opt_clocksource, "tsc") ) > > Pls also update xen-command-line.markdown > OK. I fixed it for the new clocksource, and also added documentation on the new boot param that is introduced in the last patch. >> + { >> + if ( try_platform_timer(&plt_tsc) > 0 ) >> + { >> + printk(XENLOG_INFO "Switched to Platform timer %s TSC\n", >> + freq_string(plt_src.frequency)); >> + >> + init_percpu_time(); >> + >> + init_timer(&calibration_timer, time_calibration, NULL, 0); >> + set_timer(&calibration_timer, NOW() + EPOCH); >> + } >> + } >> + >> return 0; >> } >> __initcall(verify_tsc_reliability); >> @@ -1476,6 +1568,7 @@ void __init early_time_init(void) >> struct cpu_time *t = &this_cpu(cpu_time); >> u64 tmp = init_pit_and_calibrate_tsc(); >> >> + tsc_freq = tmp; >> set_time_scale(&t->tsc_scale, tmp); >> t->local_tsc_stamp = boot_tsc_stamp; >> >> -- >> 2.1.4 >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel
On Tue, Mar 29, 2016 at 02:44:08PM +0100, Joao Martins wrote: > Introduce support for using TSC as platform time which is the highest > resolution time and most performant to get (~20 nsecs). Though there > are also several problems associated with its usage, and there isn't a > complete (and architecturally defined) guarantee that all machines > will provide reliable and monotonic TSC across all CPUs, on different > sockets and different P/C states. I believe Intel to be the only that > can guarantee that. For this reason it's only set when adminstrator > changes "clocksource" boot option to "tsc". Initializing TSC > clocksource requires all CPUs to have the tsc reliability checks > performed. init_xen_time is called before all CPUs are up, and so we > start with HPET at boot time, and switch later to TSC. The switch then > happens on verify_tsc_reliability initcall that is invoked when all > CPUs are up. When attempting to initializing TSC we also check for > time warps and appropriate CPU features i.e. TSC_RELIABLE, > CONSTANT_TSC and NONSTOP_TSC. And in case none of these conditions are > met, we keep the clocksource that was previously initialized on > init_xen_time. > > It is also worth noting that with clocksource=tsc there isn't any > need to synchronise with another clocksource, and I could verify that > great portion the time skew was eliminated and seeing much less time > warps happening. With HPET I used to observe ~500 warps in the period > of 1h of around 27 us, and with TSC down to 50 warps in the same > period having each warp < 100 ns. The warps still exist though but > are only related to cross CPU calibration (being how much it takes to > rendezvous with master), in which a later patch in this series aims to > solve. > > Signed-off-by: Joao Martins <joao.m.martins@oracle.com> > --- > Cc: Keir Fraser <keir@xen.org> > Cc: Jan Beulich <jbeulich@suse.com> > Cc: Andrew Cooper <andrew.cooper3@citrix.com> > > Changes since v1: > - s/printk/printk(XENLOG_INFO > - Remove extra space on inner brackets > - Add missing space around brackets > - Defer clocksource TSC initialization when all CPUs are up. > > Changes since RFC: > - Spelling fixes in the commit message. > - Remove unused clocksource_is_tsc variable and introduce it instead > on the patch that uses it. > - Move plt_tsc from second to last in the available clocksources. > --- > xen/arch/x86/time.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 95 insertions(+), 2 deletions(-) > > diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c > index ed4ed24..2602dda 100644 > --- a/xen/arch/x86/time.c > +++ b/xen/arch/x86/time.c > @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns) > } > > /************************************************************ > + * PLATFORM TIMER 4: TSC > + */ > +static u64 tsc_freq; > +static unsigned long tsc_max_warp; > +static void tsc_check_reliability(void); > + > +static int __init init_tsctimer(struct platform_timesource *pts) > +{ > + bool_t tsc_reliable = 0; No need to set it to zero. > + > + tsc_check_reliability(); This has been already called by verify_tsc_reliability which calls this function. Should we set tsc_max_warp to zero before calling it? > + > + if ( tsc_max_warp > 0 ) > + { > + tsc_reliable = 0; Ditto. It is by default zero. > + printk(XENLOG_INFO "TSC: didn't passed warp test\n"); So the earlier test by verify_tsc_reliability did already this check and printed this out - and also cleared the X86_FEATURE_TSC_RELIABLE. So can this check above be removed then? Or are you thinking to ditch what verify_tsc_reliability does? > + } > + else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) || > + (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && > + boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) ) > + { > + tsc_reliable = 1; > + } > + else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ) > + { > + tsc_reliable = (max_cstate <= 2); > + > + if ( tsc_reliable ) > + printk(XENLOG_INFO "TSC: no deep Cstates, deemed reliable\n"); > + else > + printk(XENLOG_INFO "TSC: deep Cstates possible, so not reliable\n"); > + } > + > + pts->frequency = tsc_freq; > + return tsc_reliable; > +} > + > +static u64 read_tsc(void) > +{ > + return rdtsc(); > +} > + > +static void resume_tsctimer(struct platform_timesource *pts) > +{ > +} > + > +static struct platform_timesource __initdata plt_tsc = > +{ > + .id = "tsc", > + .name = "TSC", > + .read_counter = read_tsc, > + .counter_bits = 64, > + .init = init_tsctimer, Could you add a note in here: /* Not called by init_platform_timer as it is not on the plt_timers array. */ > + .resume = resume_tsctimer, > +}; > + > +/************************************************************ > * GENERIC PLATFORM TIMER INFRASTRUCTURE > */ > > @@ -533,6 +590,21 @@ static void resume_platform_timer(void) > plt_stamp = plt_src.read_counter(); > } > > +static void __init reset_platform_timer(void) > +{ > + /* Deactivate any timers running */ > + kill_timer(&plt_overflow_timer); > + kill_timer(&calibration_timer); > + > + /* Reset counters and stamps */ > + spin_lock_irq(&platform_timer_lock); > + plt_stamp = 0; > + plt_stamp64 = 0; > + platform_timer_stamp = 0; > + stime_platform_stamp = 0; > + spin_unlock_irq(&platform_timer_lock); > +} > + > static int __init try_platform_timer(struct platform_timesource *pts) > { > int rc = -1; > @@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts) > if ( rc <= 0 ) > return rc; > > + /* We have a platform timesource already so reset it */ > + if ( plt_src.counter_bits != 0 ) > + reset_platform_timer(); > + > plt_mask = (u64)~0ull >> (64 - pts->counter_bits); > > set_time_scale(&plt_scale, pts->frequency); > @@ -566,7 +642,9 @@ static void __init init_platform_timer(void) > struct platform_timesource *pts = NULL; > int i, rc = -1; > > - if ( opt_clocksource[0] != '\0' ) > + /* clocksource=tsc is initialized later when all CPUS are up */ Perhaps: /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */ ? > + if ( (opt_clocksource[0] != '\0') && > + (strcmp(opt_clocksource, "tsc") != 0) ) > { > for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ ) > { > @@ -1192,7 +1270,7 @@ static void check_tsc_warp(unsigned long tsc_khz, unsigned long *max_warp) > } > } > > -static unsigned long tsc_max_warp, tsc_check_count; > +static unsigned long tsc_check_count; > static cpumask_t tsc_check_cpumask; > > static void tsc_check_slave(void *unused) > @@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void) > } > } > > + if ( !strcmp(opt_clocksource, "tsc") ) > + { > + if ( try_platform_timer(&plt_tsc) > 0 ) > + { > + printk(XENLOG_INFO "Switched to Platform timer %s TSC\n", > + freq_string(plt_src.frequency)); > + > + init_percpu_time(); > + > + init_timer(&calibration_timer, time_calibration, NULL, 0); > + set_timer(&calibration_timer, NOW() + EPOCH); > + } > + } > + > return 0; > } > __initcall(verify_tsc_reliability); > @@ -1476,6 +1568,7 @@ void __init early_time_init(void) > struct cpu_time *t = &this_cpu(cpu_time); > u64 tmp = init_pit_and_calibrate_tsc(); > > + tsc_freq = tmp; > set_time_scale(&t->tsc_scale, tmp); > t->local_tsc_stamp = boot_tsc_stamp; > > -- > 2.1.4 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
[snip] >> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c >> index ed4ed24..2602dda 100644 >> --- a/xen/arch/x86/time.c >> +++ b/xen/arch/x86/time.c >> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns) >> } >> >> /************************************************************ >> + * PLATFORM TIMER 4: TSC >> + */ >> +static u64 tsc_freq; >> +static unsigned long tsc_max_warp; >> +static void tsc_check_reliability(void); >> + >> +static int __init init_tsctimer(struct platform_timesource *pts) >> +{ >> + bool_t tsc_reliable = 0; > > No need to set it to zero. OK. >> + >> + tsc_check_reliability(); > > This has been already called by verify_tsc_reliability which calls this > function. Should we set tsc_max_warp to zero before calling it? Ah, correct. But may be it's not necessary to run the tsc_check_reliability here at all as what I am doing is ineficient. See my other comment below. > >> + >> + if ( tsc_max_warp > 0 ) >> + { >> + tsc_reliable = 0; > > Ditto. It is by default zero. OK. > >> + printk(XENLOG_INFO "TSC: didn't passed warp test\n"); > > So the earlier test by verify_tsc_reliability did already this check and > printed this out - and also cleared the X86_FEATURE_TSC_RELIABLE. > > So can this check above be removed then? > > Or are you thinking to ditch what verify_tsc_reliability does? > I had the tsc_check_reliability here because TSC could still be deemed reliable for max_cstate <= 2 or with CONSTANT_TSC + NONSTOP_TSC. The way I have it, the most likely scenario (i.e. having TSC_RELIABLE) would run twice. Perhaps a better way of doing this would be to run the warp test solely on verify_tsc_reliability() in all possible conditions to be deemed reliable? And then I could even remove almost the whole init_tsctimer if it was to be called when no warps are observed. >> + } >> + else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) || >> + (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && >> + boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) ) >> + { >> + tsc_reliable = 1; >> + } >> + else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ) >> + { >> + tsc_reliable = (max_cstate <= 2); >> + >> + if ( tsc_reliable ) >> + printk(XENLOG_INFO "TSC: no deep Cstates, deemed reliable\n"); >> + else >> + printk(XENLOG_INFO "TSC: deep Cstates possible, so not reliable\n"); >> + } >> + >> + pts->frequency = tsc_freq; >> + return tsc_reliable; >> +} >> + >> +static u64 read_tsc(void) >> +{ >> + return rdtsc(); >> +} >> + >> +static void resume_tsctimer(struct platform_timesource *pts) >> +{ >> +} >> + >> +static struct platform_timesource __initdata plt_tsc = >> +{ >> + .id = "tsc", >> + .name = "TSC", >> + .read_counter = read_tsc, >> + .counter_bits = 64, >> + .init = init_tsctimer, > > Could you add a note in here: > > /* Not called by init_platform_timer as it is not on the plt_timers array. */ > OK, I fixed that. >> + .resume = resume_tsctimer, >> +}; >> + >> +/************************************************************ >> * GENERIC PLATFORM TIMER INFRASTRUCTURE >> */ >> >> @@ -533,6 +590,21 @@ static void resume_platform_timer(void) >> plt_stamp = plt_src.read_counter(); >> } >> >> +static void __init reset_platform_timer(void) >> +{ >> + /* Deactivate any timers running */ >> + kill_timer(&plt_overflow_timer); >> + kill_timer(&calibration_timer); >> + >> + /* Reset counters and stamps */ >> + spin_lock_irq(&platform_timer_lock); >> + plt_stamp = 0; >> + plt_stamp64 = 0; >> + platform_timer_stamp = 0; >> + stime_platform_stamp = 0; >> + spin_unlock_irq(&platform_timer_lock); >> +} >> + >> static int __init try_platform_timer(struct platform_timesource *pts) >> { >> int rc = -1; >> @@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts) >> if ( rc <= 0 ) >> return rc; >> >> + /* We have a platform timesource already so reset it */ >> + if ( plt_src.counter_bits != 0 ) >> + reset_platform_timer(); >> + >> plt_mask = (u64)~0ull >> (64 - pts->counter_bits); >> >> set_time_scale(&plt_scale, pts->frequency); >> @@ -566,7 +642,9 @@ static void __init init_platform_timer(void) >> struct platform_timesource *pts = NULL; >> int i, rc = -1; >> >> - if ( opt_clocksource[0] != '\0' ) >> + /* clocksource=tsc is initialized later when all CPUS are up */ > > Perhaps: > /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */ ? > It's clearer indeed. Fixed that too. >> + if ( (opt_clocksource[0] != '\0') && >> + (strcmp(opt_clocksource, "tsc") != 0) ) >> { >> for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ ) >> { >> @@ -1192,7 +1270,7 @@ static void check_tsc_warp(unsigned long tsc_khz, unsigned long *max_warp) >> } >> } >> >> -static unsigned long tsc_max_warp, tsc_check_count; >> +static unsigned long tsc_check_count; >> static cpumask_t tsc_check_cpumask; >> >> static void tsc_check_slave(void *unused) >> @@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void) >> } >> } >> >> + if ( !strcmp(opt_clocksource, "tsc") ) >> + { >> + if ( try_platform_timer(&plt_tsc) > 0 ) >> + { >> + printk(XENLOG_INFO "Switched to Platform timer %s TSC\n", >> + freq_string(plt_src.frequency)); >> + >> + init_percpu_time(); >> + >> + init_timer(&calibration_timer, time_calibration, NULL, 0); >> + set_timer(&calibration_timer, NOW() + EPOCH); >> + } >> + } >> + >> return 0; >> } >> __initcall(verify_tsc_reliability); >> @@ -1476,6 +1568,7 @@ void __init early_time_init(void) >> struct cpu_time *t = &this_cpu(cpu_time); >> u64 tmp = init_pit_and_calibrate_tsc(); >> >> + tsc_freq = tmp; >> set_time_scale(&t->tsc_scale, tmp); >> t->local_tsc_stamp = boot_tsc_stamp; >> >> -- >> 2.1.4
On Fri, Apr 01, 2016 at 07:38:18PM +0100, Joao Martins wrote: > [snip] > > >> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c > >> index ed4ed24..2602dda 100644 > >> --- a/xen/arch/x86/time.c > >> +++ b/xen/arch/x86/time.c > >> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns) > >> } > >> > >> /************************************************************ > >> + * PLATFORM TIMER 4: TSC > >> + */ > >> +static u64 tsc_freq; > >> +static unsigned long tsc_max_warp; > >> +static void tsc_check_reliability(void); > >> + > >> +static int __init init_tsctimer(struct platform_timesource *pts) > >> +{ > >> + bool_t tsc_reliable = 0; > > > > No need to set it to zero. > OK. > > >> + > >> + tsc_check_reliability(); > > > > This has been already called by verify_tsc_reliability which calls this > > function. Should we set tsc_max_warp to zero before calling it? > Ah, correct. But may be it's not necessary to run the tsc_check_reliability here > at all as what I am doing is ineficient. See my other comment below. > > > > >> + > >> + if ( tsc_max_warp > 0 ) > >> + { > >> + tsc_reliable = 0; > > > > Ditto. It is by default zero. > OK. > > > > >> + printk(XENLOG_INFO "TSC: didn't passed warp test\n"); > > > > So the earlier test by verify_tsc_reliability did already this check and > > printed this out - and also cleared the X86_FEATURE_TSC_RELIABLE. > > > > So can this check above be removed then? > > > > Or are you thinking to ditch what verify_tsc_reliability does? > > > I had the tsc_check_reliability here because TSC could still be deemed reliable > for max_cstate <= 2 or with CONSTANT_TSC + NONSTOP_TSC. The way I have it, the > most likely scenario (i.e. having TSC_RELIABLE) would run twice. Perhaps a > better way of doing this would be to run the warp test solely on > verify_tsc_reliability() in all possible conditions to be deemed reliable? And > then I could even remove almost the whole init_tsctimer if it was to be called > when no warps are observed. So.. > > >> + } > >> + else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) || > >> + (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && > >> + boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) ) > >> + { > >> + tsc_reliable = 1; > >> + } > >> + else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ) > >> + { > >> + tsc_reliable = (max_cstate <= 2); > >> + > >> + if ( tsc_reliable ) > >> + printk(XENLOG_INFO "TSC: no deep Cstates, deemed reliable\n"); > >> + else > >> + printk(XENLOG_INFO "TSC: deep Cstates possible, so not reliable\n"); .. is that always true? As in if this is indeed the case should we clear X86_FEATURE_CONSTANT_TSC bit? And make check be part of tsc_check_reliability? Then init_tsctimer() would just need to check for the boot_cpu_has bits being set. As in: if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) || (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) ) { pts->frequency = tsc_freq; return 1; } return 0; And tsc_check_reliability would be in charge of clearing the CPU bits if something is off. But maybe that is not good? As in, could we mess up and clear those bits even if they are suppose to be set?
On 04/01/2016 07:45 PM, Konrad Rzeszutek Wilk wrote: > On Fri, Apr 01, 2016 at 07:38:18PM +0100, Joao Martins wrote: >> [snip] >> >>>> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c >>>> index ed4ed24..2602dda 100644 >>>> --- a/xen/arch/x86/time.c >>>> +++ b/xen/arch/x86/time.c >>>> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns) >>>> } >>>> >>>> /************************************************************ >>>> + * PLATFORM TIMER 4: TSC >>>> + */ >>>> +static u64 tsc_freq; >>>> +static unsigned long tsc_max_warp; >>>> +static void tsc_check_reliability(void); >>>> + >>>> +static int __init init_tsctimer(struct platform_timesource *pts) >>>> +{ >>>> + bool_t tsc_reliable = 0; >>> >>> No need to set it to zero. >> OK. >> >>>> + >>>> + tsc_check_reliability(); >>> >>> This has been already called by verify_tsc_reliability which calls this >>> function. Should we set tsc_max_warp to zero before calling it? >> Ah, correct. But may be it's not necessary to run the tsc_check_reliability here >> at all as what I am doing is ineficient. See my other comment below. >> >>> >>>> + >>>> + if ( tsc_max_warp > 0 ) >>>> + { >>>> + tsc_reliable = 0; >>> >>> Ditto. It is by default zero. >> OK. >> >>> >>>> + printk(XENLOG_INFO "TSC: didn't passed warp test\n"); >>> >>> So the earlier test by verify_tsc_reliability did already this check and >>> printed this out - and also cleared the X86_FEATURE_TSC_RELIABLE. >>> >>> So can this check above be removed then? >>> >>> Or are you thinking to ditch what verify_tsc_reliability does? >>> >> I had the tsc_check_reliability here because TSC could still be deemed reliable >> for max_cstate <= 2 or with CONSTANT_TSC + NONSTOP_TSC. The way I have it, the >> most likely scenario (i.e. having TSC_RELIABLE) would run twice. Perhaps a >> better way of doing this would be to run the warp test solely on >> verify_tsc_reliability() in all possible conditions to be deemed reliable? And >> then I could even remove almost the whole init_tsctimer if it was to be called >> when no warps are observed. > > So.. >> >>>> + } >>>> + else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) || >>>> + (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && >>>> + boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) ) >>>> + { >>>> + tsc_reliable = 1; >>>> + } >>>> + else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ) >>>> + { >>>> + tsc_reliable = (max_cstate <= 2); >>>> + >>>> + if ( tsc_reliable ) >>>> + printk(XENLOG_INFO "TSC: no deep Cstates, deemed reliable\n"); >>>> + else >>>> + printk(XENLOG_INFO "TSC: deep Cstates possible, so not reliable\n"); > > .. is that always true? Might not be on older platforms - but I could be stricter here and require max_cstates 0 to allow TSC clocksource to be used. CONSTANT_TSC is set based on CPU model (Manual section 17.14, 2nd paragrah), and (on Intel) AFAICT constant rate can be interpreted to being across P/T states, thus leaving us with the C states. It is only architecturally guaranteed that TSC doesn't stop on deep C-states (Section 36.8.3) if invariant TSC is set (CPUID.80000007:EDX[8]). If this bit is set, on Intel it guarantees to have synchronized and nonstop TSCs across all states (as also stated in the Section 17.14.1). NONSTOP_TSC means that the TSC does not stop in deep C states (C3 or higher), so decreasing the maximum C states to 2 (or disabling entirely) would provide the equivalent to a nonstop TSC. Thus deeming it reliable _if_ the warp test passes. > As in if this is indeed the case should we clear > X86_FEATURE_CONSTANT_TSC bit? And make check be part of tsc_check_reliability? I am not sure about clearing CONSTANT_TSC as this is based on CPU model. But I believe the main issue would be to know whether the TSC stops or on deep C states. > Then init_tsctimer() would just need to check for the boot_cpu_has bits being > set. > > As in: > > if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) || > (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && > boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) ) > { > pts->frequency = tsc_freq; > return 1; > } > Yeah something along those lines. My idea previously was to not even check these bits, and assume init_tsctimer is called knowing that we have a reliable TSC source as we check all of it beforehand? Something like this: static int __init verify_tsc_reliability(void) { if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) || (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && (boot_cpu_has(X86_FEATURE_NONSTOP_TSC) || max_cstate <= 2)) ) { if (tsc_warp_warp) { printk("%s: TSC warp detected, disabling TSC_RELIABLE\n", __func__); if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ) setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE); } else if ( !strcmp(opt_clocksource, "tsc") ) { ... And then init_tsctimer would just be: static int __init init_tsctimer(struct platform_timesource *pts) { pts->frequency = tsc_freq; return 1; } ? > return 0; > > And tsc_check_reliability would be in charge of clearing the CPU bits if something > is off. Perhaps you mean verify_tsc_reliability()? That's already clearing TSC_RELIABLE in the event of warps as you previously mentioned. tsc_check_reliability is the actual warp test - might not be the best place to clear it. > But maybe that is not good? As in, could we mess up and clear those bits > even if they are suppose to be set? Hm, I am not sure how we could mess up if we clear them, but these bits look correctly set to me. Joao
>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote: > Introduce support for using TSC as platform time which is the highest > resolution time and most performant to get (~20 nsecs). Though there > are also several problems associated with its usage, and there isn't a > complete (and architecturally defined) guarantee that all machines > will provide reliable and monotonic TSC across all CPUs, on different > sockets and different P/C states. I believe Intel to be the only that > can guarantee that. For this reason it's only set when adminstrator > changes "clocksource" boot option to "tsc". Initializing TSC > clocksource requires all CPUs to have the tsc reliability checks > performed. init_xen_time is called before all CPUs are up, and so we > start with HPET at boot time, and switch later to TSC. Please don't make possibly wrong statements: There's no guarantee we'd start with HPET - might as well be PMTMR or PIT. > The switch then > happens on verify_tsc_reliability initcall that is invoked when all > CPUs are up. When attempting to initializing TSC we also check for > time warps and appropriate CPU features i.e. TSC_RELIABLE, > CONSTANT_TSC and NONSTOP_TSC. And in case none of these conditions are > met, we keep the clocksource that was previously initialized on > init_xen_time. DYM "And in case any of these conditions is not met, ..."? Or, looking at the code, something more complex? > --- a/xen/arch/x86/time.c > +++ b/xen/arch/x86/time.c > @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns) > } > > /************************************************************ > + * PLATFORM TIMER 4: TSC > + */ > +static u64 tsc_freq; > +static unsigned long tsc_max_warp; > +static void tsc_check_reliability(void); > + > +static int __init init_tsctimer(struct platform_timesource *pts) > +{ > + bool_t tsc_reliable = 0; > + > + tsc_check_reliability(); > + > + if ( tsc_max_warp > 0 ) > + { > + tsc_reliable = 0; Redundant assignment. > +static void resume_tsctimer(struct platform_timesource *pts) > +{ > +} No need for an empty function - other clock sources don't have such empty stubs either (i.e. the caller checks for NULL). > @@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts) > if ( rc <= 0 ) > return rc; > > + /* We have a platform timesource already so reset it */ > + if ( plt_src.counter_bits != 0 ) > + reset_platform_timer(); What if any of the time functions get used between here and the setting of the new clock source? For example, what will happen to log messages when "console_timestamps=..." is in effect? > @@ -566,7 +642,9 @@ static void __init init_platform_timer(void) > struct platform_timesource *pts = NULL; > int i, rc = -1; > > - if ( opt_clocksource[0] != '\0' ) > + /* clocksource=tsc is initialized later when all CPUS are up */ > + if ( (opt_clocksource[0] != '\0') && > + (strcmp(opt_clocksource, "tsc") != 0) ) In line with the inverse check done below (using ! instead of == 0) please omit the != 0 here. > @@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void) > } > } > > + if ( !strcmp(opt_clocksource, "tsc") ) > + { > + if ( try_platform_timer(&plt_tsc) > 0 ) > + { > + printk(XENLOG_INFO "Switched to Platform timer %s TSC\n", > + freq_string(plt_src.frequency)); > + > + init_percpu_time(); So you re-do this for the BP, but not for any of the APs? Jan
On 04/05/2016 11:43 AM, Jan Beulich wrote: >>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote: >> Introduce support for using TSC as platform time which is the highest >> resolution time and most performant to get (~20 nsecs). Though there >> are also several problems associated with its usage, and there isn't a >> complete (and architecturally defined) guarantee that all machines >> will provide reliable and monotonic TSC across all CPUs, on different >> sockets and different P/C states. I believe Intel to be the only that >> can guarantee that. For this reason it's only set when adminstrator >> changes "clocksource" boot option to "tsc". Initializing TSC >> clocksource requires all CPUs to have the tsc reliability checks >> performed. init_xen_time is called before all CPUs are up, and so we >> start with HPET at boot time, and switch later to TSC. > > Please don't make possibly wrong statements: There's no guarantee > we'd start with HPET - might as well be PMTMR or PIT. My apologies - while the commit message doesn't express this, I meant it to be "for example we would start with HPET (...)". I gave that example as it's the one preferable in plt_timers. > >> The switch then >> happens on verify_tsc_reliability initcall that is invoked when all >> CPUs are up. When attempting to initializing TSC we also check for >> time warps and appropriate CPU features i.e. TSC_RELIABLE, >> CONSTANT_TSC and NONSTOP_TSC. And in case none of these conditions are >> met, we keep the clocksource that was previously initialized on >> init_xen_time. > > DYM "And in case any of these conditions is not met, ..."? Yes. My use of "none" may be wrong here so just to be sure what I mean is: (...) If all of the conditions in the previous sentence are not met (...) > Or, > looking at the code, something more complex? > >> --- a/xen/arch/x86/time.c >> +++ b/xen/arch/x86/time.c >> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns) >> } >> >> /************************************************************ >> + * PLATFORM TIMER 4: TSC >> + */ >> +static u64 tsc_freq; >> +static unsigned long tsc_max_warp; >> +static void tsc_check_reliability(void); >> + >> +static int __init init_tsctimer(struct platform_timesource *pts) >> +{ >> + bool_t tsc_reliable = 0; >> + >> + tsc_check_reliability(); >> + >> + if ( tsc_max_warp > 0 ) >> + { >> + tsc_reliable = 0; > > Redundant assignment. > Yes, Konrad had point that out too. Fixed that. >> +static void resume_tsctimer(struct platform_timesource *pts) >> +{ >> +} > > No need for an empty function - other clock sources don't have > such empty stubs either (i.e. the caller checks for NULL). > OK. >> @@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts) >> if ( rc <= 0 ) >> return rc; >> >> + /* We have a platform timesource already so reset it */ >> + if ( plt_src.counter_bits != 0 ) >> + reset_platform_timer(); > > What if any of the time functions get used between here and the > setting of the new clock source? For example, what will happen to > log messages when "console_timestamps=..." is in effect? time functions will use NOW() (i.e. get_s_time) which uses cpu_time structs being updated on local_time_calibration() or cpu frequency changes. reset_platform_timer() will zero out some of the variables used by the plt_overflow and read_platform_stime(). The overflow and calibration isn't an issue as the timers are previously killed so any further calls will use what's on cpu_time while plt_src is being changed. The case I could see this being different is if cpu_frequency_change is called between reset_platform_time() and the next update of stime_platform_stamp. In this situation the calculation would end up a bit different meaning subsequent calls of read_platform_stime() would return the equivalent to scale_delta(plt_src->read_counter(), plt_scale), as opposed to computing with a previous taken stime_platform_stamp and incrementing a difference with a counter read. But can this situation actually happen? > >> @@ -566,7 +642,9 @@ static void __init init_platform_timer(void) >> struct platform_timesource *pts = NULL; >> int i, rc = -1; >> >> - if ( opt_clocksource[0] != '\0' ) >> + /* clocksource=tsc is initialized later when all CPUS are up */ >> + if ( (opt_clocksource[0] != '\0') && >> + (strcmp(opt_clocksource, "tsc") != 0) ) > > In line with the inverse check done below (using ! instead of == 0) > please omit the != 0 here. > OK. >> @@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void) >> } >> } >> >> + if ( !strcmp(opt_clocksource, "tsc") ) >> + { >> + if ( try_platform_timer(&plt_tsc) > 0 ) >> + { >> + printk(XENLOG_INFO "Switched to Platform timer %s TSC\n", >> + freq_string(plt_src.frequency)); >> + >> + init_percpu_time(); > > So you re-do this for the BP, but not for any of the APs? I had overlooked the invocation made in start_secondary(). I also need to redo this on the APs. > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
>>> On 05.04.16 at 16:56, <joao.m.martins@oracle.com> wrote: > On 04/05/2016 11:43 AM, Jan Beulich wrote: >>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote: >>> @@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts) >>> if ( rc <= 0 ) >>> return rc; >>> >>> + /* We have a platform timesource already so reset it */ >>> + if ( plt_src.counter_bits != 0 ) >>> + reset_platform_timer(); >> >> What if any of the time functions get used between here and the >> setting of the new clock source? For example, what will happen to >> log messages when "console_timestamps=..." is in effect? > time functions will use NOW() (i.e. get_s_time) which uses cpu_time structs > being updated on local_time_calibration() or cpu frequency changes. > reset_platform_timer() will zero out some of the variables used by the > plt_overflow and read_platform_stime(). The overflow and calibration isn't an > issue as the timers are previously killed so any further calls will use what's > on cpu_time while plt_src is being changed. > > The case I could see this being different is if cpu_frequency_change is called > between reset_platform_time() and the next update of stime_platform_stamp. In > this situation the calculation would end up a bit different meaning subsequent > calls of read_platform_stime() would return the equivalent to > scale_delta(plt_src->read_counter(), plt_scale), as opposed to computing with a > previous taken stime_platform_stamp and incrementing a difference with a counter > read. But can this situation actually happen? No matter which CPU freq model is being used (right now, may change when the Intel P-state driver finally arrives), Dom0 is required to be executing already, so no issue here. Since you didn't go into any detail on the specific example of log time stamps - am I to imply that they're completely unaffected by this (i.e. there's also no risk of them going backwards for a brief period of time)? Jan
On 04/05/2016 04:12 PM, Jan Beulich wrote: >>>> On 05.04.16 at 16:56, <joao.m.martins@oracle.com> wrote: >> On 04/05/2016 11:43 AM, Jan Beulich wrote: >>>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote: >>>> @@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts) >>>> if ( rc <= 0 ) >>>> return rc; >>>> >>>> + /* We have a platform timesource already so reset it */ >>>> + if ( plt_src.counter_bits != 0 ) >>>> + reset_platform_timer(); >>> >>> What if any of the time functions get used between here and the >>> setting of the new clock source? For example, what will happen to >>> log messages when "console_timestamps=..." is in effect? >> time functions will use NOW() (i.e. get_s_time) which uses cpu_time structs >> being updated on local_time_calibration() or cpu frequency changes. >> reset_platform_timer() will zero out some of the variables used by the >> plt_overflow and read_platform_stime(). The overflow and calibration isn't an >> issue as the timers are previously killed so any further calls will use what's >> on cpu_time while plt_src is being changed. >> >> The case I could see this being different is if cpu_frequency_change is called >> between reset_platform_time() and the next update of stime_platform_stamp. In >> this situation the calculation would end up a bit different meaning subsequent >> calls of read_platform_stime() would return the equivalent to >> scale_delta(plt_src->read_counter(), plt_scale), as opposed to computing with a >> previous taken stime_platform_stamp and incrementing a difference with a counter >> read. But can this situation actually happen? > > No matter which CPU freq model is being used (right now, may > change when the Intel P-state driver finally arrives), Dom0 is > required to be executing already, so no issue here. Cool. > Since you didn't go into any detail on the specific example of log > time stamps - am I to imply that they're completely unaffected by > this (i.e. there's also no risk of them going backwards for a brief > period of time)? log timestamps uses wallclock_time() function which uses NOW() and follows the first paragraph I explained. But I need to correct that statement as get_s_time uses a previously seeded value from stime_local_stamp (which is set through read_platform_time()). Time going backwards could happen only if TSC is behind the currently-in-use clocksource. During that brief period it uses the values taken from cpu_time but after I set the new clocksource it would go backwards as I zeroed the previous stime/stamp snapshots in reset_platform_time. In this case with I propose I don't have any other timestamps (platform_timer_stamp = (stime_platform_stamp = 0)) to calculate the difference with, so it would go backwards if the TSC falls behind. I could prevent this situation from happening by having an offset which would be used until the new clocksource catches up to the old one? I am also searching in the manual, if that can situation can happen. Joao
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c index ed4ed24..2602dda 100644 --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns) } /************************************************************ + * PLATFORM TIMER 4: TSC + */ +static u64 tsc_freq; +static unsigned long tsc_max_warp; +static void tsc_check_reliability(void); + +static int __init init_tsctimer(struct platform_timesource *pts) +{ + bool_t tsc_reliable = 0; + + tsc_check_reliability(); + + if ( tsc_max_warp > 0 ) + { + tsc_reliable = 0; + printk(XENLOG_INFO "TSC: didn't passed warp test\n"); + } + else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) || + (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && + boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) ) + { + tsc_reliable = 1; + } + else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ) + { + tsc_reliable = (max_cstate <= 2); + + if ( tsc_reliable ) + printk(XENLOG_INFO "TSC: no deep Cstates, deemed reliable\n"); + else + printk(XENLOG_INFO "TSC: deep Cstates possible, so not reliable\n"); + } + + pts->frequency = tsc_freq; + return tsc_reliable; +} + +static u64 read_tsc(void) +{ + return rdtsc(); +} + +static void resume_tsctimer(struct platform_timesource *pts) +{ +} + +static struct platform_timesource __initdata plt_tsc = +{ + .id = "tsc", + .name = "TSC", + .read_counter = read_tsc, + .counter_bits = 64, + .init = init_tsctimer, + .resume = resume_tsctimer, +}; + +/************************************************************ * GENERIC PLATFORM TIMER INFRASTRUCTURE */ @@ -533,6 +590,21 @@ static void resume_platform_timer(void) plt_stamp = plt_src.read_counter(); } +static void __init reset_platform_timer(void) +{ + /* Deactivate any timers running */ + kill_timer(&plt_overflow_timer); + kill_timer(&calibration_timer); + + /* Reset counters and stamps */ + spin_lock_irq(&platform_timer_lock); + plt_stamp = 0; + plt_stamp64 = 0; + platform_timer_stamp = 0; + stime_platform_stamp = 0; + spin_unlock_irq(&platform_timer_lock); +} + static int __init try_platform_timer(struct platform_timesource *pts) { int rc = -1; @@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts) if ( rc <= 0 ) return rc; + /* We have a platform timesource already so reset it */ + if ( plt_src.counter_bits != 0 ) + reset_platform_timer(); + plt_mask = (u64)~0ull >> (64 - pts->counter_bits); set_time_scale(&plt_scale, pts->frequency); @@ -566,7 +642,9 @@ static void __init init_platform_timer(void) struct platform_timesource *pts = NULL; int i, rc = -1; - if ( opt_clocksource[0] != '\0' ) + /* clocksource=tsc is initialized later when all CPUS are up */ + if ( (opt_clocksource[0] != '\0') && + (strcmp(opt_clocksource, "tsc") != 0) ) { for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ ) { @@ -1192,7 +1270,7 @@ static void check_tsc_warp(unsigned long tsc_khz, unsigned long *max_warp) } } -static unsigned long tsc_max_warp, tsc_check_count; +static unsigned long tsc_check_count; static cpumask_t tsc_check_cpumask; static void tsc_check_slave(void *unused) @@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void) } } + if ( !strcmp(opt_clocksource, "tsc") ) + { + if ( try_platform_timer(&plt_tsc) > 0 ) + { + printk(XENLOG_INFO "Switched to Platform timer %s TSC\n", + freq_string(plt_src.frequency)); + + init_percpu_time(); + + init_timer(&calibration_timer, time_calibration, NULL, 0); + set_timer(&calibration_timer, NOW() + EPOCH); + } + } + return 0; } __initcall(verify_tsc_reliability); @@ -1476,6 +1568,7 @@ void __init early_time_init(void) struct cpu_time *t = &this_cpu(cpu_time); u64 tmp = init_pit_and_calibrate_tsc(); + tsc_freq = tmp; set_time_scale(&t->tsc_scale, tmp); t->local_tsc_stamp = boot_tsc_stamp;
Introduce support for using TSC as platform time which is the highest resolution time and most performant to get (~20 nsecs). Though there are also several problems associated with its usage, and there isn't a complete (and architecturally defined) guarantee that all machines will provide reliable and monotonic TSC across all CPUs, on different sockets and different P/C states. I believe Intel to be the only that can guarantee that. For this reason it's only set when adminstrator changes "clocksource" boot option to "tsc". Initializing TSC clocksource requires all CPUs to have the tsc reliability checks performed. init_xen_time is called before all CPUs are up, and so we start with HPET at boot time, and switch later to TSC. The switch then happens on verify_tsc_reliability initcall that is invoked when all CPUs are up. When attempting to initializing TSC we also check for time warps and appropriate CPU features i.e. TSC_RELIABLE, CONSTANT_TSC and NONSTOP_TSC. And in case none of these conditions are met, we keep the clocksource that was previously initialized on init_xen_time. It is also worth noting that with clocksource=tsc there isn't any need to synchronise with another clocksource, and I could verify that great portion the time skew was eliminated and seeing much less time warps happening. With HPET I used to observe ~500 warps in the period of 1h of around 27 us, and with TSC down to 50 warps in the same period having each warp < 100 ns. The warps still exist though but are only related to cross CPU calibration (being how much it takes to rendezvous with master), in which a later patch in this series aims to solve. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> --- Cc: Keir Fraser <keir@xen.org> Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Changes since v1: - s/printk/printk(XENLOG_INFO - Remove extra space on inner brackets - Add missing space around brackets - Defer clocksource TSC initialization when all CPUs are up. Changes since RFC: - Spelling fixes in the commit message. - Remove unused clocksource_is_tsc variable and introduce it instead on the patch that uses it. - Move plt_tsc from second to last in the available clocksources. --- xen/arch/x86/time.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 95 insertions(+), 2 deletions(-)