Message ID | 1472042632-12883-3-git-send-email-joao.m.martins@oracle.com (mailing list archive) |
---|---|
State | New, archived |
>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
> This patch introduces support for using TSC as platform time source
> which is the highest resolution time and most performant to get (~20
> nsecs).

Is this indeed still the case with the recently added fences?

> Though there are also several problems associated with its
> usage, and there isn't a complete (and architecturally defined)
> guarantee that all machines will provide reliable and monotonic TSC in
> all cases (I believe Intel to be the only that can guarantee that?) For
> this reason it's set with less priority when compared to HPET unless
> adminstrator changes "clocksource" boot option to "tsc". Initializing
> TSC clocksource requires all CPUs up to have the tsc reliability checks
> performed. init_xen_time is called before all CPUs are up, and so we
> would start with HPET at boot time, and switch later to TSC. The switch
> then happens on verify_tsc_reliability initcall that is invoked when all
> CPUs are up. When attempting to initialize TSC we also check for time
> warps and if it has invariant TSC. And in case none of these conditions
> are met,

DYM "in case any of these conditions is not met"?

> we keep the clocksource that was previously initialized on
> init_xen_time. Note that while we deem reliable a CONSTANT_TSC with no
> deep C-states, it might not always be the case, so we're conservative
> and allow TSC to be used as platform timer only with invariant TSC.
>
> Since b64438c7c ("x86/time: use correct (local) time stamp in
> constant-TSC calibration fast path") updates to cpu time use local
> stamps, which means platform timer is only used to seed the initial cpu
> time. With clocksource=tsc there is no need to be in sync with another
> clocksource, so we reseed the local/master stamps to be values of TSC
> and update the platform time stamps accordingly. Time calibration is set
> to 1sec after we switch to TSC, thus these stamps are reseeded to also
> ensure monotonic returning values right after the point we switch to
> TSC. This is also to avoid the possibility of having inconsistent
> readings in this short period (i.e. until calibration fires).
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>
> Changes since v2:
> - Suggest "HPET switching to TSC" only as an example as otherwise it
>   would be misleading on platforms not having one.

What does this refer to? The description still unconditionally talks
about HPET.

> - Change init_tsctimer to skip all the tests and assume it's called
>   only on reliable TSC conditions and no warps observed. Tidy
>   initialization on verify_tsc_reliability as suggested by Konrad.

Which means that contrary to what you say in the cover letter there's
nothing to prevent it from being used when CPU hotplug is possible. I
don't think you should skip basic sanity checks, and place entirely on
the admin the burden of knowing whether the option is safe to use.

> +static struct platform_timesource __initdata plt_tsc =
> +{
> +    .id = "tsc",
> +    .name = "TSC",
> +    .read_counter = read_tsc,
> +    .counter_bits = 64,
> +    /* Not called by init_platform_timer as it is not on the plt_timers array */
> +    .init = init_tsctimer,

I consider this comment misleading: It took me several rounds of looking
until I understood that you don't mean to say the pointer gets filled
here only to not leave a NULL one around. I'd prefer if you removed it.

> @@ -604,7 +647,9 @@ static u64 __init init_platform_timer(void)
>      unsigned int i;
>      s64 rc = -1;
>
> -    if ( opt_clocksource[0] != '\0' )
> +    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
> +    if ( (opt_clocksource[0] != '\0') &&
> +         (strcmp(opt_clocksource, "tsc")) )

Stray parentheses around a function call.

> @@ -1481,6 +1526,40 @@ static int __init verify_tsc_reliability(void)
>                     __func__);
>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>          }
> +    else if ( !strcmp(opt_clocksource, "tsc") )
> +    {
> +        int cpu;

unsigned int

> +        if ( try_platform_timer(&plt_tsc) <= 0 )
> +            return 0;

If you merged this into the if() above you could avoid this extra return
path, which in turn lowers the risk of something getting added to the
tail of the function and then not being taken care of here.

> +        /*
> +         * Platform timer has changed and CPU time will only be updated
> +         * after we set again the calibration timer, which means we need to
> +         * seed again each local CPU time. At this stage TSC is known to be
> +         * reliable i.e. monotonically increasing across all CPUs so this
> +         * lets us remove the skew between platform timer and TSC, since
> +         * these are now effectively the same.
> +         */
> +        for_each_online_cpu( cpu )

Please decide whether you consider for_each_online_cpu a
control-keyword-like item. If you do, a blank is missing prior to the
opening parenthesis. If you don't, the blanks inside the parentheses
need to be dropped.

> +        {
> +            struct cpu_time *t = &per_cpu(cpu_time, cpu);
> +
> +            t->stamp.local_tsc = boot_tsc_stamp;
> +            t->stamp.local_stime = 0;
> +            t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
> +            t->stamp.master_stime = t->stamp.local_stime;
> +        }

Without any synchronization, how "good" are the chances that this
updating would race with something the other CPUs do?

> @@ -1528,6 +1607,7 @@ void __init early_time_init(void)
>
>      preinit_pit();
>      tmp = init_platform_timer();
> +    plt_tsc.frequency = tmp;

How can this be the TSC frequency? init_platform_timer() specifically
skips the TSC. And I can see no other place where plt_tsc.frequency
gets initialized. Am I overlooking something?

Jan
On 08/25/2016 11:06 AM, Jan Beulich wrote:
>>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
>> This patch introduces support for using TSC as platform time source
>> which is the highest resolution time and most performant to get (~20
>> nsecs).
>
> Is this indeed still the case with the recently added fences?
Hmm, maybe not as fast with the added fences, but definitely the fastest
time source. If you prefer I can remove the mention of the time taken.

>
>> Though there are also several problems associated with its
>> usage, and there isn't a complete (and architecturally defined)
>> guarantee that all machines will provide reliable and monotonic TSC in
>> all cases (I believe Intel to be the only that can guarantee that?) For
>> this reason it's set with less priority when compared to HPET unless
>> adminstrator changes "clocksource" boot option to "tsc". Initializing
>> TSC clocksource requires all CPUs up to have the tsc reliability checks
>> performed. init_xen_time is called before all CPUs are up, and so we
>> would start with HPET at boot time, and switch later to TSC. The switch
>> then happens on verify_tsc_reliability initcall that is invoked when all
>> CPUs are up. When attempting to initialize TSC we also check for time
>> warps and if it has invariant TSC. And in case none of these conditions
>> are met,
>
> DYM "in case any of these conditions is not met"?
Yeah, I will change it. My English isn't at its best these days.

>
>> we keep the clocksource that was previously initialized on
>> init_xen_time. Note that while we deem reliable a CONSTANT_TSC with no
>> deep C-states, it might not always be the case, so we're conservative
>> and allow TSC to be used as platform timer only with invariant TSC.
>>
>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>> constant-TSC calibration fast path") updates to cpu time use local
>> stamps, which means platform timer is only used to seed the initial cpu
>> time. With clocksource=tsc there is no need to be in sync with another
>> clocksource, so we reseed the local/master stamps to be values of TSC
>> and update the platform time stamps accordingly. Time calibration is set
>> to 1sec after we switch to TSC, thus these stamps are reseeded to also
>> ensure monotonic returning values right after the point we switch to
>> TSC. This is also to avoid the possibility of having inconsistent
>> readings in this short period (i.e. until calibration fires).
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>
>> Changes since v2:
>> - Suggest "HPET switching to TSC" only as an example as otherwise it
>>   would be misleading on platforms not having one.
>
> What does this refer to? The description still unconditionally talks
> about HPET.
I remember adding "For example" at the beginning of the sentence. This
was clearly a distraction on my end (you also found another small
mistake in the changelog of patch 3, despite my having addressed the
comment).

>
>> - Change init_tsctimer to skip all the tests and assume it's called
>>   only on reliable TSC conditions and no warps observed. Tidy
>>   initialization on verify_tsc_reliability as suggested by Konrad.
>
> Which means that contrary to what you say in the cover letter
> there's nothing to prevent it from being used when CPU hotplug
> is possible.
Probably the cover letter wasn't completely clear on this. I mention
that I split it between Patch 2 and 5 (intent was for easier review),
and you can see that I add the check in patch 5, plus preventing
onlining of CPUs too.

> I don't think you should skip basic sanity checks, and
> place entirely on the admin the burden of knowing whether the
> option is safe to use.
Neither do I think it should be skipped. We mainly don't duplicate these
checks. In the end I am just running init_tsctimer in the TSC_RELIABLE
block, and only if no warps are observed. So putting that in
init_tsctimer would just duplicate the previously done checks. Or would
you still prefer it as done in the previous version, i.e. all checks in
both init_tsctimer and verify_tsc_reliability?

>> +static struct platform_timesource __initdata plt_tsc =
>> +{
>> +    .id = "tsc",
>> +    .name = "TSC",
>> +    .read_counter = read_tsc,
>> +    .counter_bits = 64,
>> +    /* Not called by init_platform_timer as it is not on the plt_timers array */
>> +    .init = init_tsctimer,
>
> I consider this comment misleading: It took me several rounds of
> looking until I understood that you don't mean to say the pointer
> gets filled here only to not leave a NULL one around. I'd prefer if
> you removed it.
>
OK.

>> @@ -604,7 +647,9 @@ static u64 __init init_platform_timer(void)
>>      unsigned int i;
>>      s64 rc = -1;
>>
>> -    if ( opt_clocksource[0] != '\0' )
>> +    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
>> +    if ( (opt_clocksource[0] != '\0') &&
>> +         (strcmp(opt_clocksource, "tsc")) )
>
> Stray parentheses around a function call.
Fixed.

>
>> @@ -1481,6 +1526,40 @@ static int __init verify_tsc_reliability(void)
>>                     __func__);
>>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>>          }
>> +    else if ( !strcmp(opt_clocksource, "tsc") )
>> +    {
>> +        int cpu;
>
> unsigned int
Fixed.

>
>> +        if ( try_platform_timer(&plt_tsc) <= 0 )
>> +            return 0;
>
> If you merged this into the if() above you could avoid this extra
> return path, which in turn lowers the risk of something getting
> added to the tail of the function and then not being taken care of
> here.
Good point, fixed.

>
>> +        /*
>> +         * Platform timer has changed and CPU time will only be updated
>> +         * after we set again the calibration timer, which means we need to
>> +         * seed again each local CPU time. At this stage TSC is known to be
>> +         * reliable i.e. monotonically increasing across all CPUs so this
>> +         * lets us remove the skew between platform timer and TSC, since
>> +         * these are now effectively the same.
>> +         */
>> +        for_each_online_cpu( cpu )
>
> Please decide whether you consider for_each_online_cpu a
> control-keyword-like item. If you do, a blank is missing prior to the
> opening parenthesis. If you don't, the blanks inside the parentheses
> need to be dropped.
I will go with the control-keyword-like style.

>> +        {
>> +            struct cpu_time *t = &per_cpu(cpu_time, cpu);
>> +
>> +            t->stamp.local_tsc = boot_tsc_stamp;
>> +            t->stamp.local_stime = 0;
>> +            t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
>> +            t->stamp.master_stime = t->stamp.local_stime;
>> +        }
>
> Without any synchronization, how "good" are the chances that
> this updating would race with something the other CPUs do?
I assumed that at this stage, while init calls are still being invoked,
we can update all CPUs' time infos, but maybe it's a misconception. I
can have this part in one function, have it done via on_selected_cpus(),
and wait until all cpu time structures get updated. That perhaps would
be better?

>
>> @@ -1528,6 +1607,7 @@ void __init early_time_init(void)
>>
>>      preinit_pit();
>>      tmp = init_platform_timer();
>> +    plt_tsc.frequency = tmp;
>
> How can this be the TSC frequency? init_platform_timer()
> specifically skips the TSC. And I can see no other place where
> plt_tsc.frequency gets initialized. Am I overlooking something?
>
That's the calibrated TSC frequency. Before, I was setting
pts->frequency in init_tsctimer through a temporary variable called
tsc_freq. So I thought I could just drop the variable and set plt_tsc
directly. The difference from previous versions, though, is that since
commit 9334029 this value is returned from the platform time source
init() and calibrated against the platform timer, instead of always
against the PIT.

Joao
>>> On 26.08.16 at 17:11, <joao.m.martins@oracle.com> wrote:
> On 08/25/2016 11:06 AM, Jan Beulich wrote:
>>>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
>>> This patch introduces support for using TSC as platform time source
>>> which is the highest resolution time and most performant to get (~20
>>> nsecs).
>>
>> Is this indeed still the case with the recently added fences?
> Hmm, maybe not as fast with the added fences, but definitely the
> fastest time source. If you prefer I can remove the mention of the
> time taken.

I'd say either you re-measure it with the added fences, or remove it
as potentially being stale/wrong.

>>> - Change init_tsctimer to skip all the tests and assume it's called
>>>   only on reliable TSC conditions and no warps observed. Tidy
>>>   initialization on verify_tsc_reliability as suggested by Konrad.
>>
>> Which means that contrary to what you say in the cover letter
>> there's nothing to prevent it from being used when CPU hotplug
>> is possible.
> Probably the cover letter wasn't completely clear on this. I mention
> that I split it between Patch 2 and 5 (intent was for easier review),
> and you can see that I add the check in patch 5, plus preventing
> onlining of CPUs too.
>
>> I don't think you should skip basic sanity checks, and
>> place entirely on the admin the burden of knowing whether the
>> option is safe to use.
> Neither do I think it should be skipped. We mainly don't duplicate
> these checks. In the end I am just running init_tsctimer in the
> TSC_RELIABLE block, and only if no warps are observed. So putting that
> in init_tsctimer would just duplicate the previously done checks. Or
> would you still prefer it as done in the previous version, i.e. all
> checks in both init_tsctimer and verify_tsc_reliability?

I'm not sure they're needed in both places; what you need to make
certain is that they're reliably gone through, and that this happens
early enough. As to breaking this off into the later patch - please
don't forget that this (as any) series may get applied in pieces, so
deferring a crucial check to a later patch is not acceptable. If you
mean things to be split for easier review you may check whether you can
find a way to add the check in a prereq patch.

>>> +        {
>>> +            struct cpu_time *t = &per_cpu(cpu_time, cpu);
>>> +
>>> +            t->stamp.local_tsc = boot_tsc_stamp;
>>> +            t->stamp.local_stime = 0;
>>> +            t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
>>> +            t->stamp.master_stime = t->stamp.local_stime;
>>> +        }
>>
>> Without any synchronization, how "good" are the chances that
>> this updating would race with something the other CPUs do?
>
> I assumed that at this stage, while init calls are still being
> invoked, we can update all CPUs' time infos, but maybe it's a
> misconception. I can have this part in one function, have it done via
> on_selected_cpus(), and wait until all cpu time structures get
> updated. That perhaps would be better?

I think so - even if the risk of there being a race right now is rather
low, that way you'd avoid this becoming a problem if secondary CPUs get
made to do something other than idling at this point in time.

>>> @@ -1528,6 +1607,7 @@ void __init early_time_init(void)
>>>
>>>      preinit_pit();
>>>      tmp = init_platform_timer();
>>> +    plt_tsc.frequency = tmp;
>>
>> How can this be the TSC frequency? init_platform_timer()
>> specifically skips the TSC. And I can see no other place where
>> plt_tsc.frequency gets initialized. Am I overlooking something?
>>
> That's the calibrated TSC frequency. Before, I was setting
> pts->frequency in init_tsctimer through a temporary variable called
> tsc_freq. So I thought I could just drop the variable and set plt_tsc
> directly. The difference from previous versions, though, is that since
> commit 9334029 this value is returned from the platform time source
> init() and calibrated against the platform timer, instead of always
> against the PIT.

This doesn't seem to answer my primary question: Where does
plt_tsc.frequency get initialized now? try_platform_timer() and
init_platform_timer() don't - PIT and ACPI timer use static
initializers, and HPET gets taken care of in init_hpet(), and hence so
should init_tsctimer() do (instead of just returning this apparently
never initialized value). And that's unrelated to what ->init() returns.

Jan
On 08/29/2016 10:36 AM, Jan Beulich wrote:
>>>> On 26.08.16 at 17:11, <joao.m.martins@oracle.com> wrote:
>> On 08/25/2016 11:06 AM, Jan Beulich wrote:
>>>>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
>>>> This patch introduces support for using TSC as platform time source
>>>> which is the highest resolution time and most performant to get (~20
>>>> nsecs).
>>>
>>> Is this indeed still the case with the recently added fences?
>> Hmm, maybe not as fast with the added fences, but definitely the
>> fastest time source. If you prefer I can remove the mention of the
>> time taken.
>
> I'd say either you re-measure it with the added fences, or remove it
> as potentially being stale/wrong.
OK.

>>>> - Change init_tsctimer to skip all the tests and assume it's called
>>>>   only on reliable TSC conditions and no warps observed. Tidy
>>>>   initialization on verify_tsc_reliability as suggested by Konrad.
>>>
>>> Which means that contrary to what you say in the cover letter
>>> there's nothing to prevent it from being used when CPU hotplug
>>> is possible.
>> Probably the cover letter wasn't completely clear on this. I mention
>> that I split it between Patch 2 and 5 (intent was for easier review),
>> and you can see that I add the check in patch 5, plus preventing
>> onlining of CPUs too.
>>
>>> I don't think you should skip basic sanity checks, and
>>> place entirely on the admin the burden of knowing whether the
>>> option is safe to use.
>> Neither do I think it should be skipped. We mainly don't duplicate
>> these checks. In the end I am just running init_tsctimer in the
>> TSC_RELIABLE block, and only if no warps are observed. So putting
>> that in init_tsctimer would just duplicate the previously done
>> checks. Or would you still prefer it as done in the previous version,
>> i.e. all checks in both init_tsctimer and verify_tsc_reliability?
>
> I'm not sure they're needed in both places; what you need to make
> certain is that they're reliably gone through, and that this happens
> early enough.
They are reliably gone through, and we get to avoid duplication of
checks. Unless there's a preference to re-add these checks in
init_tsctimer, I'll keep these as is. verify_tsc_reliability() needs to
perform these checks, and init_tsctimer is only called under these
reliable TSC conditions.

> As to breaking this off into the later patch - please don't forget
> that this (as any) series may get applied in pieces, so deferring a
> crucial check to a later patch is not acceptable. If you mean things
> to be split for easier review you may check whether you can find
> a way to add the check in a prereq patch.
OK, note taken. I'll get this check moved from patch 5 to here, as it's
the place where it should be.

>>>> +        {
>>>> +            struct cpu_time *t = &per_cpu(cpu_time, cpu);
>>>> +
>>>> +            t->stamp.local_tsc = boot_tsc_stamp;
>>>> +            t->stamp.local_stime = 0;
>>>> +            t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
>>>> +            t->stamp.master_stime = t->stamp.local_stime;
>>>> +        }
>>>
>>> Without any synchronization, how "good" are the chances that
>>> this updating would race with something the other CPUs do?
>>
>> I assumed that at this stage, while init calls are still being
>> invoked, we can update all CPUs' time infos, but maybe it's a
>> misconception. I can have this part in one function, have it done via
>> on_selected_cpus(), and wait until all cpu time structures get
>> updated. That perhaps would be better?
>
> I think so - even if the risk of there being a race right now is
> rather low, that way you'd avoid this becoming a problem if secondary
> CPUs get made to do something other than idling at this point in time.
Indeed, I'll do it that way then.

>>>> @@ -1528,6 +1607,7 @@ void __init early_time_init(void)
>>>>
>>>>      preinit_pit();
>>>>      tmp = init_platform_timer();
>>>> +    plt_tsc.frequency = tmp;
>>>
>>> How can this be the TSC frequency? init_platform_timer()
>>> specifically skips the TSC. And I can see no other place where
>>> plt_tsc.frequency gets initialized. Am I overlooking something?
>>>
>> That's the calibrated TSC frequency. Before, I was setting
>> pts->frequency in init_tsctimer through a temporary variable called
>> tsc_freq. So I thought I could just drop the variable and set plt_tsc
>> directly. The difference from previous versions, though, is that
>> since commit 9334029 this value is returned from the platform time
>> source init() and calibrated against the platform timer, instead of
>> always against the PIT.
>
> This doesn't seem to answer my primary question: Where does
> plt_tsc.frequency get initialized now? try_platform_timer() and
> init_platform_timer() don't - PIT and ACPI timer use static
> initializers, and HPET gets taken care of in init_hpet(), and hence so
> should init_tsctimer() do (instead of just returning this apparently
> never initialized value). And that's unrelated to what ->init()
> returns.
plt_tsc.frequency is certainly initialized in early_time_init(). And
then in try_platform_timer we have plt_src = *pts (pts would be a
pointer to plt_tsc when called from verify_tsc_reliability()).

IOW, effectively I changed from this:

#v2

static u64 tsc_freq;

static s64 __init init_tsctimer(struct platform_timesource *pts)
{
    ...
    pts->frequency = tsc_freq;
    return 1;
}

...

void __init early_time_init(void)
{
    u64 tmp = init_pit_and_calibrate_tsc();

    tsc_freq = tmp;
}

To:

#v3

static s64 __init init_tsctimer(struct platform_timesource *pts)
{
    return pts->frequency;
}

void __init early_time_init(void)
{
    ...
    tmp = init_platform_timer();
    plt_tsc.frequency = tmp;
}

Does this answer your question? Note that my purpose with the change
was to remove the tsc_freq temporary variable. If it makes things less
clear (as in doing things differently from other platform timers) I can
go back to v2 in this aspect.

Joao
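The per-CPU reseeding agreed on in this subthread (doing the stamp
update from an on_selected_cpus() callback and waiting for completion)
can be sketched in plain C. Everything below — the struct fields, the
assumed 2 GHz frequency, and the reseed helpers — is a simplified
stand-in for illustration, not Xen's actual definitions:

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for Xen's per-CPU time
 * bookkeeping; the names mirror the patch, but this is NOT the real
 * Xen API. */
#define NR_CPUS 4

struct time_stamp {
    uint64_t local_tsc;
    uint64_t local_stime;    /* system time, in ns */
    uint64_t master_stime;
};

static struct time_stamp cpu_time[NR_CPUS];

static const uint64_t boot_tsc_stamp = 1000000;  /* TSC read at boot */
static const uint64_t tsc_khz = 2000000;         /* assumed 2 GHz TSC */

/* System time derived purely from the TSC, relative to boot. */
static uint64_t s_time_from_tsc(uint64_t tsc)
{
    return (tsc - boot_tsc_stamp) * 1000000ULL / tsc_khz;  /* ns */
}

/* Body of the on_selected_cpus() callback being discussed: each CPU
 * rebases its stamp pair onto the boot TSC reading, so that local and
 * master time share a single TSC-based epoch. */
static void reseed_one(struct time_stamp *t)
{
    t->local_tsc = boot_tsc_stamp;
    t->local_stime = s_time_from_tsc(boot_tsc_stamp);
    t->master_stime = t->local_stime;
}

/* Stand-in for invoking the callback on every online CPU and waiting
 * for all of them to finish before the calibration timer is re-armed. */
static void reseed_all(void)
{
    for ( unsigned int cpu = 0; cpu < NR_CPUS; cpu++ )
        reseed_one(&cpu_time[cpu]);
}
```

After reseed_all(), every CPU's local and master stamps agree, which is
the property the thread wants in the window before the first calibration
fires.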
>>> On 30.08.16 at 14:08, <joao.m.martins@oracle.com> wrote:
> On 08/29/2016 10:36 AM, Jan Beulich wrote:
>>>>> On 26.08.16 at 17:11, <joao.m.martins@oracle.com> wrote:
>>> On 08/25/2016 11:06 AM, Jan Beulich wrote:
>>>>>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
>>>>> - Change init_tsctimer to skip all the tests and assume it's called
>>>>>   only on reliable TSC conditions and no warps observed. Tidy
>>>>>   initialization on verify_tsc_reliability as suggested by Konrad.
>>>>
>>>> Which means that contrary to what you say in the cover letter
>>>> there's nothing to prevent it from being used when CPU hotplug
>>>> is possible.
>>> Probably the cover letter wasn't completely clear on this. I mention
>>> that I split it between Patch 2 and 5 (intent was for easier
>>> review), and you can see that I add the check in patch 5, plus
>>> preventing onlining of CPUs too.
>>>
>>>> I don't think you should skip basic sanity checks, and
>>>> place entirely on the admin the burden of knowing whether the
>>>> option is safe to use.
>>> Neither do I think it should be skipped. We mainly don't duplicate
>>> these checks. In the end I am just running init_tsctimer in the
>>> TSC_RELIABLE block, and only if no warps are observed. So putting
>>> that in init_tsctimer would just duplicate the previously done
>>> checks. Or would you still prefer it as done in the previous
>>> version, i.e. all checks in both init_tsctimer and
>>> verify_tsc_reliability?
>>
>> I'm not sure they're needed in both places; what you need to make
>> certain is that they're reliably gone through, and that this happens
>> early enough.
> They are reliably gone through, and we get to avoid duplication of
> checks. Unless there's a preference to re-add these checks in
> init_tsctimer, I'll keep these as is. verify_tsc_reliability() needs
> to perform these checks, and init_tsctimer is only called under these
> reliable TSC conditions.

But please make sure there's a comment in (or ahead of) init_tsctimer()
pointing out where the apparently missing checks are. This will help
both review and future readers.

>>>>> @@ -1528,6 +1607,7 @@ void __init early_time_init(void)
>>>>>
>>>>>      preinit_pit();
>>>>>      tmp = init_platform_timer();
>>>>> +    plt_tsc.frequency = tmp;
>>>>
>>>> How can this be the TSC frequency? init_platform_timer()
>>>> specifically skips the TSC. And I can see no other place where
>>>> plt_tsc.frequency gets initialized. Am I overlooking something?
>>>>
>>> That's the calibrated TSC frequency. Before, I was setting
>>> pts->frequency in init_tsctimer through a temporary variable called
>>> tsc_freq. So I thought I could just drop the variable and set
>>> plt_tsc directly. The difference from previous versions, though, is
>>> that since commit 9334029 this value is returned from the platform
>>> time source init() and calibrated against the platform timer,
>>> instead of always against the PIT.
>>
>> This doesn't seem to answer my primary question: Where does
>> plt_tsc.frequency get initialized now? try_platform_timer() and
>> init_platform_timer() don't - PIT and ACPI timer use static
>> initializers, and HPET gets taken care of in init_hpet(), and hence
>> so should init_tsctimer() do (instead of just returning this
>> apparently never initialized value). And that's unrelated to what
>> ->init() returns.
>
> plt_tsc.frequency is certainly initialized in early_time_init(). And
> then in try_platform_timer we have plt_src = *pts (pts would be a
> pointer to plt_tsc when called from verify_tsc_reliability()).
>
> IOW, effectively I changed from this:
>
> #v2
>
> static u64 tsc_freq;
>
> static s64 __init init_tsctimer(struct platform_timesource *pts)
> {
>     ...
>     pts->frequency = tsc_freq;
>     return 1;
> }
>
> ...
>
> void __init early_time_init(void)
> {
>     u64 tmp = init_pit_and_calibrate_tsc();
>
>     tsc_freq = tmp;
> }
>
> To:
>
> #v3
>
> static s64 __init init_tsctimer(struct platform_timesource *pts)
> {
>     return pts->frequency;
> }
>
> void __init early_time_init(void)
> {
>     ...
>     tmp = init_platform_timer();
>     plt_tsc.frequency = tmp;
> }
>
> Does this answer your question? Note that my purpose with the change
> was to remove the tsc_freq temporary variable. If it makes things less
> clear (as in doing things differently from other platform timers) I
> can go back to v2 in this aspect.

Ah, I see now how I got confused. This once again depends on TSC to
possibly become the platform timer only much later than when
early_time_init() runs.

Jan
On 08/30/2016 01:30 PM, Jan Beulich wrote:
>>>> On 30.08.16 at 14:08, <joao.m.martins@oracle.com> wrote:
>> On 08/29/2016 10:36 AM, Jan Beulich wrote:
>>>>>> On 26.08.16 at 17:11, <joao.m.martins@oracle.com> wrote:
>>>> On 08/25/2016 11:06 AM, Jan Beulich wrote:
>>>>>>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
>>>>>> - Change init_tsctimer to skip all the tests and assume it's
>>>>>>   called only on reliable TSC conditions and no warps observed.
>>>>>>   Tidy initialization on verify_tsc_reliability as suggested by
>>>>>>   Konrad.
>>>>>
>>>>> Which means that contrary to what you say in the cover letter
>>>>> there's nothing to prevent it from being used when CPU hotplug
>>>>> is possible.
>>>> Probably the cover letter wasn't completely clear on this. I
>>>> mention that I split it between Patch 2 and 5 (intent was for
>>>> easier review), and you can see that I add the check in patch 5,
>>>> plus preventing onlining of CPUs too.
>>>>
>>>>> I don't think you should skip basic sanity checks, and
>>>>> place entirely on the admin the burden of knowing whether the
>>>>> option is safe to use.
>>>> Neither do I think it should be skipped. We mainly don't duplicate
>>>> these checks. In the end I am just running init_tsctimer in the
>>>> TSC_RELIABLE block, and only if no warps are observed. So putting
>>>> that in init_tsctimer would just duplicate the previously done
>>>> checks. Or would you still prefer it as done in the previous
>>>> version, i.e. all checks in both init_tsctimer and
>>>> verify_tsc_reliability?
>>>
>>> I'm not sure they're needed in both places; what you need to make
>>> certain is that they're reliably gone through, and that this
>>> happens early enough.
>> They are reliably gone through, and we get to avoid duplication of
>> checks. Unless there's a preference to re-add these checks in
>> init_tsctimer, I'll keep these as is. verify_tsc_reliability() needs
>> to perform these checks, and init_tsctimer is only called under these
>> reliable TSC conditions.
>
> But please make sure there's a comment in (or ahead of)
> init_tsctimer() pointing out where the apparently missing checks
> are. This will help both review and future readers.
Okay, I'll add a comment in init_tsctimer().

>>>>>> @@ -1528,6 +1607,7 @@ void __init early_time_init(void)
>>>>>>
>>>>>>      preinit_pit();
>>>>>>      tmp = init_platform_timer();
>>>>>> +    plt_tsc.frequency = tmp;
>>>>>
>>>>> How can this be the TSC frequency? init_platform_timer()
>>>>> specifically skips the TSC. And I can see no other place where
>>>>> plt_tsc.frequency gets initialized. Am I overlooking something?
>>>>>
>>>> That's the calibrated TSC frequency. Before, I was setting
>>>> pts->frequency in init_tsctimer through a temporary variable called
>>>> tsc_freq. So I thought I could just drop the variable and set
>>>> plt_tsc directly. The difference from previous versions, though, is
>>>> that since commit 9334029 this value is returned from the platform
>>>> time source init() and calibrated against the platform timer,
>>>> instead of always against the PIT.
>>>
>>> This doesn't seem to answer my primary question: Where does
>>> plt_tsc.frequency get initialized now? try_platform_timer() and
>>> init_platform_timer() don't - PIT and ACPI timer use static
>>> initializers, and HPET gets taken care of in init_hpet(), and hence
>>> so should init_tsctimer() do (instead of just returning this
>>> apparently never initialized value). And that's unrelated to what
>>> ->init() returns.
>>
>> plt_tsc.frequency is certainly initialized in early_time_init(). And
>> then in try_platform_timer we have plt_src = *pts (pts would be a
>> pointer to plt_tsc when called from verify_tsc_reliability()).
>>
>> IOW, effectively I changed from this:
>>
>> #v2
>>
>> static u64 tsc_freq;
>>
>> static s64 __init init_tsctimer(struct platform_timesource *pts)
>> {
>>     ...
>>     pts->frequency = tsc_freq;
>>     return 1;
>> }
>>
>> ...
>>
>> void __init early_time_init(void)
>> {
>>     u64 tmp = init_pit_and_calibrate_tsc();
>>
>>     tsc_freq = tmp;
>> }
>>
>> To:
>>
>> #v3
>>
>> static s64 __init init_tsctimer(struct platform_timesource *pts)
>> {
>>     return pts->frequency;
>> }
>>
>> void __init early_time_init(void)
>> {
>>     ...
>>     tmp = init_platform_timer();
>>     plt_tsc.frequency = tmp;
>> }
>>
>> Does this answer your question? Note that my purpose with the change
>> was to remove the tsc_freq temporary variable. If it makes things
>> less clear (as in doing things differently from other platform
>> timers) I can go back to v2 in this aspect.
>
> Ah, I see now how I got confused. This once again depends on TSC to
> possibly become the platform timer only much later than when
> early_time_init() runs.
>
True, but we're only initializing the frequency at this point; it's
only used if the reliable TSC conditions do hold.

Joao
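The flow settled on above — plt_tsc.frequency seeded early from the
calibrated TSC value and only consumed much later, when (and if) the
TSC becomes the platform timer — can be modeled with a small
stand-alone sketch. The struct and helpers below are simplified
stand-ins following the patch's names, not the actual Xen definitions:

```c
#include <stdint.h>

/* Hypothetical model of the v3 flow under discussion: ->init() merely
 * hands back a frequency filled in elsewhere, with a value <= 0
 * meaning "do not use this source". A sketch, not Xen code. */

struct platform_timesource {
    const char *id;
    uint64_t frequency;
    int64_t (*init)(struct platform_timesource *pts);
};

static int64_t init_tsctimer(struct platform_timesource *pts)
{
    /* Relies on pts->frequency having been seeded in early_time_init(). */
    return (int64_t)pts->frequency;
}

static struct platform_timesource plt_tsc = {
    .id = "tsc",
    .init = init_tsctimer,   /* frequency deliberately left 0 here */
};

/* try_platform_timer()'s success test, reduced to its essence. */
static int try_platform_timer(struct platform_timesource *pts)
{
    return pts->init(pts) > 0;
}
```

This makes Jan's original confusion easy to reproduce: without the
early seeding step, ->init() returns 0 and the source is rejected.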
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 6750e46..b2a11a8 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -475,6 +475,30 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
 }
 
 /************************************************************
+ * PLATFORM TIMER 4: TSC
+ */
+
+static s64 __init init_tsctimer(struct platform_timesource *pts)
+{
+    return pts->frequency;
+}
+
+static u64 read_tsc(void)
+{
+    return rdtsc_ordered();
+}
+
+static struct platform_timesource __initdata plt_tsc =
+{
+    .id = "tsc",
+    .name = "TSC",
+    .read_counter = read_tsc,
+    .counter_bits = 64,
+    /* Not called by init_platform_timer as it is not on the plt_timers array */
+    .init = init_tsctimer,
+};
+
+/************************************************************
  * GENERIC PLATFORM TIMER INFRASTRUCTURE
  */
 
@@ -576,6 +600,21 @@ static void resume_platform_timer(void)
     plt_stamp = plt_src.read_counter();
 }
 
+static void __init reset_platform_timer(void)
+{
+    /* Deactivate any timers running */
+    kill_timer(&plt_overflow_timer);
+    kill_timer(&calibration_timer);
+
+    /* Reset counters and stamps */
+    spin_lock_irq(&platform_timer_lock);
+    plt_stamp = 0;
+    plt_stamp64 = 0;
+    platform_timer_stamp = 0;
+    stime_platform_stamp = 0;
+    spin_unlock_irq(&platform_timer_lock);
+}
+
 static s64 __init try_platform_timer(struct platform_timesource *pts)
 {
     s64 rc = pts->init(pts);
@@ -583,6 +622,10 @@ static s64 __init try_platform_timer(struct platform_timesource *pts)
     if ( rc <= 0 )
         return rc;
 
+    /* We have a platform timesource already so reset it */
+    if ( plt_src.counter_bits != 0 )
+        reset_platform_timer();
+
     plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
 
     set_time_scale(&plt_scale, pts->frequency);
@@ -604,7 +647,9 @@ static u64 __init init_platform_timer(void)
     unsigned int i;
     s64 rc = -1;
 
-    if ( opt_clocksource[0] != '\0' )
+    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
+    if ( (opt_clocksource[0] != '\0') &&
+         (strcmp(opt_clocksource, "tsc")) )
     {
         for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ )
         {
@@ -1481,6 +1526,40 @@ static int __init verify_tsc_reliability(void)
                    __func__);
             setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
         }
+        else if ( !strcmp(opt_clocksource, "tsc") )
+        {
+            int cpu;
+
+            if ( try_platform_timer(&plt_tsc) <= 0 )
+                return 0;
+
+            /*
+             * Platform timer has changed and CPU time will only be updated
+             * after we set again the calibration timer, which means we need to
+             * seed again each local CPU time. At this stage TSC is known to be
+             * reliable i.e. monotonically increasing across all CPUs so this
+             * lets us remove the skew between platform timer and TSC, since
+             * these are now effectively the same.
+             */
+            for_each_online_cpu( cpu )
+            {
+                struct cpu_time *t = &per_cpu(cpu_time, cpu);
+
+                t->stamp.local_tsc = boot_tsc_stamp;
+                t->stamp.local_stime = 0;
+                t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
+                t->stamp.master_stime = t->stamp.local_stime;
+            }
+
+            platform_timer_stamp = plt_stamp64;
+            stime_platform_stamp = get_s_time_fixed(plt_stamp64);
+
+            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
+                   freq_string(plt_src.frequency));
+
+            init_timer(&calibration_timer, time_calibration, NULL, 0);
+            set_timer(&calibration_timer, NOW() + EPOCH);
+        }
     }
 
     return 0;
@@ -1528,6 +1607,7 @@ void __init early_time_init(void)
 
     preinit_pit();
     tmp = init_platform_timer();
+    plt_tsc.frequency = tmp;
 
     set_time_scale(&t->tsc_scale, tmp);
     t->stamp.local_tsc = boot_tsc_stamp;
Recent x86/time changes improved the monotonicity of Xen timekeeping considerably,
making it much harder to observe time going backwards. However, the platform timer
can't be expected to be perfectly in sync with the TSC, so get_s_time() isn't
guaranteed to always return monotonically increasing values across CPUs. This is
the case on some of the boxes I am testing with, where I sometimes observe ~100
warps (of very few nanoseconds each) after a few hours.

This patch introduces support for using the TSC as platform time source, which is
the highest-resolution time source and the most performant to read (~20 nsecs).
However, there are several problems associated with its usage, and there isn't a
complete (and architecturally defined) guarantee that all machines will provide a
reliable and monotonic TSC in all cases (I believe Intel to be the only one that
can guarantee that?). For this reason it's given lower priority than the HPET
unless the administrator changes the "clocksource" boot option to "tsc".
Initializing the TSC clocksource requires all CPUs to be up so that the TSC
reliability checks can be performed. init_xen_time() is called before all CPUs are
up, so we start with the HPET at boot time and switch to the TSC later. The switch
happens in the verify_tsc_reliability() initcall, which is invoked once all CPUs
are up. When attempting to initialize the TSC we also check for time warps and for
invariant TSC, and in case any of these conditions is not met we keep the
clocksource that was previously initialized in init_xen_time(). Note that while we
deem a CONSTANT_TSC with no deep C-states reliable, that might not always be the
case, so we're conservative and allow the TSC to be used as platform timer only
with invariant TSC.

Since b64438c7c ("x86/time: use correct (local) time stamp in constant-TSC
calibration fast path") updates to CPU time use local stamps, which means the
platform timer is only used to seed the initial CPU time. With clocksource=tsc
there is no need to be in sync with another clocksource, so we reseed the
local/master stamps to be values of the TSC and update the platform time stamps
accordingly. Time calibration is set to 1 sec after we switch to the TSC, and the
stamps are reseeded to ensure monotonically increasing return values right after
the point we switch. This also avoids the possibility of inconsistent readings in
this short period (i.e. until calibration fires).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since v2:
 - Suggest "HPET switching to TSC" only as an example as otherwise it would be
   misleading on platforms not having one.
 - Change init_tsctimer to skip all the tests and assume it's called only on
   reliable TSC conditions and no warps observed. Tidy initialization on
   verify_tsc_reliability as suggested by Konrad.
 - CONSTANT_TSC and max_cstate <= 2 case removed and only allow tsc clocksource
   in invariant TSC boxes.
 - Prefer to omit != 0 on init_platform_timer for the tsc case.
 - Change comment on init_platform_timer.
 - Add comment on plt_tsc declaration.
 - Reinit CPU time for all online CPUs instead of just CPU 0.
 - Use rdtsc_ordered() as opposed to rdtsc().
 - Remove tsc_freq variable and set plt_tsc clocksource frequency with the
   refined TSC calibration.
 - Rework a bit the commit message.

Changes since v1:
 - s/printk/printk(XENLOG_INFO
 - Remove extra space on inner brackets.
 - Add missing space around brackets.
 - Defer TSC initialization until all CPUs are up.

Changes since RFC:
 - Spelling fixes in the commit message.
 - Remove unused clocksource_is_tsc variable and introduce it instead in the
   patch that uses it.
 - Move plt_tsc from second to last in the available clocksources.
---
 xen/arch/x86/time.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 81 insertions(+), 1 deletion(-)