[v4,2/5] x86/time: implement tsc as clocksource

Recent x86/time changes greatly improved monotonicity in Xen
timekeeping, making it much harder to observe time going backwards.
However, the platform timer can't be expected to be perfectly in sync
with the TSC, so get_s_time isn't guaranteed to always return
monotonically increasing values across CPUs. This is the case on some
of the boxes I am testing with, where I sometimes observe ~100 warps
(of very few nanoseconds each) after a few hours.
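To make "warp" concrete: it is a read of system time that returns a value lower than one already handed out on some CPU. The C sketch below counts warps in a log of samples. It assumes the samples are already in global (real-time) order, which a real cross-CPU test has to arrange itself. `struct time_sample` and `count_warps()` are hypothetical names, not Xen code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample: system time (ns) returned on 'cpu', listed in the
 * global order in which the reads actually happened. */
struct time_sample {
    unsigned int cpu;
    uint64_t ns;
};

/* Count warps: reads lower than the largest value already returned on any
 * CPU. Also reports the largest backwards step seen. */
size_t count_warps(const struct time_sample *s, size_t n,
                   uint64_t *max_warp_ns)
{
    uint64_t latest = 0;
    size_t warps = 0;

    assert(max_warp_ns != NULL);
    *max_warp_ns = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        if ( s[i].ns < latest )
        {
            uint64_t delta = latest - s[i].ns;

            warps++;
            if ( delta > *max_warp_ns )
                *max_warp_ns = delta;
        }
        else
            latest = s[i].ns;
    }

    return warps;
}
```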

This patch introduces support for using the TSC as the platform time
source, as it is the highest-resolution and cheapest time source to
read. There are, however, several problems associated with its usage,
and there is no complete (and architecturally defined) guarantee that
all machines will provide a reliable and monotonic TSC in all cases (I
believe Intel is the only vendor that can guarantee that?). For this
reason it is given lower priority than HPET unless the administrator
sets the "clocksource" boot option to "tsc".

Initializing the TSC clocksource requires the TSC reliability checks
to have been performed on all CPUs. init_xen_time is called before all
CPUs are up, so we start with, for example, HPET (or ACPI, PIT) at
boot time and switch to TSC later. The switch happens in the
verify_tsc_reliability initcall, which is invoked once all CPUs are
up. When attempting to initialize the TSC we also check for time
warps and whether the TSC is invariant. Note that while we deem a
CONSTANT_TSC with no deep C-states reliable, that might not always be
the case, so we're conservative and allow the TSC to be used as the
platform timer only with an invariant TSC.

Additionally, we check that CPU hotplug isn't meant to be performed on
the host, which is the case when max vcpus and num_present_cpu are the
same. This is because a newly hotplugged CPU may not satisfy the
condition of having all TSCs synchronized, so while the TSC
clocksource is in use we allow offlining CPUs but not onlining any
back. Finally, we prevent the TSC from being used as clocksource on
multiple sockets, because the TSCs aren't guaranteed to be in sync
across sockets. Further relaxing of this last requirement is added in
a separate patch, allowing vendors that provide such a guarantee to
use TSC as clocksource. If any of these conditions is not met, we keep
the clocksource that was previously initialized by init_xen_time.
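The eligibility rules above can be summarized as a single predicate. This is a minimal sketch, not the patch's code: `struct host_tsc_info`, its fields and `tsc_clocksource_allowed()` are hypothetical, and the values are assumed to have been gathered once all CPUs are up (the point at which verify_tsc_reliability runs).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical snapshot of what the checks described above look at. */
struct host_tsc_info {
    const char *opt_clocksource; /* "clocksource=" value, NULL if unset */
    bool invariant_tsc;          /* CPU advertises an invariant TSC */
    unsigned long tsc_warps;     /* warps found by the cross-CPU check */
    unsigned int max_cpus;       /* CPUs the host could ever bring up */
    unsigned int present_cpus;   /* CPUs present now */
    unsigned int nr_sockets;     /* physical packages */
};

/* true: switch from the boot-time platform timer to TSC.
 * false: keep the timer picked by init_xen_time. */
bool tsc_clocksource_allowed(const struct host_tsc_info *h)
{
    assert(h->present_cpus <= h->max_cpus);

    /* TSC ranks below e.g. HPET: only switch when explicitly requested. */
    if ( !h->opt_clocksource || strcmp(h->opt_clocksource, "tsc") != 0 )
        return false;

    /* Invariant TSC only (CONSTANT_TSC with shallow C-states is not
     * enough), and no warps may have been observed. */
    if ( !h->invariant_tsc || h->tsc_warps != 0 )
        return false;

    /* If CPUs can still be hotplugged, a new one may not be in sync. */
    if ( h->max_cpus != h->present_cpus )
        return false;

    /* TSCs are not guaranteed to be in sync across sockets. */
    if ( h->nr_sockets > 1 )
        return false;

    return true;
}
```

Each failed check simply leaves the boot-time platform timer in place, so the predicate never needs to undo anything.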

Since b64438c7c ("x86/time: use correct (local) time stamp in
constant-TSC calibration fast path"), updates to CPU time use local
stamps, which means the platform timer is only used to seed the
initial CPU time. With clocksource=tsc there is no need to stay in
sync with another clocksource, so we reseed the local/master stamps
with TSC values and update the platform time stamps accordingly. Time
calibration is scheduled 1 second after we switch to TSC, so these
stamps are reseeded to also ensure monotonically increasing values
right after the switch. This also avoids inconsistent readings in that
short window (i.e. until calibration fires).
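A rough sketch of what reseeding means: each online CPU records the current TSC value as its stamp and sets both its local and master system time to that TSC value converted to nanoseconds, so every CPU extrapolates from the same counter. The struct, field and function names and the kHz-based conversion are assumptions for illustration only; Xen's own code converts with a precomputed multiply/shift scale rather than a division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU stamps, loosely modelled on the local/master
 * stamps described above; names are illustrative, not Xen's. */
struct cpu_time_stamps {
    uint64_t local_tsc;    /* TSC value when the stamp was taken */
    uint64_t local_stime;  /* local system time (ns) at that point */
    uint64_t master_stime; /* master/platform time (ns) at that point */
};

/* TSC ticks -> ns for a TSC running at tsc_khz. Split into quotient and
 * remainder so intermediate products stay within 64 bits for realistic
 * frequencies and uptimes. */
static uint64_t tsc_ticks_to_ns(uint64_t ticks, uint64_t tsc_khz)
{
    assert(tsc_khz != 0);
    return (ticks / tsc_khz) * 1000000u +
           (ticks % tsc_khz) * 1000000u / tsc_khz;
}

/* Reseed: with TSC as clocksource there is no other source to follow,
 * so both local and master time are derived from the TSC itself. */
void reseed_stamps(struct cpu_time_stamps *t, uint64_t tsc_now,
                   uint64_t tsc_khz)
{
    uint64_t now_ns = tsc_ticks_to_ns(tsc_now, tsc_khz);

    t->local_tsc = tsc_now;
    t->local_stime = now_ns;
    t->master_stime = now_ns;
}

/* System time on this CPU: last stamp plus the TSC delta since then. */
uint64_t read_stime(const struct cpu_time_stamps *t, uint64_t tsc_now,
                    uint64_t tsc_khz)
{
    return t->local_stime + tsc_ticks_to_ns(tsc_now - t->local_tsc, tsc_khz);
}
```

Two CPUs reseeded at slightly different TSC values still agree when they later read the same TSC value, which is the property that keeps readings consistent until calibration fires.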

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since v3:
 - Really fix "HPET switching to TSC" comment. Despite being mentioned
   in the previous version, the change wasn't there.
 - Remove parentheses around the function call in init_platform_timer
 - Merge the if in verify_tsc_reliability with the opt_clocksource check
 - Removed comment above ".init = init_tsctimer"
 - Fold the docs update into this patch.
 - Move host_tsc_is_clocksource() and CPU hotplug possibility check to this
   patch.
 - s/host_tsc_is_clocksource/clocksource_is_tsc
 - Use bool instead of bool_t
 - Add a comment above init_tsctimer() declaration mentioning the
   reliable TSC checks on verify_tsc_reliability(), under which the
   function is invoked.
 - Prevent clocksource=tsc on platforms with multiple sockets. Further
   relaxing of this requirement is added in a separate patch, as an
   extension of the "tsc" boot parameter.
 - Removed the control group used to update cpu_time and do it with
   on_selected_cpus instead, to avoid any potential races.
 - Move the path common to init_xen_time and the TSC switch into
   try_platform_timer_tail, such that platform timer initialization is
   finished in the same place (including the platform timer overflow
   handling, which had been removed in previous versions).
 - Changed TSC counter_bits to 63 to avoid mishandling of TSC counter
   wrap-around in the platform timer overflow timer.
 - Moved the CPU hotplug paragraph from the last patch and added a
   comment in the commit message about TSC sync across multiple sockets.
 - s/init_tsctimer/init_tsc/g to be consistent with other TSC platform
   timer functions.
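Background for the counter_bits entry: a platform timer is a free-running counter of counter_bits width, and elapsed time is accumulated from masked deltas between reads, with a periodic overflow timer making sure the counter is sampled before it can wrap more than once. The generic technique is sketched below with hypothetical helper names; in this sketch, capping the width at 63 bits also keeps the mask a well-defined shift in C. The changelog does not spell out the exact failure mode that was fixed, so treat this only as an illustration of the mechanism.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for a free-running counter that is 'bits' wide. Limiting bits to
 * 1..63 keeps the shift well defined (a shift by 64 is undefined in C). */
static uint64_t counter_mask(unsigned int bits)
{
    assert(bits >= 1 && bits <= 63);
    return (UINT64_C(1) << bits) - 1;
}

/* Ticks elapsed between two reads of the counter. Correct as long as the
 * counter wrapped at most once in between, which is what a periodic
 * overflow timer is meant to guarantee. */
uint64_t counter_delta(uint64_t prev, uint64_t now, unsigned int bits)
{
    return (now - prev) & counter_mask(bits);
}
```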

Changes since v2:
 - Suggest "HPET switching to TSC" only as an example as otherwise it
   would be misleading on platforms not having one.
 - Change init_tsctimer to skip all the tests and assume it's only
   called under reliable TSC conditions with no warps observed. Tidy
   initialization in verify_tsc_reliability as suggested by Konrad.
 - Removed the CONSTANT_TSC and max_cstate <= 2 case; only allow tsc
   clocksource on invariant TSC boxes.
 - Prefer omitting != 0 in init_platform_timer for the tsc case.
 - Change comment on init_platform_timer.
 - Add comment on plt_tsc declaration.
 - Reinit CPU time for all online cpus instead of just CPU 0.
 - Use rdtsc_ordered() as opposed to rdtsc()
 - Remove tsc_freq variable and set plt_tsc clocksource frequency
   with the refined tsc calibration.
 - Rework the commit message a bit.

Changes since v1:
 - s/printk/printk(XENLOG_INFO
 - Remove extra space on inner brackets
 - Add missing space around brackets
 - Defer TSC initialization until all CPUs are up.

Changes since RFC:
 - Spelling fixes in the commit message.
 - Remove unused clocksource_is_tsc variable and introduce it instead
   in the patch that uses it.
 - Move plt_tsc from the second to the last position in the available
   clocksources.
---
 docs/misc/xen-command-line.markdown |   6 +-
 xen/arch/x86/platform_hypercall.c   |   3 +-
 xen/arch/x86/time.c                 | 127 +++++++++++++++++++++++++++++++++---
 xen/include/asm-x86/time.h          |   1 +
 4 files changed, 125 insertions(+), 12 deletions(-)
