From patchwork Wed Sep 14 17:37:47 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joao Martins
X-Patchwork-Id: 9332191
From: Joao Martins
To: xen-devel@lists.xenproject.org
Date: Wed, 14 Sep 2016 18:37:47 +0100
Message-Id: <1473874670-4986-3-git-send-email-joao.m.martins@oracle.com>
X-Mailer: git-send-email 2.1.4
In-Reply-To: <1473874670-4986-1-git-send-email-joao.m.martins@oracle.com>
References: <1473874670-4986-1-git-send-email-joao.m.martins@oracle.com>
Cc: Andrew Cooper, Joao Martins, Jan Beulich
Subject: [Xen-devel] [PATCH v4 2/5] x86/time: implement tsc as clocksource
List-Id: Xen developer discussion

Recent x86/time changes considerably improved monotonicity in Xen
timekeeping, making it much harder to observe time going backwards.
However, the platform timer can't be expected to be perfectly in sync
with the TSC, so get_s_time() isn't guaranteed to always return
monotonically increasing values across CPUs. This is the case on some of
the boxes I am testing with, where I sometimes observe ~100 warps (of
very few nanoseconds each) after a few hours.

This patch introduces support for using the TSC as platform time source,
which is the highest-resolution time source and the cheapest to read.
There are, however, several problems associated with its usage, and
there isn't a complete (and architecturally defined) guarantee that all
machines will provide a reliable and monotonic TSC in all cases (I
believe Intel to be the only one that can guarantee that?). For this
reason it is given lower priority than HPET unless the administrator sets
the "clocksource" boot option to "tsc".

Initializing the TSC clocksource requires all CPUs to be up so that the
TSC reliability checks can be performed.
init_xen_time() is called before all CPUs are up, so we would, for
example, start with HPET (or ACPI, PIT) at boot time and switch to TSC
later. The switch happens in the verify_tsc_reliability() initcall, which
is invoked once all CPUs are up. When attempting to initialize TSC we
also check for time warps and whether the TSC is invariant. Note that
while we deem a CONSTANT_TSC with no deep C-states reliable, that might
not always be the case, so we're conservative and allow TSC to be used as
platform timer only with invariant TSC. Additionally we check that CPU
hotplug isn't meant to be performed on the host, which is the case when
nr_cpu_ids and num_present_cpus() are the same. This is because a newly
hotplugged CPU may not satisfy the condition of having all TSCs
synchronized, so while the TSC clocksource is in use we allow offlining
CPUs but not onlining any of them back. Finally we prevent TSC from being
used as clocksource on multiple sockets because it isn't guaranteed to be
invariant across them. Further relaxing of this last requirement is added
in a separate patch, such that we allow vendors with such a guarantee to
use TSC as clocksource. If any of these conditions is not met, we keep
the clocksource that was previously initialized in init_xen_time().

Since b64438c7c ("x86/time: use correct (local) time stamp in
constant-TSC calibration fast path") updates to cpu time use local
stamps, which means the platform timer is only used to seed the initial
cpu time. With clocksource=tsc there is no need to be in sync with
another clocksource, so we reseed the local/master stamps with TSC values
and update the platform time stamps accordingly. Time calibration is set
to fire 1sec after we switch to TSC, so these stamps are also reseeded to
ensure monotonically increasing values right after the point we switch to
TSC. This also avoids the possibility of inconsistent readings in this
short period (i.e. until calibration fires).
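
As an illustration only, the conditions listed above can be read as one
predicate. The sketch below is not Xen code: `struct host_tsc_state`, its
fields and `tsc_clocksource_allowed()` are hypothetical names standing in
for the checks spread across verify_tsc_reliability() and init_tsc().

```c
#include <stdbool.h>

/* Hypothetical snapshot of host state; names are illustrative only. */
struct host_tsc_state {
    bool invariant_tsc;        /* CPU advertises an invariant TSC */
    bool warp_detected;        /* a TSC warp was observed across CPUs */
    unsigned int max_cpus;     /* upper bound on CPU ids (cf. nr_cpu_ids) */
    unsigned int present_cpus; /* CPUs present at boot */
    unsigned int sockets;      /* number of sockets */
};

/* True only if every condition described in the commit message holds. */
static bool tsc_clocksource_allowed(const struct host_tsc_state *s)
{
    if ( !s->invariant_tsc || s->warp_detected )
        return false;

    /* Spare CPU slots suggest CPU hotplug may be intended. */
    if ( s->max_cpus != s->present_cpus )
        return false;

    /* TSC is not assumed to be synchronized across sockets. */
    if ( s->sockets > 1 )
        return false;

    return true;
}
```

If the predicate is false, the previously selected platform timer (for
example HPET) simply stays in place.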

Signed-off-by: Joao Martins
---
Cc: Jan Beulich
Cc: Andrew Cooper

Changes since v3:
 - Really fix "HPET switching to TSC" comment. Despite being mentioned in
   the previous version, the change wasn't there.
 - Remove parentheses around the function call in init_platform_timer
 - Merge if on verify_tsc_reliability with opt_clocksource check
 - Removed comment above ".init = init_tsctimer"
 - Fixup docs updated into this patch.
 - Move host_tsc_is_clocksource() and CPU hotplug possibility check to
   this patch.
 - s/host_tsc_is_clocksource/clocksource_is_tsc
 - Use bool instead of bool_t
 - Add a comment above init_tsctimer() declaration mentioning the
   reliable TSC checks on verify_tsc_reliability(), under which the
   function is invoked.
 - Prevent clocksource=tsc on platforms with multiple sockets. Further
   relaxing of this requirement is added in a separate patch, as an
   extension of the "tsc" boot parameter.
 - Removed control group to update cpu_time and do it instead with
   on_selected_cpus to avoid any potential races.
 - Accommodate common path between init_xen_time and TSC switch into
   try_platform_timer_tail, such that finishing platform timer
   initialization is done in the same place (including platform timer
   overflow, which was removed in previous versions).
 - Changed TSC counter_bits to 63 to avoid mishandling of TSC counter
   wrap-around in platform timer overflow timer.
 - Moved paragraph on CPU Hotplug from last patch and added comment in
   commit message about multiple sockets TSC sync.
 - s/init_tsctimer/init_tsc/g to be consistent with other TSC platform
   timer functions.

Changes since v2:
 - Suggest "HPET switching to TSC" only as an example as otherwise it
   would be misleading on platforms not having one.
 - Change init_tsctimer to skip all the tests and assume it's called only
   on reliable TSC conditions and no warps observed. Tidy initialization
   on verify_tsc_reliability as suggested by Konrad.
 - CONSTANT_TSC and max_cstate <= 2 case removed and only allow tsc
   clocksource in invariant TSC boxes.
 - Prefer omitting != 0 on init_platform_timer for tsc case.
 - Change comment on init_platform_timer.
 - Add comment on plt_tsc declaration.
 - Reinit CPU time for all online cpus instead of just CPU 0.
 - Use rdtsc_ordered() as opposed to rdtsc()
 - Remove tsc_freq variable and set plt_tsc clocksource frequency with
   the refined tsc calibration.
 - Rework the commit message a bit.

Changes since v1:
 - s/printk/printk(XENLOG_INFO
 - Remove extra space on inner brackets
 - Add missing space around brackets
 - Defer TSC initialization until all CPUs are up.

Changes since RFC:
 - Spelling fixes in the commit message.
 - Remove unused clocksource_is_tsc variable and introduce it instead in
   the patch that uses it.
 - Move plt_tsc from second to last in the available clocksources.
---
 docs/misc/xen-command-line.markdown |   6 +-
 xen/arch/x86/platform_hypercall.c   |   3 +-
 xen/arch/x86/time.c                 | 127 +++++++++++++++++++++++++++++++++---
 xen/include/asm-x86/time.h          |   1 +
 4 files changed, 125 insertions(+), 12 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 3a250cb..f92fb3f 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -264,9 +264,13 @@ minimum of 32M, subject to a suitably aligned and sized contiguous
 region of memory being available.
 
 ### clocksource
-> `= pit | hpet | acpi`
+> `= pit | hpet | acpi | tsc`
 
 If set, override Xen's default choice for the platform timer.
+Having TSC as platform timer requires being explicitly set. This is because
+TSC can only be safely used if CPU hotplug isn't performed on the system. In
+some platforms, "maxcpus" parameter may require further adjustment to the
+number of online cpus.
 
 ### cmci-threshold
 > `= `
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 780f22d..0879e19 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -631,7 +631,8 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
         if ( ret )
             break;
 
-        if ( cpu >= nr_cpu_ids || !cpu_present(cpu) )
+        if ( cpu >= nr_cpu_ids || !cpu_present(cpu) ||
+             clocksource_is_tsc() )
         {
             ret = -EINVAL;
             break;
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 0c1ad45..e5001d5 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -475,6 +475,50 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
 }
 
 /************************************************************
+ * PLATFORM TIMER 4: TSC
+ */
+
+/*
+ * Called in verify_tsc_reliability() under reliable TSC conditions
+ * thus reusing all the checks already performed there.
+ */
+static s64 __init init_tsc(struct platform_timesource *pts)
+{
+    u64 ret = pts->frequency;
+
+    if ( nr_cpu_ids != num_present_cpus() )
+    {
+        printk(XENLOG_INFO "TSC: CPU Hotplug intended\n");
+        ret = 0;
+    }
+
+    if ( nr_sockets > 1 )
+    {
+        printk(XENLOG_INFO "TSC: Not invariant across sockets\n");
+        ret = 0;
+    }
+
+    if ( !ret )
+        printk(XENLOG_INFO "TSC: Not setting it as clocksource\n");
+
+    return ret;
+}
+
+static u64 read_tsc(void)
+{
+    return rdtsc_ordered();
+}
+
+static struct platform_timesource __initdata plt_tsc =
+{
+    .id = "tsc",
+    .name = "TSC",
+    .read_counter = read_tsc,
+    .counter_bits = 63,
+    .init = init_tsc,
+};
+
+/************************************************************
  * GENERIC PLATFORM TIMER INFRASTRUCTURE
  */
 
@@ -576,6 +620,21 @@ static void resume_platform_timer(void)
     plt_stamp = plt_src.read_counter();
 }
 
+static void __init reset_platform_timer(void)
+{
+    /* Deactivate any timers running */
+    kill_timer(&plt_overflow_timer);
+    kill_timer(&calibration_timer);
+
+    /* Reset counters and stamps */
+    spin_lock_irq(&platform_timer_lock);
+    plt_stamp = 0;
+    plt_stamp64 = 0;
+    platform_timer_stamp = 0;
+    stime_platform_stamp = 0;
+    spin_unlock_irq(&platform_timer_lock);
+}
+
 static s64 __init try_platform_timer(struct platform_timesource *pts)
 {
     s64 rc = pts->init(pts);
@@ -583,6 +642,10 @@ static s64 __init try_platform_timer(struct platform_timesource *pts)
     if ( rc <= 0 )
         return rc;
 
+    /* We have a platform timesource already so reset it */
+    if ( plt_src.counter_bits != 0 )
+        reset_platform_timer();
+
     plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
 
     set_time_scale(&plt_scale, pts->frequency);
@@ -604,7 +667,9 @@ static u64 __init init_platform_timer(void)
     unsigned int i;
     s64 rc = -1;
 
-    if ( opt_clocksource[0] != '\0' )
+    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
+    if ( (opt_clocksource[0] != '\0') &&
+         strcmp(opt_clocksource, "tsc") )
     {
         for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ )
         {
@@ -1463,6 +1528,31 @@ static void __init tsc_check_writability(void)
     disable_tsc_sync = 1;
 }
 
+static void __init reset_percpu_time(void *unused)
+{
+    struct cpu_time *t = &this_cpu(cpu_time);
+
+    t->stamp.local_tsc = boot_tsc_stamp;
+    t->stamp.local_stime = 0;
+    t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
+    t->stamp.master_stime = t->stamp.local_stime;
+}
+
+static void __init try_platform_timer_tail(void)
+{
+    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
+    plt_overflow(NULL);
+
+    platform_timer_stamp = plt_stamp64;
+    stime_platform_stamp = NOW();
+
+    if ( !clocksource_is_tsc() )
+        init_percpu_time();
+
+    init_timer(&calibration_timer, time_calibration, NULL, 0);
+    set_timer(&calibration_timer, NOW() + EPOCH);
+}
+
 /* Late init function, after all cpus have booted */
 static int __init verify_tsc_reliability(void)
 {
@@ -1480,6 +1570,25 @@ static int __init verify_tsc_reliability(void)
             printk("TSC warp detected, disabling TSC_RELIABLE\n");
             setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
         }
+        else if ( !strcmp(opt_clocksource, "tsc") &&
+                  (try_platform_timer(&plt_tsc) > 0) )
+        {
+            /*
+             * Platform timer has changed and CPU time will only be updated
+             * after we set again the calibration timer, which means we need to
+             * seed again each local CPU time. At this stage TSC is known to be
+             * reliable i.e. monotonically increasing across all CPUs so this
+             * lets us remove the skew between platform timer and TSC, since
+             * these are now effectively the same.
+             */
+            on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
+
+            /* Finish platform timer switch. */
+            try_platform_timer_tail();
+
+            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
+                   freq_string(plt_src.frequency));
+        }
     }
 
     return 0;
@@ -1505,15 +1614,7 @@ int __init init_xen_time(void)
     do_settime(get_cmos_time(), 0, NOW());
 
     /* Finish platform timer initialization. */
-    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
-    plt_overflow(NULL);
-    platform_timer_stamp = plt_stamp64;
-    stime_platform_stamp = NOW();
-
-    init_percpu_time();
-
-    init_timer(&calibration_timer, time_calibration, NULL, 0);
-    set_timer(&calibration_timer, NOW() + EPOCH);
+    try_platform_timer_tail();
 
     return 0;
 }
@@ -1527,6 +1628,7 @@ void __init early_time_init(void)
 
     preinit_pit();
     tmp = init_platform_timer();
+    plt_tsc.frequency = tmp;
 
     set_time_scale(&t->tsc_scale, tmp);
     t->stamp.local_tsc = boot_tsc_stamp;
@@ -1775,6 +1877,11 @@ void pv_soft_rdtsc(struct vcpu *v, struct cpu_user_regs *regs, int rdtscp)
             (d->arch.tsc_mode == TSC_MODE_PVRDTSCP) ? d->arch.incarnation : 0;
 }
 
+bool clocksource_is_tsc(void)
+{
+    return plt_src.read_counter == read_tsc;
+}
+
 int host_tsc_is_safe(void)
 {
     return boot_cpu_has(X86_FEATURE_TSC_RELIABLE);
diff --git a/xen/include/asm-x86/time.h b/xen/include/asm-x86/time.h
index 971883a..6d704b4 100644
--- a/xen/include/asm-x86/time.h
+++ b/xen/include/asm-x86/time.h
@@ -69,6 +69,7 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode, uint64_t *elapsed_nsec,
 
 void force_update_vcpu_system_time(struct vcpu *v);
 
+bool clocksource_is_tsc(void);
 int host_tsc_is_safe(void);
 void cpuid_time_leaf(uint32_t sub_idx, uint32_t *eax, uint32_t *ebx,
                      uint32_t *ecx, uint32_t *edx);
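
For reference, a minimal sketch of how a counter width turns into the
mask computed in try_platform_timer(), and how a masked difference copes
with counter wrap-around. The helper names `counter_mask()` and
`counter_delta()` are hypothetical, and the assumption that elapsed ticks
are accumulated as a masked difference is mine rather than something
shown in this patch.

```c
#include <stdint.h>

/*
 * Same expression as plt_mask in try_platform_timer().
 * Only valid for 1 <= bits <= 64 (a shift by 64 would be undefined).
 */
static uint64_t counter_mask(unsigned int bits)
{
    return ~UINT64_C(0) >> (64 - bits);
}

/*
 * Ticks elapsed between two reads of a counter that is 'bits' wide.
 * Unsigned subtraction followed by masking gives the right answer even
 * if the counter wrapped once between the two reads.
 */
static uint64_t counter_delta(uint64_t prev, uint64_t now, unsigned int bits)
{
    return (now - prev) & counter_mask(bits);
}
```

With `bits == 63`, as set for plt_tsc above, the mask is
0x7fffffffffffffff.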