x86/time: switch platform timer hooks to altcall

Message ID 87997f62-a6e2-1812-ccf5-d7d2e65fd50e@suse.com (mailing list archive)
State Superseded
Series x86/time: switch platform timer hooks to altcall

Commit Message

Jan Beulich Jan. 12, 2022, 8:58 a.m. UTC
Except in the "clocksource=tsc" case we can replace the indirect calls
involved in accessing the platform timers by direct ones, as they get
established once and never changed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Sort of RFC, for both the whether and the how aspects.

TBD: Overriding X86_FEATURE_ALWAYS is somewhat dangerous; the only
     reason there's no issue with e.g. hvm_set_tsc_offset(), used later
     in time.c, is that it's an inline function which has already
     "latched" the usual value of the macro. The alternative of
     introducing an alternative_call() variant allowing the controlling
     feature to be specified doesn't look overly nice to me either. Then
     again the .resume hook invocation could be patched unconditionally,
     as the TSC clocksource leaves this hook set to NULL.
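
For context (illustration only, not part of the patch): alternative_call()
keys its patching off X86_FEATURE_ALWAYS, which is what makes the local
override in time.c work. Every alternative_call() / alternative_vcall()
site expanded after the redefinition is controlled by the substituted
synthetic feature instead:

    #undef X86_FEATURE_ALWAYS
    #define X86_FEATURE_ALWAYS X86_FEATURE_CS_NOT_TSC

    /*
     * Expanded after the override: rewritten to a direct call only when
     * X86_FEATURE_CS_NOT_TSC is set, i.e. when the clocksource isn't TSC.
     */
    count = alternative_call(plt_src.read_counter);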

Comments

Andrew Cooper Jan. 12, 2022, 9:17 a.m. UTC | #1
On 12/01/2022 08:58, Jan Beulich wrote:
> Except in the "clocksource=tsc" case we can replace the indirect calls
> involved in accessing the platform timers by direct ones, as they get
> established once and never changed.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Sort of RFC, for both the whether and the how aspects.
>
> TBD: Overriding X86_FEATURE_ALWAYS is somewhat dangerous; the only
>      reason there's no issue with e.g. hvm_set_tsc_offset(), used later
>      in time.c, is that it's an inline function which has already
>      "latched" the usual value of the macro. The alternative of
>      introducing an alternative_call() variant allowing the controlling
>      feature to be specified doesn't look overly nice to me either. Then
>      again the .resume hook invocation could be patched unconditionally,
>      as the TSC clocksource leaves this hook set to NULL.
>
> --- a/xen/arch/x86/alternative.c
> +++ b/xen/arch/x86/alternative.c
> @@ -268,8 +268,7 @@ static void init_or_livepatch _apply_alt
>               * point the branch destination is still NULL, insert "UD2; UD0"
>               * (for ease of recognition) instead of CALL/JMP.
>               */
> -            if ( a->cpuid == X86_FEATURE_ALWAYS &&
> -                 *(int32_t *)(buf + 1) == -5 &&
> +            if ( *(int32_t *)(buf + 1) == -5 &&

I'm afraid that this must not become conditional.

One of the reasons I was hesitant towards the mechanics of altcall in
the first place was that it intentionally broke Spectre v2 protections
by manually writing out a non-retpoline'd indirect call.

This is made safe in practice because all altcall sites either get
converted to a direct call, or rewritten to be a UD2.


If you want to make altcall conversions conditional, then the code gen
must be rearranged to use INDIRECT_CALL first, but that comes with
overheads too, because the callsites would then load the function
pointer into a register just to not use it in the patched case.

I suspect it will be easier to figure out how to make the TSC case also
invariant after boot.

~Andrew
Jan Beulich Jan. 12, 2022, 9:46 a.m. UTC | #2
On 12.01.2022 10:17, Andrew Cooper wrote:
> On 12/01/2022 08:58, Jan Beulich wrote:
>> Except in the "clocksource=tsc" case we can replace the indirect calls
>> involved in accessing the platform timers by direct ones, as they get
>> established once and never changed.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Sort of RFC, for both the whether and the how aspects.
>>
>> TBD: Overriding X86_FEATURE_ALWAYS is somewhat dangerous; the only
>>      reason there's no issue with e.g. hvm_set_tsc_offset(), used later
>>      in time.c, is that it's an inline function which has already
>>      "latched" the usual value of the macro. The alternative of
>>      introducing an alternative_call() variant allowing the controlling
>>      feature to be specified doesn't look overly nice to me either. Then
>>      again the .resume hook invocation could be patched unconditionally,
>>      as the TSC clocksource leaves this hook set to NULL.
>>
>> --- a/xen/arch/x86/alternative.c
>> +++ b/xen/arch/x86/alternative.c
>> @@ -268,8 +268,7 @@ static void init_or_livepatch _apply_alt
>>               * point the branch destination is still NULL, insert "UD2; UD0"
>>               * (for ease of recognition) instead of CALL/JMP.
>>               */
>> -            if ( a->cpuid == X86_FEATURE_ALWAYS &&
>> -                 *(int32_t *)(buf + 1) == -5 &&
>> +            if ( *(int32_t *)(buf + 1) == -5 &&
> 
> I'm afraid that this must not become conditional.
> 
> One of the reasons I was hesitant towards the mechanics of altcall in
> the first place was that it intentionally broke Spectre v2 protections
> by manually writing out a non-retpoline'd indirect call.
> 
> This is made safe in practice because all altcall sites either get
> converted to a direct call, or rewritten to be a UD2.

Oh, sorry, I really should have realized this.

> If you want to make altcall conversions conditional, then the code gen
> must be rearranged to use INDIRECT_CALL first, but that comes with
> overheads too, because the callsites would then load the function
> pointer into a register just to not use it in the patched case.

I don't view this as desirable.

> I suspect it will be easier to figure out how to make the TSC case also
> invariant after boot.

Since switching to "tsc" happens only after bringing up all CPUs, I don't
see how this could become possible; in particular I don't fancy an
alternatives patching pass with all CPUs online (albeit technically this
could be an option).

The solution (workaround) I can see for now is using

    if ( plt_src.read_counter != read_tsc )
        count = alternative_call(plt_src.read_counter);
    else
        count = rdtsc_ordered();

everywhere. Potentially we'd then want to hide this in a static (inline?)
function.
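
A minimal sketch of such a helper (the name plt_read_counter is made up
here purely for illustration) might be:

    static inline uint64_t plt_read_counter(void)
    {
        /*
         * With "clocksource=tsc" the call sites stay unpatched, so read
         * the TSC directly instead of going through the indirect call.
         */
        if ( plt_src.read_counter == read_tsc )
            return rdtsc_ordered();

        return alternative_call(plt_src.read_counter);
    }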

Jan

Patch

--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -268,8 +268,7 @@  static void init_or_livepatch _apply_alt
              * point the branch destination is still NULL, insert "UD2; UD0"
              * (for ease of recognition) instead of CALL/JMP.
              */
-            if ( a->cpuid == X86_FEATURE_ALWAYS &&
-                 *(int32_t *)(buf + 1) == -5 &&
+            if ( *(int32_t *)(buf + 1) == -5 &&
                  a->orig_len >= 6 &&
                  orig[0] == 0xff &&
                  orig[1] == (*buf & 1 ? 0x25 : 0x15) )
--- a/xen/arch/x86/include/asm/cpufeatures.h
+++ b/xen/arch/x86/include/asm/cpufeatures.h
@@ -24,7 +24,7 @@  XEN_CPUFEATURE(APERFMPERF,        X86_SY
 XEN_CPUFEATURE(MFENCE_RDTSC,      X86_SYNTH( 9)) /* MFENCE synchronizes RDTSC */
 XEN_CPUFEATURE(XEN_SMEP,          X86_SYNTH(10)) /* SMEP gets used by Xen itself */
 XEN_CPUFEATURE(XEN_SMAP,          X86_SYNTH(11)) /* SMAP gets used by Xen itself */
-/* Bit 12 - unused. */
+XEN_CPUFEATURE(CS_NOT_TSC,        X86_SYNTH(12)) /* Clocksource is not TSC */
 XEN_CPUFEATURE(IND_THUNK_LFENCE,  X86_SYNTH(13)) /* Use IND_THUNK_LFENCE */
 XEN_CPUFEATURE(IND_THUNK_JMP,     X86_SYNTH(14)) /* Use IND_THUNK_JMP */
 XEN_CPUFEATURE(SC_NO_BRANCH_HARDEN, X86_SYNTH(15)) /* (Disable) Conditional branch hardening */
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -786,6 +786,14 @@  static struct platform_timesource __init
  * GENERIC PLATFORM TIMER INFRASTRUCTURE
  */
 
+/*
+ * Override for alternative call patching: Since "clocksource=tsc" is honored
+ * only after all CPUs have been brought up, we can't patch indirect calls in
+ * that case.
+ */
+#undef X86_FEATURE_ALWAYS
+#define X86_FEATURE_ALWAYS X86_FEATURE_CS_NOT_TSC
+
 /* details of chosen timesource */
 static struct platform_timesource __read_mostly plt_src;
 /* hardware-width mask */
@@ -818,7 +826,7 @@  static void plt_overflow(void *unused)
 
     spin_lock_irq(&platform_timer_lock);
 
-    count = plt_src.read_counter();
+    count = alternative_call(plt_src.read_counter);
     plt_stamp64 += (count - plt_stamp) & plt_mask;
     plt_stamp = count;
 
@@ -854,7 +862,7 @@  static s_time_t read_platform_stime(u64
     ASSERT(!local_irq_is_enabled());
 
     spin_lock(&platform_timer_lock);
-    plt_counter = plt_src.read_counter();
+    plt_counter = alternative_call(plt_src.read_counter);
     count = plt_stamp64 + ((plt_counter - plt_stamp) & plt_mask);
     stime = __read_platform_stime(count);
     spin_unlock(&platform_timer_lock);
@@ -872,7 +880,8 @@  static void platform_time_calibration(vo
     unsigned long flags;
 
     spin_lock_irqsave(&platform_timer_lock, flags);
-    count = plt_stamp64 + ((plt_src.read_counter() - plt_stamp) & plt_mask);
+    count = plt_stamp64 + ((alternative_call(plt_src.read_counter) -
+                            plt_stamp) & plt_mask);
     stamp = __read_platform_stime(count);
     stime_platform_stamp = stamp;
     platform_timer_stamp = count;
@@ -883,10 +892,10 @@  static void resume_platform_timer(void)
 {
     /* Timer source can be reset when backing from S3 to S0 */
     if ( plt_src.resume )
-        plt_src.resume(&plt_src);
+        alternative_vcall(plt_src.resume, &plt_src);
 
     plt_stamp64 = platform_timer_stamp;
-    plt_stamp = plt_src.read_counter();
+    plt_stamp = alternative_call(plt_src.read_counter);
 }
 
 static void __init reset_platform_timer(void)
@@ -975,6 +984,10 @@  static u64 __init init_platform_timer(vo
     printk("Platform timer is %s %s\n",
            freq_string(pts->frequency), pts->name);
 
+    if ( strcmp(opt_clocksource, "tsc") ||
+         !boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
+        setup_force_cpu_cap(X86_FEATURE_CS_NOT_TSC);
+
     return rc;
 }