
[3/8] x86/svm: VMEntry/Exit logic for MSR_SPEC_CTRL

Message ID 20220126084452.28975-4-andrew.cooper3@citrix.com (mailing list archive)
State Superseded
Series x86: MSR_SPEC_CTRL support for SVM guests

Commit Message

Andrew Cooper Jan. 26, 2022, 8:44 a.m. UTC
Hardware maintains both host and guest versions of MSR_SPEC_CTRL, but guests
run with the logical OR of both values.  Therefore, in principle we want to
clear Xen's value before entering the guest.  However, for migration
compatibility, and for performance reasons with SEV-SNP guests, we want the
ability to use a nonzero value behind the guest's back.  Use vcpu_msrs to hold
this value, with the guest value in the VMCB.
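
As a rough illustration (not code from this patch), the combination the
hardware performs while the guest runs is simply:

    /* Illustrative sketch only: the guest executes with the logical OR. */
    unsigned int effective_spec_ctrl(unsigned int host, unsigned int guest)
    {
        return host | guest;
    }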

On the VMEntry path, adjusting MSR_SPEC_CTRL must be done after CLGI so as to
be atomic with respect to NMIs/etc.  The loading of spec_ctrl_raw into %eax
was stale from the old, now-unused code, and can be dropped as well.

Implement both halves of the logic as small pieces of C, and use ALTERNATIVE
to patch in the call based on X86_FEATURE_SC_MSR_HVM.  While adjusting the
clobber lists, drop the stale requirements on the VMExit side.

The common case is that host and "guest-protection" values are both 0, so
maintain a per-cpu last_spec_ctrl value to allow us to skip redundant WRMSRs.
The value needs to live in the cpu_info block for subsequent use with PV
guests, and compatibility with XPTI.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

Several points:

1) It would be slightly more efficient to pass curr and cpu_info into
   vm{entry,exit}_spec_ctrl(), but setup of such state can't be in the
   ALTERNATIVE block because then the call displacement won't get fixed up.
   All the additional accesses are hot off the stack, so almost certainly
   negligible compared to the WRMSR.

2) The RAS[:32] flushing side effect is under reconsideration.  It is a very
   awkward side effect in practice, and not applicable to any implementation
   I'm aware of, but for now it is the documented safe action to take.
   Furthermore, the unconditional write avoids complicating the logic with an
   lfence in the else case for Spectre v1 safety.
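
   For illustration, the conditional shape that the unconditional write lets
   us avoid would be roughly (hypothetical, not part of this patch):

       if ( val != info->last_spec_ctrl )
       {
           wrmsr(MSR_SPEC_CTRL, val, 0);
           info->last_spec_ctrl = val;
       }
       else
           /* Spectre v1: fence the mispredictable not-taken path. */
           asm volatile ( "lfence" ::: "memory" );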
---
 xen/arch/x86/hvm/svm/entry.S             | 10 +++++-----
 xen/arch/x86/hvm/svm/svm.c               | 30 ++++++++++++++++++++++++++++++
 xen/arch/x86/include/asm/current.h       |  2 +-
 xen/arch/x86/include/asm/msr.h           |  9 +++++++++
 xen/arch/x86/include/asm/spec_ctrl_asm.h |  7 +++++++
 5 files changed, 52 insertions(+), 6 deletions(-)

Comments

Jan Beulich Jan. 26, 2022, 4:50 p.m. UTC | #1
On 26.01.2022 09:44, Andrew Cooper wrote:
> 1) It would be slightly more efficient to pass curr and cpu_info into
>    vm{entry,exit}_spec_ctrl(), but setup of such state can't be in the
>    ALTERNATIVE block because then the call displacement won't get fixed up.
>    All the additional accesses are hot off the stack, so almost certainly
>    negligible compared to the WRMSR.

What's wrong with using two instances of ALTERNATIVE, one to setup the
call arguments and the 2nd for just the CALL?

> --- a/xen/arch/x86/hvm/svm/entry.S
> +++ b/xen/arch/x86/hvm/svm/entry.S
> @@ -55,11 +55,11 @@ __UNLIKELY_END(nsvm_hap)
>          mov  %rsp, %rdi
>          call svm_vmenter_helper
>  
> -        mov VCPU_arch_msrs(%rbx), %rax
> -        mov VCPUMSR_spec_ctrl_raw(%rax), %eax
> +        clgi
>  
>          /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
> -        /* SPEC_CTRL_EXIT_TO_SVM   (nothing currently) */
> +        /* SPEC_CTRL_EXIT_TO_SVM       Req:                           Clob: C   */
> +        ALTERNATIVE "", __stringify(call vmentry_spec_ctrl), X86_FEATURE_SC_MSR_HVM

I guess the new upper case C after Clob: stands for "all call-clobbered
registers"? In which case ...

> @@ -86,8 +85,9 @@ __UNLIKELY_END(nsvm_hap)
>  
>          GET_CURRENT(bx)
>  
> -        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: b=curr %rsp=regs/cpuinfo, Clob: ac  */
> +        /* SPEC_CTRL_ENTRY_FROM_SVM    Req:                           Clob: ac,C */
>          ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
> +        ALTERNATIVE "", __stringify(call vmexit_spec_ctrl), X86_FEATURE_SC_MSR_HVM

... why the explicit further "ac" here? Is the intention to annotate
every individual ALTERNATIVE this way?

> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -3086,6 +3086,36 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>      vmcb_set_vintr(vmcb, intr);
>  }
>  
> +/* Called with GIF=0. */
> +void vmexit_spec_ctrl(void)
> +{
> +    struct cpu_info *info = get_cpu_info();
> +    unsigned int val = info->xen_spec_ctrl;
> +
> +    /*
> +     * Write to MSR_SPEC_CTRL unconditionally, for the RAS[:32] flushing side
> +     * effect.
> +     */
> +    wrmsr(MSR_SPEC_CTRL, val, 0);
> +    info->last_spec_ctrl = val;
> +}
> +
> +/* Called with GIF=0. */
> +void vmentry_spec_ctrl(void)
> +{
> +    struct cpu_info *info = get_cpu_info();
> +    const struct vcpu *curr = current;
> +    unsigned int val = curr->arch.msrs->spec_ctrl.raw;
> +
> +    if ( val != info->last_spec_ctrl )
> +    {
> +        wrmsr(MSR_SPEC_CTRL, val, 0);
> +        info->last_spec_ctrl = val;
> +    }

Is this correct for the very first use on a CPU? last_spec_ctrl
starts out as zero afaict, and hence this very first write would be
skipped if the guest value is also zero (which it will be for a
vCPU first launched), even if we have a non-zero value in the MSR
at that point.

Jan
Andrew Cooper Jan. 26, 2022, 8:16 p.m. UTC | #2
On 26/01/2022 16:50, Jan Beulich wrote:
> On 26.01.2022 09:44, Andrew Cooper wrote:
>> 1) It would be slightly more efficient to pass curr and cpu_info into
>>    vm{entry,exit}_spec_ctrl(), but setup of such state can't be in the
>>    ALTERNATIVE block because then the call displacement won't get fixed up.
>>    All the additional accesses are hot off the stack, so almost certainly
>>    negligible compared to the WRMSR.
> What's wrong with using two instances of ALTERNATIVE, one to setup the
> call arguments and the 2nd for just the CALL?

Hmm

diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index c718328ac4cf..1d4be7e97ae2 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -59,6 +59,7 @@ __UNLIKELY_END(nsvm_hap)
 
         /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
        /* SPEC_CTRL_EXIT_TO_SVM       Req:                           Clob: C   */
+        ALTERNATIVE "", __stringify(mov %rbx, %rdi; mov %rsp, %rsi), X86_FEATURE_SC_MSR_HVM
        ALTERNATIVE "", __stringify(call vmentry_spec_ctrl), X86_FEATURE_SC_MSR_HVM
 
         pop  %r15

is somewhat of a long line, but isn't too terrible.

I'm tempted to switch back to using STR() seeing as we have both and it
is much more concise.
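
Roughly, both reduce to the same two-level stringification idiom (the extra
level making macro arguments expand before being stringified):

    #define __stringify_1(x...) #x              /* xen/stringify.h */
    #define __stringify(x...)   __stringify_1(x)

    #define __STR(...) #__VA_ARGS__             /* xen/config.h */
    #define STR(...)   __STR(__VA_ARGS__)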
>> --- a/xen/arch/x86/hvm/svm/entry.S
>> +++ b/xen/arch/x86/hvm/svm/entry.S
>> @@ -55,11 +55,11 @@ __UNLIKELY_END(nsvm_hap)
>>          mov  %rsp, %rdi
>>          call svm_vmenter_helper
>>  
>> -        mov VCPU_arch_msrs(%rbx), %rax
>> -        mov VCPUMSR_spec_ctrl_raw(%rax), %eax
>> +        clgi
>>  
>>          /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
>> -        /* SPEC_CTRL_EXIT_TO_SVM   (nothing currently) */
>> +        /* SPEC_CTRL_EXIT_TO_SVM       Req:                           Clob: C   */
>> +        ALTERNATIVE "", __stringify(call vmentry_spec_ctrl), X86_FEATURE_SC_MSR_HVM
> I guess the new upper case C after Clob: stands for "all call-clobbered
> registers"?

That was the intention, yes.

>  In which case ...
>
>> @@ -86,8 +85,9 @@ __UNLIKELY_END(nsvm_hap)
>>  
>>          GET_CURRENT(bx)
>>  
>> -        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: b=curr %rsp=regs/cpuinfo, Clob: ac  */
>> +        /* SPEC_CTRL_ENTRY_FROM_SVM    Req:                           Clob: ac,C */
>>          ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
>> +        ALTERNATIVE "", __stringify(call vmexit_spec_ctrl), X86_FEATURE_SC_MSR_HVM
> ... why the explicit further "ac" here? Is the intention to annotate
> every individual ALTERNATIVE this way?

Fair point.  I'll switch to just C.

The clobbers are rather more important for the PV side where the logic
has multiple live variables and it's not totally obvious that all GPRs
are available.

>
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -3086,6 +3086,36 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>>      vmcb_set_vintr(vmcb, intr);
>>  }
>>  
>> +/* Called with GIF=0. */
>> +void vmexit_spec_ctrl(void)
>> +{
>> +    struct cpu_info *info = get_cpu_info();
>> +    unsigned int val = info->xen_spec_ctrl;
>> +
>> +    /*
>> +     * Write to MSR_SPEC_CTRL unconditionally, for the RAS[:32] flushing side
>> +     * effect.
>> +     */
>> +    wrmsr(MSR_SPEC_CTRL, val, 0);
>> +    info->last_spec_ctrl = val;
>> +}
>> +
>> +/* Called with GIF=0. */
>> +void vmentry_spec_ctrl(void)
>> +{
>> +    struct cpu_info *info = get_cpu_info();
>> +    const struct vcpu *curr = current;
>> +    unsigned int val = curr->arch.msrs->spec_ctrl.raw;
>> +
>> +    if ( val != info->last_spec_ctrl )
>> +    {
>> +        wrmsr(MSR_SPEC_CTRL, val, 0);
>> +        info->last_spec_ctrl = val;
>> +    }
> Is this correct for the very first use on a CPU? last_spec_ctrl
> starts out as zero afaict, and hence this very first write would be
> skipped if the guest value is also zero (which it will be for a
> vCPU first launched), even if we have a non-zero value in the MSR
> at that point.

Ish.

We intentionally write MSR_SPEC_CTRL once on each CPU to clear out any
previous-environment settings, but those boot paths need to latch
last_spec_ctrl too for this to work correctly.

Making this safe is slightly nasty.  I think the best option would be to
reorder this patch to be after patch 6, and tweak the wording in
patch 6's commit message.  That way, we're not adding latching to
later-dropped codepaths.
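
Hypothetically, those paths want something of this shape (name and placement
assumed, not from this series):

    /*
     * Any boot-time write to MSR_SPEC_CTRL must also latch last_spec_ctrl,
     * or the first vmentry_spec_ctrl() may wrongly skip its WRMSR.
     */
    static void write_spec_ctrl(unsigned int val)
    {
        wrmsr(MSR_SPEC_CTRL, val, 0);
        get_cpu_info()->last_spec_ctrl = val;
    }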

~Andrew
Jan Beulich Jan. 27, 2022, 7:25 a.m. UTC | #3
On 26.01.2022 21:16, Andrew Cooper wrote:
> On 26/01/2022 16:50, Jan Beulich wrote:
>> On 26.01.2022 09:44, Andrew Cooper wrote:
>>> 1) It would be slightly more efficient to pass curr and cpu_info into
>>>    vm{entry,exit}_spec_ctrl(), but setup of such state can't be in the
>>>    ALTERNATIVE block because then the call displacement won't get fixed up.
>>>    All the additional accesses are hot off the stack, so almost certainly
>>>    negligible compared to the WRMSR.
>> What's wrong with using two instances of ALTERNATIVE, one to setup the
>> call arguments and the 2nd for just the CALL?
> 
> Hmm
> 
> diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
> index c718328ac4cf..1d4be7e97ae2 100644
> --- a/xen/arch/x86/hvm/svm/entry.S
> +++ b/xen/arch/x86/hvm/svm/entry.S
> @@ -59,6 +59,7 @@ __UNLIKELY_END(nsvm_hap)
>  
>          /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
>          /* SPEC_CTRL_EXIT_TO_SVM       Req:                           Clob: C   */
> +        ALTERNATIVE "", __stringify(mov %rbx, %rdi; mov %rsp, %rsi), X86_FEATURE_SC_MSR_HVM
>          ALTERNATIVE "", __stringify(call vmentry_spec_ctrl), X86_FEATURE_SC_MSR_HVM
>  
>          pop  %r15
> 
> is somewhat of a long line, but isn't too terrible.
> 
> I'm tempted to switch back to using STR() seeing as we have both and it
> is much more concise.

No objections. In fact I think when I introduced stringify.h over 10 years
back, I didn't realize we already had STR(). Quite certainly first and
foremost because of its bogus placement in xen/config.h (same would go for
at least IS_ALIGNED() as well as KB() and friends).

I wouldn't even mind phasing out stringify.h again. Or maybe we want to
move STR() there ...

Jan
Andrew Cooper Jan. 27, 2022, 9:55 p.m. UTC | #4
On 27/01/2022 07:25, Jan Beulich wrote:
> On 26.01.2022 21:16, Andrew Cooper wrote:
>> On 26/01/2022 16:50, Jan Beulich wrote:
>>> On 26.01.2022 09:44, Andrew Cooper wrote:
>>>> 1) It would be slightly more efficient to pass curr and cpu_info into
>>>>    vm{entry,exit}_spec_ctrl(), but setup of such state can't be in the
>>>>    ALTERNATIVE block because then the call displacement won't get fixed up.
>>>>    All the additional accesses are hot off the stack, so almost certainly
>>>>    negligible compared to the WRMSR.
>>> What's wrong with using two instances of ALTERNATIVE, one to setup the
>>> call arguments and the 2nd for just the CALL?
>> Hmm
>>
>> diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
>> index c718328ac4cf..1d4be7e97ae2 100644
>> --- a/xen/arch/x86/hvm/svm/entry.S
>> +++ b/xen/arch/x86/hvm/svm/entry.S
>> @@ -59,6 +59,7 @@ __UNLIKELY_END(nsvm_hap)
>>  
>>          /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
>>          /* SPEC_CTRL_EXIT_TO_SVM       Req:                           Clob: C   */
>> +        ALTERNATIVE "", __stringify(mov %rbx, %rdi; mov %rsp, %rsi), X86_FEATURE_SC_MSR_HVM
>>          ALTERNATIVE "", __stringify(call vmentry_spec_ctrl), X86_FEATURE_SC_MSR_HVM
>>  
>>          pop  %r15
>>
>> is somewhat of a long line, but isn't too terrible.
>>
>> I'm tempted to switch back to using STR() seeing as we have both and it
>> is much more concise.
> No objections. In fact I think when I introduced stringify.h over 10 years
> back, I didn't realize we already had STR(). Quite certainly first and
> foremost because of its bogus placement in xen/config.h (same would go for
> at least IS_ALIGNED() as well as KB() and friends).
>
> I wouldn't even mind phasing out stringify.h again. Or maybe we want to
> move STR() there ...

Now that we've given up using freestanding headers anywhere, there's a
xen/stddef.h-shaped hole.

Things like NULL should move across, but it would also be the
appropriate place for ARRAY_SIZE() and pretty much everything else we
expect to be ubiquitous.  Linux includes things like offsetof().

~Andrew

Patch

diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index 276215d36aff..c718328ac4cf 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -55,11 +55,11 @@  __UNLIKELY_END(nsvm_hap)
         mov  %rsp, %rdi
         call svm_vmenter_helper
 
-        mov VCPU_arch_msrs(%rbx), %rax
-        mov VCPUMSR_spec_ctrl_raw(%rax), %eax
+        clgi
 
         /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-        /* SPEC_CTRL_EXIT_TO_SVM   (nothing currently) */
+        /* SPEC_CTRL_EXIT_TO_SVM       Req:                           Clob: C   */
+        ALTERNATIVE "", __stringify(call vmentry_spec_ctrl), X86_FEATURE_SC_MSR_HVM
 
         pop  %r15
         pop  %r14
@@ -78,7 +78,6 @@  __UNLIKELY_END(nsvm_hap)
         pop  %rsi
         pop  %rdi
 
-        clgi
         sti
         vmrun
 
@@ -86,8 +85,9 @@  __UNLIKELY_END(nsvm_hap)
 
         GET_CURRENT(bx)
 
-        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: b=curr %rsp=regs/cpuinfo, Clob: ac  */
+        /* SPEC_CTRL_ENTRY_FROM_SVM    Req:                           Clob: ac,C */
         ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
+        ALTERNATIVE "", __stringify(call vmexit_spec_ctrl), X86_FEATURE_SC_MSR_HVM
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         stgi
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index bb6b8e560a9f..8fdb530b4004 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -3086,6 +3086,36 @@  void svm_vmexit_handler(struct cpu_user_regs *regs)
     vmcb_set_vintr(vmcb, intr);
 }
 
+/* Called with GIF=0. */
+void vmexit_spec_ctrl(void)
+{
+    struct cpu_info *info = get_cpu_info();
+    unsigned int val = info->xen_spec_ctrl;
+
+    /*
+     * Write to MSR_SPEC_CTRL unconditionally, for the RAS[:32] flushing side
+     * effect.
+     */
+    wrmsr(MSR_SPEC_CTRL, val, 0);
+    info->last_spec_ctrl = val;
+}
+
+/* Called with GIF=0. */
+void vmentry_spec_ctrl(void)
+{
+    struct cpu_info *info = get_cpu_info();
+    const struct vcpu *curr = current;
+    unsigned int val = curr->arch.msrs->spec_ctrl.raw;
+
+    if ( val != info->last_spec_ctrl )
+    {
+        wrmsr(MSR_SPEC_CTRL, val, 0);
+        info->last_spec_ctrl = val;
+    }
+
+    /* No Spectre v1 concerns.  Execution is going to hit VMRUN imminently. */
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/include/asm/current.h b/xen/arch/x86/include/asm/current.h
index cfbedc31983f..dc0edd9ed07d 100644
--- a/xen/arch/x86/include/asm/current.h
+++ b/xen/arch/x86/include/asm/current.h
@@ -56,6 +56,7 @@  struct cpu_info {
     /* See asm/spec_ctrl_asm.h for usage. */
     unsigned int shadow_spec_ctrl;
     uint8_t      xen_spec_ctrl;
+    uint8_t      last_spec_ctrl;
     uint8_t      spec_ctrl_flags;
 
     /*
@@ -73,7 +74,6 @@  struct cpu_info {
      */
     bool         use_pv_cr3;
 
-    unsigned long __pad;
     /* get_stack_bottom() must be 16-byte aligned */
 };
 
diff --git a/xen/arch/x86/include/asm/msr.h b/xen/arch/x86/include/asm/msr.h
index 657a3295613d..ce4fe51afe54 100644
--- a/xen/arch/x86/include/asm/msr.h
+++ b/xen/arch/x86/include/asm/msr.h
@@ -297,6 +297,15 @@  struct vcpu_msrs
      *
      * For VT-x guests, the guest value is held in the MSR guest load/save
      * list.
+     *
+     * For SVM, the guest value lives in the VMCB, and hardware saves/restores
+     * the host value automatically.  However, guests run with the OR of the
+     * host and guest value, which allows Xen to set protections behind the
+     * guest's back.
+     *
+     * We must clear/restore Xen's value before/after VMRUN to avoid unduly
+     * influencing the guest.  In order to support "behind the guest's back"
+     * protections, we load this value (commonly 0) before VMRUN.
      */
     struct {
         uint32_t raw;
diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index bf82528a12ae..02b3b18ce69f 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -46,6 +46,9 @@ 
  *   - On VMX by using MSR load/save lists to have vmentry/exit atomically
  *     load/save the guest value.  Xen's value is loaded in regular code, and
  *     there is no need to use the shadow logic (below).
+ *   - On SVM by altering MSR_SPEC_CTRL inside the CLGI/STGI region.  This
+ *     makes the changes atomic with respect to NMIs/etc, so no need for
+ *     shadowing logic.
  *
  * Factor 2 is harder.  We maintain a shadow_spec_ctrl value, and a use_shadow
  * boolean in the per cpu spec_ctrl_flags.  The synchronous use is:
@@ -67,6 +70,10 @@ 
  * steps 2 and 6 will restore the shadow value rather than leaving Xen's value
  * loaded and corrupting the value used in guest context.
  *
+ * Additionally, in some cases it is safe to skip writes to MSR_SPEC_CTRL when
+ * we don't require any of the side effects of an identical write.  Maintain a
+ * per-cpu last_spec_ctrl value for this purpose.
+ *
  * The following ASM fragments implement this algorithm.  See their local
  * comments for further details.
  *  - SPEC_CTRL_ENTRY_FROM_PV