diff mbox

[RFC] Keeping host value of CR4.MCE (and other bits) live during guest execution

Message ID CAN+hb0V1ivsNx+_1t_LmPvYW5e-MgBYwt5tTUObRh8nWQ4SKkw@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Benjamin Serebrin April 15, 2015, 11:52 p.m. UTC
There's a bug in kvm/vmx.c: if the host enabled machine check (CR4.MCE==1),
the value gets zeroed while the CPU is running in guest context.
If a machine check event arrives while the CPU is in guest context and
effective CR4.MCE is zero, the machine raises CATERR and crashes.

We should make sure the host value of CR4.MCE is always active.  There
are read and write shadows for the guest to think it wrote its own value.

For discussion: there's new complexity with CR4 shadowing
(1e02ce4cccdcb9688386e5b8d2c9fa4660b45389).  I measure CR4 reads at 24
cycles on haswell and 36 on sandybridge, which compares well with
L2 miss costs.  Is the shadowing worth the complexity?  CR4 is also
cached (with no real consistency mechanism) in the VMCS at the time
of guest VCPU creation.  If there is ever a change in CR4 value
over time, or if CR4 is different on different CPUs in the system, all this
logic gets broken.

Thanks,
Ben


---

The host's decision to enable machine check exceptions should remain
in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
and passed a slightly-modified 0 to the vmcs.guest_cr4 value.

Tested: Inject machine check while a guest is spinning.
Before the change, if guest CR4.MCE==0, then the machine check is
escalated to Catastrophic Error (CATERR) and the machine dies.
If guest CR4.MCE==1, then the machine check causes VMEXIT and is
handled normally by host Linux. After the change, injecting a machine
check causes normal Linux machine check handling.


---
 arch/x86/kvm/vmx.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

                /*
--
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Andy Lutomirski April 16, 2015, 12:05 a.m. UTC | #1
On Wed, Apr 15, 2015 at 4:52 PM, Benjamin Serebrin <serebrin@google.com> wrote:
> There's a bug in kvm/vmx.c: if the host enabled machine check (CR4.MCE==1),
> the value gets zeroed while the CPU is running in guest context.
> If a machine check event arrives while the CPU is in guest context and
> effective CR4.MCE is zero, the machine raises CATERR and crashes.
>
> We should make sure the host value of CR4.MCE is always active.  There
> are read and write shadows for the guest to think it wrote its own value.
>
> For discussion: there's new complexity with CR4 shadowing
> (1e02ce4cccdcb9688386e5b8d2c9fa4660b45389).  I measure CR4 reads at 24
> cycles on haswell and 36 on sandybridge, which compares well with
> L2 miss costs.  Is the shadowing worth the complexity?  CR4 is also
> cached (with no real consistency mechanism) in the VMCS at the time
> of guest VCPU creation.  If there is ever a change in CR4 value
> over time, or if CR4 is different on different CPUs in the system, all this
> logic gets broken.

The cache line I chose for the shadow is hot during context switches,
so we shouldn't ever have a meaningful cache miss there (except maybe
in KVM).

Also, we do have a consistency mechanism -- see:

commit d974baa398f34393db76be45f7d4d04fbdbb4a0a
Author: Andy Lutomirski <luto@amacapital.net>
Date:   Wed Oct 8 09:02:13 2014 -0700

    x86,kvm,vmx: Preserve CR4 across VM entry

>
>
> ---
>
> The host's decision to enable machine check exceptions should remain
> in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
> and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
>
> Tested: Inject machine check while a guest is spinning.
> Before the change, if guest CR4.MCE==0, then the machine check is
> escalated to Catastrophic Error (CATERR) and the machine dies.
> If guest CR4.MCE==1, then the machine check causes VMEXIT and is
> handled normally by host Linux. After the change, injecting a machine
> check causes normal Linux machine check handling.
>
>
> ---
>  arch/x86/kvm/vmx.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a214104..44c8d24 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3456,8 +3456,16 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu,
> unsigned long cr3)
>
>  static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  {
> -       unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ?
> -                   KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
> +       /*
> +        * Pass through host's Machine Check Enable value to hw_cr4, which
> +        * is in force while we are in guest mode.  Do not let guests control
> +        * this bit, even if host CR4.MCE == 0.
> +        */
> +       unsigned long hw_cr4 =
> +               (read_cr4() & X86_CR4_MCE) |

cr4_read_shadow?  There is no read_cr4 any more.

> +               (cr4 & ~X86_CR4_MCE) |
> +               (to_vmx(vcpu)->rmode.vm86_active ?
> +                KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
>

Otherwise looks sensible.

--Andy

>         if (cr4 & X86_CR4_VMXE) {
>                 /*
> --
Paolo Bonzini April 16, 2015, 9:23 a.m. UTC | #2
On 16/04/2015 01:52, Benjamin Serebrin wrote:
> There's a bug in kvm/vmx.c: if the host enabled machine check (CR4.MCE==1),
> the value gets zeroed while the CPU is running in guest context.
> If a machine check event arrives while the CPU is in guest context and
> effective CR4.MCE is zero, the machine raises CATERR and crashes.
> 
> We should make sure the host value of CR4.MCE is always active.  There
> are read and write shadows for the guest to think it wrote its own value.
> 
> For discussion: there's new complexity with CR4 shadowing
> (1e02ce4cccdcb9688386e5b8d2c9fa4660b45389).  I measure CR4 reads at 24
> cycles on haswell and 36 on sandybridge, which compares well with
> L2 miss costs.  Is the shadowing worth the complexity?  CR4 is also
> cached (with no real consistency mechanism) in the VMCS at the time
> of guest VCPU creation.  If there is ever a change in CR4 value
> over time, or if CR4 is different on different CPUs in the system, all this
> logic gets broken.

Good catch.  Please resubmit with your Signed-off-by, and rebased to
Linus's tree.

Paolo

> Thanks,
> Ben
> 
> 
> ---
> 
> The host's decision to enable machine check exceptions should remain
> in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
> and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
> 
> Tested: Inject machine check while a guest is spinning.
> Before the change, if guest CR4.MCE==0, then the machine check is
> escalated to Catastrophic Error (CATERR) and the machine dies.
> If guest CR4.MCE==1, then the machine check causes VMEXIT and is
> handled normally by host Linux. After the change, injecting a machine
> check causes normal Linux machine check handling.
> 
> 
> ---
>  arch/x86/kvm/vmx.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a214104..44c8d24 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3456,8 +3456,16 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu,
> unsigned long cr3)
> 
>  static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  {
> -       unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ?
> -                   KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
> +       /*
> +        * Pass through host's Machine Check Enable value to hw_cr4, which
> +        * is in force while we are in guest mode.  Do not let guests control
> +        * this bit, even if host CR4.MCE == 0.
> +        */
> +       unsigned long hw_cr4 =
> +               (read_cr4() & X86_CR4_MCE) |
> +               (cr4 & ~X86_CR4_MCE) |
> +               (to_vmx(vcpu)->rmode.vm86_active ?
> +                KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
> 
>         if (cr4 & X86_CR4_VMXE) {
>                 /*
> --
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Serebrin April 16, 2015, 4:41 p.m. UTC | #3
Thanks very much, Andy and Paolo.  I've brought my patch out of the dark age
and just re-sent it under a new subject line.


On Thu, Apr 16, 2015 at 2:23 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 16/04/2015 01:52, Benjamin Serebrin wrote:
>> There's a bug in kvm/vmx.c: if the host enabled machine check (CR4.MCE==1),
>> the value gets zeroed while the CPU is running in guest context.
>> If a machine check event arrives while the CPU is in guest context and
>> effective CR4.MCE is zero, the machine raises CATERR and crashes.
>>
>> We should make sure the host value of CR4.MCE is always active.  There
>> are read and write shadows for the guest to think it wrote its own value.
>>
>> For discussion: there's new complexity with CR4 shadowing
>> (1e02ce4cccdcb9688386e5b8d2c9fa4660b45389).  I measure CR4 reads at 24
>> cycles on haswell and 36 on sandybridge, which compares well with
>> L2 miss costs.  Is the shadowing worth the complexity?  CR4 is also
>> cached (with no real consistency mechanism) in the VMCS at the time
>> of guest VCPU creation.  If there is ever a change in CR4 value
>> over time, or if CR4 is different on different CPUs in the system, all this
>> logic gets broken.
>
> Good catch.  Please resubmit with your Signed-off-by, and rebased to
> Linus's tree.
>
> Paolo
>
>> Thanks,
>> Ben
>>
>>
>> ---
>>
>> The host's decision to enable machine check exceptions should remain
>> in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
>> and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
>>
>> Tested: Inject machine check while a guest is spinning.
>> Before the change, if guest CR4.MCE==0, then the machine check is
>> escalated to Catastrophic Error (CATERR) and the machine dies.
>> If guest CR4.MCE==1, then the machine check causes VMEXIT and is
>> handled normally by host Linux. After the change, injecting a machine
>> check causes normal Linux machine check handling.
>>
>>
>> ---
>>  arch/x86/kvm/vmx.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index a214104..44c8d24 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -3456,8 +3456,16 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu,
>> unsigned long cr3)
>>
>>  static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>>  {
>> -       unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ?
>> -                   KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
>> +       /*
>> +        * Pass through host's Machine Check Enable value to hw_cr4, which
>> +        * is in force while we are in guest mode.  Do not let guests control
>> +        * this bit, even if host CR4.MCE == 0.
>> +        */
>> +       unsigned long hw_cr4 =
>> +               (read_cr4() & X86_CR4_MCE) |
>> +               (cr4 & ~X86_CR4_MCE) |
>> +               (to_vmx(vcpu)->rmode.vm86_active ?
>> +                KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
>>
>>         if (cr4 & X86_CR4_VMXE) {
>>                 /*
>> --
>>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a214104..44c8d24 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3456,8 +3456,16 @@  static void vmx_set_cr3(struct kvm_vcpu *vcpu,
unsigned long cr3)

 static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
-       unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ?
-                   KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
+       /*
+        * Pass through host's Machine Check Enable value to hw_cr4, which
+        * is in force while we are in guest mode.  Do not let guests control
+        * this bit, even if host CR4.MCE == 0.
+        */
+       unsigned long hw_cr4 =
+               (read_cr4() & X86_CR4_MCE) |
+               (cr4 & ~X86_CR4_MCE) |
+               (to_vmx(vcpu)->rmode.vm86_active ?
+                KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);

        if (cr4 & X86_CR4_VMXE) {