Message ID | CAN+hb0V1ivsNx+_1t_LmPvYW5e-MgBYwt5tTUObRh8nWQ4SKkw@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Apr 15, 2015 at 4:52 PM, Benjamin Serebrin <serebrin@google.com> wrote: > There's a bug in kvm/vmx.c: if the host enabled machine check (CR4.MCE==1), > the value gets zeroed while the CPU is running in guest context. > If a machine check event arrives while the CPU is in guest context and > effective CR4.MCE is zero, the machine raises CATERR and crashes. > > We should make sure the host value of CR4.MCE is always active. There > are read and write shadows for the guest to think it wrote its own value. > > For discussion: there's new complexity with CR4 shadowing > (1e02ce4cccdcb9688386e5b8d2c9fa4660b45389). I measure CR4 reads at 24 > cycles on haswell and 36 on sandybridge, which compares well with > L2 miss costs. Is the shadowing worth the complexity? CR4 is also > cached (with no real consistency mechanism) in the VMCS at the time > of guest VCPU creation. If there is ever a change in CR4 value > over time, or if CR4 is different on different CPUs in the system, all this > logic gets broken. The cache line I chose for the shadow is hot during context switches, so we shouldn't ever have a meaningful cache miss there (except maybe in KVM). Also, we do have a consistency mechanism -- see: commit d974baa398f34393db76be45f7d4d04fbdbb4a0a Author: Andy Lutomirski <luto@amacapital.net> Date: Wed Oct 8 09:02:13 2014 -0700 x86,kvm,vmx: Preserve CR4 across VM entry > > > --- > > The host's decision to enable machine check exceptions should remain > in force during non-root mode. KVM was writing 0 to cr4 on VCPU reset > and passed a slightly-modified 0 to the vmcs.guest_cr4 value. > > Tested: Inject machine check while a guest is spinning. > Before the change, if guest CR4.MCE==0, then the machine check is > escalated to Catastrophic Error (CATERR) and the machine dies. > If guest CR4.MCE==1, then the machine check causes VMEXIT and is > handled normally by host Linux. After the change, injecting a machine > check causes normal Linux machine check handling. > > > --- > arch/x86/kvm/vmx.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index a214104..44c8d24 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -3456,8 +3456,16 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, > unsigned long cr3) > > static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) > { > - unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ? > - KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON); > + /* > + * Pass through host's Machine Check Enable value to hw_cr4, which > + * is in force while we are in guest mode. Do not let guests control > + * this bit, even if host CR4.MCE == 0. > + */ > + unsigned long hw_cr4 = > + (read_cr4() & X86_CR4_MCE) | cr4_read_shadow? There is no read_cr4 any more. > + (cr4 & ~X86_CR4_MCE) | > + (to_vmx(vcpu)->rmode.vm86_active ? > + KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON); > Otherwise looks sensible. --Andy > if (cr4 & X86_CR4_VMXE) { > /* > --
On 16/04/2015 01:52, Benjamin Serebrin wrote: > There's a bug in kvm/vmx.c: if the host enabled machine check (CR4.MCE==1), > the value gets zeroed while the CPU is running in guest context. > If a machine check event arrives while the CPU is in guest context and > effective CR4.MCE is zero, the machine raises CATERR and crashes. > > We should make sure the host value of CR4.MCE is always active. There > are read and write shadows for the guest to think it wrote its own value. > > For discussion: there's new complexity with CR4 shadowing > (1e02ce4cccdcb9688386e5b8d2c9fa4660b45389). I measure CR4 reads at 24 > cycles on haswell and 36 on sandybridge, which compares well with > L2 miss costs. Is the shadowing worth the complexity? CR4 is also > cached (with no real consistency mechanism) in the VMCS at the time > of guest VCPU creation. If there is ever a change in CR4 value > over time, or if CR4 is different on different CPUs in the system, all this > logic gets broken. Good catch. Please resubmit with your Signed-off-by, and rebased to Linus's tree. Paolo > Thanks, > Ben > > > --- > > The host's decision to enable machine check exceptions should remain > in force during non-root mode. KVM was writing 0 to cr4 on VCPU reset > and passed a slightly-modified 0 to the vmcs.guest_cr4 value. > > Tested: Inject machine check while a guest is spinning. > Before the change, if guest CR4.MCE==0, then the machine check is > escalated to Catastrophic Error (CATERR) and the machine dies. > If guest CR4.MCE==1, then the machine check causes VMEXIT and is > handled normally by host Linux. After the change, injecting a machine > check causes normal Linux machine check handling. > > > --- > arch/x86/kvm/vmx.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index a214104..44c8d24 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -3456,8 +3456,16 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, > unsigned long cr3) > > static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) > { > - unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ? > - KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON); > + /* > + * Pass through host's Machine Check Enable value to hw_cr4, which > + * is in force while we are in guest mode. Do not let guests control > + * this bit, even if host CR4.MCE == 0. > + */ > + unsigned long hw_cr4 = > + (read_cr4() & X86_CR4_MCE) | > + (cr4 & ~X86_CR4_MCE) | > + (to_vmx(vcpu)->rmode.vm86_active ? > + KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON); > > if (cr4 & X86_CR4_VMXE) { > /* > -- > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Thanks very much, Andy and Paolo. I've brought my patch out of the dark age and just re-sent it under a new subject line. On Thu, Apr 16, 2015 at 2:23 AM, Paolo Bonzini <pbonzini@redhat.com> wrote: > > > On 16/04/2015 01:52, Benjamin Serebrin wrote: >> There's a bug in kvm/vmx.c: if the host enabled machine check (CR4.MCE==1), >> the value gets zeroed while the CPU is running in guest context. >> If a machine check event arrives while the CPU is in guest context and >> effective CR4.MCE is zero, the machine raises CATERR and crashes. >> >> We should make sure the host value of CR4.MCE is always active. There >> are read and write shadows for the guest to think it wrote its own value. >> >> For discussion: there's new complexity with CR4 shadowing >> (1e02ce4cccdcb9688386e5b8d2c9fa4660b45389). I measure CR4 reads at 24 >> cycles on haswell and 36 on sandybridge, which compares well with >> L2 miss costs. Is the shadowing worth the complexity? CR4 is also >> cached (with no real consistency mechanism) in the VMCS at the time >> of guest VCPU creation. If there is ever a change in CR4 value >> over time, or if CR4 is different on different CPUs in the system, all this >> logic gets broken. > > Good catch. Please resubmit with your Signed-off-by, and rebased to > Linus's tree. > > Paolo > >> Thanks, >> Ben >> >> >> --- >> >> The host's decision to enable machine check exceptions should remain >> in force during non-root mode. KVM was writing 0 to cr4 on VCPU reset >> and passed a slightly-modified 0 to the vmcs.guest_cr4 value. >> >> Tested: Inject machine check while a guest is spinning. >> Before the change, if guest CR4.MCE==0, then the machine check is >> escalated to Catastrophic Error (CATERR) and the machine dies. >> If guest CR4.MCE==1, then the machine check causes VMEXIT and is >> handled normally by host Linux. After the change, injecting a machine >> check causes normal Linux machine check handling. >> >> >> --- >> arch/x86/kvm/vmx.c | 12 ++++++++++-- >> 1 file changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index a214104..44c8d24 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -3456,8 +3456,16 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, >> unsigned long cr3) >> >> static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) >> { >> - unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ? >> - KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON); >> + /* >> + * Pass through host's Machine Check Enable value to hw_cr4, which >> + * is in force while we are in guest mode. Do not let guests control >> + * this bit, even if host CR4.MCE == 0. >> + */ >> + unsigned long hw_cr4 = >> + (read_cr4() & X86_CR4_MCE) | >> + (cr4 & ~X86_CR4_MCE) | >> + (to_vmx(vcpu)->rmode.vm86_active ? >> + KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON); >> >> if (cr4 & X86_CR4_VMXE) { >> /* >> -- >> -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index a214104..44c8d24 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3456,8 +3456,16 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { - unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ? - KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON); + /* + * Pass through host's Machine Check Enable value to hw_cr4, which + * is in force while we are in guest mode. Do not let guests control + * this bit, even if host CR4.MCE == 0. + */ + unsigned long hw_cr4 = + (read_cr4() & X86_CR4_MCE) | + (cr4 & ~X86_CR4_MCE) | + (to_vmx(vcpu)->rmode.vm86_active ? + KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON); if (cr4 & X86_CR4_VMXE) {