diff mbox series

[2/2] x86: KVM: Emulate instruction when GPA can't be translated by EPT

Message ID 20231218140543.870234-3-tao1.su@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series x86: KVM: Limit guest physical bits when 5-level EPT is unsupported | expand

Commit Message

Tao Su Dec. 18, 2023, 2:05 p.m. UTC
With 4-level EPT, bits 51:48 of the guest physical address must all
be zero; otherwise, an EPT violation always occurs, which is an unexpected
VM exit in KVM currently.

Even though KVM advertises the max physical bits to guest, guest may
ignore MAXPHYADDR in CPUID and set a bigger physical bits to KVM.
Rejecting invalid guest physical bits on KVM side is a choice, but it will
break current KVM ABI, e.g., current QEMU ignores the physical bits
advertised by KVM and uses host physical bits as guest physical bits by
default when using '-cpu host', although we would like to send a patch to
QEMU, it will still cause backward compatibility issues.

For GPA that can't be translated by EPT but within host.MAXPHYADDR,
emulation should be the best choice since KVM will inject #PF for the
invalid GPA in guest's perspective and try to emulate the instructions
which minimizes the impact on guests as much as possible.

Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Sean Christopherson Dec. 18, 2023, 3:23 p.m. UTC | #1
On Mon, Dec 18, 2023, Tao Su wrote:
> With 4-level EPT, bits 51:48 of the guest physical address must all
> be zero; otherwise, an EPT violation always occurs, which is an unexpected
> VM exit in KVM currently.
> 
> Even though KVM advertises the max physical bits to guest, guest may
> ignore MAXPHYADDR in CPUID and set a bigger physical bits to KVM.
> Rejecting invalid guest physical bits on KVM side is a choice, but it will
> break current KVM ABI, e.g., current QEMU ignores the physical bits
> advertised by KVM and uses host physical bits as guest physical bits by
> default when using '-cpu host', although we would like to send a patch to
> QEMU, it will still cause backward compatibility issues.
> 
> For GPA that can't be translated by EPT but within host.MAXPHYADDR,
> emulation should be the best choice since KVM will inject #PF for the
> invalid GPA in guest's perspective and try to emulate the instructions
> which minimizes the impact on guests as much as possible.

NAK.  allow_smaller_maxphyaddr is a bit of a mess and in IMO was a mistake, but
at least there was reasonable motivation for trying to support guests with a small
MAXPHYADDR.  Fudging around a QEMU bug is not good enough justification, especially
since the odds of a hack in KVM fully working are slim to none.
Chao Gao Dec. 19, 2023, 3:10 a.m. UTC | #2
On Mon, Dec 18, 2023 at 07:23:01AM -0800, Sean Christopherson wrote:
>On Mon, Dec 18, 2023, Tao Su wrote:
>> With 4-level EPT, bits 51:48 of the guest physical address must all
>> be zero; otherwise, an EPT violation always occurs, which is an unexpected
>> VM exit in KVM currently.
>> 
>> Even though KVM advertises the max physical bits to guest, guest may
>> ignore MAXPHYADDR in CPUID and set a bigger physical bits to KVM.
>> Rejecting invalid guest physical bits on KVM side is a choice, but it will
>> break current KVM ABI, e.g., current QEMU ignores the physical bits
>> advertised by KVM and uses host physical bits as guest physical bits by
>> default when using '-cpu host', although we would like to send a patch to
>> QEMU, it will still cause backward compatibility issues.
>> 
>> For GPA that can't be translated by EPT but within host.MAXPHYADDR,
>> emulation should be the best choice since KVM will inject #PF for the
>> invalid GPA in guest's perspective and try to emulate the instructions
>> which minimizes the impact on guests as much as possible.
>
>NAK.  allow_smaller_maxphyaddr is a bit of a mess and in IMO was a mistake, but
>at least there was reasonable motivation for trying to support guests with a small
>MAXPHYADDR.  Fudging around a QEMU bug is not good enough justification, especially
>since the odds of a hack in KVM fully working are slim to none.

The changelog is a little misleading. Even if there is no QEMU "bug" and QEMU
sets guest.MAXPHYADDR to 48 correctly, guest __can__ set up CR3 page table to
access GPAs with bits of 51:48 set. In this case, KVM still gets EPT violation
and needs to inject #PF to the guest (by emulating the instruction).
Jim Mattson Dec. 20, 2023, 1:42 p.m. UTC | #3
On Mon, Dec 18, 2023 at 6:08 AM Tao Su <tao1.su@linux.intel.com> wrote:
>
> With 4-level EPT, bits 51:48 of the guest physical address must all
> be zero; otherwise, an EPT violation always occurs, which is an unexpected
> VM exit in KVM currently.
>
> Even though KVM advertises the max physical bits to guest, guest may
> ignore MAXPHYADDR in CPUID and set a bigger physical bits to KVM.
> Rejecting invalid guest physical bits on KVM side is a choice, but it will
> break current KVM ABI, e.g., current QEMU ignores the physical bits
> advertised by KVM and uses host physical bits as guest physical bits by
> default when using '-cpu host', although we would like to send a patch to
> QEMU, it will still cause backward compatibility issues.
>
> For GPA that can't be translated by EPT but within host.MAXPHYADDR,
> emulation should be the best choice since KVM will inject #PF for the
> invalid GPA in guest's perspective and try to emulate the instructions
> which minimizes the impact on guests as much as possible.
>
> Signed-off-by: Tao Su <tao1.su@linux.intel.com>
> Tested-by: Yi Lai <yi1.lai@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index be20a60047b1..a8aa2cfa2f5d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5774,6 +5774,13 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>
>         vcpu->arch.exit_qualification = exit_qualification;
>
> +       /*
> +        * Emulate the instruction when accessing a GPA which is set any bits
> +        * beyond guest-physical bits that EPT can translate.
> +        */
> +       if (unlikely(gpa & rsvd_bits(kvm_mmu_tdp_maxphyaddr(), 63)))
> +               return kvm_emulate_instruction(vcpu, 0);
> +

This doesn't really work, since the KVM instruction emulator is
woefully incomplete. To make this work, first you have to teach the
KVM instruction emulator how to emulate *all* memory-accessing
instructions.

>         /*
>          * Check that the GPA doesn't exceed physical memory limits, as that is
>          * a guest page fault.  We have to emulate the instruction here, because
> --
> 2.34.1
>
>
Tao Su Jan. 8, 2024, 1:48 p.m. UTC | #4
On Wed, Dec 20, 2023 at 05:42:56AM -0800, Jim Mattson wrote:
> On Mon, Dec 18, 2023 at 6:08 AM Tao Su <tao1.su@linux.intel.com> wrote:
> >
> > With 4-level EPT, bits 51:48 of the guest physical address must all
> > be zero; otherwise, an EPT violation always occurs, which is an unexpected
> > VM exit in KVM currently.
> >
> > Even though KVM advertises the max physical bits to guest, guest may
> > ignore MAXPHYADDR in CPUID and set a bigger physical bits to KVM.
> > Rejecting invalid guest physical bits on KVM side is a choice, but it will
> > break current KVM ABI, e.g., current QEMU ignores the physical bits
> > advertised by KVM and uses host physical bits as guest physical bits by
> > default when using '-cpu host', although we would like to send a patch to
> > QEMU, it will still cause backward compatibility issues.
> >
> > For GPA that can't be translated by EPT but within host.MAXPHYADDR,
> > emulation should be the best choice since KVM will inject #PF for the
> > invalid GPA in guest's perspective and try to emulate the instructions
> > which minimizes the impact on guests as much as possible.
> >
> > Signed-off-by: Tao Su <tao1.su@linux.intel.com>
> > Tested-by: Yi Lai <yi1.lai@intel.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index be20a60047b1..a8aa2cfa2f5d 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -5774,6 +5774,13 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> >
> >         vcpu->arch.exit_qualification = exit_qualification;
> >
> > +       /*
> > +        * Emulate the instruction when accessing a GPA which is set any bits
> > +        * beyond guest-physical bits that EPT can translate.
> > +        */
> > +       if (unlikely(gpa & rsvd_bits(kvm_mmu_tdp_maxphyaddr(), 63)))
> > +               return kvm_emulate_instruction(vcpu, 0);
> > +
> 
> This doesn't really work, since the KVM instruction emulator is
> woefully incomplete. To make this work, first you have to teach the
> KVM instruction emulator how to emulate *all* memory-accessing
> instructions.

Please forget allow_smaller_maxphyaddr and #PF for a while.

I agree KVM instruction emulator is incomplete. However, hardware can’t
execute instructions with GPA>48-bit and exits to KVM, KVM just repeatedly
builds SPTE, I.e., current KVM is buggy.

In this case, emulation may be a choice, I.e., KVM can emulate most common
instructions which reflects KVM's best efforts to maintain an environment for
continued execution. Even if some instructions can’t be emulated, it can
terminate the guest and will not make KVM silently hang there.

Thanks,
Tao

> 
> >         /*
> >          * Check that the GPA doesn't exceed physical memory limits, as that is
> >          * a guest page fault.  We have to emulate the instruction here, because
> > --
> > 2.34.1
> >
> >
Sean Christopherson Jan. 8, 2024, 3:19 p.m. UTC | #5
On Mon, Jan 08, 2024, Tao Su wrote:
> On Wed, Dec 20, 2023 at 05:42:56AM -0800, Jim Mattson wrote:
> > On Mon, Dec 18, 2023 at 6:08 AM Tao Su <tao1.su@linux.intel.com> wrote:
> > >
> > > With 4-level EPT, bits 51:48 of the guest physical address must all
> > > be zero; otherwise, an EPT violation always occurs, which is an unexpected
> > > VM exit in KVM currently.
> > >
> > > Even though KVM advertises the max physical bits to guest, guest may
> > > ignore MAXPHYADDR in CPUID and set a bigger physical bits to KVM.
> > > Rejecting invalid guest physical bits on KVM side is a choice, but it will
> > > break current KVM ABI, e.g., current QEMU ignores the physical bits
> > > advertised by KVM and uses host physical bits as guest physical bits by
> > > default when using '-cpu host', although we would like to send a patch to
> > > QEMU, it will still cause backward compatibility issues.
> > >
> > > For GPA that can't be translated by EPT but within host.MAXPHYADDR,
> > > emulation should be the best choice since KVM will inject #PF for the
> > > invalid GPA in guest's perspective and try to emulate the instructions
> > > which minimizes the impact on guests as much as possible.
> > >
> > > Signed-off-by: Tao Su <tao1.su@linux.intel.com>
> > > Tested-by: Yi Lai <yi1.lai@intel.com>
> > > ---
> > >  arch/x86/kvm/vmx/vmx.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index be20a60047b1..a8aa2cfa2f5d 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -5774,6 +5774,13 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> > >
> > >         vcpu->arch.exit_qualification = exit_qualification;
> > >
> > > +       /*
> > > +        * Emulate the instruction when accessing a GPA which is set any bits
> > > +        * beyond guest-physical bits that EPT can translate.
> > > +        */
> > > +       if (unlikely(gpa & rsvd_bits(kvm_mmu_tdp_maxphyaddr(), 63)))
> > > +               return kvm_emulate_instruction(vcpu, 0);
> > > +
> > 
> > This doesn't really work, since the KVM instruction emulator is
> > woefully incomplete. To make this work, first you have to teach the
> > KVM instruction emulator how to emulate *all* memory-accessing
> > instructions.
> 
> Please forget allow_smaller_maxphyaddr and #PF for a while.
> 
> I agree KVM instruction emulator is incomplete. However, hardware can’t
> execute instructions with GPA>48-bit and exits to KVM, KVM just repeatedly
> builds SPTE, I.e., current KVM is buggy.

Eh, hardware is just as much to blame as KVM.  Garbage in, garbage out.

> In this case, emulation may be a choice,

Yes, it can be a choice, but that choice needs to be made conciously by userspace,
not silently by KVM.
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index be20a60047b1..a8aa2cfa2f5d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5774,6 +5774,13 @@  static int handle_ept_violation(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.exit_qualification = exit_qualification;
 
+	/*
+	 * Emulate the instruction when accessing a GPA which is set any bits
+	 * beyond guest-physical bits that EPT can translate.
+	 */
+	if (unlikely(gpa & rsvd_bits(kvm_mmu_tdp_maxphyaddr(), 63)))
+		return kvm_emulate_instruction(vcpu, 0);
+
 	/*
 	 * Check that the GPA doesn't exceed physical memory limits, as that is
 	 * a guest page fault.  We have to emulate the instruction here, because