KVM: x86: Always enable legacy fp/sse

Message ID 20220816175936.23238-1-dgilbert@redhat.com (mailing list archive)
State New, archived
Series KVM: x86: Always enable legacy fp/sse

Commit Message

Dr. David Alan Gilbert Aug. 16, 2022, 5:59 p.m. UTC
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

A live migration under qemu is currently failing when the source
host is ~Nehalem era (pre-xsave) and the destination is much newer,
(configured with a guest CPU type of Nehalem).
QEMU always calls kvm_put_xsave, even on this combination because
KVM_CAP_CHECK_EXTENSION_VM always returns true for KVM_CAP_XSAVE.

When QEMU calls kvm_put_xsave it's rejected by
   fpu_copy_uabi_to_guest_fpstate->
     copy_uabi_to_xstate->
       validate_user_xstate_header

when the validate checks the loaded xfeatures against
user_xfeatures, which it finds to be 0.

I think our initialisation of user_xfeatures is being
too strict here, and we should always allow the base FP/SSE.
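
To make the failure concrete, here is a minimal standalone sketch of the check as described above. It is a simplified model, not the kernel function: the mask values mirror the kernel's XFEATURE_MASK_FP and XFEATURE_MASK_SSE, while strict_xfeatures_check() and nehalem_restore_result() are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Bit positions mirror the kernel's XFEATURE_MASK_FP / XFEATURE_MASK_SSE. */
#define XFEATURE_MASK_FP   (1ULL << 0)
#define XFEATURE_MASK_SSE  (1ULL << 1)

/*
 * Simplified model of the pre-patch test in validate_user_xstate_header():
 * any header bit that is not in user_xfeatures is rejected.
 */
int strict_xfeatures_check(uint64_t hdr_xfeatures, uint64_t user_xfeatures)
{
	if (hdr_xfeatures & ~user_xfeatures)
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical scenario from the commit message: the guest CPU model has
 * no XSAVE, so user_xfeatures ends up as 0, yet the incoming buffer
 * carries FP+SSE state.
 */
int nehalem_restore_result(void)
{
	uint64_t user_xfeatures = 0;
	uint64_t hdr_xfeatures = XFEATURE_MASK_FP | XFEATURE_MASK_SSE;

	return strict_xfeatures_check(hdr_xfeatures, user_xfeatures);
}
```

With user_xfeatures at 0, every set header bit counts as unknown, so even plain FP+SSE state is refused with -EINVAL.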

Fixes: ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0")
bz: https://bugzilla.redhat.com/show_bug.cgi?id=2079311

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch/x86/kvm/cpuid.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Sean Christopherson Aug. 16, 2022, 9:37 p.m. UTC | #1
On Tue, Aug 16, 2022, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> A live migration under qemu is currently failing when the source
> host is ~Nehalem era (pre-xsave) and the destination is much newer,
> (configured with a guest CPU type of Nehalem).
> QEMU always calls kvm_put_xsave, even on this combination because
> KVM_CAP_CHECK_EXTENSION_VM always returns true for KVM_CAP_XSAVE.
> 
> When QEMU calls kvm_put_xsave it's rejected by
>    fpu_copy_uabi_to_guest_fpstate->
>      copy_uabi_to_xstate->
>        validate_user_xstate_header
> 
> when the validate checks the loaded xfeatures against
> user_xfeatures, which it finds to be 0.
> 
> I think our initialisation of user_xfeatures is being
> too strict here, and we should always allow the base FP/SSE.
> 
> Fixes: ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0")
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=2079311
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  arch/x86/kvm/cpuid.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index de6d44e07e34..3b2319cecfd1 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -298,7 +298,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	guest_supported_xcr0 =
>  		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
>  
> -	vcpu->arch.guest_fpu.fpstate->user_xfeatures = guest_supported_xcr0;
> +	vcpu->arch.guest_fpu.fpstate->user_xfeatures = guest_supported_xcr0 |
> +		XFEATURE_MASK_FPSSE;

I don't think this is correct.  This will allow the guest to set the SSE bit
even when XSAVE isn't supported due to kvm_guest_supported_xcr0() returning
user_xfeatures.

  static inline u64 kvm_guest_supported_xcr0(struct kvm_vcpu *vcpu)
  {
	return vcpu->arch.guest_fpu.fpstate->user_xfeatures;
  }
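
A rough standalone model of that concern, assuming only what is quoted above: because the guest's XCR0 limit is read straight back from user_xfeatures, OR-ing FP+SSE into that field also widens what the guest may set. struct toy_fpstate and the toy_* helpers are hypothetical stand-ins, not KVM code.

```c
#include <stdint.h>

#define XFEATURE_MASK_FP    (1ULL << 0)
#define XFEATURE_MASK_SSE   (1ULL << 1)
#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)

/* Toy stand-in for the vCPU fpstate; only the field under discussion. */
struct toy_fpstate {
	uint64_t user_xfeatures;
};

/* Mirrors kvm_guest_supported_xcr0(): the limit is user_xfeatures itself. */
uint64_t toy_guest_supported_xcr0(const struct toy_fpstate *fps)
{
	return fps->user_xfeatures;
}

/*
 * Models the assignment in kvm_vcpu_after_set_cpuid(). force_fpsse selects
 * between the original assignment (0) and the proposed OR with FP+SSE (1).
 */
void toy_after_set_cpuid(struct toy_fpstate *fps, uint64_t cpuid_xcr0,
			 int force_fpsse)
{
	fps->user_xfeatures = cpuid_xcr0;
	if (force_fpsse)
		fps->user_xfeatures |= XFEATURE_MASK_FPSSE;
}
```

For a guest whose CPUID reports no XSAVE (cpuid_xcr0 == 0), the original assignment leaves the guest-visible limit at 0, while the OR-ed variant reports FP+SSE as settable.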

I believe the right place to fix this is in validate_user_xstate_header().  It's
reachable if and only if XSAVE is supported in the host, and when XSAVE is _not_
supported, the kernel unconditionally allows FP+SSE.  So it follows that the kernel
should also allow FP+SSE when using XSAVE too.  That would also align the logic
with fpu_copy_guest_fpstate_to_uabi(), which forces the FPSSE flags.  Ditto for
the non-KVM save_xstate_epilog().

Aha!  And fpu__init_system_xstate() ensures the host supports FP+SSE when XSAVE
is enabled (knew there had to be a sanity check somewhere).
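
As a standalone illustration of the relaxed rule argued for here (a simplified sketch, not the kernel function; relaxed_xfeatures_check() is a hypothetical name), FP+SSE are treated as always permitted on top of user_xfeatures:

```c
#include <errno.h>
#include <stdint.h>

/* Bit positions mirror the kernel's XFEATURE_MASK_FP / XFEATURE_MASK_SSE. */
#define XFEATURE_MASK_FP   (1ULL << 0)
#define XFEATURE_MASK_SSE  (1ULL << 1)

/*
 * Reject header bits that are neither in user_xfeatures nor part of the
 * legacy FP+SSE pair. This is equivalent to testing
 * hdr & ~user_xfeatures & ~(FP | SSE).
 */
int relaxed_xfeatures_check(uint64_t hdr_xfeatures, uint64_t user_xfeatures)
{
	uint64_t allowed = user_xfeatures | XFEATURE_MASK_FP | XFEATURE_MASK_SSE;

	if (hdr_xfeatures & ~allowed)
		return -EINVAL;
	return 0;
}
```

Under this rule a buffer carrying only FP+SSE passes even when user_xfeatures is 0, while any other bit (for example bit 2) is still rejected unless user_xfeatures includes it.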

---
 arch/x86/kernel/fpu/xstate.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8340156bfd2..83b9a9653d47 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -399,8 +399,13 @@ int xfeature_size(int xfeature_nr)
 static int validate_user_xstate_header(const struct xstate_header *hdr,
 				       struct fpstate *fpstate)
 {
-	/* No unknown or supervisor features may be set */
-	if (hdr->xfeatures & ~fpstate->user_xfeatures)
+	/*
+	 * No unknown or supervisor features may be set.  Userspace is always
+	 * allowed to restore FP+SSE state (XSAVE/XRSTOR are used by the kernel
+	 * if and only if FP+SSE are supported in xstate).
+	 */
+	if (hdr->xfeatures & ~fpstate->user_xfeatures &
+	    ~(XFEATURE_MASK_FP | XFEATURE_MASK_SSE))
 		return -EINVAL;

 	/* Userspace must use the uncompacted format */

base-commit: de3d415edca23831c5d1f24f10c74a715af7efdb
--
Leonardo Bras Aug. 17, 2022, 3:29 a.m. UTC | #2
On Tue, 2022-08-16 at 21:37 +0000, Sean Christopherson wrote:
> On Tue, Aug 16, 2022, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > A live migration under qemu is currently failing when the source
> > host is ~Nehalem era (pre-xsave) and the destination is much newer,
> > (configured with a guest CPU type of Nehalem).
> > QEMU always calls kvm_put_xsave, even on this combination because
> > KVM_CAP_CHECK_EXTENSION_VM always returns true for KVM_CAP_XSAVE.
> > 
> > When QEMU calls kvm_put_xsave it's rejected by
> >    fpu_copy_uabi_to_guest_fpstate->
> >      copy_uabi_to_xstate->
> >        validate_user_xstate_header
> > 
> > when the validate checks the loaded xfeatures against
> > user_xfeatures, which it finds to be 0.
> > 
> > I think our initialisation of user_xfeatures is being
> > too strict here, and we should always allow the base FP/SSE.
> > 
> > Fixes: ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0")

Thanks for fixing this, Dave!

> > bz: https://bugzilla.redhat.com/show_bug.cgi?id=2079311
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index de6d44e07e34..3b2319cecfd1 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -298,7 +298,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >  	guest_supported_xcr0 =
> >  		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
> >  
> > -	vcpu->arch.guest_fpu.fpstate->user_xfeatures = guest_supported_xcr0;
> > +	vcpu->arch.guest_fpu.fpstate->user_xfeatures = guest_supported_xcr0 |
> > +		XFEATURE_MASK_FPSSE;
> 
> I don't think this is correct.  This will allow the guest to set the SSE bit
> even when XSAVE isn't supported due to kvm_guest_supported_xcr0() returning
> user_xfeatures.
> 
>   static inline u64 kvm_guest_supported_xcr0(struct kvm_vcpu *vcpu)
>   {
> 	return vcpu->arch.guest_fpu.fpstate->user_xfeatures;
>   }
> 
> I believe the right place to fix this is in validate_user_xstate_header().  It's
> reachable if and only if XSAVE is supported in the host, and when XSAVE is _not_
> supported, the kernel unconditionally allows FP+SSE.  So it follows that the kernel
> should also allow FP+SSE when using XSAVE too.  That would also align the logic
> > with fpu_copy_guest_fpstate_to_uabi(), which forces the FPSSE flags.  Ditto for
> the non-KVM save_xstate_epilog().
> 
> > Aha!  And fpu__init_system_xstate() ensures the host supports FP+SSE when XSAVE
> > is enabled (knew there had to be a sanity check somewhere).

Thanks for the feedback Sean!

I have next to no experience with this code, and I hope you can help me with a
question I have, based on Dave's commit message:

> > QEMU always calls kvm_put_xsave, even on this combination because
> > KVM_CAP_CHECK_EXTENSION_VM always returns true for KVM_CAP_XSAVE.

Any particular reason why it always returns true for KVM_CAP_XSAVE, even when
the CPU does not support it? 

IIUC, if it returns false to this capability, kvm_put_xsave() should never be
called, and thus it can avoid bug reproduction. 

Thanks in advance,

Leo

> 
> ---
>  arch/x86/kernel/fpu/xstate.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index c8340156bfd2..83b9a9653d47 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -399,8 +399,13 @@ int xfeature_size(int xfeature_nr)
>  static int validate_user_xstate_header(const struct xstate_header *hdr,
>  				       struct fpstate *fpstate)
>  {
> -	/* No unknown or supervisor features may be set */
> -	if (hdr->xfeatures & ~fpstate->user_xfeatures)
> +	/*
> +	 * No unknown or supervisor features may be set.  Userspace is always
> +	 * allowed to restore FP+SSE state (XSAVE/XRSTOR are used by the kernel
> +	 * if and only if FP+SSE are supported in xstate).
> +	 */
> +	if (hdr->xfeatures & ~fpstate->user_xfeatures &
> +	    ~(XFEATURE_MASK_FP | XFEATURE_MASK_SSE))
>  		return -EINVAL;
> 
>  	/* Userspace must use the uncompacted format */
> 
> base-commit: de3d415edca23831c5d1f24f10c74a715af7efdb
> --
>
Paolo Bonzini Aug. 17, 2022, 8:45 a.m. UTC | #3
On 8/17/22 05:29, Leonardo Brás wrote:
>>> QEMU always calls kvm_put_xsave, even on this combination because
>>> KVM_CAP_CHECK_EXTENSION_VM always returns true for KVM_CAP_XSAVE.
> Any particular reason why it always returns true for KVM_CAP_XSAVE, even when
> the CPU does not support it?
> 
> IIUC, if it returns false to this capability, kvm_put_xsave() should never be
> called, and thus it can avoid bug reproduction.

Because it allows userspace to have a single path for saving/restoring 
FPU state.  See for example the "migration" code in 
tools/testing/selftests/kvm/lib/x86_64/processor.c (the vcpu_save_state 
and vcpu_load_state functions).

In fact, the QEMU code that uses KVM_GET_FPU/KVM_SET_FPU in x86 is 
obsolete, because it's not been used since Linux 2.6.36.

Paolo
Dr. David Alan Gilbert Aug. 17, 2022, 11:03 a.m. UTC | #4
* Sean Christopherson (seanjc@google.com) wrote:
> On Tue, Aug 16, 2022, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > A live migration under qemu is currently failing when the source
> > host is ~Nehalem era (pre-xsave) and the destination is much newer,
> > (configured with a guest CPU type of Nehalem).
> > QEMU always calls kvm_put_xsave, even on this combination because
> > KVM_CAP_CHECK_EXTENSION_VM always returns true for KVM_CAP_XSAVE.
> > 
> > When QEMU calls kvm_put_xsave it's rejected by
> >    fpu_copy_uabi_to_guest_fpstate->
> >      copy_uabi_to_xstate->
> >        validate_user_xstate_header
> > 
> > when the validate checks the loaded xfeatures against
> > user_xfeatures, which it finds to be 0.
> > 
> > I think our initialisation of user_xfeatures is being
> > too strict here, and we should always allow the base FP/SSE.
> > 
> > Fixes: ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0")
> > bz: https://bugzilla.redhat.com/show_bug.cgi?id=2079311
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index de6d44e07e34..3b2319cecfd1 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -298,7 +298,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >  	guest_supported_xcr0 =
> >  		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
> >  
> > -	vcpu->arch.guest_fpu.fpstate->user_xfeatures = guest_supported_xcr0;
> > +	vcpu->arch.guest_fpu.fpstate->user_xfeatures = guest_supported_xcr0 |
> > +		XFEATURE_MASK_FPSSE;

Hi Sean,
  Thanks for the reply,

> I don't think this is correct.  This will allow the guest to set the SSE bit
> even when XSAVE isn't supported due to kvm_guest_supported_xcr0() returning
> user_xfeatures.
> 
>   static inline u64 kvm_guest_supported_xcr0(struct kvm_vcpu *vcpu)
>   {
> 	return vcpu->arch.guest_fpu.fpstate->user_xfeatures;
>   }
> 
> I believe the right place to fix this is in validate_user_xstate_header().  It's
> reachable if and only if XSAVE is supported in the host, and when XSAVE is _not_
> supported, the kernel unconditionally allows FP+SSE.  So it follows that the kernel
> should also allow FP+SSE when using XSAVE too.  That would also align the logic
> with fpu_copy_guest_fpstate_to_uabi(), which forces the FPSSE flags.  Ditto for
> the non-KVM save_xstate_epilog().

OK, yes, I'd followed the check that failed down to this test; although
by itself this test works until Leo's patch came along later; so I
wasn't sure where to fix it.

> Aha!  And fpu__init_system_xstate() ensures the host supports FP+SSE when XSAVE
> is enabled (knew there had to be a sanity check somewhere).
> 
> ---
>  arch/x86/kernel/fpu/xstate.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index c8340156bfd2..83b9a9653d47 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -399,8 +399,13 @@ int xfeature_size(int xfeature_nr)
>  static int validate_user_xstate_header(const struct xstate_header *hdr,
>  				       struct fpstate *fpstate)
>  {
> -	/* No unknown or supervisor features may be set */
> -	if (hdr->xfeatures & ~fpstate->user_xfeatures)
> +	/*
> +	 * No unknown or supervisor features may be set.  Userspace is always
> +	 * allowed to restore FP+SSE state (XSAVE/XRSTOR are used by the kernel
> +	 * if and only if FP+SSE are supported in xstate).
> +	 */
> +	if (hdr->xfeatures & ~fpstate->user_xfeatures &
> +	    ~(XFEATURE_MASK_FP | XFEATURE_MASK_SSE))
>  		return -EINVAL;
> 
>  	/* Userspace must use the uncompacted format */

That passes the small smoke test for me; will you repost that then?

Thanks,

Dave

> base-commit: de3d415edca23831c5d1f24f10c74a715af7efdb
> --
>
Sean Christopherson Aug. 17, 2022, 4:11 p.m. UTC | #5
On Wed, Aug 17, 2022, Dr. David Alan Gilbert wrote:
> That passes the small smoke test for me; will you repost that then?

Yep, will do.
Dr. David Alan Gilbert Aug. 17, 2022, 4:14 p.m. UTC | #6
* Sean Christopherson (seanjc@google.com) wrote:
> On Wed, Aug 17, 2022, Dr. David Alan Gilbert wrote:
> > That passes the small smoke test for me; will you repost that then?
> 
> Yep, will do.

Thanks.

Dave
Sean Christopherson Aug. 23, 2022, 12:15 a.m. UTC | #7
On Wed, Aug 17, 2022, Dr. David Alan Gilbert wrote:
> * Sean Christopherson (seanjc@google.com) wrote:
> > On Tue, Aug 16, 2022, Dr. David Alan Gilbert (git) wrote:
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index de6d44e07e34..3b2319cecfd1 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -298,7 +298,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > >  	guest_supported_xcr0 =
> > >  		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
> > >  
> > > -	vcpu->arch.guest_fpu.fpstate->user_xfeatures = guest_supported_xcr0;
> > > +	vcpu->arch.guest_fpu.fpstate->user_xfeatures = guest_supported_xcr0 |
> > > +		XFEATURE_MASK_FPSSE;
> 
> Hi Sean,
>   Thanks for the reply,
> 
> > I don't think this is correct.  This will allow the guest to set the SSE bit
> > even when XSAVE isn't supported due to kvm_guest_supported_xcr0() returning
> > user_xfeatures.
> > 
> >   static inline u64 kvm_guest_supported_xcr0(struct kvm_vcpu *vcpu)
> >   {
> > 	return vcpu->arch.guest_fpu.fpstate->user_xfeatures;
> >   }
> > 
> > I believe the right place to fix this is in validate_user_xstate_header().  It's
> > reachable if and only if XSAVE is supported in the host, and when XSAVE is _not_
> > supported, the kernel unconditionally allows FP+SSE.  So it follows that the kernel
> > should also allow FP+SSE when using XSAVE too.  That would also align the logic
> > > with fpu_copy_guest_fpstate_to_uabi(), which forces the FPSSE flags.  Ditto for
> > the non-KVM save_xstate_epilog().
> 
> OK, yes, I'd followed the check that failed down to this test; although
> by itself this test works until Leo's patch came along later; so I
> wasn't sure where to fix it.
> 
> > > Aha!  And fpu__init_system_xstate() ensures the host supports FP+SSE when XSAVE
> > > is enabled (knew there had to be a sanity check somewhere).
> > 
> > ---
> >  arch/x86/kernel/fpu/xstate.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> > index c8340156bfd2..83b9a9653d47 100644
> > --- a/arch/x86/kernel/fpu/xstate.c
> > +++ b/arch/x86/kernel/fpu/xstate.c
> > @@ -399,8 +399,13 @@ int xfeature_size(int xfeature_nr)
> >  static int validate_user_xstate_header(const struct xstate_header *hdr,
> >  				       struct fpstate *fpstate)
> >  {
> > -	/* No unknown or supervisor features may be set */
> > -	if (hdr->xfeatures & ~fpstate->user_xfeatures)
> > +	/*
> > +	 * No unknown or supervisor features may be set.  Userspace is always
> > +	 * allowed to restore FP+SSE state (XSAVE/XRSTOR are used by the kernel
> > +	 * if and only if FP+SSE are supported in xstate).
> > +	 */
> > +	if (hdr->xfeatures & ~fpstate->user_xfeatures &
> > +	    ~(XFEATURE_MASK_FP | XFEATURE_MASK_SSE))
> >  		return -EINVAL;
> > 
> >  	/* Userspace must use the uncompacted format */
> 
> That passes the small smoke test for me; will you repost that then?

*sigh*

The bug is more subtle than just failing to restore.  Saving can also "fail".  If
XSAVE is hidden from the guest on an XSAVE-capable host, __copy_xstate_to_uabi_buf()
will happily reinitialize FP+SSE state and thus corrupt guest FPU state on migration.
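
A toy model of the save-side problem described above, under the assumption that a component whose bit is missing from the mask is written out with its init value instead of the live guest value. This is not the kernel's __copy_xstate_to_uabi_buf(); struct toy_fpu, TOY_INIT_VALUE and toy_copy_to_uabi() are hypothetical, and each component is shrunk to a single word.

```c
#include <stdint.h>

#define XFEATURE_MASK_FP   (1ULL << 0)
#define XFEATURE_MASK_SSE  (1ULL << 1)

/* Placeholder for the architectural init state of a component. */
#define TOY_INIT_VALUE 0u

/* Toy FPU state: one word per component instead of real register files. */
struct toy_fpu {
	uint32_t fp;	/* stands in for x87 state */
	uint32_t sse;	/* stands in for XMM state */
};

/*
 * Copy guest state to a userspace buffer. Components not enabled in
 * user_xfeatures are emitted as init state, which is where live FP+SSE
 * contents would be lost if the mask is 0.
 */
void toy_copy_to_uabi(const struct toy_fpu *guest, uint64_t user_xfeatures,
		      struct toy_fpu *out)
{
	out->fp = (user_xfeatures & XFEATURE_MASK_FP) ? guest->fp
						      : TOY_INIT_VALUE;
	out->sse = (user_xfeatures & XFEATURE_MASK_SSE) ? guest->sse
							: TOY_INIT_VALUE;
}
```

In this model, saving with a mask of 0 returns init values for both components regardless of what the guest holds, whereas a mask containing FP+SSE preserves them.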

And not that it matters now, but before realizing that KVM_GET_XSAVE is also broken,
I decided I like Dave's patch better because KVM really should separate what userspace
can save/restore from what the guest can access.

Amusingly, there's actually another bug lurking with respect to usurping user_xfeatures
to represent supported_guest_xcr0.  The latter is zero-initialized, whereas
user_xfeatures is set to the "default" features on initialization, i.e. migrating a
VM without ever doing KVM_SET_CPUID2 would do odd things.

Sending a v2 shortly to reinstate guest_supported_xcr0 before landing Dave's patch.

Patch

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index de6d44e07e34..3b2319cecfd1 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -298,7 +298,8 @@  static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	guest_supported_xcr0 =
 		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
 
-	vcpu->arch.guest_fpu.fpstate->user_xfeatures = guest_supported_xcr0;
+	vcpu->arch.guest_fpu.fpstate->user_xfeatures = guest_supported_xcr0 |
+		XFEATURE_MASK_FPSSE;
 
 	kvm_update_pv_runtime(vcpu);