diff mbox series

[v4,1/2] x86/kvm/fpu: Mask guest fpstate->xfeatures with guest_supported_xcr0

Message ID 20220217053028.96432-2-leobras@redhat.com (mailing list archive)
State New, archived
Headers show
Series x86/kvm/fpu: Fix guest migration bugs that can crash guest | expand

Commit Message

Leonardo Bras Feb. 17, 2022, 5:30 a.m. UTC
During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel
swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate().

When xsave feature is available, the fpu swap is done by:
- xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used
  to store the current state of the fpu registers to a buffer.
- xrstor(s) instruction, with (fpu_kernel_cfg.max_features &
  XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs.

For xsave(s) the mask is used to limit what parts of the fpu regs will
be copied to the buffer. Likewise on xrstor(s), the mask is used to
limit what parts of the fpu regs will be changed.

The mask for xsave(s), the guest's fpstate->xfeatures, is defined on
kvm_arch_vcpu_create(), which (in summary) sets it to all features
supported by the cpu which are enabled on kernel config.

This means that xsave(s) will save to guest buffer all the fpu regs
contents the cpu has enabled when the guest is paused, even if they
are not used.

This would not be an issue, if xrstor(s) would also do that.

xrstor(s)'s mask for host/guest swap is basically every valid feature
contained in kernel config, except XFEATURE_MASK_PKRU.
Accordingto kernel src, it is instead switched in switch_to() and
flush_thread().

Then, the following happens with a host supporting PKRU starts a
guest that does not support it:
1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest,
2 - xsave(s) fpu regs to host fpustate (buffer has XFEATURE_MASK_PKRU)
3 - xrstor(s) guest fpustate to fpu regs (fpu regs have XFEATURE_MASK_PKRU)
4 - guest runs, then switch back to host,
5 - xsave(s) fpu regs to guest fpstate (buffer now have XFEATURE_MASK_PKRU)
6 - xrstor(s) host fpstate to fpu regs.
7 - kvm_vcpu_ioctl_x86_get_xsave() copy guest fpstate to userspace (with
    XFEATURE_MASK_PKRU, which should not be supported by guest vcpu)

On 5, even though the guest does not support PKRU, it does have the flag
set on guest fpstate, which is transferred to userspace via vcpu ioctl
KVM_GET_XSAVE.

This becomes a problem when the user decides on migrating the above guest
to another machine that does not support PKRU:
The new host restores guest's fpu regs to as they were before (xrstor(s)),
but since the new host don't support PKRU, a general-protection exception
ocurs in xrstor(s) and that crashes the guest.

This can be solved by making the guest's fpstate->user_xfeatures hold
a copy of guest_supported_xcr0. This way, on 7 the only flags copied to
userspace will be the ones compatible to guest requirements, and thus
there will be no issue during migration.

As a bonus, it will also fail if userspace tries to set fpu features
that are not compatible to the guest configuration. (KVM_SET_XSAVE ioctl)

Also, since kvm_vcpu_after_set_cpuid() now sets fpstate->user_xfeatures,
there is not need to set it in kvm_check_cpuid(). So, change
fpstate_realloc() so it does not touch fpstate->user_xfeatures if a
non-NULL guest_fpu is passed, which is the case when kvm_check_cpuid()
calls it.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 arch/x86/kernel/fpu/xstate.c | 5 ++++-
 arch/x86/kvm/cpuid.c         | 2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)

Comments

David Edmondson Feb. 17, 2022, 12:07 p.m. UTC | #1
The single line summary is now out of date - there's no new masking.

On Thursday, 2022-02-17 at 02:30:29 -03, Leonardo Bras wrote:

> During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel
> swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate().
>
> When xsave feature is available, the fpu swap is done by:
> - xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used
>   to store the current state of the fpu registers to a buffer.
> - xrstor(s) instruction, with (fpu_kernel_cfg.max_features &
>   XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs.
>
> For xsave(s) the mask is used to limit what parts of the fpu regs will
> be copied to the buffer. Likewise on xrstor(s), the mask is used to
> limit what parts of the fpu regs will be changed.
>
> The mask for xsave(s), the guest's fpstate->xfeatures, is defined on
> kvm_arch_vcpu_create(), which (in summary) sets it to all features
> supported by the cpu which are enabled on kernel config.
>
> This means that xsave(s) will save to guest buffer all the fpu regs
> contents the cpu has enabled when the guest is paused, even if they
> are not used.
>
> This would not be an issue, if xrstor(s) would also do that.
>
> xrstor(s)'s mask for host/guest swap is basically every valid feature
> contained in kernel config, except XFEATURE_MASK_PKRU.
> Accordingto kernel src, it is instead switched in switch_to() and
> flush_thread().
>
> Then, the following happens with a host supporting PKRU starts a
> guest that does not support it:
> 1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest,
> 2 - xsave(s) fpu regs to host fpustate (buffer has XFEATURE_MASK_PKRU)
> 3 - xrstor(s) guest fpustate to fpu regs (fpu regs have XFEATURE_MASK_PKRU)
> 4 - guest runs, then switch back to host,
> 5 - xsave(s) fpu regs to guest fpstate (buffer now have XFEATURE_MASK_PKRU)
> 6 - xrstor(s) host fpstate to fpu regs.
> 7 - kvm_vcpu_ioctl_x86_get_xsave() copy guest fpstate to userspace (with
>     XFEATURE_MASK_PKRU, which should not be supported by guest vcpu)
>
> On 5, even though the guest does not support PKRU, it does have the flag
> set on guest fpstate, which is transferred to userspace via vcpu ioctl
> KVM_GET_XSAVE.
>
> This becomes a problem when the user decides on migrating the above guest
> to another machine that does not support PKRU:
> The new host restores guest's fpu regs to as they were before (xrstor(s)),
> but since the new host don't support PKRU, a general-protection exception
> ocurs in xrstor(s) and that crashes the guest.
>
> This can be solved by making the guest's fpstate->user_xfeatures hold
> a copy of guest_supported_xcr0. This way, on 7 the only flags copied to
> userspace will be the ones compatible to guest requirements, and thus
> there will be no issue during migration.
>
> As a bonus, it will also fail if userspace tries to set fpu features
> that are not compatible to the guest configuration. (KVM_SET_XSAVE ioctl)
>
> Also, since kvm_vcpu_after_set_cpuid() now sets fpstate->user_xfeatures,
> there is not need to set it in kvm_check_cpuid(). So, change
> fpstate_realloc() so it does not touch fpstate->user_xfeatures if a
> non-NULL guest_fpu is passed, which is the case when kvm_check_cpuid()
> calls it.
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  arch/x86/kernel/fpu/xstate.c | 5 ++++-
>  arch/x86/kvm/cpuid.c         | 2 ++
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 02b3ddaf4f75..7c7824ae7862 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -1558,7 +1558,10 @@ static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
>  		fpregs_restore_userregs();
>
>  	newfps->xfeatures = curfps->xfeatures | xfeatures;
> -	newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
> +
> +	if (!guest_fpu)
> +		newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
> +
>  	newfps->xfd = curfps->xfd & ~xfeatures;
>
>  	/* Do the final updates within the locked region */
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 494d4d351859..71125291c578 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -296,6 +296,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	vcpu->arch.guest_supported_xcr0 =
>  		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
>
> +	vcpu->arch.guest_fpu.fpstate->user_xfeatures = vcpu->arch.guest_supported_xcr0;
> +
>  	kvm_update_pv_runtime(vcpu);
>
>  	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);

dme.
Paolo Bonzini Feb. 17, 2022, 3:07 p.m. UTC | #2
On 2/17/22 13:07, David Edmondson wrote:
> The single line summary is now out of date - there's no new masking.

Thanks for the review, I made the adjustments and pushed to master.

Paolo

> On Thursday, 2022-02-17 at 02:30:29 -03, Leonardo Bras wrote:
> 
>> During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel
>> swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate().
>>
>> When xsave feature is available, the fpu swap is done by:
>> - xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used
>>    to store the current state of the fpu registers to a buffer.
>> - xrstor(s) instruction, with (fpu_kernel_cfg.max_features &
>>    XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs.
>>
>> For xsave(s) the mask is used to limit what parts of the fpu regs will
>> be copied to the buffer. Likewise on xrstor(s), the mask is used to
>> limit what parts of the fpu regs will be changed.
>>
>> The mask for xsave(s), the guest's fpstate->xfeatures, is defined on
>> kvm_arch_vcpu_create(), which (in summary) sets it to all features
>> supported by the cpu which are enabled on kernel config.
>>
>> This means that xsave(s) will save to guest buffer all the fpu regs
>> contents the cpu has enabled when the guest is paused, even if they
>> are not used.
>>
>> This would not be an issue, if xrstor(s) would also do that.
>>
>> xrstor(s)'s mask for host/guest swap is basically every valid feature
>> contained in kernel config, except XFEATURE_MASK_PKRU.
>> Accordingto kernel src, it is instead switched in switch_to() and
>> flush_thread().
>>
>> Then, the following happens with a host supporting PKRU starts a
>> guest that does not support it:
>> 1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest,
>> 2 - xsave(s) fpu regs to host fpustate (buffer has XFEATURE_MASK_PKRU)
>> 3 - xrstor(s) guest fpustate to fpu regs (fpu regs have XFEATURE_MASK_PKRU)
>> 4 - guest runs, then switch back to host,
>> 5 - xsave(s) fpu regs to guest fpstate (buffer now have XFEATURE_MASK_PKRU)
>> 6 - xrstor(s) host fpstate to fpu regs.
>> 7 - kvm_vcpu_ioctl_x86_get_xsave() copy guest fpstate to userspace (with
>>      XFEATURE_MASK_PKRU, which should not be supported by guest vcpu)
>>
>> On 5, even though the guest does not support PKRU, it does have the flag
>> set on guest fpstate, which is transferred to userspace via vcpu ioctl
>> KVM_GET_XSAVE.
>>
>> This becomes a problem when the user decides on migrating the above guest
>> to another machine that does not support PKRU:
>> The new host restores guest's fpu regs to as they were before (xrstor(s)),
>> but since the new host don't support PKRU, a general-protection exception
>> ocurs in xrstor(s) and that crashes the guest.
>>
>> This can be solved by making the guest's fpstate->user_xfeatures hold
>> a copy of guest_supported_xcr0. This way, on 7 the only flags copied to
>> userspace will be the ones compatible to guest requirements, and thus
>> there will be no issue during migration.
>>
>> As a bonus, it will also fail if userspace tries to set fpu features
>> that are not compatible to the guest configuration. (KVM_SET_XSAVE ioctl)
>>
>> Also, since kvm_vcpu_after_set_cpuid() now sets fpstate->user_xfeatures,
>> there is not need to set it in kvm_check_cpuid(). So, change
>> fpstate_realloc() so it does not touch fpstate->user_xfeatures if a
>> non-NULL guest_fpu is passed, which is the case when kvm_check_cpuid()
>> calls it.
>>
>> Signed-off-by: Leonardo Bras <leobras@redhat.com>
>> ---
>>   arch/x86/kernel/fpu/xstate.c | 5 ++++-
>>   arch/x86/kvm/cpuid.c         | 2 ++
>>   2 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
>> index 02b3ddaf4f75..7c7824ae7862 100644
>> --- a/arch/x86/kernel/fpu/xstate.c
>> +++ b/arch/x86/kernel/fpu/xstate.c
>> @@ -1558,7 +1558,10 @@ static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
>>   		fpregs_restore_userregs();
>>
>>   	newfps->xfeatures = curfps->xfeatures | xfeatures;
>> -	newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
>> +
>> +	if (!guest_fpu)
>> +		newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
>> +
>>   	newfps->xfd = curfps->xfd & ~xfeatures;
>>
>>   	/* Do the final updates within the locked region */
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index 494d4d351859..71125291c578 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -296,6 +296,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>   	vcpu->arch.guest_supported_xcr0 =
>>   		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
>>
>> +	vcpu->arch.guest_fpu.fpstate->user_xfeatures = vcpu->arch.guest_supported_xcr0;
>> +
>>   	kvm_update_pv_runtime(vcpu);
>>
>>   	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
> 
> dme.
diff mbox series

Patch

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 02b3ddaf4f75..7c7824ae7862 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1558,7 +1558,10 @@  static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
 		fpregs_restore_userregs();
 
 	newfps->xfeatures = curfps->xfeatures | xfeatures;
-	newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
+
+	if (!guest_fpu)
+		newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
+
 	newfps->xfd = curfps->xfd & ~xfeatures;
 
 	/* Do the final updates within the locked region */
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 494d4d351859..71125291c578 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -296,6 +296,8 @@  static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu->arch.guest_supported_xcr0 =
 		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
 
+	vcpu->arch.guest_fpu.fpstate->user_xfeatures = vcpu->arch.guest_supported_xcr0;
+
 	kvm_update_pv_runtime(vcpu);
 
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);