diff mbox series

[v10,10/28] x86/fpu/xstate: Update the XSTATE save function to support dynamic states

Message ID 20210825155413.19673-11-chang.seok.bae@intel.com (mailing list archive)
State New, archived
Headers show
Series None | expand

Commit Message

Chang S. Bae Aug. 25, 2021, 3:53 p.m. UTC
Extend os_xsave() to receive a mask argument of which states to save, in
preparation for dynamic user state handling.

Update KVM to set a valid fpu->state_mask, so it can continue to share with
the core code.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
---
Changes from v5:
* Adjusted the changelog and code for the new base code.

Changes from v3:
* Updated the changelog. (Borislav Petkov)
* Made the code change more reviewable.

Changes from v2:
* Updated the changelog to clarify the KVM code changes.
---
 arch/x86/include/asm/fpu/internal.h | 3 +--
 arch/x86/kernel/fpu/core.c          | 2 +-
 arch/x86/kernel/fpu/signal.c        | 2 +-
 arch/x86/kvm/x86.c                  | 9 +++++++--
 4 files changed, 10 insertions(+), 6 deletions(-)

Comments

Thomas Gleixner Oct. 1, 2021, 3:41 p.m. UTC | #1
On Wed, Aug 25 2021 at 08:53, Chang S. Bae wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 74dde635df40..7c46747f6865 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9899,11 +9899,16 @@ static void kvm_save_current_fpu(struct fpu *fpu)
>  	 * KVM does not support dynamic user states yet. Assume the buffer
>  	 * always has the minimum size.
>  	 */
> -	if (test_thread_flag(TIF_NEED_FPU_LOAD))
> +	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
>  		memcpy(fpu->state, current->thread.fpu.state,
>  		       fpu_buf_cfg.min_size);

What happens with the rest of the state?

> -	else
> +	} else {
> +		struct fpu *src_fpu = &current->thread.fpu;
> +
> +		if (fpu->state_mask != src_fpu->state_mask)
> +			fpu->state_mask = src_fpu->state_mask;

What guarantees that the state size of @fpu is big enough when src_fpu
has dynamic features included?

>  		save_fpregs_to_fpstate(fpu);

Thanks,

        tglx
Thomas Gleixner Oct. 2, 2021, 9:31 p.m. UTC | #2
On Fri, Oct 01 2021 at 17:41, Thomas Gleixner wrote:
> On Wed, Aug 25 2021 at 08:53, Chang S. Bae wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 74dde635df40..7c46747f6865 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -9899,11 +9899,16 @@ static void kvm_save_current_fpu(struct fpu *fpu)
>>  	 * KVM does not support dynamic user states yet. Assume the buffer
>>  	 * always has the minimum size.

I have to come back to this because that assumption is just broken.

create_vcpu()
   vcpu->user_fpu = alloc_default_fpu_size();
   vcpu->guest_fpu = alloc_default_fpu_size();

vcpu_task()
   get_amx_permission()
   use_amx()
     #NM
     alloc_larger_state()
   ...
   kvm_arch_vcpu_ioctl_run()
     kvm_arch_vcpu_ioctl_run()
       kvm_load_guest_fpu()
         kvm_save_current_fpu(vcpu->arch.user_fpu);
           save_fpregs_to_fpstate(fpu);         <- Out of bounds write

Adding a comment that KVM does not yet support dynamic user states does
not cut it, really.

Even if the above is unlikely, it is possible and has to be handled
correctly at the point where AMX support is enabled in the kernel
independent of guest support.

You have two options:

  1) Always allocate the large buffer size which is required to
     accomodate all possible features.

     Trivial, but waste of memory.

  2) Make the allocation dynamic which seems to be trivial to do in
     kvm_load_guest_fpu() at least for vcpu->user_fpu.

     The vcpu->guest_fpu handling can probably be postponed to the
     point where AMX is actually exposed to guests, but it's probably
     not the worst idea to think about the implications now.

Paolo, any opinions?

Thanks,

        tglx
Chang S. Bae Oct. 2, 2021, 10:54 p.m. UTC | #3
On Oct 2, 2021, at 14:31, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, Oct 01 2021 at 17:41, Thomas Gleixner wrote:
>> On Wed, Aug 25 2021 at 08:53, Chang S. Bae wrote:
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 74dde635df40..7c46747f6865 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -9899,11 +9899,16 @@ static void kvm_save_current_fpu(struct fpu *fpu)
>>> 	 * KVM does not support dynamic user states yet. Assume the buffer
>>> 	 * always has the minimum size.
> 
> I have to come back to this because that assumption is just broken.
> 
> create_vcpu()
>   vcpu->user_fpu = alloc_default_fpu_size();
>   vcpu->guest_fpu = alloc_default_fpu_size();
> 
> vcpu_task()
>   get_amx_permission()
>   use_amx()
>     #NM
>     alloc_larger_state()
>   ...
>   kvm_arch_vcpu_ioctl_run()
>     kvm_arch_vcpu_ioctl_run()
>       kvm_load_guest_fpu()
>         kvm_save_current_fpu(vcpu->arch.user_fpu);
>           save_fpregs_to_fpstate(fpu);         <- Out of bounds write
> 
> Adding a comment that KVM does not yet support dynamic user states does
> not cut it, really.
> 
> Even if the above is unlikely, it is possible and has to be handled
> correctly at the point where AMX support is enabled in the kernel
> independent of guest support.

I see. 

> You have two options:
> 
>  1) Always allocate the large buffer size which is required to
>     accomodate all possible features.
> 
>     Trivial, but waste of memory.
> 
>  2) Make the allocation dynamic which seems to be trivial to do in
>     kvm_load_guest_fpu() at least for vcpu->user_fpu.
> 
>     The vcpu->guest_fpu handling can probably be postponed to the
>     point where AMX is actually exposed to guests, but it's probably
>     not the worst idea to think about the implications now.

FWIW, the proposed KVM patch for AMX looks to take (1) here [1] and
Paolo said [2]:

    Most guests will not need the whole xstate feature set.  So perhaps you 
    could set XFD to the host value | the guest value, trap #NM if the 
    host XFD is zero, and possibly reflect the exception to the guest's XFD 
    and XFD_ERR.

    In addition, loading the guest XFD MSRs should use the MSR autoload 
    feature (add_atomic_switch_msr).

And then I guess discussion goes on..

Thanks,
Chang

[1] https://lore.kernel.org/kvm/20210207154256.52850-4-jing2.liu@linux.intel.com/
[2] https://lore.kernel.org/kvm/ae5b0195-b04f-8eef-9e0d-2a46c761d2d5@redhat.com/
Paolo Bonzini Oct. 5, 2021, 7:50 a.m. UTC | #4
On 02/10/21 23:31, Thomas Gleixner wrote:
> You have two options:
> 
>    1) Always allocate the large buffer size which is required to
>       accomodate all possible features.
> 
>       Trivial, but waste of memory.
> 
>    2) Make the allocation dynamic which seems to be trivial to do in
>       kvm_load_guest_fpu() at least for vcpu->user_fpu.
> 
>       The vcpu->guest_fpu handling can probably be postponed to the
>       point where AMX is actually exposed to guests, but it's probably
>       not the worst idea to think about the implications now.
> 
> Paolo, any opinions?

Unless we're missing something, dynamic allocation should not be hard to 
do for both guest_fpu and user_fpu; either near the call sites of 
kvm_save_current_fpu, or in the function itself.  Basically adding 
something like

	struct kvm_fpu {
		struct fpu *state;
		unsigned size;
	} user_fpu, guest_fpu;

to struct kvm_vcpu.  Since the size can vary, it can be done simply with 
kzalloc instead of the x86_fpu_cache that KVM has now.

The only small complication is that kvm_save_current_fpu is called 
within fpregs_lock; the allocation has to be outside so that you can use 
GFP_KERNEL even on RT kernels.   If the code looks better with 
fpregs_lock moved within kvm_save_current_fpu, go ahead and do it like that.

Paolo
Paolo Bonzini Oct. 5, 2021, 8:16 a.m. UTC | #5
On 03/10/21 00:54, Bae, Chang Seok wrote:
> FWIW, the proposed KVM patch for AMX looks to take (1) here [1] and
> Paolo said [2]:
> 
>      Most guests will not need the whole xstate feature set.  So perhaps you
>      could set XFD to the host value | the guest value, trap #NM if the
>      host XFD is zero, and possibly reflect the exception to the guest's XFD
                    ^^^^

(better: if the host XFD is nonzero, and the guest XCR0 includes any bit 
whose state is optional).

>      and XFD_ERR.

This comment is about letting arch/x86/kernel resize current->thread.fpu 
while the guest runs.  It's not necessary before KVM supports AMX, 
because KVM will never let a guest set XCR0[18] (__kvm_set_xcr).

Thomas instead was talking about allocation of vcpu->arch.guest_fpu and 
vcpu->arch.user_fpu.

For dynamic allocation in kvm_save_current_fpu, you can retrieve the 
XINUSE bitmask (XGETBV with RCX=1).  If it contains any bits that have 
optional state, you check if KVM's vcpu->arch.guest_fpu or 
vcpu->arch.user_fpu are already big enough, and if not do the reallocation.

If X86_FEATURE_XGETBV1 is not available, you will not need to resize. 
If XFD is supported but X86_FEATURE_XGETBV1 is not, you can just make 
kvm_arch_init fail with -ENODEV.  It's a nonsensical combination.

Thanks,

Paolo

>      In addition, loading the guest XFD MSRs should use the MSR autoload
>      feature (add_atomic_switch_msr).
> 
> And then I guess discussion goes on..
Thomas Gleixner Oct. 5, 2021, 9:55 a.m. UTC | #6
On Tue, Oct 05 2021 at 09:50, Paolo Bonzini wrote:
> On 02/10/21 23:31, Thomas Gleixner wrote:
>> You have two options:
>> 
>>    1) Always allocate the large buffer size which is required to
>>       accomodate all possible features.
>> 
>>       Trivial, but waste of memory.
>> 
>>    2) Make the allocation dynamic which seems to be trivial to do in
>>       kvm_load_guest_fpu() at least for vcpu->user_fpu.
>> 
>>       The vcpu->guest_fpu handling can probably be postponed to the
>>       point where AMX is actually exposed to guests, but it's probably
>>       not the worst idea to think about the implications now.
>> 
>> Paolo, any opinions?
>
> Unless we're missing something, dynamic allocation should not be hard to 
> do for both guest_fpu and user_fpu; either near the call sites of 
> kvm_save_current_fpu, or in the function itself.  Basically adding 
> something like
>
> 	struct kvm_fpu {
> 		struct fpu *state;
> 		unsigned size;
> 	} user_fpu, guest_fpu;
>
> to struct kvm_vcpu.  Since the size can vary, it can be done simply with 
> kzalloc instead of the x86_fpu_cache that KVM has now.
>
> The only small complication is that kvm_save_current_fpu is called 
> within fpregs_lock; the allocation has to be outside so that you can use 
> GFP_KERNEL even on RT kernels.   If the code looks better with 
> fpregs_lock moved within kvm_save_current_fpu, go ahead and do it like that.

I'm reworking quite some of this already and with the new bits you don't
have to do anything in kvm_fpu because the size and allowed feature bits
are going to be part of fpu->fpstate.

Stay tuned.

Thanks,

        tglx
diff mbox series

Patch

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index d2fc19c0e457..263e349ff85a 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -298,9 +298,8 @@  static inline void os_xrstor_booting(struct xregs_state *xstate)
  * Uses either XSAVE or XSAVEOPT or XSAVES depending on the CPU features
  * and command line options. The choice is permanent until the next reboot.
  */
-static inline void os_xsave(struct xregs_state *xstate)
+static inline void os_xsave(struct xregs_state *xstate, u64 mask)
 {
-	u64 mask = xfeatures_mask_all;
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 	int err;
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 2941d03912db..164e75c37dbb 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -99,7 +99,7 @@  EXPORT_SYMBOL(irq_fpu_usable);
 void save_fpregs_to_fpstate(struct fpu *fpu)
 {
 	if (likely(use_xsave())) {
-		os_xsave(&fpu->state->xsave);
+		os_xsave(&fpu->state->xsave, fpu->state_mask);
 
 		/*
 		 * AVX512 state is tracked here because its use is
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 8b333b1a4d07..fe2732db6d6b 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -365,7 +365,7 @@  static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 		 * the right place in memory. It's ia32 mode. Shrug.
 		 */
 		if (xfeatures_mask_supervisor())
-			os_xsave(&fpu->state->xsave);
+			os_xsave(&fpu->state->xsave, fpu->state_mask);
 		set_thread_flag(TIF_NEED_FPU_LOAD);
 	}
 	__fpu_invalidate_fpregs_state(fpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 74dde635df40..7c46747f6865 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9899,11 +9899,16 @@  static void kvm_save_current_fpu(struct fpu *fpu)
 	 * KVM does not support dynamic user states yet. Assume the buffer
 	 * always has the minimum size.
 	 */
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
 		memcpy(fpu->state, current->thread.fpu.state,
 		       fpu_buf_cfg.min_size);
-	else
+	} else {
+		struct fpu *src_fpu = &current->thread.fpu;
+
+		if (fpu->state_mask != src_fpu->state_mask)
+			fpu->state_mask = src_fpu->state_mask;
 		save_fpregs_to_fpstate(fpu);
+	}
 }
 
 /* Swap (qemu) user FPU context for the guest FPU context. */