Message ID: 20210825155413.19673-11-chang.seok.bae@intel.com
State: New, archived
On Wed, Aug 25 2021 at 08:53, Chang S. Bae wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 74dde635df40..7c46747f6865 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9899,11 +9899,16 @@ static void kvm_save_current_fpu(struct fpu *fpu)
>  	 * KVM does not support dynamic user states yet. Assume the buffer
>  	 * always has the minimum size.
>  	 */
> -	if (test_thread_flag(TIF_NEED_FPU_LOAD))
> +	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
>  		memcpy(fpu->state, current->thread.fpu.state,
>  		       fpu_buf_cfg.min_size);

What happens with the rest of the state?

> -	else
> +	} else {
> +		struct fpu *src_fpu = &current->thread.fpu;
> +
> +		if (fpu->state_mask != src_fpu->state_mask)
> +			fpu->state_mask = src_fpu->state_mask;

What guarantees that the state size of @fpu is big enough when src_fpu
has dynamic features included?

>  		save_fpregs_to_fpstate(fpu);

Thanks,

        tglx
On Fri, Oct 01 2021 at 17:41, Thomas Gleixner wrote:
> On Wed, Aug 25 2021 at 08:53, Chang S. Bae wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 74dde635df40..7c46747f6865 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -9899,11 +9899,16 @@ static void kvm_save_current_fpu(struct fpu *fpu)
>>  	 * KVM does not support dynamic user states yet. Assume the buffer
>>  	 * always has the minimum size.

I have to come back to this because that assumption is just broken.

  create_vcpu()
    vcpu->user_fpu = alloc_default_fpu_size();
    vcpu->guest_fpu = alloc_default_fpu_size();

  vcpu_task()
    get_amx_permission()
    use_amx()
      #NM
        alloc_larger_state()
    ...
    kvm_arch_vcpu_ioctl_run()
      kvm_arch_vcpu_ioctl_run()
        kvm_load_guest_fpu()
          kvm_save_current_fpu(vcpu->arch.user_fpu);
            save_fpregs_to_fpstate(fpu);      <- Out of bounds write

Adding a comment that KVM does not yet support dynamic user states does
not cut it, really.

Even if the above is unlikely, it is possible and has to be handled
correctly at the point where AMX support is enabled in the kernel
independent of guest support.

You have two options:

 1) Always allocate the large buffer size which is required to
    accommodate all possible features.

    Trivial, but waste of memory.

 2) Make the allocation dynamic which seems to be trivial to do in
    kvm_load_guest_fpu() at least for vcpu->user_fpu.

    The vcpu->guest_fpu handling can probably be postponed to the
    point where AMX is actually exposed to guests, but it's probably
    not the worst idea to think about the implications now.

Paolo, any opinions?

Thanks,

        tglx
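[Editorial illustration: the snippet below is a minimal sketch of a guard that
would turn the silent out-of-bounds write described above into a loud warning.
It is not a proposed fix for this series; xstate_bytes_for_mask() is a made-up
helper standing for "buffer bytes needed for this feature mask" and does not
exist in the tree.]

    /*
     * Illustration only: with the quoted patch, @dst (the default-sized
     * vcpu-side buffer) may be asked to hold @src's grown state after an
     * AMX #NM reallocation.  xstate_bytes_for_mask() is hypothetical.
     */
    static bool fpu_dst_can_hold(struct fpu *dst, struct fpu *src)
    {
    	return xstate_bytes_for_mask(src->state_mask) <=
    	       xstate_bytes_for_mask(dst->state_mask);
    }

    /* In kvm_save_current_fpu(), before save_fpregs_to_fpstate(fpu): */
    if (WARN_ON_ONCE(!fpu_dst_can_hold(fpu, &current->thread.fpu)))
    	return;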
On Oct 2, 2021, at 14:31, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, Oct 01 2021 at 17:41, Thomas Gleixner wrote:
>> On Wed, Aug 25 2021 at 08:53, Chang S. Bae wrote:
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 74dde635df40..7c46747f6865 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -9899,11 +9899,16 @@ static void kvm_save_current_fpu(struct fpu *fpu)
>>>  	 * KVM does not support dynamic user states yet. Assume the buffer
>>>  	 * always has the minimum size.
>
> I have to come back to this because that assumption is just broken.
>
>   create_vcpu()
>     vcpu->user_fpu = alloc_default_fpu_size();
>     vcpu->guest_fpu = alloc_default_fpu_size();
>
>   vcpu_task()
>     get_amx_permission()
>     use_amx()
>       #NM
>         alloc_larger_state()
>     ...
>     kvm_arch_vcpu_ioctl_run()
>       kvm_arch_vcpu_ioctl_run()
>         kvm_load_guest_fpu()
>           kvm_save_current_fpu(vcpu->arch.user_fpu);
>             save_fpregs_to_fpstate(fpu);      <- Out of bounds write
>
> Adding a comment that KVM does not yet support dynamic user states does
> not cut it, really.
>
> Even if the above is unlikely, it is possible and has to be handled
> correctly at the point where AMX support is enabled in the kernel
> independent of guest support.

I see.

> You have two options:
>
>  1) Always allocate the large buffer size which is required to
>     accommodate all possible features.
>
>     Trivial, but waste of memory.
>
>  2) Make the allocation dynamic which seems to be trivial to do in
>     kvm_load_guest_fpu() at least for vcpu->user_fpu.
>
>     The vcpu->guest_fpu handling can probably be postponed to the
>     point where AMX is actually exposed to guests, but it's probably
>     not the worst idea to think about the implications now.

FWIW, the proposed KVM patch for AMX looks to take (1) here [1] and
Paolo said [2]:

  Most guests will not need the whole xstate feature set. So perhaps you
  could set XFD to the host value | the guest value, trap #NM if the
  host XFD is zero, and possibly reflect the exception to the guest's XFD
  and XFD_ERR.

  In addition, loading the guest XFD MSRs should use the MSR autoload
  feature (add_atomic_switch_msr).

And then I guess discussion goes on..

Thanks,
Chang

[1] https://lore.kernel.org/kvm/20210207154256.52850-4-jing2.liu@linux.intel.com/
[2] https://lore.kernel.org/kvm/ae5b0195-b04f-8eef-9e0d-2a46c761d2d5@redhat.com/
On 02/10/21 23:31, Thomas Gleixner wrote:
> You have two options:
>
>  1) Always allocate the large buffer size which is required to
>     accommodate all possible features.
>
>     Trivial, but waste of memory.
>
>  2) Make the allocation dynamic which seems to be trivial to do in
>     kvm_load_guest_fpu() at least for vcpu->user_fpu.
>
>     The vcpu->guest_fpu handling can probably be postponed to the
>     point where AMX is actually exposed to guests, but it's probably
>     not the worst idea to think about the implications now.
>
> Paolo, any opinions?

Unless we're missing something, dynamic allocation should not be hard to
do for both guest_fpu and user_fpu; either near the call sites of
kvm_save_current_fpu, or in the function itself.  Basically adding
something like

    struct kvm_fpu {
        struct fpu *state;
        unsigned size;
    } user_fpu, guest_fpu;

to struct kvm_vcpu.  Since the size can vary, it can be done simply with
kzalloc instead of the x86_fpu_cache that KVM has now.

The only small complication is that kvm_save_current_fpu is called
within fpregs_lock; the allocation has to be outside so that you can use
GFP_KERNEL even on RT kernels.  If the code looks better with
fpregs_lock moved within kvm_save_current_fpu, go ahead and do it like that.

Paolo
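[Editorial illustration: a minimal sketch of that reallocation, staying with
the struct kvm_fpu layout suggested above. The required size is assumed to be
computed by the caller, and the old contents are simply discarded, which is
fine for user_fpu since kvm_save_current_fpu() overwrites it right afterwards.]

    struct kvm_fpu {
    	struct fpu *state;
    	unsigned int size;
    };

    /*
     * Sketch only: grow @kfpu to @need bytes.  Must run outside
     * fpregs_lock(), because kzalloc(GFP_KERNEL*) may sleep and that
     * section must not sleep on RT kernels.
     */
    static int kvm_fpu_realloc(struct kvm_fpu *kfpu, unsigned int need)
    {
    	struct fpu *bigger;

    	if (need <= kfpu->size)
    		return 0;

    	bigger = kzalloc(need, GFP_KERNEL_ACCOUNT);
    	if (!bigger)
    		return -ENOMEM;

    	kfree(kfpu->state);		/* old contents are not needed */
    	kfpu->state = bigger;
    	kfpu->size = need;
    	return 0;
    }

A call site would presumably sit right before fpregs_lock() in
kvm_load_guest_fpu(), which matches the constraint spelled out above.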
On 03/10/21 00:54, Bae, Chang Seok wrote:
> FWIW, the proposed KVM patch for AMX looks to take (1) here [1] and
> Paolo said [2]:
>
>   Most guests will not need the whole xstate feature set. So perhaps you
>   could set XFD to the host value | the guest value, trap #NM if the
>   host XFD is zero, and possibly reflect the exception to the guest's XFD
                ^^^^ (better: if the host XFD is nonzero, and the guest
                      XCR0 includes any bit whose state is optional).
>   and XFD_ERR.

This comment is about letting arch/x86/kernel resize current->thread.fpu
while the guest runs.  It's not necessary before KVM supports AMX,
because KVM will never let a guest set XCR0[18] (__kvm_set_xcr).

Thomas instead was talking about allocation of vcpu->arch.guest_fpu and
vcpu->arch.user_fpu.

For dynamic allocation in kvm_save_current_fpu, you can retrieve the
XINUSE bitmask (XGETBV with RCX=1).  If it contains any bits that have
optional state, you check if KVM's vcpu->arch.guest_fpu or
vcpu->arch.user_fpu are already big enough, and if not do the
reallocation.  If X86_FEATURE_XGETBV1 is not available, you will not
need to resize.

If XFD is supported but X86_FEATURE_XGETBV1 is not, you can just make
kvm_arch_init fail with -ENODEV.  It's a nonsensical combination.

Thanks,

Paolo

> In addition, loading the guest XFD MSRs should use the MSR autoload
> feature (add_atomic_switch_msr).
>
> And then I guess discussion goes on..
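[Editorial illustration: a hedged sketch of that XINUSE check.
boot_cpu_has() and the kernel's xgetbv() wrapper are existing helpers, while
XFEATURE_MASK_USER_DYNAMIC (standing for "the features with optional state")
and xstate_bytes_for_mask() are assumptions that may not exist under these
names.]

    /*
     * Sketch: does the vcpu-side buffer of @cur_size bytes need to grow
     * before the current FPU state is saved into it?
     */
    static bool kvm_fpu_needs_realloc(unsigned int cur_size)
    {
    	u64 xinuse;

    	/* Without XGETBV1 there is no optional state to worry about. */
    	if (!boot_cpu_has(X86_FEATURE_XGETBV1))
    		return false;

    	xinuse = xgetbv(1);		/* XINUSE: in-use xfeatures */
    	if (!(xinuse & XFEATURE_MASK_USER_DYNAMIC))
    		return false;

    	return xstate_bytes_for_mask(xinuse) > cur_size;
    }

If this returns true, a reallocation along the lines sketched earlier in the
thread would run before fpregs_lock().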
On Tue, Oct 05 2021 at 09:50, Paolo Bonzini wrote:
> On 02/10/21 23:31, Thomas Gleixner wrote:
>> You have two options:
>>
>>  1) Always allocate the large buffer size which is required to
>>     accommodate all possible features.
>>
>>     Trivial, but waste of memory.
>>
>>  2) Make the allocation dynamic which seems to be trivial to do in
>>     kvm_load_guest_fpu() at least for vcpu->user_fpu.
>>
>>     The vcpu->guest_fpu handling can probably be postponed to the
>>     point where AMX is actually exposed to guests, but it's probably
>>     not the worst idea to think about the implications now.
>>
>> Paolo, any opinions?
>
> Unless we're missing something, dynamic allocation should not be hard to
> do for both guest_fpu and user_fpu; either near the call sites of
> kvm_save_current_fpu, or in the function itself.  Basically adding
> something like
>
>     struct kvm_fpu {
>         struct fpu *state;
>         unsigned size;
>     } user_fpu, guest_fpu;
>
> to struct kvm_vcpu.  Since the size can vary, it can be done simply with
> kzalloc instead of the x86_fpu_cache that KVM has now.
>
> The only small complication is that kvm_save_current_fpu is called
> within fpregs_lock; the allocation has to be outside so that you can use
> GFP_KERNEL even on RT kernels.  If the code looks better with
> fpregs_lock moved within kvm_save_current_fpu, go ahead and do it like that.

I'm reworking quite some of this already and with the new bits you don't
have to do anything in kvm_fpu because the size and allowed feature bits
are going to be part of fpu->fpstate.

Stay tuned.

Thanks,

        tglx
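[Editorial illustration: purely a guess at the shape being described here,
namely a buffer that carries its own size and allowed feature bits. The field
names are assumptions, not the eventual upstream layout.]

    struct fpstate {
    	unsigned int		size;		/* bytes available in @regs */
    	u64			xfeatures;	/* features @regs can hold */
    	/* ... */
    	union fpregs_state	regs;		/* must stay last */
    };

With something like this, a consumer such as KVM would compare and, if
necessary, swap whole fpstate buffers instead of tracking sizes in a separate
kvm_fpu structure.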
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index d2fc19c0e457..263e349ff85a 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -298,9 +298,8 @@ static inline void os_xrstor_booting(struct xregs_state *xstate)
  * Uses either XSAVE or XSAVEOPT or XSAVES depending on the CPU features
  * and command line options. The choice is permanent until the next reboot.
  */
-static inline void os_xsave(struct xregs_state *xstate)
+static inline void os_xsave(struct xregs_state *xstate, u64 mask)
 {
-	u64 mask = xfeatures_mask_all;
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 	int err;
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 2941d03912db..164e75c37dbb 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -99,7 +99,7 @@ EXPORT_SYMBOL(irq_fpu_usable);
 void save_fpregs_to_fpstate(struct fpu *fpu)
 {
 	if (likely(use_xsave())) {
-		os_xsave(&fpu->state->xsave);
+		os_xsave(&fpu->state->xsave, fpu->state_mask);
 
 		/*
 		 * AVX512 state is tracked here because its use is
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 8b333b1a4d07..fe2732db6d6b 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -365,7 +365,7 @@ static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 		 * the right place in memory. It's ia32 mode. Shrug.
 		 */
 		if (xfeatures_mask_supervisor())
-			os_xsave(&fpu->state->xsave);
+			os_xsave(&fpu->state->xsave, fpu->state_mask);
 		set_thread_flag(TIF_NEED_FPU_LOAD);
 	}
 	__fpu_invalidate_fpregs_state(fpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 74dde635df40..7c46747f6865 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9899,11 +9899,16 @@ static void kvm_save_current_fpu(struct fpu *fpu)
 	 * KVM does not support dynamic user states yet. Assume the buffer
 	 * always has the minimum size.
 	 */
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
 		memcpy(fpu->state, current->thread.fpu.state,
 		       fpu_buf_cfg.min_size);
-	else
+	} else {
+		struct fpu *src_fpu = &current->thread.fpu;
+
+		if (fpu->state_mask != src_fpu->state_mask)
+			fpu->state_mask = src_fpu->state_mask;
 		save_fpregs_to_fpstate(fpu);
+	}
 }
 
 /* Swap (qemu) user FPU context for the guest FPU context. */