Message ID | 20220211060742.34083-1-leobras@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3,1/1] x86/kvm/fpu: Mask guest fpstate->xfeatures with guest_supported_xcr0 | expand |
On 2/11/22 07:07, Leonardo Bras wrote: > During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel > swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate(). > > When xsave feature is available, the fpu swap is done by: > - xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used > to store the current state of the fpu registers to a buffer. > - xrstor(s) instruction, with (fpu_kernel_cfg.max_features & > XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs. > > For xsave(s) the mask is used to limit what parts of the fpu regs will > be copied to the buffer. Likewise on xrstor(s), the mask is used to > limit what parts of the fpu regs will be changed. > > The mask for xsave(s), the guest's fpstate->xfeatures, is defined on > kvm_arch_vcpu_create(), which (in summary) sets it to all features > supported by the cpu which are enabled on kernel config. > > This means that xsave(s) will save to guest buffer all the fpu regs > contents the cpu has enabled when the guest is paused, even if they > are not used. > > This would not be an issue, if xrstor(s) would also do that. > > xrstor(s)'s mask for host/guest swap is basically every valid feature > contained in kernel config, except XFEATURE_MASK_PKRU. > Accordingto kernel src, it is instead switched in switch_to() and > flush_thread(). > > Then, the following happens with a host supporting PKRU starts a > guest that does not support it: > 1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest, > 2 - xsave(s) fpu regs to host fpustate (buffer has XFEATURE_MASK_PKRU) > 3 - xrstor(s) guest fpustate to fpu regs (fpu regs have XFEATURE_MASK_PKRU) > 4 - guest runs, then switch back to host, > 5 - xsave(s) fpu regs to guest fpstate (buffer now have XFEATURE_MASK_PKRU) > 6 - xrstor(s) host fpstate to fpu regs. > 7 - kvm_vcpu_ioctl_x86_get_xsave() copy guest fpstate to userspace (with > XFEATURE_MASK_PKRU, which should not be supported by guest vcpu) > > On 5, even though the guest does not support PKRU, it does have the flag > set on guest fpstate, which is transferred to userspace via vcpu ioctl > KVM_GET_XSAVE. > > This becomes a problem when the user decides on migrating the above guest > to another machine that does not support PKRU: > The new host restores guest's fpu regs to as they were before (xrstor(s)), > but since the new host don't support PKRU, a general-protection exception > ocurs in xrstor(s) and that crashes the guest. > > This can be solved by making the guest's fpstate->user_xfeatures only hold > values compatible to guest_supported_xcr0. This way, on 7 the only flags > copied to userspace will be the ones compatible to guest requirements, > and thus there will be no issue during migration. > > As a bonus, will also fail if userspace tries to set fpu features > that are not compatible to the guest configuration. (KVM_SET_XSAVE ioctl) > > Signed-off-by: Leonardo Bras <leobras@redhat.com> > --- > arch/x86/kernel/fpu/core.c | 1 + > arch/x86/kvm/cpuid.c | 4 ++++ > 2 files changed, 5 insertions(+) > > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c > index 8dea01ffc5c1..e83d8b1fbc83 100644 > --- a/arch/x86/kernel/fpu/core.c > +++ b/arch/x86/kernel/fpu/core.c > @@ -34,6 +34,7 @@ DEFINE_PER_CPU(u64, xfd_state); > /* The FPU state configuration data for kernel and user space */ > struct fpu_state_config fpu_kernel_cfg __ro_after_init; > struct fpu_state_config fpu_user_cfg __ro_after_init; > +EXPORT_SYMBOL(fpu_user_cfg); > > /* > * Represents the initial FPU state. It's mostly (but not completely) zeroes, > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index 494d4d351859..aecebd6bc490 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -296,6 +296,10 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > vcpu->arch.guest_supported_xcr0 = > cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent); > > + /* Mask out features unsupported by guest */ > + vcpu->arch.guest_fpu.fpstate->user_xfeatures = > + fpu_user_cfg.default_features & vcpu->arch.guest_supported_xcr0; This is not correct, because default_features does not include the optional features (such as AMX) that were the original reason to go through all this mess. What about: vcpu->arch.guest_fpu.fpstate->user_xfeatures = vcpu->arch.guest_fpu.fpstate->xfeatures & vcpu->arch.guest_supported_xcr0; ? Paolo > kvm_update_pv_runtime(vcpu); > > vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
On Saturday, 2022-02-12 at 12:02:14 +01, Paolo Bonzini wrote: > On 2/11/22 07:07, Leonardo Bras wrote: >> During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel >> swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate(). >> When xsave feature is available, the fpu swap is done by: >> - xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used >> to store the current state of the fpu registers to a buffer. >> - xrstor(s) instruction, with (fpu_kernel_cfg.max_features & >> XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs. >> For xsave(s) the mask is used to limit what parts of the fpu regs >> will >> be copied to the buffer. Likewise on xrstor(s), the mask is used to >> limit what parts of the fpu regs will be changed. >> The mask for xsave(s), the guest's fpstate->xfeatures, is defined on >> kvm_arch_vcpu_create(), which (in summary) sets it to all features >> supported by the cpu which are enabled on kernel config. >> This means that xsave(s) will save to guest buffer all the fpu regs >> contents the cpu has enabled when the guest is paused, even if they >> are not used. >> This would not be an issue, if xrstor(s) would also do that. >> xrstor(s)'s mask for host/guest swap is basically every valid >> feature >> contained in kernel config, except XFEATURE_MASK_PKRU. >> Accordingto kernel src, it is instead switched in switch_to() and >> flush_thread(). >> Then, the following happens with a host supporting PKRU starts a >> guest that does not support it: >> 1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest, >> 2 - xsave(s) fpu regs to host fpustate (buffer has XFEATURE_MASK_PKRU) >> 3 - xrstor(s) guest fpustate to fpu regs (fpu regs have XFEATURE_MASK_PKRU) >> 4 - guest runs, then switch back to host, >> 5 - xsave(s) fpu regs to guest fpstate (buffer now have XFEATURE_MASK_PKRU) >> 6 - xrstor(s) host fpstate to fpu regs. >> 7 - kvm_vcpu_ioctl_x86_get_xsave() copy guest fpstate to userspace (with >> XFEATURE_MASK_PKRU, which should not be supported by guest vcpu) >> On 5, even though the guest does not support PKRU, it does have the >> flag >> set on guest fpstate, which is transferred to userspace via vcpu ioctl >> KVM_GET_XSAVE. >> This becomes a problem when the user decides on migrating the above >> guest >> to another machine that does not support PKRU: >> The new host restores guest's fpu regs to as they were before (xrstor(s)), >> but since the new host don't support PKRU, a general-protection exception >> ocurs in xrstor(s) and that crashes the guest. >> This can be solved by making the guest's fpstate->user_xfeatures >> only hold >> values compatible to guest_supported_xcr0. This way, on 7 the only flags >> copied to userspace will be the ones compatible to guest requirements, >> and thus there will be no issue during migration. >> As a bonus, will also fail if userspace tries to set fpu features >> that are not compatible to the guest configuration. (KVM_SET_XSAVE ioctl) >> Signed-off-by: Leonardo Bras <leobras@redhat.com> >> --- >> arch/x86/kernel/fpu/core.c | 1 + >> arch/x86/kvm/cpuid.c | 4 ++++ >> 2 files changed, 5 insertions(+) >> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c >> index 8dea01ffc5c1..e83d8b1fbc83 100644 >> --- a/arch/x86/kernel/fpu/core.c >> +++ b/arch/x86/kernel/fpu/core.c >> @@ -34,6 +34,7 @@ DEFINE_PER_CPU(u64, xfd_state); >> /* The FPU state configuration data for kernel and user space */ >> struct fpu_state_config fpu_kernel_cfg __ro_after_init; >> struct fpu_state_config fpu_user_cfg __ro_after_init; >> +EXPORT_SYMBOL(fpu_user_cfg); >> /* >> * Represents the initial FPU state. It's mostly (but not completely) zeroes, >> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c >> index 494d4d351859..aecebd6bc490 100644 >> --- a/arch/x86/kvm/cpuid.c >> +++ b/arch/x86/kvm/cpuid.c >> @@ -296,6 +296,10 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) >> vcpu->arch.guest_supported_xcr0 = >> cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent); >> + /* Mask out features unsupported by guest */ >> + vcpu->arch.guest_fpu.fpstate->user_xfeatures = >> + fpu_user_cfg.default_features & vcpu->arch.guest_supported_xcr0; > > This is not correct, because default_features does not include the > optional features (such as AMX) that were the original reason to > go through all this mess. What about: > > vcpu->arch.guest_fpu.fpstate->user_xfeatures = > vcpu->arch.guest_fpu.fpstate->xfeatures & vcpu->arch.guest_supported_xcr0; > > ? Sorry if this is a daft question: In what situations will there be bits set in vcpu->arch.guest_supported_xcr0 that are not set in vcpu->arch.guest_fpu.fpstate->xfeatures ? guest_supported_xcr0 is filtered based on supported_xcr0, which I would expect to weed out all bits that are not set in ->xfeatures. > Paolo > >> kvm_update_pv_runtime(vcpu); >> vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); dme.
On 2/14/22 10:43, David Edmondson wrote: > Sorry if this is a daft question: > > In what situations will there be bits set in > vcpu->arch.guest_supported_xcr0 that are not set in > vcpu->arch.guest_fpu.fpstate->xfeatures ? > > guest_supported_xcr0 is filtered based on supported_xcr0, which I would > expect to weed out all bits that are not set in ->xfeatures. Good point, so we can do just vcpu->arch.guest_fpu.fpstate->user_xfeatures = vcpu->arch.guest_supported_xcr0; On top of this patch, we can even replace vcpu->arch.guest_supported_xcr0 with vcpu->arch.guest_fpu.fpstate->user_xfeatures. Probably with local variables or wrapper functions though, so as to keep the code readable. For example: static inline u64 kvm_guest_supported_xfd() { u64 guest_supported_xcr0 = vcpu->arch.guest_fpu.fpstate->user_xfeatures; return guest_supported_xcr0 & XFEATURE_MASK_USER_DYNAMIC; } Also, already in this patch fpstate_realloc should do newfps->user_xfeatures = curfps->user_xfeatures | xfeatures; only if !guest_fpu. In other words, the user_xfeatures of the guest FPU should be controlled exclusively by KVM_SET_CPUID2. Thanks, Paolo
Hello Paolo, thanks for the feedback! On Mon, Feb 14, 2022 at 6:56 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 2/14/22 10:43, David Edmondson wrote: > > Sorry if this is a daft question: > > > > In what situations will there be bits set in > > vcpu->arch.guest_supported_xcr0 that are not set in > > vcpu->arch.guest_fpu.fpstate->xfeatures ? > > > > guest_supported_xcr0 is filtered based on supported_xcr0, which I would > > expect to weed out all bits that are not set in ->xfeatures. > > Good point, so we can do just > > vcpu->arch.guest_fpu.fpstate->user_xfeatures = > vcpu->arch.guest_supported_xcr0; Updated for v4. > > On top of this patch, we can even replace vcpu->arch.guest_supported_xcr0 > with vcpu->arch.guest_fpu.fpstate->user_xfeatures. Probably with local > variables or wrapper functions though, so as to keep the code readable. You mean another patch (#2) removing guest_supported_xcr0 field from kvm_vcpu_arch ? (and introducing something like kvm_guest_supported_xcr() ?) > For example: > > static inline u64 kvm_guest_supported_xfd() > { > u64 guest_supported_xcr0 = vcpu->arch.guest_fpu.fpstate->user_xfeatures; > > return guest_supported_xcr0 & XFEATURE_MASK_USER_DYNAMIC; > } Not sure If I get the above. Are you suggesting also removing fpstate->xfd and use a wrapper instead? Or is the above just an example? (s/xfd/xcr0/ & s/XFEATURE_MASK_USER_DYNAMIC/XFEATURE_MASK_USER_SUPPORTED/ ) > > Also, already in this patch fpstate_realloc should do > > newfps->user_xfeatures = curfps->user_xfeatures | xfeatures; > > only if !guest_fpu. In other words, the user_xfeatures of the guest FPU > should be controlled exclusively by KVM_SET_CPUID2. Just to check, you suggest adding this on patch #2 ? (I am failing to see how would that impact on #1) > > Thanks, > > Paolo > Thank you! Best regards, Leo
On 2/16/22 08:48, Leonardo Bras Soares Passos wrote: > On Mon, Feb 14, 2022 at 6:56 AM Paolo Bonzini <pbonzini@redhat.com> wrote: >> On top of this patch, we can even replace vcpu->arch.guest_supported_xcr0 >> with vcpu->arch.guest_fpu.fpstate->user_xfeatures. Probably with local >> variables or wrapper functions though, so as to keep the code readable. > > You mean another patch (#2) removing guest_supported_xcr0 field from > kvm_vcpu_arch ? > (and introducing something like kvm_guest_supported_xcr() ?) Yes, introducing both kvm_guest_supported_xcr0() that just reads user_xfeatures, and kvm_guest_supported_xfd() as below. >> For example: >> >> static inline u64 kvm_guest_supported_xfd() >> { >> u64 guest_supported_xcr0 = vcpu->arch.guest_fpu.fpstate->user_xfeatures; >> >> return guest_supported_xcr0 & XFEATURE_MASK_USER_DYNAMIC; >> } > > Not sure If I get the above. > Are you suggesting also removing fpstate->xfd and use a wrapper instead? > Or is the above just an example? > (s/xfd/xcr0/ & s/XFEATURE_MASK_USER_DYNAMIC/XFEATURE_MASK_USER_SUPPORTED/ ) The above is an example of how even "indirect" uses as guest_supported_xcr0 can be changed to a function. >> Also, already in this patch fpstate_realloc should do >> >> newfps->user_xfeatures = curfps->user_xfeatures | xfeatures; >> >> only if !guest_fpu. In other words, the user_xfeatures of the guest FPU >> should be controlled exclusively by KVM_SET_CPUID2. > > Just to check, you suggest adding this on patch #2 ? > (I am failing to see how would that impact on #1) In patch 1. Since KVM_SET_CPUID2 now changes newfps->user_xfeatures, it should be the only place where it's changed, and arch_prctl() should not change it anymore. Paolo
On Wed, Feb 16, 2022 at 8:45 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 2/16/22 08:48, Leonardo Bras Soares Passos wrote: > > On Mon, Feb 14, 2022 at 6:56 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > >> On top of this patch, we can even replace vcpu->arch.guest_supported_xcr0 > >> with vcpu->arch.guest_fpu.fpstate->user_xfeatures. Probably with local > >> variables or wrapper functions though, so as to keep the code readable. > > > > You mean another patch (#2) removing guest_supported_xcr0 field from > > kvm_vcpu_arch ? > > (and introducing something like kvm_guest_supported_xcr() ?) > > Yes, introducing both kvm_guest_supported_xcr0() that just reads > user_xfeatures, and kvm_guest_supported_xfd() as below. Oh, I see. Thanks for clearing that up! > > >> For example: > >> > >> static inline u64 kvm_guest_supported_xfd() > >> { > >> u64 guest_supported_xcr0 = vcpu->arch.guest_fpu.fpstate->user_xfeatures; > >> > >> return guest_supported_xcr0 & XFEATURE_MASK_USER_DYNAMIC; > >> } > > > > Not sure If I get the above. > > Are you suggesting also removing fpstate->xfd and use a wrapper instead? > > Or is the above just an example? > > (s/xfd/xcr0/ & s/XFEATURE_MASK_USER_DYNAMIC/XFEATURE_MASK_USER_SUPPORTED/ ) > > The above is an example of how even "indirect" uses as > guest_supported_xcr0 can be changed to a function. > > >> Also, already in this patch fpstate_realloc should do > >> > >> newfps->user_xfeatures = curfps->user_xfeatures | xfeatures; > >> > >> only if !guest_fpu. In other words, the user_xfeatures of the guest FPU > >> should be controlled exclusively by KVM_SET_CPUID2. > > > > Just to check, you suggest adding this on patch #2 ? > > (I am failing to see how would that impact on #1) > > In patch 1. Since KVM_SET_CPUID2 now changes newfps->user_xfeatures, it > should be the only place where it's changed, and arch_prctl() should not > change it anymore. > > Paolo > Oh, yeah, that makes sense to me. Thanks for pointing that out! I will send a v4 shortly. Best regards, Leo
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 8dea01ffc5c1..e83d8b1fbc83 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -34,6 +34,7 @@ DEFINE_PER_CPU(u64, xfd_state); /* The FPU state configuration data for kernel and user space */ struct fpu_state_config fpu_kernel_cfg __ro_after_init; struct fpu_state_config fpu_user_cfg __ro_after_init; +EXPORT_SYMBOL(fpu_user_cfg); /* * Represents the initial FPU state. It's mostly (but not completely) zeroes, diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 494d4d351859..aecebd6bc490 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -296,6 +296,10 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.guest_supported_xcr0 = cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent); + /* Mask out features unsupported by guest */ + vcpu->arch.guest_fpu.fpstate->user_xfeatures = + fpu_user_cfg.default_features & vcpu->arch.guest_supported_xcr0; + kvm_update_pv_runtime(vcpu); vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate(). When xsave feature is available, the fpu swap is done by: - xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used to store the current state of the fpu registers to a buffer. - xrstor(s) instruction, with (fpu_kernel_cfg.max_features & XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs. For xsave(s) the mask is used to limit what parts of the fpu regs will be copied to the buffer. Likewise on xrstor(s), the mask is used to limit what parts of the fpu regs will be changed. The mask for xsave(s), the guest's fpstate->xfeatures, is defined on kvm_arch_vcpu_create(), which (in summary) sets it to all features supported by the cpu which are enabled on kernel config. This means that xsave(s) will save to guest buffer all the fpu regs contents the cpu has enabled when the guest is paused, even if they are not used. This would not be an issue, if xrstor(s) would also do that. xrstor(s)'s mask for host/guest swap is basically every valid feature contained in kernel config, except XFEATURE_MASK_PKRU. Accordingto kernel src, it is instead switched in switch_to() and flush_thread(). Then, the following happens with a host supporting PKRU starts a guest that does not support it: 1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest, 2 - xsave(s) fpu regs to host fpustate (buffer has XFEATURE_MASK_PKRU) 3 - xrstor(s) guest fpustate to fpu regs (fpu regs have XFEATURE_MASK_PKRU) 4 - guest runs, then switch back to host, 5 - xsave(s) fpu regs to guest fpstate (buffer now have XFEATURE_MASK_PKRU) 6 - xrstor(s) host fpstate to fpu regs. 7 - kvm_vcpu_ioctl_x86_get_xsave() copy guest fpstate to userspace (with XFEATURE_MASK_PKRU, which should not be supported by guest vcpu) On 5, even though the guest does not support PKRU, it does have the flag set on guest fpstate, which is transferred to userspace via vcpu ioctl KVM_GET_XSAVE. This becomes a problem when the user decides on migrating the above guest to another machine that does not support PKRU: The new host restores guest's fpu regs to as they were before (xrstor(s)), but since the new host don't support PKRU, a general-protection exception ocurs in xrstor(s) and that crashes the guest. This can be solved by making the guest's fpstate->user_xfeatures only hold values compatible to guest_supported_xcr0. This way, on 7 the only flags copied to userspace will be the ones compatible to guest requirements, and thus there will be no issue during migration. As a bonus, will also fail if userspace tries to set fpu features that are not compatible to the guest configuration. (KVM_SET_XSAVE ioctl) Signed-off-by: Leonardo Bras <leobras@redhat.com> --- arch/x86/kernel/fpu/core.c | 1 + arch/x86/kvm/cpuid.c | 4 ++++ 2 files changed, 5 insertions(+)