Message ID | 20220205081658.562208-2-leobras@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | x86/kvm/fpu: Fix guest migration bugs that can crash guest |
On 2/5/22 09:16, Leonardo Bras wrote:
> During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel
> swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate().
>
> When xsave feature is available, the fpu swap is done by:
> - xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used
>   to store the current state of the fpu registers to a buffer.
> - xrstor(s) instruction, with (fpu_kernel_cfg.max_features &
>   XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs.
>
> For xsave(s) the mask is used to limit what parts of the fpu regs will
> be copied to the buffer. Likewise on xrstor(s), the mask is used to
> limit what parts of the fpu regs will be changed.
>
> The mask for xsave(s), the guest's fpstate->xfeatures, is defined on
> kvm_arch_vcpu_create(), which (in summary) sets it to all features
> supported by the cpu which are enabled on kernel config.
>
> This means that xsave(s) will save to guest buffer all the fpu regs
> contents the cpu has enabled when the guest is paused, even if they
> are not used.
>
> This would not be an issue, if xrstor(s) would also do that.
>
> xrstor(s)'s mask for host/guest swap is basically every valid feature
> contained in kernel config, except XFEATURE_MASK_PKRU.
> According to kernel src, it is instead switched in switch_to() and
> flush_thread().

Hi Leonardo, is this an issue when patch 2 is applied? With this patch,
we have to reason about the effect of calling KVM_SET_CPUID2 twice
back to back. I think an "&=" would be wrong in that case.

On the other hand, with patch 2 the change is only in the KVM_SET_XSAVE
output, which is much more self-contained.

Thanks,

Paolo

> Then, the following happens when a host supporting PKRU starts a
> guest that does not support it:
> 1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest,
> 2 - xsave(s) fpu regs to host fpstate (buffer has XFEATURE_MASK_PKRU)
> 3 - xrstor(s) guest fpstate to fpu regs (fpu regs have XFEATURE_MASK_PKRU)
> 4 - guest runs, then switch back to host,
> 5 - xsave(s) fpu regs to guest fpstate (buffer now has XFEATURE_MASK_PKRU)
> 6 - xrstor(s) host fpstate to fpu regs.
>
> On 5, even though the guest does not support PKRU, it does have the flag
> set on guest fpstate, which is transferred to userspace via vcpu ioctl
> KVM_GET_XSAVE.
>
> This becomes a problem when the user decides on migrating the above guest
> to another machine that does not support PKRU:
> The new host restores guest's fpu regs as they were before (xrstor(s)),
> but since the new host doesn't support PKRU, a general-protection exception
> occurs in xrstor(s) and that crashes the guest.
>
> This can be solved by making the guest's fpstate->xfeatures only hold
> values compatible with guest_supported_xcr0. This way, on 5 the only flags
> saved by xsave(s) will be the ones compatible with guest requirements,
> and thus there will be no issue during migration.
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  arch/x86/kvm/cpuid.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 28be02adc669..8ce481cc0f9b 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -296,6 +296,9 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	vcpu->arch.guest_supported_xcr0 =
>  		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
>
> +	/* Mask out features unsupported by guest */
> +	vcpu->arch.guest_fpu.fpstate->xfeatures &= vcpu->arch.guest_supported_xcr0;
> +
>  	kvm_update_pv_runtime(vcpu);
>
>  	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
Hello Paolo,

On Mon, Feb 7, 2022 at 10:30 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 2/5/22 09:16, Leonardo Bras wrote:
> > During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel
> > swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate().
> >
> > When xsave feature is available, the fpu swap is done by:
> > - xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used
> >   to store the current state of the fpu registers to a buffer.
> > - xrstor(s) instruction, with (fpu_kernel_cfg.max_features &
> >   XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs.
> >
> > For xsave(s) the mask is used to limit what parts of the fpu regs will
> > be copied to the buffer. Likewise on xrstor(s), the mask is used to
> > limit what parts of the fpu regs will be changed.
> >
> > The mask for xsave(s), the guest's fpstate->xfeatures, is defined on
> > kvm_arch_vcpu_create(), which (in summary) sets it to all features
> > supported by the cpu which are enabled on kernel config.
> >
> > This means that xsave(s) will save to guest buffer all the fpu regs
> > contents the cpu has enabled when the guest is paused, even if they
> > are not used.
> >
> > This would not be an issue, if xrstor(s) would also do that.
> >
> > xrstor(s)'s mask for host/guest swap is basically every valid feature
> > contained in kernel config, except XFEATURE_MASK_PKRU.
> > According to kernel src, it is instead switched in switch_to() and
> > flush_thread().
>
> Hi Leonardo, is this an issue when patch 2 is applied?

Yes.
This issue happens on host/guest context switch, not on
KVM_{GET,SET}_XSAVE, so this bug will be triggered whenever the guest
doesn't support PKRU but the host does, without any involvement of the
above IOCTLs.

In fact, IIUC, even if we are able to fix the feature bit with
KVM_SET_XSAVE, it would come back after another host/guest context
switch if we don't fix vcpu->arch.guest_fpu.fpstate->xfeatures.

> With this patch,
> we have to reason about the effect of calling KVM_SET_CPUID2 twice
> back to back. I think an "&=" would be wrong in that case.

So, you suggest something like this?

vcpu->arch.guest_fpu.fpstate->xfeatures =
    fpu_user_cfg.default_features & vcpu->arch.guest_supported_xcr0;

>
> On the other hand, with patch 2 the change is only in the KVM_SET_XSAVE
> output, which is much more self-contained.

Agree, but they solve different sources of the same issue.
Patch 2 will only address a bug that can happen if userspace mistakenly
tries to set a feature the guest does not support.

>
> Thanks,

Thank you!

Best regards,
Leo

[...]
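The concern about "&=" when KVM_SET_CPUID2 is called twice can be illustrated with a small userspace model. This is only a sketch: struct toy_vcpu, the TOY_* bits and both helper functions are hypothetical names invented for illustration, and default_features merely stands in for fpu_user_cfg.default_features. None of this is kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative feature bits; the positions mirror XCR0 but this is only a model. */
#define TOY_FP   (1ULL << 0)
#define TOY_SSE  (1ULL << 1)
#define TOY_YMM  (1ULL << 2)
#define TOY_PKRU (1ULL << 9)

/* Hypothetical stand-in for the vCPU fields discussed in the thread. */
struct toy_vcpu {
	uint64_t default_features; /* models fpu_user_cfg.default_features */
	uint64_t xfeatures;        /* models guest_fpu.fpstate->xfeatures */
};

/* Variant from the posted patch: narrow the current mask in place. */
static void toy_set_cpuid_and(struct toy_vcpu *v, uint64_t guest_supported_xcr0)
{
	v->xfeatures &= guest_supported_xcr0;
}

/* Variant proposed in the reply: recompute from the host default every time. */
static void toy_set_cpuid_assign(struct toy_vcpu *v, uint64_t guest_supported_xcr0)
{
	v->xfeatures = v->default_features & guest_supported_xcr0;
}

/* Two back-to-back CPUID updates: first a narrow guest, then a wider one. */
static void toy_back_to_back_demo(void)
{
	const uint64_t host = TOY_FP | TOY_SSE | TOY_YMM | TOY_PKRU;
	struct toy_vcpu a = { .default_features = host, .xfeatures = host };
	struct toy_vcpu b = a;

	toy_set_cpuid_and(&a, TOY_FP | TOY_SSE);
	toy_set_cpuid_and(&a, TOY_FP | TOY_SSE | TOY_YMM);
	/* "&=" can only clear bits, so YMM stays lost after the second call. */
	assert(a.xfeatures == (TOY_FP | TOY_SSE));

	toy_set_cpuid_assign(&b, TOY_FP | TOY_SSE);
	toy_set_cpuid_assign(&b, TOY_FP | TOY_SSE | TOY_YMM);
	/* Plain assignment depends only on the latest CPUID data. */
	assert(b.xfeatures == (TOY_FP | TOY_SSE | TOY_YMM));
}
```

In this model the in-place variant makes the result depend on the history of calls, while the assignment variant depends only on the most recent CPUID data, which appears to be the property asked for in the review.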
On 2/7/22 21:24, Leonardo Bras Soares Passos wrote:
>> With this patch,
>> we have to reason about the effect of calling KVM_SET_CPUID2 twice
>> back to back. I think an "&=" would be wrong in that case.
>
> So, you suggest something like this?
>
> vcpu->arch.guest_fpu.fpstate->xfeatures =
>     fpu_user_cfg.default_features & vcpu->arch.guest_supported_xcr0;
>

Yes, but you need to change user_xfeatures instead of xfeatures.
KVM_GET_XSAVE and KVM_SET_XSAVE will take it into account automatically:

- KVM_GET_XSAVE: fpu_copy_guest_fpstate_to_uabi -> __copy_xstate_to_uabi_buf

- KVM_SET_XSAVE: fpu_copy_uabi_to_guest_fpstate ->
  copy_uabi_from_kernel_to_xstate -> copy_uabi_to_xstate ->
  validate_user_xstate_buffer

Paolo
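The effect of restricting user_xfeatures, as suggested above, can be sketched with a toy model of the two ioctl directions. This does not reproduce the kernel helpers named in the call chains; struct toy_fpstate, toy_get_xsave() and toy_set_xsave() are hypothetical, and the filtering they perform is an assumption about what "take it into account" means here: hide unsupported bits on the way out, reject them on the way in.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative feature bits; the positions mirror XCR0 but this is only a model. */
#define TOY_FP   (1ULL << 0)
#define TOY_SSE  (1ULL << 1)
#define TOY_PKRU (1ULL << 9)

/* Hypothetical, heavily simplified view of a guest fpstate. */
struct toy_fpstate {
	uint64_t header;         /* feature bits present in the kernel-side buffer */
	uint64_t user_xfeatures; /* feature bits userspace may see or set */
};

/* Models the KVM_GET_XSAVE direction: hide bits outside user_xfeatures. */
static uint64_t toy_get_xsave(const struct toy_fpstate *fps)
{
	return fps->header & fps->user_xfeatures;
}

/* Models the KVM_SET_XSAVE direction: refuse bits outside user_xfeatures. */
static int toy_set_xsave(struct toy_fpstate *fps, uint64_t uabi_header)
{
	if (uabi_header & ~fps->user_xfeatures)
		return -EINVAL;
	fps->header = uabi_header;
	return 0;
}

static void toy_user_xfeatures_demo(void)
{
	/* The kernel buffer picked up PKRU, but the guest only supports FP and SSE. */
	struct toy_fpstate fps = {
		.header = TOY_FP | TOY_SSE | TOY_PKRU,
		.user_xfeatures = TOY_FP | TOY_SSE,
	};

	/* Userspace never sees the PKRU bit, so it cannot travel with a migration. */
	assert(toy_get_xsave(&fps) == (TOY_FP | TOY_SSE));
	/* A buffer that carries PKRU is rejected instead of being restored. */
	assert(toy_set_xsave(&fps, TOY_FP | TOY_PKRU) == -EINVAL);
	assert(toy_set_xsave(&fps, TOY_FP) == 0);
	assert(fps.header == TOY_FP);
}
```

Under these assumptions a single mask covers both directions, which is why this approach could replace a separate KVM_SET_XSAVE fix.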
On Mon, Feb 7, 2022 at 6:00 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 2/7/22 21:24, Leonardo Bras Soares Passos wrote:
> >> With this patch,
> >> we have to reason about the effect of calling KVM_SET_CPUID2 twice
> >> back to back. I think an "&=" would be wrong in that case.
> >
> > So, you suggest something like this?
> >
> > vcpu->arch.guest_fpu.fpstate->xfeatures =
> >     fpu_user_cfg.default_features & vcpu->arch.guest_supported_xcr0;
> >
>
> Yes, but you need to change user_xfeatures instead of xfeatures.
> KVM_GET_XSAVE and KVM_SET_XSAVE will take it into account automatically:
>
> - KVM_GET_XSAVE: fpu_copy_guest_fpstate_to_uabi -> __copy_xstate_to_uabi_buf
>
> - KVM_SET_XSAVE: fpu_copy_uabi_to_guest_fpstate ->
>   copy_uabi_from_kernel_to_xstate -> copy_uabi_to_xstate ->
>   validate_user_xstate_buffer

Ok, I understand how this replaces patch 2/2, so no issue on that.

About patch 1/2, you suggest that instead of fixing what we save in
the regs buffer, we fix only what we want to return to the user when
they call KVM_GET_XSAVE, is that correct?

>
> Paolo
>
On 2/7/22 23:45, Leonardo Bras Soares Passos wrote:
> On Mon, Feb 7, 2022 at 6:00 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 2/7/22 21:24, Leonardo Bras Soares Passos wrote:
>>>> With this patch,
>>>> we have to reason about the effect of calling KVM_SET_CPUID2 twice
>>>> back to back. I think an "&=" would be wrong in that case.
>>>
>>> So, you suggest something like this?
>>>
>>> vcpu->arch.guest_fpu.fpstate->xfeatures =
>>>     fpu_user_cfg.default_features & vcpu->arch.guest_supported_xcr0;
>>>
>>
>> Yes, but you need to change user_xfeatures instead of xfeatures.
>> KVM_GET_XSAVE and KVM_SET_XSAVE will take it into account automatically:
>>
>> - KVM_GET_XSAVE: fpu_copy_guest_fpstate_to_uabi -> __copy_xstate_to_uabi_buf
>>
>> - KVM_SET_XSAVE: fpu_copy_uabi_to_guest_fpstate ->
>>   copy_uabi_from_kernel_to_xstate -> copy_uabi_to_xstate ->
>>   validate_user_xstate_buffer
>
>
> Ok, I understand how this replaces patch 2/2, so no issue on that.
>
> About patch 1/2, you suggest that instead of fixing what we save in
> the regs buffer, we fix only what we want to return to the user when
> they call KVM_GET_XSAVE, is that correct?

Yes, exactly.

Paolo
On Mon, Feb 7, 2022 at 7:59 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 2/7/22 23:45, Leonardo Bras Soares Passos wrote:
> > On Mon, Feb 7, 2022 at 6:00 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 2/7/22 21:24, Leonardo Bras Soares Passos wrote:
> >>>> With this patch,
> >>>> we have to reason about the effect of calling KVM_SET_CPUID2 twice
> >>>> back to back. I think an "&=" would be wrong in that case.
> >>>
> >>> So, you suggest something like this?
> >>>
> >>> vcpu->arch.guest_fpu.fpstate->xfeatures =
> >>>     fpu_user_cfg.default_features & vcpu->arch.guest_supported_xcr0;
> >>>
> >>
> >> Yes, but you need to change user_xfeatures instead of xfeatures.
> >> KVM_GET_XSAVE and KVM_SET_XSAVE will take it into account automatically:
> >>
> >> - KVM_GET_XSAVE: fpu_copy_guest_fpstate_to_uabi -> __copy_xstate_to_uabi_buf
> >>
> >> - KVM_SET_XSAVE: fpu_copy_uabi_to_guest_fpstate ->
> >>   copy_uabi_from_kernel_to_xstate -> copy_uabi_to_xstate ->
> >>   validate_user_xstate_buffer
> >
> >
> > Ok, I understand how this replaces patch 2/2, so no issue on that.
> >
> > About patch 1/2, you suggest that instead of fixing what we save in
> > the regs buffer, we fix only what we want to return to the user when
> > they call KVM_GET_XSAVE, is that correct?
>
> Yes, exactly.

Thanks!
I will update my patch and send a v2 shortly.

I got really curious while I was debugging this issue:
- Is it ok that the cpu has other features enabled (like PKRU), while
  our vcpu does not have them?
- Should the guest OS always use cpuid to check which features are
  available?
- Would it be better if we could have exactly the same fpu features
  enabled in the cpu as we have in the vcpu?
- Why do we xsave with a mask different from the one we xrstor with?

>
> Paolo
>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 28be02adc669..8ce481cc0f9b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -296,6 +296,9 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu->arch.guest_supported_xcr0 =
 		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
 
+	/* Mask out features unsupported by guest */
+	vcpu->arch.guest_fpu.fpstate->xfeatures &= vcpu->arch.guest_supported_xcr0;
+
 	kvm_update_pv_runtime(vcpu);
 
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel
swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate().

When the xsave feature is available, the fpu swap is done by:
- xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used
  to store the current state of the fpu registers to a buffer.
- xrstor(s) instruction, with (fpu_kernel_cfg.max_features &
  XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs.

For xsave(s) the mask is used to limit what parts of the fpu regs will
be copied to the buffer. Likewise on xrstor(s), the mask is used to
limit what parts of the fpu regs will be changed.

The mask for xsave(s), the guest's fpstate->xfeatures, is defined in
kvm_arch_vcpu_create(), which (in summary) sets it to all features
supported by the cpu which are enabled in the kernel config.

This means that xsave(s) will save to the guest buffer all the fpu regs
contents the cpu has enabled when the guest is paused, even if they
are not used.

This would not be an issue if xrstor(s) also did that.

xrstor(s)'s mask for host/guest swap is basically every valid feature
contained in the kernel config, except XFEATURE_MASK_PKRU.
According to the kernel source, PKRU is instead switched in switch_to()
and flush_thread().

Then, the following happens when a host supporting PKRU starts a
guest that does not support it:
1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest,
2 - xsave(s) fpu regs to host fpstate (buffer has XFEATURE_MASK_PKRU)
3 - xrstor(s) guest fpstate to fpu regs (fpu regs have XFEATURE_MASK_PKRU)
4 - guest runs, then switch back to host,
5 - xsave(s) fpu regs to guest fpstate (buffer now has XFEATURE_MASK_PKRU)
6 - xrstor(s) host fpstate to fpu regs.

On 5, even though the guest does not support PKRU, it does have the flag
set on guest fpstate, which is transferred to userspace via vcpu ioctl
KVM_GET_XSAVE.

This becomes a problem when the user decides to migrate the above guest
to another machine that does not support PKRU:
The new host restores the guest's fpu regs as they were before (xrstor(s)),
but since the new host doesn't support PKRU, a general-protection exception
occurs in xrstor(s) and that crashes the guest.

This can be solved by making the guest's fpstate->xfeatures only hold
values compatible with guest_supported_xcr0. This way, on 5 the only flags
saved by xsave(s) will be the ones compatible with guest requirements,
and thus there will be no issue during migration.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 arch/x86/kvm/cpuid.c | 3 +++
 1 file changed, 3 insertions(+)
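Steps 1 to 6 of the commit message can be replayed in a small userspace model to see where the PKRU bit leaks into the guest buffer. This is a sketch under stated assumptions, not kernel code: toy_xsave() and toy_xrstor() are hypothetical helpers that track only which components hold state, the host save in step 2 is assumed to use the full host feature set, and the guest is assumed to support only FP and SSE.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative feature bits; the positions mirror XCR0 but this is only a model. */
#define TOY_FP   (1ULL << 0)
#define TOY_SSE  (1ULL << 1)
#define TOY_PKRU (1ULL << 9)

struct toy_cpu { uint64_t live; };   /* components currently holding state */
struct toy_buf { uint64_t header; }; /* components recorded in a save buffer */

/* Save only the components selected by mask; other header bits are kept. */
static void toy_xsave(const struct toy_cpu *cpu, struct toy_buf *buf, uint64_t mask)
{
	buf->header = (buf->header & ~mask) | (cpu->live & mask);
}

/* Restore only the components selected by mask; the rest stay untouched. */
static void toy_xrstor(struct toy_cpu *cpu, const struct toy_buf *buf, uint64_t mask)
{
	cpu->live = (cpu->live & ~mask) | (buf->header & mask);
}

/* Runs steps 1-6 and returns the guest buffer header seen afterwards. */
static uint64_t toy_round_trip(uint64_t guest_save_mask)
{
	const uint64_t host_all = TOY_FP | TOY_SSE | TOY_PKRU;
	const uint64_t restore_mask = host_all & ~TOY_PKRU; /* PKRU excluded */
	struct toy_cpu cpu = { .live = host_all };           /* step 1 */
	struct toy_buf host = { 0 };
	struct toy_buf guest = { .header = TOY_FP | TOY_SSE };

	toy_xsave(&cpu, &host, host_all);         /* step 2 */
	toy_xrstor(&cpu, &guest, restore_mask);   /* step 3: host PKRU stays live */
	/* step 4: the guest runs and never touches PKRU */
	toy_xsave(&cpu, &guest, guest_save_mask); /* step 5 */
	toy_xrstor(&cpu, &host, restore_mask);    /* step 6 */
	return guest.header;
}

static void toy_round_trip_demo(void)
{
	/* Unfiltered save mask: PKRU leaks into the guest buffer. */
	assert((toy_round_trip(TOY_FP | TOY_SSE | TOY_PKRU) & TOY_PKRU) != 0);
	/* Save mask limited to what the guest supports: no PKRU bit. */
	assert(toy_round_trip(TOY_FP | TOY_SSE) == (TOY_FP | TOY_SSE));
}
```

The model shows the asymmetry the commit message describes: because the restore mask skips PKRU while the save mask includes it, host state for that component ends up recorded in the guest buffer unless the save mask is limited to what the guest supports.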