Message ID | 20210903102039.55422-3-eesposit@redhat.com (mailing list archive)
---|---
State | New, archived
Series | KVM: nSVM: avoid TOC/TOU race when checking vmcb12
On Fri, 2021-09-03 at 12:20 +0200, Emanuele Giuseppe Esposito wrote:
> This is useful in the next patch, to avoid having temporary
> copies of vmcb12 registers and passing them around manually.

This is NOT what I had in mind, but I do like this idea very much;
IMHO this is much better than what I had in mind!

The only thing that I would change is that I wouldn't reuse
'struct vmcb_save_area' for the copy, as this both wastes space
(minor issue) and introduces a chance of someone later using
non-copied fields from it (which can cause a bug later on).

I would just define a new struct for that (but keep the same
field names for readability).

Maybe something like 'struct vmcb_save_area_cached'?

>
> Right now, instead of blindly copying everything,
> we just copy EFER, CR0, CR3, CR4, DR6 and DR7. If more
> fields need to be added, it will be more obvious that they
> must also be added in copy_vmcb_save_area, otherwise the
> checks will fail.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c | 24 ++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c    |  1 +
>  arch/x86/kvm/svm/svm.h    |  3 +++
>  3 files changed, 28 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index d2fe65e2a7a4..2491c77203c7 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -194,6 +194,22 @@ static void copy_vmcb_control_area(struct vmcb_control_area *dst,
>  	dst->pause_filter_thresh = from->pause_filter_thresh;
>  }
>
> +static void copy_vmcb_save_area(struct vmcb_save_area *dst,
> +				struct vmcb_save_area *from)
> +{
> +	/*
> +	 * Copy only necessary fields, as we need them
> +	 * to avoid TOC/TOU races.
> +	 */
> +	dst->efer = from->efer;
> +	dst->cr0 = from->cr0;
> +	dst->cr3 = from->cr3;
> +	dst->cr4 = from->cr4;
> +
> +	dst->dr6 = from->dr6;
> +	dst->dr7 = from->dr7;
> +}
> +
>  static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
>  {
>  	/*
> @@ -313,6 +329,12 @@ void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
>  	svm->nested.ctl.iopm_base_pa &= ~0x0fffULL;
>  }
>
> +void nested_load_save_from_vmcb12(struct vcpu_svm *svm,
> +				  struct vmcb_save_area *save)
> +{
> +	copy_vmcb_save_area(&svm->nested.save, save);
> +}
> +
>  /*
>   * Synchronize fields that are written by the processor, so that
>   * they can be copied back into the vmcb12.
> @@ -647,6 +669,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
>  		return -EINVAL;
>
>  	nested_load_control_from_vmcb12(svm, &vmcb12->control);
> +	nested_load_save_from_vmcb12(svm, &vmcb12->save);
>
>  	if (!nested_vmcb_valid_sregs(vcpu, &vmcb12->save) ||
>  	    !nested_vmcb_check_controls(vcpu, &svm->nested.ctl)) {
> @@ -1385,6 +1408,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
>
>  	svm_copy_vmrun_state(&svm->vmcb01.ptr->save, save);
>  	nested_load_control_from_vmcb12(svm, ctl);
> +	nested_load_save_from_vmcb12(svm, save);
>
>  	svm_switch_vmcb(svm, &svm->nested.vmcb02);
>  	nested_vmcb02_prepare_control(svm);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 69639f9624f5..169b930322ef 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4386,6 +4386,7 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
>  	vmcb12 = map.hva;
>
>  	nested_load_control_from_vmcb12(svm, &vmcb12->control);
> +	nested_load_save_from_vmcb12(svm, &vmcb12->save);
>
>  	ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12);
>  	kvm_vcpu_unmap(vcpu, &map, true);
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index bd0fe94c2920..6d12814cf64c 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -119,6 +119,7 @@ struct svm_nested_state {
>
>  	/* cache for control fields of the guest */
>  	struct vmcb_control_area ctl;
> +	struct vmcb_save_area save;
>
>  	bool initialized;
>  };
> @@ -484,6 +485,8 @@ int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
>  int nested_svm_exit_special(struct vcpu_svm *svm);
>  void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
>  				     struct vmcb_control_area *control);
> +void nested_load_save_from_vmcb12(struct vcpu_svm *svm,
> +				  struct vmcb_save_area *save);
>  void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
>  void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
>  void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);

So besides the struct comment:

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
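[Editor's note: for concreteness, here is a minimal sketch of the dedicated cache type Maxim suggests. The struct name, field order and helper are hypothetical; the field set simply mirrors what copy_vmcb_save_area copies in the patch above.]

/*
 * Hypothetical sketch of the suggestion above: a dedicated cache type
 * that holds only the vmcb12 save-area fields KVM actually copies, so
 * a later reader cannot accidentally consume a field that was never
 * copied.  Field names deliberately match 'struct vmcb_save_area'.
 */
struct vmcb_save_area_cached {
	u64 efer;
	u64 cr0;
	u64 cr3;
	u64 cr4;
	u64 dr6;
	u64 dr7;
};

static void copy_vmcb_save_area_cached(struct vmcb_save_area_cached *dst,
				       struct vmcb_save_area *from)
{
	dst->efer = from->efer;
	dst->cr0  = from->cr0;
	dst->cr3  = from->cr3;
	dst->cr4  = from->cr4;
	dst->dr6  = from->dr6;
	dst->dr7  = from->dr7;
}

With a dedicated type, svm->nested.save stops being a 'struct vmcb_save_area', so the compiler itself rejects any attempt to read a field that was never copied.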
On 12/09/21 12:39, Maxim Levitsky wrote:
> On Fri, 2021-09-03 at 12:20 +0200, Emanuele Giuseppe Esposito wrote:
>> This is useful in the next patch, to avoid having temporary
>> copies of vmcb12 registers and passing them around manually.
>
> This is NOT what I had in mind, but I do like this idea very much;
> IMHO this is much better than what I had in mind!
>
> The only thing that I would change is that I wouldn't reuse
> 'struct vmcb_save_area' for the copy, as this both wastes space
> (minor issue) and introduces a chance of someone later using
> non-copied fields from it (which can cause a bug later on).
>
> I would just define a new struct for that (but keep the same
> field names for readability).
>
> Maybe something like 'struct vmcb_save_area_cached'?

I agree, I like this too. However, it needs a comment that this new
struct is not kept up-to-date, and is only valid until
enter_svm_guest_mode.

I might even propose a

	#ifdef CONFIG_DEBUG_KERNEL
		memset(&svm->nested.save, 0xaf, sizeof(svm->nested.save));
	#endif

but there are no uses of CONFIG_DEBUG_KERNEL in all of Linux, so it's
probably not the way one should use that symbol. Can anybody think of
a similar alternative? Or should the memset simply be unconditional?

Paolo
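[Editor's note: a sketch of the poisoning Paolo floats. The helper name and the suggestion of where to call it are hypothetical; whether to gate the memset, and on which config symbol, is exactly the open question in his mail.]

/*
 * Hypothetical sketch: poison the cached save area once its consumers
 * are done with it, so any later read of a stale or never-copied
 * field produces an obviously bogus 0xaf pattern instead of silently
 * "working".  Shown unconditional here; gating it on a debug config
 * option is the alternative under discussion.
 */
static void nested_svm_invalidate_cached_save(struct vcpu_svm *svm)
{
	memset(&svm->nested.save, 0xaf, sizeof(svm->nested.save));
}

Presumably this would be called at the end of enter_svm_guest_mode(), after the last consistency check has consumed the cache; the exact call site is left open in the thread.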
On Tue, Sep 28, 2021, Paolo Bonzini wrote:
> On 12/09/21 12:39, Maxim Levitsky wrote:
> > On Fri, 2021-09-03 at 12:20 +0200, Emanuele Giuseppe Esposito wrote:
> > > This is useful in the next patch, to avoid having temporary
> > > copies of vmcb12 registers and passing them around manually.
> >
> > This is NOT what I had in mind, but I do like this idea very much;
> > IMHO this is much better than what I had in mind!
> >
> > The only thing that I would change is that I wouldn't reuse
> > 'struct vmcb_save_area' for the copy, as this both wastes space
> > (minor issue) and introduces a chance of someone later using
> > non-copied fields from it (which can cause a bug later on).
> >
> > I would just define a new struct for that (but keep the same
> > field names for readability).
> >
> > Maybe something like 'struct vmcb_save_area_cached'?
>
> I agree, I like this too. However, it needs a comment that this new
> struct is not kept up-to-date, and is only valid until
> enter_svm_guest_mode.
>
> I might even propose a
>
> 	#ifdef CONFIG_DEBUG_KERNEL
> 		memset(&svm->nested.save, 0xaf, sizeof(svm->nested.save));
> 	#endif
>
> but there are no uses of CONFIG_DEBUG_KERNEL in all of Linux, so it's
> probably not the way one should use that symbol. Can anybody think of
> a similar alternative? Or should the memset simply be unconditional?

I still think this doesn't go far enough to prevent TOCTOU bugs, and
in general KVM lacks a coherent design/approach in this area. Case in
point, the next patch fails to handle at least one, probably more,
TOCTOU bugs. CR3 is checked using KVM's copy (svm->nested.save):

	/*
	 * These checks are also performed by KVM_SET_SREGS,
	 * except that EFER.LMA is not checked by SVM against
	 * CR0.PG && EFER.LME.
	 */
	if ((save->efer & EFER_LME) && (save->cr0 & X86_CR0_PG)) {
		if (CC(!(save->cr4 & X86_CR4_PAE)) ||
		    CC(!(save->cr0 & X86_CR0_PE)) ||
		    CC(kvm_vcpu_is_illegal_gpa(vcpu, save->cr3)))
			return false;
	}

but KVM prepares vmcb02 and the MMU using the CR3 value directly from
vmcb12:

	nested_vmcb02_prepare_control(svm);
	nested_vmcb02_prepare_save(svm, vmcb12);

	ret = nested_svm_load_cr3(&svm->vcpu, vmcb12->save.cr3,
				  nested_npt_enabled(svm), true);

I assume there is similar badness in nested_vmcb02_prepare_save()'s
usage of vmcb12, but even if there isn't, IMO that it's even a
question/possibility means KVM is broken. I.e. KVM should fully unmap
L1's vmcb12 before doing _anything_: (1) map, (2) copy, (3) unmap,
(4) check, (5) consume.

Yes, there are performance gains to be had (or lost), but we need a
fully correct/functional baseline before we start worrying about
performance. E.g. the copy at step (2) can be optimized to copy only
data that is not marked clean, but first we should have a version of
KVM that has no optimizations and just copies the entire vmcb (or at
least the chunks KVM consumes).

On a related topic, this would be a good opportunity to resolve the
naming discrepancies between VMX and SVM. VMX generally refers to
vmcs12 as KVM's copy of L1's VMCS, whereas SVM generally refers to
vmcb12 as the "direct" mapping of L1's VMCB. I'd prefer to go with
VMX's terminology, i.e. rework nSVM to refer to the copy as vmcb12,
but I'm more than a bit biased since I've spent so much time in nVMX,
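[Editor's note: to make the map/copy/unmap/check/consume ordering Sean describes concrete, here is a rough sketch of a VMRUN path restructured along those lines. It reuses function names from the patch and the surrounding code, but the function name, structure and error handling are illustrative only, not the actual series.]

/*
 * Illustrative sketch only: the VMRUN path reworked into a strict
 * (1) map, (2) copy, (3) unmap, (4) check, (5) consume order.  Once
 * kvm_vcpu_unmap() runs, L1 can no longer flip vmcb12 fields between
 * the check and the use, because every later step reads KVM's own
 * cached copy.
 */
static int nested_svm_vmrun_sketch(struct kvm_vcpu *vcpu, u64 vmcb12_gpa)
{
	struct vcpu_svm *svm = to_svm(vcpu);
	struct kvm_host_map map;
	struct vmcb *vmcb12;

	if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map))	/* (1) map */
		return 1;
	vmcb12 = map.hva;

	nested_load_control_from_vmcb12(svm, &vmcb12->control);	/* (2) copy */
	nested_load_save_from_vmcb12(svm, &vmcb12->save);

	kvm_vcpu_unmap(vcpu, &map, true);			/* (3) unmap */

	if (!nested_vmcb_valid_sregs(vcpu, &svm->nested.save) ||	/* (4) check */
	    !nested_vmcb_check_controls(vcpu, &svm->nested.ctl))
		return -EINVAL;

	/* (5) consume: CR3 comes from the cache, never from vmcb12 */
	return nested_svm_load_cr3(vcpu, svm->nested.save.cr3,
				   nested_npt_enabled(svm), true);
}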
On 29/09/21 00:23, Sean Christopherson wrote:
> On a related topic, this would be a good opportunity to resolve the
> naming discrepancies between VMX and SVM. VMX generally refers to
> vmcs12 as KVM's copy of L1's VMCS, whereas SVM generally refers to
> vmcb12 as the "direct" mapping of L1's VMCB. I'd prefer to go with
> VMX's terminology, i.e. rework nSVM to refer to the copy as vmcb12,
> but I'm more than a bit biased since I've spent so much time in nVMX,

I agree, and I think Emanuele's patches are a step in the right
direction. Once we ensure that all state in svm->nested is cached
vmcb12 content, we can get rid of vmcb12 pointers in the functions.

Paolo
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d2fe65e2a7a4..2491c77203c7 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -194,6 +194,22 @@ static void copy_vmcb_control_area(struct vmcb_control_area *dst,
 	dst->pause_filter_thresh = from->pause_filter_thresh;
 }
 
+static void copy_vmcb_save_area(struct vmcb_save_area *dst,
+				struct vmcb_save_area *from)
+{
+	/*
+	 * Copy only necessary fields, as we need them
+	 * to avoid TOC/TOU races.
+	 */
+	dst->efer = from->efer;
+	dst->cr0 = from->cr0;
+	dst->cr3 = from->cr3;
+	dst->cr4 = from->cr4;
+
+	dst->dr6 = from->dr6;
+	dst->dr7 = from->dr7;
+}
+
 static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 {
 	/*
@@ -313,6 +329,12 @@ void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
 	svm->nested.ctl.iopm_base_pa &= ~0x0fffULL;
 }
 
+void nested_load_save_from_vmcb12(struct vcpu_svm *svm,
+				  struct vmcb_save_area *save)
+{
+	copy_vmcb_save_area(&svm->nested.save, save);
+}
+
 /*
  * Synchronize fields that are written by the processor, so that
  * they can be copied back into the vmcb12.
@@ -647,6 +669,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 		return -EINVAL;
 
 	nested_load_control_from_vmcb12(svm, &vmcb12->control);
+	nested_load_save_from_vmcb12(svm, &vmcb12->save);
 
 	if (!nested_vmcb_valid_sregs(vcpu, &vmcb12->save) ||
 	    !nested_vmcb_check_controls(vcpu, &svm->nested.ctl)) {
@@ -1385,6 +1408,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 
 	svm_copy_vmrun_state(&svm->vmcb01.ptr->save, save);
 	nested_load_control_from_vmcb12(svm, ctl);
+	nested_load_save_from_vmcb12(svm, save);
 
 	svm_switch_vmcb(svm, &svm->nested.vmcb02);
 	nested_vmcb02_prepare_control(svm);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 69639f9624f5..169b930322ef 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4386,6 +4386,7 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 	vmcb12 = map.hva;
 
 	nested_load_control_from_vmcb12(svm, &vmcb12->control);
+	nested_load_save_from_vmcb12(svm, &vmcb12->save);
 
 	ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12);
 	kvm_vcpu_unmap(vcpu, &map, true);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bd0fe94c2920..6d12814cf64c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -119,6 +119,7 @@ struct svm_nested_state {
 
 	/* cache for control fields of the guest */
 	struct vmcb_control_area ctl;
+	struct vmcb_save_area save;
 
 	bool initialized;
 };
@@ -484,6 +485,8 @@ int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
 int nested_svm_exit_special(struct vcpu_svm *svm);
 void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
 				     struct vmcb_control_area *control);
+void nested_load_save_from_vmcb12(struct vcpu_svm *svm,
+				  struct vmcb_save_area *save);
 void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
 void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
 void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
This is useful in the next patch, to avoid having temporary
copies of vmcb12 registers and passing them around manually.

Right now, instead of blindly copying everything,
we just copy EFER, CR0, CR3, CR4, DR6 and DR7. If more
fields need to be added, it will be more obvious that they
must also be added in copy_vmcb_save_area, otherwise the
checks will fail.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 24 ++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c    |  1 +
 arch/x86/kvm/svm/svm.h    |  3 +++
 3 files changed, 28 insertions(+)