Message ID | 20210401141814.1029036-2-mlevitsk@redhat.com (mailing list archive)
---|---
State | New, archived
Series | Introduce KVM_{GET|SET}_SREGS2 and fix PDPTR migration
On 01/04/21 16:18, Maxim Levitsky wrote:
> Similar to the rest of guest page accesses after migration,
> this should be delayed to KVM_REQ_GET_NESTED_STATE_PAGES
> request.
>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index fd334e4aa6db..b44f1f6b68db 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2564,11 +2564,6 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  		return -EINVAL;
>  	}
>
> -	/* Shadow page tables on either EPT or shadow page tables. */
> -	if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
> -				entry_failure_code))
> -		return -EINVAL;
> -
>  	/*
>  	 * Immediately write vmcs02.GUEST_CR3. It will be propagated to vmcs12
>  	 * on nested VM-Exit, which can occur without actually running L2 and
> @@ -3109,11 +3104,16 @@ static bool nested_get_evmcs_page(struct kvm_vcpu *vcpu)
>  static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
>  {
>  	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> +	enum vm_entry_failure_code entry_failure_code;
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	struct kvm_host_map *map;
>  	struct page *page;
>  	u64 hpa;
>
> +	if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
> +				&entry_failure_code))
> +		return false;
> +
>  	if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
>  		/*
>  		 * Translate L1 physical address to host physical
> @@ -3357,6 +3357,10 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
>  	}
>
>  	if (from_vmentry) {
> +		if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3,
> +			nested_cpu_has_ept(vmcs12), &entry_failure_code))
> +			goto vmentry_fail_vmexit_guest_mode;
> +
>  		failed_index = nested_vmx_load_msr(vcpu,
> 						   vmcs12->vm_entry_msr_load_addr,
> 						   vmcs12->vm_entry_msr_load_count);

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
On Thu, Apr 01, 2021, Maxim Levitsky wrote:
> Similar to the rest of guest page accesses after migration,
> this should be delayed to KVM_REQ_GET_NESTED_STATE_PAGES
> request.

FWIW, I still object to this approach, and this patch has a plethora of issues.

I'm not against deferring various state loading to KVM_RUN, but wholesale
moving all of GUEST_CR3 processing without in-depth consideration of all the
side effects is a really bad idea.

> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index fd334e4aa6db..b44f1f6b68db 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2564,11 +2564,6 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  		return -EINVAL;
>  	}
>
> -	/* Shadow page tables on either EPT or shadow page tables. */
> -	if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
> -				entry_failure_code))
> -		return -EINVAL;
> -
>  	/*
>  	 * Immediately write vmcs02.GUEST_CR3. It will be propagated to vmcs12
>  	 * on nested VM-Exit, which can occur without actually running L2 and
> @@ -3109,11 +3104,16 @@ static bool nested_get_evmcs_page(struct kvm_vcpu *vcpu)
>  static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
>  {
>  	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> +	enum vm_entry_failure_code entry_failure_code;
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	struct kvm_host_map *map;
>  	struct page *page;
>  	u64 hpa;
>
> +	if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
> +				&entry_failure_code))

This results in KVM_RUN returning 0 without filling vcpu->run->exit_reason.
Speaking from experience, debugging those types of issues is beyond painful.

It also means CR3 is double loaded in the from_vmentry case.

And it will cause KVM to incorrectly return NVMX_VMENTRY_KVM_INTERNAL_ERROR
if a consistency check fails when nested_get_vmcs12_pages() is called on
from_vmentry.  E.g. run unit tests with this and it will silently disappear.

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index bbb006a..b8ccc69 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -8172,6 +8172,16 @@ static void test_guest_segment_base_addr_fields(void)
 	vmcs_write(GUEST_AR_ES, ar_saved);
 }
 
+static void test_guest_cr3(void)
+{
+	u64 cr3_saved = vmcs_read(GUEST_CR3);
+
+	vmcs_write(GUEST_CR3, -1ull);
+	test_guest_state("Bad CR3 fails VM-Enter", true, -1ull, "GUEST_CR3");
+
+	vmcs_write(GUEST_CR3, cr3_saved);
+}
+
 /*
  * Check that the virtual CPU checks the VMX Guest State Area as
  * documented in the Intel SDM.
@@ -8181,6 +8191,8 @@ static void vmx_guest_state_area_test(void)
 	vmx_set_test_stage(1);
 	test_set_guest(guest_state_test_main);
 
+	test_guest_cr3();
+
 	/*
 	 * The IA32_SYSENTER_ESP field and the IA32_SYSENTER_EIP field
 	 * must each contain a canonical address.

> +		return false;
> +
>  	if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
>  		/*
>  		 * Translate L1 physical address to host physical
> @@ -3357,6 +3357,10 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
>  	}
>
>  	if (from_vmentry) {
> +		if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3,
> +			nested_cpu_has_ept(vmcs12), &entry_failure_code))

This alignment is messed up; it looks like two separate function calls.

> +			goto vmentry_fail_vmexit_guest_mode;
> +
>  		failed_index = nested_vmx_load_msr(vcpu,
> 						   vmcs12->vm_entry_msr_load_addr,
> 						   vmcs12->vm_entry_msr_load_count);
> --
> 2.26.2
>
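For context on the exit_reason point: a false return from
nested_get_vmcs12_pages() is consumed by the KVM_REQ_GET_NESTED_STATE_PAGES
handling in vcpu_enter_guest(). A simplified sketch of that call site,
paraphrased from the kernel of this era (not the verbatim code):

```c
/*
 * Simplified sketch of the KVM_REQ_GET_NESTED_STATE_PAGES handling in
 * vcpu_enter_guest(), paraphrased from the ~5.12-era request processing.
 */
if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
	if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
		/*
		 * A false return lands here: KVM_RUN bails to userspace
		 * with r = 0, and nothing on this path fills in
		 * vcpu->run->exit_reason; this is the silent failure mode
		 * described above for the new nested_vmx_load_cr3() call.
		 */
		r = 0;
		goto out;
	}
}
```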
On Fri, 2021-04-02 at 17:27 +0000, Sean Christopherson wrote:
> On Thu, Apr 01, 2021, Maxim Levitsky wrote:
> > Similar to the rest of guest page accesses after migration,
> > this should be delayed to KVM_REQ_GET_NESTED_STATE_PAGES
> > request.
>
> FWIW, I still object to this approach, and this patch has a plethora of issues.
>
> I'm not against deferring various state loading to KVM_RUN, but wholesale
> moving all of GUEST_CR3 processing without in-depth consideration of all the
> side effects is a really bad idea.

It could be, I won't argue about this.

> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/vmx/nested.c | 14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index fd334e4aa6db..b44f1f6b68db 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -2564,11 +2564,6 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> >  		return -EINVAL;
> >  	}
> >
> > -	/* Shadow page tables on either EPT or shadow page tables. */
> > -	if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
> > -				entry_failure_code))
> > -		return -EINVAL;
> > -
> >  	/*
> >  	 * Immediately write vmcs02.GUEST_CR3. It will be propagated to vmcs12
> >  	 * on nested VM-Exit, which can occur without actually running L2 and
> > @@ -3109,11 +3104,16 @@ static bool nested_get_evmcs_page(struct kvm_vcpu *vcpu)
> >  static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> > +	enum vm_entry_failure_code entry_failure_code;
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >  	struct kvm_host_map *map;
> >  	struct page *page;
> >  	u64 hpa;
> >
> > +	if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
> > +				&entry_failure_code))
>
> This results in KVM_RUN returning 0 without filling vcpu->run->exit_reason.
> Speaking from experience, debugging those types of issues is beyond painful.
>
> It also means CR3 is double loaded in the from_vmentry case.
>
> And it will cause KVM to incorrectly return NVMX_VMENTRY_KVM_INTERNAL_ERROR
> if a consistency check fails when nested_get_vmcs12_pages() is called on
> from_vmentry.  E.g. run unit tests with this and it will silently disappear.

I do remember now that you said something about this, but I wasn't able to
find it in my email. Sorry about this. I agree with you.

I think that a question I should ask is why we really need to delay accessing
guest memory after a migration. So far I mostly just assumed that we need to
do so, thinking that qemu updates the memslots or something, or maybe because
guest memory isn't fully migrated and relies on post-copy to finish it.

Also I am not against leaving CR3 processing in here and doing only the PDPTR
load in KVM_RUN (and only when the *SREGS2 API is not used).

> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index bbb006a..b8ccc69 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -8172,6 +8172,16 @@ static void test_guest_segment_base_addr_fields(void)
>  	vmcs_write(GUEST_AR_ES, ar_saved);
>  }
>  
> +static void test_guest_cr3(void)
> +{
> +	u64 cr3_saved = vmcs_read(GUEST_CR3);
> +
> +	vmcs_write(GUEST_CR3, -1ull);
> +	test_guest_state("Bad CR3 fails VM-Enter", true, -1ull, "GUEST_CR3");
> +
> +	vmcs_write(GUEST_CR3, cr3_saved);
> +}
> +

Could you send this test to kvm unit tests?

>  /*
>   * Check that the virtual CPU checks the VMX Guest State Area as
>   * documented in the Intel SDM.
> @@ -8181,6 +8191,8 @@ static void vmx_guest_state_area_test(void)
>  	vmx_set_test_stage(1);
>  	test_set_guest(guest_state_test_main);
>  
> +	test_guest_cr3();
> +
>  	/*
>  	 * The IA32_SYSENTER_ESP field and the IA32_SYSENTER_EIP field
>  	 * must each contain a canonical address.
>
> > +		return false;
> > +
> >  	if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
> >  		/*
> >  		 * Translate L1 physical address to host physical
> > @@ -3357,6 +3357,10 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
> >  	}
> >
> >  	if (from_vmentry) {
> > +		if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3,
> > +			nested_cpu_has_ept(vmcs12), &entry_failure_code))
>
> This alignment is messed up; it looks like two separate function calls.

Sorry about this, I see it now.

> > +			goto vmentry_fail_vmexit_guest_mode;
> > +
> >  		failed_index = nested_vmx_load_msr(vcpu,
> > 						   vmcs12->vm_entry_msr_load_addr,
> > 						   vmcs12->vm_entry_msr_load_count);
> > --
> > 2.26.2
> >

Best regards,
	Maxim Levitsky
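A rough sketch of the narrower alternative Maxim floats above, deferring only
the PDPTR read while CR3 validation stays on the VM-entry path, might look
like the following. This is hypothetical, not part of the posted patch:
`pdptrs_from_userspace` is an assumed flag standing in for "the proposed
SREGS2 API already supplied the PDPTRs", and `load_pdptrs()` is the existing
helper, which returns nonzero on success:

```c
/*
 * Hypothetical sketch: defer only the PDPTR load to
 * KVM_REQ_GET_NESTED_STATE_PAGES, leaving CR3 processing at VM-entry.
 * pdptrs_from_userspace is an assumed flag, not existing code.
 */
static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
{
	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

	/*
	 * PDPTRs only exist for PAE paging without EPT, and only need to
	 * be re-read from guest memory if userspace did not provide them.
	 */
	if (!nested_cpu_has_ept(vmcs12) && is_pae_paging(vcpu) &&
	    !vcpu->arch.pdptrs_from_userspace &&
	    !load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu)))
		return false;

	/* ... the existing page-mapping logic would continue here ... */
	return true;
}
```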
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index fd334e4aa6db..b44f1f6b68db 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2564,11 +2564,6 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		return -EINVAL;
 	}
 
-	/* Shadow page tables on either EPT or shadow page tables. */
-	if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
-				entry_failure_code))
-		return -EINVAL;
-
 	/*
 	 * Immediately write vmcs02.GUEST_CR3. It will be propagated to vmcs12
 	 * on nested VM-Exit, which can occur without actually running L2 and
@@ -3109,11 +3104,16 @@ static bool nested_get_evmcs_page(struct kvm_vcpu *vcpu)
 static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	enum vm_entry_failure_code entry_failure_code;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	struct kvm_host_map *map;
 	struct page *page;
 	u64 hpa;
 
+	if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
+				&entry_failure_code))
+		return false;
+
 	if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
 		/*
 		 * Translate L1 physical address to host physical
@@ -3357,6 +3357,10 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	}
 
 	if (from_vmentry) {
+		if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3,
+			nested_cpu_has_ept(vmcs12), &entry_failure_code))
+			goto vmentry_fail_vmexit_guest_mode;
+
 		failed_index = nested_vmx_load_msr(vcpu,
 						   vmcs12->vm_entry_msr_load_addr,
 						   vmcs12->vm_entry_msr_load_count);
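As a side note on the final hunk above: this is the call whose alignment Sean
flags in his reply. Kernel style aligns continuation lines with the opening
parenthesis so the condition reads as a single call, distinct from the goto
body, e.g.:

```c
	if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3,
				nested_cpu_has_ept(vmcs12),
				&entry_failure_code))
		goto vmentry_fail_vmexit_guest_mode;
```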
Similar to the rest of guest page accesses after migration,
this should be delayed to the KVM_REQ_GET_NESTED_STATE_PAGES
request.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/vmx/nested.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)
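For readers unfamiliar with the mechanism the commit message refers to:
KVM_REQ_GET_NESTED_STATE_PAGES is raised when userspace restores nested state
after migration and is serviced on the next KVM_RUN, by which point guest
memory is expected to be accessible again (the thread above speculates memslot
updates and post-copy as the reasons for the deferral). A paraphrased sketch,
not the verbatim kernel code:

```c
/*
 * Paraphrased sketch of the tail of vmx_set_nested_state(): rather than
 * touching guest memory immediately, defer all guest-page accesses to the
 * next KVM_RUN, where the request below is consumed by
 * nested_get_vmcs12_pages().
 */
kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
```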