Message ID: 20210105143749.557054-2-michael.roth@amd.com
State: New, archived
Series: KVM: SVM: Refactor vcpu_load/put to use vmload/vmsave for host state
On Tue, Jan 05, 2021, Michael Roth wrote:
> @@ -3703,16 +3688,9 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  	if (sev_es_guest(svm->vcpu.kvm)) {
>  		__svm_sev_es_vcpu_run(svm->vmcb_pa);
>  	} else {
> -		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
> -
> -#ifdef CONFIG_X86_64
> -		native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
> -#else
> -		loadsegment(fs, svm->host.fs);
> -#ifndef CONFIG_X86_32_LAZY_GS
> -		loadsegment(gs, svm->host.gs);
> -#endif
> -#endif
> +		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs,
> +			       page_to_phys(per_cpu(svm_data,
> +						    vcpu->cpu)->save_area));

Does this need to use __sme_page_pa()?

> 	}
>
> 	/*

...

> diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
> index 6feb8c08f45a..89f4e8e7bf0e 100644
> --- a/arch/x86/kvm/svm/vmenter.S
> +++ b/arch/x86/kvm/svm/vmenter.S
> @@ -33,6 +33,7 @@
>   * __svm_vcpu_run - Run a vCPU via a transition to SVM guest mode
>   * @vmcb_pa:	unsigned long
>   * @regs:	unsigned long * (to guest registers)
> + * @hostsa_pa:	unsigned long
>   */
> SYM_FUNC_START(__svm_vcpu_run)
> 	push %_ASM_BP
> @@ -47,6 +48,9 @@ SYM_FUNC_START(__svm_vcpu_run)
> #endif
> 	push %_ASM_BX
>
> +	/* Save @hostsa_pa */
> +	push %_ASM_ARG3
> +
> 	/* Save @regs. */
> 	push %_ASM_ARG2
>
> @@ -154,6 +158,12 @@ SYM_FUNC_START(__svm_vcpu_run)
> 	xor %r15d, %r15d
> #endif
>
> +	/* "POP" @hostsa_pa to RAX. */
> +	pop %_ASM_AX
> +
> +	/* Restore host user state and FS/GS base */
> +	vmload %_ASM_AX

This VMLOAD needs the "handle fault on reboot" goo.  Seeing the code, I think
I'd prefer to handle this in C code, especially if Paolo takes the svm_ops.h
patch[*].  Actually, I think with that patch it'd make sense to move the
existing VMSAVE+VMLOAD for the guest into svm.c, too.  And completely unrelated,
the fault handling in svm/vmenter.S can be cleaned up a smidge to eliminate the
JMPs.

Paolo, what do you think about me folding these patches into my series to do the
above cleanups?  And maybe sending a pull request for the end result?  (I'd also
like to add on a patch to use the user return MSR mechanism for MSR_TSC_AUX).

[*] https://lkml.kernel.org/r/20201231002702.2223707-8-seanjc@google.com

> +
> 	pop %_ASM_BX
>
> #ifdef CONFIG_X86_64
> --
> 2.25.1
>
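To make the "handle this in C" suggestion concrete, the following is a minimal sketch (not the svm_ops.h patch referenced above; the helper names are hypothetical) of issuing the host-state VMLOAD from svm.c, reusing the same __ex() fault-fixup wrapper that this series already uses for VMSAVE in svm_vcpu_load():

/*
 * Hypothetical helpers: VMSAVE/VMLOAD wrapped in C so the __ex()
 * "handle fault on reboot" fixup applies, instead of issuing VMLOAD
 * from vmenter.S.  The physical address is passed in rAX, which is
 * the implicit operand of both instructions.
 */
static inline void kvm_svm_vmsave(unsigned long pa)
{
	asm volatile(__ex("vmsave %%"_ASM_AX) : : "a" (pa) : "memory");
}

static inline void kvm_svm_vmload(unsigned long pa)
{
	asm volatile(__ex("vmload %%"_ASM_AX) : : "a" (pa) : "memory");
}

/* In svm_vcpu_enter_exit(), after __svm_vcpu_run() returns: */
	kvm_svm_vmload(page_to_phys(per_cpu(svm_data, vcpu->cpu)->save_area));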
On 1/5/21 11:20 AM, Sean Christopherson wrote:
> On Tue, Jan 05, 2021, Michael Roth wrote:
>> @@ -3703,16 +3688,9 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>>  	if (sev_es_guest(svm->vcpu.kvm)) {
>>  		__svm_sev_es_vcpu_run(svm->vmcb_pa);
>>  	} else {
>> -		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
>> -
>> -#ifdef CONFIG_X86_64
>> -		native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
>> -#else
>> -		loadsegment(fs, svm->host.fs);
>> -#ifndef CONFIG_X86_32_LAZY_GS
>> -		loadsegment(gs, svm->host.gs);
>> -#endif
>> -#endif
>> +		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs,
>> +			       page_to_phys(per_cpu(svm_data,
>> +						    vcpu->cpu)->save_area));
>
> Does this need to use __sme_page_pa()?

Yes, it should now. The SEV-ES support added the SME encryption bit to the
MSR_VM_HSAVE_PA MSR, so we need to be consistent in how the data is read
and written.

Thanks,
Tom
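A sketch of the call-site change being discussed (illustrative only, not the committed fix), using the existing __sme_page_pa() helper that Sean refers to so the save-area address carries the SME encryption bit, consistent with how MSR_VM_HSAVE_PA is written:

		/*
		 * Sketch: pass the per-CPU save area with the SME C-bit set,
		 * via __sme_page_pa() instead of bare page_to_phys().
		 */
		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs,
			       __sme_page_pa(per_cpu(svm_data, vcpu->cpu)->save_area));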
On 1/5/21 8:37 AM, Michael Roth wrote:
> Using a guest workload which simply issues 'hlt' in a tight loop to
> generate VMEXITs, it was observed (on a recent EPYC processor) that a
> significant amount of the VMEXIT overhead measured on the host was the
> result of MSR reads/writes in svm_vcpu_load/svm_vcpu_put according to
> perf:

...

> @@ -3675,7 +3660,7 @@ static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
>  	return EXIT_FASTPATH_NONE;
>  }
>
> -void __svm_vcpu_run(unsigned long vmcb_pa, unsigned long *regs);
> +void __svm_vcpu_run(unsigned long vmcb_pa, unsigned long *regs, unsigned long hostsa_pa);

There was a follow on fix patch to remove this forward declaration since,
for SEV-ES, I had moved it into svm.h without deleting it here. I'm not
sure when it will hit Paolo's tree.

Thanks,
Tom
On 1/7/21 9:32 AM, Tom Lendacky wrote:
> On 1/5/21 11:20 AM, Sean Christopherson wrote:
>> On Tue, Jan 05, 2021, Michael Roth wrote:
>>> @@ -3703,16 +3688,9 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,

...

>>> +		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs,
>>> +			       page_to_phys(per_cpu(svm_data,
>>> +						    vcpu->cpu)->save_area));
>>
>> Does this need to use __sme_page_pa()?
>
> Yes, it should now. The SEV-ES support added the SME encryption bit to the
> MSR_VM_HSAVE_PA MSR, so we need to be consistent in how the data is read
> and written.

Oh, and also in svm_vcpu_load().

Thanks,
Tom
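For completeness, a corresponding sketch of the svm_vcpu_load() side (again illustrative, not the committed fix), so VMSAVE writes through the same SME-adjusted address that VMLOAD later consumes on the VMEXIT path:

		/*
		 * Sketch: VMSAVE host state to the per-CPU save area using the
		 * SME-adjusted physical address, matching the VMLOAD side.
		 */
		asm volatile(__ex("vmsave %%"_ASM_AX)
			     : : "a" (__sme_page_pa(sd->save_area)) : "memory");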
On Tue, Jan 05, 2021 at 09:20:03AM -0800, Sean Christopherson wrote:
> On Tue, Jan 05, 2021, Michael Roth wrote:
> > @@ -3703,16 +3688,9 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> >  	if (sev_es_guest(svm->vcpu.kvm)) {
> >  		__svm_sev_es_vcpu_run(svm->vmcb_pa);
> >  	} else {
> > -		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
> > -
> > -#ifdef CONFIG_X86_64
> > -		native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
> > -#else
> > -		loadsegment(fs, svm->host.fs);
> > -#ifndef CONFIG_X86_32_LAZY_GS
> > -		loadsegment(gs, svm->host.gs);
> > -#endif
> > -#endif
> > +		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs,
> > +			       page_to_phys(per_cpu(svm_data,
> > +						    vcpu->cpu)->save_area));
>
> Does this need to use __sme_page_pa()?

Oddly enough the current patch seems to work even with SME enabled. Not
sure why though since as Tom pointed out we do use it elsewhere with the
SME bit set. But should be setting it either way.

>
> > 	}
> >
> > 	/*
>
> ...
>
> > diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
> > index 6feb8c08f45a..89f4e8e7bf0e 100644
> > --- a/arch/x86/kvm/svm/vmenter.S
> > +++ b/arch/x86/kvm/svm/vmenter.S
> > @@ -33,6 +33,7 @@
> >   * __svm_vcpu_run - Run a vCPU via a transition to SVM guest mode
> >   * @vmcb_pa:	unsigned long
> >   * @regs:	unsigned long * (to guest registers)
> > + * @hostsa_pa:	unsigned long
> >   */
> > SYM_FUNC_START(__svm_vcpu_run)
> > 	push %_ASM_BP
> > @@ -47,6 +48,9 @@ SYM_FUNC_START(__svm_vcpu_run)
> > #endif
> > 	push %_ASM_BX
> >
> > +	/* Save @hostsa_pa */
> > +	push %_ASM_ARG3
> > +
> > 	/* Save @regs. */
> > 	push %_ASM_ARG2
> >
> > @@ -154,6 +158,12 @@ SYM_FUNC_START(__svm_vcpu_run)
> > 	xor %r15d, %r15d
> > #endif
> >
> > +	/* "POP" @hostsa_pa to RAX. */
> > +	pop %_ASM_AX
> > +
> > +	/* Restore host user state and FS/GS base */
> > +	vmload %_ASM_AX
>
> This VMLOAD needs the "handle fault on reboot" goo.  Seeing the code, I think

Ah, yes, I overlooked that with the rework.

> I'd prefer to handle this in C code, especially if Paolo takes the svm_ops.h
> patch[*].  Actually, I think with that patch it'd make sense to move the
> existing VMSAVE+VMLOAD for the guest into svm.c, too.  And completely unrelated,
> the fault handling in svm/vmenter.S can be cleaned up a smidge to eliminate the
> JMPs.
>
> Paolo, what do you think about me folding these patches into my series to do the
> above cleanups?  And maybe sending a pull request for the end result?  (I'd also
> like to add on a patch to use the user return MSR mechanism for MSR_TSC_AUX).

No complaints on my end at least :) But happy to send a v4 with SME bit
fix and reboot handling if you think that's worthwhile (and other
suggested changes as well, but not sure exactly what you have in mind
there). Can also help with any testing of course.

Thanks,
Mike

> [*] https://lkml.kernel.org/r/20201231002702.2223707-8-seanjc@google.com
>
> > +
> > 	pop %_ASM_BX
> >
> > #ifdef CONFIG_X86_64
> > --
> > 2.25.1
> >
On 05/01/21 18:20, Sean Christopherson wrote:
> This VMLOAD needs the "handle fault on reboot" goo.  Seeing the code, I think
> I'd prefer to handle this in C code, especially if Paolo takes the svm_ops.h
> patch[*].  Actually, I think with that patch it'd make sense to move the
> existing VMSAVE+VMLOAD for the guest into svm.c, too.  And completely unrelated,
> the fault handling in svm/vmenter.S can be cleaned up a smidge to eliminate the
> JMPs.
>
> Paolo, what do you think about me folding these patches into my series to do the
> above cleanups?  And maybe sending a pull request for the end result?  (I'd also
> like to add on a patch to use the user return MSR mechanism for MSR_TSC_AUX).

I have queued that part already, so Mike can rebase on top of kvm/queue.

Paolo

> [*] https://lkml.kernel.org/r/20201231002702.2223707-8-seanjc@google.com
>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 941e5251e13f..7a7e9b7d47a7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1420,16 +1420,12 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (sev_es_guest(svm->vcpu.kvm)) {
 		sev_es_vcpu_load(svm, cpu);
 	} else {
-#ifdef CONFIG_X86_64
-		rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base);
-#endif
-		savesegment(fs, svm->host.fs);
-		savesegment(gs, svm->host.gs);
-		svm->host.ldt = kvm_read_ldt();
-
 		for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
 			rdmsrl(host_save_user_msrs[i].index,
 			       svm->host_user_msrs[i]);
+
+		asm volatile(__ex("vmsave %%"_ASM_AX)
+			     : : "a" (page_to_phys(sd->save_area)) : "memory");
 	}
 
 	if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
@@ -1461,17 +1457,6 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 	if (sev_es_guest(svm->vcpu.kvm)) {
 		sev_es_vcpu_put(svm);
 	} else {
-		kvm_load_ldt(svm->host.ldt);
-#ifdef CONFIG_X86_64
-		loadsegment(fs, svm->host.fs);
-		wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
-		load_gs_index(svm->host.gs);
-#else
-#ifdef CONFIG_X86_32_LAZY_GS
-		loadsegment(gs, svm->host.gs);
-#endif
-#endif
-
 		for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
 			wrmsrl(host_save_user_msrs[i].index,
 			       svm->host_user_msrs[i]);
@@ -3675,7 +3660,7 @@ static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 	return EXIT_FASTPATH_NONE;
 }
 
-void __svm_vcpu_run(unsigned long vmcb_pa, unsigned long *regs);
+void __svm_vcpu_run(unsigned long vmcb_pa, unsigned long *regs, unsigned long hostsa_pa);
 
 static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 					struct vcpu_svm *svm)
@@ -3703,16 +3688,9 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	if (sev_es_guest(svm->vcpu.kvm)) {
 		__svm_sev_es_vcpu_run(svm->vmcb_pa);
 	} else {
-		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
-
-#ifdef CONFIG_X86_64
-		native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
-#else
-		loadsegment(fs, svm->host.fs);
-#ifndef CONFIG_X86_32_LAZY_GS
-		loadsegment(gs, svm->host.gs);
-#endif
-#endif
+		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs,
+			       page_to_phys(per_cpu(svm_data,
+						    vcpu->cpu)->save_area));
 	}
 
 	/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5431e6335e2e..1f4460508036 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -27,17 +27,6 @@ static const struct svm_host_save_msrs {
 	u32 index;		/* Index of the MSR */
 	bool sev_es_restored;	/* True if MSR is restored on SEV-ES VMEXIT */
 } host_save_user_msrs[] = {
-#ifdef CONFIG_X86_64
-	{ .index = MSR_STAR,			.sev_es_restored = true },
-	{ .index = MSR_LSTAR,			.sev_es_restored = true },
-	{ .index = MSR_CSTAR,			.sev_es_restored = true },
-	{ .index = MSR_SYSCALL_MASK,		.sev_es_restored = true },
-	{ .index = MSR_KERNEL_GS_BASE,		.sev_es_restored = true },
-	{ .index = MSR_FS_BASE,			.sev_es_restored = true },
-#endif
-	{ .index = MSR_IA32_SYSENTER_CS,	.sev_es_restored = true },
-	{ .index = MSR_IA32_SYSENTER_ESP,	.sev_es_restored = true },
-	{ .index = MSR_IA32_SYSENTER_EIP,	.sev_es_restored = true },
 	{ .index = MSR_TSC_AUX,			.sev_es_restored = false },
 };
 #define NR_HOST_SAVE_USER_MSRS ARRAY_SIZE(host_save_user_msrs)
@@ -130,12 +119,6 @@ struct vcpu_svm {
 	u64 next_rip;
 
 	u64 host_user_msrs[NR_HOST_SAVE_USER_MSRS];
-	struct {
-		u16 fs;
-		u16 gs;
-		u16 ldt;
-		u64 gs_base;
-	} host;
 
 	u64 spec_ctrl;
 	/*
@@ -595,6 +578,6 @@ void sev_es_vcpu_put(struct vcpu_svm *svm);
 /* vmenter.S */
 
 void __svm_sev_es_vcpu_run(unsigned long vmcb_pa);
-void __svm_vcpu_run(unsigned long vmcb_pa, unsigned long *regs);
+void __svm_vcpu_run(unsigned long vmcb_pa, unsigned long *regs, unsigned long hostsa_pa);
 
 #endif
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 6feb8c08f45a..89f4e8e7bf0e 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -33,6 +33,7 @@
  * __svm_vcpu_run - Run a vCPU via a transition to SVM guest mode
  * @vmcb_pa:	unsigned long
  * @regs:	unsigned long * (to guest registers)
+ * @hostsa_pa:	unsigned long
  */
 SYM_FUNC_START(__svm_vcpu_run)
 	push %_ASM_BP
@@ -47,6 +48,9 @@ SYM_FUNC_START(__svm_vcpu_run)
 #endif
 	push %_ASM_BX
 
+	/* Save @hostsa_pa */
+	push %_ASM_ARG3
+
 	/* Save @regs. */
 	push %_ASM_ARG2
 
@@ -154,6 +158,12 @@ SYM_FUNC_START(__svm_vcpu_run)
 	xor %r15d, %r15d
 #endif
 
+	/* "POP" @hostsa_pa to RAX. */
+	pop %_ASM_AX
+
+	/* Restore host user state and FS/GS base */
+	vmload %_ASM_AX
+
 	pop %_ASM_BX
 
 #ifdef CONFIG_X86_64
Using a guest workload which simply issues 'hlt' in a tight loop to
generate VMEXITs, it was observed (on a recent EPYC processor) that a
significant amount of the VMEXIT overhead measured on the host was the
result of MSR reads/writes in svm_vcpu_load/svm_vcpu_put according to
perf:

  67.49%--kvm_arch_vcpu_ioctl_run
          |
          |--23.13%--vcpu_put
          |          kvm_arch_vcpu_put
          |          |
          |          |--21.31%--native_write_msr
          |          |
          |           --1.27%--svm_set_cr4
          |
          |--16.11%--vcpu_load
          |          |
          |           --15.58%--kvm_arch_vcpu_load
          |                     |
          |                     |--13.97%--svm_set_cr4
          |                     |          |
          |                     |          |--12.64%--native_read_msr

Most of these MSRs relate to 'syscall'/'sysenter' and segment bases, and
can be saved/restored using 'vmsave'/'vmload' instructions rather than
explicit MSR reads/writes. In doing so there is a significant reduction
in the svm_vcpu_load/svm_vcpu_put overhead measured for the above
workload:

  50.92%--kvm_arch_vcpu_ioctl_run
          |
          |--19.28%--disable_nmi_singlestep
          |
          |--13.68%--vcpu_load
          |          kvm_arch_vcpu_load
          |          |
          |          |--9.19%--svm_set_cr4
          |          |         |
          |          |          --6.44%--native_read_msr
          |          |
          |           --3.55%--native_write_msr
          |
          |--6.05%--kvm_inject_nmi
          |--2.80%--kvm_sev_es_mmio_read
          |--2.19%--vcpu_put
          |          |
          |           --1.25%--kvm_arch_vcpu_put
          |                     native_write_msr

Quantifying this further, if we look at the raw cycle counts for a
normal iteration of the above workload (according to 'rdtscp'),
kvm_arch_vcpu_ioctl_run() takes ~4600 cycles from start to finish with
the current behavior. Using 'vmsave'/'vmload', this is reduced to
~2800 cycles, a savings of 39%.

While this approach doesn't seem to manifest in any noticeable
improvement for more realistic workloads like UnixBench, netperf, and
kernel builds, likely due to their exit paths generally involving IO
with comparatively high latencies, it does improve the overall overhead
of KVM_RUN significantly, which may still be noticeable in certain
situations. It also simplifies some aspects of the code.

With this change, explicit save/restore is no longer needed for the
following host MSRs, since they are documented[1] as being part of the
VMCB State Save Area:

  MSR_STAR, MSR_LSTAR, MSR_CSTAR,
  MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE,
  MSR_IA32_SYSENTER_CS,
  MSR_IA32_SYSENTER_ESP,
  MSR_IA32_SYSENTER_EIP,
  MSR_FS_BASE, MSR_GS_BASE

and only the following MSR needs individual handling in
svm_vcpu_put/svm_vcpu_load:

  MSR_TSC_AUX

We could drop the host_save_user_msrs array/loop and instead handle
MSR read/write of MSR_TSC_AUX directly, but we leave that for now as
a potential follow-up.

Since 'vmsave'/'vmload' also handle the LDTR and FS/GS segment
registers (and associated hidden state)[2], some of the code
previously used to handle this is no longer needed, so we drop it
as well.

The first public release of the SVM spec[3] also documents the same
handling for the host state in question, so we make these changes
unconditionally.

Also worth noting is that we 'vmsave' to the same page that is
subsequently used by 'vmrun' to record some additional host state. This
is okay, since, in accordance with the spec[2], the additional state
written to the page by 'vmrun' does not overwrite any fields written by
'vmsave'. This has also been confirmed through testing (for the above
CPU, at least).

[1] AMD64 Architecture Programmer's Manual, Rev 3.33, Volume 2, Appendix B, Table B-2
[2] AMD64 Architecture Programmer's Manual, Rev 3.31, Volume 3, Chapter 4, VMSAVE/VMLOAD
[3] Secure Virtual Machine Architecture Reference Manual, Rev 3.01

Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/svm.c     | 36 +++++++-----------------------------
 arch/x86/kvm/svm/svm.h     | 19 +------------------
 arch/x86/kvm/svm/vmenter.S | 10 ++++++++++
 3 files changed, 18 insertions(+), 47 deletions(-)
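The ~4600 vs. ~2800 cycle figures quoted in the commit message are RDTSCP-based; a hypothetical measurement helper along those lines (not part of this patch, names are illustrative) might look like:

/* Read the TSC with RDTSCP; EDX:EAX holds the counter, ECX the TSC_AUX value. */
static inline u64 rdtscp_cycles(void)
{
	u32 lo, hi, aux;

	asm volatile("rdtscp" : "=a" (lo), "=d" (hi), "=c" (aux));
	return ((u64)hi << 32) | lo;
}

	/* Sketch: time one iteration of the run loop. */
	u64 start = rdtscp_cycles();
	ret = kvm_arch_vcpu_ioctl_run(vcpu);
	trace_printk("KVM_RUN took %llu cycles\n", rdtscp_cycles() - start);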