Message ID | 159191213022.31436.11150808867377936241.stgit@bmoger-ubuntu (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | INVPCID support for the AMD guests | expand |
On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com> wrote: > > The following intercept is added for INVPCID instruction: > Code Name Cause > A2h VMEXIT_INVPCID INVPCID instruction > > The following bit is added to the VMCB layout control area > to control intercept of INVPCID: > Byte Offset Bit(s) Function > 14h 2 intercept INVPCID > > For the guests with nested page table (NPT) support, the INVPCID > feature works as running it natively. KVM does not need to do any > special handling in this case. > > Interceptions are required in the following cases. > 1. If the guest tries to disable the feature when the underlying > hardware supports it. In this case hypervisor needs to report #UD. Per the AMD documentation, attempts to use INVPCID at CPL>0 will result in a #GP, regardless of the intercept bit. If the guest CPUID doesn't enumerate the feature, shouldn't the instruction raise #UD regardless of CPL? This seems to imply that we should intercept #GP and decode the instruction to see if we should synthesize #UD instead. > 2. When the guest is running with shadow page table enabled, in > this case the hypervisor needs to handle the tlbflush based on the > type of invpcid instruction type. > > AMD documentation for INVPCID feature is available at "AMD64 > Architecture Programmer’s Manual Volume 2: System Programming, > Pub. 24593 Rev. 3.34(or later)" > > The documentation can be obtained at the links below: > Link: https://www.amd.com/system/files/TechDocs/24593.pdf > Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 > > Signed-off-by: Babu Moger <babu.moger@amd.com> > --- > arch/x86/include/asm/svm.h | 4 ++++ > arch/x86/include/uapi/asm/svm.h | 2 ++ > arch/x86/kvm/svm/svm.c | 42 +++++++++++++++++++++++++++++++++++++++ > 3 files changed, 48 insertions(+) > > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h > index 62649fba8908..6488094f67fa 100644 > --- a/arch/x86/include/asm/svm.h > +++ b/arch/x86/include/asm/svm.h > @@ -55,6 +55,10 @@ enum { > INTERCEPT_RDPRU, > }; > > +/* Extended Intercept bits */ > +enum { > + INTERCEPT_INVPCID = 2, > +}; > > struct __attribute__ ((__packed__)) vmcb_control_area { > u32 intercept_cr; > diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h > index 2e8a30f06c74..522d42dfc28c 100644 > --- a/arch/x86/include/uapi/asm/svm.h > +++ b/arch/x86/include/uapi/asm/svm.h > @@ -76,6 +76,7 @@ > #define SVM_EXIT_MWAIT_COND 0x08c > #define SVM_EXIT_XSETBV 0x08d > #define SVM_EXIT_RDPRU 0x08e > +#define SVM_EXIT_INVPCID 0x0a2 > #define SVM_EXIT_NPF 0x400 > #define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401 > #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402 > @@ -171,6 +172,7 @@ > { SVM_EXIT_MONITOR, "monitor" }, \ > { SVM_EXIT_MWAIT, "mwait" }, \ > { SVM_EXIT_XSETBV, "xsetbv" }, \ > + { SVM_EXIT_INVPCID, "invpcid" }, \ > { SVM_EXIT_NPF, "npf" }, \ > { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \ > { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, "avic_unaccelerated_access" }, \ > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index 285e5e1ff518..82d974338f68 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void) > if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) || > boot_cpu_has(X86_FEATURE_AMD_SSBD)) > kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD); > + > + /* Enable INVPCID if both PCID and INVPCID enabled */ > + if (boot_cpu_has(X86_FEATURE_PCID) && > + boot_cpu_has(X86_FEATURE_INVPCID)) > + kvm_cpu_cap_set(X86_FEATURE_INVPCID); > } > > static __init int svm_hardware_setup(void) > @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm) > clr_intercept(svm, INTERCEPT_PAUSE); > } > > + /* > + * Intercept INVPCID instruction only if shadow page table is > + * enabled. Interception is not required with nested page table. > + */ > + if (boot_cpu_has(X86_FEATURE_INVPCID)) { > + if (!npt_enabled) > + set_extended_intercept(svm, INTERCEPT_INVPCID); > + else > + clr_extended_intercept(svm, INTERCEPT_INVPCID); > + } > + > if (kvm_vcpu_apicv_active(&svm->vcpu)) > avic_init_vmcb(svm); > > @@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm *svm) > return nop_interception(svm); > } > > +static int invpcid_interception(struct vcpu_svm *svm) > +{ > + struct kvm_vcpu *vcpu = &svm->vcpu; > + unsigned long type; > + gva_t gva; > + > + /* > + * For an INVPCID intercept: > + * EXITINFO1 provides the linear address of the memory operand. > + * EXITINFO2 provides the contents of the register operand. > + */ > + type = svm->vmcb->control.exit_info_2; > + gva = svm->vmcb->control.exit_info_1; > + > + return kvm_handle_invpcid_types(vcpu, gva, type); > +} > + > static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { > [SVM_EXIT_READ_CR0] = cr_interception, > [SVM_EXIT_READ_CR3] = cr_interception, > @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { > [SVM_EXIT_MWAIT] = mwait_interception, > [SVM_EXIT_XSETBV] = xsetbv_interception, > [SVM_EXIT_RDPRU] = rdpru_interception, > + [SVM_EXIT_INVPCID] = invpcid_interception, > [SVM_EXIT_NPF] = npf_interception, > [SVM_EXIT_RSM] = rsm_interception, > [SVM_EXIT_AVIC_INCOMPLETE_IPI] = avic_incomplete_ipi_interception, > @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu) > svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) && > guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS); > > + /* > + * Intercept INVPCID instruction if the baremetal has the support > + * but the guest doesn't claim the feature. > + */ > + if (boot_cpu_has(X86_FEATURE_INVPCID) && > + !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) > + set_extended_intercept(svm, INTERCEPT_INVPCID); > + What if INVPCID is enabled in the guest CPUID later? Shouldn't we then clear this intercept bit? > if (!kvm_vcpu_apicv_active(vcpu)) > return; > >
> -----Original Message----- > From: Jim Mattson <jmattson@google.com> > Sent: Thursday, June 11, 2020 6:51 PM > To: Moger, Babu <Babu.Moger@amd.com> > Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel <joro@8bytes.org>; > the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson > <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>; > Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Paolo > Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov <vkuznets@redhat.com>; > Thomas Gleixner <tglx@linutronix.de>; LKML <linux-kernel@vger.kernel.org>; > kvm list <kvm@vger.kernel.org> > Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD > > On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com> wrote: > > > > The following intercept is added for INVPCID instruction: > > Code Name Cause > > A2h VMEXIT_INVPCID INVPCID instruction > > > > The following bit is added to the VMCB layout control area > > to control intercept of INVPCID: > > Byte Offset Bit(s) Function > > 14h 2 intercept INVPCID > > > > For the guests with nested page table (NPT) support, the INVPCID > > feature works as running it natively. KVM does not need to do any > > special handling in this case. > > > > Interceptions are required in the following cases. > > 1. If the guest tries to disable the feature when the underlying > > hardware supports it. In this case hypervisor needs to report #UD. > > Per the AMD documentation, attempts to use INVPCID at CPL>0 will > result in a #GP, regardless of the intercept bit. If the guest CPUID > doesn't enumerate the feature, shouldn't the instruction raise #UD > regardless of CPL? This seems to imply that we should intercept #GP > and decode the instruction to see if we should synthesize #UD instead. Purpose here is to report UD when the guest CPUID doesn't enumerate the INVPCID feature When Bare-metal supports it. It seems to work fine for that purpose. You are right. The #GP for CPL>0 takes precedence over interception. No. I am not planning to intercept GP. I will change the text. How about this? Interceptions are required in the following cases. 1. If the guest CPUID doesn't enumerate the INVPCID feature when the underlying hardware supports it, hypervisor needs to report UD. However, #GP for CPL>0 takes precedence over interception. > > 2. When the guest is running with shadow page table enabled, in > > this case the hypervisor needs to handle the tlbflush based on the > > type of invpcid instruction type. > > > > AMD documentation for INVPCID feature is available at "AMD64 > > Architecture Programmer’s Manual Volume 2: System Programming, > > Pub. 24593 Rev. 3.34(or later)" > > > > The documentation can be obtained at the links below: > > Link: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.a > md.com%2Fsystem%2Ffiles%2FTechDocs%2F24593.pdf&data=02%7C01%7 > Cbabu.moger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8 > 961fe4884e608e11a82d994e183d%7C0%7C0%7C637275163374103811&s > data=E%2Fdb6T%2BdO4nrtUoqhKidF6XyorsWrphj6O4WwNZpmYA%3D&res > erved=0 > > Link: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla. > kernel.org%2Fshow_bug.cgi%3Fid%3D206537&data=02%7C01%7Cbabu.m > oger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8961fe488 > 4e608e11a82d994e183d%7C0%7C0%7C637275163374103811&sdata=b81 > 9W%2FhKS93%2BAp3QvcsR0BwTQpUVUFMbIaNaisgWHRY%3D&reserved= > 0 > > > > Signed-off-by: Babu Moger <babu.moger@amd.com> > > --- > > arch/x86/include/asm/svm.h | 4 ++++ > > arch/x86/include/uapi/asm/svm.h | 2 ++ > > arch/x86/kvm/svm/svm.c | 42 > +++++++++++++++++++++++++++++++++++++++ > > 3 files changed, 48 insertions(+) > > > > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h > > index 62649fba8908..6488094f67fa 100644 > > --- a/arch/x86/include/asm/svm.h > > +++ b/arch/x86/include/asm/svm.h > > @@ -55,6 +55,10 @@ enum { > > INTERCEPT_RDPRU, > > }; > > > > +/* Extended Intercept bits */ > > +enum { > > + INTERCEPT_INVPCID = 2, > > +}; > > > > struct __attribute__ ((__packed__)) vmcb_control_area { > > u32 intercept_cr; > > diff --git a/arch/x86/include/uapi/asm/svm.h > b/arch/x86/include/uapi/asm/svm.h > > index 2e8a30f06c74..522d42dfc28c 100644 > > --- a/arch/x86/include/uapi/asm/svm.h > > +++ b/arch/x86/include/uapi/asm/svm.h > > @@ -76,6 +76,7 @@ > > #define SVM_EXIT_MWAIT_COND 0x08c > > #define SVM_EXIT_XSETBV 0x08d > > #define SVM_EXIT_RDPRU 0x08e > > +#define SVM_EXIT_INVPCID 0x0a2 > > #define SVM_EXIT_NPF 0x400 > > #define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401 > > #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402 > > @@ -171,6 +172,7 @@ > > { SVM_EXIT_MONITOR, "monitor" }, \ > > { SVM_EXIT_MWAIT, "mwait" }, \ > > { SVM_EXIT_XSETBV, "xsetbv" }, \ > > + { SVM_EXIT_INVPCID, "invpcid" }, \ > > { SVM_EXIT_NPF, "npf" }, \ > > { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \ > > { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, > "avic_unaccelerated_access" }, \ > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > > index 285e5e1ff518..82d974338f68 100644 > > --- a/arch/x86/kvm/svm/svm.c > > +++ b/arch/x86/kvm/svm/svm.c > > @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void) > > if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) || > > boot_cpu_has(X86_FEATURE_AMD_SSBD)) > > kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD); > > + > > + /* Enable INVPCID if both PCID and INVPCID enabled */ > > + if (boot_cpu_has(X86_FEATURE_PCID) && > > + boot_cpu_has(X86_FEATURE_INVPCID)) > > + kvm_cpu_cap_set(X86_FEATURE_INVPCID); > > } > > > > static __init int svm_hardware_setup(void) > > @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm) > > clr_intercept(svm, INTERCEPT_PAUSE); > > } > > > > + /* > > + * Intercept INVPCID instruction only if shadow page table is > > + * enabled. Interception is not required with nested page table. > > + */ > > + if (boot_cpu_has(X86_FEATURE_INVPCID)) { > > + if (!npt_enabled) > > + set_extended_intercept(svm, INTERCEPT_INVPCID); > > + else > > + clr_extended_intercept(svm, INTERCEPT_INVPCID); > > + } > > + > > if (kvm_vcpu_apicv_active(&svm->vcpu)) > > avic_init_vmcb(svm); > > > > @@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm > *svm) > > return nop_interception(svm); > > } > > > > +static int invpcid_interception(struct vcpu_svm *svm) > > +{ > > + struct kvm_vcpu *vcpu = &svm->vcpu; > > + unsigned long type; > > + gva_t gva; > > + > > + /* > > + * For an INVPCID intercept: > > + * EXITINFO1 provides the linear address of the memory operand. > > + * EXITINFO2 provides the contents of the register operand. > > + */ > > + type = svm->vmcb->control.exit_info_2; > > + gva = svm->vmcb->control.exit_info_1; > > + > > + return kvm_handle_invpcid_types(vcpu, gva, type); > > +} > > + > > static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { > > [SVM_EXIT_READ_CR0] = cr_interception, > > [SVM_EXIT_READ_CR3] = cr_interception, > > @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct > vcpu_svm *svm) = { > > [SVM_EXIT_MWAIT] = mwait_interception, > > [SVM_EXIT_XSETBV] = xsetbv_interception, > > [SVM_EXIT_RDPRU] = rdpru_interception, > > + [SVM_EXIT_INVPCID] = invpcid_interception, > > [SVM_EXIT_NPF] = npf_interception, > > [SVM_EXIT_RSM] = rsm_interception, > > [SVM_EXIT_AVIC_INCOMPLETE_IPI] = > avic_incomplete_ipi_interception, > > @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu > *vcpu) > > svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) && > > guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS); > > > > + /* > > + * Intercept INVPCID instruction if the baremetal has the support > > + * but the guest doesn't claim the feature. > > + */ > > + if (boot_cpu_has(X86_FEATURE_INVPCID) && > > + !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) > > + set_extended_intercept(svm, INTERCEPT_INVPCID); > > + > > What if INVPCID is enabled in the guest CPUID later? Shouldn't we then > clear this intercept bit? I assume the feature enable comes in the same code path as this. I can add "if else" check here if that is what you are suggesting. > > > if (!kvm_vcpu_apicv_active(vcpu)) > > return; > > > >
On Fri, Jun 12, 2020 at 12:35 PM Babu Moger <babu.moger@amd.com> wrote: > > > > > -----Original Message----- > > From: Jim Mattson <jmattson@google.com> > > Sent: Thursday, June 11, 2020 6:51 PM > > To: Moger, Babu <Babu.Moger@amd.com> > > Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel <joro@8bytes.org>; > > the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson > > <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>; > > Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Paolo > > Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov <vkuznets@redhat.com>; > > Thomas Gleixner <tglx@linutronix.de>; LKML <linux-kernel@vger.kernel.org>; > > kvm list <kvm@vger.kernel.org> > > Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD > > > > On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com> wrote: > > > > > > The following intercept is added for INVPCID instruction: > > > Code Name Cause > > > A2h VMEXIT_INVPCID INVPCID instruction > > > > > > The following bit is added to the VMCB layout control area > > > to control intercept of INVPCID: > > > Byte Offset Bit(s) Function > > > 14h 2 intercept INVPCID > > > > > > For the guests with nested page table (NPT) support, the INVPCID > > > feature works as running it natively. KVM does not need to do any > > > special handling in this case. > > > > > > Interceptions are required in the following cases. > > > 1. If the guest tries to disable the feature when the underlying > > > hardware supports it. In this case hypervisor needs to report #UD. > > > > Per the AMD documentation, attempts to use INVPCID at CPL>0 will > > result in a #GP, regardless of the intercept bit. If the guest CPUID > > doesn't enumerate the feature, shouldn't the instruction raise #UD > > regardless of CPL? This seems to imply that we should intercept #GP > > and decode the instruction to see if we should synthesize #UD instead. > > Purpose here is to report UD when the guest CPUID doesn't enumerate the > INVPCID feature When Bare-metal supports it. It seems to work fine for > that purpose. You are right. The #GP for CPL>0 takes precedence over > interception. No. I am not planning to intercept GP. WIthout intercepting #GP, you fail to achieve your stated purpose. > I will change the text. How about this? > > Interceptions are required in the following cases. > 1. If the guest CPUID doesn't enumerate the INVPCID feature when the > underlying hardware supports it, hypervisor needs to report UD. However, > #GP for CPL>0 takes precedence over interception. This text is not internally consistent. In one sentence, you say that "hypervisor needs to report #UD." In the next sentence, you are essentially saying that the hypervisor doesn't need to report #UD. Which is it? > > > 2. When the guest is running with shadow page table enabled, in > > > this case the hypervisor needs to handle the tlbflush based on the > > > type of invpcid instruction type. > > > > > > AMD documentation for INVPCID feature is available at "AMD64 > > > Architecture Programmer’s Manual Volume 2: System Programming, > > > Pub. 24593 Rev. 3.34(or later)" > > > > > > The documentation can be obtained at the links below: > > > Link: > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.a > > md.com%2Fsystem%2Ffiles%2FTechDocs%2F24593.pdf&data=02%7C01%7 > > Cbabu.moger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8 > > 961fe4884e608e11a82d994e183d%7C0%7C0%7C637275163374103811&s > > data=E%2Fdb6T%2BdO4nrtUoqhKidF6XyorsWrphj6O4WwNZpmYA%3D&res > > erved=0 > > > Link: > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla. > > kernel.org%2Fshow_bug.cgi%3Fid%3D206537&data=02%7C01%7Cbabu.m > > oger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8961fe488 > > 4e608e11a82d994e183d%7C0%7C0%7C637275163374103811&sdata=b81 > > 9W%2FhKS93%2BAp3QvcsR0BwTQpUVUFMbIaNaisgWHRY%3D&reserved= > > 0 > > > > > > Signed-off-by: Babu Moger <babu.moger@amd.com> > > > --- > > > arch/x86/include/asm/svm.h | 4 ++++ > > > arch/x86/include/uapi/asm/svm.h | 2 ++ > > > arch/x86/kvm/svm/svm.c | 42 > > +++++++++++++++++++++++++++++++++++++++ > > > 3 files changed, 48 insertions(+) > > > > > > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h > > > index 62649fba8908..6488094f67fa 100644 > > > --- a/arch/x86/include/asm/svm.h > > > +++ b/arch/x86/include/asm/svm.h > > > @@ -55,6 +55,10 @@ enum { > > > INTERCEPT_RDPRU, > > > }; > > > > > > +/* Extended Intercept bits */ > > > +enum { > > > + INTERCEPT_INVPCID = 2, > > > +}; > > > > > > struct __attribute__ ((__packed__)) vmcb_control_area { > > > u32 intercept_cr; > > > diff --git a/arch/x86/include/uapi/asm/svm.h > > b/arch/x86/include/uapi/asm/svm.h > > > index 2e8a30f06c74..522d42dfc28c 100644 > > > --- a/arch/x86/include/uapi/asm/svm.h > > > +++ b/arch/x86/include/uapi/asm/svm.h > > > @@ -76,6 +76,7 @@ > > > #define SVM_EXIT_MWAIT_COND 0x08c > > > #define SVM_EXIT_XSETBV 0x08d > > > #define SVM_EXIT_RDPRU 0x08e > > > +#define SVM_EXIT_INVPCID 0x0a2 > > > #define SVM_EXIT_NPF 0x400 > > > #define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401 > > > #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402 > > > @@ -171,6 +172,7 @@ > > > { SVM_EXIT_MONITOR, "monitor" }, \ > > > { SVM_EXIT_MWAIT, "mwait" }, \ > > > { SVM_EXIT_XSETBV, "xsetbv" }, \ > > > + { SVM_EXIT_INVPCID, "invpcid" }, \ > > > { SVM_EXIT_NPF, "npf" }, \ > > > { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \ > > > { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, > > "avic_unaccelerated_access" }, \ > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > > > index 285e5e1ff518..82d974338f68 100644 > > > --- a/arch/x86/kvm/svm/svm.c > > > +++ b/arch/x86/kvm/svm/svm.c > > > @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void) > > > if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) || > > > boot_cpu_has(X86_FEATURE_AMD_SSBD)) > > > kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD); > > > + > > > + /* Enable INVPCID if both PCID and INVPCID enabled */ > > > + if (boot_cpu_has(X86_FEATURE_PCID) && > > > + boot_cpu_has(X86_FEATURE_INVPCID)) > > > + kvm_cpu_cap_set(X86_FEATURE_INVPCID); > > > } > > > > > > static __init int svm_hardware_setup(void) > > > @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm) > > > clr_intercept(svm, INTERCEPT_PAUSE); > > > } > > > > > > + /* > > > + * Intercept INVPCID instruction only if shadow page table is > > > + * enabled. Interception is not required with nested page table. > > > + */ > > > + if (boot_cpu_has(X86_FEATURE_INVPCID)) { > > > + if (!npt_enabled) > > > + set_extended_intercept(svm, INTERCEPT_INVPCID); > > > + else > > > + clr_extended_intercept(svm, INTERCEPT_INVPCID); > > > + } > > > + > > > if (kvm_vcpu_apicv_active(&svm->vcpu)) > > > avic_init_vmcb(svm); > > > > > > @@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm > > *svm) > > > return nop_interception(svm); > > > } > > > > > > +static int invpcid_interception(struct vcpu_svm *svm) > > > +{ > > > + struct kvm_vcpu *vcpu = &svm->vcpu; > > > + unsigned long type; > > > + gva_t gva; > > > + > > > + /* > > > + * For an INVPCID intercept: > > > + * EXITINFO1 provides the linear address of the memory operand. > > > + * EXITINFO2 provides the contents of the register operand. > > > + */ > > > + type = svm->vmcb->control.exit_info_2; > > > + gva = svm->vmcb->control.exit_info_1; > > > + > > > + return kvm_handle_invpcid_types(vcpu, gva, type); > > > +} > > > + > > > static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { > > > [SVM_EXIT_READ_CR0] = cr_interception, > > > [SVM_EXIT_READ_CR3] = cr_interception, > > > @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct > > vcpu_svm *svm) = { > > > [SVM_EXIT_MWAIT] = mwait_interception, > > > [SVM_EXIT_XSETBV] = xsetbv_interception, > > > [SVM_EXIT_RDPRU] = rdpru_interception, > > > + [SVM_EXIT_INVPCID] = invpcid_interception, > > > [SVM_EXIT_NPF] = npf_interception, > > > [SVM_EXIT_RSM] = rsm_interception, > > > [SVM_EXIT_AVIC_INCOMPLETE_IPI] = > > avic_incomplete_ipi_interception, > > > @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu > > *vcpu) > > > svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) && > > > guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS); > > > > > > + /* > > > + * Intercept INVPCID instruction if the baremetal has the support > > > + * but the guest doesn't claim the feature. > > > + */ > > > + if (boot_cpu_has(X86_FEATURE_INVPCID) && > > > + !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) > > > + set_extended_intercept(svm, INTERCEPT_INVPCID); > > > + > > > > What if INVPCID is enabled in the guest CPUID later? Shouldn't we then > > clear this intercept bit? > > I assume the feature enable comes in the same code path as this. I can add > "if else" check here if that is what you are suggesting. Yes, that's what I'm suggesting. > > > > > if (!kvm_vcpu_apicv_active(vcpu)) > > > return; > > > > > >
On 6/12/20 3:10 PM, Jim Mattson wrote: > On Fri, Jun 12, 2020 at 12:35 PM Babu Moger <babu.moger@amd.com> wrote: >> >> >> >>> -----Original Message----- >>> From: Jim Mattson <jmattson@google.com> >>> Sent: Thursday, June 11, 2020 6:51 PM >>> To: Moger, Babu <Babu.Moger@amd.com> >>> Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel <joro@8bytes.org>; >>> the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson >>> <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>; >>> Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Paolo >>> Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov <vkuznets@redhat.com>; >>> Thomas Gleixner <tglx@linutronix.de>; LKML <linux-kernel@vger.kernel.org>; >>> kvm list <kvm@vger.kernel.org> >>> Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD >>> >>> On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com> wrote: >>>> >>>> The following intercept is added for INVPCID instruction: >>>> Code Name Cause >>>> A2h VMEXIT_INVPCID INVPCID instruction >>>> >>>> The following bit is added to the VMCB layout control area >>>> to control intercept of INVPCID: >>>> Byte Offset Bit(s) Function >>>> 14h 2 intercept INVPCID >>>> >>>> For the guests with nested page table (NPT) support, the INVPCID >>>> feature works as running it natively. KVM does not need to do any >>>> special handling in this case. >>>> >>>> Interceptions are required in the following cases. >>>> 1. If the guest tries to disable the feature when the underlying >>>> hardware supports it. In this case hypervisor needs to report #UD. >>> >>> Per the AMD documentation, attempts to use INVPCID at CPL>0 will >>> result in a #GP, regardless of the intercept bit. If the guest CPUID >>> doesn't enumerate the feature, shouldn't the instruction raise #UD >>> regardless of CPL? This seems to imply that we should intercept #GP >>> and decode the instruction to see if we should synthesize #UD instead. >> >> Purpose here is to report UD when the guest CPUID doesn't enumerate the >> INVPCID feature When Bare-metal supports it. It seems to work fine for >> that purpose. You are right. The #GP for CPL>0 takes precedence over >> interception. No. I am not planning to intercept GP. > > WIthout intercepting #GP, you fail to achieve your stated purpose. I think I have misunderstood this part. I was not inteding to change the #GP behaviour. I will remove this part. My intension of these series is to handle invpcid in shadow page mode. I have verified that part. Hope I did not miss anything else. > >> I will change the text. How about this? >> >> Interceptions are required in the following cases. >> 1. If the guest CPUID doesn't enumerate the INVPCID feature when the >> underlying hardware supports it, hypervisor needs to report UD. However, >> #GP for CPL>0 takes precedence over interception. > > This text is not internally consistent. In one sentence, you say that > "hypervisor needs to report #UD." In the next sentence, you are > essentially saying that the hypervisor doesn't need to report #UD. > Which is it? > >>>> 2. When the guest is running with shadow page table enabled, in >>>> this case the hypervisor needs to handle the tlbflush based on the >>>> type of invpcid instruction type. >>>> >>>> AMD documentation for INVPCID feature is available at "AMD64 >>>> Architecture Programmer’s Manual Volume 2: System Programming, >>>> Pub. 24593 Rev. 3.34(or later)" >>>> >>>> The documentation can be obtained at the links below: >>>> Link: >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.a >>> md.com%2Fsystem%2Ffiles%2FTechDocs%2F24593.pdf&data=02%7C01%7 >>> Cbabu.moger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8 >>> 961fe4884e608e11a82d994e183d%7C0%7C0%7C637275163374103811&s >>> data=E%2Fdb6T%2BdO4nrtUoqhKidF6XyorsWrphj6O4WwNZpmYA%3D&res >>> erved=0 >>>> Link: >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla. >>> kernel.org%2Fshow_bug.cgi%3Fid%3D206537&data=02%7C01%7Cbabu.m >>> oger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8961fe488 >>> 4e608e11a82d994e183d%7C0%7C0%7C637275163374103811&sdata=b81 >>> 9W%2FhKS93%2BAp3QvcsR0BwTQpUVUFMbIaNaisgWHRY%3D&reserved= >>> 0 >>>> >>>> Signed-off-by: Babu Moger <babu.moger@amd.com> >>>> --- >>>> arch/x86/include/asm/svm.h | 4 ++++ >>>> arch/x86/include/uapi/asm/svm.h | 2 ++ >>>> arch/x86/kvm/svm/svm.c | 42 >>> +++++++++++++++++++++++++++++++++++++++ >>>> 3 files changed, 48 insertions(+) >>>> >>>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h >>>> index 62649fba8908..6488094f67fa 100644 >>>> --- a/arch/x86/include/asm/svm.h >>>> +++ b/arch/x86/include/asm/svm.h >>>> @@ -55,6 +55,10 @@ enum { >>>> INTERCEPT_RDPRU, >>>> }; >>>> >>>> +/* Extended Intercept bits */ >>>> +enum { >>>> + INTERCEPT_INVPCID = 2, >>>> +}; >>>> >>>> struct __attribute__ ((__packed__)) vmcb_control_area { >>>> u32 intercept_cr; >>>> diff --git a/arch/x86/include/uapi/asm/svm.h >>> b/arch/x86/include/uapi/asm/svm.h >>>> index 2e8a30f06c74..522d42dfc28c 100644 >>>> --- a/arch/x86/include/uapi/asm/svm.h >>>> +++ b/arch/x86/include/uapi/asm/svm.h >>>> @@ -76,6 +76,7 @@ >>>> #define SVM_EXIT_MWAIT_COND 0x08c >>>> #define SVM_EXIT_XSETBV 0x08d >>>> #define SVM_EXIT_RDPRU 0x08e >>>> +#define SVM_EXIT_INVPCID 0x0a2 >>>> #define SVM_EXIT_NPF 0x400 >>>> #define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401 >>>> #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402 >>>> @@ -171,6 +172,7 @@ >>>> { SVM_EXIT_MONITOR, "monitor" }, \ >>>> { SVM_EXIT_MWAIT, "mwait" }, \ >>>> { SVM_EXIT_XSETBV, "xsetbv" }, \ >>>> + { SVM_EXIT_INVPCID, "invpcid" }, \ >>>> { SVM_EXIT_NPF, "npf" }, \ >>>> { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \ >>>> { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, >>> "avic_unaccelerated_access" }, \ >>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c >>>> index 285e5e1ff518..82d974338f68 100644 >>>> --- a/arch/x86/kvm/svm/svm.c >>>> +++ b/arch/x86/kvm/svm/svm.c >>>> @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void) >>>> if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) || >>>> boot_cpu_has(X86_FEATURE_AMD_SSBD)) >>>> kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD); >>>> + >>>> + /* Enable INVPCID if both PCID and INVPCID enabled */ >>>> + if (boot_cpu_has(X86_FEATURE_PCID) && >>>> + boot_cpu_has(X86_FEATURE_INVPCID)) >>>> + kvm_cpu_cap_set(X86_FEATURE_INVPCID); >>>> } >>>> >>>> static __init int svm_hardware_setup(void) >>>> @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm) >>>> clr_intercept(svm, INTERCEPT_PAUSE); >>>> } >>>> >>>> + /* >>>> + * Intercept INVPCID instruction only if shadow page table is >>>> + * enabled. Interception is not required with nested page table. >>>> + */ >>>> + if (boot_cpu_has(X86_FEATURE_INVPCID)) { >>>> + if (!npt_enabled) >>>> + set_extended_intercept(svm, INTERCEPT_INVPCID); >>>> + else >>>> + clr_extended_intercept(svm, INTERCEPT_INVPCID); >>>> + } >>>> + >>>> if (kvm_vcpu_apicv_active(&svm->vcpu)) >>>> avic_init_vmcb(svm); >>>> >>>> @@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm >>> *svm) >>>> return nop_interception(svm); >>>> } >>>> >>>> +static int invpcid_interception(struct vcpu_svm *svm) >>>> +{ >>>> + struct kvm_vcpu *vcpu = &svm->vcpu; >>>> + unsigned long type; >>>> + gva_t gva; >>>> + >>>> + /* >>>> + * For an INVPCID intercept: >>>> + * EXITINFO1 provides the linear address of the memory operand. >>>> + * EXITINFO2 provides the contents of the register operand. >>>> + */ >>>> + type = svm->vmcb->control.exit_info_2; >>>> + gva = svm->vmcb->control.exit_info_1; >>>> + >>>> + return kvm_handle_invpcid_types(vcpu, gva, type); >>>> +} >>>> + >>>> static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { >>>> [SVM_EXIT_READ_CR0] = cr_interception, >>>> [SVM_EXIT_READ_CR3] = cr_interception, >>>> @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct >>> vcpu_svm *svm) = { >>>> [SVM_EXIT_MWAIT] = mwait_interception, >>>> [SVM_EXIT_XSETBV] = xsetbv_interception, >>>> [SVM_EXIT_RDPRU] = rdpru_interception, >>>> + [SVM_EXIT_INVPCID] = invpcid_interception, >>>> [SVM_EXIT_NPF] = npf_interception, >>>> [SVM_EXIT_RSM] = rsm_interception, >>>> [SVM_EXIT_AVIC_INCOMPLETE_IPI] = >>> avic_incomplete_ipi_interception, >>>> @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu >>> *vcpu) >>>> svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) && >>>> guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS); >>>> >>>> + /* >>>> + * Intercept INVPCID instruction if the baremetal has the support >>>> + * but the guest doesn't claim the feature. >>>> + */ >>>> + if (boot_cpu_has(X86_FEATURE_INVPCID) && >>>> + !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) >>>> + set_extended_intercept(svm, INTERCEPT_INVPCID); >>>> + >>> >>> What if INVPCID is enabled in the guest CPUID later? Shouldn't we then >>> clear this intercept bit? >> >> I assume the feature enable comes in the same code path as this. I can add >> "if else" check here if that is what you are suggesting. > > Yes, that's what I'm suggesting. > >>> >>>> if (!kvm_vcpu_apicv_active(vcpu)) >>>> return; >>>> >>>>
On Fri, Jun 12, 2020 at 2:47 PM Babu Moger <babu.moger@amd.com> wrote: > > > > On 6/12/20 3:10 PM, Jim Mattson wrote: > > On Fri, Jun 12, 2020 at 12:35 PM Babu Moger <babu.moger@amd.com> wrote: > >> > >> > >> > >>> -----Original Message----- > >>> From: Jim Mattson <jmattson@google.com> > >>> Sent: Thursday, June 11, 2020 6:51 PM > >>> To: Moger, Babu <Babu.Moger@amd.com> > >>> Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel <joro@8bytes.org>; > >>> the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson > >>> <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>; > >>> Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Paolo > >>> Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov <vkuznets@redhat.com>; > >>> Thomas Gleixner <tglx@linutronix.de>; LKML <linux-kernel@vger.kernel.org>; > >>> kvm list <kvm@vger.kernel.org> > >>> Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD > >>> > >>> On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com> wrote: > >>>> > >>>> The following intercept is added for INVPCID instruction: > >>>> Code Name Cause > >>>> A2h VMEXIT_INVPCID INVPCID instruction > >>>> > >>>> The following bit is added to the VMCB layout control area > >>>> to control intercept of INVPCID: > >>>> Byte Offset Bit(s) Function > >>>> 14h 2 intercept INVPCID > >>>> > >>>> For the guests with nested page table (NPT) support, the INVPCID > >>>> feature works as running it natively. KVM does not need to do any > >>>> special handling in this case. > >>>> > >>>> Interceptions are required in the following cases. > >>>> 1. If the guest tries to disable the feature when the underlying > >>>> hardware supports it. In this case hypervisor needs to report #UD. > >>> > >>> Per the AMD documentation, attempts to use INVPCID at CPL>0 will > >>> result in a #GP, regardless of the intercept bit. If the guest CPUID > >>> doesn't enumerate the feature, shouldn't the instruction raise #UD > >>> regardless of CPL? This seems to imply that we should intercept #GP > >>> and decode the instruction to see if we should synthesize #UD instead. > >> > >> Purpose here is to report UD when the guest CPUID doesn't enumerate the > >> INVPCID feature When Bare-metal supports it. It seems to work fine for > >> that purpose. You are right. The #GP for CPL>0 takes precedence over > >> interception. No. I am not planning to intercept GP. > > > > WIthout intercepting #GP, you fail to achieve your stated purpose. > > I think I have misunderstood this part. I was not inteding to change the > #GP behaviour. I will remove this part. My intension of these series is to > handle invpcid in shadow page mode. I have verified that part. Hope I did > not miss anything else. You don't really have to intercept INVPCID when tdp is in use, right? There are certainly plenty of operations for which kvm does not properly raise #UD when they aren't enumerated in the guest CPUID. > >> I will change the text. How about this? > >> > >> Interceptions are required in the following cases. > >> 1. If the guest CPUID doesn't enumerate the INVPCID feature when the > >> underlying hardware supports it, hypervisor needs to report UD. However, > >> #GP for CPL>0 takes precedence over interception. > > > > This text is not internally consistent. In one sentence, you say that > > "hypervisor needs to report #UD." In the next sentence, you are > > essentially saying that the hypervisor doesn't need to report #UD. > > Which is it? > > > >>>> 2. When the guest is running with shadow page table enabled, in > >>>> this case the hypervisor needs to handle the tlbflush based on the > >>>> type of invpcid instruction type. > >>>> > >>>> AMD documentation for INVPCID feature is available at "AMD64 > >>>> Architecture Programmer’s Manual Volume 2: System Programming, > >>>> Pub. 24593 Rev. 3.34(or later)" > >>>> > >>>> The documentation can be obtained at the links below: > >>>> Link: > >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.a > >>> md.com%2Fsystem%2Ffiles%2FTechDocs%2F24593.pdf&data=02%7C01%7 > >>> Cbabu.moger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8 > >>> 961fe4884e608e11a82d994e183d%7C0%7C0%7C637275163374103811&s > >>> data=E%2Fdb6T%2BdO4nrtUoqhKidF6XyorsWrphj6O4WwNZpmYA%3D&res > >>> erved=0 > >>>> Link: > >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla. > >>> kernel.org%2Fshow_bug.cgi%3Fid%3D206537&data=02%7C01%7Cbabu.m > >>> oger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8961fe488 > >>> 4e608e11a82d994e183d%7C0%7C0%7C637275163374103811&sdata=b81 > >>> 9W%2FhKS93%2BAp3QvcsR0BwTQpUVUFMbIaNaisgWHRY%3D&reserved= > >>> 0 > >>>> > >>>> Signed-off-by: Babu Moger <babu.moger@amd.com> > >>>> --- > >>>> arch/x86/include/asm/svm.h | 4 ++++ > >>>> arch/x86/include/uapi/asm/svm.h | 2 ++ > >>>> arch/x86/kvm/svm/svm.c | 42 > >>> +++++++++++++++++++++++++++++++++++++++ > >>>> 3 files changed, 48 insertions(+) > >>>> > >>>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h > >>>> index 62649fba8908..6488094f67fa 100644 > >>>> --- a/arch/x86/include/asm/svm.h > >>>> +++ b/arch/x86/include/asm/svm.h > >>>> @@ -55,6 +55,10 @@ enum { > >>>> INTERCEPT_RDPRU, > >>>> }; > >>>> > >>>> +/* Extended Intercept bits */ > >>>> +enum { > >>>> + INTERCEPT_INVPCID = 2, > >>>> +}; > >>>> > >>>> struct __attribute__ ((__packed__)) vmcb_control_area { > >>>> u32 intercept_cr; > >>>> diff --git a/arch/x86/include/uapi/asm/svm.h > >>> b/arch/x86/include/uapi/asm/svm.h > >>>> index 2e8a30f06c74..522d42dfc28c 100644 > >>>> --- a/arch/x86/include/uapi/asm/svm.h > >>>> +++ b/arch/x86/include/uapi/asm/svm.h > >>>> @@ -76,6 +76,7 @@ > >>>> #define SVM_EXIT_MWAIT_COND 0x08c > >>>> #define SVM_EXIT_XSETBV 0x08d > >>>> #define SVM_EXIT_RDPRU 0x08e > >>>> +#define SVM_EXIT_INVPCID 0x0a2 > >>>> #define SVM_EXIT_NPF 0x400 > >>>> #define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401 > >>>> #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402 > >>>> @@ -171,6 +172,7 @@ > >>>> { SVM_EXIT_MONITOR, "monitor" }, \ > >>>> { SVM_EXIT_MWAIT, "mwait" }, \ > >>>> { SVM_EXIT_XSETBV, "xsetbv" }, \ > >>>> + { SVM_EXIT_INVPCID, "invpcid" }, \ > >>>> { SVM_EXIT_NPF, "npf" }, \ > >>>> { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \ > >>>> { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, > >>> "avic_unaccelerated_access" }, \ > >>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > >>>> index 285e5e1ff518..82d974338f68 100644 > >>>> --- a/arch/x86/kvm/svm/svm.c > >>>> +++ b/arch/x86/kvm/svm/svm.c > >>>> @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void) > >>>> if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) || > >>>> boot_cpu_has(X86_FEATURE_AMD_SSBD)) > >>>> kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD); > >>>> + > >>>> + /* Enable INVPCID if both PCID and INVPCID enabled */ > >>>> + if (boot_cpu_has(X86_FEATURE_PCID) && > >>>> + boot_cpu_has(X86_FEATURE_INVPCID)) > >>>> + kvm_cpu_cap_set(X86_FEATURE_INVPCID); > >>>> } > >>>> > >>>> static __init int svm_hardware_setup(void) > >>>> @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm) > >>>> clr_intercept(svm, INTERCEPT_PAUSE); > >>>> } > >>>> > >>>> + /* > >>>> + * Intercept INVPCID instruction only if shadow page table is > >>>> + * enabled. Interception is not required with nested page table. > >>>> + */ > >>>> + if (boot_cpu_has(X86_FEATURE_INVPCID)) { > >>>> + if (!npt_enabled) > >>>> + set_extended_intercept(svm, INTERCEPT_INVPCID); > >>>> + else > >>>> + clr_extended_intercept(svm, INTERCEPT_INVPCID); > >>>> + } > >>>> + > >>>> if (kvm_vcpu_apicv_active(&svm->vcpu)) > >>>> avic_init_vmcb(svm); > >>>> > >>>> @@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm > >>> *svm) > >>>> return nop_interception(svm); > >>>> } > >>>> > >>>> +static int invpcid_interception(struct vcpu_svm *svm) > >>>> +{ > >>>> + struct kvm_vcpu *vcpu = &svm->vcpu; > >>>> + unsigned long type; > >>>> + gva_t gva; > >>>> + > >>>> + /* > >>>> + * For an INVPCID intercept: > >>>> + * EXITINFO1 provides the linear address of the memory operand. > >>>> + * EXITINFO2 provides the contents of the register operand. > >>>> + */ > >>>> + type = svm->vmcb->control.exit_info_2; > >>>> + gva = svm->vmcb->control.exit_info_1; > >>>> + > >>>> + return kvm_handle_invpcid_types(vcpu, gva, type); > >>>> +} > >>>> + > >>>> static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { > >>>> [SVM_EXIT_READ_CR0] = cr_interception, > >>>> [SVM_EXIT_READ_CR3] = cr_interception, > >>>> @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct > >>> vcpu_svm *svm) = { > >>>> [SVM_EXIT_MWAIT] = mwait_interception, > >>>> [SVM_EXIT_XSETBV] = xsetbv_interception, > >>>> [SVM_EXIT_RDPRU] = rdpru_interception, > >>>> + [SVM_EXIT_INVPCID] = invpcid_interception, > >>>> [SVM_EXIT_NPF] = npf_interception, > >>>> [SVM_EXIT_RSM] = rsm_interception, > >>>> [SVM_EXIT_AVIC_INCOMPLETE_IPI] = > >>> avic_incomplete_ipi_interception, > >>>> @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu > >>> *vcpu) > >>>> svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) && > >>>> guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS); > >>>> > >>>> + /* > >>>> + * Intercept INVPCID instruction if the baremetal has the support > >>>> + * but the guest doesn't claim the feature. > >>>> + */ > >>>> + if (boot_cpu_has(X86_FEATURE_INVPCID) && > >>>> + !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) > >>>> + set_extended_intercept(svm, INTERCEPT_INVPCID); > >>>> + > >>> > >>> What if INVPCID is enabled in the guest CPUID later? Shouldn't we then > >>> clear this intercept bit? > >> > >> I assume the feature enable comes in the same code path as this. I can add > >> "if else" check here if that is what you are suggesting. > > > > Yes, that's what I'm suggesting. > > > >>> > >>>> if (!kvm_vcpu_apicv_active(vcpu)) > >>>> return; > >>>> > >>>>
On 13/06/20 02:04, Jim Mattson wrote: >> I think I have misunderstood this part. I was not inteding to change the >> #GP behaviour. I will remove this part. My intension of these series is to >> handle invpcid in shadow page mode. I have verified that part. Hope I did >> not miss anything else. > You don't really have to intercept INVPCID when tdp is in use, right? > There are certainly plenty of operations for which kvm does not > properly raise #UD when they aren't enumerated in the guest CPUID. > Indeed; for RDRAND and RDSEED it makes sense to do so because the instructions may introduce undesirable nondeterminism (or block all the packages in your core as they have been doing for a few weeks). Therefore on Intel we trap them and raise #UD; on AMD this is not possible because RDRAND has no intercept. In general however we do not try to hard to raise #UD and that is usually not even possible. Paolo
> -----Original Message----- > From: Jim Mattson <jmattson@google.com> > Sent: Friday, June 12, 2020 7:04 PM > To: Moger, Babu <Babu.Moger@amd.com> > Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel <joro@8bytes.org>; > the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson > <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>; > Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Paolo > Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov <vkuznets@redhat.com>; > Thomas Gleixner <tglx@linutronix.de>; LKML <linux-kernel@vger.kernel.org>; > kvm list <kvm@vger.kernel.org> > Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD > > On Fri, Jun 12, 2020 at 2:47 PM Babu Moger <babu.moger@amd.com> wrote: > > > > > > > > On 6/12/20 3:10 PM, Jim Mattson wrote: > > > On Fri, Jun 12, 2020 at 12:35 PM Babu Moger <babu.moger@amd.com> > wrote: > > >> > > >> > > >> > > >>> -----Original Message----- > > >>> From: Jim Mattson <jmattson@google.com> > > >>> Sent: Thursday, June 11, 2020 6:51 PM > > >>> To: Moger, Babu <Babu.Moger@amd.com> > > >>> Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel > <joro@8bytes.org>; > > >>> the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson > > >>> <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>; > > >>> Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Paolo > > >>> Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov > <vkuznets@redhat.com>; > > >>> Thomas Gleixner <tglx@linutronix.de>; LKML <linux- > kernel@vger.kernel.org>; > > >>> kvm list <kvm@vger.kernel.org> > > >>> Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD > > >>> > > >>> On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com> > wrote: > > >>>> > > >>>> The following intercept is added for INVPCID instruction: > > >>>> Code Name Cause > > >>>> A2h VMEXIT_INVPCID INVPCID instruction > > >>>> > > >>>> The following bit is added to the VMCB layout control area > > >>>> to control intercept of INVPCID: > > >>>> Byte Offset Bit(s) Function > > >>>> 14h 2 intercept INVPCID > > >>>> > > >>>> For the guests with nested page table (NPT) support, the INVPCID > > >>>> feature works as running it natively. KVM does not need to do any > > >>>> special handling in this case. > > >>>> > > >>>> Interceptions are required in the following cases. > > >>>> 1. If the guest tries to disable the feature when the underlying > > >>>> hardware supports it. In this case hypervisor needs to report #UD. > > >>> > > >>> Per the AMD documentation, attempts to use INVPCID at CPL>0 will > > >>> result in a #GP, regardless of the intercept bit. If the guest CPUID > > >>> doesn't enumerate the feature, shouldn't the instruction raise #UD > > >>> regardless of CPL? This seems to imply that we should intercept #GP > > >>> and decode the instruction to see if we should synthesize #UD instead. > > >> > > >> Purpose here is to report UD when the guest CPUID doesn't enumerate the > > >> INVPCID feature When Bare-metal supports it. It seems to work fine for > > >> that purpose. You are right. The #GP for CPL>0 takes precedence over > > >> interception. No. I am not planning to intercept GP. > > > > > > WIthout intercepting #GP, you fail to achieve your stated purpose. > > > > I think I have misunderstood this part. I was not inteding to change the > > #GP behaviour. I will remove this part. My intension of these series is to > > handle invpcid in shadow page mode. I have verified that part. Hope I did > > not miss anything else. > > You don't really have to intercept INVPCID when tdp is in use, right? That is correct. Adding the intercept only when tdp is off. > There are certainly plenty of operations for which kvm does not > properly raise #UD when they aren't enumerated in the guest CPUID. > > > >> I will change the text. How about this? > > >> > > >> Interceptions are required in the following cases. > > >> 1. If the guest CPUID doesn't enumerate the INVPCID feature when the > > >> underlying hardware supports it, hypervisor needs to report UD. However, > > >> #GP for CPL>0 takes precedence over interception. > > > > > > This text is not internally consistent. In one sentence, you say that > > > "hypervisor needs to report #UD." In the next sentence, you are > > > essentially saying that the hypervisor doesn't need to report #UD. > > > Which is it? > > > > > >>>> 2. When the guest is running with shadow page table enabled, in > > >>>> this case the hypervisor needs to handle the tlbflush based on the > > >>>> type of invpcid instruction type. > > >>>> > > >>>> AMD documentation for INVPCID feature is available at "AMD64 > > >>>> Architecture Programmer’s Manual Volume 2: System Programming, > > >>>> Pub. 24593 Rev. 3.34(or later)" > > >>>> > > >>>> The documentation can be obtained at the links below: > > >>>> Link: > > >>> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.a > > >>> > md.com%2Fsystem%2Ffiles%2FTechDocs%2F24593.pdf&data=02%7C01%7 > > >>> > Cbabu.moger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8 > > >>> > 961fe4884e608e11a82d994e183d%7C0%7C0%7C637275163374103811&s > > >>> > data=E%2Fdb6T%2BdO4nrtUoqhKidF6XyorsWrphj6O4WwNZpmYA%3D&res > > >>> erved=0 > > >>>> Link: > > >>> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla. > > >>> > kernel.org%2Fshow_bug.cgi%3Fid%3D206537&data=02%7C01%7Cbabu.m > > >>> > oger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8961fe488 > > >>> > 4e608e11a82d994e183d%7C0%7C0%7C637275163374103811&sdata=b81 > > >>> > 9W%2FhKS93%2BAp3QvcsR0BwTQpUVUFMbIaNaisgWHRY%3D&reserved= > > >>> 0 > > >>>> > > >>>> Signed-off-by: Babu Moger <babu.moger@amd.com> > > >>>> --- > > >>>> arch/x86/include/asm/svm.h | 4 ++++ > > >>>> arch/x86/include/uapi/asm/svm.h | 2 ++ > > >>>> arch/x86/kvm/svm/svm.c | 42 > > >>> +++++++++++++++++++++++++++++++++++++++ > > >>>> 3 files changed, 48 insertions(+) > > >>>> > > >>>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h > > >>>> index 62649fba8908..6488094f67fa 100644 > > >>>> --- a/arch/x86/include/asm/svm.h > > >>>> +++ b/arch/x86/include/asm/svm.h > > >>>> @@ -55,6 +55,10 @@ enum { > > >>>> INTERCEPT_RDPRU, > > >>>> }; > > >>>> > > >>>> +/* Extended Intercept bits */ > > >>>> +enum { > > >>>> + INTERCEPT_INVPCID = 2, > > >>>> +}; > > >>>> > > >>>> struct __attribute__ ((__packed__)) vmcb_control_area { > > >>>> u32 intercept_cr; > > >>>> diff --git a/arch/x86/include/uapi/asm/svm.h > > >>> b/arch/x86/include/uapi/asm/svm.h > > >>>> index 2e8a30f06c74..522d42dfc28c 100644 > > >>>> --- a/arch/x86/include/uapi/asm/svm.h > > >>>> +++ b/arch/x86/include/uapi/asm/svm.h > > >>>> @@ -76,6 +76,7 @@ > > >>>> #define SVM_EXIT_MWAIT_COND 0x08c > > >>>> #define SVM_EXIT_XSETBV 0x08d > > >>>> #define SVM_EXIT_RDPRU 0x08e > > >>>> +#define SVM_EXIT_INVPCID 0x0a2 > > >>>> #define SVM_EXIT_NPF 0x400 > > >>>> #define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401 > > >>>> #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402 > > >>>> @@ -171,6 +172,7 @@ > > >>>> { SVM_EXIT_MONITOR, "monitor" }, \ > > >>>> { SVM_EXIT_MWAIT, "mwait" }, \ > > >>>> { SVM_EXIT_XSETBV, "xsetbv" }, \ > > >>>> + { SVM_EXIT_INVPCID, "invpcid" }, \ > > >>>> { SVM_EXIT_NPF, "npf" }, \ > > >>>> { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \ > > >>>> { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, > > >>> "avic_unaccelerated_access" }, \ > > >>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > > >>>> index 285e5e1ff518..82d974338f68 100644 > > >>>> --- a/arch/x86/kvm/svm/svm.c > > >>>> +++ b/arch/x86/kvm/svm/svm.c > > >>>> @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void) > > >>>> if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) || > > >>>> boot_cpu_has(X86_FEATURE_AMD_SSBD)) > > >>>> kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD); > > >>>> + > > >>>> + /* Enable INVPCID if both PCID and INVPCID enabled */ > > >>>> + if (boot_cpu_has(X86_FEATURE_PCID) && > > >>>> + boot_cpu_has(X86_FEATURE_INVPCID)) > > >>>> + kvm_cpu_cap_set(X86_FEATURE_INVPCID); > > >>>> } > > >>>> > > >>>> static __init int svm_hardware_setup(void) > > >>>> @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm) > > >>>> clr_intercept(svm, INTERCEPT_PAUSE); > > >>>> } > > >>>> > > >>>> + /* > > >>>> + * Intercept INVPCID instruction only if shadow page table is > > >>>> + * enabled. Interception is not required with nested page table. > > >>>> + */ > > >>>> + if (boot_cpu_has(X86_FEATURE_INVPCID)) { > > >>>> + if (!npt_enabled) > > >>>> + set_extended_intercept(svm, INTERCEPT_INVPCID); > > >>>> + else > > >>>> + clr_extended_intercept(svm, INTERCEPT_INVPCID); > > >>>> + } > > >>>> + > > >>>> if (kvm_vcpu_apicv_active(&svm->vcpu)) > > >>>> avic_init_vmcb(svm); > > >>>> > > >>>> @@ -2715,6 +2731,23 @@ static int mwait_interception(struct > vcpu_svm > > >>> *svm) > > >>>> return nop_interception(svm); > > >>>> } > > >>>> > > >>>> +static int invpcid_interception(struct vcpu_svm *svm) > > >>>> +{ > > >>>> + struct kvm_vcpu *vcpu = &svm->vcpu; > > >>>> + unsigned long type; > > >>>> + gva_t gva; > > >>>> + > > >>>> + /* > > >>>> + * For an INVPCID intercept: > > >>>> + * EXITINFO1 provides the linear address of the memory operand. > > >>>> + * EXITINFO2 provides the contents of the register operand. > > >>>> + */ > > >>>> + type = svm->vmcb->control.exit_info_2; > > >>>> + gva = svm->vmcb->control.exit_info_1; > > >>>> + > > >>>> + return kvm_handle_invpcid_types(vcpu, gva, type); > > >>>> +} > > >>>> + > > >>>> static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { > > >>>> [SVM_EXIT_READ_CR0] = cr_interception, > > >>>> [SVM_EXIT_READ_CR3] = cr_interception, > > >>>> @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct > > >>> vcpu_svm *svm) = { > > >>>> [SVM_EXIT_MWAIT] = mwait_interception, > > >>>> [SVM_EXIT_XSETBV] = xsetbv_interception, > > >>>> [SVM_EXIT_RDPRU] = rdpru_interception, > > >>>> + [SVM_EXIT_INVPCID] = invpcid_interception, > > >>>> [SVM_EXIT_NPF] = npf_interception, > > >>>> [SVM_EXIT_RSM] = rsm_interception, > > >>>> [SVM_EXIT_AVIC_INCOMPLETE_IPI] = > > >>> avic_incomplete_ipi_interception, > > >>>> @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct > kvm_vcpu > > >>> *vcpu) > > >>>> svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) && > > >>>> guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS); > > >>>> > > >>>> + /* > > >>>> + * Intercept INVPCID instruction if the baremetal has the support > > >>>> + * but the guest doesn't claim the feature. > > >>>> + */ > > >>>> + if (boot_cpu_has(X86_FEATURE_INVPCID) && > > >>>> + !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) > > >>>> + set_extended_intercept(svm, INTERCEPT_INVPCID); > > >>>> + > > >>> > > >>> What if INVPCID is enabled in the guest CPUID later? Shouldn't we then > > >>> clear this intercept bit? > > >> > > >> I assume the feature enable comes in the same code path as this. I can add > > >> "if else" check here if that is what you are suggesting. > > > > > > Yes, that's what I'm suggesting. > > > > > >>> > > >>>> if (!kvm_vcpu_apicv_active(vcpu)) > > >>>> return; > > >>>> > > >>>>
> -----Original Message----- > From: Paolo Bonzini <pbonzini@redhat.com> > Sent: Monday, June 15, 2020 8:47 AM > To: Jim Mattson <jmattson@google.com>; Moger, Babu > <Babu.Moger@amd.com> > Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel <joro@8bytes.org>; > the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson > <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>; > Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Vitaly > Kuznetsov <vkuznets@redhat.com>; Thomas Gleixner <tglx@linutronix.de>; > LKML <linux-kernel@vger.kernel.org>; kvm list <kvm@vger.kernel.org> > Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD > > On 13/06/20 02:04, Jim Mattson wrote: > >> I think I have misunderstood this part. I was not inteding to change the > >> #GP behaviour. I will remove this part. My intension of these series is to > >> handle invpcid in shadow page mode. I have verified that part. Hope I did > >> not miss anything else. > > You don't really have to intercept INVPCID when tdp is in use, right? > > There are certainly plenty of operations for which kvm does not > > properly raise #UD when they aren't enumerated in the guest CPUID. > > > > Indeed; for RDRAND and RDSEED it makes sense to do so because the > instructions may introduce undesirable nondeterminism (or block all the > packages in your core as they have been doing for a few weeks). > Therefore on Intel we trap them and raise #UD; on AMD this is not > possible because RDRAND has no intercept. > > In general however we do not try to hard to raise #UD and that is > usually not even possible. Yes. Sure. Thanks. > > Paolo
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h index 62649fba8908..6488094f67fa 100644 --- a/arch/x86/include/asm/svm.h +++ b/arch/x86/include/asm/svm.h @@ -55,6 +55,10 @@ enum { INTERCEPT_RDPRU, }; +/* Extended Intercept bits */ +enum { + INTERCEPT_INVPCID = 2, +}; struct __attribute__ ((__packed__)) vmcb_control_area { u32 intercept_cr; diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h index 2e8a30f06c74..522d42dfc28c 100644 --- a/arch/x86/include/uapi/asm/svm.h +++ b/arch/x86/include/uapi/asm/svm.h @@ -76,6 +76,7 @@ #define SVM_EXIT_MWAIT_COND 0x08c #define SVM_EXIT_XSETBV 0x08d #define SVM_EXIT_RDPRU 0x08e +#define SVM_EXIT_INVPCID 0x0a2 #define SVM_EXIT_NPF 0x400 #define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402 @@ -171,6 +172,7 @@ { SVM_EXIT_MONITOR, "monitor" }, \ { SVM_EXIT_MWAIT, "mwait" }, \ { SVM_EXIT_XSETBV, "xsetbv" }, \ + { SVM_EXIT_INVPCID, "invpcid" }, \ { SVM_EXIT_NPF, "npf" }, \ { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \ { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, "avic_unaccelerated_access" }, \ diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 285e5e1ff518..82d974338f68 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void) if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) || boot_cpu_has(X86_FEATURE_AMD_SSBD)) kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD); + + /* Enable INVPCID if both PCID and INVPCID enabled */ + if (boot_cpu_has(X86_FEATURE_PCID) && + boot_cpu_has(X86_FEATURE_INVPCID)) + kvm_cpu_cap_set(X86_FEATURE_INVPCID); } static __init int svm_hardware_setup(void) @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm) clr_intercept(svm, INTERCEPT_PAUSE); } + /* + * Intercept INVPCID instruction only if shadow page table is + * enabled. Interception is not required with nested page table. + */ + if (boot_cpu_has(X86_FEATURE_INVPCID)) { + if (!npt_enabled) + set_extended_intercept(svm, INTERCEPT_INVPCID); + else + clr_extended_intercept(svm, INTERCEPT_INVPCID); + } + if (kvm_vcpu_apicv_active(&svm->vcpu)) avic_init_vmcb(svm); @@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm *svm) return nop_interception(svm); } +static int invpcid_interception(struct vcpu_svm *svm) +{ + struct kvm_vcpu *vcpu = &svm->vcpu; + unsigned long type; + gva_t gva; + + /* + * For an INVPCID intercept: + * EXITINFO1 provides the linear address of the memory operand. + * EXITINFO2 provides the contents of the register operand. + */ + type = svm->vmcb->control.exit_info_2; + gva = svm->vmcb->control.exit_info_1; + + return kvm_handle_invpcid_types(vcpu, gva, type); +} + static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { [SVM_EXIT_READ_CR0] = cr_interception, [SVM_EXIT_READ_CR3] = cr_interception, @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { [SVM_EXIT_MWAIT] = mwait_interception, [SVM_EXIT_XSETBV] = xsetbv_interception, [SVM_EXIT_RDPRU] = rdpru_interception, + [SVM_EXIT_INVPCID] = invpcid_interception, [SVM_EXIT_NPF] = npf_interception, [SVM_EXIT_RSM] = rsm_interception, [SVM_EXIT_AVIC_INCOMPLETE_IPI] = avic_incomplete_ipi_interception, @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu) svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) && guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS); + /* + * Intercept INVPCID instruction if the baremetal has the support + * but the guest doesn't claim the feature. + */ + if (boot_cpu_has(X86_FEATURE_INVPCID) && + !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) + set_extended_intercept(svm, INTERCEPT_INVPCID); + if (!kvm_vcpu_apicv_active(vcpu)) return;
The following intercept is added for INVPCID instruction: Code Name Cause A2h VMEXIT_INVPCID INVPCID instruction The following bit is added to the VMCB layout control area to control intercept of INVPCID: Byte Offset Bit(s) Function 14h 2 intercept INVPCID For the guests with nested page table (NPT) support, the INVPCID feature works as running it natively. KVM does not need to do any special handling in this case. Interceptions are required in the following cases. 1. If the guest tries to disable the feature when the underlying hardware supports it. In this case hypervisor needs to report #UD. 2. When the guest is running with shadow page table enabled, in this case the hypervisor needs to handle the tlbflush based on the type of invpcid instruction type. AMD documentation for INVPCID feature is available at "AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or later)" The documentation can be obtained at the links below: Link: https://www.amd.com/system/files/TechDocs/24593.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Babu Moger <babu.moger@amd.com> --- arch/x86/include/asm/svm.h | 4 ++++ arch/x86/include/uapi/asm/svm.h | 2 ++ arch/x86/kvm/svm/svm.c | 42 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 48 insertions(+)