Message ID | f45c503fad62c899473b5a6fd0f2085208d6dfaf.1623174621.git.ashish.kalra@amd.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add Guest API & Guest Kernel support for SEV live migration. |
Preferred shortlog prefix for KVM guest changes is "x86/kvm". "KVM: x86" is for
host changes.

On Tue, Jun 08, 2021, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> KVM hypercall framework relies on alternative framework to patch the
> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> apply_alternative() is called then it defaults to VMCALL. The approach
> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
> will be able to decode the instruction and do the right things. But
> when SEV is active, guest memory is encrypted with guest key and
> hypervisor will not be able to decode the instruction bytes.
>
> So invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
> and opt into VMCALL.

The changelog needs to explain why SEV hypercalls need to be made before
apply_alternative(), why it's ok to make Intel CPUs take #UDs on the unknown
VMMCALL, and why this is not creating the same conundrum for TDX.

Actually, I don't think making Intel CPUs take #UDs is acceptable. This patch
breaks Linux on upstream KVM on Intel due to a bug in upstream KVM. KVM attempts
to patch the "wrong" hypercall to the "right" hypercall, but stupidly does so
via an emulated write. I.e. KVM honors the guest page table permissions and
injects a !WRITABLE #PF on the VMMCALL RIP if the kernel code is mapped RX.

In other words, trusting the VMM to not screw up the #UD is a bad idea. This also
makes documenting the "why does SEV need super early hypercalls" extra important.

This patch doesn't work because X86_FEATURE_VMCALL is a synthetic flag and is
only set by VMware paravirt code, which is why the patching doesn't happen as
would be expected. The obvious solution would be to manually set X86_FEATURE_VMCALL
where appropriate, but given that defaulting to VMCALL has worked for years,
defaulting to VMMCALL makes me nervous, e.g. even if we splatter X86_FEATURE_VMCALL
into Intel, Centaur, and Zhaoxin, there's a possibility we'll break existing VMs
that run on hypervisors that do something weird with the vendor string.

Rather than look for X86_FEATURE_VMCALL, I think it makes sense to have this be
a "pure" inversion, i.e. patch in VMCALL if VMMCALL is not supported, as opposed
to patching in VMCALL if VMCALL is supported.

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 69299878b200..61641e69cfda 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -17,7 +17,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
 #endif /* CONFIG_KVM_GUEST */

 #define KVM_HYPERCALL \
-	ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
+	ALTERNATIVE("vmmcall", "vmcall", ALT_NOT(X86_FEATURE_VMMCALL))

 /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
  * instruction. The hypervisor may replace it with something else but only the

> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org

Suggested-by: Sean Christopherson <seanjc@google.com>

> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>

Is Brijesh the author? Co-developed-by for a one-line change would be odd...

> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  arch/x86/include/asm/kvm_para.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 69299878b200..0267bebb0b0f 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -17,7 +17,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
>  #endif /* CONFIG_KVM_GUEST */
>
>  #define KVM_HYPERCALL \
> -	ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
> +	ALTERNATIVE("vmmcall", "vmcall", X86_FEATURE_VMCALL)
>
>  /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
>   * instruction. The hypervisor may replace it with something else but only the
> --
> 2.17.1
>
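The difference between the three forms of KVM_HYPERCALL discussed in this message can be illustrated with a small user-space model. This is only a sketch: `struct cpu_flags`, `alternative()` and the `hc_*` helpers are hypothetical names invented for the illustration, not kernel APIs, and the flag values assume that a KVM guest on AMD has X86_FEATURE_VMMCALL set while X86_FEATURE_VMCALL is left clear everywhere outside the VMware paravirt code, as described above.

```c
#include <stdbool.h>

/* Which instruction ends up at a KVM_HYPERCALL site in this model. */
enum hc_insn { HC_VMCALL, HC_VMMCALL };

/* Hypothetical stand-ins for the two feature bits named in the thread. */
struct cpu_flags {
	bool vmmcall; /* models X86_FEATURE_VMMCALL */
	bool vmcall;  /* models X86_FEATURE_VMCALL (synthetic flag) */
};

/*
 * Models ALTERNATIVE(oldinstr, newinstr, cond): oldinstr is what executes
 * until alternatives are applied; afterwards newinstr replaces it only if
 * cond holds.
 */
static enum hc_insn alternative(enum hc_insn oldinstr, enum hc_insn newinstr,
				bool cond, bool patched)
{
	return (patched && cond) ? newinstr : oldinstr;
}

/* Existing macro: ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL) */
static enum hc_insn hc_current(struct cpu_flags f, bool patched)
{
	return alternative(HC_VMCALL, HC_VMMCALL, f.vmmcall, patched);
}

/* Posted patch: ALTERNATIVE("vmmcall", "vmcall", X86_FEATURE_VMCALL) */
static enum hc_insn hc_posted(struct cpu_flags f, bool patched)
{
	return alternative(HC_VMMCALL, HC_VMCALL, f.vmcall, patched);
}

/* Suggested: ALTERNATIVE("vmmcall", "vmcall", ALT_NOT(X86_FEATURE_VMMCALL)) */
static enum hc_insn hc_alt_not(struct cpu_flags f, bool patched)
{
	return alternative(HC_VMMCALL, HC_VMCALL, !f.vmmcall, patched);
}
```

In this model, an Intel guest (both flags clear) keeps VMMCALL under the posted patch even after patching, whereas the ALT_NOT() form switches it to VMCALL. Both inverted forms still execute VMMCALL on Intel before alternatives are applied, which is the remaining concern about early #UDs.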
Hello Sean,

> On Aug 20, 2021, at 2:15 AM, Sean Christopherson <seanjc@google.com> wrote:
>
> Preferred shortlog prefix for KVM guest changes is "x86/kvm". "KVM: x86" is for
> host changes.
>
>> On Tue, Jun 08, 2021, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> KVM hypercall framework relies on alternative framework to patch the
>> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
>> apply_alternative() is called then it defaults to VMCALL. The approach
>> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
>> will be able to decode the instruction and do the right things. But
>> when SEV is active, guest memory is encrypted with guest key and
>> hypervisor will not be able to decode the instruction bytes.
>>
>> So invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
>> and opt into VMCALL.
>
> The changelog needs to explain why SEV hypercalls need to be made before
> apply_alternative(), why it's ok to make Intel CPUs take #UDs on the unknown
> VMMCALL, and why this is not creating the same conundrum for TDX.

I think it makes more sense to stick to the original approach/patch, i.e.,
introducing a new private hypercall interface like kvm_sev_hypercall3() and
letting early paravirtualized kernel code invoke this private hypercall
interface wherever required.

This helps avoid Intel CPUs taking unnecessary #UDs and also avoids using
hacks as below.

TDX code can introduce a similar private hypercall interface for their early
paravirtualized kernel code if required.

> Actually, I don't think making Intel CPUs take #UDs is acceptable. This patch
> breaks Linux on upstream KVM on Intel due a bug in upstream KVM. KVM attempts
> to patch the "wrong" hypercall to the "right" hypercall, but stupidly does so
> via an emulated write. I.e. KVM honors the guest page table permissions and
> injects a !WRITABLE #PF on the VMMCALL RIP if the kernel code is mapped RX.
>
> In other words, trusting the VMM to not screw up the #UD is a bad idea. This also
> makes documenting the "why does SEV need super early hypercalls" extra important.

Makes sense.

Thanks,
Ashish

> This patch doesn't work because X86_FEATURE_VMCALL is a synthetic flag and is
> only set by VMware paravirt code, which is why the patching doesn't happen as
> would be expected. The obvious solution would be to manually set X86_FEATURE_VMCALL
> where appropriate, but given that defaulting to VMCALL has worked for years,
> defaulting to VMMCALL makes me nervous, e.g. even if we splatter X86_FEATURE_VMCALL
> into Intel, Centaur, and Zhaoxin, there's a possibility we'll break existing VMs
> that run on hypervisors that do something weird with the vendor string.
>
> Rather than look for X86_FEATURE_VMCALL, I think it makes sense to have this be
> a "pure" inversion, i.e. patch in VMCALL if VMMCALL is not supported, as opposed
> to patching in VMCALL if VMCALL is supproted.
>
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 69299878b200..61641e69cfda 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -17,7 +17,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
>  #endif /* CONFIG_KVM_GUEST */
>
>  #define KVM_HYPERCALL \
> -	ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
> +	ALTERNATIVE("vmmcall", "vmcall", ALT_NOT(X86_FEATURE_VMMCALL))
>
>  /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
>   * instruction. The hypervisor may replace it with something else but only the
> On Aug 20, 2021, at 3:38 AM, Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
>
> Hello Sean,
>
>> On Aug 20, 2021, at 2:15 AM, Sean Christopherson <seanjc@google.com> wrote:
>>
>> Preferred shortlog prefix for KVM guest changes is "x86/kvm". "KVM: x86" is for
>> host changes.
>>
>>>> On Tue, Jun 08, 2021, Ashish Kalra wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> KVM hypercall framework relies on alternative framework to patch the
>>> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
>>> apply_alternative() is called then it defaults to VMCALL. The approach
>>> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
>>> will be able to decode the instruction and do the right things. But
>>> when SEV is active, guest memory is encrypted with guest key and
>>> hypervisor will not be able to decode the instruction bytes.
>>>
>>> So invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
>>> and opt into VMCALL.
>>
>> The changelog needs to explain why SEV hypercalls need to be made before
>> apply_alternative(), why it's ok to make Intel CPUs take #UDs on the unknown
>> VMMCALL, and why this is not creating the same conundrum for TDX.
>
> I think it makes more sense to stick to the original approach/patch, i.e.,
> introducing a new private hypercall interface like kvm_sev_hypercall3() and
> let early paravirtualized kernel code invoke this private hypercall
> interface wherever required.
>
> This helps avoiding Intel CPUs taking unnecessary #UDs and also avoid using
> hacks as below.
>
> TDX code can introduce similar private hypercall interface for their early
> para virtualized kernel code if required.

Actually, if we use this kvm_sev_hypercall3() and do not modify
KVM_HYPERCALL(), then Intel CPUs avoid unnecessary #UDs and TDX code does not
need any new interface. Only early AMD/SEV specific code will use this
kvm_sev_hypercall3() interface. TDX code will always work with
KVM_HYPERCALL().

Thanks,
Ashish
On Thu, Aug 19, 2021, Kalra, Ashish wrote:
>
> > On Aug 20, 2021, at 3:38 AM, Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
> > I think it makes more sense to stick to the original approach/patch, i.e.,
> > introducing a new private hypercall interface like kvm_sev_hypercall3() and
> > let early paravirtualized kernel code invoke this private hypercall
> > interface wherever required.

I don't like the idea of duplicating code just because the problem is tricky to
solve. Right now it's just one function, but it could balloon to multiple in
the future. Plus there's always the possibility of a new, pre-alternatives
kvm_hypercall() being added in generic code, at which point using an SEV-specific
variant gets even uglier.

> > This helps avoiding Intel CPUs taking unnecessary #UDs and also avoid using
> > hacks as below.
> >
> > TDX code can introduce similar private hypercall interface for their early
> > para virtualized kernel code if required.
>
> Actually, if we are using this kvm_sev_hypercall3() and not modifying
> KVM_HYPERCALL() then Intel CPUs avoid unnecessary #UDs and TDX code does not
> need any new interface. Only early AMD/SEV specific code will use this
> kvm_sev_hypercall3() interface. TDX code will always work with
> KVM_HYPERCALL().

Even if VMCALL is the default, i.e. not patched in, VMCALL will #VE on TDX.
In other words, VMCALL isn't really any better than VMMCALL, TDX will need to do
something clever either way.
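The fault behaviour that the thread attributes to each instruction and guest type can be collected into a small lookup, which makes the "no default is safe everywhere" point easier to see. This is a sketch that only restates claims made in these messages; the enum and function names are hypothetical, and combinations the thread does not describe are reported as such rather than guessed.

```c
/* Guest environments mentioned in the thread. */
enum guest_kind { GUEST_INTEL_VMX, GUEST_AMD_SVM, GUEST_AMD_SEV, GUEST_INTEL_TDX };

/* The two hypercall instructions under discussion. */
enum hc_insn { HC_VMCALL, HC_VMMCALL };

enum hc_outcome {
	OUTCOME_HYPERCALL,     /* native instruction: reaches the host as a hypercall */
	OUTCOME_UD_HOST_FIXUP, /* #UD: works only if the host decodes and fixes it up */
	OUTCOME_UD_NO_FIXUP,   /* #UD: host cannot decode encrypted guest memory */
	OUTCOME_VE,            /* #VE is raised in the guest */
	OUTCOME_NOT_DISCUSSED  /* the thread makes no statement about this case */
};

/* Returns what the thread says happens for a given guest and instruction. */
static enum hc_outcome hypercall_outcome(enum guest_kind guest, enum hc_insn insn)
{
	switch (guest) {
	case GUEST_INTEL_VMX:
		return insn == HC_VMCALL ? OUTCOME_HYPERCALL : OUTCOME_UD_HOST_FIXUP;
	case GUEST_AMD_SVM:
		return insn == HC_VMMCALL ? OUTCOME_HYPERCALL : OUTCOME_UD_HOST_FIXUP;
	case GUEST_AMD_SEV:
		return insn == HC_VMMCALL ? OUTCOME_HYPERCALL : OUTCOME_UD_NO_FIXUP;
	case GUEST_INTEL_TDX:
		return insn == HC_VMCALL ? OUTCOME_VE : OUTCOME_NOT_DISCUSSED;
	}
	return OUTCOME_NOT_DISCUSSED;
}
```

Note that `OUTCOME_UD_HOST_FIXUP` is not a guarantee of success: the review above points out that KVM's fixup on Intel is done through an emulated write and fails when the kernel text is mapped RX.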
On Thu, Aug 19, 2021 at 11:15:26PM +0000, Sean Christopherson wrote:
> On Thu, Aug 19, 2021, Kalra, Ashish wrote:
> >
> > > On Aug 20, 2021, at 3:38 AM, Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
> > > I think it makes more sense to stick to the original approach/patch, i.e.,
> > > introducing a new private hypercall interface like kvm_sev_hypercall3() and
> > > let early paravirtualized kernel code invoke this private hypercall
> > > interface wherever required.
>
> I don't like the idea of duplicating code just because the problem is tricky to
> solve. Right now it's just one function, but it could balloon to multiple in
> the future. Plus there's always the possibility of a new, pre-alternatives
> kvm_hypercall() being added in generic code, at which point using an SEV-specific
> variant gets even uglier.
>

Also, to highlight the need to support this interface, here is the flow of
apply_alternatives() captured as part of this thread:

setup_arch() calls init_hypervisor_platform(), which detects the hypervisor
platform the kernel is running under, and then the hypervisor-specific
initialization code can make early hypercalls. For example, KVM-specific
initialization in case of SEV will try to mark the "__bss_decrypted" section's
encryption state via early page encryption status hypercalls.

Now, apply_alternatives() is called much later when setup_arch() calls
check_bugs(), so we do need some kind of an early, pre-alternatives
hypercall interface.

Other cases of pre-alternatives hypercalls include marking per-cpu GHCB pages
as decrypted on SEV-ES and per-cpu apf_reason, steal_time and kvm_apic_eoi as
decrypted for SEV generally.

Actually, the use of this kvm_sev_hypercall3() function may be abstracted quite
nicely. All these early hypercalls are made through the early_set_memory_XX()
interfaces, which in turn invoke pv_ops.

Now, pv_ops can have these SEV/TDX specific abstractions.

Currently, the pv_ops.mmu.notify_page_enc_status_changed() callback is set up
to use kvm_sev_hypercall3() in case of SEV.

Similarly, in case of TDX, pv_ops.mmu.notify_page_enc_status_changed() can be
set up to use a TDX specific callback.

Therefore, this early_set_memory_XX() ->
pv_ops.mmu.notify_page_enc_status_changed() path is a generic interface and can
easily have SEV, TDX and any other future platform specific abstractions
added to it.

Thanks,
Ashish

> > > This helps avoiding Intel CPUs taking unnecessary #UDs and also avoid using
> > > hacks as below.
> > >
> > > TDX code can introduce similar private hypercall interface for their early
> > > para virtualized kernel code if required.
> >
> > Actually, if we are using this kvm_sev_hypercall3() and not modifying
> > KVM_HYPERCALL() then Intel CPUs avoid unnecessary #UDs and TDX code does not
> > need any new interface. Only early AMD/SEV specific code will use this
> > kvm_sev_hypercall3() interface. TDX code will always work with
> > KVM_HYPERCALL().
>
> Even if VMCALL is the default, i.e. not patched in, VMCALL it will #VE on TDX.
> In other words, VMCALL isn't really any better than VMMCALL, TDX will need to do
> something clever either way.
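The early_set_memory_XX() -> pv_ops.mmu.notify_page_enc_status_changed() indirection described in this message can be sketched as a user-space model. Everything prefixed with `model_` or `MODEL_` is hypothetical: the callback signature, the hypercall number and the argument layout are assumptions made for the illustration, and the stand-in for kvm_sev_hypercall3() only records its arguments instead of executing VMMCALL.

```c
#include <stdbool.h>

/* Hypothetical placeholders; the thread gives neither value. */
#define MODEL_HC_PAGE_ENC_STATUS 0u
#define MODEL_PAGE_SHIFT 12

/* Log of the most recent modeled hypercall. */
struct model_hypercall_log {
	int count;
	unsigned int nr;
	unsigned long gpa, npages, enc;
};
static struct model_hypercall_log model_hc_log;

/* Stand-in for kvm_sev_hypercall3(): records arguments, issues no VMMCALL. */
static long model_kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
				     unsigned long p2, unsigned long p3)
{
	model_hc_log.count++;
	model_hc_log.nr = nr;
	model_hc_log.gpa = p1;
	model_hc_log.npages = p2;
	model_hc_log.enc = p3;
	return 0;
}

/* Simplified stand-in for the pv_ops.mmu member; signature is assumed. */
struct model_pv_mmu_ops {
	void (*notify_page_enc_status_changed)(unsigned long pfn, int npages,
					       bool enc);
};

/* Default callback: platforms without a notifier do nothing. */
static void model_notify_nop(unsigned long pfn, int npages, bool enc)
{
	(void)pfn;
	(void)npages;
	(void)enc;
}

static struct model_pv_mmu_ops model_pv_mmu = {
	.notify_page_enc_status_changed = model_notify_nop,
};

/* SEV-specific callback: forwards the change through the private hypercall. */
static void model_sev_notify(unsigned long pfn, int npages, bool enc)
{
	model_kvm_sev_hypercall3(MODEL_HC_PAGE_ENC_STATUS,
				 pfn << MODEL_PAGE_SHIFT,
				 (unsigned long)npages, enc ? 1UL : 0UL);
}

/* Platform init installs its callback; a TDX variant would install its own. */
static void model_kvm_sev_guest_init(void)
{
	model_pv_mmu.notify_page_enc_status_changed = model_sev_notify;
}

/* Generic early caller: knows nothing about SEV or TDX. */
static void model_early_set_memory_decrypted(unsigned long pfn, int npages)
{
	/* The page-table update itself is omitted from this model. */
	model_pv_mmu.notify_page_enc_status_changed(pfn, npages, false);
}
```

The point of the design is that the generic early caller never names an instruction or a vendor; only the installed callback does, so it does not depend on alternatives having been applied.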
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 69299878b200..0267bebb0b0f 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -17,7 +17,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
 #endif /* CONFIG_KVM_GUEST */

 #define KVM_HYPERCALL \
-	ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
+	ALTERNATIVE("vmmcall", "vmcall", X86_FEATURE_VMCALL)

 /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
  * instruction. The hypervisor may replace it with something else but only the