Message ID: 20240827203804.4989-1-Ashish.Kalra@amd.com (mailing list archive)
State:      New, archived
Series:     x86/sev: Fix host kdump support for SNP
On August 27, 2024 10:38:04 PM GMT+02:00, Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>From: Ashish Kalra <ashish.kalra@amd.com>
>
>With active SNP VMs, SNP_SHUTDOWN_EX invoked during panic notifiers causes
>crashkernel boot failure with the following signature:

Why would SNP_SHUTDOWN be allowed *at all* if there are active SNP guests and
there's potential to lose guest data in the process?!

I don't think you want to be on the receiving end of those customer support
calls at your cloud provider...
Hello Boris,

On 8/29/2024 3:34 AM, Borislav Petkov wrote:
> On August 27, 2024 10:38:04 PM GMT+02:00, Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> With active SNP VMs, SNP_SHUTDOWN_EX invoked during panic notifiers causes
>> crashkernel boot failure with the following signature:
> Why would SNP_SHUTDOWN be allowed *at all* if there are active SNP guests and
> there's potential to lose guest data in the process?!

If SNP_SHUTDOWN is not done, then the crashkernel panics during boot, as the
crashdump attached to the fix/patch here shows. So essentially, if
SNP_DECOMMISSION followed by SNP_SHUTDOWN is not done, then we can't boot the
crashkernel when there are any active SNP guests (which I believe is an
important requirement for cloud providers).

Additionally, in the case of SNP_DECOMMISSION, the firmware marks the ASID of
the guest as not runnable and then transitions the SNP guest context page into
a firmware page (so that is one RMP table change). For SNP_SHUTDOWN_EX, the
firmware transitions all pages associated with the IOMMU to the Reclaim state
(which the HV then marks as hypervisor pages); these IOMMU pages are the event
log, PPR log, and completion wait buffers of the IOMMU.

Aside from the IOMMU pages mentioned above, the firmware will not automatically
reclaim or modify any other pages in the RMP table, and it does not reset the
RMP table. So essentially all host memory (and guest data) will still be
available and saved by the crashkernel.

Thanks, Ashish
On Thu, Aug 29, 2024 at 09:30:54AM -0500, Kalra, Ashish wrote:
> Hello Boris,
>
> On 8/29/2024 3:34 AM, Borislav Petkov wrote:
> > On August 27, 2024 10:38:04 PM GMT+02:00, Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> >> From: Ashish Kalra <ashish.kalra@amd.com>
> >>
> >> With active SNP VMs, SNP_SHUTDOWN_EX invoked during panic notifiers causes
> >> crashkernel boot failure with the following signature:
> > Why would SNP_SHUTDOWN be allowed *at all* if there are active SNP guests and there's potential to lose guest data in the process?!
>
> If SNP_SHUTDOWN is not done,

Read my question again pls.
On Thu, Aug 29, 2024, Borislav Petkov wrote:
> On August 27, 2024 10:38:04 PM GMT+02:00, Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> >From: Ashish Kalra <ashish.kalra@amd.com>
> >
> >With active SNP VMs, SNP_SHUTDOWN_EX invoked during panic notifiers causes
> >crashkernel boot failure with the following signature:
>
> Why would SNP_SHUTDOWN be allowed *at all* if there are active SNP guests and
> there's potential to lose guest data in the process?!

Because if the host is panicking, guests are hosed regardless. Unless I'm
misreading things, the goal here is to ensure the crashkernel can actually
capture a kdump.
On Thu, Aug 29, 2024 at 07:50:16AM -0700, Sean Christopherson wrote:
> Because if the host is panicking, guests are hosed regardless.
Are they?
I read "active SNP VMs"...
I guess if it reaches svm_emergency_disable() we're hosed anyway and that's
what you mean but I'm not sure...
On 8/29/2024 9:50 AM, Sean Christopherson wrote:
> On Thu, Aug 29, 2024, Borislav Petkov wrote:
>> On August 27, 2024 10:38:04 PM GMT+02:00, Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> With active SNP VMs, SNP_SHUTDOWN_EX invoked during panic notifiers causes
>>> crashkernel boot failure with the following signature:
>> Why would SNP_SHUTDOWN be allowed *at all* if there are active SNP guests and
>> there's potential to lose guest data in the process?!
> Because if the host is panicking, guests are hosed regardless. Unless I'm
> misreading things, the goal here is to ensure the crashkernel can actually capture
> a kdump.

Yes, that is the main goal here: to ensure that the crashkernel can boot and
capture a kdump on an SNP-enabled host regardless of SNP VMs running.

Thanks, Ashish
Hi Ashish,

kernel test robot noticed the following build warnings:

[auto build test WARNING on kvm/queue]
[also build test WARNING on linus/master v6.11-rc5 next-20240829]
[cannot apply to kvm/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ashish-Kalra/x86-sev-Fix-host-kdump-support-for-SNP/20240828-044035
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
patch link:    https://lore.kernel.org/r/20240827203804.4989-1-Ashish.Kalra%40amd.com
patch subject: [PATCH] x86/sev: Fix host kdump support for SNP
config: x86_64-buildonly-randconfig-002-20240829 (https://download.01.org/0day-ci/archive/20240829/202408292344.yuQ5sYEz-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240829/202408292344.yuQ5sYEz-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408292344.yuQ5sYEz-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from arch/x86/kvm/svm/avic.c:28:
>> arch/x86/kvm/svm/svm.h:783:13: warning: 'snp_decommision_all' declared 'static' but never defined [-Wunused-function]
     783 | static void snp_decommision_all(void);
         |             ^~~~~~~~~~~~~~~~~~~
--
   In file included from arch/x86/kvm/svm/svm.c:50:
>> arch/x86/kvm/svm/svm.h:783:13: warning: 'snp_decommision_all' used but never defined
     783 | static void snp_decommision_all(void);
         |             ^~~~~~~~~~~~~~~~~~~

vim +783 arch/x86/kvm/svm/svm.h

   763
   764  static inline void sev_free_vcpu(struct kvm_vcpu *vcpu) {}
   765  static inline void sev_vm_destroy(struct kvm *kvm) {}
   766  static inline void __init sev_set_cpu_caps(void) {}
   767  static inline void __init sev_hardware_setup(void) {}
   768  static inline void sev_hardware_unsetup(void) {}
   769  static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
   770  static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
   771  #define max_sev_asid 0
   772  static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
   773  static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}
   774  static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
   775  {
   776          return 0;
   777  }
   778  static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
   779  static inline int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
   780  {
   781          return 0;
   782  }
 > 783  static void snp_decommision_all(void);
   784  #endif
   785
Hello Boris,

On 8/29/2024 10:16 AM, Kalra, Ashish wrote:
> On 8/29/2024 9:50 AM, Sean Christopherson wrote:
>> On Thu, Aug 29, 2024, Borislav Petkov wrote:
>>> On August 27, 2024 10:38:04 PM GMT+02:00, Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>>
>>>> With active SNP VMs, SNP_SHUTDOWN_EX invoked during panic notifiers causes
>>>> crashkernel boot failure with the following signature:
>>> Why would SNP_SHUTDOWN be allowed *at all* if there are active SNP guests and
>>> there's potential to lose guest data in the process?!
>> Because if the host is panicking, guests are hosed regardless. Unless I'm
>> misreading things, the goal here is to ensure the crashkernel can actually capture
>> a kdump.
> Yes, that is the main goal here to ensure that crashkernel can boot and capture a kdump on a SNP enabled host regardless of SNP VMs running.

Are you convinced with Sean's feedback here that this is a required feature to fix ?

And it is important to reiterate this again: SNP_DECOMMISSION mainly unbinds
the ASID from the SNP context, marks the ASID as unusable, and then transitions
the SNP guest context page to a FW page. SNP_SHUTDOWN_EX transitions all pages
associated with the IOMMU to the Reclaim state, which the HV then transitions
to hypervisor state. All these page state changes are in the RMP table, so
there is no loss of guest data as such, and the complete host memory is
captured with the crashkernel boot. There are no processes being killed, and
host/guest memory is not being altered or modified in any way.

Additionally, I believe that support staff will absolutely need this kind of
support, which enables crashkernel/kdump for SNP hosts.

Thanks, Ashish
Hi Ashish,

kernel test robot noticed the following build warnings:

[auto build test WARNING on kvm/queue]
[also build test WARNING on linus/master v6.11-rc5 next-20240830]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ashish-Kalra/x86-sev-Fix-host-kdump-support-for-SNP/20240828-044035
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
patch link:    https://lore.kernel.org/r/20240827203804.4989-1-Ashish.Kalra%40amd.com
patch subject: [PATCH] x86/sev: Fix host kdump support for SNP
config: i386-buildonly-randconfig-006-20240831 (https://download.01.org/0day-ci/archive/20240831/202408311530.cYa27OX8-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240831/202408311530.cYa27OX8-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408311530.cYa27OX8-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from arch/x86/kvm/svm/svm.c:50:
>> arch/x86/kvm/svm/svm.h:783:13: warning: function 'snp_decommision_all' has internal linkage but is not defined [-Wundefined-internal]
     783 | static void snp_decommision_all(void);
         |             ^
   arch/x86/kvm/svm/svm.c:686:3: note: used here
     686 |                 snp_decommision_all();
         |                 ^
   1 warning generated.

vim +/snp_decommision_all +783 arch/x86/kvm/svm/svm.h

   763
   764  static inline void sev_free_vcpu(struct kvm_vcpu *vcpu) {}
   765  static inline void sev_vm_destroy(struct kvm *kvm) {}
   766  static inline void __init sev_set_cpu_caps(void) {}
   767  static inline void __init sev_hardware_setup(void) {}
   768  static inline void sev_hardware_unsetup(void) {}
   769  static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
   770  static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
   771  #define max_sev_asid 0
   772  static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
   773  static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}
   774  static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
   775  {
   776          return 0;
   777  }
   778  static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
   779  static inline int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
   780  {
   781          return 0;
   782  }
 > 783  static void snp_decommision_all(void);
   784  #endif
   785
On Tue, Aug 27, 2024 at 10:40 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> +void snp_decommision_all(void)

Should be spelled snp_decommission_all (with two "s").

> +static DEFINE_SPINLOCK(snp_decommision_lock);

Same here.

> /*
>  * Only MSR_TSC_AUX is switched via the user return hook. EFER is switched via
>  * the VMCB, and the SYSCALL/SYSENTER MSRs are handled by VMLOAD/VMSAVE.
> @@ -594,9 +597,97 @@ static inline void kvm_cpu_svm_disable(void)
>
>  static void svm_emergency_disable(void)
>  {
> +	static atomic_t waiting_for_cpus_synchronized;
> +	static bool synchronize_cpus_initiated;
> +	static bool snp_decommision_handled;

Same here, and below throughout the function (also SNP_DECOMMISSION). Please
create a new function sev_emergency_disable(), with a stub in svm.h if
CONFIG_KVM_AMD_

> @@ -749,6 +749,7 @@ void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
>  int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
>  void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
>  int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
> +void snp_decommision_all(void);
>  #else
>  static inline struct page *snp_safe_alloc_page_node(int node, gfp_t gfp)
>  {
> @@ -779,7 +780,7 @@ static inline int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
>  {
>  	return 0;
>  }
> -
> +static void snp_decommision_all(void);

This should be inline (and after the change above it should be
sev_emergency_disable(), not snp_decommission_all(), that is exported from
sev.c).

Thanks,

Paolo
On Fri, Aug 30, 2024 at 04:08:35PM -0500, Kalra, Ashish wrote:
> Are you convinced with Sean's feedback here that this is a required feature to fix ?
Yes.
Thx.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0b851ef937f2..34ddea43c4e6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -89,6 +89,7 @@ static unsigned int nr_asids;
 static unsigned long *sev_asid_bitmap;
 static unsigned long *sev_reclaim_asid_bitmap;
+static void **snp_asid_to_gctx_pages_map;
 static int snp_decommission_context(struct kvm *kvm);
 
 struct enc_region {
@@ -2248,6 +2249,9 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		goto e_free_context;
 	}
 
+	if (snp_asid_to_gctx_pages_map)
+		snp_asid_to_gctx_pages_map[sev_get_asid(kvm)] = sev->snp_context;
+
 	return 0;
 
 e_free_context:
@@ -2884,9 +2888,35 @@ static int snp_decommission_context(struct kvm *kvm)
 	snp_free_firmware_page(sev->snp_context);
 	sev->snp_context = NULL;
 
+	if (snp_asid_to_gctx_pages_map)
+		snp_asid_to_gctx_pages_map[sev_get_asid(kvm)] = NULL;
+
 	return 0;
 }
 
+/*
+ * NOTE: called in NMI context from sev_emergency_disable().
+ */
+void snp_decommision_all(void)
+{
+	struct sev_data_snp_addr data = {};
+	int ret, asid;
+
+	if (!snp_asid_to_gctx_pages_map)
+		return;
+
+	for (asid = 1; asid < min_sev_asid; asid++) {
+		if (snp_asid_to_gctx_pages_map[asid]) {
+			data.address = __sme_pa(snp_asid_to_gctx_pages_map[asid]);
+			ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
+			if (!ret) {
+				snp_free_firmware_page(snp_asid_to_gctx_pages_map[asid]);
+				snp_asid_to_gctx_pages_map[asid] = NULL;
+			}
+		}
+	}
+}
+
 void sev_vm_destroy(struct kvm *kvm)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -3052,6 +3082,13 @@ void __init sev_hardware_setup(void)
 	sev_es_supported = true;
 	sev_snp_supported = sev_snp_enabled && cc_platform_has(CC_ATTR_HOST_SEV_SNP);
 
+	if (sev_snp_supported) {
+		snp_asid_to_gctx_pages_map = kmalloc_array(min_sev_asid,
+							   sizeof(void *),
+							   GFP_KERNEL | __GFP_ZERO);
+		if (!snp_asid_to_gctx_pages_map)
+			pr_warn("Could not allocate SNP asid to guest context map\n");
+	}
 out:
 	if (boot_cpu_has(X86_FEATURE_SEV))
 		pr_info("SEV %s (ASIDs %u - %u)\n",
@@ -3094,6 +3131,8 @@ void sev_hardware_unsetup(void)
 
 	misc_cg_set_capacity(MISC_CG_RES_SEV, 0);
 	misc_cg_set_capacity(MISC_CG_RES_SEV_ES, 0);
+
+	kfree(snp_asid_to_gctx_pages_map);
 }
 
 int sev_cpu_init(struct svm_cpu_data *sd)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e13c54d93964..a8f64a1710c2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -17,6 +17,7 @@
 #include <linux/highmem.h>
 #include <linux/amd-iommu.h>
 #include <linux/sched.h>
+#include <linux/delay.h>
 #include <linux/trace_events.h>
 #include <linux/slab.h>
 #include <linux/hashtable.h>
@@ -248,6 +249,8 @@ static unsigned long iopm_base;
 
 DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
 
+static DEFINE_SPINLOCK(snp_decommision_lock);
+
 /*
  * Only MSR_TSC_AUX is switched via the user return hook. EFER is switched via
  * the VMCB, and the SYSCALL/SYSENTER MSRs are handled by VMLOAD/VMSAVE.
@@ -594,9 +597,97 @@ static inline void kvm_cpu_svm_disable(void)
 
 static void svm_emergency_disable(void)
 {
+	static atomic_t waiting_for_cpus_synchronized;
+	static bool synchronize_cpus_initiated;
+	static bool snp_decommision_handled;
+	static atomic_t cpus_synchronized;
+
 	kvm_rebooting = true;
 
 	kvm_cpu_svm_disable();
+
+	if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
+		return;
+
+	/*
+	 * SNP_SHUTDOWN_EX fails when SNP VMs are active as the firmware checks
+	 * every encryption-capable ASID to verify that it is not in use by a
+	 * guest and a DF_FLUSH is not required. If a DF_FLUSH is required,
+	 * the firmware returns DFFLUSH_REQUIRED. To address this, SNP_DECOMMISION
+	 * is required to shutdown all active SNP VMs, but SNP_DECOMMISION tags all
+	 * CPUs that guest was activated on to do a WBINVD. When panic notifier
+	 * is invoked all other CPUs have already been shutdown, so it is not
+	 * possible to do a wbinvd_on_all_cpus() after SNP_DECOMMISION has been
+	 * executed. This eventually causes SNP_SHUTDOWN_EX to fail after
+	 * SNP_DECOMMISION. To fix this, do SNP_DECOMMISION and subsequent WBINVD
+	 * on all CPUs during NMI shutdown of CPUs as part of disabling
+	 * virtualization on all CPUs via cpu_emergency_disable_virtualization().
+	 */
+
+	spin_lock(&snp_decommision_lock);
+
+	/*
+	 * exit early for call from native_machine_crash_shutdown()
+	 * as SNP_DECOMMISSION has already been done as part of
+	 * NMI shutdown of the CPUs.
+	 */
+	if (snp_decommision_handled) {
+		spin_unlock(&snp_decommision_lock);
+		return;
+	}
+
+	/*
+	 * Synchronize all CPUs handling NMI before issuing
+	 * SNP_DECOMMISSION.
+	 */
+	if (!synchronize_cpus_initiated) {
+		/*
+		 * one CPU handling panic, the other CPU is initiator for
+		 * CPU synchronization.
+		 */
+		atomic_set(&waiting_for_cpus_synchronized, num_online_cpus() - 2);
+		synchronize_cpus_initiated = true;
+		/*
+		 * Ensure CPU synchronization parameters are setup before dropping
+		 * the lock to let other CPUs continue to reach synchronization.
+		 */
+		wmb();
+
+		spin_unlock(&snp_decommision_lock);
+
+		/*
+		 * This will not cause system to hang forever as the CPU
+		 * handling panic waits for maximum one second for
+		 * other CPUs to stop in nmi_shootdown_cpus().
+		 */
+		while (atomic_read(&waiting_for_cpus_synchronized) > 0)
+			mdelay(1);
+
+		/* Reacquire the lock once CPUs are synchronized */
+		spin_lock(&snp_decommision_lock);
+
+		atomic_set(&cpus_synchronized, 1);
+	} else {
+		atomic_dec(&waiting_for_cpus_synchronized);
+		/*
+		 * drop the lock to let other CPUs contiune to reach
+		 * synchronization.
+		 */
+		spin_unlock(&snp_decommision_lock);
+
+		while (atomic_read(&cpus_synchronized) == 0)
+			mdelay(1);
+
+		/* Try to re-acquire lock after CPUs are synchronized */
+		spin_lock(&snp_decommision_lock);
+	}
+
+	if (!snp_decommision_handled) {
+		snp_decommision_all();
+		snp_decommision_handled = true;
+	}
+	spin_unlock(&snp_decommision_lock);
+	wbinvd();
 }
 
 static void svm_hardware_disable(void)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 76107c7d0595..2f933b941b8d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -749,6 +749,7 @@ void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
 int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
+void snp_decommision_all(void);
 #else
 static inline struct page *snp_safe_alloc_page_node(int node, gfp_t gfp)
 {
@@ -779,7 +780,7 @@ static inline int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
 {
 	return 0;
 }
-
+static void snp_decommision_all(void);
 #endif

/* vmenter.S */