Message ID | 1684506832-41392-1-git-send-email-mikelley@microsoft.com (mailing list archive)
---|---
State | New, archived
Series | [1/2] x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline
Michael Kelley <mikelley@microsoft.com> writes:

[snip]

> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 0f1001d..696004a 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -201,6 +201,7 @@ enum cpuhp_state {
>  	/* Online section invoked on the hotplugged CPU from the hotplug thread */
>  	CPUHP_AP_ONLINE_IDLE,
>  	CPUHP_AP_KVM_ONLINE,
> +	CPUHP_AP_HYPERV_ONLINE,

(Cc: KVM)

I would very much prefer we swap the order with KVM here: hv_cpu_init()
allocates and sets the vCPU's VP assist page, which is used by KVM on
CPUHP_AP_KVM_ONLINE:

kvm_online_cpu() -> __hardware_enable_nolock() ->
kvm_arch_hardware_enable() -> vmx_hardware_enable():

	/*
	 * This can happen if we hot-added a CPU but failed to allocate
	 * VP assist page for it.
	 */
	if (kvm_is_using_evmcs() && !hv_get_vp_assist_page(cpu))
		return -EFAULT;

With the change, this is likely broken.

FWIW, KVM also needs the current vCPU's VP index (also set by hv_cpu_init())
through the __kvm_x86_vendor_init() -> set_hv_tscchange_cb() call chain, but
this happens upon KVM module load, so CPU hotplug ordering should not
matter as this always happens on a CPU which is already online.

Generally, as 'KVM on Hyper-V' is a supported scenario, doing Hyper-V
init before KVM's (and vice versa on teardown) makes sense.

>  	CPUHP_AP_SCHED_WAIT_EMPTY,
>  	CPUHP_AP_SMPBOOT_THREADS,
>  	CPUHP_AP_X86_VDSO_VMA_ONLINE,
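For readers following the ordering argument: callbacks attached to a cpuhp
state run in ascending enum cpuhp_state order as a CPU comes online and in
descending order as it goes offline, so the relative position of
CPUHP_AP_HYPERV_ONLINE and CPUHP_AP_KVM_ONLINE decides whether the VP assist
page exists by the time kvm_online_cpu() runs. A minimal illustrative module
(the demo_* names are hypothetical, not from the patch) showing the
registration API that hyperv_init() also uses:

// SPDX-License-Identifier: GPL-2.0
#include <linux/cpuhotplug.h>
#include <linux/module.h>

static int demo_online(unsigned int cpu)
{
	pr_info("demo: CPU %u coming online\n", cpu);
	return 0;	/* a non-zero return aborts the online operation */
}

static int demo_offline(unsigned int cpu)
{
	pr_info("demo: CPU %u going offline\n", cpu);
	return 0;
}

static int __init demo_init(void)
{
	int ret;

	/*
	 * CPUHP_AP_ONLINE_DYN asks for a free dynamic slot, which gives no
	 * ordering guarantee against other subsystems; a driver that must
	 * be ordered relative to others (as Hyper-V must be here) defines
	 * a named state in enum cpuhp_state instead. A real module would
	 * save the returned state for cpuhp_remove_state() on unload.
	 */
	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "demo:online",
				demo_online, demo_offline);
	return ret < 0 ? ret : 0;
}
module_init(demo_init);
MODULE_LICENSE("GPL");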
From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Monday, May 22, 2023 1:56 AM
>
> Michael Kelley <mikelley@microsoft.com> writes:

[snip]

> I would very much prefer we swap the order with KVM here: hv_cpu_init()
> allocates and sets the vCPU's VP assist page, which is used by KVM on
> CPUHP_AP_KVM_ONLINE:
>
> kvm_online_cpu() -> __hardware_enable_nolock() ->
> kvm_arch_hardware_enable() -> vmx_hardware_enable():
>
> 	/*
> 	 * This can happen if we hot-added a CPU but failed to allocate
> 	 * VP assist page for it.
> 	 */
> 	if (kvm_is_using_evmcs() && !hv_get_vp_assist_page(cpu))
> 		return -EFAULT;
>
> With the change, this is likely broken.

[snip]

I have no objection to putting CPUHP_AP_HYPERV_ONLINE first. I did
not give any consideration to a possible dependency between the two. :-(
But note that in the current code, hv_cpu_init() is running on the
CPUHP_AP_ONLINE_DYN state, which is also after KVM. So this patch
doesn't change the order w.r.t. KVM and the VP assist page. Are things
already broken for KVM, or is something else happening that makes it
work anyway?

Michael
"Michael Kelley (LINUX)" <mikelley@microsoft.com> writes: > From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Monday, May 22, 2023 1:56 AM >> >> Michael Kelley <mikelley@microsoft.com> writes: >> > > [snip] > >> > diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h >> > index 0f1001d..696004a 100644 >> > --- a/include/linux/cpuhotplug.h >> > +++ b/include/linux/cpuhotplug.h >> > @@ -201,6 +201,7 @@ enum cpuhp_state { >> > /* Online section invoked on the hotplugged CPU from the hotplug thread */ >> > CPUHP_AP_ONLINE_IDLE, >> > CPUHP_AP_KVM_ONLINE, >> > + CPUHP_AP_HYPERV_ONLINE, >> >> (Cc: KVM) >> >> I would very much prefer we swap the order with KVM here: hv_cpu_init() >> allocates and sets vCPU's VP assist page which is used by KVM on >> CPUHP_AP_KVM_ONLINE: >> >> kvm_online_cpu() -> __hardware_enable_nolock() -> >> kvm_arch_hardware_enable() -> vmx_hardware_enable(): >> >> /* >> * This can happen if we hot-added a CPU but failed to allocate >> * VP assist page for it. >> */ >> if (kvm_is_using_evmcs() && !hv_get_vp_assist_page(cpu)) >> return -EFAULT; >> >> With the change, this is likely broken. >> >> FWIF, KVM also needs current vCPU's VP index (also set by hv_cpu_init()) >> through __kvm_x86_vendor_init() -> set_hv_tscchange_cb() call chain but >> this happens upon KVM module load so CPU hotplug ordering should not >> matter as this always happens on a CPU which is already online. >> >> Generally, as 'KVM on Hyper-V' is a supported scenario, doing Hyper-V >> init before KVM's (and vice versa on teardown) makes sense. >> >> > CPUHP_AP_SCHED_WAIT_EMPTY, >> > CPUHP_AP_SMPBOOT_THREADS, >> > CPUHP_AP_X86_VDSO_VMA_ONLINE, > > I have no objection to putting CPUHP_AP_HYPERV_ONLINE first. I did > not give any consideration to a possible dependency between the two. :-( > But note that in current code, hv_cpu_init() is running on the > CPUHP_AP_ONLINE_DYN state, which is also after KVM. So this patch > doesn't change the order w.r.t. KVM and the VP assist page. Are things > already broken for KVM, or is something else happening that makes it > work anyway? This looks like a currently present bug indeed so I had to refresh my memory. KVM's CPUHP_AP_KVM_STARTIN is registered with cpuhp_setup_state_nocalls() which means that kvm_starting_cpu() is not called for all currently present CPUs. Moreover, kvm_init() is called when KVM vendor module (e.g. kvm_intel) is loaded and as KVM is normally built as module, this happens much later than Hyper-V's hyperv_init(). vmx_hardware_enable() is actually called from hardware_enable_all() which happens when the first KVM VM is created. This all changes when a CPU is hotplugged. The order CPUHP_AP_* from cpuhp_state is respected and KVM's kvm_starting_cpu() is called _before_ Hyper-V's hv_cpu_init() even before your patch. We don't see the bug just because Hyper-V doesn't(?) support CPU hotplug. Just sending a CPU offline with e.g. "echo 0 > /sys/devices/system/cpu/cpuX/online" is not the same as once allocated, VP assist page persists for all non-root Hyper-V partitions. I don't know if KVM is supported for Hyper-V root partitions but in case it is, we may have a problem. Let's put CPUHP_AP_HYPERV_ONLINE before KVM's CPUHP_AP_KVM_ONLINE explicitly so CPU hotplug scenario is handled correctly, even if current Hyper-V versions don't support it.
From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Tuesday, May 23, 2023 1:05 AM
>
> "Michael Kelley (LINUX)" <mikelley@microsoft.com> writes:

[snip]

> Let's put CPUHP_AP_HYPERV_ONLINE before KVM's CPUHP_AP_KVM_ONLINE
> explicitly so the CPU hotplug scenario is handled correctly, even if
> current Hyper-V versions don't support it.

Will do. I'll send a v2 with the change.

Michael
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a5f9474..6c04b52 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -416,7 +416,7 @@ void __init hyperv_init(void)
 		goto free_vp_assist_page;
 	}
 
-	cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
+	cpuhp = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
 				  hv_cpu_init, hv_cpu_die);
 	if (cpuhp < 0)
 		goto free_ghcb_page;
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 64f9cec..4c5cee4 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -364,13 +364,20 @@ int hv_common_cpu_init(unsigned int cpu)
 	flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
 
 	inputarg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
-	*inputarg = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
-	if (!(*inputarg))
-		return -ENOMEM;
 
-	if (hv_root_partition) {
-		outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
-		*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
+	/*
+	 * hyperv_pcpu_input_arg and hyperv_pcpu_output_arg memory is already
+	 * allocated if this CPU was previously online and then taken offline
+	 */
+	if (!*inputarg) {
+		*inputarg = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
+		if (!(*inputarg))
+			return -ENOMEM;
+
+		if (hv_root_partition) {
+			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
+			*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
+		}
 	}
 
 	msr_vp_index = hv_get_register(HV_REGISTER_VP_INDEX);
@@ -385,24 +392,17 @@ int hv_common_cpu_init(unsigned int cpu)
 
 int hv_common_cpu_die(unsigned int cpu)
 {
-	unsigned long flags;
-	void **inputarg, **outputarg;
-	void *mem;
-
-	local_irq_save(flags);
-
-	inputarg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
-	mem = *inputarg;
-	*inputarg = NULL;
-
-	if (hv_root_partition) {
-		outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
-		*outputarg = NULL;
-	}
-
-	local_irq_restore(flags);
-
-	kfree(mem);
+	/*
+	 * The hyperv_pcpu_input_arg and hyperv_pcpu_output_arg memory
+	 * is not freed when the CPU goes offline as the hyperv_pcpu_input_arg
+	 * may be used by the Hyper-V vPCI driver in reassigning interrupts
+	 * as part of the offlining process. The interrupt reassignment
+	 * happens *after* the CPUHP_AP_HYPERV_ONLINE state has run and
+	 * called this function.
+	 *
+	 * If a previously offlined CPU is brought back online again, the
+	 * originally allocated memory is reused in hv_common_cpu_init().
+	 */
 
 	return 0;
 }
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 0f1001d..696004a 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -201,6 +201,7 @@ enum cpuhp_state {
 	/* Online section invoked on the hotplugged CPU from the hotplug thread */
 	CPUHP_AP_ONLINE_IDLE,
 	CPUHP_AP_KVM_ONLINE,
+	CPUHP_AP_HYPERV_ONLINE,
 	CPUHP_AP_SCHED_WAIT_EMPTY,
 	CPUHP_AP_SMPBOOT_THREADS,
 	CPUHP_AP_X86_VDSO_VMA_ONLINE,
These commits

a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg")
2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs")

update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg
because that memory will be correctly marked as decrypted or encrypted
for all VM types (CoCo or normal). But problems ensue when CPUs in the
VM go online or offline after virtual PCI devices have been configured.

When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is
initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN.
But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which
may call the virtual PCI driver and fault trying to use the as yet
uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo
VM if the MMIO read and write hypercalls are used from state
CPUHP_AP_IRQ_AFFINITY_ONLINE.

When a CPU is taken offline, IRQs may be reassigned in state
CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to
use the hyperv_pcpu_input_arg that has already been freed by a
higher state.

Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE
immediately after CPUHP_AP_ONLINE_IDLE (similar to CPUHP_AP_KVM_ONLINE)
and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for
Hyper-V initialization so that hyperv_pcpu_input_arg is allocated
early enough.

Fix the offlining problem by not freeing hyperv_pcpu_input_arg when
a CPU goes offline. Retain the allocated memory, and reuse it if
the CPU comes back online later.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
---
 arch/x86/hyperv/hv_init.c  |  2 +-
 drivers/hv/hv_common.c     | 48 +++++++++++++++++++++++-----------------------
 include/linux/cpuhotplug.h |  1 +
 3 files changed, 26 insertions(+), 25 deletions(-)
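Distilled from the hv_common.c changes in the patch, the retain-and-reuse
pattern looks like this in isolation; demo_arg and the demo_cpu_* functions
are illustrative stand-ins, not the kernel's hyperv_pcpu_input_arg machinery:

#include <linux/cpuhotplug.h>
#include <linux/irqflags.h>
#include <linux/percpu.h>
#include <linux/slab.h>

static DEFINE_PER_CPU(void *, demo_arg);

static int demo_cpu_init(unsigned int cpu)
{
	void **arg = per_cpu_ptr(&demo_arg, cpu);

	/* Memory retained across an earlier offline of this CPU is reused. */
	if (!*arg) {
		*arg = kmalloc(PAGE_SIZE,
			       irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
		if (!*arg)
			return -ENOMEM;
	}
	return 0;
}

static int demo_cpu_die(unsigned int cpu)
{
	/*
	 * Deliberately keep the allocation: teardown states that run later
	 * in the offline sequence (e.g. IRQ migration in CPUHP_TEARDOWN_CPU)
	 * may still dereference it, and demo_cpu_init() reuses it if the
	 * CPU comes back online.
	 */
	return 0;
}

Registered with cpuhp_setup_state() at a state ordered before
CPUHP_AP_IRQ_AFFINITY_ONLINE, demo_cpu_init() runs early enough during
onlining, and the no-op demo_cpu_die() leaves the buffer valid all the way
through CPUHP_TEARDOWN_CPU during offlining.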