Message ID | f9c2583332d83fe76c3d98e215c76b7b111650e3.1592496443.git.hubert.jasudowicz@cert.pl
---|---
State | New, archived
Series | x86/cpuid: Expose number of vCPUs in CPUID.1.EBX
On 18/06/2020 17:22, Hubert Jasudowicz wrote:
> When running under KVM (or presumably other hypervisors) we enable
> the CPUID.1.EDX.HTT flag, thus indicating validity of CPUID.1.EBX[23:16]
> - the maximum number of logical processors - which the guest reads as 0.
>
> Although this method of topology detection is considered legacy,
> Windows falls back to it when CPUID.0BH.EBX is 0.
>
> CPUID.1.EBX[23:16] being equal to 0 triggers memory corruption in
> ntoskrnl.exe, as Windows assumes that the number of logical processors
> is at least 1. The memory corruption manifests itself while mapping the
> framebuffer for the early graphical subsystem, causing a BSOD.
>
> This patch fixes running nested Windows (tested on 7 and 10) with KVM
> as the L0 hypervisor, by setting the value to the maximum number of
> vCPUs in the domain.
>
> Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>

I'm afraid fixing guest topology is more complicated than just this. On
its own, I'm not sure if this is safe for VMs migrating in.

While I agree that Xen's logic is definitely broken, I suspect the
conditions for the BSOD are more complicated than this, because Windows
does work fine when there is no KVM in the setup described.

~Andrew
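For concreteness, here is a minimal sketch (not Windows' actual code, and with the max-leaf check omitted) of the legacy fallback the commit message describes: a guest that finds CPUID.0BH.EBX zero falls back to CPUID.1.EBX[23:16] when HTT is set, and a value of 0 there violates the "at least 1" assumption.

```c
/* Illustrative sketch of the legacy topology fallback, assuming GCC/Clang. */
#include <stdint.h>
#include <stdio.h>
#include <cpuid.h>   /* __cpuid_count() */

static unsigned int guest_logical_cpus(void)
{
    uint32_t eax, ebx, ecx, edx;

    /* Preferred: extended topology leaf 0Bh. */
    __cpuid_count(0x0b, 0, eax, ebx, ecx, edx);
    if ( ebx != 0 )
        return ebx & 0xffff;   /* logical processors at this level */

    /* Legacy fallback: leaf 1, EBX[23:16], valid when EDX.HTT (bit 28) set. */
    __cpuid_count(0x01, 0, eax, ebx, ecx, edx);
    if ( edx & (1u << 28) )
        return (ebx >> 16) & 0xff;   /* 0 here is what trips Windows */

    return 1;   /* no HTT: a single logical processor */
}

int main(void)
{
    printf("logical CPUs: %u\n", guest_logical_cpus());
    return 0;
}
```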
On 6/18/20 6:51 PM, Andrew Cooper wrote:
> On 18/06/2020 17:22, Hubert Jasudowicz wrote:
>> When running under KVM (or presumably other hypervisors) we enable
>> the CPUID.1.EDX.HTT flag, thus indicating validity of CPUID.1.EBX[23:16]
>> - the maximum number of logical processors - which the guest reads as 0.
>>
>> Although this method of topology detection is considered legacy,
>> Windows falls back to it when CPUID.0BH.EBX is 0.
>>
>> CPUID.1.EBX[23:16] being equal to 0 triggers memory corruption in
>> ntoskrnl.exe, as Windows assumes that the number of logical processors
>> is at least 1. The memory corruption manifests itself while mapping the
>> framebuffer for the early graphical subsystem, causing a BSOD.
>>
>> This patch fixes running nested Windows (tested on 7 and 10) with KVM
>> as the L0 hypervisor, by setting the value to the maximum number of
>> vCPUs in the domain.
>>
>> Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
>
> I'm afraid fixing guest topology is more complicated than just this. On
> its own, I'm not sure if this is safe for VMs migrating in.
>
> While I agree that Xen's logic is definitely broken, I suspect the
> conditions for the BSOD are more complicated than this, because Windows
> does work fine when there is no KVM in the setup described.
>
> ~Andrew

After some more testing, I've managed to boot Windows by explicitly
configuring the guest with cpuid="host,htt=0". If I understand correctly,
the default behavior is to enable HTT for the guest and basically pass
through the value of CPUID.1.EBX[23:16] without any sanity checks.

The reason this works in other setups is that the non-zero value returned
by real hardware leaks into the guest. In my setup, what Xen sees is:

CPUID.1h == EAX: 000806ea EBX: 00000800 ECX: fffab223 EDX: 0f8bfbff

In terms of VM migration, this seems already broken, because the guest
might read different values depending on what the underlying hardware
reports. The patch would at least provide some consistency between hosts.
Another solution would be not to enable the HTT bit by default.

Kind regards,
Hubert Jasudowicz
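Decoding the raw leaf quoted above makes the failure visible. A quick illustrative snippet (the constants are the values reported above; everything else is scaffolding):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t ebx = 0x00000800, edx = 0x0f8bfbff;

    printf("EBX[23:16] (logical CPU count) = %u\n", (ebx >> 16) & 0xff); /* 0 */
    printf("EBX[15:8]  (CLFLUSH line size) = %u\n", (ebx >> 8) & 0xff);  /* 8, i.e. 64 bytes */
    printf("EDX.HTT (bit 28)               = %u\n", (edx >> 28) & 1);    /* 0: KVM leaves HTT clear */
    return 0;
}
```

So KVM hands Xen HTT=0 with a zero logical-processor count; Xen's default policy then turns HTT on for the guest without making the count field non-zero.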
On 19/06/2020 15:19, Hubert Jasudowicz wrote:
> On 6/18/20 6:51 PM, Andrew Cooper wrote:
>> On 18/06/2020 17:22, Hubert Jasudowicz wrote:
>>> When running under KVM (or presumably other hypervisors) we enable
>>> the CPUID.1.EDX.HTT flag, thus indicating validity of CPUID.1.EBX[23:16]
>>> - the maximum number of logical processors - which the guest reads as 0.
>>>
>>> Although this method of topology detection is considered legacy,
>>> Windows falls back to it when CPUID.0BH.EBX is 0.
>>>
>>> CPUID.1.EBX[23:16] being equal to 0 triggers memory corruption in
>>> ntoskrnl.exe, as Windows assumes that the number of logical processors
>>> is at least 1. The memory corruption manifests itself while mapping the
>>> framebuffer for the early graphical subsystem, causing a BSOD.
>>>
>>> This patch fixes running nested Windows (tested on 7 and 10) with KVM
>>> as the L0 hypervisor, by setting the value to the maximum number of
>>> vCPUs in the domain.
>>>
>>> Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
>> I'm afraid fixing guest topology is more complicated than just this. On
>> its own, I'm not sure if this is safe for VMs migrating in.
>>
>> While I agree that Xen's logic is definitely broken, I suspect the
>> conditions for the BSOD are more complicated than this, because Windows
>> does work fine when there is no KVM in the setup described.
>>
>> ~Andrew
>>
> After some more testing, I've managed to boot Windows by explicitly
> configuring the guest with cpuid="host,htt=0". If I understand correctly,
> the default behavior is to enable HTT for the guest and basically pass
> through the value of CPUID.1.EBX[23:16] without any sanity checks.
>
> The reason this works in other setups is that the non-zero value returned
> by real hardware leaks into the guest. In my setup, what Xen sees is:
>
> CPUID.1h == EAX: 000806ea EBX: 00000800 ECX: fffab223 EDX: 0f8bfbff
>
> In terms of VM migration, this seems already broken, because the guest
> might read different values depending on what the underlying hardware
> reports. The patch would at least provide some consistency between hosts.
> Another solution would be not to enable the HTT bit by default.

Apologies for the delay replying. (I've been attempting to finish the
reply for more than a week now, but am just far too busy.)

Xen's behaviour is definitely buggy. I'm not trying to defend the mess
it is currently in.

The problem started (AFAICT) with c/s ca2eee92df44 in Xen 3.4 (yup -
you're reading that right), which is still reverted in XenServer because
it broke migration across that changeset. (We also have other topology
extensions which are broken in different ways, and I'm still attempting
to unbreak upstream Xen enough to fix it properly.)

That changeset attempted to expose hyperthreads, but keep them somewhat
hidden, by blindly asserting that the APIC_ID shall now be vcpu_id * 2.

Starting with 4.14-rc3, the logic patched above can now distinguish
between a clean boot and a migration in from a pre-4.14 version of Xen,
where the CPUID settings need re-inventing out of thin air.

Anyway - to this problem specifically.

It seems KVM is giving us HTT=0 and NC=0. The botched logic above has
clearly not been run on a pre-HTT processor, and it trips up properly
under KVM's way of doing things.

How is the rest of the topology expressed? Do we get one socket per
vcpu then, or is this example a single-vcpu VM?

I'm wondering if the option least likely to break migration under the
current scheme would be to have Xen invent a nonzero number there in the
HVM policy, alongside setting HTT.

~Andrew
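A rough, self-contained sketch of what that suggestion might look like. This is an illustration of the idea, not a proposed patch; the field names ('htt', 'lppp' for the logical-processors-per-package count in CPUID.1.EBX[23:16]) only loosely mimic Xen's cpu_policy decoding and are assumptions here.

```c
/* Sketch of "invent a nonzero count alongside HTT" - not Xen code. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct policy_basic {
    bool    htt;    /* CPUID.1.EDX.HTT (bit 28), hypothetical layout */
    uint8_t lppp;   /* CPUID.1.EBX[23:16], hypothetical layout */
};

/* If the policy advertises HTT, never leave the legacy count at 0. */
static void sanitise_leaf1(struct policy_basic *p)
{
    if ( p->htt && p->lppp == 0 )
        p->lppp = 2;    /* any nonzero value satisfies legacy guests */
}

int main(void)
{
    struct policy_basic p = { .htt = true, .lppp = 0 };   /* the KVM case */

    sanitise_leaf1(&p);
    printf("lppp = %u\n", p.lppp);   /* 2 */
    return 0;
}
```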
On 6/30/20 10:49 PM, Andrew Cooper wrote:
> On 19/06/2020 15:19, Hubert Jasudowicz wrote:
>> On 6/18/20 6:51 PM, Andrew Cooper wrote:
>>> On 18/06/2020 17:22, Hubert Jasudowicz wrote:
>>>> When running under KVM (or presumably other hypervisors) we enable
>>>> the CPUID.1.EDX.HTT flag, thus indicating validity of CPUID.1.EBX[23:16]
>>>> - the maximum number of logical processors - which the guest reads as 0.
>>>>
>>>> Although this method of topology detection is considered legacy,
>>>> Windows falls back to it when CPUID.0BH.EBX is 0.
>>>>
>>>> CPUID.1.EBX[23:16] being equal to 0 triggers memory corruption in
>>>> ntoskrnl.exe, as Windows assumes that the number of logical processors
>>>> is at least 1. The memory corruption manifests itself while mapping the
>>>> framebuffer for the early graphical subsystem, causing a BSOD.
>>>>
>>>> This patch fixes running nested Windows (tested on 7 and 10) with KVM
>>>> as the L0 hypervisor, by setting the value to the maximum number of
>>>> vCPUs in the domain.
>>>>
>>>> Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
>>> I'm afraid fixing guest topology is more complicated than just this. On
>>> its own, I'm not sure if this is safe for VMs migrating in.
>>>
>>> While I agree that Xen's logic is definitely broken, I suspect the
>>> conditions for the BSOD are more complicated than this, because Windows
>>> does work fine when there is no KVM in the setup described.
>>>
>>> ~Andrew
>>>
>> After some more testing, I've managed to boot Windows by explicitly
>> configuring the guest with cpuid="host,htt=0". If I understand correctly,
>> the default behavior is to enable HTT for the guest and basically pass
>> through the value of CPUID.1.EBX[23:16] without any sanity checks.
>>
>> The reason this works in other setups is that the non-zero value returned
>> by real hardware leaks into the guest. In my setup, what Xen sees is:
>>
>> CPUID.1h == EAX: 000806ea EBX: 00000800 ECX: fffab223 EDX: 0f8bfbff
>>
>> In terms of VM migration, this seems already broken, because the guest
>> might read different values depending on what the underlying hardware
>> reports. The patch would at least provide some consistency between hosts.
>> Another solution would be not to enable the HTT bit by default.
>
> Apologies for the delay replying. (I've been attempting to finish the
> reply for more than a week now, but am just far too busy.)
>
No worries. I understand that it's always too much code to review and
too few maintainers. ;)
>
> Xen's behaviour is definitely buggy. I'm not trying to defend the mess
> it is currently in.
>
> The problem started (AFAICT) with c/s ca2eee92df44 in Xen 3.4 (yup -
> you're reading that right), which is still reverted in XenServer because
> it broke migration across that changeset. (We also have other topology
> extensions which are broken in different ways, and I'm still attempting
> to unbreak upstream Xen enough to fix it properly.)
>
> That changeset attempted to expose hyperthreads, but keep them somewhat
> hidden, by blindly asserting that the APIC_ID shall now be vcpu_id * 2.
>
> Starting with 4.14-rc3, the logic patched above can now distinguish
> between a clean boot and a migration in from a pre-4.14 version of Xen,
> where the CPUID settings need re-inventing out of thin air.
>
> Anyway - to this problem specifically.
>
> It seems KVM is giving us HTT=0 and NC=0. The botched logic above has
> clearly not been run on a pre-HTT processor, and it trips up properly
> under KVM's way of doing things.
>
> How is the rest of the topology expressed? Do we get one socket per
> vcpu then, or is this example a single-vcpu VM?

The default way of exposing topology when specifying -smp [cpu number] on
the QEMU command line is 1 socket, 1 core, 1 thread for each vCPU.

I've fiddled with the switches, and when I configured QEMU with
-smp cores=2,sockets=2,threads=2, Xen sees the leaf as:

CPUID.1h == EAX: 806ea EBX: 40800 ECX: fffa3223 EDX: 1f8bfbff

so, as you can see, the HTT bit is now on, and thus EBX[23:16] makes
sense, being equal to the number of threads * number of cores for this
socket. This also makes Windows boot without overriding the cpuid policy.

> I'm wondering if the option least likely to break migration under the
> current scheme would be to have Xen invent a nonzero number there in the
> HVM policy, alongside setting HTT.

This would probably fix the issue and (hopefully) not break anything.
However, I don't really understand the rationale behind setting the HTT
bit on by default, beyond avoiding the topology looking "weird" to the
guest when it has multiple sockets, each with a single core. Can you
elaborate on that?

Hubert Jasudowicz
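The same decode as before, applied to the 2-socket/2-core/2-thread case, shows why Windows is happy there (again, purely illustrative arithmetic on the reported values):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t ebx = 0x00040800, edx = 0x1f8bfbff;

    printf("EDX.HTT    = %u\n", (edx >> 28) & 1);    /* 1: now set */
    printf("EBX[23:16] = %u\n", (ebx >> 16) & 0xff); /* 4 = 2 cores * 2 threads */
    return 0;
}
```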
```diff
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index ee11087626..bf38398ef3 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -811,10 +811,12 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
     case 0x1:
         /* TODO: Rework topology logic. */
-        res->b &= 0x00ffffffu;
+        res->b &= 0x0000ffffu;
 
         if ( is_hvm_domain(d) )
             res->b |= (v->vcpu_id * 2) << 24;
 
+        res->b |= (d->max_vcpus & 0xff) << 16;
+
         /* TODO: Rework vPMU control in terms of toolstack choices. */
         if ( vpmu_available(v) &&
              vpmu_is_set(vcpu_vpmu(v), VPMU_CPU_HAS_DS) )
```
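To see what the patched hunk yields, here is an illustrative recomputation for a hypothetical 4-vCPU HVM domain, as seen from vCPU 1 (the vcpu_id/max_vcpus values are made up for the example; this is not Xen code):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t ebx = 0x00000800;   /* what KVM hands us: count field 0 */
    unsigned int vcpu_id = 1, max_vcpus = 4, hvm = 1;

    ebx &= 0x0000ffffu;                  /* clear APIC ID and CPU count */
    if ( hvm )
        ebx |= (vcpu_id * 2) << 24;      /* the APIC_ID = vcpu_id * 2 scheme */
    ebx |= (max_vcpus & 0xff) << 16;     /* never 0 for a real domain */

    printf("EBX = %08x\n", ebx);         /* 02040800 */
    return 0;
}
```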
When running under KVM (or presumably other hypervisors) we enable the
CPUID.1.EDX.HTT flag, thus indicating validity of CPUID.1.EBX[23:16] - the
maximum number of logical processors - which the guest reads as 0.

Although this method of topology detection is considered legacy, Windows
falls back to it when CPUID.0BH.EBX is 0.

CPUID.1.EBX[23:16] being equal to 0 triggers memory corruption in
ntoskrnl.exe, as Windows assumes that the number of logical processors is
at least 1. The memory corruption manifests itself while mapping the
framebuffer for the early graphical subsystem, causing a BSOD.

This patch fixes running nested Windows (tested on 7 and 10) with KVM as
the L0 hypervisor, by setting the value to the maximum number of vCPUs in
the domain.

Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
---
 xen/arch/x86/cpuid.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)