Message ID | b080fa0f-08d2-34d0-3f54-549e1303eeb4@suse.com (mailing list archive) |
---|---|
State | Superseded |
Series | libxc/x86: avoid overflow in CPUID APIC ID adjustments |
On 19/09/2019 12:48, Jan Beulich wrote:
> Recent AMD processors may report up to 128 logical processors in CPUID
> leaf 1. Doubling this value produces 0 (which OSes sincerely dislike),
> as the respective field is only 8 bits wide. Suppress doubling the value
> (and its leaf 0x80000008 counterpart) in such a case.
>
> Additionally don't even do any adjustment when the host topology already
> includes room for multiple threads per core.
>
> Furthermore don't double the Maximum Cores Per Package at all - by us
> introducing a fake HTT effect, the core count doesn't need to change.
> Instead adjust the Maximum Logical Processors Sharing Cache field, which
> so far was zapped altogether.
>
> Also zap leaf 4 (and at the same time leaf 2) EDX output for AMD.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> TBD: Using xc_physinfo() output here needs a better solution. The
>      threads_per_core value returned is the count of active siblings of
>      CPU 0, rather than a system wide applicable value (and constant
>      over the entire session). Using CPUID output (leaves 4 and
>      8000001e) doesn't look viable either, due to this not really being
>      the host values on PVH. Judging from the host feature set's HTT
>      flag also wouldn't tell us whether there actually are multiple
>      threads per core.

The key thing is that htt != "more than one thread per core". HTT is
strictly a bit indicating that topology information is available in a
new form in the CPUID leaves (or in AMD's case, the same information
should be interpreted in a new way). Just because HTT is set (and it
does get set in non-HT capable systems), doesn't mean there is space
for more than one thread per core in the topology information.

For PV guests, my adjustment in the CPUID series shows (what I believe
to be) the only correct way of propagating the host HTT/CMP_LEGACY
settings through.

For HVM guests, it really shouldn't have anything to do with the host
setting. We should be choosing how many threads/core to give to the
guest, then constructing the topology information from first principles.

Ignore the PVH case. It is totally broken for several other reasons as
well, and PVH Dom0 isn't a production-ready thing yet.

This gets us back to the PV case, where the host information is
actually in view and (for backport purposes) can be trusted.

~Andrew
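To make the distinction concrete, here is a minimal user-space sketch (illustrative only, not part of the thread): it assumes GCC's <cpuid.h>, an AMD processor implementing leaf 0x8000001e, and the helper name amd_smt_active() is invented for the example.

#include <cpuid.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: the HTT flag (leaf 1 EDX[28]) only says that
 * topology is reported the "new" way; whether there really is more
 * than one thread per core has to come from elsewhere, e.g. leaf
 * 0x8000001e on recent AMD parts.
 */
static bool amd_smt_active(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* HTT alone proves nothing about threads per core. */
    if ( !__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(edx & (1u << 28)) )
        return false;

    /* Leaf 0x8000001e EBX[15:8] is ThreadsPerCore (minus one) where the
     * leaf exists; __get_cpuid() returns 0 if the leaf is out of range. */
    if ( !__get_cpuid(0x8000001e, &eax, &ebx, &ecx, &edx) )
        return false;

    return ((ebx >> 8) & 0xff) + 1 > 1;
}

int main(void)
{
    printf("more than one thread per core: %d\n", amd_smt_active());
    return 0;
}

As the TBD above notes, inside a PV/PVH environment such CPUID reads need not reflect real host values, which is exactly why this cannot simply substitute for xc_physinfo() in the toolstack.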
On 20.09.2019 12:05, Andrew Cooper wrote:
> On 19/09/2019 12:48, Jan Beulich wrote:
>> Recent AMD processors may report up to 128 logical processors in CPUID
>> leaf 1. Doubling this value produces 0 (which OSes sincerely dislike),
>> as the respective field is only 8 bits wide. Suppress doubling the value
>> (and its leaf 0x80000008 counterpart) in such a case.
>>
>> Additionally don't even do any adjustment when the host topology already
>> includes room for multiple threads per core.
>>
>> Furthermore don't double the Maximum Cores Per Package at all - by us
>> introducing a fake HTT effect, the core count doesn't need to change.
>> Instead adjust the Maximum Logical Processors Sharing Cache field, which
>> so far was zapped altogether.
>>
>> Also zap leaf 4 (and at the same time leaf 2) EDX output for AMD.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> TBD: Using xc_physinfo() output here needs a better solution. The
>>      threads_per_core value returned is the count of active siblings of
>>      CPU 0, rather than a system wide applicable value (and constant
>>      over the entire session). Using CPUID output (leaves 4 and
>>      8000001e) doesn't look viable either, due to this not really being
>>      the host values on PVH. Judging from the host feature set's HTT
>>      flag also wouldn't tell us whether there actually are multiple
>>      threads per core.
>
> The key thing is that htt != "more than one thread per core". HTT is
> strictly a bit indicating that topology information is available in a
> new form in the CPUID leaves (or in AMD's case, the same information
> should be interpreted in a new way). Just because HTT is set (and it
> does get set in non-HT capable systems), doesn't mean there is space
> for more than one thread per core in the topology information.
>
> For PV guests, my adjustment in the CPUID series shows (what I believe
> to be) the only correct way of propagating the host HTT/CMP_LEGACY
> settings through.
>
> For HVM guests, it really shouldn't have anything to do with the host
> setting. We should be choosing how many threads/core to give to the
> guest, then constructing the topology information from first principles.
>
> Ignore the PVH case. It is totally broken for several other reasons as
> well, and PVH Dom0 isn't a production-ready thing yet.
>
> This gets us back to the PV case, where the host information is
> actually in view and (for backport purposes) can be trusted.

Okay, this means I'll revive and finish the partial cpuid()-based
attempt I had made initially. A fundamental question from your reply
remains open, though: Do you agree with the idea of avoiding the
multiplication by 2 if the host topology already provides at least one
bit of thread ID within the APIC ID? Related to that: do you also agree
with my approach of ditching the adjustment to Maximum Cores Per
Package? (I'm asking because I'd like to avoid several more round trips
of the patch itself.)

Jan
On 20/09/2019 11:20, Jan Beulich wrote:
> On 20.09.2019 12:05, Andrew Cooper wrote:
>> On 19/09/2019 12:48, Jan Beulich wrote:
>>> Recent AMD processors may report up to 128 logical processors in CPUID
>>> leaf 1. Doubling this value produces 0 (which OSes sincerely dislike),
>>> as the respective field is only 8 bits wide. Suppress doubling the value
>>> (and its leaf 0x80000008 counterpart) in such a case.
>>>
>>> Additionally don't even do any adjustment when the host topology already
>>> includes room for multiple threads per core.
>>>
>>> Furthermore don't double the Maximum Cores Per Package at all - by us
>>> introducing a fake HTT effect, the core count doesn't need to change.
>>> Instead adjust the Maximum Logical Processors Sharing Cache field, which
>>> so far was zapped altogether.
>>>
>>> Also zap leaf 4 (and at the same time leaf 2) EDX output for AMD.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> TBD: Using xc_physinfo() output here needs a better solution. The
>>>      threads_per_core value returned is the count of active siblings of
>>>      CPU 0, rather than a system wide applicable value (and constant
>>>      over the entire session). Using CPUID output (leaves 4 and
>>>      8000001e) doesn't look viable either, due to this not really being
>>>      the host values on PVH. Judging from the host feature set's HTT
>>>      flag also wouldn't tell us whether there actually are multiple
>>>      threads per core.
>>
>> The key thing is that htt != "more than one thread per core". HTT is
>> strictly a bit indicating that topology information is available in a
>> new form in the CPUID leaves (or in AMD's case, the same information
>> should be interpreted in a new way). Just because HTT is set (and it
>> does get set in non-HT capable systems), doesn't mean there is space
>> for more than one thread per core in the topology information.
>>
>> For PV guests, my adjustment in the CPUID series shows (what I believe
>> to be) the only correct way of propagating the host HTT/CMP_LEGACY
>> settings through.
>>
>> For HVM guests, it really shouldn't have anything to do with the host
>> setting. We should be choosing how many threads/core to give to the
>> guest, then constructing the topology information from first principles.
>>
>> Ignore the PVH case. It is totally broken for several other reasons as
>> well, and PVH Dom0 isn't a production-ready thing yet.
>>
>> This gets us back to the PV case, where the host information is
>> actually in view and (for backport purposes) can be trusted.
>
> Okay, this means I'll revive and finish the partial cpuid()-based
> attempt I had made initially. A fundamental question from your reply
> remains open, though: Do you agree with the idea of avoiding the
> multiplication by 2 if the host topology already provides at least one
> bit of thread ID within the APIC ID?

In theory, yes. In practice, I'd err on the side of not.

A further problem with CPUID handling is that it is recalculated from
scratch even after migrate. Therefore, any changes to the algorithm
will cause inconsistencies to be seen in the guest across
migrate/upgrade. This problem becomes substantially worse if the patch
is backported to stable trees.

Now that get_cpu_policy has existed for a little while, and
set_cpu_policy is imminent, fixing the "CPUID changes across migrate"
problem is almost doable, and is on the plan for toolstack work.

That said, ultimately, anything "pre 4.14" => "4.14" is going to hit a
discontinuity, because there is information discarded on the source
side which can't be reconstructed on the destination.

Overall, I would suggest doing the absolute minimum change required to
unbreak Rome CPUs. Everything further is going to cause differences
across migrate.

In 4.14, I think we can reasonably fix all of:

1) CPUID data discarded for migrate
2) domain builder uses native CPUID
3) topology handling isn't consistent with SDM/APM

all of which is libxc/libxl work, once set_cpu_policy() is in place.

~Andrew
On 20.09.2019 14:40, Andrew Cooper wrote:
> Overall, I would suggest doing the absolute minimum change required to
> unbreak Rome CPUs.

Well, okay, I'm zapping everything else, but it feels wrong to leave in
place the similar overflow in intel_xc_cpuid_policy().

Jan
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -251,6 +251,8 @@ struct cpuid_domain_info
     uint32_t *featureset;
     unsigned int nr_features;
 
+    bool host_htt;
+
     /* PV-only information. */
     bool pv64;
 
@@ -290,6 +292,7 @@ static int get_cpuid_domain_info(xc_inte
     xc_dominfo_t di;
     unsigned int in[2] = { 0, ~0U }, regs[4];
     unsigned int i, host_nr_features = xc_get_cpu_featureset_size();
+    xc_physinfo_t physinfo;
     int rc;
 
     cpuid(in, regs);
@@ -343,6 +346,10 @@ static int get_cpuid_domain_info(xc_inte
 
     info->xfeature_mask = domctl.u.vcpuextstate.xfeature_mask;
 
+    rc = xc_physinfo(xch, &physinfo);
+    if ( !rc && physinfo.threads_per_core > 1 )
+        info->host_htt = true;
+
     if ( di.hvm )
     {
         uint64_t val;
@@ -385,7 +392,7 @@ static void amd_xc_cpuid_policy(const st
     {
     case 0x00000002:
     case 0x00000004:
-        regs[0] = regs[1] = regs[2] = 0;
+        regs[0] = regs[1] = regs[2] = regs[3] = 0;
         break;
 
     case 0x80000000:
@@ -395,11 +402,25 @@ static void amd_xc_cpuid_policy(const st
 
     case 0x80000008:
         /*
-         * ECX[15:12] is ApicIdCoreSize: ECX[7:0] is NumberOfCores (minus one).
-         * Update to reflect vLAPIC_ID = vCPU_ID * 2.
+         * ECX[15:12] is ApicIdCoreSize.
+         * ECX[7:0] is NumberOfCores (minus one).
+         */
+        if ( info->host_htt )
+        {
+            regs[2] &= 0xf0ffu;
+            break;
+        }
+        /*
+         * Update to reflect vLAPIC_ID = vCPU_ID * 2. But make sure to avoid
+         * - overflow,
+         * - going out of sync with leaf 1 EBX[23:16],
+         * - incrementing ApicIdCoreSize when it's zero (which changes the
+         *   meaning of bits 7:0).
          */
-        regs[2] = ((regs[2] + (1u << 12)) & 0xf000u) |
-                  ((regs[2] & 0xffu) << 1) | 1u;
+        if ( (regs[2] & 0xf000u) && (regs[2] & 0xf000u) != 0xf000u )
+            regs[2] = ((regs[2] + (1u << 12)) & 0xf000u) | (regs[2] & 0xffu);
+        if ( (regs[2] & 0x7fu) < 0x7fu )
+            regs[2] = (regs[2] & 0xf000u) | ((regs[2] & 0x7fu) << 1) | 1u;
         break;
 
     case 0x8000000a: {
@@ -442,10 +463,19 @@ static void intel_xc_cpuid_policy(const
     case 0x00000004:
         /*
          * EAX[31:26] is Maximum Cores Per Package (minus one).
-         * Update to reflect vLAPIC_ID = vCPU_ID * 2.
+         * EAX[25:14] is Maximum Logical Processors Sharing Cache (minus one).
          */
-        regs[0] = (((regs[0] & 0x7c000000u) << 1) | 0x04000000u |
-                   (regs[0] & 0x3ffu));
+        if ( info->host_htt )
+            regs[0] &= 0xffffc3ffu;
+        else
+        {
+            /*
+             * Update to reflect vLAPIC_ID = vCPU_ID * 2. Note that overflow
+             * is sufficiently benign here.
+             */
+            regs[0] = (((regs[0] | 0x00002000u) << 1) & 0x03ffc000u) |
+                      (regs[0] & 0xfc0003ffu);
+        }
         regs[3] &= 0x3ffu;
         break;
 
@@ -478,9 +508,13 @@ static void xc_cpuid_hvm_policy(const st
     case 0x00000001:
         /*
          * EBX[23:16] is Maximum Logical Processors Per Package.
-         * Update to reflect vLAPIC_ID = vCPU_ID * 2.
+         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
+         * overflow.
          */
-        regs[1] = (regs[1] & 0x0000ffffu) | ((regs[1] & 0x007f0000u) << 1);
+        if ( !info->host_htt && !(regs[1] & 0x00800000u) )
+            regs[1] = (regs[1] & 0x0000ffffu) | ((regs[1] & 0x007f0000u) << 1);
+        else
+            regs[1] &= 0x00ffffffu;
 
         regs[2] = info->featureset[featureword_of(X86_FEATURE_SSE3)];
         regs[3] = (info->featureset[featureword_of(X86_FEATURE_FPU)] |
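For reference, the reworked leaf 0x80000008 ECX handling above can be exercised in isolation. The sketch below is an illustrative stand-alone transcription of the non-host_htt path (the function name and sample register values are invented for the example), showing the corner cases the new guards are meant to cover.

#include <stdio.h>

/*
 * Illustrative transcription of the patched leaf 0x80000008 ECX logic
 * (host_htt == false path). ECX[15:12] is ApicIdCoreSize, ECX[7:0] is
 * NumberOfCores (minus one).
 */
static unsigned int adjust_leaf_80000008_ecx(unsigned int ecx)
{
    /* Grow ApicIdCoreSize only if it is neither 0 nor already 0xf. */
    if ( (ecx & 0xf000u) && (ecx & 0xf000u) != 0xf000u )
        ecx = ((ecx + (1u << 12)) & 0xf000u) | (ecx & 0xffu);

    /* Double NumberOfCores only while the doubled value still fits and
     * stays in sync with leaf 1 EBX[23:16]. */
    if ( (ecx & 0x7fu) < 0x7fu )
        ecx = (ecx & 0xf000u) | ((ecx & 0x7fu) << 1) | 1u;

    return ecx;
}

int main(void)
{
    /* NC = 0x3f (64 cores), ApicIdCoreSize 7: NC doubles to 0x7f and the
     * core size grows to 8 -> prints 0x807f. */
    printf("%#x\n", adjust_leaf_80000008_ecx(0x703fu));

    /* NC already 0x7f (128 cores, as on Rome): NC is left alone so it
     * stays in sync with leaf 1 EBX[23:16] -> prints 0x807f as well. */
    printf("%#x\n", adjust_leaf_80000008_ecx(0x707fu));
    return 0;
}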
Recent AMD processors may report up to 128 logical processors in CPUID
leaf 1. Doubling this value produces 0 (which OSes sincerely dislike),
as the respective field is only 8 bits wide. Suppress doubling the value
(and its leaf 0x80000008 counterpart) in such a case.

Additionally don't even do any adjustment when the host topology already
includes room for multiple threads per core.

Furthermore don't double the Maximum Cores Per Package at all - by us
introducing a fake HTT effect, the core count doesn't need to change.
Instead adjust the Maximum Logical Processors Sharing Cache field, which
so far was zapped altogether.

Also zap leaf 4 (and at the same time leaf 2) EDX output for AMD.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: Using xc_physinfo() output here needs a better solution. The
     threads_per_core value returned is the count of active siblings of
     CPU 0, rather than a system wide applicable value (and constant
     over the entire session). Using CPUID output (leaves 4 and
     8000001e) doesn't look viable either, due to this not really being
     the host values on PVH. Judging from the host feature set's HTT
     flag also wouldn't tell us whether there actually are multiple
     threads per core.
TBD: The adjustment of Maximum Logical Processors Sharing Cache should
     perhaps occur only if an adjustment to leaf 1 has occurred.
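To put concrete numbers on the overflow described in the first paragraph (the EBX value is a made-up Rome-like example; the expression is the pre-patch adjustment from xc_cpuid_hvm_policy() shown in the diff):

#include <stdio.h>

int main(void)
{
    /* Hypothetical leaf 1 EBX: 128 (0x80) logical processors in bits 23:16. */
    unsigned int ebx = 0x00800000u;

    /* Pre-patch adjustment: double bits 23:16 via a 7-bit mask and shift. */
    unsigned int adjusted = (ebx & 0x0000ffffu) | ((ebx & 0x007f0000u) << 1);

    /* 2 * 128 = 256 no longer fits in the 8-bit field, which collapses to 0. */
    printf("%u\n", (adjusted >> 16) & 0xff);   /* prints 0 */
    return 0;
}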