Message ID | 20210628132018.394994-1-pizhenwei@bytedance.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | target/i386: Fix cpuid level for AMD | expand |
* zhenwei pi (pizhenwei@bytedance.com) wrote: > A AMD server typically has cpuid level 0x10(test on Rome/Milan), it > should not be changed to 0x1f in multi-dies case. > > Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support > for multi-dies PCMachine) > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> (Copying in Babu) Hmm I think you're right. I've cc'd in Babu and Wei. Eduardo: What do we need to do about compatibility, do we need to wire this to machine type or CPU version? Dave > --- > target/i386/cpu.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > index a9fe1662d3..3934c559e4 100644 > --- a/target/i386/cpu.c > +++ b/target/i386/cpu.c > @@ -5961,8 +5961,12 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp) > } > } > > - /* CPU topology with multi-dies support requires CPUID[0x1F] */ > - if (env->nr_dies > 1) { > + /* > + * Intel CPU topology with multi-dies support requires CPUID[0x1F]. > + * For AMD Rome/Milan, cpuid level is 0x10, and guest OS should detect > + * extended toplogy by leaf 0xB. Only adjust it for Intel CPU. > + */ > + if ((env->nr_dies > 1) && IS_INTEL_CPU(env)) { > x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F); > } > > -- > 2.25.1 > >
On 6/29/21 9:06 AM, Dr. David Alan Gilbert wrote: > * zhenwei pi (pizhenwei@bytedance.com) wrote: >> A AMD server typically has cpuid level 0x10(test on Rome/Milan), it >> should not be changed to 0x1f in multi-dies case. >> >> Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support >> for multi-dies PCMachine) >> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> > > (Copying in Babu) > > Hmm I think you're right. I've cc'd in Babu and Wei. > > Eduardo: What do we need to do about compatibility, do we need to wire > this to machine type or CPU version? > > Dave > >> --- >> target/i386/cpu.c | 8 ++++++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/target/i386/cpu.c b/target/i386/cpu.c >> index a9fe1662d3..3934c559e4 100644 >> --- a/target/i386/cpu.c >> +++ b/target/i386/cpu.c >> @@ -5961,8 +5961,12 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp) >> } >> } >> >> - /* CPU topology with multi-dies support requires CPUID[0x1F] */ >> - if (env->nr_dies > 1) { >> + /* >> + * Intel CPU topology with multi-dies support requires CPUID[0x1F]. >> + * For AMD Rome/Milan, cpuid level is 0x10, and guest OS should detect >> + * extended toplogy by leaf 0xB. Only adjust it for Intel CPU. The patch appears correct to me. AMD should use leaf 0xB to detect extended topology. What is the problem here? Or is it just correcting the cpuid based on the SPECS? AMD uses nr_dies to simulate some topology. I dont know if it could become a problem after this patch. >> + */ >> + if ((env->nr_dies > 1) && IS_INTEL_CPU(env)) { >> x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F); >> } >> >> -- >> 2.25.1 >> >>
Quoting Dr. David Alan Gilbert (2021-06-29 09:06:02) > * zhenwei pi (pizhenwei@bytedance.com) wrote: > > A AMD server typically has cpuid level 0x10(test on Rome/Milan), it > > should not be changed to 0x1f in multi-dies case. > > > > Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support > > for multi-dies PCMachine) > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> > > (Copying in Babu) > > Hmm I think you're right. I've cc'd in Babu and Wei. > > Eduardo: What do we need to do about compatibility, do we need to wire > this to machine type or CPU version? FWIW, there are some other CPUID entries like leaves 2 and 4 that are also Intel-specific. With SEV-SNP CPUID enforcement, advertising them to guests will result in failures when host SNP firmware checks the hypervisor-provided CPUID values against the host-supported ones. To address this we've been planning to add an 'amd-cpuid-only' property to suppress them: https://github.com/mdroth/qemu/commit/28d0553fe748d30a8af09e5e58a7da3eff03e21b My thinking is this property should be off by default, and only defined either via explicit command-line option, or via new CPU types. We're also planning to add new CPU versions for EPYC* CPU types that set this 'amd-cpuid-only' property by default: https://github.com/mdroth/qemu/commits/new-cpu-types-upstream So in general I think maybe this change should be similarly controlled by this proposed 'amd-cpuid-only' property. Maybe for this particular case it's okay to do it unconditionally, but it sounds bad to switch up the valid CPUID range after a guest has already booted (which might happen with old->new migration for instance), since it might continue treating values in the range as valid afterward (but again, not sure that's the case here or not). There's some other changes with the new CPU types that we're still considering/testing internally, but should be able to post them in some form next week. -Mike > > Dave > > > --- > > target/i386/cpu.c | 8 ++++++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > > index a9fe1662d3..3934c559e4 100644 > > --- a/target/i386/cpu.c > > +++ b/target/i386/cpu.c > > @@ -5961,8 +5961,12 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp) > > } > > } > > > > - /* CPU topology with multi-dies support requires CPUID[0x1F] */ > > - if (env->nr_dies > 1) { > > + /* > > + * Intel CPU topology with multi-dies support requires CPUID[0x1F]. > > + * For AMD Rome/Milan, cpuid level is 0x10, and guest OS should detect > > + * extended toplogy by leaf 0xB. Only adjust it for Intel CPU. > > + */ > > + if ((env->nr_dies > 1) && IS_INTEL_CPU(env)) { > > x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F); > > } > > > > -- > > 2.25.1 > > > > > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > >
On Wed, 30 Jun 2021 14:18:09 -0500 Michael Roth <michael.roth@amd.com> wrote: > Quoting Dr. David Alan Gilbert (2021-06-29 09:06:02) > > * zhenwei pi (pizhenwei@bytedance.com) wrote: > > > A AMD server typically has cpuid level 0x10(test on Rome/Milan), it > > > should not be changed to 0x1f in multi-dies case. > > > > > > Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support > > > for multi-dies PCMachine) > > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> > > > > (Copying in Babu) > > > > Hmm I think you're right. I've cc'd in Babu and Wei. > > > > Eduardo: What do we need to do about compatibility, do we need to wire > > this to machine type or CPU version? > > FWIW, there are some other CPUID entries like leaves 2 and 4 that are > also Intel-specific. With SEV-SNP CPUID enforcement, advertising them to > guests will result in failures when host SNP firmware checks the > hypervisor-provided CPUID values against the host-supported ones. > > To address this we've been planning to add an 'amd-cpuid-only' property > to suppress them: > > https://github.com/mdroth/qemu/commit/28d0553fe748d30a8af09e5e58a7da3eff03e21b > > My thinking is this property should be off by default, and only defined > either via explicit command-line option, or via new CPU types. We're also > planning to add new CPU versions for EPYC* CPU types that set this > 'amd-cpuid-only' property by default: > > https://github.com/mdroth/qemu/commits/new-cpu-types-upstream It look like having new cpu versions is enough to change behavior, maybe keep 'amd-cpuid-only' as internal field and not expose it to users as a property. > So in general I think maybe this change should be similarly controlled by > this proposed 'amd-cpuid-only' property. Maybe for this particular case it's > okay to do it unconditionally, but it sounds bad to switch up the valid CPUID > range after a guest has already booted (which might happen with old->new > migration for instance), since it might continue treating values in the range > as valid afterward (but again, not sure that's the case here or not). > > There's some other changes with the new CPU types that we're still > considering/testing internally, but should be able to post them in some form > next week. > > -Mike > > > > > Dave > > > > > --- > > > target/i386/cpu.c | 8 ++++++-- > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > > > index a9fe1662d3..3934c559e4 100644 > > > --- a/target/i386/cpu.c > > > +++ b/target/i386/cpu.c > > > @@ -5961,8 +5961,12 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp) > > > } > > > } > > > > > > - /* CPU topology with multi-dies support requires CPUID[0x1F] */ > > > - if (env->nr_dies > 1) { > > > + /* > > > + * Intel CPU topology with multi-dies support requires CPUID[0x1F]. > > > + * For AMD Rome/Milan, cpuid level is 0x10, and guest OS should detect > > > + * extended toplogy by leaf 0xB. Only adjust it for Intel CPU. > > > + */ > > > + if ((env->nr_dies > 1) && IS_INTEL_CPU(env)) { > > > x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F); > > > } > > > > > > -- > > > 2.25.1 > > > > > > > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > > > >
Quoting Igor Mammedov (2021-07-01 03:43:13) > On Wed, 30 Jun 2021 14:18:09 -0500 > Michael Roth <michael.roth@amd.com> wrote: > > > Quoting Dr. David Alan Gilbert (2021-06-29 09:06:02) > > > * zhenwei pi (pizhenwei@bytedance.com) wrote: > > > > A AMD server typically has cpuid level 0x10(test on Rome/Milan), it > > > > should not be changed to 0x1f in multi-dies case. > > > > > > > > Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support > > > > for multi-dies PCMachine) > > > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> > > > > > > (Copying in Babu) > > > > > > Hmm I think you're right. I've cc'd in Babu and Wei. > > > > > > Eduardo: What do we need to do about compatibility, do we need to wire > > > this to machine type or CPU version? > > > > FWIW, there are some other CPUID entries like leaves 2 and 4 that are > > also Intel-specific. With SEV-SNP CPUID enforcement, advertising them to > > guests will result in failures when host SNP firmware checks the > > hypervisor-provided CPUID values against the host-supported ones. > > > > To address this we've been planning to add an 'amd-cpuid-only' property > > to suppress them: > > > > https://github.com/mdroth/qemu/commit/28d0553fe748d30a8af09e5e58a7da3eff03e21b > > > > My thinking is this property should be off by default, and only defined > > either via explicit command-line option, or via new CPU types. We're also > > planning to add new CPU versions for EPYC* CPU types that set this > > 'amd-cpuid-only' property by default: > > > > https://github.com/mdroth/qemu/commits/new-cpu-types-upstream > It look like having new cpu versions is enough to change behavior, > maybe keep 'amd-cpuid-only' as internal field and not expose it to users > as a property. Hmm, I defined it as a property mainly to make use of X86CPUVersionDefinition.props to create new versions of the CPU types with those properties set. There's a patch there that adds X86CPUVersionDefinition.cache_info so that new cache definitions can be provided for new CPU versions. So would you suggest a similar approach here, e.g. adding an X86CPUVersionDefinition.amd_cpuid_only field that could be used directly rather than going through X86CPUVersionDefinition.props? There's also another new "amd-xsave" prop in that series that does something similar to "amd-cpuid-only", so a little worried about tacking to much extra into X86CPUVersionDefinition. But maybe that one could just be rolled into "amd-cpuid-only" since it is basically fixing up xsave-related cpuid entries for AMD... > > > So in general I think maybe this change should be similarly controlled by > > this proposed 'amd-cpuid-only' property. Maybe for this particular case it's > > okay to do it unconditionally, but it sounds bad to switch up the valid CPUID > > range after a guest has already booted (which might happen with old->new > > migration for instance), since it might continue treating values in the range > > as valid afterward (but again, not sure that's the case here or not). > > > > There's some other changes with the new CPU types that we're still > > considering/testing internally, but should be able to post them in some form > > next week. > > > > -Mike > > > > > > > > Dave > > > > > > > --- > > > > target/i386/cpu.c | 8 ++++++-- > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > > > > index a9fe1662d3..3934c559e4 100644 > > > > --- a/target/i386/cpu.c > > > > +++ b/target/i386/cpu.c > > > > @@ -5961,8 +5961,12 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp) > > > > } > > > > } > > > > > > > > - /* CPU topology with multi-dies support requires CPUID[0x1F] */ > > > > - if (env->nr_dies > 1) { > > > > + /* > > > > + * Intel CPU topology with multi-dies support requires CPUID[0x1F]. > > > > + * For AMD Rome/Milan, cpuid level is 0x10, and guest OS should detect > > > > + * extended toplogy by leaf 0xB. Only adjust it for Intel CPU. > > > > + */ > > > > + if ((env->nr_dies > 1) && IS_INTEL_CPU(env)) { > > > > x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F); > > > > } > > > > > > > > -- > > > > 2.25.1 > > > > > > > > > > > -- > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > > > > > > > > >
On 7/2/21 4:35 AM, Michael Roth wrote: > Quoting Igor Mammedov (2021-07-01 03:43:13) >> On Wed, 30 Jun 2021 14:18:09 -0500 >> Michael Roth <michael.roth@amd.com> wrote: >> >>> Quoting Dr. David Alan Gilbert (2021-06-29 09:06:02) >>>> * zhenwei pi (pizhenwei@bytedance.com) wrote: >>>>> A AMD server typically has cpuid level 0x10(test on Rome/Milan), it >>>>> should not be changed to 0x1f in multi-dies case. >>>>> >>>>> Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support >>>>> for multi-dies PCMachine) >>>>> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> >>>> >>>> (Copying in Babu) >>>> >>>> Hmm I think you're right. I've cc'd in Babu and Wei. >>>> >>>> Eduardo: What do we need to do about compatibility, do we need to wire >>>> this to machine type or CPU version? >>> >>> FWIW, there are some other CPUID entries like leaves 2 and 4 that are >>> also Intel-specific. With SEV-SNP CPUID enforcement, advertising them to >>> guests will result in failures when host SNP firmware checks the >>> hypervisor-provided CPUID values against the host-supported ones. >>> >>> To address this we've been planning to add an 'amd-cpuid-only' property >>> to suppress them: >>> >>> https://github.com/mdroth/qemu/commit/28d0553fe748d30a8af09e5e58a7da3eff03e21b >>> >>> My thinking is this property should be off by default, and only defined >>> either via explicit command-line option, or via new CPU types. We're also >>> planning to add new CPU versions for EPYC* CPU types that set this >>> 'amd-cpuid-only' property by default: >>> >>> https://github.com/mdroth/qemu/commits/new-cpu-types-upstream >> It look like having new cpu versions is enough to change behavior, >> maybe keep 'amd-cpuid-only' as internal field and not expose it to users >> as a property. > > Hmm, I defined it as a property mainly to make use of > X86CPUVersionDefinition.props to create new versions of the CPU types > with those properties set. > > There's a patch there that adds X86CPUVersionDefinition.cache_info so > that new cache definitions can be provided for new CPU versions. So > would you suggest a similar approach here, e.g. adding an > X86CPUVersionDefinition.amd_cpuid_only field that could be used directly > rather than going through X86CPUVersionDefinition.props? > > There's also another new "amd-xsave" prop in that series that does something > similar to "amd-cpuid-only", so a little worried about tacking to much extra > into X86CPUVersionDefinition. But maybe that one could just be rolled into > "amd-cpuid-only" since it is basically fixing up xsave-related cpuid > entries for AMD... > Hi, this patch wants to fix the issue: AMD CPU (Rome/Milan) should get the cpuid level 0x10, not 0x1F in the guest. If QEMU reports a 0x1F to guest, guest(Linux) would use leaf 0x1F instead of leaf 0xB to get extended topology: https://github.com/torvalds/linux/blob/master/arch/x86/kernel/cpu/topology.c#L49 static int detect_extended_topology_leaf(struct cpuinfo_x86 *c) { if (c->cpuid_level >= 0x1f) { if (check_extended_topology_leaf(0x1f) == 0) return 0x1f; } if (c->cpuid_level >= 0xb) { if (check_extended_topology_leaf(0xb) == 0) return 0xb; } return -1; } Because of the wrong cpuid level, the guest gets unexpected topology from leaf 0x1F. I tested https://github.com/mdroth/qemu/commits/new-cpu-types-upstream, and it seems that these patches could not fix this issue. >> >>> So in general I think maybe this change should be similarly controlled by >>> this proposed 'amd-cpuid-only' property. Maybe for this particular case it's >>> okay to do it unconditionally, but it sounds bad to switch up the valid CPUID >>> range after a guest has already booted (which might happen with old->new >>> migration for instance), since it might continue treating values in the range >>> as valid afterward (but again, not sure that's the case here or not). >>> >>> There's some other changes with the new CPU types that we're still >>> considering/testing internally, but should be able to post them in some form >>> next week. >>> >>> -Mike >>> >>>> >>>> Dave >>>> >>>>> --- >>>>> target/i386/cpu.c | 8 ++++++-- >>>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c >>>>> index a9fe1662d3..3934c559e4 100644 >>>>> --- a/target/i386/cpu.c >>>>> +++ b/target/i386/cpu.c >>>>> @@ -5961,8 +5961,12 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp) >>>>> } >>>>> } >>>>> >>>>> - /* CPU topology with multi-dies support requires CPUID[0x1F] */ >>>>> - if (env->nr_dies > 1) { >>>>> + /* >>>>> + * Intel CPU topology with multi-dies support requires CPUID[0x1F]. >>>>> + * For AMD Rome/Milan, cpuid level is 0x10, and guest OS should detect >>>>> + * extended toplogy by leaf 0xB. Only adjust it for Intel CPU. >>>>> + */ >>>>> + if ((env->nr_dies > 1) && IS_INTEL_CPU(env)) { >>>>> x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F); >>>>> } >>>>> >>>>> -- >>>>> 2.25.1 >>>>> >>>>> >>>> -- >>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK >>>> >>>> >>> >> >>
On Thursday, 2021-07-01 at 15:35:49 -05, Michael Roth wrote: > Quoting Igor Mammedov (2021-07-01 03:43:13) >> On Wed, 30 Jun 2021 14:18:09 -0500 >> Michael Roth <michael.roth@amd.com> wrote: >> >> > Quoting Dr. David Alan Gilbert (2021-06-29 09:06:02) >> > > * zhenwei pi (pizhenwei@bytedance.com) wrote: >> > > > A AMD server typically has cpuid level 0x10(test on Rome/Milan), it >> > > > should not be changed to 0x1f in multi-dies case. >> > > > >> > > > Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support >> > > > for multi-dies PCMachine) >> > > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> >> > > >> > > (Copying in Babu) >> > > >> > > Hmm I think you're right. I've cc'd in Babu and Wei. >> > > >> > > Eduardo: What do we need to do about compatibility, do we need to wire >> > > this to machine type or CPU version? >> > >> > FWIW, there are some other CPUID entries like leaves 2 and 4 that are >> > also Intel-specific. With SEV-SNP CPUID enforcement, advertising them to >> > guests will result in failures when host SNP firmware checks the >> > hypervisor-provided CPUID values against the host-supported ones. >> > >> > To address this we've been planning to add an 'amd-cpuid-only' property >> > to suppress them: >> > >> > https://github.com/mdroth/qemu/commit/28d0553fe748d30a8af09e5e58a7da3eff03e21b >> > >> > My thinking is this property should be off by default, and only defined >> > either via explicit command-line option, or via new CPU types. We're also >> > planning to add new CPU versions for EPYC* CPU types that set this >> > 'amd-cpuid-only' property by default: >> > >> > https://github.com/mdroth/qemu/commits/new-cpu-types-upstream >> It look like having new cpu versions is enough to change behavior, >> maybe keep 'amd-cpuid-only' as internal field and not expose it to users >> as a property. > > Hmm, I defined it as a property mainly to make use of > X86CPUVersionDefinition.props to create new versions of the CPU types > with those properties set. > > There's a patch there that adds X86CPUVersionDefinition.cache_info so > that new cache definitions can be provided for new CPU versions. So > would you suggest a similar approach here, e.g. adding an > X86CPUVersionDefinition.amd_cpuid_only field that could be used directly > rather than going through X86CPUVersionDefinition.props? > > There's also another new "amd-xsave" prop in that series that does something > similar to "amd-cpuid-only", so a little worried about tacking to much extra > into X86CPUVersionDefinition. But maybe that one could just be rolled into > "amd-cpuid-only" since it is basically fixing up xsave-related cpuid > entries for AMD... The XSAVE changes are similar to https://lore.kernel.org/r/20210520145647.3483809-1-david.edmondson@oracle.com, in response to which Paolo suggested that I should have QEMU observe the CPUID 0xd leaves rather than encoding knowledge about the state offsets (at least, that's how I understood his comment). I have patches that do this (which includes making X86XSaveArea a TCG-only thing) that I plan to send out in the next couple of days. They should make "amd-xsave" unnecessary. >> >> > So in general I think maybe this change should be similarly controlled by >> > this proposed 'amd-cpuid-only' property. Maybe for this particular case it's >> > okay to do it unconditionally, but it sounds bad to switch up the valid CPUID >> > range after a guest has already booted (which might happen with old->new >> > migration for instance), since it might continue treating values in the range >> > as valid afterward (but again, not sure that's the case here or not). >> > >> > There's some other changes with the new CPU types that we're still >> > considering/testing internally, but should be able to post them in some form >> > next week. >> > >> > -Mike >> > >> > > >> > > Dave >> > > >> > > > --- >> > > > target/i386/cpu.c | 8 ++++++-- >> > > > 1 file changed, 6 insertions(+), 2 deletions(-) >> > > > >> > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c >> > > > index a9fe1662d3..3934c559e4 100644 >> > > > --- a/target/i386/cpu.c >> > > > +++ b/target/i386/cpu.c >> > > > @@ -5961,8 +5961,12 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp) >> > > > } >> > > > } >> > > > >> > > > - /* CPU topology with multi-dies support requires CPUID[0x1F] */ >> > > > - if (env->nr_dies > 1) { >> > > > + /* >> > > > + * Intel CPU topology with multi-dies support requires CPUID[0x1F]. >> > > > + * For AMD Rome/Milan, cpuid level is 0x10, and guest OS should detect >> > > > + * extended toplogy by leaf 0xB. Only adjust it for Intel CPU. >> > > > + */ >> > > > + if ((env->nr_dies > 1) && IS_INTEL_CPU(env)) { >> > > > x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F); >> > > > } >> > > > >> > > > -- >> > > > 2.25.1 >> > > > >> > > > >> > > -- >> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK >> > > >> > > >> > >> >> dme.
On Fri, Jul 02, 2021 at 07:50:03AM +0100, David Edmondson wrote: > On Thursday, 2021-07-01 at 15:35:49 -05, Michael Roth wrote: > > > Quoting Igor Mammedov (2021-07-01 03:43:13) > >> On Wed, 30 Jun 2021 14:18:09 -0500 > >> Michael Roth <michael.roth@amd.com> wrote: > >> > >> > Quoting Dr. David Alan Gilbert (2021-06-29 09:06:02) > >> > > * zhenwei pi (pizhenwei@bytedance.com) wrote: > >> > > > A AMD server typically has cpuid level 0x10(test on Rome/Milan), it > >> > > > should not be changed to 0x1f in multi-dies case. > >> > > > > >> > > > Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support > >> > > > for multi-dies PCMachine) > >> > > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> > >> > > > >> > > (Copying in Babu) > >> > > > >> > > Hmm I think you're right. I've cc'd in Babu and Wei. > >> > > > >> > > Eduardo: What do we need to do about compatibility, do we need to wire > >> > > this to machine type or CPU version? > >> > > >> > FWIW, there are some other CPUID entries like leaves 2 and 4 that are > >> > also Intel-specific. With SEV-SNP CPUID enforcement, advertising them to > >> > guests will result in failures when host SNP firmware checks the > >> > hypervisor-provided CPUID values against the host-supported ones. > >> > > >> > To address this we've been planning to add an 'amd-cpuid-only' property > >> > to suppress them: > >> > > >> > https://github.com/mdroth/qemu/commit/28d0553fe748d30a8af09e5e58a7da3eff03e21b > >> > > >> > My thinking is this property should be off by default, and only defined > >> > either via explicit command-line option, or via new CPU types. We're also > >> > planning to add new CPU versions for EPYC* CPU types that set this > >> > 'amd-cpuid-only' property by default: > >> > > >> > https://github.com/mdroth/qemu/commits/new-cpu-types-upstream > >> It look like having new cpu versions is enough to change behavior, > >> maybe keep 'amd-cpuid-only' as internal field and not expose it to users > >> as a property. > > > > Hmm, I defined it as a property mainly to make use of > > X86CPUVersionDefinition.props to create new versions of the CPU types > > with those properties set. > > > > There's a patch there that adds X86CPUVersionDefinition.cache_info so > > that new cache definitions can be provided for new CPU versions. So > > would you suggest a similar approach here, e.g. adding an > > X86CPUVersionDefinition.amd_cpuid_only field that could be used directly > > rather than going through X86CPUVersionDefinition.props? > > > > There's also another new "amd-xsave" prop in that series that does something > > similar to "amd-cpuid-only", so a little worried about tacking to much extra > > into X86CPUVersionDefinition. But maybe that one could just be rolled into > > "amd-cpuid-only" since it is basically fixing up xsave-related cpuid > > entries for AMD... > > The XSAVE changes are similar to > https://lore.kernel.org/r/20210520145647.3483809-1-david.edmondson@oracle.com, > in response to which Paolo suggested that I should have QEMU observe the > CPUID 0xd leaves rather than encoding knowledge about the state offsets > (at least, that's how I understood his comment). > > I have patches that do this (which includes making X86XSaveArea a > TCG-only thing) that I plan to send out in the next couple of days. They > should make "amd-xsave" unnecessary. Ok, that does sounds like the better approach. Thanks for the heads up. > > >> > >> > So in general I think maybe this change should be similarly controlled by > >> > this proposed 'amd-cpuid-only' property. Maybe for this particular case it's > >> > okay to do it unconditionally, but it sounds bad to switch up the valid CPUID > >> > range after a guest has already booted (which might happen with old->new > >> > migration for instance), since it might continue treating values in the range > >> > as valid afterward (but again, not sure that's the case here or not). > >> > > >> > There's some other changes with the new CPU types that we're still > >> > considering/testing internally, but should be able to post them in some form > >> > next week. > >> > > >> > -Mike > >> > > >> > > > >> > > Dave > >> > > > >> > > > --- > >> > > > target/i386/cpu.c | 8 ++++++-- > >> > > > 1 file changed, 6 insertions(+), 2 deletions(-) > >> > > > > >> > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > >> > > > index a9fe1662d3..3934c559e4 100644 > >> > > > --- a/target/i386/cpu.c > >> > > > +++ b/target/i386/cpu.c > >> > > > @@ -5961,8 +5961,12 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp) > >> > > > } > >> > > > } > >> > > > > >> > > > - /* CPU topology with multi-dies support requires CPUID[0x1F] */ > >> > > > - if (env->nr_dies > 1) { > >> > > > + /* > >> > > > + * Intel CPU topology with multi-dies support requires CPUID[0x1F]. > >> > > > + * For AMD Rome/Milan, cpuid level is 0x10, and guest OS should detect > >> > > > + * extended toplogy by leaf 0xB. Only adjust it for Intel CPU. > >> > > > + */ > >> > > > + if ((env->nr_dies > 1) && IS_INTEL_CPU(env)) { > >> > > > x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F); > >> > > > } > >> > > > > >> > > > -- > >> > > > 2.25.1 > >> > > > > >> > > > > >> > > -- > >> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > >> > > > >> > > > >> > > >> > >> > > dme. > -- > I had my eyes closed in the dark. >
On Fri, Jul 02, 2021 at 01:14:56PM +0800, zhenwei pi wrote: > On 7/2/21 4:35 AM, Michael Roth wrote: > > Quoting Igor Mammedov (2021-07-01 03:43:13) > > > On Wed, 30 Jun 2021 14:18:09 -0500 > > > Michael Roth <michael.roth@amd.com> wrote: > > > > > > > Quoting Dr. David Alan Gilbert (2021-06-29 09:06:02) > > > > > * zhenwei pi (pizhenwei@bytedance.com) wrote: > > > > > > A AMD server typically has cpuid level 0x10(test on Rome/Milan), it > > > > > > should not be changed to 0x1f in multi-dies case. > > > > > > > > > > > > Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support > > > > > > for multi-dies PCMachine) > > > > > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> > > > > > > > > > > (Copying in Babu) > > > > > > > > > > Hmm I think you're right. I've cc'd in Babu and Wei. > > > > > > > > > > Eduardo: What do we need to do about compatibility, do we need to wire > > > > > this to machine type or CPU version? > > > > > > > > FWIW, there are some other CPUID entries like leaves 2 and 4 that are > > > > also Intel-specific. With SEV-SNP CPUID enforcement, advertising them to > > > > guests will result in failures when host SNP firmware checks the > > > > hypervisor-provided CPUID values against the host-supported ones. > > > > > > > > To address this we've been planning to add an 'amd-cpuid-only' property > > > > to suppress them: > > > > > > > > https://github.com/mdroth/qemu/commit/28d0553fe748d30a8af09e5e58a7da3eff03e21b > > > > > > > > My thinking is this property should be off by default, and only defined > > > > either via explicit command-line option, or via new CPU types. We're also > > > > planning to add new CPU versions for EPYC* CPU types that set this > > > > 'amd-cpuid-only' property by default: > > > > > > > > https://github.com/mdroth/qemu/commits/new-cpu-types-upstream > > > It look like having new cpu versions is enough to change behavior, > > > maybe keep 'amd-cpuid-only' as internal field and not expose it to users > > > as a property. > > > > Hmm, I defined it as a property mainly to make use of > > X86CPUVersionDefinition.props to create new versions of the CPU types > > with those properties set. > > > > There's a patch there that adds X86CPUVersionDefinition.cache_info so > > that new cache definitions can be provided for new CPU versions. So > > would you suggest a similar approach here, e.g. adding an > > X86CPUVersionDefinition.amd_cpuid_only field that could be used directly > > rather than going through X86CPUVersionDefinition.props? > > > > There's also another new "amd-xsave" prop in that series that does something > > similar to "amd-cpuid-only", so a little worried about tacking to much extra > > into X86CPUVersionDefinition. But maybe that one could just be rolled into > > "amd-cpuid-only" since it is basically fixing up xsave-related cpuid > > entries for AMD... > > > Hi, this patch wants to fix the issue: > AMD CPU (Rome/Milan) should get the cpuid level 0x10, not 0x1F in the guest. > If QEMU reports a 0x1F to guest, guest(Linux) would use leaf 0x1F instead of > leaf 0xB to get extended topology: > > https://github.com/torvalds/linux/blob/master/arch/x86/kernel/cpu/topology.c#L49 > > static int detect_extended_topology_leaf(struct cpuinfo_x86 *c) > { > if (c->cpuid_level >= 0x1f) { > if (check_extended_topology_leaf(0x1f) == 0) > return 0x1f; > } > > if (c->cpuid_level >= 0xb) { > if (check_extended_topology_leaf(0xb) == 0) > return 0xb; > } > > return -1; > } > > Because of the wrong cpuid level, the guest gets unexpected topology from > leaf 0x1F. > > I tested https://github.com/mdroth/qemu/commits/new-cpu-types-upstream, and > it seems that these patches could not fix this issue. Yes, I think your patch would still be needed. The question is whether it's okay to change it for existing CPU types, e.g. EPYC-Milan, or only for new ones when they set a certain flag/property, like the proposed "amd-cpuid-only" (which the proposed EPYC-Milan-v2 would set). > > > > > > > > So in general I think maybe this change should be similarly controlled by > > > > this proposed 'amd-cpuid-only' property. Maybe for this particular case it's > > > > okay to do it unconditionally, but it sounds bad to switch up the valid CPUID > > > > range after a guest has already booted (which might happen with old->new > > > > migration for instance), since it might continue treating values in the range > > > > as valid afterward (but again, not sure that's the case here or not). > > > > > > > > There's some other changes with the new CPU types that we're still > > > > considering/testing internally, but should be able to post them in some form > > > > next week. > > > > > > > > -Mike > > > > > > > > > > > > > > Dave > > > > > > --- > > > > > > target/i386/cpu.c | 8 ++++++-- > > > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > > > > > > index a9fe1662d3..3934c559e4 100644 > > > > > > --- a/target/i386/cpu.c > > > > > > +++ b/target/i386/cpu.c > > > > > > @@ -5961,8 +5961,12 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp) > > > > > > } > > > > > > } > > > > > > - /* CPU topology with multi-dies support requires CPUID[0x1F] */ > > > > > > - if (env->nr_dies > 1) { > > > > > > + /* > > > > > > + * Intel CPU topology with multi-dies support requires CPUID[0x1F]. > > > > > > + * For AMD Rome/Milan, cpuid level is 0x10, and guest OS should detect > > > > > > + * extended toplogy by leaf 0xB. Only adjust it for Intel CPU. > > > > > > + */ > > > > > > + if ((env->nr_dies > 1) && IS_INTEL_CPU(env)) { > > > > > > x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F); > > > > > > } > > > > > > -- > > > > > > 2.25.1 > > > > > > > > > > > -- > > > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > > > > > > > > > > > > > > > > -- > zhenwei pi >
On Wed, Jun 30, 2021 at 02:18:09PM -0500, Michael Roth wrote: > Quoting Dr. David Alan Gilbert (2021-06-29 09:06:02) > > * zhenwei pi (pizhenwei@bytedance.com) wrote: > > > A AMD server typically has cpuid level 0x10(test on Rome/Milan), it > > > should not be changed to 0x1f in multi-dies case. > > > > > > Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support > > > for multi-dies PCMachine) > > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> > > > > (Copying in Babu) > > > > Hmm I think you're right. I've cc'd in Babu and Wei. > > > > Eduardo: What do we need to do about compatibility, do we need to wire > > this to machine type or CPU version? If the change doesn't affect runnability of the CPU in a given host (i.e. it doesn't introduce or remove host software or hardware dependencies), it can be enabled for all CPU types in newer machine types. > > FWIW, there are some other CPUID entries like leaves 2 and 4 that are > also Intel-specific. With SEV-SNP CPUID enforcement, advertising them to > guests will result in failures when host SNP firmware checks the > hypervisor-provided CPUID values against the host-supported ones. > > To address this we've been planning to add an 'amd-cpuid-only' property > to suppress them: > > https://github.com/mdroth/qemu/commit/28d0553fe748d30a8af09e5e58a7da3eff03e21b > > My thinking is this property should be off by default, and only defined > either via explicit command-line option, or via new CPU types. We're also > planning to add new CPU versions for EPYC* CPU types that set this > 'amd-cpuid-only' property by default: > > https://github.com/mdroth/qemu/commits/new-cpu-types-upstream KVM has a hack that changes the CPUID vendor info depending on the host (ignoring X86CPUDefinition.vendor completely). For that reason, I would make the new behavior tied to the actual CPU vendor seen by the guest, not to the CPU type. It will be a bit more complicated, but less likely to cause problems when management software tries to auto-detect the CPU model and guesses a model from the wrong vendor. We still need to keep compatibility somehow, though: > > So in general I think maybe this change should be similarly controlled by > this proposed 'amd-cpuid-only' property. Maybe for this particular case it's > okay to do it unconditionally, but it sounds bad to switch up the valid CPUID > range after a guest has already booted (which might happen with old->new > migration for instance), since it might continue treating values in the range > as valid afterward (but again, not sure that's the case here or not). I agree, especially if the planned CPUID changes are more intrusive than just CPUID level adjustments. I suggest adding a "vendor-cpuid-only" property, that would hide CPUID leaves depending on the actual CPUID vendor seen by the guest. Older machine types can set vendor-cpuid-only=off, and newer machine-types would have vendor-cpuid-only=on by default. > > There's some other changes with the new CPU types that we're still > considering/testing internally, but should be able to post them in some form > next week. > > -Mike > > > > > Dave > > > > > --- > > > target/i386/cpu.c | 8 ++++++-- > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > > > index a9fe1662d3..3934c559e4 100644 > > > --- a/target/i386/cpu.c > > > +++ b/target/i386/cpu.c > > > @@ -5961,8 +5961,12 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp) > > > } > > > } > > > > > > - /* CPU topology with multi-dies support requires CPUID[0x1F] */ > > > - if (env->nr_dies > 1) { > > > + /* > > > + * Intel CPU topology with multi-dies support requires CPUID[0x1F]. > > > + * For AMD Rome/Milan, cpuid level is 0x10, and guest OS should detect > > > + * extended toplogy by leaf 0xB. Only adjust it for Intel CPU. > > > + */ > > > + if ((env->nr_dies > 1) && IS_INTEL_CPU(env)) { > > > x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F); > > > } > > > > > > -- > > > 2.25.1 > > > > > > > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > > > >
On Fri, Jul 02, 2021 at 10:43:22AM -0500, Michael Roth wrote: > On Fri, Jul 02, 2021 at 01:14:56PM +0800, zhenwei pi wrote: > > On 7/2/21 4:35 AM, Michael Roth wrote: > > > Quoting Igor Mammedov (2021-07-01 03:43:13) > > > > On Wed, 30 Jun 2021 14:18:09 -0500 > > > > Michael Roth <michael.roth@amd.com> wrote: > > > > > > > > > Quoting Dr. David Alan Gilbert (2021-06-29 09:06:02) > > > > > > * zhenwei pi (pizhenwei@bytedance.com) wrote: > > > > > > > A AMD server typically has cpuid level 0x10(test on Rome/Milan), it > > > > > > > should not be changed to 0x1f in multi-dies case. > > > > > > > > > > > > > > Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support > > > > > > > for multi-dies PCMachine) > > > > > > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> > > > > > > > > > > > > (Copying in Babu) > > > > > > > > > > > > Hmm I think you're right. I've cc'd in Babu and Wei. > > > > > > > > > > > > Eduardo: What do we need to do about compatibility, do we need to wire > > > > > > this to machine type or CPU version? > > > > > > > > > > FWIW, there are some other CPUID entries like leaves 2 and 4 that are > > > > > also Intel-specific. With SEV-SNP CPUID enforcement, advertising them to > > > > > guests will result in failures when host SNP firmware checks the > > > > > hypervisor-provided CPUID values against the host-supported ones. > > > > > > > > > > To address this we've been planning to add an 'amd-cpuid-only' property > > > > > to suppress them: > > > > > > > > > > https://github.com/mdroth/qemu/commit/28d0553fe748d30a8af09e5e58a7da3eff03e21b > > > > > > > > > > My thinking is this property should be off by default, and only defined > > > > > either via explicit command-line option, or via new CPU types. We're also > > > > > planning to add new CPU versions for EPYC* CPU types that set this > > > > > 'amd-cpuid-only' property by default: > > > > > > > > > > https://github.com/mdroth/qemu/commits/new-cpu-types-upstream > > > > It look like having new cpu versions is enough to change behavior, > > > > maybe keep 'amd-cpuid-only' as internal field and not expose it to users > > > > as a property. > > > > > > Hmm, I defined it as a property mainly to make use of > > > X86CPUVersionDefinition.props to create new versions of the CPU types > > > with those properties set. > > > > > > There's a patch there that adds X86CPUVersionDefinition.cache_info so > > > that new cache definitions can be provided for new CPU versions. So > > > would you suggest a similar approach here, e.g. adding an > > > X86CPUVersionDefinition.amd_cpuid_only field that could be used directly > > > rather than going through X86CPUVersionDefinition.props? > > > > > > There's also another new "amd-xsave" prop in that series that does something > > > similar to "amd-cpuid-only", so a little worried about tacking to much extra > > > into X86CPUVersionDefinition. But maybe that one could just be rolled into > > > "amd-cpuid-only" since it is basically fixing up xsave-related cpuid > > > entries for AMD... > > > > > Hi, this patch wants to fix the issue: > > AMD CPU (Rome/Milan) should get the cpuid level 0x10, not 0x1F in the guest. > > If QEMU reports a 0x1F to guest, guest(Linux) would use leaf 0x1F instead of > > leaf 0xB to get extended topology: > > > > https://github.com/torvalds/linux/blob/master/arch/x86/kernel/cpu/topology.c#L49 > > > > static int detect_extended_topology_leaf(struct cpuinfo_x86 *c) > > { > > if (c->cpuid_level >= 0x1f) { > > if (check_extended_topology_leaf(0x1f) == 0) > > return 0x1f; > > } > > > > if (c->cpuid_level >= 0xb) { > > if (check_extended_topology_leaf(0xb) == 0) > > return 0xb; > > } > > > > return -1; > > } > > > > Because of the wrong cpuid level, the guest gets unexpected topology from > > leaf 0x1F. > > > > I tested https://github.com/mdroth/qemu/commits/new-cpu-types-upstream, and > > it seems that these patches could not fix this issue. > > Yes, I think your patch would still be needed. The question is whether it's > okay to change it for existing CPU types, e.g. EPYC-Milan, or only for new ones > when they set a certain flag/property, like the proposed "amd-cpuid-only" (which > the proposed EPYC-Milan-v2 would set). I tried to answer this in a separate reply in this thread, but answering here for visibility: You can safely do it on existing CPU types, because the new behavior doesn't introduce host software or hardware requirements when enabled. You just need to disable the new behavior in MachineClass.compat_props for older machine types.
Quoting Eduardo Habkost (2021-07-02 12:35:34) > On Fri, Jul 02, 2021 at 10:43:22AM -0500, Michael Roth wrote: > > On Fri, Jul 02, 2021 at 01:14:56PM +0800, zhenwei pi wrote: > > > On 7/2/21 4:35 AM, Michael Roth wrote: > > > > Quoting Igor Mammedov (2021-07-01 03:43:13) > > > > > On Wed, 30 Jun 2021 14:18:09 -0500 > > > > > Michael Roth <michael.roth@amd.com> wrote: > > > > > > > > > > > Quoting Dr. David Alan Gilbert (2021-06-29 09:06:02) > > > > > > > * zhenwei pi (pizhenwei@bytedance.com) wrote: > > > > > > > > A AMD server typically has cpuid level 0x10(test on Rome/Milan), it > > > > > > > > should not be changed to 0x1f in multi-dies case. > > > > > > > > > > > > > > > > Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support > > > > > > > > for multi-dies PCMachine) > > > > > > > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> > > > > > > > > > > > > > > (Copying in Babu) > > > > > > > > > > > > > > Hmm I think you're right. I've cc'd in Babu and Wei. > > > > > > > > > > > > > > Eduardo: What do we need to do about compatibility, do we need to wire > > > > > > > this to machine type or CPU version? > > > > > > > > > > > > FWIW, there are some other CPUID entries like leaves 2 and 4 that are > > > > > > also Intel-specific. With SEV-SNP CPUID enforcement, advertising them to > > > > > > guests will result in failures when host SNP firmware checks the > > > > > > hypervisor-provided CPUID values against the host-supported ones. > > > > > > > > > > > > To address this we've been planning to add an 'amd-cpuid-only' property > > > > > > to suppress them: > > > > > > > > > > > > https://github.com/mdroth/qemu/commit/28d0553fe748d30a8af09e5e58a7da3eff03e21b > > > > > > > > > > > > My thinking is this property should be off by default, and only defined > > > > > > either via explicit command-line option, or via new CPU types. We're also > > > > > > planning to add new CPU versions for EPYC* CPU types that set this > > > > > > 'amd-cpuid-only' property by default: > > > > > > > > > > > > https://github.com/mdroth/qemu/commits/new-cpu-types-upstream > > > > > It look like having new cpu versions is enough to change behavior, > > > > > maybe keep 'amd-cpuid-only' as internal field and not expose it to users > > > > > as a property. > > > > > > > > Hmm, I defined it as a property mainly to make use of > > > > X86CPUVersionDefinition.props to create new versions of the CPU types > > > > with those properties set. > > > > > > > > There's a patch there that adds X86CPUVersionDefinition.cache_info so > > > > that new cache definitions can be provided for new CPU versions. So > > > > would you suggest a similar approach here, e.g. adding an > > > > X86CPUVersionDefinition.amd_cpuid_only field that could be used directly > > > > rather than going through X86CPUVersionDefinition.props? > > > > > > > > There's also another new "amd-xsave" prop in that series that does something > > > > similar to "amd-cpuid-only", so a little worried about tacking to much extra > > > > into X86CPUVersionDefinition. But maybe that one could just be rolled into > > > > "amd-cpuid-only" since it is basically fixing up xsave-related cpuid > > > > entries for AMD... > > > > > > > Hi, this patch wants to fix the issue: > > > AMD CPU (Rome/Milan) should get the cpuid level 0x10, not 0x1F in the guest. > > > If QEMU reports a 0x1F to guest, guest(Linux) would use leaf 0x1F instead of > > > leaf 0xB to get extended topology: > > > > > > https://github.com/torvalds/linux/blob/master/arch/x86/kernel/cpu/topology.c#L49 > > > > > > static int detect_extended_topology_leaf(struct cpuinfo_x86 *c) > > > { > > > if (c->cpuid_level >= 0x1f) { > > > if (check_extended_topology_leaf(0x1f) == 0) > > > return 0x1f; > > > } > > > > > > if (c->cpuid_level >= 0xb) { > > > if (check_extended_topology_leaf(0xb) == 0) > > > return 0xb; > > > } > > > > > > return -1; > > > } > > > > > > Because of the wrong cpuid level, the guest gets unexpected topology from > > > leaf 0x1F. > > > > > > I tested https://github.com/mdroth/qemu/commits/new-cpu-types-upstream, and > > > it seems that these patches could not fix this issue. > > > > Yes, I think your patch would still be needed. The question is whether it's > > okay to change it for existing CPU types, e.g. EPYC-Milan, or only for new ones > > when they set a certain flag/property, like the proposed "amd-cpuid-only" (which > > the proposed EPYC-Milan-v2 would set). > > I tried to answer this in a separate reply in this thread, but > answering here for visibility: > > You can safely do it on existing CPU types, because the new > behavior doesn't introduce host software or hardware requirements > when enabled. You just need to disable the new behavior in > MachineClass.compat_props for older machine types. Hi Eduardo, Thanks for the suggestions. Since the CPUID changes no longer rely on adding new CPU models, I've broken that out as a separate patch here based on your input: https://lists.nongnu.org/archive/html/qemu-devel/2021-07/msg01679.html Zhenwei, with the above patch I think you can change your patch to use: if (!cpu->vendor_cpuid_only || IS_INTEL_CPU(env)) //add intel-specific range Let me know if you want me to update and add to my series. > > -- > Eduardo > >
On Thu, Jul 8, 2021, 13:12 <michael.roth@amd.com> wrote: Quoting Eduardo Habkost (2021-07-02 12:35:34) > On Fri, Jul 02, 2021 at 10:43:22AM -0500, Michael Roth wrote: > > On Fri, Jul 02, 2021 at 01:14:56PM +0800, zhenwei pi wrote: > > > On 7/2/21 4:35 AM, Michael Roth wrote: > > > > Quoting Igor Mammedov (2021-07-01 03:43:13) > > > > > On Wed, 30 Jun 2021 14:18:09 -0500 > > > > > Michael Roth <michael.roth@amd.com> wrote: > > > > > > > > > > > Quoting Dr. David Alan Gilbert (2021-06-29 09:06:02) > > > > > > > * zhenwei pi (pizhenwei@bytedance.com) wrote: > > > > > > > > A AMD server typically has cpuid level 0x10(test on Rome/Milan), it > > > > > > > > should not be changed to 0x1f in multi-dies case. > > > > > > > > > > > > > > > > Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support > > > > > > > > for multi-dies PCMachine) > > > > > > > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> > > > > > > > > > > > > > > (Copying in Babu) > > > > > > > > > > > > > > Hmm I think you're right. I've cc'd in Babu and Wei. > > > > > > > > > > > > > > Eduardo: What do we need to do about compatibility, do we need to wire > > > > > > > this to machine type or CPU version? > > > > > > > > > > > > FWIW, there are some other CPUID entries like leaves 2 and 4 that are > > > > > > also Intel-specific. With SEV-SNP CPUID enforcement, advertising them to > > > > > > guests will result in failures when host SNP firmware checks the > > > > > > hypervisor-provided CPUID values against the host-supported ones. > > > > > > > > > > > > To address this we've been planning to add an 'amd-cpuid-only' property > > > > > > to suppress them: > > > > > > > > > > > > https://github.com/mdroth/qemu/commit/28d0553fe748d30a8af09e5e58a7da3eff03e21b > > > > > > > > > > > > My thinking is this property should be off by default, and only defined > > > > > > either via explicit command-line option, or via new CPU types. We're also > > > > > > planning to add new CPU versions for EPYC* CPU types that set this > > > > > > 'amd-cpuid-only' property by default: > > > > > > > > > > > > https://github.com/mdroth/qemu/commits/new-cpu-types-upstream > > > > > It look like having new cpu versions is enough to change behavior, > > > > > maybe keep 'amd-cpuid-only' as internal field and not expose it to users > > > > > as a property. > > > > > > > > Hmm, I defined it as a property mainly to make use of > > > > X86CPUVersionDefinition.props to create new versions of the CPU types > > > > with those properties set. > > > > > > > > There's a patch there that adds X86CPUVersionDefinition.cache_info so > > > > that new cache definitions can be provided for new CPU versions. So > > > > would you suggest a similar approach here, e.g. adding an > > > > X86CPUVersionDefinition.amd_cpuid_only field that could be used directly > > > > rather than going through X86CPUVersionDefinition.props? > > > > > > > > There's also another new "amd-xsave" prop in that series that does something > > > > similar to "amd-cpuid-only", so a little worried about tacking to much extra > > > > into X86CPUVersionDefinition. But maybe that one could just be rolled into > > > > "amd-cpuid-only" since it is basically fixing up xsave-related cpuid > > > > entries for AMD... > > > > > > > Hi, this patch wants to fix the issue: > > > AMD CPU (Rome/Milan) should get the cpuid level 0x10, not 0x1F in the guest. > > > If QEMU reports a 0x1F to guest, guest(Linux) would use leaf 0x1F instead of > > > leaf 0xB to get extended topology: > > > > > > https://github.com/torvalds/linux/blob/master/arch/x86/kernel/cpu/topology.c#L49 > > > > > > static int detect_extended_topology_leaf(struct cpuinfo_x86 *c) > > > { > > > if (c->cpuid_level >= 0x1f) { > > > if (check_extended_topology_leaf(0x1f) == 0) > > > return 0x1f; > > > } > > > > > > if (c->cpuid_level >= 0xb) { > > > if (check_extended_topology_leaf(0xb) == 0) > > > return 0xb; > > > } > > > > > > return -1; > > > } > > > > > > Because of the wrong cpuid level, the guest gets unexpected topology from > > > leaf 0x1F. > > > > > > I tested https://github.com/mdroth/qemu/commits/new-cpu-types-upstream, and > > > it seems that these patches could not fix this issue. > > > > Yes, I think your patch would still be needed. The question is whether it's > > okay to change it for existing CPU types, e.g. EPYC-Milan, or only for new ones > > when they set a certain flag/property, like the proposed "amd-cpuid-only" (which > > the proposed EPYC-Milan-v2 would set). > > I tried to answer this in a separate reply in this thread, but > answering here for visibility: > > You can safely do it on existing CPU types, because the new > behavior doesn't introduce host software or hardware requirements > when enabled. You just need to disable the new behavior in > MachineClass.compat_props for older machine types. Hi Eduardo, Thanks for the suggestions. Since the CPUID changes no longer rely on adding new CPU models, I've broken that out as a separate patch here based on your input: https://lists.nongnu.org/archive/html/qemu-devel/2021-07/msg01679.html Zhenwei, with the above patch I think you can change your patch to use: if (!cpu->vendor_cpuid_only || IS_INTEL_CPU(env)) //add intel-specific range Let me know if you want me to update and add to my series. Sure, thanks a lot. > > -- > Eduardo > >
diff --git a/target/i386/cpu.c b/target/i386/cpu.c index a9fe1662d3..3934c559e4 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -5961,8 +5961,12 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp) } } - /* CPU topology with multi-dies support requires CPUID[0x1F] */ - if (env->nr_dies > 1) { + /* + * Intel CPU topology with multi-dies support requires CPUID[0x1F]. + * For AMD Rome/Milan, cpuid level is 0x10, and guest OS should detect + * extended toplogy by leaf 0xB. Only adjust it for Intel CPU. + */ + if ((env->nr_dies > 1) && IS_INTEL_CPU(env)) { x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F); }
A AMD server typically has cpuid level 0x10(test on Rome/Milan), it should not be changed to 0x1f in multi-dies case. Fixes: a94e1428991 (target/i386: Add CPUID.1F generation support for multi-dies PCMachine) Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> --- target/i386/cpu.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)