Message ID | 20230914072159.1177582-11-zhao1.liu@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Support smp.clusters for x86 in QEMU | expand |
On 14/9/23 09:21, Zhao Liu wrote: > From: Zhuocheng Ding <zhuocheng.ding@intel.com> > > smp command has the "clusters" parameter but x86 hasn't supported that > level. "cluster" is a CPU topology level concept above cores, in which > the cores may share some resources (L2 cache or some others like L3 > cache tags, depending on the Archs) [1][2]. For x86, the resource shared > by cores at the cluster level is mainly the L2 cache. > > However, using cluster to define x86's L2 cache topology will cause the > compatibility problem: > > Currently, x86 defaults that the L2 cache is shared in one core, which > actually implies a default setting "cores per L2 cache is 1" and > therefore implicitly defaults to having as many L2 caches as cores. > > For example (i386 PC machine): > -smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*) > > Considering the topology of the L2 cache, this (*) implicitly means "1 > core per L2 cache" and "2 L2 caches per die". > > If we use cluster to configure L2 cache topology with the new default > setting "clusters per L2 cache is 1", the above semantics will change > to "2 cores per cluster" and "1 cluster per L2 cache", that is, "2 > cores per L2 cache". > > So the same command (*) will cause changes in the L2 cache topology, > further affecting the performance of the virtual machine. > > Therefore, x86 should only treat cluster as a cpu topology level and > avoid using it to change L2 cache by default for compatibility. > > "cluster" in smp is the CPU topology level which is between "core" and > die. > > For x86, the "cluster" in smp is corresponding to the module level [2], > which is above the core level. So use the "module" other than "cluster" > in i386 code. > > And please note that x86 already has a cpu topology level also named > "cluster" [3], this level is at the upper level of the package. Here, > the cluster in x86 cpu topology is completely different from the > "clusters" as the smp parameter. After the module level is introduced, > the cluster as the smp parameter will actually refer to the module level > of x86. > > [1]: 864c3b5c32f0 ("hw/core/machine: Introduce CPU cluster topology support") > [2]: Yanan's comment about "cluster", > https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg04051.html > [3]: SDM, vol.3, ch.9, 9.9.1 Hierarchical Mapping of Shared Resources. > > Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com> > Co-developed-by: Zhao Liu <zhao1.liu@intel.com> > Signed-off-by: Zhao Liu <zhao1.liu@intel.com> > Acked-by: Michael S. Tsirkin <mst@redhat.com> > --- > Changes since v1: > * The background of the introduction of the "cluster" parameter and its > exact meaning were revised according to Yanan's explanation. (Yanan) > --- > hw/i386/x86.c | 1 + > target/i386/cpu.c | 1 + > target/i386/cpu.h | 5 +++++ > 3 files changed, 7 insertions(+) > diff --git a/target/i386/cpu.h b/target/i386/cpu.h > index 470257b92240..556e80f29764 100644 > --- a/target/i386/cpu.h > +++ b/target/i386/cpu.h > @@ -1903,6 +1903,11 @@ typedef struct CPUArchState { > > /* Number of dies within this CPU package. */ > unsigned nr_dies; > + /* > + * Number of modules within this CPU package. > + * Module level in x86 cpu topology is corresponding to smp.clusters. > + */ > + unsigned nr_modules; > } CPUX86State; It would be really useful to have an ASCII art comment showing the architecture topology. Also for clarity the topo fields from CPU[Arch]State could be moved into a 'topo' sub structure, or even clearer would be to re-use the X86CPUTopoIDs structure?
Hi Philippe, On Thu, Sep 14, 2023 at 09:38:46AM +0200, Philippe Mathieu-Daudé wrote: > Date: Thu, 14 Sep 2023 09:38:46 +0200 > From: Philippe Mathieu-Daudé <philmd@linaro.org> > Subject: Re: [PATCH v4 10/21] i386: Introduce module-level cpu topology to > CPUX86State > > On 14/9/23 09:21, Zhao Liu wrote: > > From: Zhuocheng Ding <zhuocheng.ding@intel.com> > > > > smp command has the "clusters" parameter but x86 hasn't supported that > > level. "cluster" is a CPU topology level concept above cores, in which > > the cores may share some resources (L2 cache or some others like L3 > > cache tags, depending on the Archs) [1][2]. For x86, the resource shared > > by cores at the cluster level is mainly the L2 cache. > > > > However, using cluster to define x86's L2 cache topology will cause the > > compatibility problem: > > > > Currently, x86 defaults that the L2 cache is shared in one core, which > > actually implies a default setting "cores per L2 cache is 1" and > > therefore implicitly defaults to having as many L2 caches as cores. > > > > For example (i386 PC machine): > > -smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*) > > > > Considering the topology of the L2 cache, this (*) implicitly means "1 > > core per L2 cache" and "2 L2 caches per die". > > > > If we use cluster to configure L2 cache topology with the new default > > setting "clusters per L2 cache is 1", the above semantics will change > > to "2 cores per cluster" and "1 cluster per L2 cache", that is, "2 > > cores per L2 cache". > > > > So the same command (*) will cause changes in the L2 cache topology, > > further affecting the performance of the virtual machine. > > > > Therefore, x86 should only treat cluster as a cpu topology level and > > avoid using it to change L2 cache by default for compatibility. > > > > "cluster" in smp is the CPU topology level which is between "core" and > > die. > > > > For x86, the "cluster" in smp is corresponding to the module level [2], > > which is above the core level. So use the "module" other than "cluster" > > in i386 code. > > > > And please note that x86 already has a cpu topology level also named > > "cluster" [3], this level is at the upper level of the package. Here, > > the cluster in x86 cpu topology is completely different from the > > "clusters" as the smp parameter. After the module level is introduced, > > the cluster as the smp parameter will actually refer to the module level > > of x86. > > > > [1]: 864c3b5c32f0 ("hw/core/machine: Introduce CPU cluster topology support") > > [2]: Yanan's comment about "cluster", > > https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg04051.html > > [3]: SDM, vol.3, ch.9, 9.9.1 Hierarchical Mapping of Shared Resources. > > > > Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com> > > Co-developed-by: Zhao Liu <zhao1.liu@intel.com> > > Signed-off-by: Zhao Liu <zhao1.liu@intel.com> > > Acked-by: Michael S. Tsirkin <mst@redhat.com> > > --- > > Changes since v1: > > * The background of the introduction of the "cluster" parameter and its > > exact meaning were revised according to Yanan's explanation. (Yanan) > > --- > > hw/i386/x86.c | 1 + > > target/i386/cpu.c | 1 + > > target/i386/cpu.h | 5 +++++ > > 3 files changed, 7 insertions(+) > > > > diff --git a/target/i386/cpu.h b/target/i386/cpu.h > > index 470257b92240..556e80f29764 100644 > > --- a/target/i386/cpu.h > > +++ b/target/i386/cpu.h > > @@ -1903,6 +1903,11 @@ typedef struct CPUArchState { > > /* Number of dies within this CPU package. */ > > unsigned nr_dies; > > + /* > > + * Number of modules within this CPU package. > > + * Module level in x86 cpu topology is corresponding to smp.clusters. > > + */ > > + unsigned nr_modules; > > } CPUX86State; > > It would be really useful to have an ASCII art comment showing > the architecture topology. Good idea, I could consider how to show that. > Also for clarity the topo fields from > CPU[Arch]State could be moved into a 'topo' sub structure, or even > clearer would be to re-use the X86CPUTopoIDs structure? Yeah, I also have the plan to do further cleanup about these topology structures [1]. X86CPUTopoInfo is not suitable for hybrid topology case, (hybrid case needs another structure to store the max number elements for each level), so I will try to move that X86CPUTopoInfo into CPU[Arch]State. [1]: https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg01032.html Thanks, Zhao
diff --git a/hw/i386/x86.c b/hw/i386/x86.c index f034df8bf628..9c61b6882b99 100644 --- a/hw/i386/x86.c +++ b/hw/i386/x86.c @@ -306,6 +306,7 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev, init_topo_info(&topo_info, x86ms); env->nr_dies = ms->smp.dies; + env->nr_modules = ms->smp.clusters; /* * If APIC ID is not set, diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 10e11c85f459..401409c5db08 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -7696,6 +7696,7 @@ static void x86_cpu_initfn(Object *obj) CPUX86State *env = &cpu->env; env->nr_dies = 1; + env->nr_modules = 1; cpu_set_cpustate_pointers(cpu); object_property_add(obj, "feature-words", "X86CPUFeatureWordInfo", diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 470257b92240..556e80f29764 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1903,6 +1903,11 @@ typedef struct CPUArchState { /* Number of dies within this CPU package. */ unsigned nr_dies; + /* + * Number of modules within this CPU package. + * Module level in x86 cpu topology is corresponding to smp.clusters. + */ + unsigned nr_modules; } CPUX86State; struct kvm_msrs;