mbox series

[v8,00/21] Introduce smp.modules for x86 in QEMU

Message ID 20240131101350.109512-1-zhao1.liu@linux.intel.com (mailing list archive)
Headers show
Series Introduce smp.modules for x86 in QEMU | expand

Message

Zhao Liu Jan. 31, 2024, 10:13 a.m. UTC
From: Zhao Liu <zhao1.liu@intel.com>

Hi list,

This is the our v8 patch series, rebased on the master branch at the
commit 11be70677c70 ("Merge tag 'pull-vfio-20240129' of
https://github.com/legoater/qemu into staging").

Compared with v7 [1], v8 mainly has the following changes:
  * Introduced smp.modules for x86 instead of reusing current
    smp.clusters.
  * Reworte the CPUID[0x1F] encoding.

Given the code change, I dropped the most previously gotten tags
(Acked-by/Reviewed-by/Tested-by from Michael & Babu, thanks for your
previous reviews and tests!) in v8.

With the description of the new modules added to x86 arch code in v7 [1]
cover letter, the following sections are mainly the description of
the newly added smp.modules (since v8) as supplement.

Welcome your comments!


Why We Need a New CPU Topology Level
====================================

For the discussion in v7 about whether we should reuse current
smp.clusters for x86 module, the core point is what's the essential
differences between x86 module and general cluster.

Since, cluster (for ARM/riscv) lacks a comprehensive and rigorous
hardware definition, and judging from the description of smp.clusters
[2] when it was introduced by QEMU, x86 module is very similar to
general smp.clusters: they are all a layer above existing core level
to organize the physical cores and share L2 cache.

However, after digging deeper into the description and use cases of
cluster in the device tree [3], I realized that the essential
difference between clusters and modules is that cluster is an extremely
abstract concept:
  * Cluster supports nesting though currently QEMU doesn't support
    nested cluster topology. However, modules will not support nesting.
  * Also due to nesting, there is great flexibility in sharing resources
    on clusters, rather than narrowing cluster down to sharing L2 (and
    L3 tags) as the lowest topology level that contains cores.
  * Flexible nesting of cluster allows it to correspond to any level
    between the x86 package and core.

Based on the above considerations, and in order to eliminate the naming
confusion caused by the mapping between general cluster and x86 module
in v7, we now formally introduce smp.modules as the new topology level.


Where to Place Module in Existing Topology Levels
=================================================

The module is, in existing hardware practice, the lowest layer that
contains the core, while the cluster is able to have a higher topological
scope than the module due to its nesting.

Thereby, we place the module between the cluster and the core, viz:

    drawer/book/socket/die/cluster/module/core/thread


Additional Consideration on CPU Topology
========================================

Beyond this patchset, nowadays, different arches have different topology
requirements, and maintaining arch-agnostic general topology in SMP
becomes to be an increasingly difficult thing due to differences in
sharing resources and special flexibility (e.g., nesting):
  * It becomes difficult to put together all CPU topology hierarchies of
    different arches to define complete topology order.
  * It also becomes complex to ensure the correctness of the topology
    calculations.
      - Now the max_cpus is calculated by multiplying all topology
        levels, and too many topology levels can easily cause omissions.

Maybe we should consider implementing arch-specfic topology hierarchies.


[1]: https://lore.kernel.org/qemu-devel/20240108082727.420817-1-zhao1.liu@linux.intel.com/
[2]: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg04051.html
[3]: https://www.kernel.org/doc/Documentation/devicetree/bindings/cpu/cpu-topology.txt

---
Changelog:

Changes since v7 (main changes):
 * Introduced smp.modules as a new CPU topology level. (Xiaoyao)
 * Fixed calculations of cache_info_passthrough case in the
   patch "i386/cpu: Use APIC ID info to encode cache topo in
   CPUID[4]". (Xiaoyao)
 * Moved the patch "i386/cpu: Use APIC ID info get NumSharingCache
   for CPUID[0x8000001D].EAX[bits 25:14]" after CPUID[4]'s similar
   change ("i386/cpu: Use APIC ID offset to encode cache topo in
   CPUID[4]"). (Xiaoyao)
 * Introduced a bitmap in CPUX86State to cache available CPU topology
   levels.
 * Refactored the encode_topo_cpuid1f() to use traversal to search the
   encoded level and avoid using static variables.
 * Mapped x86 module to smp module instead of cluster.
 * Dropped Michael/Babu's ACKed/Tested tags for most patches since the
   code change.

Changes since v6:
 * Updated the comment when check cluster-id. Since there's no
   v8.2, the cluster-id support should at least start from v9.0.
 * Rebased on commit d328fef93ae7 ("Merge tag 'pull-20231230' of
   https://gitlab.com/rth7680/qemu into staging").

Changes since v5:
 * The first four patches of v5 [1] have been merged, v6 contains
   the remaining patches.
 * Reabsed on the latest master.
 * Updated the comment when check cluster-id. Since current QEMU is
   v8.2, the cluster-id support should at least start from v8.3.

Changes since v4:
 * Dropped the "x-l2-cache-topo" option. (Michael)
 * Added A/R/T tags.

Changes since v3 (main changes):
 * Exposed module level in CPUID[0x1F].
 * Fixed compile warnings. (Babu)
 * Fixed cache topology uninitialization bugs for some AMD CPUs. (Babu)

Changes since v2:
 * Added "Tested-by", "Reviewed-by" and "ACKed-by" tags.
 * Used newly added wrapped helper to get cores per socket in
   qemu_init_vcpu().

Changes since v1:
 * Reordered patches. (Yanan)
 * Deprecated the patch to fix comment of machine_parse_smp_config().
   (Yanan)
 * Renamed test-x86-cpuid.c to test-x86-topo.c. (Yanan)
 * Split the intel's l1 cache topology fix into a new separate patch.
   (Yanan)
 * Combined module_id and APIC ID for module level support into one
   patch. (Yanan)
 * Made cache_into_passthrough case of cpuid 0x04 leaf in
 * cpu_x86_cpuid() used max_processor_ids_for_cache() and
   max_core_ids_in_package() to encode CPUID[4]. (Yanan)
 * Added the prefix "CPU_TOPO_LEVEL_*" for CPU topology level names.
   (Yanan)

---
Zhao Liu (20):
  hw/core/machine: Introduce the module as a CPU topology level
  hw/core/machine: Support modules in -smp
  hw/core: Introduce module-id as the topology subindex
  hw/core: Support module-id in numa configuration
  i386/cpu: Fix i/d-cache topology to core level for Intel CPU
  i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]
  i386/cpu: Use APIC ID info get NumSharingCache for
    CPUID[0x8000001D].EAX[bits 25:14]
  i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid()
  i386/cpu: Introduce bitmap to cache available CPU topology levels
  i386: Split topology types of CPUID[0x1F] from the definitions of
    CPUID[0xB]
  i386/cpu: Decouple CPUID[0x1F] subleaf with specific topology level
  i386: Introduce module level cpu topology to CPUX86State
  i386: Support modules_per_die in X86CPUTopoInfo
  i386: Expose module level in CPUID[0x1F]
  i386: Support module_id in X86CPUTopoIDs
  i386/cpu: Introduce module-id to X86CPU
  hw/i386/pc: Support smp.modules for x86 PC machine
  i386: Add cache topology info in CPUCacheInfo
  i386/cpu: Use CPUCacheInfo.share_level to encode CPUID[4]
  i386/cpu: Use CPUCacheInfo.share_level to encode
    CPUID[0x8000001D].EAX[bits 25:14]

Zhuocheng Ding (1):
  tests: Add test case of APIC ID for module level parsing

 hw/core/machine-hmp-cmds.c |   4 +
 hw/core/machine-smp.c      |  41 +++--
 hw/core/machine.c          |  18 +++
 hw/i386/pc.c               |   1 +
 hw/i386/x86.c              |  67 ++++++--
 include/hw/boards.h        |   4 +
 include/hw/i386/topology.h |  60 +++++++-
 qapi/machine.json          |   7 +
 qemu-options.hx            |  10 +-
 target/i386/cpu.c          | 304 +++++++++++++++++++++++++++++--------
 target/i386/cpu.h          |  29 +++-
 target/i386/kvm/kvm.c      |   3 +-
 tests/unit/test-x86-topo.c |  56 ++++---
 13 files changed, 481 insertions(+), 123 deletions(-)

Comments

Daniel P. Berrangé Jan. 31, 2024, 10:28 a.m. UTC | #1
On Wed, Jan 31, 2024 at 06:13:29PM +0800, Zhao Liu wrote:
> From: Zhao Liu <zhao1.liu@intel.com>
> 
> Hi list,
> 
> This is the our v8 patch series, rebased on the master branch at the
> commit 11be70677c70 ("Merge tag 'pull-vfio-20240129' of
> https://github.com/legoater/qemu into staging").
> 
> Compared with v7 [1], v8 mainly has the following changes:
>   * Introduced smp.modules for x86 instead of reusing current
>     smp.clusters.
>   * Reworte the CPUID[0x1F] encoding.
> 
> Given the code change, I dropped the most previously gotten tags
> (Acked-by/Reviewed-by/Tested-by from Michael & Babu, thanks for your
> previous reviews and tests!) in v8.
> 
> With the description of the new modules added to x86 arch code in v7 [1]
> cover letter, the following sections are mainly the description of
> the newly added smp.modules (since v8) as supplement.
> 
> Welcome your comments!
> 
> 
> Why We Need a New CPU Topology Level
> ====================================
> 
> For the discussion in v7 about whether we should reuse current
> smp.clusters for x86 module, the core point is what's the essential
> differences between x86 module and general cluster.
> 
> Since, cluster (for ARM/riscv) lacks a comprehensive and rigorous
> hardware definition, and judging from the description of smp.clusters
> [2] when it was introduced by QEMU, x86 module is very similar to
> general smp.clusters: they are all a layer above existing core level
> to organize the physical cores and share L2 cache.
> 
> However, after digging deeper into the description and use cases of
> cluster in the device tree [3], I realized that the essential
> difference between clusters and modules is that cluster is an extremely
> abstract concept:
>   * Cluster supports nesting though currently QEMU doesn't support
>     nested cluster topology. However, modules will not support nesting.
>   * Also due to nesting, there is great flexibility in sharing resources
>     on clusters, rather than narrowing cluster down to sharing L2 (and
>     L3 tags) as the lowest topology level that contains cores.
>   * Flexible nesting of cluster allows it to correspond to any level
>     between the x86 package and core.
> 
> Based on the above considerations, and in order to eliminate the naming
> confusion caused by the mapping between general cluster and x86 module
> in v7, we now formally introduce smp.modules as the new topology level.

What is the Linux kernel calling this topology level on x86 ?
It will be pretty unfortunate if Linux and QEMU end up with
different names for the same topology level.

With regards,
Daniel
Zhao Liu Feb. 1, 2024, 2:57 a.m. UTC | #2
Hi Daniel,

On Wed, Jan 31, 2024 at 10:28:42AM +0000, Daniel P. Berrangé wrote:
> Date: Wed, 31 Jan 2024 10:28:42 +0000
> From: "Daniel P. Berrangé" <berrange@redhat.com>
> Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
> 
> On Wed, Jan 31, 2024 at 06:13:29PM +0800, Zhao Liu wrote:
> > From: Zhao Liu <zhao1.liu@intel.com>

[snip]

> > However, after digging deeper into the description and use cases of
> > cluster in the device tree [3], I realized that the essential
> > difference between clusters and modules is that cluster is an extremely
> > abstract concept:
> >   * Cluster supports nesting though currently QEMU doesn't support
> >     nested cluster topology. However, modules will not support nesting.
> >   * Also due to nesting, there is great flexibility in sharing resources
> >     on clusters, rather than narrowing cluster down to sharing L2 (and
> >     L3 tags) as the lowest topology level that contains cores.
> >   * Flexible nesting of cluster allows it to correspond to any level
> >     between the x86 package and core.
> > 
> > Based on the above considerations, and in order to eliminate the naming
> > confusion caused by the mapping between general cluster and x86 module
> > in v7, we now formally introduce smp.modules as the new topology level.
> 
> What is the Linux kernel calling this topology level on x86 ?
> It will be pretty unfortunate if Linux and QEMU end up with
> different names for the same topology level.
> 

Now Intel's engineers in the Linux kernel are starting to use "module"
to refer to this layer of topology [4] to avoid confusion, where
previously the scheduler developers referred to the share L2 hierarchy
collectively as "cluster".

Looking at it this way, it makes more sense for QEMU to use the
"module" for x86.

[4]: https://lore.kernel.org/lkml/20231116142245.1233485-3-kan.liang@linux.intel.com/

Thanks,
Zhao
Daniel P. Berrangé Feb. 1, 2024, 9:21 a.m. UTC | #3
On Thu, Feb 01, 2024 at 10:57:32AM +0800, Zhao Liu wrote:
> Hi Daniel,
> 
> On Wed, Jan 31, 2024 at 10:28:42AM +0000, Daniel P. Berrangé wrote:
> > Date: Wed, 31 Jan 2024 10:28:42 +0000
> > From: "Daniel P. Berrangé" <berrange@redhat.com>
> > Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
> > 
> > On Wed, Jan 31, 2024 at 06:13:29PM +0800, Zhao Liu wrote:
> > > From: Zhao Liu <zhao1.liu@intel.com>
> 
> [snip]
> 
> > > However, after digging deeper into the description and use cases of
> > > cluster in the device tree [3], I realized that the essential
> > > difference between clusters and modules is that cluster is an extremely
> > > abstract concept:
> > >   * Cluster supports nesting though currently QEMU doesn't support
> > >     nested cluster topology. However, modules will not support nesting.
> > >   * Also due to nesting, there is great flexibility in sharing resources
> > >     on clusters, rather than narrowing cluster down to sharing L2 (and
> > >     L3 tags) as the lowest topology level that contains cores.
> > >   * Flexible nesting of cluster allows it to correspond to any level
> > >     between the x86 package and core.
> > > 
> > > Based on the above considerations, and in order to eliminate the naming
> > > confusion caused by the mapping between general cluster and x86 module
> > > in v7, we now formally introduce smp.modules as the new topology level.
> > 
> > What is the Linux kernel calling this topology level on x86 ?
> > It will be pretty unfortunate if Linux and QEMU end up with
> > different names for the same topology level.
> > 
> 
> Now Intel's engineers in the Linux kernel are starting to use "module"
> to refer to this layer of topology [4] to avoid confusion, where
> previously the scheduler developers referred to the share L2 hierarchy
> collectively as "cluster".
> 
> Looking at it this way, it makes more sense for QEMU to use the
> "module" for x86.

I was thinking specificially about what Linux calls this topology when
exposing it in sysfs and /proc/cpuinfo. AFAICT, it looks like it is
called 'clusters' in this context, and so this is the terminology that
applications and users are going to expect.

I think it would be a bad idea for QEMU to diverge from this and call
it modules.

With regards,
Daniel
Zhao Liu Feb. 1, 2024, 4:10 p.m. UTC | #4
Hi Daniel,

On Thu, Feb 01, 2024 at 09:21:48AM +0000, Daniel P. Berrangé wrote:
> Date: Thu, 1 Feb 2024 09:21:48 +0000
> From: "Daniel P. Berrangé" <berrange@redhat.com>
> Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
> 
> On Thu, Feb 01, 2024 at 10:57:32AM +0800, Zhao Liu wrote:
> > Hi Daniel,
> > 
> > On Wed, Jan 31, 2024 at 10:28:42AM +0000, Daniel P. Berrangé wrote:
> > > Date: Wed, 31 Jan 2024 10:28:42 +0000
> > > From: "Daniel P. Berrangé" <berrange@redhat.com>
> > > Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
> > > 
> > > On Wed, Jan 31, 2024 at 06:13:29PM +0800, Zhao Liu wrote:
> > > > From: Zhao Liu <zhao1.liu@intel.com>
> > 
> > [snip]
> > 
> > > > However, after digging deeper into the description and use cases of
> > > > cluster in the device tree [3], I realized that the essential
> > > > difference between clusters and modules is that cluster is an extremely
> > > > abstract concept:
> > > >   * Cluster supports nesting though currently QEMU doesn't support
> > > >     nested cluster topology. However, modules will not support nesting.
> > > >   * Also due to nesting, there is great flexibility in sharing resources
> > > >     on clusters, rather than narrowing cluster down to sharing L2 (and
> > > >     L3 tags) as the lowest topology level that contains cores.
> > > >   * Flexible nesting of cluster allows it to correspond to any level
> > > >     between the x86 package and core.
> > > > 
> > > > Based on the above considerations, and in order to eliminate the naming
> > > > confusion caused by the mapping between general cluster and x86 module
> > > > in v7, we now formally introduce smp.modules as the new topology level.
> > > 
> > > What is the Linux kernel calling this topology level on x86 ?
> > > It will be pretty unfortunate if Linux and QEMU end up with
> > > different names for the same topology level.
> > > 
> > 
> > Now Intel's engineers in the Linux kernel are starting to use "module"
> > to refer to this layer of topology [4] to avoid confusion, where
> > previously the scheduler developers referred to the share L2 hierarchy
> > collectively as "cluster".
> > 
> > Looking at it this way, it makes more sense for QEMU to use the
> > "module" for x86.
> 
> I was thinking specificially about what Linux calls this topology when
> exposing it in sysfs and /proc/cpuinfo. AFAICT, it looks like it is
> called 'clusters' in this context, and so this is the terminology that
> applications and users are going to expect.

The cluster related topology information under "/sys/devices/system/cpu/
cpu*/topology" indicates the L2 cache topology (CPUID[0x4]), not module
level CPU topology (CPUID[0x1f]).

So far, kernel hasn't exposed module topology related sysfs. But we will
add new "module" related information in sysfs. The relevant patches are
ready internally, but not posted yet.

In the future, we will use "module" in sysfs to indicate module level CPU
topology, and "cluster" will be only used to refer to the l2 cache domain
as it is now.

> 
> I think it would be a bad idea for QEMU to diverge from this and call
> it modules.
>

This patch set mainly tries to configure the module level CPU topology
for Guest, which will be aligned with the future module information in
the sysfs, so that it doesn't violate the kernel x86 arch's definition
or current end users' expectation for x86's cluster.

Thanks,
Zhao
Daniel P. Berrangé Feb. 8, 2024, 4:52 p.m. UTC | #5
On Fri, Feb 02, 2024 at 12:10:58AM +0800, Zhao Liu wrote:
> Hi Daniel,
> 
> On Thu, Feb 01, 2024 at 09:21:48AM +0000, Daniel P. Berrangé wrote:
> > Date: Thu, 1 Feb 2024 09:21:48 +0000
> > From: "Daniel P. Berrangé" <berrange@redhat.com>
> > Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
> > 
> > On Thu, Feb 01, 2024 at 10:57:32AM +0800, Zhao Liu wrote:
> > > Hi Daniel,
> > > 
> > > On Wed, Jan 31, 2024 at 10:28:42AM +0000, Daniel P. Berrangé wrote:
> > > > Date: Wed, 31 Jan 2024 10:28:42 +0000
> > > > From: "Daniel P. Berrangé" <berrange@redhat.com>
> > > > Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
> > > > 
> > > > On Wed, Jan 31, 2024 at 06:13:29PM +0800, Zhao Liu wrote:
> > > > > From: Zhao Liu <zhao1.liu@intel.com>
> > > 
> > > [snip]
> > > 
> > > > > However, after digging deeper into the description and use cases of
> > > > > cluster in the device tree [3], I realized that the essential
> > > > > difference between clusters and modules is that cluster is an extremely
> > > > > abstract concept:
> > > > >   * Cluster supports nesting though currently QEMU doesn't support
> > > > >     nested cluster topology. However, modules will not support nesting.
> > > > >   * Also due to nesting, there is great flexibility in sharing resources
> > > > >     on clusters, rather than narrowing cluster down to sharing L2 (and
> > > > >     L3 tags) as the lowest topology level that contains cores.
> > > > >   * Flexible nesting of cluster allows it to correspond to any level
> > > > >     between the x86 package and core.
> > > > > 
> > > > > Based on the above considerations, and in order to eliminate the naming
> > > > > confusion caused by the mapping between general cluster and x86 module
> > > > > in v7, we now formally introduce smp.modules as the new topology level.
> > > > 
> > > > What is the Linux kernel calling this topology level on x86 ?
> > > > It will be pretty unfortunate if Linux and QEMU end up with
> > > > different names for the same topology level.
> > > > 
> > > 
> > > Now Intel's engineers in the Linux kernel are starting to use "module"
> > > to refer to this layer of topology [4] to avoid confusion, where
> > > previously the scheduler developers referred to the share L2 hierarchy
> > > collectively as "cluster".
> > > 
> > > Looking at it this way, it makes more sense for QEMU to use the
> > > "module" for x86.
> > 
> > I was thinking specificially about what Linux calls this topology when
> > exposing it in sysfs and /proc/cpuinfo. AFAICT, it looks like it is
> > called 'clusters' in this context, and so this is the terminology that
> > applications and users are going to expect.
> 
> The cluster related topology information under "/sys/devices/system/cpu/
> cpu*/topology" indicates the L2 cache topology (CPUID[0x4]), not module
> level CPU topology (CPUID[0x1f]).
> 
> So far, kernel hasn't exposed module topology related sysfs. But we will
> add new "module" related information in sysfs. The relevant patches are
> ready internally, but not posted yet.
> 
> In the future, we will use "module" in sysfs to indicate module level CPU
> topology, and "cluster" will be only used to refer to the l2 cache domain
> as it is now.

So, if they're distinct concepts both relevant to x86 CPUs, then from
the QEMU POV, should this patch series be changing the -smp arg to
allowing configuration of both 'clusters' and 'modules' for x86 ?

An earlier version of this series just supported 'clusters', and this
changed to 'modules', but your description of Linux reporting both
suggests QEMU would need both.


With regards,
Daniel
Zhao Liu Feb. 15, 2024, 4:56 p.m. UTC | #6
Hi Daniel,

On Thu, Feb 08, 2024 at 04:52:33PM +0000, Daniel P. Berrangé wrote:
> Date: Thu, 8 Feb 2024 16:52:33 +0000
> From: "Daniel P. Berrangé" <berrange@redhat.com>
> Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
> 
> On Fri, Feb 02, 2024 at 12:10:58AM +0800, Zhao Liu wrote:
> > Hi Daniel,
> > 
> > On Thu, Feb 01, 2024 at 09:21:48AM +0000, Daniel P. Berrangé wrote:
> > > Date: Thu, 1 Feb 2024 09:21:48 +0000
> > > From: "Daniel P. Berrangé" <berrange@redhat.com>
> > > Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
> > > 
> > > On Thu, Feb 01, 2024 at 10:57:32AM +0800, Zhao Liu wrote:
> > > > Hi Daniel,
> > > > 
> > > > On Wed, Jan 31, 2024 at 10:28:42AM +0000, Daniel P. Berrangé wrote:
> > > > > Date: Wed, 31 Jan 2024 10:28:42 +0000
> > > > > From: "Daniel P. Berrangé" <berrange@redhat.com>
> > > > > Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
> > > > > 
> > > > > On Wed, Jan 31, 2024 at 06:13:29PM +0800, Zhao Liu wrote:
> > > > > > From: Zhao Liu <zhao1.liu@intel.com>
> > > > 
> > > > [snip]
> > > > 
> > > > > > However, after digging deeper into the description and use cases of
> > > > > > cluster in the device tree [3], I realized that the essential
> > > > > > difference between clusters and modules is that cluster is an extremely
> > > > > > abstract concept:
> > > > > >   * Cluster supports nesting though currently QEMU doesn't support
> > > > > >     nested cluster topology. However, modules will not support nesting.
> > > > > >   * Also due to nesting, there is great flexibility in sharing resources
> > > > > >     on clusters, rather than narrowing cluster down to sharing L2 (and
> > > > > >     L3 tags) as the lowest topology level that contains cores.
> > > > > >   * Flexible nesting of cluster allows it to correspond to any level
> > > > > >     between the x86 package and core.
> > > > > > 
> > > > > > Based on the above considerations, and in order to eliminate the naming
> > > > > > confusion caused by the mapping between general cluster and x86 module
> > > > > > in v7, we now formally introduce smp.modules as the new topology level.
> > > > > 
> > > > > What is the Linux kernel calling this topology level on x86 ?
> > > > > It will be pretty unfortunate if Linux and QEMU end up with
> > > > > different names for the same topology level.
> > > > > 
> > > > 
> > > > Now Intel's engineers in the Linux kernel are starting to use "module"
> > > > to refer to this layer of topology [4] to avoid confusion, where
> > > > previously the scheduler developers referred to the share L2 hierarchy
> > > > collectively as "cluster".
> > > > 
> > > > Looking at it this way, it makes more sense for QEMU to use the
> > > > "module" for x86.
> > > 
> > > I was thinking specificially about what Linux calls this topology when
> > > exposing it in sysfs and /proc/cpuinfo. AFAICT, it looks like it is
> > > called 'clusters' in this context, and so this is the terminology that
> > > applications and users are going to expect.
> > 
> > The cluster related topology information under "/sys/devices/system/cpu/
> > cpu*/topology" indicates the L2 cache topology (CPUID[0x4]), not module
> > level CPU topology (CPUID[0x1f]).
> > 
> > So far, kernel hasn't exposed module topology related sysfs. But we will
> > add new "module" related information in sysfs. The relevant patches are
> > ready internally, but not posted yet.
> > 
> > In the future, we will use "module" in sysfs to indicate module level CPU
> > topology, and "cluster" will be only used to refer to the l2 cache domain
> > as it is now.
> 
> So, if they're distinct concepts both relevant to x86 CPUs, then from
> the QEMU POV, should this patch series be changing the -smp arg to
> allowing configuration of both 'clusters' and 'modules' for x86 ?

Though the previous versions use "clusters" parameter, they, like the
current "modules" version, are just to add a CPU topology level to the
x86 CPU.

> 
> An earlier version of this series just supported 'clusters', and this
> changed to 'modules', but your description of Linux reporting both
> suggests QEMU would need both.
>

Given the cluster support for x86, i.e. the L2 cache topology support,
we want to introduce a different cache topology configuration way than
CPU topology and avoid using the "cluster" as cache topology name (this
avoids the confusion of -smp "clusters" which is a CPU topology since
ARM also just treats "cluster" as a CPU topology level in QEMU other
than cache topology level).

BTW, for cache topology, may I ask for your advice? Currently, I can
think of 2 options:

1. Hacked the -smp as:

-smp cpus=4,sockets=2,cores=2,threads=1, \
     l3-cache=socket,l2-cache=core,l1-i-cache=core,l1-d-cache=core

For this way, I just parsed the extended -smp and store the cache
topology in such structue:

typedef struct CacheTopology {
    CPUTopoLevel l1i;
    CPUTopoLevel l1d;
    CPUTopoLevel l2;
    CPUTopoLevel l3;
} CacheTopology;

This way is just used for smp cache topology. For the heterogeneous/hybrid
cache topology, I think it can be expanded based on the QOM CPU topology
[4] as:

    -accel kvm -cpu host \
    -device cpu-socket,id=sock0 \
    -device cpu-die,id=die0,parent=sock0 \
    -device cpu-module,id=module0,parent=die0 \
    -device cpu-module,id=module1,parent=die0 \
    -device cpu-core,id=core0,parent=module0,nr-threads=2 \
    -device cpu-core,id=core1,parent=module1,nr-threads=1 \
    -device cpu-core,id=core2,parent=module1,nr-threads=1 \
    -device cache,id=cache0,parent=die0,level=3,type=unified \
    -device cache,id=cache1,parent=core0,level=2,type=unified \
    -device cache,id=cache2,parent=core0,level=1,type=data \
    -device cache,id=cache3,parent=core0,level=1,type=inst \
    -device cache,id=cache4,parent=module1,level=2,type=unified \
    -device cache,id=cache5,parent=core1,level=1,type=data \
    -device cache,id=cache6,parent=core1,level=1,type=inst \
    -device cache,id=cache5,parent=core2,level=1,type=data \
    -device cache,id=cache6,parent=core2,level=1,type=inst \

In the module0, the l2 (x86's cluster) is shared at core0 (core level).
And in the module1, the l2 is shared for core1 and core 2 (at module
level).

[4]: https://lore.kernel.org/qemu-devel/20231130144203.2307629-1-zhao1.liu@linux.intel.com/


2. But recently I realized maybe there's another option, which is just
to introduce a new option "-cache" like "-numa" to configure cache
topology.

In "-cache", we could accept the CPU list as the parameter:

    -cache cache,cacheid=0,level=2,type=unified,cpus=0-1 \
    -cache cache,cacheid=1,level=2,type=unified,cpus=2-3 \

or CPU topology ids as the parameters:

    -cache cache,cache-id=0,level=2,type=unified \
    -cache cache,cache-id=1,level=2,type=unified \
    -cache cpu,cache-id=0,socket-id=0,die-id=0,module-id=0,core-id=0 \
    -cache cpu,cache-id=1,socket-id=0,die-id=0,module-id=1 \


Hmmm, Daniel, which of the above two options do you prefer?

Thanks,
Zhao
Markus Armbruster Feb. 21, 2024, 12:41 p.m. UTC | #7
Zhao Liu <zhao1.liu@linux.intel.com> writes:

> From: Zhao Liu <zhao1.liu@intel.com>
>
> Hi list,
>
> This is the our v8 patch series, rebased on the master branch at the
> commit 11be70677c70 ("Merge tag 'pull-vfio-20240129' of
> https://github.com/legoater/qemu into staging").
>
> Compared with v7 [1], v8 mainly has the following changes:
>   * Introduced smp.modules for x86 instead of reusing current
>     smp.clusters.
>   * Reworte the CPUID[0x1F] encoding.
>
> Given the code change, I dropped the most previously gotten tags
> (Acked-by/Reviewed-by/Tested-by from Michael & Babu, thanks for your
> previous reviews and tests!) in v8.
>
> With the description of the new modules added to x86 arch code in v7 [1]
> cover letter, the following sections are mainly the description of
> the newly added smp.modules (since v8) as supplement.
>
> Welcome your comments!
>
>
> Why We Need a New CPU Topology Level
> ====================================
>
> For the discussion in v7 about whether we should reuse current
> smp.clusters for x86 module, the core point is what's the essential
> differences between x86 module and general cluster.
>
> Since, cluster (for ARM/riscv) lacks a comprehensive and rigorous
> hardware definition, and judging from the description of smp.clusters
> [2] when it was introduced by QEMU, x86 module is very similar to
> general smp.clusters: they are all a layer above existing core level
> to organize the physical cores and share L2 cache.
>
> However, after digging deeper into the description and use cases of
> cluster in the device tree [3], I realized that the essential
> difference between clusters and modules is that cluster is an extremely
> abstract concept:
>   * Cluster supports nesting though currently QEMU doesn't support
>     nested cluster topology. However, modules will not support nesting.
>   * Also due to nesting, there is great flexibility in sharing resources
>     on clusters, rather than narrowing cluster down to sharing L2 (and
>     L3 tags) as the lowest topology level that contains cores.
>   * Flexible nesting of cluster allows it to correspond to any level
>     between the x86 package and core.
>
> Based on the above considerations, and in order to eliminate the naming
> confusion caused by the mapping between general cluster and x86 module
> in v7, we now formally introduce smp.modules as the new topology level.
>
>
> Where to Place Module in Existing Topology Levels
> =================================================
>
> The module is, in existing hardware practice, the lowest layer that
> contains the core, while the cluster is able to have a higher topological
> scope than the module due to its nesting.
>
> Thereby, we place the module between the cluster and the core, viz:
>
>     drawer/book/socket/die/cluster/module/core/thread
>
>
> Additional Consideration on CPU Topology
> ========================================
>
> Beyond this patchset, nowadays, different arches have different topology
> requirements, and maintaining arch-agnostic general topology in SMP
> becomes to be an increasingly difficult thing due to differences in
> sharing resources and special flexibility (e.g., nesting):
>   * It becomes difficult to put together all CPU topology hierarchies of
>     different arches to define complete topology order.
>   * It also becomes complex to ensure the correctness of the topology
>     calculations.
>       - Now the max_cpus is calculated by multiplying all topology
>         levels, and too many topology levels can easily cause omissions.
>
> Maybe we should consider implementing arch-specfic topology hierarchies.
>
>
> [1]: https://lore.kernel.org/qemu-devel/20240108082727.420817-1-zhao1.liu@linux.intel.com/
> [2]: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg04051.html
> [3]: https://www.kernel.org/doc/Documentation/devicetree/bindings/cpu/cpu-topology.txt

Have you considered putting an abridged version of your lovely rationale
into a commit message, so it can be found later more easily?
Zhao Liu Feb. 21, 2024, 3:15 p.m. UTC | #8
On Wed, Feb 21, 2024 at 01:41:35PM +0100, Markus Armbruster wrote:
> Date: Wed, 21 Feb 2024 13:41:35 +0100
> From: Markus Armbruster <armbru@redhat.com>
> Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
> 
> Zhao Liu <zhao1.liu@linux.intel.com> writes:
> 
> > From: Zhao Liu <zhao1.liu@intel.com>
> >
> > Hi list,
> >
> > This is the our v8 patch series, rebased on the master branch at the
> > commit 11be70677c70 ("Merge tag 'pull-vfio-20240129' of
> > https://github.com/legoater/qemu into staging").
> >
> > Compared with v7 [1], v8 mainly has the following changes:
> >   * Introduced smp.modules for x86 instead of reusing current
> >     smp.clusters.
> >   * Reworte the CPUID[0x1F] encoding.
> >
> > Given the code change, I dropped the most previously gotten tags
> > (Acked-by/Reviewed-by/Tested-by from Michael & Babu, thanks for your
> > previous reviews and tests!) in v8.
> >
> > With the description of the new modules added to x86 arch code in v7 [1]
> > cover letter, the following sections are mainly the description of
> > the newly added smp.modules (since v8) as supplement.
> >
> > Welcome your comments!
> >
> >
> > Why We Need a New CPU Topology Level
> > ====================================
> >
> > For the discussion in v7 about whether we should reuse current
> > smp.clusters for x86 module, the core point is what's the essential
> > differences between x86 module and general cluster.
> >
> > Since, cluster (for ARM/riscv) lacks a comprehensive and rigorous
> > hardware definition, and judging from the description of smp.clusters
> > [2] when it was introduced by QEMU, x86 module is very similar to
> > general smp.clusters: they are all a layer above existing core level
> > to organize the physical cores and share L2 cache.
> >
> > However, after digging deeper into the description and use cases of
> > cluster in the device tree [3], I realized that the essential
> > difference between clusters and modules is that cluster is an extremely
> > abstract concept:
> >   * Cluster supports nesting though currently QEMU doesn't support
> >     nested cluster topology. However, modules will not support nesting.
> >   * Also due to nesting, there is great flexibility in sharing resources
> >     on clusters, rather than narrowing cluster down to sharing L2 (and
> >     L3 tags) as the lowest topology level that contains cores.
> >   * Flexible nesting of cluster allows it to correspond to any level
> >     between the x86 package and core.
> >
> > Based on the above considerations, and in order to eliminate the naming
> > confusion caused by the mapping between general cluster and x86 module
> > in v7, we now formally introduce smp.modules as the new topology level.
> >
> >
> > Where to Place Module in Existing Topology Levels
> > =================================================
> >
> > The module is, in existing hardware practice, the lowest layer that
> > contains the core, while the cluster is able to have a higher topological
> > scope than the module due to its nesting.
> >
> > Thereby, we place the module between the cluster and the core, viz:
> >
> >     drawer/book/socket/die/cluster/module/core/thread
> >
> >
> > Additional Consideration on CPU Topology
> > ========================================
> >
> > Beyond this patchset, nowadays, different arches have different topology
> > requirements, and maintaining arch-agnostic general topology in SMP
> > becomes to be an increasingly difficult thing due to differences in
> > sharing resources and special flexibility (e.g., nesting):
> >   * It becomes difficult to put together all CPU topology hierarchies of
> >     different arches to define complete topology order.
> >   * It also becomes complex to ensure the correctness of the topology
> >     calculations.
> >       - Now the max_cpus is calculated by multiplying all topology
> >         levels, and too many topology levels can easily cause omissions.
> >
> > Maybe we should consider implementing arch-specfic topology hierarchies.
> >
> >
> > [1]: https://lore.kernel.org/qemu-devel/20240108082727.420817-1-zhao1.liu@linux.intel.com/
> > [2]: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg04051.html
> > [3]: https://www.kernel.org/doc/Documentation/devicetree/bindings/cpu/cpu-topology.txt
> 
> Have you considered putting an abridged version of your lovely rationale
> into a commit message, so it can be found later more easily?
>

Sure, I'll. Thanks for helping me improve my commit message. ;-)

Regards,
Zhao