Message ID: 20241219083237.265419-1-zhao1.liu@intel.com (mailing list archive)
Series: i386: Support SMP Cache Topology
On 12/19/24 09:32, Zhao Liu wrote:
> Hi folks,
>
> This is my v6. Since Philippe has already merged the general smp cache
> part, v6 just includes the remaining i386-specific changes to support
> SMP cache topology for the PC machine (currently all patches have got
> Reviewed-by from the previous review).
>
> Compared with v5 [1], there's no change; the series just picks up the
> unmerged patches and rebases on the master branch (based on the commit
> 8032c78e556c "Merge tag 'firmware-20241216-pull-request' of
> https://gitlab.com/kraxel/qemu into staging").
>
> Patch 4 ("i386/cpu: add has_caches flag to check smp_cache"), which
> introduced a has_caches flag, is also wanted on the ARM side.
>
> Though this series now targets i386, to help review, I still include
> the previous introduction of the smp cache topology feature.
>
>
> Background
> ==========
>
> x86 and ARM (and RISC-V) need to allow the user to configure cache
> properties (currently only topology):
> * For x86, the default cache topology model (of the max/host CPU) does
>   not always match the host's real physical cache topology. Performance
>   can increase when the configured virtual topology is closer to the
>   physical topology than a default topology would be.
> * For ARM, QEMU can't get the cache topology information from the CPU
>   registers, so user configuration is necessary. Additionally, the
>   cache information is also needed for MPAM emulation (for TCG) to
>   build the right PPTT. (Originally from Jonathan)
>
>
> About smp-cache
> ===============
>
> The API design has been discussed heavily in [3].
>
> Now, smp-cache is implemented as an array integrated in -machine.
> Though -machine currently can't support JSON format, this is one of
> the future directions.
>
> An example is as follows:
>
> smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die
>
> "cache" specifies the cache that the properties will be applied to.
> This field is the combination of cache level and cache type. Now it
> supports "l1d" (L1 data cache), "l1i" (L1 instruction cache), "l2"
> (L2 unified cache) and "l3" (L3 unified cache).
>
> The "topology" field accepts CPU topology levels including "thread",
> "core", "module", "cluster", "die", "socket", "book", "drawer" and a
> special value "default".

Looks good; just one thing, does "thread" make sense? I think that it's
almost by definition that threads within a core share all caches, but
maybe I'm missing some hardware configurations.

Paolo

> The "default" is introduced to make it easier for libvirt to set a
> default parameter value without having to care about the specific
> machine (because currently there is no proper way for the machine to
> expose supported topology levels and caches).
>
> If "default" is set, then the cache topology will follow the
> architecture's default cache topology model. If another CPU topology
> level is set, the cache will be shared at the corresponding CPU
> topology level.
>
>
> [1]: Patch v5: https://lore.kernel.org/qemu-devel/20241101083331.340178-1-zhao1.liu@intel.com/
> [2]: ARM smp-cache: https://lore.kernel.org/qemu-devel/20241010111822.345-1-alireza.sanaee@huawei.com/
> [3]: API discussion: https://lore.kernel.org/qemu-devel/8734ndj33j.fsf@pond.sub.org/
>
> Thanks and Best Regards,
> Zhao
> ---
> Alireza Sanaee (1):
>   i386/cpu: add has_caches flag to check smp_cache configuration
>
> Zhao Liu (3):
>   i386/cpu: Support thread and module level cache topology
>   i386/cpu: Update cache topology with machine's configuration
>   i386/pc: Support cache topology in -machine for PC machine
>
>  hw/core/machine-smp.c |  2 ++
>  hw/i386/pc.c          |  4 +++
>  include/hw/boards.h   |  3 ++
>  qemu-options.hx       | 31 +++++++++++++++++-
>  target/i386/cpu.c     | 76 ++++++++++++++++++++++++++++++++++++++++---
>  5 files changed, 111 insertions(+), 5 deletions(-)
> > About smp-cache
> > ===============
> >
> > [...]
> >
> > The "topology" field accepts CPU topology levels including "thread",
> > "core", "module", "cluster", "die", "socket", "book", "drawer" and a
> > special value "default".
>
> Looks good; just one thing, does "thread" make sense? I think that it's
> almost by definition that threads within a core share all caches, but
> maybe I'm missing some hardware configurations.

Hi Paolo, merry Christmas. Yes, AFAIK, there's no hardware that has a
thread-level cache.

The thread case I considered is that it could be used for vCPU
scheduling optimization (although I haven't rigorously tested the
actual impact).

Without CPU affinity, tasks in Linux are generally distributed evenly
across different cores (for example, vCPU0 on core 0, vCPU1 on core 1).
In this case, the thread-level cache settings are closer to the actual
situation, with vCPU0 occupying the L1/L2 of host core 0 and vCPU1
occupying the L1/L2 of host core 1.

     ┌───┐        ┌───┐
     vCPU0        vCPU1
     │   │        │   │
     └───┘        └───┘
 ┌┌───┐┌───┐┐ ┌┌───┐┌───┐┐
 ││T0 ││T1 ││ ││T2 ││T3 ││
 │└───┘└───┘│ │└───┘└───┘│
 └────C0────┘ └────C1────┘

The L2 cache topology affects performance, and the cluster-aware
scheduling feature in the Linux kernel will try to schedule tasks on
the same L2 cache. So, in cases like the figure above, setting the L2
cache to be per thread should, in principle, be better.

Thanks,
Zhao
On Wed, 25 Dec 2024 11:03:42 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:

> > > [...]
> > >
> > > The "topology" field accepts CPU topology levels including
> > > "thread", "core", "module", "cluster", "die", "socket", "book",
> > > "drawer" and a special value "default".
> >
> > Looks good; just one thing, does "thread" make sense? I think that
> > it's almost by definition that threads within a core share all
> > caches, but maybe I'm missing some hardware configurations.
>
> Hi Paolo, merry Christmas. Yes, AFAIK, there's no hardware that has a
> thread-level cache.

Hi Zhao and Paolo,

While the example looks OK to me and makes sense, I would be curious to
know more scenarios where I can legitimately see a benefit there.

I am wrestling with this point on ARM too. If I were to have device
trees describing caches in a way that threads get their own private
caches, then this would not be possible to describe via device tree due
to spec limitations (+CCed Rob), if I understood correctly.

Thanks,
Alireza

> The thread case I considered is that it could be used for vCPU
> scheduling optimization (although I haven't rigorously tested the
> actual impact). [...]
On Thu, Jan 2, 2025 at 8:57 AM Alireza Sanaee <alireza.sanaee@huawei.com> wrote:
>
> [...]
>
> I am wrestling with this point on ARM too. If I were to have device
> trees describing caches in a way that threads get their own private
> caches, then this would not be possible to describe via device tree
> due to spec limitations (+CCed Rob), if I understood correctly.

You asked me for the opposite though, and I described how you can share
the cache. If you want a cache per thread, then you probably want a
node per thread.

Rob
On Thu, 2 Jan 2025 11:09:51 -0600
Rob Herring <robh@kernel.org> wrote:

> On Thu, Jan 2, 2025 at 8:57 AM Alireza Sanaee
> <alireza.sanaee@huawei.com> wrote:
> >
> > [...]
> >
> > I am wrestling with this point on ARM too. If I were to have device
> > trees describing caches in a way that threads get their own private
> > caches, then this would not be possible to describe via device tree
> > due to spec limitations (+CCed Rob), if I understood correctly.
>
> You asked me for the opposite though, and I described how you can
> share the cache. If you want a cache per thread, then you probably
> want a node per thread.
>
> Rob

Hi Rob,

That's right, I made the mistake in my prior message, and you recalled
correctly. I wanted shared caches between two threads; I had missed
your answer before and only just found it.

Thanks,
Alireza
On Thu, Jan 02, 2025 at 06:01:41PM +0000, Alireza Sanaee wrote:
> Date: Thu, 2 Jan 2025 18:01:41 +0000
> From: Alireza Sanaee <alireza.sanaee@huawei.com>
> Subject: Re: [PATCH v6 0/4] i386: Support SMP Cache Topology
> X-Mailer: Claws Mail 4.3.0 (GTK 3.24.42; x86_64-w64-mingw32)
>
> [...]
>
> > You asked me for the opposite though, and I described how you can
> > share the cache. If you want a cache per thread, then you probably
> > want a node per thread.
> >
> > Rob
>
> Hi Rob,
>
> That's right, I made the mistake in my prior message, and you recalled
> correctly. I wanted shared caches between two threads; I had missed
> your answer before and only just found it.

Thank you all!

Alireza, do you know how to configure an ARM node through QEMU options?

However, IIUC, ARM needs more effort to configure cache per thread (by
configuring node topology)... In that case, since no one has explicitly
requested the need for cache per thread, I can disable cache per thread
for now. I can return an error for this scenario during the general
smp-cache option parsing (in the future, if there is a real need, it
can be easily re-enabled).

Will drop cache per thread in the next version.

Thanks,
Zhao
On Fri, 3 Jan 2025 16:25:58 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:

> [...]
>
> Thank you all!
>
> Alireza, do you know how to configure an ARM node through QEMU
> options?

Hi Zhao, do you mean the -smp param?

> However, IIUC, ARM needs more effort to configure cache per thread (by
> configuring node topology)... In that case, since no one has
> explicitly requested the need for cache per thread, I can disable
> cache per thread for now. I can return an error for this scenario
> during the general smp-cache option parsing (in the future, if there
> is a real need, it can be easily re-enabled).
>
> Will drop cache per thread in the next version.
>
> Thanks,
> Zhao
> > [...]
> >
> > Alireza, do you know how to configure an ARM node through QEMU
> > options?
>
> Hi Zhao, do you mean the -smp param?

I mean: do you know how to configure something like "a node per thread"
via a QEMU option? :-) I'm curious about the relationship between
"node" and the SMP topology on the ARM side in current QEMU. I'm not
sure if this "node" refers to the NUMA node.

Thanks,
Zhao