[v4,00/20] arch_topology: Updates to add socket support and fix cluster ids

Message ID: 20220621192034.3332546-1-sudeep.holla@arm.com

Sudeep Holla June 21, 2022, 7:20 p.m. UTC
Hi All,

This version updates cacheinfo to populate the cache attributes and uses
that information for all of the cache topology.

This series intends to fix some discrepancies in the CPU topology parsing
from the device tree /cpu-map node, which currently diverges from the
behaviour on an ACPI-enabled platform. The expectation is that both DT- and
ACPI-enabled systems must present a consistent view of the CPU topology.

Currently we assign the generated cluster count as the physical package
identifier for each CPU, which is wrong. The device tree bindings for CPU
topology support sockets to infer the socket or physical package identifier
for a given CPU. Also, we don't check if all the cores/threads belong to the
same cluster before updating their sibling masks, which is fine as we don't
set the cluster id yet.

These changes also assign the cluster identifier as parsed from the device tree
cluster nodes within /cpu-map, without support for nesting of the clusters.
Finally, they also add support for socket nodes in /cpu-map. With this, the
parsing of the exact same information from ACPI PPTT and the /cpu-map DT node
aligns well.
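
As an illustration only, the ID assignment described above can be sketched in
plain userspace C. This is not the code from the series: struct map_node,
struct cpu_ids and the parse_* helpers are hypothetical stand-ins for the
device tree nodes and the kernel's topology structures. The sketch also
assumes that IDs are the child index at each level and that a nested cluster
is rejected outright.

```c
#include <stddef.h>

#define MAX_CPUS 8

/* Hypothetical, simplified stand-in for nodes under /cpu-map. */
enum node_kind { NODE_SOCKET, NODE_CLUSTER, NODE_CORE };

struct map_node {
	enum node_kind kind;
	int cpu;                          /* logical CPU; used by NODE_CORE only */
	const struct map_node *children;  /* array of child nodes */
	size_t nr_children;
};

struct cpu_ids {
	int package_id;
	int cluster_id;
	int core_id;
};

/* Walk one cluster: every child must be a core, so nesting is rejected. */
static int parse_cluster(const struct map_node *cluster, int package_id,
			 int cluster_id, struct cpu_ids *ids)
{
	for (size_t i = 0; i < cluster->nr_children; i++) {
		const struct map_node *core = &cluster->children[i];

		if (core->kind != NODE_CORE ||
		    core->cpu < 0 || core->cpu >= MAX_CPUS)
			return -1;

		ids[core->cpu].package_id = package_id;
		ids[core->cpu].cluster_id = cluster_id;
		ids[core->cpu].core_id = (int)i;
	}
	return 0;
}

/* Walk one socket, or the /cpu-map root when there are no socket nodes. */
static int parse_socket(const struct map_node *socket, int package_id,
			struct cpu_ids *ids)
{
	for (size_t i = 0; i < socket->nr_children; i++) {
		const struct map_node *c = &socket->children[i];

		if (c->kind != NODE_CLUSTER ||
		    parse_cluster(c, package_id, (int)i, ids))
			return -1;
	}
	return 0;
}

/* Children of the root are either all sockets or all clusters. */
static int parse_cpu_map(const struct map_node *root, struct cpu_ids *ids)
{
	if (root->nr_children == 0)
		return -1;

	/* No socket nodes: everything belongs to package 0. */
	if (root->children[0].kind != NODE_SOCKET)
		return parse_socket(root, 0, ids);

	for (size_t i = 0; i < root->nr_children; i++) {
		if (root->children[i].kind != NODE_SOCKET ||
		    parse_socket(&root->children[i], (int)i, ids))
			return -1;
	}
	return 0;
}
```

With two clusters and no socket nodes, every CPU in this sketch ends up in
package 0 with distinct cluster IDs, so the cluster count no longer leaks
into the package ID.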

The only exception is that the last level cache id information can be
inferred from the same ACPI PPTT while we need to parse CPU cache nodes
in the device tree.

Hi Greg,

I had not cc-ed you on the earlier 3 versions as we had some disagreement
amongst Arm developers which we have not settled. Let me know how you want to
merge this once you agree with the changes. I can send a pull request if
you prefer.

v3[3]->v4:
	- Updated ACPI PPTT fw_token to use the table offset instead of the
	  virtual address, as the latter can change every time the table is
	  mapped before the global acpi_permanent_mmap is set
	- Added warning for the topology with nested clusters
	- Added update to cpu_clustergroup_mask so that the introduction of
	  the correct cluster_id doesn't break existing platforms, by limiting
	  the span of clustergroup_mask (by Ionela)
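
To illustrate the fw_token item above, here is a minimal userspace sketch of
why an offset makes a more stable token than a virtual address. The helper
names token_from_address and token_from_offset are hypothetical and are not
the functions used in drivers/acpi/pptt.c. The sketch only assumes that the
same table contents can be mapped at different addresses.

```c
#include <stdint.h>

/* Token derived from the node's virtual address: differs per mapping. */
static uintptr_t token_from_address(const void *node)
{
	return (uintptr_t)node;
}

/* Token derived from the node's offset in the table: mapping-independent. */
static uintptr_t token_from_offset(const void *table_base, const void *node)
{
	return (uintptr_t)((const unsigned char *)node -
			   (const unsigned char *)table_base);
}
```

Two mappings of the same table place a given node at different addresses but
at the same offset, so only the offset-based token compares equal across
mappings.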

v2[2]->v3[3]:
	- Dropped support to get the device node for the CPU's LLC
	- Updated cacheinfo to support calling detect_cache_attributes
	  early, in the smp_prepare_cpus stage
	- Added support to check if the LLC is valid and shared in cacheinfo
	- Used the same in arch_topology

v1[1]->v2[2]:
	- Updated ID validity check to include all non-negative values
	- Added support to get the device node for the CPU's last level cache
	- Added support to build llc_sibling on DT platforms
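
The ID validity item above can be pictured with a small sketch. The function
names are hypothetical and are not taken from arch_topology. It contrasts a
check against the -1 sentinel with a check that accepts only non-negative
values.

```c
#include <stdbool.h>

/* Sentinel-style check: only -1 is treated as "not set". */
static bool id_valid_sentinel(int id)
{
	return id != -1;
}

/* Non-negative check: any negative value is treated as invalid. */
static bool id_valid_non_negative(int id)
{
	return id >= 0;
}
```

A negative value other than -1, for example an error code such as -22 stored
by mistake, passes the sentinel check but fails the non-negative one.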

[1] https://lore.kernel.org/lkml/20220513095559.1034633-1-sudeep.holla@arm.com
[2] https://lore.kernel.org/lkml/20220518093325.2070336-1-sudeep.holla@arm.com
[3] https://lore.kernel.org/lkml/20220525081416.3306043-1-sudeep.holla@arm.com


Ionela Voinescu (1):
  arch_topology: Limit span of cpu_clustergroup_mask()

Sudeep Holla (19):
  ACPI: PPTT: Use table offset as fw_token instead of virtual address
  cacheinfo: Use of_cpu_device_node_get instead cpu_dev->of_node
  cacheinfo: Add helper to access any cache index for a given CPU
  cacheinfo: Move cache_leaves_are_shared out of CONFIG_OF
  cacheinfo: Add support to check if last level cache(LLC) is valid or shared
  cacheinfo: Allow early detection and population of cache attributes
  cacheinfo: Use cache identifiers to check if the caches are shared if available
  arch_topology: Add support to parse and detect cache attributes
  arch_topology: Use the last level cache information from the cacheinfo
  arm64: topology: Remove redundant setting of llc_id in CPU topology
  arch_topology: Drop LLC identifier stash from the CPU topology
  arch_topology: Set thread sibling cpumask only within the cluster
  arch_topology: Check for non-negative value rather than -1 for IDs validity
  arch_topology: Avoid parsing through all the CPUs once a outlier CPU is found
  arch_topology: Don't set cluster identifier as physical package identifier
  arch_topology: Drop unnecessary check for uninitialised package_id
  arch_topology: Set cluster identifier in each core/thread from /cpu-map
  arch_topology: Add support for parsing sockets in /cpu-map
  arch_topology: Warn that topology for nested clusters is not supported

 arch/arm64/kernel/topology.c  |  14 ----
 drivers/acpi/pptt.c           |   3 +-
 drivers/base/arch_topology.c  | 102 ++++++++++++++++++---------
 drivers/base/cacheinfo.c      | 127 ++++++++++++++++++++++------------
 include/linux/arch_topology.h |   1 -
 include/linux/cacheinfo.h     |   3 +
 6 files changed, 159 insertions(+), 91 deletions(-)

--
2.36.1

Comments

Ionela Voinescu June 27, 2022, 1:54 p.m. UTC | #1
Hi Sudeep,

On Tuesday 21 Jun 2022 at 20:20:14 (+0100), Sudeep Holla wrote:
> Hi Greg,
> 
> I had not cc-ed you on earlier 3 versions as we had some disagreement
> amongst Arm developers which we have not settled. Let me know how you want to

s/not/now :)

> merge this once you agree with the changes. I can send a pull request if
> you prefer.
> 
> v3[3]->v4:
> 	- Updated ACPI PPTT fw_token to use the table offset instead of the
> 	  virtual address, as the latter can change every time the table is
> 	  mapped before the global acpi_permanent_mmap is set
> 	- Added warning for the topology with nested clusters
> 	- Added update to cpu_clustergroup_mask so that the introduction of
> 	  the correct cluster_id doesn't break existing platforms, by limiting
> 	  the span of clustergroup_mask (by Ionela)
> 

I've tested v4 on quite a few platforms:
 - DT: Juno R0, DB845c, RB5
 - ACPI: TX2, Ampere Altra, Kunpeng920

and it all looks good from my point of view (topology and sched domain
hierarchy).

So for the full set (after the changes requested for 16/20 and 20/20):

Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>

Hope it helps,
Ionela.

Sudeep Holla June 27, 2022, 4:22 p.m. UTC | #2
On Mon, Jun 27, 2022 at 02:54:28PM +0100, Ionela Voinescu wrote:
> Hi Sudeep,
> 
> I've tested v4 on quite a few platforms:
>  - DT: Juno R0, DB845c, RB5
>  - ACPI: TX2, Ampere Altra, Kunpeng920
> 
> and it all looks good from my point of view (topology and sched domain
> hierarchy).
> 
> So for the full set (after the changes requested for 16/20 and 20/20):
> 
> Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
>

Thanks for all the review and testing. Much appreciated!