Message ID | 20241015021841.35713-4-yangyicong@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | Support SMT control on arm64 |
Hello Yicong,

On 10/15/24 04:18, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> For ACPI we'll build the topology from PPTT and we cannot directly
> get the SMT number of each core. Instead using a temporary xarray
> to record the heterogeneous information (from ACPI_PPTT_ACPI_IDENTICAL)
> and SMT information of the first core in its heterogeneous CPU cluster
> when building the topology. Then we can know the largest SMT number
> in the system.

Is it possible to also state that the logic only work for PPTT versions
prior to 6.2. Otherwise find_acpi_cpu_topology_hetero_id() will return
the id of the root node of the PPTT, and all the CPUs of the platform
will be assumed to be identical.

>
> The core's SMT control provides two interface to the users [1]:
> 1) enable/disable SMT by writing on/off
> 2) enable/disable SMT by writing thread number 1/max_thread_number
>
> If a system have more than one SMT thread number the 2) may
> not handle it well, since there're multiple thread numbers in the
> system and 2) only accept 1/max_thread_number. So issue a warning
> to notify the users if such system detected.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-devices-system-cpu#n542
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
>  arch/arm64/kernel/topology.c | 61 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 1a2c72f3e7f8..2fa584b932ee 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -15,8 +15,10 @@
>  #include <linux/arch_topology.h>
>  #include <linux/cacheinfo.h>
>  #include <linux/cpufreq.h>
> +#include <linux/cpu_smt.h>
>  #include <linux/init.h>
>  #include <linux/percpu.h>
> +#include <linux/xarray.h>
>
>  #include <asm/cpu.h>
>  #include <asm/cputype.h>
> @@ -37,17 +39,29 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>  	return !!is_threaded;
>  }
>
> +struct cpu_smt_info {
> +	int thread_num;
> +	int core_id;
> +	int cpu;

It doesn't seem that the 'cpu' is used. Maybe this should be removed,
or used in the error message (I think this could be removed).

> +};
> +
>  /*
>   * Propagate the topology information of the processor_topology_node tree to the
>   * cpu_topology array.
>   */
>  int __init parse_acpi_topology(void)
>  {
> +	int max_smt_thread_num = 0;
> +	struct cpu_smt_info *entry;
> +	struct xarray hetero_cpu;
> +	unsigned long hetero_id;
>  	int cpu, topology_id;
>
>  	if (acpi_disabled)
>  		return 0;
>
> +	xa_init(&hetero_cpu);
> +
>  	for_each_possible_cpu(cpu) {
>  		topology_id = find_acpi_cpu_topology(cpu, 0);
>  		if (topology_id < 0)
> @@ -57,6 +71,30 @@ int __init parse_acpi_topology(void)
>  			cpu_topology[cpu].thread_id = topology_id;
>  			topology_id = find_acpi_cpu_topology(cpu, 1);
>  			cpu_topology[cpu].core_id = topology_id;
> +
> +			/*
> +			 * Build up the XArray using the heterogeneous ID of
> +			 * the CPU cluster. Store the CPU and SMT information
> +			 * of the first appeared CPU in the CPU cluster of this
> +			 * heterogeneous ID since the SMT information should be
> +			 * the same in this CPU cluster. Then we can know the
> +			 * SMT information of each heterogeneous CPUs in the
> +			 * system.
> +			 */

A suggestion just in case:
/*
 * In the PPTT, CPUs below a node with the 'identical
 * implementation' flag have the same number of threads.
 * Count the number of threads for only one CPU (i.e.
 * one core_id) among those with the same hetero_id.
 *
 * One entry is created for each node having:
 * - the 'identical implementation' flag
 * - its parent not having the flag
 */

> +			hetero_id = find_acpi_cpu_topology_hetero_id(cpu);
> +			entry = (struct cpu_smt_info *)xa_load(&hetero_cpu, hetero_id);
> +			if (!entry) {
> +				entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> +				WARN_ON(!entry);
> +
> +				entry->cpu = cpu;
> +				entry->core_id = topology_id;
> +				entry->thread_num = 1;
> +				xa_store(&hetero_cpu, hetero_id,
> +					 entry, GFP_KERNEL);
> +			} else if (entry->core_id == topology_id) {
> +				entry->thread_num++;
> +			}
>  		} else {
>  			cpu_topology[cpu].thread_id = -1;
>  			cpu_topology[cpu].core_id = topology_id;
> @@ -67,6 +105,29 @@ int __init parse_acpi_topology(void)
>  		cpu_topology[cpu].package_id = topology_id;
>  	}
>
> +	/*
> +	 * This should be a short loop depending on the number of heterogeneous
> +	 * CPU clusters. Typically on a homogeneous system there's only one
> +	 * entry in the XArray.
> +	 */

I think it should be ok to remove the above/below comments

> +	xa_for_each(&hetero_cpu, hetero_id, entry) {
> +		/*
> +		 * If max_smt_thread_num has been initialized and doesn't match
> +		 * the thread number of this entry, then the system has
> +		 * heterogeneous SMT topology.
> +		 */
> +		if (entry->thread_num != max_smt_thread_num && max_smt_thread_num)
> +			pr_warn_once("Heterogeneous SMT topology is partly supported by SMT control\n");
> +
> +		if (entry->thread_num > max_smt_thread_num)
> +			max_smt_thread_num = entry->thread_num;
> +
> +		xa_erase(&hetero_cpu, hetero_id);
> +		kfree(entry);
> +	}
> +
> +	cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
> +	xa_destroy(&hetero_cpu);
>  	return 0;
>  }
>  #endif

Otherwise I tested the SMT detection and it worked well

On 2024/10/24 16:44, Pierre Gondois wrote:
> Hello Yicong,
>
> On 10/15/24 04:18, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> For ACPI we'll build the topology from PPTT and we cannot directly
>> get the SMT number of each core. Instead using a temporary xarray
>> to record the heterogeneous information (from ACPI_PPTT_ACPI_IDENTICAL)
>> and SMT information of the first core in its heterogeneous CPU cluster
>> when building the topology. Then we can know the largest SMT number
>> in the system.
>
> Is it possible to also state that the logic only work for PPTT versions
> prior to 6.2. Otherwise find_acpi_cpu_topology_hetero_id() will return
> the id of the root node of the PPTT, and all the CPUs of the platform
> will be assumed to be identical.
>

sure. of course.

>>
>> The core's SMT control provides two interface to the users [1]:
>> 1) enable/disable SMT by writing on/off
>> 2) enable/disable SMT by writing thread number 1/max_thread_number
>>
>> If a system have more than one SMT thread number the 2) may
>> not handle it well, since there're multiple thread numbers in the
>> system and 2) only accept 1/max_thread_number. So issue a warning
>> to notify the users if such system detected.
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-devices-system-cpu#n542
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> ---
>>  arch/arm64/kernel/topology.c | 61 ++++++++++++++++++++++++++++++++++++
>>  1 file changed, 61 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 1a2c72f3e7f8..2fa584b932ee 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -15,8 +15,10 @@
>>  #include <linux/arch_topology.h>
>>  #include <linux/cacheinfo.h>
>>  #include <linux/cpufreq.h>
>> +#include <linux/cpu_smt.h>
>>  #include <linux/init.h>
>>  #include <linux/percpu.h>
>> +#include <linux/xarray.h>
>>  #include <asm/cpu.h>
>>  #include <asm/cputype.h>
>> @@ -37,17 +39,29 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>>  	return !!is_threaded;
>>  }
>> +struct cpu_smt_info {
>> +	int thread_num;
>> +	int core_id;
>> +	int cpu;
>
> It doesn't seem that the 'cpu' is used. Maybe this should be removed,
> or used in the error message (I think this could be removed).
>

sure. will remove..

>> +};
>> +
>>  /*
>>   * Propagate the topology information of the processor_topology_node tree to the
>>   * cpu_topology array.
>>   */
>>  int __init parse_acpi_topology(void)
>>  {
>> +	int max_smt_thread_num = 0;
>> +	struct cpu_smt_info *entry;
>> +	struct xarray hetero_cpu;
>> +	unsigned long hetero_id;
>>  	int cpu, topology_id;
>>  	if (acpi_disabled)
>>  		return 0;
>> +	xa_init(&hetero_cpu);
>> +
>>  	for_each_possible_cpu(cpu) {
>>  		topology_id = find_acpi_cpu_topology(cpu, 0);
>>  		if (topology_id < 0)
>> @@ -57,6 +71,30 @@ int __init parse_acpi_topology(void)
>>  			cpu_topology[cpu].thread_id = topology_id;
>>  			topology_id = find_acpi_cpu_topology(cpu, 1);
>>  			cpu_topology[cpu].core_id = topology_id;
>> +
>> +			/*
>> +			 * Build up the XArray using the heterogeneous ID of
>> +			 * the CPU cluster. Store the CPU and SMT information
>> +			 * of the first appeared CPU in the CPU cluster of this
>> +			 * heterogeneous ID since the SMT information should be
>> +			 * the same in this CPU cluster. Then we can know the
>> +			 * SMT information of each heterogeneous CPUs in the
>> +			 * system.
>> +			 */
>
> A suggestion just in case:
> /*
>  * In the PPTT, CPUs below a node with the 'identical
>  * implementation' flag have the same number of threads.
>  * Count the number of threads for only one CPU (i.e.
>  * one core_id) among those with the same hetero_id.
>  *
>  * One entry is created for each node having:
>  * - the 'identical implementation' flag
>  * - its parent not having the flag
>  */
>

ok maybe we can also refer to the comments of find_acpi_cpu_topology_hetero_id() [1]
to lead reader there if they want more details.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/pptt.c#n794

>
>> +			hetero_id = find_acpi_cpu_topology_hetero_id(cpu);
>> +			entry = (struct cpu_smt_info *)xa_load(&hetero_cpu, hetero_id);
>> +			if (!entry) {
>> +				entry = kzalloc(sizeof(*entry), GFP_KERNEL);
>> +				WARN_ON(!entry);
>> +
>> +				entry->cpu = cpu;
>> +				entry->core_id = topology_id;
>> +				entry->thread_num = 1;
>> +				xa_store(&hetero_cpu, hetero_id,
>> +					 entry, GFP_KERNEL);
>> +			} else if (entry->core_id == topology_id) {
>> +				entry->thread_num++;
>> +			}
>>  		} else {
>>  			cpu_topology[cpu].thread_id = -1;
>>  			cpu_topology[cpu].core_id = topology_id;
>> @@ -67,6 +105,29 @@ int __init parse_acpi_topology(void)
>>  		cpu_topology[cpu].package_id = topology_id;
>>  	}
>> +	/*
>> +	 * This should be a short loop depending on the number of heterogeneous
>> +	 * CPU clusters. Typically on a homogeneous system there's only one
>> +	 * entry in the XArray.
>> +	 */
>
> I think it should be ok to remove the above/below comments
>

I'd leave the above comment since this won't make people who are awared of
the efficiency feel nervous about the loop here... will drop the below comment.

>> +	xa_for_each(&hetero_cpu, hetero_id, entry) {
>> +		/*
>> +		 * If max_smt_thread_num has been initialized and doesn't match
>> +		 * the thread number of this entry, then the system has
>> +		 * heterogeneous SMT topology.
>> +		 */
>> +		if (entry->thread_num != max_smt_thread_num && max_smt_thread_num)
>> +			pr_warn_once("Heterogeneous SMT topology is partly supported by SMT control\n");
>> +
>> +		if (entry->thread_num > max_smt_thread_num)
>> +			max_smt_thread_num = entry->thread_num;
>> +
>> +		xa_erase(&hetero_cpu, hetero_id);
>> +		kfree(entry);
>> +	}
>> +
>> +	cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>> +	xa_destroy(&hetero_cpu);
>>  	return 0;
>>  }
>>  #endif
>
> Otherwise I tested the SMT detection and it worked well
>

appreciated for the tests!

Thanks.

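The commit message says the thread-number form of SMT control only accepts 1 or the maximum thread number, which is why a system with mixed thread counts is only partly supported. A minimal userspace sketch of that acceptance rule, assuming exactly the two interfaces described in the commit message; `smt_control_value_accepted()` is a hypothetical helper for illustration, not a kernel function, and the real sysfs handler may accept further values:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of the rule described in the commit message:
 * accept "on"/"off", or a decimal thread number equal to 1 or to
 * max_threads. Returns 1 if the value would be accepted, else 0.
 */
static int smt_control_value_accepted(const char *buf, int max_threads)
{
	char *end;
	long n;

	assert(max_threads >= 1);

	if (strcmp(buf, "on") == 0 || strcmp(buf, "off") == 0)
		return 1;

	/* Anything else must be a complete decimal number. */
	n = strtol(buf, &end, 10);
	if (end == buf || *end != '\0')
		return 0;

	return n == 1 || n == max_threads;
}
```

Under this model, on a system whose largest thread count is 4, a cluster with 2 threads per core cannot be addressed by writing "2": only "1" and "4" pass, which matches the limitation the warning is meant to flag.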
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1a2c72f3e7f8..2fa584b932ee 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -15,8 +15,10 @@
 #include <linux/arch_topology.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpufreq.h>
+#include <linux/cpu_smt.h>
 #include <linux/init.h>
 #include <linux/percpu.h>
+#include <linux/xarray.h>
 
 #include <asm/cpu.h>
 #include <asm/cputype.h>
@@ -37,17 +39,29 @@ static bool __init acpi_cpu_is_threaded(int cpu)
 	return !!is_threaded;
 }
 
+struct cpu_smt_info {
+	int thread_num;
+	int core_id;
+	int cpu;
+};
+
 /*
  * Propagate the topology information of the processor_topology_node tree to the
  * cpu_topology array.
  */
 int __init parse_acpi_topology(void)
 {
+	int max_smt_thread_num = 0;
+	struct cpu_smt_info *entry;
+	struct xarray hetero_cpu;
+	unsigned long hetero_id;
 	int cpu, topology_id;
 
 	if (acpi_disabled)
 		return 0;
 
+	xa_init(&hetero_cpu);
+
 	for_each_possible_cpu(cpu) {
 		topology_id = find_acpi_cpu_topology(cpu, 0);
 		if (topology_id < 0)
@@ -57,6 +71,30 @@ int __init parse_acpi_topology(void)
 			cpu_topology[cpu].thread_id = topology_id;
 			topology_id = find_acpi_cpu_topology(cpu, 1);
 			cpu_topology[cpu].core_id = topology_id;
+
+			/*
+			 * Build up the XArray using the heterogeneous ID of
+			 * the CPU cluster. Store the CPU and SMT information
+			 * of the first appeared CPU in the CPU cluster of this
+			 * heterogeneous ID since the SMT information should be
+			 * the same in this CPU cluster. Then we can know the
+			 * SMT information of each heterogeneous CPUs in the
+			 * system.
+			 */
+			hetero_id = find_acpi_cpu_topology_hetero_id(cpu);
+			entry = (struct cpu_smt_info *)xa_load(&hetero_cpu, hetero_id);
+			if (!entry) {
+				entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+				WARN_ON(!entry);
+
+				entry->cpu = cpu;
+				entry->core_id = topology_id;
+				entry->thread_num = 1;
+				xa_store(&hetero_cpu, hetero_id,
+					 entry, GFP_KERNEL);
+			} else if (entry->core_id == topology_id) {
+				entry->thread_num++;
+			}
 		} else {
 			cpu_topology[cpu].thread_id = -1;
 			cpu_topology[cpu].core_id = topology_id;
@@ -67,6 +105,29 @@ int __init parse_acpi_topology(void)
 		cpu_topology[cpu].package_id = topology_id;
 	}
 
+	/*
+	 * This should be a short loop depending on the number of heterogeneous
+	 * CPU clusters. Typically on a homogeneous system there's only one
+	 * entry in the XArray.
+	 */
+	xa_for_each(&hetero_cpu, hetero_id, entry) {
+		/*
+		 * If max_smt_thread_num has been initialized and doesn't match
+		 * the thread number of this entry, then the system has
+		 * heterogeneous SMT topology.
+		 */
+		if (entry->thread_num != max_smt_thread_num && max_smt_thread_num)
+			pr_warn_once("Heterogeneous SMT topology is partly supported by SMT control\n");
+
+		if (entry->thread_num > max_smt_thread_num)
+			max_smt_thread_num = entry->thread_num;
+
+		xa_erase(&hetero_cpu, hetero_id);
+		kfree(entry);
+	}
+
+	cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
+	xa_destroy(&hetero_cpu);
 	return 0;
 }
 #endif
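
The counting scheme in the patch (one entry per heterogeneous ID, counting only the threads of the first core seen for that ID, then taking the maximum) can be modelled in userspace roughly as follows. This is a sketch under assumptions, not the kernel code: a fixed-size array stands in for the XArray, the input is assumed to contain only threaded CPUs, and `struct cpu_ids`, `struct smt_summary`, `MAX_GROUPS` and `summarize_smt()` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the IDs the patch reads from the PPTT. */
struct cpu_ids {
	unsigned long hetero_id; /* shared by CPUs of identical implementation */
	int core_id;             /* core this hardware thread belongs to */
};

struct smt_summary {
	int max_threads;   /* largest threads-per-core over all groups */
	int heterogeneous; /* non-zero if groups disagree on thread count */
};

#define MAX_GROUPS 16 /* assumption: small number of hetero_id groups */

static struct smt_summary summarize_smt(const struct cpu_ids *cpus, size_t n)
{
	struct { unsigned long hetero_id; int core_id; int thread_num; } g[MAX_GROUPS];
	struct smt_summary s = { 0, 0 };
	size_t ngroups = 0, i, j;

	for (i = 0; i < n; i++) {
		/* Look up the group for this hetero_id (xa_load in the patch). */
		for (j = 0; j < ngroups; j++)
			if (g[j].hetero_id == cpus[i].hetero_id)
				break;

		if (j == ngroups) {
			/* First CPU with this hetero_id: remember its core. */
			assert(ngroups < MAX_GROUPS);
			g[j].hetero_id = cpus[i].hetero_id;
			g[j].core_id = cpus[i].core_id;
			g[j].thread_num = 1;
			ngroups++;
		} else if (g[j].core_id == cpus[i].core_id) {
			/* Another thread of that same first core. */
			g[j].thread_num++;
		}
	}

	for (j = 0; j < ngroups; j++) {
		/* Same check as the patch: differs from an already-set maximum. */
		if (s.max_threads && g[j].thread_num != s.max_threads)
			s.heterogeneous = 1;
		if (g[j].thread_num > s.max_threads)
			s.max_threads = g[j].thread_num;
	}
	return s;
}
```

Threads of any later core in a group are ignored on purpose, mirroring the patch's assumption that all cores sharing a heterogeneous ID have the same thread count.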