Message ID | 20191127114801.23837-1-dietmar.eggemann@arm.com
---|---
State | New, archived
Delegated to: | viresh kumar
Series | cpufreq: vexpress-spc: Fix wrong alternation of policy->related_cpus during CPU hp
On 27-11-19, 12:48, Dietmar Eggemann wrote:
> Since commit ca74b316df96 ("arm: Use common cpu_topology structure and
> functions.") the core cpumask has to be modified during cpu hotplug
> operations.
>
> ("arm: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC")
> [1] fixed that but revealed another issue on TC2, i.e. in its cpufreq
> driver.
>
> During CPU hp stress operations on multiple CPUs, policy->related_cpus
> can be altered. This is wrong since this cpumask should contain the
> online and offline CPUs.
>
> The WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus)) in
> cpufreq_online() triggers in this case.
>
> The core cpumask can't be used to set policy->cpus in
> ve_spc_cpufreq_init() anymore in case it is called via
> cpuhp_cpufreq_online()->cpufreq_online()->cpufreq_driver->init().
>
> An empty online() callback can be used to avoid that the init()
> driver function is called during CPU hotplug, so that
> policy->related_cpus will not be changed.
>
> Implementing an online() callback also requires an offline() callback.
>
> Tested on TC2 with CPU hp stress test (CPU hp from multiple CPUs at
> the same time).
>
> [1]
> https://lore.kernel.org/r/20191127103353.12417-1-dietmar.eggemann@arm.com
>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

Wanna provide any fixes tag?

> ---
>  drivers/cpufreq/vexpress-spc-cpufreq.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)

Is this 5.5 material or 5.6?
On Wed, Nov 27, 2019 at 12:48:01PM +0100, Dietmar Eggemann wrote:

[...]

> An empty online() callback can be used to avoid that the init()
> driver function is called during CPU hotplug, so that
> policy->related_cpus will not be changed.
>

Unlike DT based drivers, it is not easy to get the fixed cpumask unless
we add some mechanism to extract it based on the clks/OPPs added. I
prefer this simple solution instead.

> Implementing an online() callback also requires an offline() callback.
>
> Tested on TC2 with CPU hp stress test (CPU hp from multiple CPUs at
> the same time).
>
> [1]
> https://lore.kernel.org/r/20191127103353.12417-1-dietmar.eggemann@arm.com
>

Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>

--
Regards,
Sudeep
On Wed, Nov 27, 2019 at 05:37:44PM +0530, Viresh Kumar wrote:
> On 27-11-19, 12:48, Dietmar Eggemann wrote:

[...]

> > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
>
> Wanna provide any fixes tag?
>
> > ---
> >  drivers/cpufreq/vexpress-spc-cpufreq.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
>
> Is this 5.5 material or 5.6?
>

v5.5 for sure; it is broken even on v5.4, but unless someone really
cares about stable on TC2, I am happy to skip it.

--
Regards,
Sudeep
On 27-11-19, 12:08, Sudeep Holla wrote:
> On Wed, Nov 27, 2019 at 12:48:01PM +0100, Dietmar Eggemann wrote:

[...]

> Unlike DT based drivers, it is not easy to get the fixed cpumask unless
> we add some mechanism to extract it based on the clks/OPPs added. I
> prefer this simple solution instead.

I will call this a work-around for the problem and not really the
solution, though I won't necessarily oppose it. There are cases which
will break even with this solution:

- Boot the board with the cpufreq driver as a module.
- Offline all CPUs except CPU0.
- Insert the cpufreq driver.
- Online all CPUs.

Now there is no guarantee that the last online will get the mask
properly, if I have understood the problem well :)

But yeah, who does this kind of messy work anyway :)

FWIW, we need a proper way (maybe from architecture code) to find the
list of all CPUs that share a clock line.
On Wed, Nov 27, 2019 at 05:44:02PM +0530, Viresh Kumar wrote:
> On 27-11-19, 12:08, Sudeep Holla wrote:

[...]

> I will call this a work-around for the problem and not really the
> solution, though I won't necessarily oppose it. There are cases which
> will break even with this solution.
>

I agree, and that's the reason I spoke my thought out loud here :)

> - Boot board with cpufreq driver as module.
> - Offline all CPUs except CPU0.
> - Insert cpufreq driver.
> - Online all CPUs.
>

Indeed, and not just at boot; any time, since it's a module :)

> Now there is no guarantee that the last online will get the mask
> properly, if I have understood the problem well :)
>

Yes.

> But yeah, who does this kind of messy work anyway :)
>

I won't bet on that ;)

> FWIW, we need a proper way (may be from architecture code) to find
> list of all CPUs that share clock line.
>

Yes, but there's no architectural way. I need to revisit tc2_pm.c to
check if we can do any magic there.

--
Regards,
Sudeep
On 27/11/2019 14:32, Sudeep Holla wrote:
> On Wed, Nov 27, 2019 at 05:44:02PM +0530, Viresh Kumar wrote:
>> On 27-11-19, 12:08, Sudeep Holla wrote:

[...]

>> FWIW, we need a proper way (may be from architecture code) to find
>> list of all CPUs that share clock line.
>
> Yes, but there's no architectural way. I need to revisit tc2_pm.c to
> check if we can do any magic there.

I'm fine with finding a better solution to return a fixed topology core
cpumask, or with calling this patch a workaround. AFAICS, only TC2 is
affected.

("arm: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC")
is needed for other systems as well once we have commit ca74b316df96
("arm: Use common cpu_topology structure and functions."). We probably
don't want to revert commit ca74b316df96?

We do CPU hp stress tests in our EAS mainline integration test suite
https://developer.arm.com/tools-and-software/open-source-software/linux-kernel/energy-aware-scheduling/eas-mainline-development
and that is where we initially encountered this issue on TC2.
On Wed, Nov 27, 2019 at 03:58:49PM +0100, Dietmar Eggemann wrote:
> On 27/11/2019 14:32, Sudeep Holla wrote:

[...]

> I'm fine with finding a better solution to return a fixed topology core
> cpumask, or with calling this patch a workaround. AFAICS, only TC2 is
> affected.
>
> ("arm: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC")
> is needed for other systems as well once we have commit ca74b316df96
> ("arm: Use common cpu_topology structure and functions."). We probably
> don't want to revert commit ca74b316df96?
>

Correct.

> We do CPU hp stress tests in our EAS mainline integration test suite
> https://developer.arm.com/tools-and-software/open-source-software/linux-kernel/energy-aware-scheduling/eas-mainline-development
> and that is where we initially encountered this issue on TC2.

I came up with the patch below. If this is any cleaner and acceptable,
I am happy to post it. One advantage of moving the use of
topology_core_cpumask inside ve_spc_clk_init is that it is just a
device_initcall and not a module, which allows ve_spc_cpufreq to be
handled as a module. I prefer this to the previous solution/workaround.
Let me know.
Regards,
Sudeep

-->8

diff --git i/arch/arm/mach-vexpress/spc.c w/arch/arm/mach-vexpress/spc.c
index 354e0e7025ae..e0e2e789a0b7 100644
--- i/arch/arm/mach-vexpress/spc.c
+++ w/arch/arm/mach-vexpress/spc.c
@@ -551,8 +551,9 @@ static struct clk *ve_spc_clk_register(struct device *cpu_dev)

 static int __init ve_spc_clk_init(void)
 {
-	int cpu;
+	int cpu, cluster;
 	struct clk *clk;
+	bool init_opp_table[MAX_CLUSTERS] = { false };

 	if (!info)
 		return 0; /* Continue only if SPC is initialised */
@@ -578,8 +579,17 @@ static int __init ve_spc_clk_init(void)
 			continue;
 		}

+		cluster = topology_physical_package_id(cpu_dev->id);
+		if (init_opp_table[cluster])
+			continue;
+
 		if (ve_init_opp_table(cpu_dev))
 			pr_warn("failed to initialise cpu%d opp table\n", cpu);
+		else if (dev_pm_opp_set_sharing_cpus(cpu_dev,
+			 topology_core_cpumask(cpu_dev->id)))
+			pr_warn("failed to mark OPPs shared for cpu%d\n", cpu);
+
+		init_opp_table[cluster] = true;
 	}

 	platform_device_register_simple("vexpress-spc-cpufreq", -1, NULL, 0);
diff --git i/drivers/cpufreq/vexpress-spc-cpufreq.c w/drivers/cpufreq/vexpress-spc-cpufreq.c
index 506e3f2bf53a..83c85d3d67e3 100644
--- i/drivers/cpufreq/vexpress-spc-cpufreq.c
+++ w/drivers/cpufreq/vexpress-spc-cpufreq.c
@@ -434,7 +434,7 @@ static int ve_spc_cpufreq_init(struct cpufreq_policy *policy)
 	if (cur_cluster < MAX_CLUSTERS) {
 		int cpu;

-		cpumask_copy(policy->cpus, topology_core_cpumask(policy->cpu));
+		dev_pm_opp_get_sharing_cpus(cpu_dev, policy->cpus);

 		for_each_cpu(cpu, policy->cpus)
 			per_cpu(physical_cluster, cpu) = cur_cluster;
On Wed, Nov 27, 2019 at 03:40:29PM +0000, Sudeep Holla wrote:
>
> diff --git i/arch/arm/mach-vexpress/spc.c w/arch/arm/mach-vexpress/spc.c
> index 354e0e7025ae..e0e2e789a0b7 100644
> --- i/arch/arm/mach-vexpress/spc.c
> +++ w/arch/arm/mach-vexpress/spc.c
> @@ -551,8 +551,9 @@ static struct clk *ve_spc_clk_register(struct device *cpu_dev)
>
>  static int __init ve_spc_clk_init(void)
>  {
> -	int cpu;
> +	int cpu, cluster;
>  	struct clk *clk;
> +	bool init_opp_table[MAX_CLUSTERS] = { false };
>
>  	if (!info)
>  		return 0; /* Continue only if SPC is initialised */
> @@ -578,8 +579,17 @@ static int __init ve_spc_clk_init(void)
>  			continue;
>  		}
>
> +		cluster = topology_physical_package_id(cpu_dev->id);
> +		if (init_opp_table[cluster])
> +			continue;
> +
>  		if (ve_init_opp_table(cpu_dev))
>  			pr_warn("failed to initialise cpu%d opp table\n", cpu);
> +		else if (dev_pm_opp_set_sharing_cpus(cpu_dev,
> +			 topology_core_cpumask(cpu_dev->id)))
> +			pr_warn("failed to mark OPPs shared for cpu%d\n", cpu);

Missing else here; found it when I was looking at the patch to
stash/commit locally.

> +
> +		init_opp_table[cluster] = true;
>  	}
>
>  	platform_device_register_simple("vexpress-spc-cpufreq", -1, NULL, 0);

--
Regards,
Sudeep
On 27-11-19, 15:40, Sudeep Holla wrote:
> diff --git i/arch/arm/mach-vexpress/spc.c w/arch/arm/mach-vexpress/spc.c

[...]

> diff --git i/drivers/cpufreq/vexpress-spc-cpufreq.c w/drivers/cpufreq/vexpress-spc-cpufreq.c
> index 506e3f2bf53a..83c85d3d67e3 100644
> --- i/drivers/cpufreq/vexpress-spc-cpufreq.c
> +++ w/drivers/cpufreq/vexpress-spc-cpufreq.c
> @@ -434,7 +434,7 @@ static int ve_spc_cpufreq_init(struct cpufreq_policy *policy)
>  	if (cur_cluster < MAX_CLUSTERS) {
>  		int cpu;
>
> -		cpumask_copy(policy->cpus, topology_core_cpumask(policy->cpu));
> +		dev_pm_opp_get_sharing_cpus(cpu_dev, policy->cpus);
>
>  		for_each_cpu(cpu, policy->cpus)
>  			per_cpu(physical_cluster, cpu) = cur_cluster;

This is a better *work-around* I would say, as we can't break it the
way I explained earlier :)
On 28/11/2019 03:31, Viresh Kumar wrote:
> On 27-11-19, 15:40, Sudeep Holla wrote:

[...]

>> -		cpumask_copy(policy->cpus, topology_core_cpumask(policy->cpu));
>> +		dev_pm_opp_get_sharing_cpus(cpu_dev, policy->cpus);
>>
>>  		for_each_cpu(cpu, policy->cpus)
>>  			per_cpu(physical_cluster, cpu) = cur_cluster;
>
> This is a better *work-around* I would say, as we can't break it the
> way I explained earlier :)

I do agree. Tested CPU hp stress on TC2 and it looks good.

Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c
index 506e3f2bf53a..857dcb535e97 100644
--- a/drivers/cpufreq/vexpress-spc-cpufreq.c
+++ b/drivers/cpufreq/vexpress-spc-cpufreq.c
@@ -492,6 +492,16 @@ static void ve_spc_cpufreq_ready(struct cpufreq_policy *policy)
 	cdev[cur_cluster] = of_cpufreq_cooling_register(policy);
 }

+static int ve_spc_cpufreq_online(struct cpufreq_policy *policy)
+{
+	return 0;
+}
+
+static int ve_spc_cpufreq_offline(struct cpufreq_policy *policy)
+{
+	return 0;
+}
+
 static struct cpufreq_driver ve_spc_cpufreq_driver = {
 	.name			= "vexpress-spc",
 	.flags			= CPUFREQ_STICKY |
@@ -503,6 +513,8 @@ static struct cpufreq_driver ve_spc_cpufreq_driver = {
 	.init			= ve_spc_cpufreq_init,
 	.exit			= ve_spc_cpufreq_exit,
 	.ready			= ve_spc_cpufreq_ready,
+	.online			= ve_spc_cpufreq_online,
+	.offline		= ve_spc_cpufreq_offline,
 	.attr			= cpufreq_generic_attr,
 };
Since commit ca74b316df96 ("arm: Use common cpu_topology structure and
functions.") the core cpumask has to be modified during cpu hotplug
operations.

("arm: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC")
[1] fixed that but revealed another issue on TC2, i.e. in its cpufreq
driver.

During CPU hp stress operations on multiple CPUs, policy->related_cpus
can be altered. This is wrong since this cpumask should contain the
online and offline CPUs.

The WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus)) in
cpufreq_online() triggers in this case.

The core cpumask can't be used to set policy->cpus in
ve_spc_cpufreq_init() anymore in case it is called via
cpuhp_cpufreq_online()->cpufreq_online()->cpufreq_driver->init().

An empty online() callback can be used to avoid that the init()
driver function is called during CPU hotplug, so that
policy->related_cpus will not be changed.

Implementing an online() callback also requires an offline() callback.

Tested on TC2 with CPU hp stress test (CPU hp from multiple CPUs at
the same time).

[1] https://lore.kernel.org/r/20191127103353.12417-1-dietmar.eggemann@arm.com

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 drivers/cpufreq/vexpress-spc-cpufreq.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)