Message ID | 20190513192300.653-15-ulf.hansson@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Series | ARM/ARM64: Support hierarchical CPU arrangement for PSCI |
On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> When the hierarchical CPU topology layout is used in DT, let's allow the
> CPU to be power managed through its PM domain, via deploying runtime PM
> support.
>
> To know for which idle states runtime PM reference counting is needed,
> let's store the index of deepest idle state for the CPU, in a per CPU
> variable. This allows psci_cpu_suspend_enter() to compare this index with
> the requested idle state index and then act accordingly.

I do not see why a system with two CPU CPUidle states, say CPU retention
and CPU shutdown, should not be calling runtime PM on CPU retention
entry.

The question then is what cluster/package/system states
are allowed for a given CPU idle state, to understand
what idle states can be actually entered at any hierarchy
level given the choice made for the CPU idle state.

In the case above, a CPU entering retention state should prevent
runtime PM selecting a cluster shutdown state; most likely firmware
would demote the request to cluster retention but still, we should
find a way to describe these dependencies.

Thanks,
Lorenzo

> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
>
> Changes:
> 	- Simplify the code by using the new per CPU struct, that stores the
> 	  needed struct device*.
>
> ---
>  drivers/firmware/psci/psci.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 54e23d4ed0ea..2c4157d3a616 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -20,6 +20,7 @@
>  #include <linux/linkage.h>
>  #include <linux/of.h>
>  #include <linux/pm.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/printk.h>
>  #include <linux/psci.h>
>  #include <linux/reboot.h>
> @@ -298,6 +299,7 @@ static int __init psci_features(u32 psci_func_id)
>
>  struct psci_cpuidle_data {
>  	u32 *psci_states;
> +	u32 rpm_state_id;
>  	struct device *dev;
>  };
>
> @@ -385,6 +387,7 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
>  			goto free_mem;
>
>  		data->dev = dev;
> +		data->rpm_state_id = drv->state_count - 1;
>  	}
>
>  	/* Idle states parsed correctly, store them in the per-cpu struct. */
> @@ -481,8 +484,11 @@ static int psci_suspend_finisher(unsigned long index)
>  int psci_cpu_suspend_enter(unsigned long index)
>  {
>  	int ret;
> -	u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
> -	u32 composite_state = state[index - 1] | psci_get_domain_state();
> +	struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
> +	u32 *states = data->psci_states;
> +	struct device *dev = data->dev;
> +	bool runtime_pm = (dev && data->rpm_state_id == index);
> +	u32 composite_state;
>
>  	/*
>  	 * idle state index 0 corresponds to wfi, should never be called
> @@ -491,11 +497,23 @@ int psci_cpu_suspend_enter(unsigned long index)
>  	if (WARN_ON_ONCE(!index))
>  		return -EINVAL;
>
> +	/*
> +	 * Do runtime PM if we are using the hierarchical CPU toplogy, but only
> +	 * when cpuidle have selected the deepest idle state for the CPU.
> +	 */
> +	if (runtime_pm)
> +		pm_runtime_put_sync_suspend(dev);
> +
> +	composite_state = states[index - 1] | psci_get_domain_state();
> +
>  	if (!psci_power_state_loses_context(composite_state))
>  		ret = psci_ops.cpu_suspend(composite_state, 0);
>  	else
>  		ret = cpu_suspend(index, psci_suspend_finisher);
>
> +	if (runtime_pm)
> +		pm_runtime_get_sync(dev);
> +
>  	/* Clear the domain state to start fresh when back from idle. */
>  	psci_set_domain_state(0);
>
> --
> 2.17.1
>
On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
>
> On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > When the hierarchical CPU topology layout is used in DT, let's allow the
> > CPU to be power managed through its PM domain, via deploying runtime PM
> > support.
> >
> > To know for which idle states runtime PM reference counting is needed,
> > let's store the index of deepest idle state for the CPU, in a per CPU
> > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > the requested idle state index and then act accordingly.
>
> I do not see why a system with two CPU CPUidle states, say CPU retention
> and CPU shutdown, should not be calling runtime PM on CPU retention
> entry.

If the CPU idle governor did select the CPU retention for the CPU, it
was probably because the target residency for the CPU shutdown state
could not be met.
In this case, there is no point in allowing any other deeper idle
states for cluster/package/system, since those have even greater
residencies, hence calling runtime PM doesn't make sense.

>
> The question then is what cluster/package/system states
> are allowed for a given CPU idle state, to understand
> what idle states can be actually entered at any hierarchy
> level given the choice made for the CPU idle state.
>
> In the case above, a CPU entering retention state should prevent
> runtime PM selecting a cluster shutdown state; most likely firmware
> would demote the request to cluster retention but still, we should
> find a way to describe these dependencies.

See above.

[...]

Kind regards
Uffe
On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> <lorenzo.pieralisi@arm.com> wrote:
> >
> > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > > When the hierarchical CPU topology layout is used in DT, let's allow the
> > > CPU to be power managed through its PM domain, via deploying runtime PM
> > > support.
> > >
> > > To know for which idle states runtime PM reference counting is needed,
> > > let's store the index of deepest idle state for the CPU, in a per CPU
> > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > > the requested idle state index and then act accordingly.
> >
> > I do not see why a system with two CPU CPUidle states, say CPU retention
> > and CPU shutdown, should not be calling runtime PM on CPU retention
> > entry.
>
> If the CPU idle governor did select the CPU retention for the CPU, it
> was probably because the target residency for the CPU shutdown state
> could not be met.

The kernel does not know what those cpu states represent, so, this is an
assumption you are making and it must be made clear that this code works
as long as your assumption is valid.

If eg a "cluster" retention state has lower target_residency than
the deepest CPU idle state this assumption is wrong.

And CPUidle and genPD governor decisions are not synced anyway so,
again, this is an assumption, not a certainty.

> In this case, there is no point in allowing any other deeper idle
> states for cluster/package/system, since those have even greater
> residencies, hence calling runtime PM doesn't make sense.

On the systems you are testing on.

Lorenzo

> > The question then is what cluster/package/system states
> > are allowed for a given CPU idle state, to understand
> > what idle states can be actually entered at any hierarchy
> > level given the choice made for the CPU idle state.
> >
> > In the case above, a CPU entering retention state should prevent
> > runtime PM selecting a cluster shutdown state; most likely firmware
> > would demote the request to cluster retention but still, we should
> > find a way to describe these dependencies.
>
> See above.
>
> [...]
>
> Kind regards
> Uffe
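
The residency concern in this reply can be made concrete with a small user-space C sketch. It is not kernel code: `struct idle_state_model` and `deepest_only_assumption_holds()` are hypothetical names, and any residency values passed in are made up for illustration. The check only expresses the condition under discussion: gating runtime PM on the deepest CPU idle state is safe as long as no cluster/package state has a lower target residency than that deepest CPU state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an idle state; not a kernel structure. */
struct idle_state_model {
	const char *name;
	uint32_t target_residency_us;
};

/*
 * cpu_states is ordered shallowest to deepest, so the last entry is the
 * deepest CPU idle state. Returns true when every domain (cluster/package)
 * state needs at least as much residency as that deepest CPU state, which
 * is the assumption the patch relies on. Returns false when some domain
 * state could be reachable from a shallower CPU state.
 */
static bool deepest_only_assumption_holds(const struct idle_state_model *cpu_states,
					  size_t n_cpu,
					  const struct idle_state_model *domain_states,
					  size_t n_domain)
{
	uint32_t deepest_cpu_residency;
	size_t i;

	if (n_cpu == 0)
		return true;

	deepest_cpu_residency = cpu_states[n_cpu - 1].target_residency_us;

	for (i = 0; i < n_domain; i++) {
		if (domain_states[i].target_residency_us < deepest_cpu_residency)
			return false;
	}

	return true;
}
```

With a made-up table where CPU shutdown needs 1000 us and cluster retention needs 500 us, the check fails, which is the case described above. It says nothing about how CPUidle and genpd governor decisions would be synchronized.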
On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
>
> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> > <lorenzo.pieralisi@arm.com> wrote:
> > >
> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
> > > > CPU to be power managed through its PM domain, via deploying runtime PM
> > > > support.
> > > >
> > > > To know for which idle states runtime PM reference counting is needed,
> > > > let's store the index of deepest idle state for the CPU, in a per CPU
> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > > > the requested idle state index and then act accordingly.
> > >
> > > I do not see why a system with two CPU CPUidle states, say CPU retention
> > > and CPU shutdown, should not be calling runtime PM on CPU retention
> > > entry.
> >
> > If the CPU idle governor did select the CPU retention for the CPU, it
> > was probably because the target residency for the CPU shutdown state
> > could not be met.
>
> The kernel does not know what those cpu states represent, so, this is an
> assumption you are making and it must be made clear that this code works
> as long as your assumption is valid.
>
> If eg a "cluster" retention state has lower target_residency than
> the deepest CPU idle state this assumption is wrong.

Good point, you are right. I try to find a place to document this assumption.

>
> And CPUidle and genPD governor decisions are not synced anyway so,
> again, this is an assumption, not a certainty.
>
> > In this case, there is no point in allowing any other deeper idle
> > states for cluster/package/system, since those have even greater
> > residencies, hence calling runtime PM doesn't make sense.
>
> On the systems you are testing on.

So what you are saying typically means, that if all CPUs in the same
cluster have entered the CPU retention state, on some system the
cluster may also put into a cluster retention state (assuming the
target residency is met)?

Do you know of any systems that has these characteristics?

[...]

Kind regards
Uffe
On Thu, Jul 18 2019 at 10:55 -0600, Ulf Hansson wrote:
>On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
><lorenzo.pieralisi@arm.com> wrote:
>>
>> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
>> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
>> > <lorenzo.pieralisi@arm.com> wrote:
>> > >
>> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
>> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
>> > > > CPU to be power managed through its PM domain, via deploying runtime PM
>> > > > support.
>> > > >
>> > > > To know for which idle states runtime PM reference counting is needed,
>> > > > let's store the index of deepest idle state for the CPU, in a per CPU
>> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
>> > > > the requested idle state index and then act accordingly.
>> > >
>> > > I do not see why a system with two CPU CPUidle states, say CPU retention
>> > > and CPU shutdown, should not be calling runtime PM on CPU retention
>> > > entry.
>> >
>> > If the CPU idle governor did select the CPU retention for the CPU, it
>> > was probably because the target residency for the CPU shutdown state
>> > could not be met.
>>
>> The kernel does not know what those cpu states represent, so, this is an
>> assumption you are making and it must be made clear that this code works
>> as long as your assumption is valid.
>>
>> If eg a "cluster" retention state has lower target_residency than
>> the deepest CPU idle state this assumption is wrong.
>
>Good point, you are right. I try to find a place to document this assumption.
>
>>
>> And CPUidle and genPD governor decisions are not synced anyway so,
>> again, this is an assumption, not a certainty.
>>
>> > In this case, there is no point in allowing any other deeper idle
>> > states for cluster/package/system, since those have even greater
>> > residencies, hence calling runtime PM doesn't make sense.
>>
>> On the systems you are testing on.
>
>So what you are saying typically means, that if all CPUs in the same
>cluster have entered the CPU retention state, on some system the
>cluster may also put into a cluster retention state (assuming the
>target residency is met)?
>
>Do you know of any systems that has these characteristics?
>

Many QCOM SoCs can do that. But with the hardware improving, the
power-performance benefits skew the results in favor of powering off
the cluster than keeping the CPU and cluster in retention.

Kevin H and I thought of this problem earlier on. But that is a second
level problem to solve and definitely to be thought of after we have the
support for the deepest states in the kernel. We left that out for a
later date. The idea would have been to setup the allowable state(s) in
the DT for CPU and cluster state definitions and have the genpd take
that into consideration when deciding the idle state for the domain.

Thanks,
Lina
On Thu, 18 Jul 2019 at 19:41, Lina Iyer <ilina@codeaurora.org> wrote:
>
> On Thu, Jul 18 2019 at 10:55 -0600, Ulf Hansson wrote:
> >On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
> ><lorenzo.pieralisi@arm.com> wrote:
> >>
> >> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> >> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> >> > <lorenzo.pieralisi@arm.com> wrote:
> >> > >
> >> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> >> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
> >> > > > CPU to be power managed through its PM domain, via deploying runtime PM
> >> > > > support.
> >> > > >
> >> > > > To know for which idle states runtime PM reference counting is needed,
> >> > > > let's store the index of deepest idle state for the CPU, in a per CPU
> >> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> >> > > > the requested idle state index and then act accordingly.
> >> > >
> >> > > I do not see why a system with two CPU CPUidle states, say CPU retention
> >> > > and CPU shutdown, should not be calling runtime PM on CPU retention
> >> > > entry.
> >> >
> >> > If the CPU idle governor did select the CPU retention for the CPU, it
> >> > was probably because the target residency for the CPU shutdown state
> >> > could not be met.
> >>
> >> The kernel does not know what those cpu states represent, so, this is an
> >> assumption you are making and it must be made clear that this code works
> >> as long as your assumption is valid.
> >>
> >> If eg a "cluster" retention state has lower target_residency than
> >> the deepest CPU idle state this assumption is wrong.
> >
> >Good point, you are right. I try to find a place to document this assumption.
> >
> >>
> >> And CPUidle and genPD governor decisions are not synced anyway so,
> >> again, this is an assumption, not a certainty.
> >>
> >> > In this case, there is no point in allowing any other deeper idle
> >> > states for cluster/package/system, since those have even greater
> >> > residencies, hence calling runtime PM doesn't make sense.
> >>
> >> On the systems you are testing on.
> >
> >So what you are saying typically means, that if all CPUs in the same
> >cluster have entered the CPU retention state, on some system the
> >cluster may also put into a cluster retention state (assuming the
> >target residency is met)?
> >
> >Do you know of any systems that has these characteristics?
> >
> Many QCOM SoCs can do that. But with the hardware improving, the
> power-performance benefits skew the results in favor of powering off
> the cluster than keeping the CPU and cluster in retention.
>
> Kevin H and I thought of this problem earlier on. But that is a second
> level problem to solve and definitely to be thought of after we have the
> support for the deepest states in the kernel. We left that out for a
> later date. The idea would have been to setup the allowable state(s) in
> the DT for CPU and cluster state definitions and have the genpd take
> that into consideration when deciding the idle state for the domain.

Thanks for confirming.

This more or less means we need to improve the hierarchical support in
genpd to support more levels, such that it makes sense to have a genpd
governor assigned at more than one level. This doesn't work well
today. As I also have stated, this is on my todo list for genpd.

However, I also agree with your standpoint, that let's start simple to
enable the deepest state as a start with, then we can improve things
on top.

Kind regards
Uffe
On Thu, Jul 18, 2019 at 11:49:11PM +0200, Ulf Hansson wrote:
> On Thu, 18 Jul 2019 at 19:41, Lina Iyer <ilina@codeaurora.org> wrote:
> >
> > On Thu, Jul 18 2019 at 10:55 -0600, Ulf Hansson wrote:
> > >On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
> > ><lorenzo.pieralisi@arm.com> wrote:
> > >>
> > >> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> > >> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> > >> > <lorenzo.pieralisi@arm.com> wrote:
> > >> > >
> > >> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > >> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
> > >> > > > CPU to be power managed through its PM domain, via deploying runtime PM
> > >> > > > support.
> > >> > > >
> > >> > > > To know for which idle states runtime PM reference counting is needed,
> > >> > > > let's store the index of deepest idle state for the CPU, in a per CPU
> > >> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > >> > > > the requested idle state index and then act accordingly.
> > >> > >
> > >> > > I do not see why a system with two CPU CPUidle states, say CPU retention
> > >> > > and CPU shutdown, should not be calling runtime PM on CPU retention
> > >> > > entry.
> > >> >
> > >> > If the CPU idle governor did select the CPU retention for the CPU, it
> > >> > was probably because the target residency for the CPU shutdown state
> > >> > could not be met.
> > >>
> > >> The kernel does not know what those cpu states represent, so, this is an
> > >> assumption you are making and it must be made clear that this code works
> > >> as long as your assumption is valid.
> > >>
> > >> If eg a "cluster" retention state has lower target_residency than
> > >> the deepest CPU idle state this assumption is wrong.
> > >
> > >Good point, you are right. I try to find a place to document this assumption.
> > >
> > >>
> > >> And CPUidle and genPD governor decisions are not synced anyway so,
> > >> again, this is an assumption, not a certainty.
> > >>
> > >> > In this case, there is no point in allowing any other deeper idle
> > >> > states for cluster/package/system, since those have even greater
> > >> > residencies, hence calling runtime PM doesn't make sense.
> > >>
> > >> On the systems you are testing on.
> > >
> > >So what you are saying typically means, that if all CPUs in the same
> > >cluster have entered the CPU retention state, on some system the
> > >cluster may also put into a cluster retention state (assuming the
> > >target residency is met)?
> > >
> > >Do you know of any systems that has these characteristics?
> > >
> > Many QCOM SoCs can do that. But with the hardware improving, the
> > power-performance benefits skew the results in favor of powering off
> > the cluster than keeping the CPU and cluster in retention.
> >
> > Kevin H and I thought of this problem earlier on. But that is a second
> > level problem to solve and definitely to be thought of after we have the
> > support for the deepest states in the kernel. We left that out for a
> > later date. The idea would have been to setup the allowable state(s) in
> > the DT for CPU and cluster state definitions and have the genpd take
> > that into consideration when deciding the idle state for the domain.
>
> Thanks for confirming.
>
> This more or less means we need to improve the hierarchical support in
> genpd to support more levels, such that it makes sense to have a genpd
> governor assigned at more than one level. This doesn't work well
> today. As I also have stated, this is on my todo list for genpd.
>
> However, I also agree with your standpoint, that let's start simple to
> enable the deepest state as a start with, then we can improve things
> on top.

How to solve this in the kernel I don't know but please do make sure
that the DT bindings allow you to describe what's needed, once they
are merged you won't be able to change them and I won't bodge the code
to make things fit, so if anything let's focus on getting them right
as a matter of priority to get this done please.

Thanks,
Lorenzo
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 54e23d4ed0ea..2c4157d3a616 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -20,6 +20,7 @@
 #include <linux/linkage.h>
 #include <linux/of.h>
 #include <linux/pm.h>
+#include <linux/pm_runtime.h>
 #include <linux/printk.h>
 #include <linux/psci.h>
 #include <linux/reboot.h>
@@ -298,6 +299,7 @@ static int __init psci_features(u32 psci_func_id)
 
 struct psci_cpuidle_data {
 	u32 *psci_states;
+	u32 rpm_state_id;
 	struct device *dev;
 };
 
@@ -385,6 +387,7 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
 			goto free_mem;
 
 		data->dev = dev;
+		data->rpm_state_id = drv->state_count - 1;
 	}
 
 	/* Idle states parsed correctly, store them in the per-cpu struct. */
@@ -481,8 +484,11 @@ static int psci_suspend_finisher(unsigned long index)
 int psci_cpu_suspend_enter(unsigned long index)
 {
 	int ret;
-	u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
-	u32 composite_state = state[index - 1] | psci_get_domain_state();
+	struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
+	u32 *states = data->psci_states;
+	struct device *dev = data->dev;
+	bool runtime_pm = (dev && data->rpm_state_id == index);
+	u32 composite_state;
 
 	/*
 	 * idle state index 0 corresponds to wfi, should never be called
@@ -491,11 +497,23 @@ int psci_cpu_suspend_enter(unsigned long index)
 	if (WARN_ON_ONCE(!index))
 		return -EINVAL;
 
+	/*
+	 * Do runtime PM if we are using the hierarchical CPU toplogy, but only
+	 * when cpuidle have selected the deepest idle state for the CPU.
+	 */
+	if (runtime_pm)
+		pm_runtime_put_sync_suspend(dev);
+
+	composite_state = states[index - 1] | psci_get_domain_state();
+
 	if (!psci_power_state_loses_context(composite_state))
 		ret = psci_ops.cpu_suspend(composite_state, 0);
 	else
 		ret = cpu_suspend(index, psci_suspend_finisher);
 
+	if (runtime_pm)
+		pm_runtime_get_sync(dev);
+
 	/* Clear the domain state to start fresh when back from idle. */
 	psci_set_domain_state(0);
 
When the hierarchical CPU topology layout is used in DT, let's allow the
CPU to be power managed through its PM domain, via deploying runtime PM
support.

To know for which idle states runtime PM reference counting is needed,
let's store the index of deepest idle state for the CPU, in a per CPU
variable. This allows psci_cpu_suspend_enter() to compare this index with
the requested idle state index and then act accordingly.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes:
	- Simplify the code by using the new per CPU struct, that stores the
	  needed struct device*.

---
 drivers/firmware/psci/psci.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)
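
The index comparison described in the commit message can be illustrated with a minimal user-space C model. It is a sketch, not the kernel implementation: `struct cpu_idle_model`, `model_init()` and `model_uses_runtime_pm()` are hypothetical names, and a plain `bool` stands in for the `struct device *` pointer that the patch stores per CPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU data touched by the patch. */
struct cpu_idle_model {
	bool has_pm_domain;	/* models "data->dev != NULL" */
	uint32_t rpm_state_id;	/* index of the deepest idle state */
};

/*
 * Models "data->rpm_state_id = drv->state_count - 1". state_count
 * includes index 0 (wfi), so the deepest state is the last index.
 */
static void model_init(struct cpu_idle_model *m, bool has_pm_domain,
		       uint32_t state_count)
{
	m->has_pm_domain = has_pm_domain;
	m->rpm_state_id = state_count - 1;
}

/*
 * Models "runtime_pm = (dev && data->rpm_state_id == index)": runtime PM
 * reference counting is only used when the CPU has a PM domain device and
 * the requested idle state is the deepest one.
 */
static bool model_uses_runtime_pm(const struct cpu_idle_model *m,
				  unsigned long index)
{
	return m->has_pm_domain && m->rpm_state_id == index;
}
```

For a CPU with three states (wfi, retention, shutdown) the deepest index is 2, so in this model only a request for index 2 would be bracketed by the runtime PM put/get pair. A request for index 1 (retention) would bypass runtime PM, which is the behaviour questioned in the review comments.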