Message ID: 1447799871-56374-19-git-send-email-lina.iyer@linaro.org (mailing list archive)
State: New, archived
On Tue, Nov 17, 2015 at 03:37:42PM -0700, Lina Iyer wrote: > A PM domain comprising of CPUs may be powered off when all the CPUs in > the domain are powered down. Powering down a CPU domain is generally a > expensive operation and therefore the power performance trade offs > should be considered. The time between the last CPU powering down and > the first CPU powering up in a domain, is the time available for the > domain to sleep. Ideally, the sleep time of the domain should fulfill > the residency requirement of the domains' idle state. > > To do this effectively, read the time before the wakeup of the cluster's > CPUs and ensure that the domain's idle state sleep time guarantees the > QoS requirements of each of the CPU, the PM QoS CPU_DMA_LATENCY and the > state's residency. To me this information should be part of the CPUidle governor (it is already there), we should not split the decision into multiple layers. The problem you are facing is that the CPUidle governor(s) do not take cross cpus relationship into account, I do not think that adding another decision layer in the power domain subsystem helps, you are doing that just because adding it to the existing CPUidle governor(s) is invasive. Why can't we use the power domain work you put together to eg disable idle states that share multiple cpus and make them "visible" only when the power domain that encompass them is actually going down ? You could use the power domains information to detect states that are shared between cpus. It is just an idea, what I am saying is that having another governor in the power domain subsytem does not make much sense, you split the decision in two layers while there is actually one, the existing CPUidle governor and that's where the decision should be taken. Thoughts appreciated. Lorenzo > Signed-off-by: Lina Iyer <lina.iyer@linaro.org> > --- > drivers/base/power/cpu-pd.c | 83 ++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 82 insertions(+), 1 deletion(-) > > diff --git a/drivers/base/power/cpu-pd.c b/drivers/base/power/cpu-pd.c > index 617ce54..a00abc1 100644 > --- a/drivers/base/power/cpu-pd.c > +++ b/drivers/base/power/cpu-pd.c > @@ -21,6 +21,7 @@ > #include <linux/pm_qos.h> > #include <linux/rculist.h> > #include <linux/slab.h> > +#include <linux/tick.h> > > #define CPU_PD_NAME_MAX 36 > > @@ -66,6 +67,86 @@ static void get_cpus_in_domain(struct generic_pm_domain *genpd, > } > } > > +static bool cpu_pd_down_ok(struct dev_pm_domain *pd) > +{ > + struct generic_pm_domain *genpd = pd_to_genpd(pd); > + struct cpu_pm_domain *cpu_pd = to_cpu_pd(genpd); > + int qos = pm_qos_request(PM_QOS_CPU_DMA_LATENCY); > + u64 sleep_ns = ~0; > + ktime_t earliest; > + int cpu; > + int i; > + > + /* Reset the last set genpd state, default to index 0 */ > + genpd->state_idx = 0; > + > + /* We dont want to power down, if QoS is 0 */ > + if (!qos) > + return false; > + > + /* > + * Find the sleep time for the cluster. > + * The time between now and the first wake up of any CPU that > + * are in this domain hierarchy is the time available for the > + * domain to be idle. 
> + */ > + earliest.tv64 = KTIME_MAX; > + for_each_cpu_and(cpu, cpu_pd->cpus, cpu_online_mask) { > + struct device *cpu_dev = get_cpu_device(cpu); > + struct gpd_timing_data *td; > + > + td = &dev_gpd_data(cpu_dev)->td; > + > + if (earliest.tv64 < td->next_wakeup.tv64) > + earliest = td->next_wakeup; > + } > + > + sleep_ns = ktime_to_ns(ktime_sub(earliest, ktime_get())); > + if (sleep_ns <= 0) > + return false; > + > + /* > + * Find the deepest sleep state that satisfies the residency > + * requirement and the QoS constraint > + */ > + for (i = genpd->state_count - 1; i > 0; i--) { > + u64 state_sleep_ns; > + > + state_sleep_ns = genpd->states[i].power_off_latency_ns + > + genpd->states[i].power_on_latency_ns + > + genpd->states[i].residency_ns; > + > + /* > + * If we cant sleep to save power in the state, move on > + * to the next lower idle state. > + */ > + if (state_sleep_ns > sleep_ns) > + continue; > + > + /* > + * We also dont want to sleep more than we should to > + * gaurantee QoS. > + */ > + if (state_sleep_ns < (qos * NSEC_PER_USEC)) > + break; > + } > + > + if (i >= 0) > + genpd->state_idx = i; > + > + return (i >= 0) ? true : false; > +} > + > +static bool cpu_stop_ok(struct device *dev) > +{ > + return true; > +} > + > +struct dev_power_governor cpu_pd_gov = { > + .power_down_ok = cpu_pd_down_ok, > + .stop_ok = cpu_stop_ok, > +}; > + > static int cpu_pd_power_off(struct generic_pm_domain *genpd) > { > struct cpu_pm_domain *pd = to_cpu_pd(genpd); > @@ -183,7 +264,7 @@ int of_register_cpu_pm_domain(struct device_node *dn, > > /* Register the CPU genpd */ > pr_debug("adding %s as CPU PM domain.\n", pd->genpd->name); > - ret = of_pm_genpd_init(dn, pd->genpd, &simple_qos_governor, false); > + ret = of_pm_genpd_init(dn, pd->genpd, &cpu_pd_gov, false); > if (ret) { > pr_err("Unable to initialize domain %s\n", dn->full_name); > return ret; > -- > 2.1.4 >
On 18/11/2015 19:42, Lorenzo Pieralisi wrote: > On Tue, Nov 17, 2015 at 03:37:42PM -0700, Lina Iyer wrote: >> A PM domain comprising of CPUs may be powered off when all the CPUs in >> the domain are powered down. Powering down a CPU domain is generally a >> expensive operation and therefore the power performance trade offs >> should be considered. The time between the last CPU powering down and >> the first CPU powering up in a domain, is the time available for the >> domain to sleep. Ideally, the sleep time of the domain should fulfill >> the residency requirement of the domains' idle state. >> >> To do this effectively, read the time before the wakeup of the cluster's >> CPUs and ensure that the domain's idle state sleep time guarantees the >> QoS requirements of each of the CPU, the PM QoS CPU_DMA_LATENCY and the >> state's residency. > > To me this information should be part of the CPUidle governor (it is > already there), we should not split the decision into multiple layers. > > The problem you are facing is that the CPUidle governor(s) do not take > cross cpus relationship into account, I do not think that adding another > decision layer in the power domain subsystem helps, you are doing that > just because adding it to the existing CPUidle governor(s) is invasive. > > Why can't we use the power domain work you put together to eg disable > idle states that share multiple cpus and make them "visible" only > when the power domain that encompass them is actually going down ? > > You could use the power domains information to detect states that > are shared between cpus. > > It is just an idea, what I am saying is that having another governor in > the power domain subsytem does not make much sense, you split the > decision in two layers while there is actually one, the existing > CPUidle governor and that's where the decision should be taken. > > Thoughts appreciated. Maybe this is silly and not thought-through, but I wonder if the responsibilities could be split or instance with an outer control loop that has the heuristic to compute the next tick time, and the required cpu-power needed during that time slot, and an inner control loop (genpd) that has a per-domain QoS and can optimize power consumption. Marc. > > Lorenzo > >> Signed-off-by: Lina Iyer <lina.iyer@linaro.org> >> --- >> drivers/base/power/cpu-pd.c | 83 ++++++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 82 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/base/power/cpu-pd.c b/drivers/base/power/cpu-pd.c >> index 617ce54..a00abc1 100644 >> --- a/drivers/base/power/cpu-pd.c >> +++ b/drivers/base/power/cpu-pd.c >> @@ -21,6 +21,7 @@ >> #include <linux/pm_qos.h> >> #include <linux/rculist.h> >> #include <linux/slab.h> >> +#include <linux/tick.h> >> >> #define CPU_PD_NAME_MAX 36 >> >> @@ -66,6 +67,86 @@ static void get_cpus_in_domain(struct generic_pm_domain *genpd, >> } >> } >> >> +static bool cpu_pd_down_ok(struct dev_pm_domain *pd) >> +{ >> + struct generic_pm_domain *genpd = pd_to_genpd(pd); >> + struct cpu_pm_domain *cpu_pd = to_cpu_pd(genpd); >> + int qos = pm_qos_request(PM_QOS_CPU_DMA_LATENCY); >> + u64 sleep_ns = ~0; >> + ktime_t earliest; >> + int cpu; >> + int i; >> + >> + /* Reset the last set genpd state, default to index 0 */ >> + genpd->state_idx = 0; >> + >> + /* We dont want to power down, if QoS is 0 */ >> + if (!qos) >> + return false; >> + >> + /* >> + * Find the sleep time for the cluster. 
>> + * The time between now and the first wake up of any CPU that >> + * are in this domain hierarchy is the time available for the >> + * domain to be idle. >> + */ >> + earliest.tv64 = KTIME_MAX; >> + for_each_cpu_and(cpu, cpu_pd->cpus, cpu_online_mask) { >> + struct device *cpu_dev = get_cpu_device(cpu); >> + struct gpd_timing_data *td; >> + >> + td = &dev_gpd_data(cpu_dev)->td; >> + >> + if (earliest.tv64 < td->next_wakeup.tv64) >> + earliest = td->next_wakeup; >> + } >> + >> + sleep_ns = ktime_to_ns(ktime_sub(earliest, ktime_get())); >> + if (sleep_ns <= 0) >> + return false; >> + >> + /* >> + * Find the deepest sleep state that satisfies the residency >> + * requirement and the QoS constraint >> + */ >> + for (i = genpd->state_count - 1; i > 0; i--) { >> + u64 state_sleep_ns; >> + >> + state_sleep_ns = genpd->states[i].power_off_latency_ns + >> + genpd->states[i].power_on_latency_ns + >> + genpd->states[i].residency_ns; >> + >> + /* >> + * If we cant sleep to save power in the state, move on >> + * to the next lower idle state. >> + */ >> + if (state_sleep_ns > sleep_ns) >> + continue; >> + >> + /* >> + * We also dont want to sleep more than we should to >> + * gaurantee QoS. >> + */ >> + if (state_sleep_ns < (qos * NSEC_PER_USEC)) >> + break; >> + } >> + >> + if (i >= 0) >> + genpd->state_idx = i; >> + >> + return (i >= 0) ? true : false; >> +} >> + >> +static bool cpu_stop_ok(struct device *dev) >> +{ >> + return true; >> +} >> + >> +struct dev_power_governor cpu_pd_gov = { >> + .power_down_ok = cpu_pd_down_ok, >> + .stop_ok = cpu_stop_ok, >> +}; >> + >> static int cpu_pd_power_off(struct generic_pm_domain *genpd) >> { >> struct cpu_pm_domain *pd = to_cpu_pd(genpd); >> @@ -183,7 +264,7 @@ int of_register_cpu_pm_domain(struct device_node *dn, >> >> /* Register the CPU genpd */ >> pr_debug("adding %s as CPU PM domain.\n", pd->genpd->name); >> - ret = of_pm_genpd_init(dn, pd->genpd, &simple_qos_governor, false); >> + ret = of_pm_genpd_init(dn, pd->genpd, &cpu_pd_gov, false); >> if (ret) { >> pr_err("Unable to initialize domain %s\n", dn->full_name); >> return ret; >> -- >> 2.1.4 >>
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes: > On Tue, Nov 17, 2015 at 03:37:42PM -0700, Lina Iyer wrote: >> A PM domain comprising of CPUs may be powered off when all the CPUs in >> the domain are powered down. Powering down a CPU domain is generally a >> expensive operation and therefore the power performance trade offs >> should be considered. The time between the last CPU powering down and >> the first CPU powering up in a domain, is the time available for the >> domain to sleep. Ideally, the sleep time of the domain should fulfill >> the residency requirement of the domains' idle state. >> >> To do this effectively, read the time before the wakeup of the cluster's >> CPUs and ensure that the domain's idle state sleep time guarantees the >> QoS requirements of each of the CPU, the PM QoS CPU_DMA_LATENCY and the >> state's residency. > > To me this information should be part of the CPUidle governor (it is > already there), we should not split the decision into multiple layers. > > The problem you are facing is that the CPUidle governor(s) do not take > cross cpus relationship into account, I do not think that adding another > decision layer in the power domain subsystem helps, you are doing that > just because adding it to the existing CPUidle governor(s) is invasive. > > Why can't we use the power domain work you put together to eg disable > idle states that share multiple cpus and make them "visible" only > when the power domain that encompass them is actually going down ? > > You could use the power domains information to detect states that > are shared between cpus. > > It is just an idea, what I am saying is that having another governor in > the power domain subsytem does not make much sense, you split the > decision in two layers while there is actually one, the existing > CPUidle governor and that's where the decision should be taken. Hmm, considering "normal" devices in "normal" power domains, and following the same logic, the equivalent would be to say that the decision to gate the power domain belongs to the individual drivers in the domain instead of in the power domain layer. I disagree. IMO, there are different decision layers because there are different hardware layers. Devices (including CPUs) are reponsible for handling device-local idle states, based on device-local conditions (e.g. local wakeups, timers, etc.) and domains are responsible for handling decisions based on conditions of the whole domain. Kevin
Lina Iyer <lina.iyer@linaro.org> writes: > A PM domain comprising of CPUs may be powered off when all the CPUs in > the domain are powered down. Powering down a CPU domain is generally a > expensive operation and therefore the power performance trade offs > should be considered. The time between the last CPU powering down and > the first CPU powering up in a domain, is the time available for the > domain to sleep. Ideally, the sleep time of the domain should fulfill > the residency requirement of the domains' idle state. > > To do this effectively, read the time before the wakeup of the cluster's > CPUs and ensure that the domain's idle state sleep time guarantees the > QoS requirements of each of the CPU, the PM QoS CPU_DMA_LATENCY and the > state's residency. > > Signed-off-by: Lina Iyer <lina.iyer@linaro.org> [...] > +static bool cpu_stop_ok(struct device *dev) > +{ > + return true; > +} > + > +struct dev_power_governor cpu_pd_gov = { > + .power_down_ok = cpu_pd_down_ok, > + .stop_ok = cpu_stop_ok, > +}; If stop_ok is unconditionally true, it should probably just be removed (IOW cpu_pd_gov->stop_ok == NULL), and that will avoid an unnecessary function call. Kevin
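For reference, a minimal sketch of the simplification Kevin suggests, assuming the genpd core of this era (pm_genpd_runtime_suspend() only invokes ->stop_ok when the pointer is non-NULL); the names follow the patch under review:

/*
 * Sketch only: drop the unconditional cpu_stop_ok() and leave .stop_ok
 * unset.  The genpd core checks the callback pointer before calling it,
 * so a NULL .stop_ok behaves as "always OK" without the extra indirect
 * call.
 */
struct dev_power_governor cpu_pd_gov = {
	.power_down_ok = cpu_pd_down_ok,
	/* .stop_ok intentionally left NULL */
};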
On Thu, Nov 19, 2015 at 03:52:13PM -0800, Kevin Hilman wrote: > Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes: > > > On Tue, Nov 17, 2015 at 03:37:42PM -0700, Lina Iyer wrote: > >> A PM domain comprising of CPUs may be powered off when all the CPUs in > >> the domain are powered down. Powering down a CPU domain is generally a > >> expensive operation and therefore the power performance trade offs > >> should be considered. The time between the last CPU powering down and > >> the first CPU powering up in a domain, is the time available for the > >> domain to sleep. Ideally, the sleep time of the domain should fulfill > >> the residency requirement of the domains' idle state. > >> > >> To do this effectively, read the time before the wakeup of the cluster's > >> CPUs and ensure that the domain's idle state sleep time guarantees the > >> QoS requirements of each of the CPU, the PM QoS CPU_DMA_LATENCY and the > >> state's residency. > > > > To me this information should be part of the CPUidle governor (it is > > already there), we should not split the decision into multiple layers. > > > > The problem you are facing is that the CPUidle governor(s) do not take > > cross cpus relationship into account, I do not think that adding another > > decision layer in the power domain subsystem helps, you are doing that > > just because adding it to the existing CPUidle governor(s) is invasive. > > > > Why can't we use the power domain work you put together to eg disable > > idle states that share multiple cpus and make them "visible" only > > when the power domain that encompass them is actually going down ? > > > > You could use the power domains information to detect states that > > are shared between cpus. > > > > It is just an idea, what I am saying is that having another governor in > > the power domain subsytem does not make much sense, you split the > > decision in two layers while there is actually one, the existing > > CPUidle governor and that's where the decision should be taken. > > Hmm, considering "normal" devices in "normal" power domains, and > following the same logic, the equivalent would be to say that the > decision to gate the power domain belongs to the individual drivers > in the domain instead of in the power domain layer. I disagree. > > IMO, there are different decision layers because there are different > hardware layers. Devices (including CPUs) are reponsible for handling > device-local idle states, based on device-local conditions (e.g. local > wakeups, timers, etc.) and domains are responsible for handling > decisions based on conditions of the whole domain. After going through the series for the second time (it is quite complex and should probably be split) I understood your point of view and I agree with it, I will review it more in-depth to understand the details. One thing that is not clear to me is how we would end up handling cluster states in platform coordinated mode with this series (and I am actually referring to the data we would add in the idle-states, such as min-residency). I admit that data for cluster states at present is not extremely well defined, because we have to add latencies for the cluster state even if the state itself may be just a cpu one (by definition a cluster state is entered only if all cpus in the cluster enter it, otherwise FW or power controller demote them automatically). I would like to take this series as an opportunity to improve the current situation in a clean way (and without changing the bindings, only augmenting them). 
On a side note, I think we should give up the concept of cluster entirely, to me they are just a group of cpus, I do not see any reason why we should group cpus this way and I do not like the dependencies of this series on the cpu-map either, I do not see the reason but I will go through code again to make sure I am not missing anything. To be clear, to me the cpumask should be created with all cpus belonging in a given power domain, no cluster dependency (and yes the CPU PM notifiers are not appropriate at present - eg on cpu_cluster_pm_{enter/exit} we save and restore the GIC distributor state even on multi-cluster systems, that's useless and has no connection with the real power domain topology at all, so the concept of cluster as it stands is shaky to say the least). Thanks, Lorenzo
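To make the cpumask suggestion concrete, here is a rough, untested sketch (not taken from this series; cpus_in_pm_domain() is a hypothetical helper) of building a per-domain cpumask directly from each CPU node's "power-domains" phandle, with no reference to clusters or the cpu-map:

#include <linux/cpumask.h>
#include <linux/of.h>

/* Hypothetical helper: set every possible CPU whose "power-domains"
 * phandle points at the given domain node. */
static void cpus_in_pm_domain(struct device_node *pd_node, struct cpumask *mask)
{
	int cpu;

	cpumask_clear(mask);
	for_each_possible_cpu(cpu) {
		struct device_node *cn = of_get_cpu_node(cpu, NULL);
		struct device_node *pd;

		if (!cn)
			continue;

		pd = of_parse_phandle(cn, "power-domains", 0);
		if (pd == pd_node)
			cpumask_set_cpu(cpu, mask);

		of_node_put(pd);
		of_node_put(cn);
	}
}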
On Fri, Nov 20 2015 at 09:20 -0700, Lorenzo Pieralisi wrote: >On Thu, Nov 19, 2015 at 03:52:13PM -0800, Kevin Hilman wrote: >> Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes: >> >> > On Tue, Nov 17, 2015 at 03:37:42PM -0700, Lina Iyer wrote: >> >> A PM domain comprising of CPUs may be powered off when all the CPUs in >> >> the domain are powered down. Powering down a CPU domain is generally a >> >> expensive operation and therefore the power performance trade offs >> >> should be considered. The time between the last CPU powering down and >> >> the first CPU powering up in a domain, is the time available for the >> >> domain to sleep. Ideally, the sleep time of the domain should fulfill >> >> the residency requirement of the domains' idle state. >> >> >> >> To do this effectively, read the time before the wakeup of the cluster's >> >> CPUs and ensure that the domain's idle state sleep time guarantees the >> >> QoS requirements of each of the CPU, the PM QoS CPU_DMA_LATENCY and the >> >> state's residency. >> > >> > To me this information should be part of the CPUidle governor (it is >> > already there), we should not split the decision into multiple layers. >> > >> > The problem you are facing is that the CPUidle governor(s) do not take >> > cross cpus relationship into account, I do not think that adding another >> > decision layer in the power domain subsystem helps, you are doing that >> > just because adding it to the existing CPUidle governor(s) is invasive. >> > >> > Why can't we use the power domain work you put together to eg disable >> > idle states that share multiple cpus and make them "visible" only >> > when the power domain that encompass them is actually going down ? >> > >> > You could use the power domains information to detect states that >> > are shared between cpus. >> > >> > It is just an idea, what I am saying is that having another governor in >> > the power domain subsytem does not make much sense, you split the >> > decision in two layers while there is actually one, the existing >> > CPUidle governor and that's where the decision should be taken. >> >> Hmm, considering "normal" devices in "normal" power domains, and >> following the same logic, the equivalent would be to say that the >> decision to gate the power domain belongs to the individual drivers >> in the domain instead of in the power domain layer. I disagree. >> >> IMO, there are different decision layers because there are different >> hardware layers. Devices (including CPUs) are reponsible for handling >> device-local idle states, based on device-local conditions (e.g. local >> wakeups, timers, etc.) and domains are responsible for handling >> decisions based on conditions of the whole domain. > >After going through the series for the second time (it is quite complex and >should probably be split) I understood your point of view and I agree with >it, I will review it more in-depth to understand the details. > I have included patches from Axel and Marc, so as to get a complete picture. My core changes are in genpd, cpu-pd and psci.c >One thing that is not clear to me is how we would end up handling >cluster states in platform coordinated mode with this series (and >I am actually referring to the data we would add in the idle-states, >such as min-residency). > From what I see, the platform coordinated mode, doesnt need any of this. We are fine as it is today. CPUs vote for the cluster state they can enter and the f/w determines based on these votes. 
It makes sense, and is probably easier, to flatten out the cluster states and attach them to cpuidle for that. I couldn't find a symmetry with OS-initiated mode. Maybe it deserves more discussion and brainstorming. >I admit that data for cluster states at present >is not extremely well defined, because we have to add latencies for >the cluster state even if the state itself may be just a cpu one (by >definition a cluster state is entered only if all cpus in the cluster >enter it, otherwise FW or power controller demote them automatically). > >I would like to take this series as an opportunity to improve the >current situation in a clean way (and without changing the bindings, >only augmenting them). > >On a side note, I think we should give up the concept of cluster >entirely, to me they are just a group of cpus, I do not see any reason >why we should group cpus this way and I do not like the dependencies >of this series on the cpu-map either, I do not see the reason but I >will go through code again to make sure I am not missing anything. > SoCs could have different organizations of CPUs (clubbed as clusters) and of the power domains that power these clusters. This information has to come from the DT. Since there are no actual devices in Linux for domain management (with PSCI), I have added them to the cpu-map, which already builds up the cluster hierarchy. The only addition I had to make was to allow these cluster nodes to tell the kernel that they are domain providers. >To be clear, to me the cpumask should be created with all cpus belonging >in a given power domain, no cluster dependency (and yes the CPU PM >notifiers are not appropriate at present - eg on >cpu_cluster_pm_{enter/exit} we save and restore the GIC distributor state >even on multi-cluster systems, that's useless and has no connection with >the real power domain topology at all, so the concept of cluster as it >stands is shaky to say the least). > Let's discuss this more. I am interested in what you are thinking; I will let you go through the code. Thanks for your time, Lorenzo. -- Lina
On Thu, Nov 19 2015 at 01:50 -0700, Marc Titinger wrote: >On 18/11/2015 19:42, Lorenzo Pieralisi wrote: >>On Tue, Nov 17, 2015 at 03:37:42PM -0700, Lina Iyer wrote: >>>A PM domain comprising of CPUs may be powered off when all the CPUs in >>>the domain are powered down. Powering down a CPU domain is generally a >>>expensive operation and therefore the power performance trade offs >>>should be considered. The time between the last CPU powering down and >>>the first CPU powering up in a domain, is the time available for the >>>domain to sleep. Ideally, the sleep time of the domain should fulfill >>>the residency requirement of the domains' idle state. >>> >>>To do this effectively, read the time before the wakeup of the cluster's >>>CPUs and ensure that the domain's idle state sleep time guarantees the >>>QoS requirements of each of the CPU, the PM QoS CPU_DMA_LATENCY and the >>>state's residency. >> >>To me this information should be part of the CPUidle governor (it is >>already there), we should not split the decision into multiple layers. >> >>The problem you are facing is that the CPUidle governor(s) do not take >>cross cpus relationship into account, I do not think that adding another >>decision layer in the power domain subsystem helps, you are doing that >>just because adding it to the existing CPUidle governor(s) is invasive. >> >>Why can't we use the power domain work you put together to eg disable >>idle states that share multiple cpus and make them "visible" only >>when the power domain that encompass them is actually going down ? >> >>You could use the power domains information to detect states that >>are shared between cpus. >> >>It is just an idea, what I am saying is that having another governor in >>the power domain subsytem does not make much sense, you split the >>decision in two layers while there is actually one, the existing >>CPUidle governor and that's where the decision should be taken. >> >>Thoughts appreciated. > >Maybe this is silly and not thought-through, but I wonder if the >responsibilities could be split or instance with an outer control loop >that has the heuristic to compute the next tick time, and the required >cpu-power needed during that time slot, and an inner control loop >(genpd) that has a per-domain QoS and can optimize power consumption. > Not sure I understand everything you said, but the heuristics across a bunch of CPUs can be very erratic. It's hard enough for the menu governor to determine heuristics on a per-CPU basis. The governor in this patch already takes care of PM QoS, but does not apply per-CPU QoS (a rough sketch of one way that could be folded in follows the quoted patch below). We should discuss this more. -- Lina >Marc. 
> >> >>Lorenzo >> >>>Signed-off-by: Lina Iyer <lina.iyer@linaro.org> >>>--- >>> drivers/base/power/cpu-pd.c | 83 ++++++++++++++++++++++++++++++++++++++++++++- >>> 1 file changed, 82 insertions(+), 1 deletion(-) >>> >>>diff --git a/drivers/base/power/cpu-pd.c b/drivers/base/power/cpu-pd.c >>>index 617ce54..a00abc1 100644 >>>--- a/drivers/base/power/cpu-pd.c >>>+++ b/drivers/base/power/cpu-pd.c >>>@@ -21,6 +21,7 @@ >>> #include <linux/pm_qos.h> >>> #include <linux/rculist.h> >>> #include <linux/slab.h> >>>+#include <linux/tick.h> >>> >>> #define CPU_PD_NAME_MAX 36 >>> >>>@@ -66,6 +67,86 @@ static void get_cpus_in_domain(struct generic_pm_domain *genpd, >>> } >>> } >>> >>>+static bool cpu_pd_down_ok(struct dev_pm_domain *pd) >>>+{ >>>+ struct generic_pm_domain *genpd = pd_to_genpd(pd); >>>+ struct cpu_pm_domain *cpu_pd = to_cpu_pd(genpd); >>>+ int qos = pm_qos_request(PM_QOS_CPU_DMA_LATENCY); >>>+ u64 sleep_ns = ~0; >>>+ ktime_t earliest; >>>+ int cpu; >>>+ int i; >>>+ >>>+ /* Reset the last set genpd state, default to index 0 */ >>>+ genpd->state_idx = 0; >>>+ >>>+ /* We dont want to power down, if QoS is 0 */ >>>+ if (!qos) >>>+ return false; >>>+ >>>+ /* >>>+ * Find the sleep time for the cluster. >>>+ * The time between now and the first wake up of any CPU that >>>+ * are in this domain hierarchy is the time available for the >>>+ * domain to be idle. >>>+ */ >>>+ earliest.tv64 = KTIME_MAX; >>>+ for_each_cpu_and(cpu, cpu_pd->cpus, cpu_online_mask) { >>>+ struct device *cpu_dev = get_cpu_device(cpu); >>>+ struct gpd_timing_data *td; >>>+ >>>+ td = &dev_gpd_data(cpu_dev)->td; >>>+ >>>+ if (earliest.tv64 < td->next_wakeup.tv64) >>>+ earliest = td->next_wakeup; >>>+ } >>>+ >>>+ sleep_ns = ktime_to_ns(ktime_sub(earliest, ktime_get())); >>>+ if (sleep_ns <= 0) >>>+ return false; >>>+ >>>+ /* >>>+ * Find the deepest sleep state that satisfies the residency >>>+ * requirement and the QoS constraint >>>+ */ >>>+ for (i = genpd->state_count - 1; i > 0; i--) { >>>+ u64 state_sleep_ns; >>>+ >>>+ state_sleep_ns = genpd->states[i].power_off_latency_ns + >>>+ genpd->states[i].power_on_latency_ns + >>>+ genpd->states[i].residency_ns; >>>+ >>>+ /* >>>+ * If we cant sleep to save power in the state, move on >>>+ * to the next lower idle state. >>>+ */ >>>+ if (state_sleep_ns > sleep_ns) >>>+ continue; >>>+ >>>+ /* >>>+ * We also dont want to sleep more than we should to >>>+ * gaurantee QoS. >>>+ */ >>>+ if (state_sleep_ns < (qos * NSEC_PER_USEC)) >>>+ break; >>>+ } >>>+ >>>+ if (i >= 0) >>>+ genpd->state_idx = i; >>>+ >>>+ return (i >= 0) ? true : false; >>>+} >>>+ >>>+static bool cpu_stop_ok(struct device *dev) >>>+{ >>>+ return true; >>>+} >>>+ >>>+struct dev_power_governor cpu_pd_gov = { >>>+ .power_down_ok = cpu_pd_down_ok, >>>+ .stop_ok = cpu_stop_ok, >>>+}; >>>+ >>> static int cpu_pd_power_off(struct generic_pm_domain *genpd) >>> { >>> struct cpu_pm_domain *pd = to_cpu_pd(genpd); >>>@@ -183,7 +264,7 @@ int of_register_cpu_pm_domain(struct device_node *dn, >>> >>> /* Register the CPU genpd */ >>> pr_debug("adding %s as CPU PM domain.\n", pd->genpd->name); >>>- ret = of_pm_genpd_init(dn, pd->genpd, &simple_qos_governor, false); >>>+ ret = of_pm_genpd_init(dn, pd->genpd, &cpu_pd_gov, false); >>> if (ret) { >>> pr_err("Unable to initialize domain %s\n", dn->full_name); >>> return ret; >>>-- >>>2.1.4 >>> >
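As noted above, a rough sketch of how per-CPU device PM QoS could be folded into the domain decision, on top of the global CPU_DMA_LATENCY class that cpu_pd_down_ok() already reads. This is only an assumption about one possible wiring, not code from the series; cpu_pd_effective_qos() is a hypothetical helper, and treating a value of 0 as "no per-device constraint" is an assumption about the dev_pm_qos defaults of this era:

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/pm_qos.h>

/* Hypothetical helper: the effective latency bound for the domain is the
 * tightest of the global CPU_DMA_LATENCY request and each online member
 * CPU's per-device resume-latency QoS (values in microseconds). */
static s32 cpu_pd_effective_qos(const struct cpumask *cpus)
{
	s32 qos = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
	int cpu;

	for_each_cpu_and(cpu, cpus, cpu_online_mask) {
		struct device *cpu_dev = get_cpu_device(cpu);
		s32 dev_qos;

		if (!cpu_dev)
			continue;

		dev_qos = dev_pm_qos_read_value(cpu_dev);

		/* Assumption: a value <= 0 means no per-device constraint. */
		if (dev_qos <= 0)
			continue;

		if (dev_qos < qos)
			qos = dev_qos;
	}

	return qos;
}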
diff --git a/drivers/base/power/cpu-pd.c b/drivers/base/power/cpu-pd.c
index 617ce54..a00abc1 100644
--- a/drivers/base/power/cpu-pd.c
+++ b/drivers/base/power/cpu-pd.c
@@ -21,6 +21,7 @@
 #include <linux/pm_qos.h>
 #include <linux/rculist.h>
 #include <linux/slab.h>
+#include <linux/tick.h>
 
 #define CPU_PD_NAME_MAX 36
 
@@ -66,6 +67,86 @@ static void get_cpus_in_domain(struct generic_pm_domain *genpd,
 	}
 }
 
+static bool cpu_pd_down_ok(struct dev_pm_domain *pd)
+{
+	struct generic_pm_domain *genpd = pd_to_genpd(pd);
+	struct cpu_pm_domain *cpu_pd = to_cpu_pd(genpd);
+	int qos = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
+	u64 sleep_ns = ~0;
+	ktime_t earliest;
+	int cpu;
+	int i;
+
+	/* Reset the last set genpd state, default to index 0 */
+	genpd->state_idx = 0;
+
+	/* We dont want to power down, if QoS is 0 */
+	if (!qos)
+		return false;
+
+	/*
+	 * Find the sleep time for the cluster.
+	 * The time between now and the first wake up of any CPU that
+	 * are in this domain hierarchy is the time available for the
+	 * domain to be idle.
+	 */
+	earliest.tv64 = KTIME_MAX;
+	for_each_cpu_and(cpu, cpu_pd->cpus, cpu_online_mask) {
+		struct device *cpu_dev = get_cpu_device(cpu);
+		struct gpd_timing_data *td;
+
+		td = &dev_gpd_data(cpu_dev)->td;
+
+		if (earliest.tv64 < td->next_wakeup.tv64)
+			earliest = td->next_wakeup;
+	}
+
+	sleep_ns = ktime_to_ns(ktime_sub(earliest, ktime_get()));
+	if (sleep_ns <= 0)
+		return false;
+
+	/*
+	 * Find the deepest sleep state that satisfies the residency
+	 * requirement and the QoS constraint
+	 */
+	for (i = genpd->state_count - 1; i > 0; i--) {
+		u64 state_sleep_ns;
+
+		state_sleep_ns = genpd->states[i].power_off_latency_ns +
+				genpd->states[i].power_on_latency_ns +
+				genpd->states[i].residency_ns;
+
+		/*
+		 * If we cant sleep to save power in the state, move on
+		 * to the next lower idle state.
+		 */
+		if (state_sleep_ns > sleep_ns)
+			continue;
+
+		/*
+		 * We also dont want to sleep more than we should to
+		 * gaurantee QoS.
+		 */
+		if (state_sleep_ns < (qos * NSEC_PER_USEC))
+			break;
+	}
+
+	if (i >= 0)
+		genpd->state_idx = i;
+
+	return (i >= 0) ? true : false;
+}
+
+static bool cpu_stop_ok(struct device *dev)
+{
+	return true;
+}
+
+struct dev_power_governor cpu_pd_gov = {
+	.power_down_ok = cpu_pd_down_ok,
+	.stop_ok = cpu_stop_ok,
+};
+
 static int cpu_pd_power_off(struct generic_pm_domain *genpd)
 {
 	struct cpu_pm_domain *pd = to_cpu_pd(genpd);
@@ -183,7 +264,7 @@ int of_register_cpu_pm_domain(struct device_node *dn,
 
 	/* Register the CPU genpd */
 	pr_debug("adding %s as CPU PM domain.\n", pd->genpd->name);
-	ret = of_pm_genpd_init(dn, pd->genpd, &simple_qos_governor, false);
+	ret = of_pm_genpd_init(dn, pd->genpd, &cpu_pd_gov, false);
 	if (ret) {
 		pr_err("Unable to initialize domain %s\n", dn->full_name);
 		return ret;
A PM domain comprising CPUs may be powered off when all the CPUs in the domain are powered down. Powering down a CPU domain is generally an expensive operation, so the power/performance trade-offs should be considered. The time between the last CPU powering down and the first CPU powering up in a domain is the time available for the domain to sleep. Ideally, the sleep time of the domain should fulfill the residency requirement of the domain's idle state.

To do this effectively, read the time before the wakeup of the cluster's CPUs and ensure that the domain's idle state sleep time satisfies the QoS requirement of each CPU (the PM QoS CPU_DMA_LATENCY constraint) as well as the state's residency.

Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
---
 drivers/base/power/cpu-pd.c | 83 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 82 insertions(+), 1 deletion(-)
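To make the commit message's selection criterion concrete, here is a small standalone illustration; every number below is invented for the example and does not describe the patch's data or any real platform:

#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_USEC 1000ULL

int main(void)
{
	/* Hypothetical domain idle state. */
	uint64_t power_off_latency_ns = 500 * NSEC_PER_USEC;
	uint64_t power_on_latency_ns = 700 * NSEC_PER_USEC;
	uint64_t residency_ns = 5000 * NSEC_PER_USEC;

	/* Hypothetical time until the earliest wakeup of any CPU in the domain. */
	uint64_t sleep_ns = 8000 * NSEC_PER_USEC;

	/* Minimum idle window needed for this state to be worth entering. */
	uint64_t state_sleep_ns = power_off_latency_ns +
				  power_on_latency_ns +
				  residency_ns; /* 6.2 ms */

	if (state_sleep_ns <= sleep_ns)
		printf("state fits: needs %llu ns, window is %llu ns\n",
		       (unsigned long long)state_sleep_ns,
		       (unsigned long long)sleep_ns);
	else
		printf("window too short: pick a shallower state\n");

	return 0;
}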