[RFC,3/3] PM / Domains: Introduce generic PM domain for cpu domain

Message ID 1433456946-53296-4-git-send-email-lina.iyer@linaro.org (mailing list archive)
State New, archived

Commit Message

Lina Iyer June 4, 2015, 10:29 p.m. UTC
Generally cpus are grouped under a power domain in a SoC. When all cpus
in the domain are in their power off state, the cpu domain can also be
powered off. Genpd provides the framework for defining cpus as devices
that are part of a cpu domain.

Introduce support for defining and adding a generic power domain for the
cpus based on the DT specification of power domain providers and
consumers.  SoCs that have the cpu domain defined in their DT can
set up a genpd with a name and the power_on/power_off callbacks. Calling
pm_cpu_domain_init() will register the genpd and attach the cpus for
this domain with the genpd.

CPU_PM notifications are used to call pm_runtime_get_sync() and
pm_runtime_put_sync() for each cpu.  When all cpus are powered off, the
last cpu going down would call the genpd->power_off(). Correspondingly,
the first cpu up would call the genpd->power_on() callback before
resuming from idle.

Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
---
 drivers/base/power/Makefile     |   1 +
 drivers/base/power/cpu_domain.c | 187 ++++++++++++++++++++++++++++++++++++++++
 include/linux/pm_domain.h       |  12 +++
 kernel/power/Kconfig            |  12 +++
 4 files changed, 212 insertions(+)
 create mode 100644 drivers/base/power/cpu_domain.c
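
The reference counting described in the commit message can be modeled in
user space. What follows is a minimal sketch under stated assumptions, not
the code from the patch: struct cpu_domain, domain_get(), domain_put() and
cpu_pm_notify() are hypothetical names, and the enum only mirrors the names
of the kernel's CPU_PM events. The event-to-action mapping follows
cpu_state_notifier() in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Event names mirror the kernel's CPU_PM notifications (sketch-local values). */
enum cpu_pm_event { CPU_PM_ENTER, CPU_PM_ENTER_FAILED, CPU_PM_EXIT };

/* Hypothetical stand-in for a genpd: a usage count plus bookkeeping. */
struct cpu_domain {
	int usage;           /* number of cpus currently holding the domain on */
	bool powered;        /* current state of the domain */
	int power_on_calls;  /* how often the power_on step ran */
	int power_off_calls; /* how often the power_off step ran */
};

/* Models pm_runtime_get_sync(): the first user powers the domain on. */
void domain_get(struct cpu_domain *pd)
{
	if (pd->usage++ == 0 && !pd->powered) {
		pd->powered = true;
		pd->power_on_calls++;
	}
}

/* Models pm_runtime_put_sync(): the last user powers the domain off. */
void domain_put(struct cpu_domain *pd)
{
	assert(pd->usage > 0);
	if (--pd->usage == 0) {
		pd->powered = false;
		pd->power_off_calls++;
	}
}

/* A cpu entering idle drops its reference; leaving idle (or failing to
 * enter it) takes the reference back. */
void cpu_pm_notify(struct cpu_domain *pd, enum cpu_pm_event ev)
{
	switch (ev) {
	case CPU_PM_ENTER:
		domain_put(pd);
		break;
	case CPU_PM_ENTER_FAILED:
	case CPU_PM_EXIT:
		domain_get(pd);
		break;
	}
}
```

With two cpus attached, the first CPU_PM_ENTER leaves the domain powered,
the second one powers it off, and the next CPU_PM_EXIT powers it on again.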

Comments

Krzysztof Kozlowski June 7, 2015, 9:42 a.m. UTC | #1
On 05.06.2015 at 07:29, Lina Iyer wrote:
> Generally cpus are grouped under a power domain in a SoC. When all cpus
> in the domain are in their power off state,

What exactly do you mean here by "CPU in power off state"? How does it
map to the kernel's understanding of a CPU device (hotplug? cpuidle?)?

> the cpu domain can also be
> powered off. Genpd provides the framework for defining cpus as devices
> that are part of a cpu domain.

The problem which is solved looks to me like the same problem which
coupled cpuidle tried to solve: a certain deep sleep mode (e.g. power
off) can be entered when whole cluster is idle or other CPUs in cluster
are powered off completely.

It seems a little like duplicating the effort around coupled cpuidle.

Best regards,
Krzysztof

> 
> Introduce support for defining and adding a generic power domain for the
> cpus based on the DT specification of power domain providers and
> consumers.  SoCs that have the cpu domain defined in their DT can
> set up a genpd with a name and the power_on/power_off callbacks. Calling
> pm_cpu_domain_init() will register the genpd and attach the cpus for
> this domain with the genpd.
> 
> CPU_PM notifications are used to call pm_runtime_get_sync() and
> pm_runtime_put_sync() for each cpu.  When all cpus are powered off, the
> last cpu going down would call the genpd->power_off(). Correspondingly,
> the first cpu up would call the genpd->power_on() callback before
> resuming from idle.
> 
> Cc: Ulf Hansson <ulf.hansson@linaro.org>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Kevin Hilman <khilman@linaro.org>
> Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
> ---
>  drivers/base/power/Makefile     |   1 +
>  drivers/base/power/cpu_domain.c | 187 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/pm_domain.h       |  12 +++
>  kernel/power/Kconfig            |  12 +++
>  4 files changed, 212 insertions(+)
>  create mode 100644 drivers/base/power/cpu_domain.c
> 
> diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
> index 1cb8544..debfc74 100644
> --- a/drivers/base/power/Makefile
> +++ b/drivers/base/power/Makefile
> @@ -4,5 +4,6 @@ obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
>  obj-$(CONFIG_PM_OPP)	+= opp.o
>  obj-$(CONFIG_PM_GENERIC_DOMAINS)	+=  domain.o domain_governor.o
>  obj-$(CONFIG_HAVE_CLK)	+= clock_ops.o
> +obj-$(CONFIG_PM_CPU_DOMAIN)	+= cpu_domain.o
>  
>  ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
> diff --git a/drivers/base/power/cpu_domain.c b/drivers/base/power/cpu_domain.c
> new file mode 100644
> index 0000000..ee90094
> --- /dev/null
> +++ b/drivers/base/power/cpu_domain.c
> @@ -0,0 +1,187 @@
> +/*
> + * Generic CPU domain runtime power on/off support
> + *
> + * Copyright (C) 2015 Linaro Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/cpu.h>
> +#include <linux/cpu_pm.h>
> +#include <linux/device.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/pm_domain.h>
> +#include <linux/pm_runtime.h>
> +
> +static struct cpumask cpus_handled;
> +
> +static void do_cpu(void *unused)
> +{
> +	int cpu = smp_processor_id();
> +	struct device *dev = get_cpu_device(cpu);
> +
> +	pm_runtime_get_sync(dev);
> +}
> +
> +static int cpuidle_genpd_device_init(int cpu)
> +{
> +	struct device *dev = get_cpu_device(cpu);
> +
> +	/*
> +	 * CPU device have to be irq safe for use with cpuidle, which runs
> +	 * with irqs disabled.
> +	 */
> +	pm_runtime_irq_safe(dev);
> +	pm_runtime_enable(dev);
> +
> +	genpd_dev_pm_attach(dev);
> +
> +	/*
> +	 * Execute the below on 'that' cpu to ensure that the reference
> +	 * counting is correct. Its possible that while this code is
> +	 * executed, the cpu may be in idle but we may incorrectly
> +	 * increment the usage. By executing the do_cpu on 'that' cpu,
> +	 * we can ensure that the cpu and the usage count are matched.
> +	 */
> +	return smp_call_function_single(cpu, do_cpu, NULL, true);
> +}
> +
> +static int cpu_state_notifier(struct notifier_block *n,
> +			unsigned long action, void *hcpu)
> +{
> +	int cpu = smp_processor_id();
> +	struct device *dev = get_cpu_device(cpu);
> +
> +	if (!cpumask_test_cpu(cpu, &cpus_handled))
> +		return NOTIFY_DONE;
> +
> +	switch (action) {
> +	case CPU_PM_ENTER:
> +		pm_runtime_put_sync(dev);
> +		break;
> +
> +	case CPU_PM_ENTER_FAILED:
> +	case CPU_PM_EXIT:
> +		pm_runtime_get_sync(dev);
> +		break;
> +
> +	default:
> +		return NOTIFY_DONE;
> +	}
> +
> +	return NOTIFY_OK;
> +}
> +
> +static int cpu_online_notifier(struct notifier_block *n,
> +			unsigned long action, void *hcpu)
> +{
> +	int cpu = (unsigned long)hcpu;
> +	struct device *dev = get_cpu_device(cpu);
> +
> +	if (!cpumask_test_cpu(cpu, &cpus_handled))
> +		return NOTIFY_DONE;
> +
> +	switch (action) {
> +	case CPU_STARTING:
> +	case CPU_STARTING_FROZEN:
> +		/*
> +		 * Attach the cpu to its domain if the cpu is coming up
> +		 * for the first time.
> +		 * Called from the cpu that is coming up.
> +		 */
> +		if (!genpd_dev_pm_attach(dev))
> +			do_cpu(NULL);
> +		break;
> +
> +	default:
> +		return NOTIFY_DONE;
> +	}
> +
> +	return NOTIFY_OK;
> +}
> +
> +static struct notifier_block hotplug_notifier = {
> +	.notifier_call = cpu_online_notifier,
> +};
> +
> +static struct notifier_block cpu_pm_notifier = {
> +	.notifier_call = cpu_state_notifier,
> +};
> +
> +static struct generic_pm_domain *get_cpu_domain(int cpu)
> +{
> +	struct device *dev = get_cpu_device(cpu);
> +	struct of_phandle_args pd_args;
> +	int ret;
> +
> +	/* Make sure we are a domain consumer */
> +	ret = of_parse_phandle_with_args(dev->of_node, "power-domains",
> +				"#power-domain-cells", 0, &pd_args);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	/* Attach cpus only for this domain */
> +	return of_genpd_get_from_provider(&pd_args);
> +}
> +
> +int pm_cpu_domain_init(struct generic_pm_domain *genpd, struct device_node *dn)
> +{
> +	int cpu;
> +	int ret;
> +	cpumask_var_t tmpmask;
> +	struct generic_pm_domain *cpupd;
> +
> +	if (!genpd || !dn)
> +		return -EINVAL;
> +
> +	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	/* CPU genpds have to operate in IRQ safe mode */
> +	genpd->flags |= GENPD_FLAG_IRQ_SAFE;
> +
> +	pm_genpd_init(genpd, NULL, false);
> +	ret = of_genpd_add_provider_simple(dn, genpd);
> +	if (ret)
> +		return ret;
> +
> +	/* Only add those cpus to whom we are the domain provider */
> +	for_each_online_cpu(cpu) {
> +		cpupd = get_cpu_domain(cpu);
> +
> +		if (IS_ERR(cpupd))
> +			continue;
> +
> +		if (genpd == cpupd) {
> +			cpuidle_genpd_device_init(cpu);
> +			cpumask_set_cpu(cpu, tmpmask);
> +		}
> +	}
> +
> +	if (cpumask_empty(tmpmask))
> +		goto done;
> +
> +	/*
> +	 * Not all cpus may be online at this point. Use the hotplug
> +	 * notifier to be notified of when the cpu comes online, then
> +	 * attach it to the domain.
> +	 *
> +	 * Register hotplug and cpu_pm notification once for all
> +	 * domains.
> +	 */
> +	if (cpumask_empty(&cpus_handled)) {
> +		cpu_pm_register_notifier(&cpu_pm_notifier);
> +		register_cpu_notifier(&hotplug_notifier);
> +	}
> +
> +	cpumask_copy(&cpus_handled, tmpmask);
> +
> +done:
> +	free_cpumask_var(tmpmask);
> +	return 0;
> +}
> +EXPORT_SYMBOL(pm_cpu_domain_init);
> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> index dc7cb53..fc97ad8 100644
> --- a/include/linux/pm_domain.h
> +++ b/include/linux/pm_domain.h
> @@ -280,6 +280,7 @@ struct generic_pm_domain *__of_genpd_xlate_onecell(
>  					void *data);
>  
>  int genpd_dev_pm_attach(struct device *dev);
> +
>  #else /* !CONFIG_PM_GENERIC_DOMAINS_OF */
>  static inline int __of_genpd_add_provider(struct device_node *np,
>  					genpd_xlate_t xlate, void *data)
> @@ -325,4 +326,15 @@ static inline int dev_pm_domain_attach(struct device *dev, bool power_on)
>  static inline void dev_pm_domain_detach(struct device *dev, bool power_off) {}
>  #endif
>  
> +#ifdef CONFIG_PM_CPU_DOMAIN
> +extern int pm_cpu_domain_init(struct generic_pm_domain *genpd,
> +			struct device_node *dn);
> +#else
> +static inline int pm_cpu_domain_init(struct generic_pm_domain *genpd,
> +			struct device_node *dn)
> +{
> +	return -ENODEV;
> +}
> +#endif
> +
>  #endif /* _LINUX_PM_DOMAIN_H */
> diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
> index 7e01f78..55d49f6 100644
> --- a/kernel/power/Kconfig
> +++ b/kernel/power/Kconfig
> @@ -301,3 +301,15 @@ config PM_GENERIC_DOMAINS_OF
>  
>  config CPU_PM
>  	bool
> +
> +config PM_CPU_DOMAIN
> +	def_bool y
> +	depends on PM_GENERIC_DOMAINS_OF && CPU_PM
> +	help
> +	  When cpuidle powers of the cpus in a domain, the domain can also be
> +	  powered off.
> +	  This config option allow for cpus to be registered with the domain
> +	  provider specified in the DT and when the cpu is powered off, calls
> +	  the runtime PM methods to do the reference counting. The last cpu
> +	  going down powers the domain off as well.
> +
>
Lina Iyer June 10, 2015, 4:57 p.m. UTC | #2
On Sun, Jun 07 2015 at 03:43 -0600, Krzysztof Kozlowski wrote:
>On 05.06.2015 at 07:29, Lina Iyer wrote:
>> Generally cpus are grouped under a power domain in a SoC. When all cpus
>> in the domain are in their power off state,
>
>What do you exactly mean here by "CPU in power off state"? How does it
>map to kernel understanding of CPU device (hotplug? cpuidle?)?
>
Both cpuidle and hotplug could end with the core being powered down in
the platform driver or in PSCI (on ARMv8). It does not matter which of
these two frameworks resulted in the cpu being powered off. But, if all
cpus in the domain are powered off, then the domain could be powered off
as well. This is the premise of this change. It is probably easier to
power off the domain when the cores in that domain/cluster have been
hotplugged off. It saves power to turn off the domain at that time, but
more power savings can be achieved if the domain could also be powered
off during cpuidle. Hotplug is not a common occurrence, while cpuidle is.

>> the cpu domain can also be
>> powered off. Genpd provides the framework for defining cpus as devices
>> that are part of a cpu domain.
>
>The problem which is solved looks to me like the same problem which
>coupled cpuidle tried to solve: a certain deep sleep mode (e.g. power
>off) can be entered when whole cluster is idle or other CPUs in cluster
>are powered off completely.
>
>It seems a little like duplicating the effort around coupled cpuidle.
>
I see where you are going with this, but the genpd solution is not
exactly a duplicate of coupled cpuidle.

A coupled state is used to put the cpus in a deeper sleep state, which
could also result in powering off the domain. Coupled cpuidle is a
cpuidle mechanism for choosing a deeper sleep mode on certain hardware
that can only enter such a mode when all cpus cooperate.

This patch attempts to describe the backend of a cpu domain. CPUs are
responsible for their individual cpuidle states; each cpu enters its
recommended deepest idle state when there is no activity. A cpu domain
could comprise cpus and other devices like the GIC, buses etc., that
all need to idle before the domain can be powered off. This patch does
not dictate which idle state any of those devices should enter, or
coordinate the idle states between devices. But if cpus choose to
power down, then this patch recognizes that and drops their usage
count on the domain. Only when all devices in the domain have dropped
their usage count will the domain be powered off.
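
The rule in the paragraph above (every device in the domain, cpu or not,
must be idle before the domain goes off) can be illustrated with a small
user-space sketch. This is only a model under stated assumptions: struct
pd_device and domain_can_power_off() are hypothetical names, and genpd
itself tracks this with runtime PM usage counts rather than by scanning
an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one device attached to the cpu domain. */
struct pd_device {
	const char *name; /* e.g. "cpu0" or "gic", for readability only */
	bool active;      /* true while the device still needs the domain */
};

/* Returns true only when no device in the domain is still active. */
bool domain_can_power_off(const struct pd_device *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].active)
			return false;
	}
	return true;
}
```

For example, with both cpus idle but a third device such as the GIC still
active, domain_can_power_off() returns false; it returns true once that
device goes idle as well.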

There are two things this patch provides -

i. A generic way to initialize a genpd specifically for cpus. (The
platform specifies the relation between a cpu and its domain in the DT
and provides the memory for the genpd structure)

ii. On behalf of a platform, we track when the cpus power up and down
and use runtime_get and runtime_put on the genpd.

Unlike coupled cpuidle, individual cpu idle state is not manipulated.
Coupled cpuidle does not care if the domain is powered off, it is used
to allow a certain C-state for the cpu, based on the idleness of other
cpus in that cluster. The focus of the series is powering down the
domain when the devices (cpus included) are powered off. You could see
this patch as a cpu-pm and runtime-pm interface layer.

Hope that helps.

Thanks,
Lina

Kevin Hilman June 10, 2015, 5:01 p.m. UTC | #3
Krzysztof Kozlowski <k.kozlowski@samsung.com> writes:

> On 05.06.2015 at 07:29, Lina Iyer wrote:
>> Generally cpus are grouped under a power domain in a SoC. When all cpus
>> in the domain are in their power off state,
>
> What do you exactly mean here by "CPU in power off state"? How does it
> map to kernel understanding of CPU device (hotplug? cpuidle?)?
>
>> the cpu domain can also be
>> powered off. Genpd provides the framework for defining cpus as devices
>> that are part of a cpu domain.
>
> The problem which is solved looks to me like the same problem which
> coupled cpuidle tried to solve: a certain deep sleep mode (e.g. power
> off) can be entered when whole cluster is idle or other CPUs in cluster
> are powered off completely.
>
> It seems a little like duplicating the effort around coupled cpuidle.

Yes, it duplicates some aspects of coupled idle states, but coupled
states have their own limitations:

- only handles CPUs, not other devices sharing a power rail (e.g. L2$,
  GIC, floating point unit, CoreSight, etc. etc.)

- does not scale well past 2 CPUs

- doesn't handle clusters: While this series only addresses CPUs
  currently, the approach can be extended.  Because genpd handles nested
  domains, it could be used to model clusters as well.
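
The nested-domain point in the last item can be sketched in user space as
follows. This is a simplified model, not genpd code: struct pm_domain,
pd_get(), pd_put() and pd_is_on() are hypothetical names. The assumption
is that a subdomain holds one reference on its parent for as long as it
has any user of its own, so a parent can only go off after every child
has gone off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a power domain with an optional parent. */
struct pm_domain {
	struct pm_domain *parent; /* NULL for the top-level domain */
	int users;                /* active devices plus powered subdomains */
};

/* Take a reference; the first user also takes one on the parent. */
void pd_get(struct pm_domain *pd)
{
	if (pd->users++ == 0 && pd->parent)
		pd_get(pd->parent);
}

/* Drop a reference; the last user also releases the parent. */
void pd_put(struct pm_domain *pd)
{
	assert(pd->users > 0);
	if (--pd->users == 0 && pd->parent)
		pd_put(pd->parent);
}

/* A domain is considered powered while it has at least one user. */
bool pd_is_on(const struct pm_domain *pd)
{
	return pd->users > 0;
}
```

With two cluster domains under one parent, the parent stays on while
either cluster has a running cpu and goes off only after the last cpu of
the last cluster has dropped its reference.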

Kevin
Kevin Hilman June 10, 2015, 9:37 p.m. UTC | #4
Lina Iyer <lina.iyer@linaro.org> writes:

> Generally cpus are grouped under a power domain in a SoC. When all cpus
> in the domain are in their power off state, the cpu domain can also be
> powered off. 

How does this relate to a cluster, and why aren't you using that terminology?

> Genpd provides the framework for defining cpus as devices
> that are part of a cpu domain.
>
> Introduce support for defining and adding a generic power domain for the
> cpus based on the DT specification of power domain providers and
> consumers.  SoCs that have the cpu domain defined in their DT can
> set up a genpd with a name and the power_on/power_off callbacks. Calling
> pm_cpu_domain_init() will register the genpd and attach the cpus for
> this domain with the genpd.
>
> CPU_PM notifications are used to call pm_runtime_get_sync() and
> pm_runtime_put_sync() for each cpu.  When all cpus are powered off, the
> last cpu going down would call the genpd->power_off(). Correspondingly,
> the first cpu up would call the genpd->power_on() callback before
> resuming from idle.

Other patches also mention this genpd being useful to gate power to
non-CPU peripherals on the same power rail.  How are those devices to be
added?

Without seeing the DTs and the init code that might call
pm_cpu_domain_init(), it's hard for me to see how this is intended to be
used.  Could you also include a patch that shows how this is initialized
and the DT additions?  Ideally, it should also show how a non-CPU device
would be included.

Kevin
Krzysztof Kozlowski June 11, 2015, 12:27 a.m. UTC | #5
On 11.06.2015 01:57, Lina Iyer wrote:
> On Sun, Jun 07 2015 at 03:43 -0600, Krzysztof Kozlowski wrote:
>> On 05.06.2015 at 07:29, Lina Iyer wrote:
>>> Generally cpus are grouped under a power domain in a SoC. When all cpus
>>> in the domain are in their power off state,
>>
>> What do you exactly mean here by "CPU in power off state"? How does it
>> map to kernel understanding of CPU device (hotplug? cpuidle?)?
>>
> Both cpuidle and hotplug could end with the core being powered down in
> the platform driver or in PSCI (on ARMv8). It does not matter which of
> these two frameworks resulted in the cpu being powered off. But, if all
> cpus in the domain are powered off, then the domain could be powered off
> as well. This is the premise of this change. It is probably easier to
> power off the domain when the cores in that domain/cluster have been
> hotplugged off. It saves power to turn off the domain at that time, but
> more power savings can be achieved if the domain could also be powered
> off during cpuidle. Hotplug is not a common occurrence, while cpuidle is.

OK, it answers my questions, thanks.

> 
>>> the cpu domain can also be
>>> powered off. Genpd provides the framework for defining cpus as devices
>>> that are part of a cpu domain.
>>
>> The problem which is solved looks to me like the same problem which
>> coupled cpuidle tried to solve: a certain deep sleep mode (e.g. power
>> off) can be entered when whole cluster is idle or other CPUs in cluster
>> are powered off completely.
>>
>> It seems a little like duplicating the effort around coupled cpuidle.
>>
> I see where you are going with this, but the genpd solution is not
> exactly a duplicate of coupled cpuidle.
> 
> A coupled state is used to put the cpus in a deeper sleep state, which
> could also result in powering off the domain. Coupled cpuidle is a
> cpuidle mechanism for choosing a deeper sleep mode on certain hardware
> that can only enter such a mode when all cpus cooperate.
> 
> This patch attempts to describe the backend of a cpu domain. CPUs are
> responsible for their individual cpuidle states; each cpu enters its
> recommended deepest idle state when there is no activity. A cpu domain
> could comprise cpus and other devices like the GIC, buses etc., that
> all need to idle before the domain can be powered off. This patch does
> not dictate which idle state any of those devices should enter, or
> coordinate the idle states between devices. But if cpus choose to
> power down, then this patch recognizes that and drops their usage
> count on the domain. Only when all devices in the domain have dropped
> their usage count will the domain be powered off.

It would be nice to see this patch used in a cpuidle driver or in
platform code, but I think I get the idea.

Actually I like the approach.
I am thinking about how to utilize it to replace coupled cpuidle for our
case. We use coupled cpuidle because the SoC can be put in low power mode
only if the non-boot CPUs are powered down.

However in our case:
1. Some other devices (buses, clocks) also should be idle. This would
perfectly match with this patch and with runtime PM.

2. An idle non-boot CPU could power itself down, but it cannot wake up by
itself. Only a CPU that is still alive can wake the others. This probably
means that we cannot provide a cpuidle driver which will power off unused
cores and then, if the boot CPU is idle, disable the CPU power domain by
entering low power mode.

Anyway, as I said, I like the approach.


> There are two things this patch provides -
> 
> i. A generic way to initialize a genpd specifically for cpus. (The
> platform specifies the relation between a cpu and its domain in the DT
> and provides the memory for the genpd structure)
> 
> ii. On behalf of a platform, we track when the cpus power up and down
> and use runtime_get and runtime_put on the genpd.
> 
> Unlike coupled cpuidle, individual cpu idle state is not manipulated.
> Coupled cpuidle does not care if the domain is powered off, it is used
> to allow a certain C-state for the cpu, based on the idleness of other
> cpus in that cluster. The focus of the series is powering down the
> domain when the devices (cpus included) are powered off. You could see
> this patch as a cpu-pm and runtime-pm interface layer.
> 
> Hope that helps.
> 
> Thanks,
> Lina
> 

Best regards,
Krzysztof
Krzysztof Kozlowski June 11, 2015, 12:35 a.m. UTC | #6
On 11.06.2015 02:01, Kevin Hilman wrote:
> Krzysztof Kozlowski <k.kozlowski@samsung.com> writes:
> 
>> On 05.06.2015 at 07:29, Lina Iyer wrote:
>>> Generally cpus are grouped under a power domain in a SoC. When all cpus
>>> in the domain are in their power off state,
>>
>> What do you exactly mean here by "CPU in power off state"? How does it
>> map to kernel understanding of CPU device (hotplug? cpuidle?)?
>>
>>> the cpu domain can also be
>>> powered off. Genpd provides the framework for defining cpus as devices
>>> that are part of a cpu domain.
>>
>> The problem which is solved looks to me like the same problem which
>> coupled cpuidle tried to solve: a certain deep sleep mode (e.g. power
>> off) can be entered when whole cluster is idle or other CPUs in cluster
>> are powered off completely.
>>
>> It seems a little like duplicating the effort around coupled cpuidle.
> 
> Yes, it duplicates some aspects of coupled idle states, but coupled
> states have their own limitations:
> 
> - only handles CPUs, not other devices sharing a power rail (e.g. L2$,
>   GIC, floating point unit, CoreSight, etc. etc.)
> 
> - does not scale well past 2 CPUs
> 
> - doesn't handle clusters: While this series only addresses CPUs
>   currently, the approach can be extended.  Because genpd handles nested
>   domains, it could be used to model clusters as well.

Right. I agree with your explanation. I am just thinking about how to
utilize this for the Exynos deep sleep modes, which we currently
implement using coupled cpuidle.

Anyway I like the idea!

Best regards,
Krzysztof
Lina Iyer June 11, 2015, 2:42 p.m. UTC | #7
On Thu, Jun 11 2015 at 18:27 -0600, Krzysztof Kozlowski wrote:
>On 11.06.2015 01:57, Lina Iyer wrote:
>> On Sun, Jun 07 2015 at 03:43 -0600, Krzysztof Kozlowski wrote:
>>> On 05.06.2015 at 07:29, Lina Iyer wrote:
>>>> Generally cpus are grouped under a power domain in a SoC. When all cpus
>>>> in the domain are in their power off state,
>>>
>>> What do you exactly mean here by "CPU in power off state"? How does it
>>> map to kernel understanding of CPU device (hotplug? cpuidle?)?
>>>
>> Both cpuidle and hotplug could end with the core being powered down at
>> the platform driver or at PSCI (on V8). It does not matter which of
>> these two frameworks resulted in the cpu being powered off. But, if all
>> cpus in the domain are powered off, then the domain could be powered off
>> as well. This is the premise of this change. It is probably easier to
>> power off the domain when the cores in that domain/cluster have been
>> hotplugged off. It saves power to turn off the domain at that time, but
>> more power savings can be achieved if the domain could also be powered
>> off during cpuidle. Hotplug is not a common occurrence, while cpuidle is.
>
>OK, it answers my questions, thanks.
>
>>
>>>> the cpu domain can also be
>>>> powered off. Genpd provides the framework for defining cpus as devices
>>>> that are part of a cpu domain.
>>>
>>> The problem which is solved looks to me like the same problem which
>>> coupled cpuidle tried to solve: a certain deep sleep mode (e.g. power
>>> off) can be entered when whole cluster is idle or other CPUs in cluster
>>> are powered off completely.
>>>
>>> It seems a little like duplicating the effort around coupled cpuidle.
>>>
>> I see where you are going with this, but the genpd solution is not
>> exactly a duplicate of the solution.
>>
>> Coupled state is used to put the cpus in a deeper sleep state, which
>> could also result in powering off the domain. Coupled cpuidle is a
>> cpuidle mechanism for choosing a deeper sleep mode on certain hardware
>> that can only enter such a mode when all cpus cooperate.
>>
>> This patch attempts to describe the backend of a cpu domain. CPUs are
>> responsible for individual cpuidle states, cpus do enter their
>> recommended deepest idle state at the time of no activity. A cpu-domain
>> could be comprised of cpus, and other devices like GIC, busses etc, that
>> all need to idle before the domain can be powered off. This patch does
>> not dictate which idle state any of those devices should enter, or
>> coordinate the idle states between devices. But, if cpus choose to
>> power down, then this patch recognizes that and reduces the reference
>> usage count on the domain. Only when all devices in the domain remove
>> their usage count, will the domain be powered off.
>
>It would be nice to see the usage of this patch in cpuidle driver or
>platform code but I think I get the idea.
>
Ok, my next spin will include the platform driver changes for the QCOM
SoC that I tested this on.


>Actually I like the approach.
>I am thinking how to utilize it to replace coupled cpuidle for our case.
>In our case we use coupled cpuidle because the SoC can be put in low
>power mode only if non-boot CPUs are powered down.
>
>However in our case:
>1. Some other devices (buses, clocks) also should be idle. This would
>perfectly match with this patch and with runtime PM.
>
>2. Some non-boot idle CPU could power itself down but it cannot wake up.
>Only the alive CPU can wake others. This probably means that we cannot
>provide a cpuidle driver which will power off unused cores and then, if
>boot CPU is idle, disable the CPU power domain by entering to low power
>mode.
>
>Anyway, as I said, I like the approach.
>
It was Kevin's idea that I implemented.

Thanks,
Lina
>
>> There are two things this patch provides -
>>
>> i. A generic way to initialize a genpd specifically for cpus. (The
>> platform specifies the relation between a cpu and its domain in the DT
>> and provides the memory for the genpd structure)
>>
>> ii. On behalf of a platform, we track when the cpus power up and down
>> and use runtime_get and runtime_put on the genpd.
>>
>> Unlike coupled cpuidle, individual cpu idle state is not manipulated.
>> Coupled cpuidle does not care if the domain is powered off, it is used
>> to allow a certain C-state for the cpu, based on the idleness of other
>> cpus in that cluster. The focus of the series is powering down the
>> domain when the devices (cpus included) are powered off. You could see
>> this patch as a cpu-pm and runtime-pm interface layer.
>>
>> Hope that helps.
>>
>> Thanks,
>> Lina
>>
>
>Best regards,
>Krzysztof
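
The usage counting described above (cpus and other devices each hold a
reference, and the domain is powered off only when the last reference is
dropped) can be illustrated with a small userspace model. This is only a
sketch: struct model_pd and the model_* helpers are hypothetical names and
are not part of genpd or runtime PM, and model_power_on()/model_power_off()
merely stand in for a platform's power_on/power_off callbacks.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_DEVS 8

/* Hypothetical userspace stand-in for a power domain; not a kernel type. */
struct model_pd {
	bool active[MODEL_MAX_DEVS];	/* true while device i holds a reference */
	int ndevs;			/* number of attached devices */
	bool powered;
	int on_calls;			/* how often the power_on stand-in ran */
	int off_calls;			/* how often the power_off stand-in ran */
};

static void model_pd_init(struct model_pd *pd)
{
	pd->ndevs = 0;
	pd->powered = false;
	pd->on_calls = 0;
	pd->off_calls = 0;
}

/* Stand-ins for the platform's power_on/power_off callbacks. */
static void model_power_on(struct model_pd *pd)
{
	pd->powered = true;
	pd->on_calls++;
}

static void model_power_off(struct model_pd *pd)
{
	pd->powered = false;
	pd->off_calls++;
}

/* Attach a device (a cpu, the GIC, a bus, ...) and return its index. */
static int model_pd_attach(struct model_pd *pd)
{
	assert(pd->ndevs < MODEL_MAX_DEVS);
	pd->active[pd->ndevs] = false;
	return pd->ndevs++;
}

/* Count the devices that currently hold a reference. */
static int model_pd_users(const struct model_pd *pd)
{
	int i, n = 0;

	for (i = 0; i < pd->ndevs; i++)
		if (pd->active[i])
			n++;
	return n;
}

/* Device becomes active: the first user powers the domain on. */
static void model_pd_get(struct model_pd *pd, int dev)
{
	assert(dev >= 0 && dev < pd->ndevs && !pd->active[dev]);
	if (model_pd_users(pd) == 0)
		model_power_on(pd);
	pd->active[dev] = true;
}

/* Device goes idle: the last user powers the domain off. */
static void model_pd_put(struct model_pd *pd, int dev)
{
	assert(dev >= 0 && dev < pd->ndevs && pd->active[dev]);
	pd->active[dev] = false;
	if (model_pd_users(pd) == 0)
		model_power_off(pd);
}
```

In this model, two cpus going idle do not power the domain off while a
third device (for example the GIC) still holds its reference; only the
final put does.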
Lina Iyer June 11, 2015, 2:56 p.m. UTC | #8
On Wed, Jun 10 2015 at 15:38 -0600, Kevin Hilman wrote:
>Lina Iyer <lina.iyer@linaro.org> writes:
>
>> Generally cpus are grouped under a power domain in a SoC. When all cpus
>> in the domain are in their power off state, the cpu domain can also be
>> powered off.
>
>How does this relate to a cluster, and why aren't you using that terminology?
>
>> Genpd provides the framework for defining cpus as devices
>> that are part of a cpu domain.
>>
>> Introduce support for defining and adding a generic power domain for the
>> cpus based on the DT specification of power domain providers and
>> consumers.  SoC's that have the cpu domain defined in their DT, can
>> setup a genpd with a name and the power_on/power_off callbacks. Calling
>> pm_cpu_domain_init() will register the genpd and attach the cpus for
>> this domain with the genpd.
>>
>> CPU_PM notifications are used to call pm_runtime_get_sync() and
>> pm_runtime_put_sync() for each cpu.  When all cpus are powered off, the
>> last cpu going down would call the genpd->power_off(). Correspondingly,
>> the first cpu up would call the genpd->power_on() callback before
>> resuming from idle.
>
>Other patches also mention this genpd being useful to gate power to
>non-CPU peripherals on the same power rail.  How are those devices to be
>added?
>
I am not investigating DT nodes to figure out which node is a consumer
for this domain provider. That could be a good way to do it, but
practically speaking, there may be platform dependencies and specifics
that may need to be met, before the device can be added to the CPU
genpd.

So that is not generalized here. I didn't see a better way to do that,
generically. Do you have ideas on that?

The platform is the owner of the genpd and therefore can add those
non-cpu devices to the genpd as and when appropriate.

In this regard, I also have a question: who initializes the genpd? My
assumption is that CPUs will get probed before most other devices and
therefore the domain provider could be initialized by this file. But, I
could be wrong here.




>Without seeing the DTs and the init code that might call
>pm_cpu_domain_init(), it's hard for me to see how this is intended to be
>used.  Could you also include a patch that shows how this is initialized
>and the DT additions?  Ideally, it should also show how a non-CPU device
>would be included.
>
I am sorry, you are right. This is better explained with the patches for
the platform driver. I will add them in the next spin. I thought the
examples provided in the cover letter were pretty close to the
genpd-related changes that I made in my platform code. But I agree, they
don't give the complete picture.

Thanks,
Lina
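
The CPU_PM handling quoted above can be modelled in userspace as a mapping
from notification to reference-count change, mirroring the switch in
cpu_state_notifier() from the patch. This is a sketch only: the MODEL_*
constants, struct model_state and model_cpu_state_notify() are hypothetical
names, and a plain integer stands in for the runtime PM usage count.

```c
#include <assert.h>

/* Illustrative stand-ins for the CPU_PM events and notifier return codes. */
enum model_cpu_pm_event {
	MODEL_CPU_PM_ENTER,
	MODEL_CPU_PM_ENTER_FAILED,
	MODEL_CPU_PM_EXIT,
	MODEL_CPU_CLUSTER_PM_ENTER,	/* an event the notifier does not act on */
};

#define MODEL_NOTIFY_DONE	0
#define MODEL_NOTIFY_OK		1

struct model_state {
	unsigned long cpus_handled;	/* bit n set: cpu n is attached to the domain */
	int usage;			/* references currently held on the domain */
};

/*
 * A cpu about to power down drops its reference; a cpu that resumed, or
 * whose power-down attempt failed, takes it back. Cpus that are not
 * attached to the domain, and all other events, are ignored.
 */
static int model_cpu_state_notify(struct model_state *s, int cpu,
				  enum model_cpu_pm_event event)
{
	if (!(s->cpus_handled & (1UL << cpu)))
		return MODEL_NOTIFY_DONE;

	switch (event) {
	case MODEL_CPU_PM_ENTER:
		assert(s->usage > 0);
		s->usage--;
		break;

	case MODEL_CPU_PM_ENTER_FAILED:
	case MODEL_CPU_PM_EXIT:
		s->usage++;
		break;

	default:
		return MODEL_NOTIFY_DONE;
	}

	return MODEL_NOTIFY_OK;
}
```

The failed-entry case matters for balance: the reference dropped on entry
has to be taken back even though the cpu never actually powered down.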
Kevin Hilman June 15, 2015, 6:43 p.m. UTC | #9
Lina Iyer <lina.iyer@linaro.org> writes:

> On Wed, Jun 10 2015 at 15:38 -0600, Kevin Hilman wrote:
>>Lina Iyer <lina.iyer@linaro.org> writes:
>>
>>> Generally cpus are grouped under a power domain in a SoC. When all cpus
>>> in the domain are in their power off state, the cpu domain can also be
>>> powered off.
>>
>>How does this relate to a cluster, and why aren't you using that terminology?
>>
>>> Genpd provides the framework for defining cpus as devices
>>> that are part of a cpu domain.
>>>
>>> Introduce support for defining and adding a generic power domain for the
>>> cpus based on the DT specification of power domain providers and
>>> consumers.  SoC's that have the cpu domain defined in their DT, can
>>> setup a genpd with a name and the power_on/power_off callbacks. Calling
>>> pm_cpu_domain_init() will register the genpd and attach the cpus for
>>> this domain with the genpd.
>>>
>>> CPU_PM notifications are used to call pm_runtime_get_sync() and
>>> pm_runtime_put_sync() for each cpu.  When all cpus are powered off, the
>>> last cpu going down would call the genpd->power_off(). Correspondingly,
>>> the first cpu up would call the genpd->power_on() callback before
>>> resuming from idle.
>>
>>Other patches also mention this genpd being useful to gate power to
>>non-CPU peripherals on the same power rail.  How are those devices to be
>>added?
>>
> I am not investigating DT nodes to figure out which node is a consumer
> for this domain provider. That could be a good way to do it, but
> practically speaking, there may be platform dependencies and specifics
> that may need to be met, before the device can be added to the CPU
> genpd.
>
> So that is not generalized here. I didn't see a better way to do that,
> generically. Do you have ideas on that?
>
> The platform is the owner of the genpd and therefore can add those
> non-cpu devices to the genpd as and when appropriate.

I'm pretty sure the generic code will already add devices to genpds if
the genpd is using the of_genpd_* stuff.  That is why I'm wondering why
the extra stuff for CPUs is needed.

> In this regard, I also have a question: who initializes the genpd? My
> assumption is that CPUs will get probed before most other devices and
> therefore the domain provider could be initialized by this file. But, I
> could be wrong here.

Initializing it in this driver seems OK to me.

Kevin
Lina Iyer June 15, 2015, 7:14 p.m. UTC | #10
On Mon, Jun 15 2015 at 12:43 -0600, Kevin Hilman wrote:
>Lina Iyer <lina.iyer@linaro.org> writes:
>
>> On Wed, Jun 10 2015 at 15:38 -0600, Kevin Hilman wrote:
>>>Lina Iyer <lina.iyer@linaro.org> writes:


>I'm pretty sure the generic code will already add devices to genpds if
>the genpd is using the of_genpd_* stuff.  That is why I'm wondering why
>the extra stuff for CPUs is needed.
>
I don't see that happening automatically. When I attach a device, it
finds the corresponding genpd provider and attaches the device.  But I
don't see any code that creates a genpd, finds the related device
nodes, and adds them to the genpd.

Maybe I am missing something.

-- Lina
Kevin Hilman June 16, 2015, 3:50 p.m. UTC | #11
Lina Iyer <lina.iyer@linaro.org> writes:

> On Mon, Jun 15 2015 at 12:43 -0600, Kevin Hilman wrote:
>>Lina Iyer <lina.iyer@linaro.org> writes:
>>
>>> On Wed, Jun 10 2015 at 15:38 -0600, Kevin Hilman wrote:
>>>>Lina Iyer <lina.iyer@linaro.org> writes:
>
>
>>I'm pretty sure the generic code will already add devices to genpds if
>>the genpd is using the of_genpd_* stuff.  That is why I'm wondering why
>>the extra stuff for CPUs is needed.
>>
> I don't see that happening automatically. When I attach a device, it
> finds the corresponding genpd provider and attaches the device.  But I
> don't see any code that creates a genpd, finds the related device
> nodes, and adds them to the genpd.

[summary from our IRC discussion]

You still need to create the genpd, but it's dev_pm_domain_attach()
called by the platform device probe path that will automatically try to
attach a device with a power-domains property to the correct PM domain.

Note that this assumes that the genpds are created before the devices
are probed.

Kevin
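
The ordering assumption noted above (genpds have to exist before the
devices that reference them are probed) can be illustrated with a small
userspace model. This is a sketch, not the kernel's implementation: the
real lookup goes through the power-domains phandle and the registered
genpd providers, whereas the model matches plain strings, and the
model_* names and MODEL_ERR_NO_PROVIDER are hypothetical. How the kernel
reports or retries a failed lookup is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_PROVIDERS	4
#define MODEL_ERR_NO_PROVIDER	1	/* illustrative value, not a kernel errno */

struct model_registry {
	const char *providers[MODEL_MAX_PROVIDERS];	/* registered domain names */
	int count;
};

struct model_device {
	const char *name;
	const char *power_domain;	/* models "power-domains"; NULL if absent */
	const char *attached_to;	/* set once the device is attached */
};

static void model_registry_init(struct model_registry *r)
{
	r->count = 0;
}

/* Models creating the genpd and registering it as a provider. */
static void model_add_provider(struct model_registry *r, const char *name)
{
	assert(r->count < MODEL_MAX_PROVIDERS);
	r->providers[r->count++] = name;
}

/* Models the attach attempt made on the device probe path. */
static int model_probe_attach(const struct model_registry *r,
			      struct model_device *dev)
{
	int i;

	if (!dev->power_domain)
		return 0;	/* no power-domains property: nothing to attach */

	for (i = 0; i < r->count; i++) {
		if (strcmp(r->providers[i], dev->power_domain) == 0) {
			dev->attached_to = r->providers[i];
			return 0;
		}
	}

	return -MODEL_ERR_NO_PROVIDER;	/* provider not registered yet */
}
```

In the model, a device probed before its provider is registered is left
unattached, while the same probe succeeds once the provider exists.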
Patch

diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
index 1cb8544..debfc74 100644
--- a/drivers/base/power/Makefile
+++ b/drivers/base/power/Makefile
@@ -4,5 +4,6 @@  obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 obj-$(CONFIG_PM_OPP)	+= opp.o
 obj-$(CONFIG_PM_GENERIC_DOMAINS)	+=  domain.o domain_governor.o
 obj-$(CONFIG_HAVE_CLK)	+= clock_ops.o
+obj-$(CONFIG_PM_CPU_DOMAIN)	+= cpu_domain.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
diff --git a/drivers/base/power/cpu_domain.c b/drivers/base/power/cpu_domain.c
new file mode 100644
index 0000000..ee90094
--- /dev/null
+++ b/drivers/base/power/cpu_domain.c
@@ -0,0 +1,187 @@ 
+/*
+ * Generic CPU domain runtime power on/off support
+ *
+ * Copyright (C) 2015 Linaro Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/cpu.h>
+#include <linux/cpu_pm.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/pm_domain.h>
+#include <linux/pm_runtime.h>
+
+static struct cpumask cpus_handled;
+
+static void do_cpu(void *unused)
+{
+	int cpu = smp_processor_id();
+	struct device *dev = get_cpu_device(cpu);
+
+	pm_runtime_get_sync(dev);
+}
+
+static int cpuidle_genpd_device_init(int cpu)
+{
+	struct device *dev = get_cpu_device(cpu);
+
+	/*
+	 * CPU devices have to be IRQ safe for use with cpuidle, which runs
+	 * with irqs disabled.
+	 */
+	pm_runtime_irq_safe(dev);
+	pm_runtime_enable(dev);
+
+	genpd_dev_pm_attach(dev);
+
+	/*
+	 * Execute the below on 'that' cpu to ensure that the reference
+	 * counting is correct. It's possible that while this code is
+	 * executed, the cpu may be in idle but we may incorrectly
+	 * increment the usage. By executing the do_cpu on 'that' cpu,
+	 * we can ensure that the cpu and the usage count are matched.
+	 */
+	return smp_call_function_single(cpu, do_cpu, NULL, true);
+}
+
+static int cpu_state_notifier(struct notifier_block *n,
+			unsigned long action, void *hcpu)
+{
+	int cpu = smp_processor_id();
+	struct device *dev = get_cpu_device(cpu);
+
+	if (!cpumask_test_cpu(cpu, &cpus_handled))
+		return NOTIFY_DONE;
+
+	switch (action) {
+	case CPU_PM_ENTER:
+		pm_runtime_put_sync(dev);
+		break;
+
+	case CPU_PM_ENTER_FAILED:
+	case CPU_PM_EXIT:
+		pm_runtime_get_sync(dev);
+		break;
+
+	default:
+		return NOTIFY_DONE;
+	}
+
+	return NOTIFY_OK;
+}
+
+static int cpu_online_notifier(struct notifier_block *n,
+			unsigned long action, void *hcpu)
+{
+	int cpu = (unsigned long)hcpu;
+	struct device *dev = get_cpu_device(cpu);
+
+	if (!cpumask_test_cpu(cpu, &cpus_handled))
+		return NOTIFY_DONE;
+
+	switch (action) {
+	case CPU_STARTING:
+	case CPU_STARTING_FROZEN:
+		/*
+		 * Attach the cpu to its domain if the cpu is coming up
+		 * for the first time.
+		 * Called from the cpu that is coming up.
+		 */
+		if (!genpd_dev_pm_attach(dev))
+			do_cpu(NULL);
+		break;
+
+	default:
+		return NOTIFY_DONE;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block hotplug_notifier = {
+	.notifier_call = cpu_online_notifier,
+};
+
+static struct notifier_block cpu_pm_notifier = {
+	.notifier_call = cpu_state_notifier,
+};
+
+static struct generic_pm_domain *get_cpu_domain(int cpu)
+{
+	struct device *dev = get_cpu_device(cpu);
+	struct of_phandle_args pd_args;
+	int ret;
+
+	/* Make sure we are a domain consumer */
+	ret = of_parse_phandle_with_args(dev->of_node, "power-domains",
+				"#power-domain-cells", 0, &pd_args);
+	if (ret)
+		return ERR_PTR(ret);
+
+	/* Attach cpus only for this domain */
+	return of_genpd_get_from_provider(&pd_args);
+}
+
+int pm_cpu_domain_init(struct generic_pm_domain *genpd, struct device_node *dn)
+{
+	int cpu;
+	int ret;
+	cpumask_var_t tmpmask;
+	struct generic_pm_domain *cpupd;
+
+	if (!genpd || !dn)
+		return -EINVAL;
+
+	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+		return -ENOMEM;
+
+	/* CPU genpds have to operate in IRQ safe mode */
+	genpd->flags |= GENPD_FLAG_IRQ_SAFE;
+
+	pm_genpd_init(genpd, NULL, false);
+	ret = of_genpd_add_provider_simple(dn, genpd);
+	if (ret)
+		goto done;
+
+	/* Only add those cpus to whom we are the domain provider */
+	for_each_online_cpu(cpu) {
+		cpupd = get_cpu_domain(cpu);
+
+		if (IS_ERR(cpupd))
+			continue;
+
+		if (genpd == cpupd) {
+			cpuidle_genpd_device_init(cpu);
+			cpumask_set_cpu(cpu, tmpmask);
+		}
+	}
+
+	if (cpumask_empty(tmpmask))
+		goto done;
+
+	/*
+	 * Not all cpus may be online at this point. Use the hotplug
+	 * notifier to be notified of when the cpu comes online, then
+	 * attach it to the domain.
+	 *
+	 * Register hotplug and cpu_pm notification once for all
+	 * domains.
+	 */
+	if (cpumask_empty(&cpus_handled)) {
+		cpu_pm_register_notifier(&cpu_pm_notifier);
+		register_cpu_notifier(&hotplug_notifier);
+	}
+
+	cpumask_or(&cpus_handled, &cpus_handled, tmpmask);
+
+done:
+	free_cpumask_var(tmpmask);
+	return ret;
+}
+EXPORT_SYMBOL(pm_cpu_domain_init);
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index dc7cb53..fc97ad8 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -280,6 +280,7 @@  struct generic_pm_domain *__of_genpd_xlate_onecell(
 					void *data);
 
 int genpd_dev_pm_attach(struct device *dev);
+
 #else /* !CONFIG_PM_GENERIC_DOMAINS_OF */
 static inline int __of_genpd_add_provider(struct device_node *np,
 					genpd_xlate_t xlate, void *data)
@@ -325,4 +326,15 @@  static inline int dev_pm_domain_attach(struct device *dev, bool power_on)
 static inline void dev_pm_domain_detach(struct device *dev, bool power_off) {}
 #endif
 
+#ifdef CONFIG_PM_CPU_DOMAIN
+extern int pm_cpu_domain_init(struct generic_pm_domain *genpd,
+			struct device_node *dn);
+#else
+static inline int pm_cpu_domain_init(struct generic_pm_domain *genpd,
+			struct device_node *dn)
+{
+	return -ENODEV;
+}
+#endif
+
 #endif /* _LINUX_PM_DOMAIN_H */
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index 7e01f78..55d49f6 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -301,3 +301,15 @@  config PM_GENERIC_DOMAINS_OF
 
 config CPU_PM
 	bool
+
+config PM_CPU_DOMAIN
+	def_bool y
+	depends on PM_GENERIC_DOMAINS_OF && CPU_PM
+	help
+	  When cpuidle powers off the cpus in a domain, the domain can also be
+	  powered off.
+	  This config option allows cpus to be registered with the domain
+	  provider specified in the DT and when the cpu is powered off, calls
+	  the runtime PM methods to do the reference counting. The last cpu
+	  going down powers the domain off as well.
+