Message ID | 1889415.atdPhlSkOF@rjwysocki.net (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT | expand |
On 11/8/24 16:41, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Some cpufreq drivers, like intel_pstate, have built-in governors that > are used instead of regular cpufreq governors, schedutil in particular, > but they can work with EAS just fine, so allow EAS to be used with > those drivers. > > Also update the debug message printed when the cpufreq governor in > use is not schedutil and the related comment, to better match the > code after the change. > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > > I'm not sure how much value there is in refusing to enable EAS without > schedutil in general. For instance, if there are no crossover points > between the cost curves for different perf domains, EAS may as well be > used with the performance and powersave governors AFAICS. Agreed, but having no cross-over points or no DVFS at all should be the only instances, right? For plain (non-intel_pstate) powersave and performance we could replace sugov_effective_cpu_perf() that determines the OPP of the perf-domain by the OPP they will be choosing, but for the rest? Also there is the entire uclamp thing, not sure what the best solution is there. Will intel_pstate just always ignore it? Might be better then to depend on !intel_pstate? > --- > kernel/sched/topology.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > Index: linux-pm/kernel/sched/topology.c > =================================================================== > --- linux-pm.orig/kernel/sched/topology.c > +++ linux-pm/kernel/sched/topology.c > @@ -251,7 +251,7 @@ static bool sched_is_eas_possible(const > return false; > } > > - /* Do not attempt EAS if schedutil is not being used. */ > + /* Do not attempt EAS with a cpufreq governor other than schedutil. */ > for_each_cpu(i, cpu_mask) { > policy = cpufreq_cpu_get(i); > if (!policy) { > @@ -263,9 +263,9 @@ static bool sched_is_eas_possible(const > } > gov = policy->governor; > cpufreq_cpu_put(policy); > - if (gov != &schedutil_gov) { > + if (gov && gov != &schedutil_gov) { > if (sched_debug()) { > - pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n", > + pr_info("rd %*pbl: Checking EAS, cpufreq governor is not schedutil\n", > cpumask_pr_args(cpu_mask)); > } > return false; > > > >
On Mon, Nov 11, 2024 at 12:54 PM Christian Loehle <christian.loehle@arm.com> wrote: > > On 11/8/24 16:41, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > Some cpufreq drivers, like intel_pstate, have built-in governors that > > are used instead of regular cpufreq governors, schedutil in particular, > > but they can work with EAS just fine, so allow EAS to be used with > > those drivers. > > > > Also update the debug message printed when the cpufreq governor in > > use is not schedutil and the related comment, to better match the > > code after the change. > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > --- > > > > I'm not sure how much value there is in refusing to enable EAS without > > schedutil in general. For instance, if there are no crossover points > > between the cost curves for different perf domains, EAS may as well be > > used with the performance and powersave governors AFAICS. > > Agreed, but having no cross-over points or no DVFS at all should be the > only instances, right? Not really. This is the most obvious case, but there are other less obvious ones. Say there are two cross-over points: The "performance" and "powersave" governors should still be fine with EAS in that case. Or what if somebody has a governor in user space that generally behaves like schedutil? Or what about ondemand? Is it alway completely broken with EAS? > For plain (non-intel_pstate) powersave and performance we could replace > sugov_effective_cpu_perf() > that determines the OPP of the perf-domain by the OPP they will be > choosing, but for the rest? I generally think that depending on schedutil for EAS is a mistake. I would just print a warning that results may be suboptimal or generally not as expected if the cpufreq governor is not schedutil instead of preventing EAS from running at all. > Also there is the entire uclamp thing, not sure what the best > solution is there. > Will intel_pstate just always ignore it? Might be better then to > depend on !intel_pstate? Well, it can be made dependent on policy->policy == CPUFREQ_POLICY_POWERSAVE if gov is NULL or similar, but honestly why bother? > > --- > > kernel/sched/topology.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > Index: linux-pm/kernel/sched/topology.c > > =================================================================== > > --- linux-pm.orig/kernel/sched/topology.c > > +++ linux-pm/kernel/sched/topology.c > > @@ -251,7 +251,7 @@ static bool sched_is_eas_possible(const > > return false; > > } > > > > - /* Do not attempt EAS if schedutil is not being used. */ > > + /* Do not attempt EAS with a cpufreq governor other than schedutil. */ > > for_each_cpu(i, cpu_mask) { > > policy = cpufreq_cpu_get(i); > > if (!policy) { > > @@ -263,9 +263,9 @@ static bool sched_is_eas_possible(const > > } > > gov = policy->governor; > > cpufreq_cpu_put(policy); > > - if (gov != &schedutil_gov) { > > + if (gov && gov != &schedutil_gov) { > > if (sched_debug()) { > > - pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n", > > + pr_info("rd %*pbl: Checking EAS, cpufreq governor is not schedutil\n", > > cpumask_pr_args(cpu_mask)); > > } > > return false; > > > > > > > > > >
On Mon, 11 Nov 2024 at 14:54, Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Mon, Nov 11, 2024 at 12:54 PM Christian Loehle > <christian.loehle@arm.com> wrote: > > > > On 11/8/24 16:41, Rafael J. Wysocki wrote: > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > Some cpufreq drivers, like intel_pstate, have built-in governors that > > > are used instead of regular cpufreq governors, schedutil in particular, > > > but they can work with EAS just fine, so allow EAS to be used with > > > those drivers. > > > > > > Also update the debug message printed when the cpufreq governor in > > > use is not schedutil and the related comment, to better match the > > > code after the change. > > > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > --- > > > > > > I'm not sure how much value there is in refusing to enable EAS without > > > schedutil in general. For instance, if there are no crossover points > > > between the cost curves for different perf domains, EAS may as well be > > > used with the performance and powersave governors AFAICS. > > > > Agreed, but having no cross-over points or no DVFS at all should be the > > only instances, right? > > Not really. This is the most obvious case, but there are other less > obvious ones. > > Say there are two cross-over points: The "performance" and > "powersave" governors should still be fine with EAS in that case. > > Or what if somebody has a governor in user space that generally > behaves like schedutil? > > Or what about ondemand? Is it alway completely broken with EAS? The only requirement from EAS is to know which OPP and its cost will be selected by cpufreq gov for an utilization level of the CPU. sched_util provides it with sugov_effective_cpu_perf(). Any other gov that can provide such estimate of the OPP and associated cost should be ok powersave and perf should be pretty obvious not so sure for ondemand > > > For plain (non-intel_pstate) powersave and performance we could replace > > sugov_effective_cpu_perf() > > that determines the OPP of the perf-domain by the OPP they will be > > choosing, but for the rest? > > I generally think that depending on schedutil for EAS is a mistake. > > I would just print a warning that results may be suboptimal or > generally not as expected if the cpufreq governor is not schedutil > instead of preventing EAS from running at all. > > > Also there is the entire uclamp thing, not sure what the best > > solution is there. > > Will intel_pstate just always ignore it? Might be better then to > > depend on !intel_pstate? > > Well, it can be made dependent on policy->policy == > CPUFREQ_POLICY_POWERSAVE if gov is NULL or similar, but honestly why > bother? > > > > --- > > > kernel/sched/topology.c | 6 +++--- > > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > > > Index: linux-pm/kernel/sched/topology.c > > > =================================================================== > > > --- linux-pm.orig/kernel/sched/topology.c > > > +++ linux-pm/kernel/sched/topology.c > > > @@ -251,7 +251,7 @@ static bool sched_is_eas_possible(const > > > return false; > > > } > > > > > > - /* Do not attempt EAS if schedutil is not being used. */ > > > + /* Do not attempt EAS with a cpufreq governor other than schedutil. */ > > > for_each_cpu(i, cpu_mask) { > > > policy = cpufreq_cpu_get(i); > > > if (!policy) { > > > @@ -263,9 +263,9 @@ static bool sched_is_eas_possible(const > > > } > > > gov = policy->governor; > > > cpufreq_cpu_put(policy); > > > - if (gov != &schedutil_gov) { > > > + if (gov && gov != &schedutil_gov) { > > > if (sched_debug()) { > > > - pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n", > > > + pr_info("rd %*pbl: Checking EAS, cpufreq governor is not schedutil\n", > > > cpumask_pr_args(cpu_mask)); > > > } > > > return false; > > > > > > > > > > > > > > > >
On Mon, Nov 11, 2024 at 02:54:43PM +0100, Rafael J. Wysocki wrote: > Or what about ondemand? Is it alway completely broken with EAS? I thought that thing was mostly considered broken anyway :-) > > For plain (non-intel_pstate) powersave and performance we could replace > > sugov_effective_cpu_perf() > > that determines the OPP of the perf-domain by the OPP they will be > > choosing, but for the rest? > > I generally think that depending on schedutil for EAS is a mistake. Well, the thinking was that we wanted to move to a single governor, and not proliferate things.
On Tue, Nov 19, 2024 at 6:37 PM Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, Nov 11, 2024 at 02:54:43PM +0100, Rafael J. Wysocki wrote: > > > Or what about ondemand? Is it alway completely broken with EAS? > > I thought that thing was mostly considered broken anyway :-) Well, it's still there in the tree, although honestly I don't know how many users of it there are. > > > For plain (non-intel_pstate) powersave and performance we could replace > > > sugov_effective_cpu_perf() > > > that determines the OPP of the perf-domain by the OPP they will be > > > choosing, but for the rest? > > > > I generally think that depending on schedutil for EAS is a mistake. > > Well, the thinking was that we wanted to move to a single governor, and > not proliferate things. Thing is, intel_pstate in its default configuration doesn't use a separate cpufreq governor at all. It allows P-code to select P-states, but on the new HW the result of this isn't really that much different from what schedutil would do. The cpufreq governor check needs to be adjusted at least for this case, but overall it should be done in cpufreq because it refers to cpufreq internals.
Index: linux-pm/kernel/sched/topology.c =================================================================== --- linux-pm.orig/kernel/sched/topology.c +++ linux-pm/kernel/sched/topology.c @@ -251,7 +251,7 @@ static bool sched_is_eas_possible(const return false; } - /* Do not attempt EAS if schedutil is not being used. */ + /* Do not attempt EAS with a cpufreq governor other than schedutil. */ for_each_cpu(i, cpu_mask) { policy = cpufreq_cpu_get(i); if (!policy) { @@ -263,9 +263,9 @@ static bool sched_is_eas_possible(const } gov = policy->governor; cpufreq_cpu_put(policy); - if (gov != &schedutil_gov) { + if (gov && gov != &schedutil_gov) { if (sched_debug()) { - pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n", + pr_info("rd %*pbl: Checking EAS, cpufreq governor is not schedutil\n", cpumask_pr_args(cpu_mask)); } return false;