[v2,1/6] cpufreq: schedutil: ignore sugov kthreads

Message ID	1499189651-18797-2-git-send-email-patrick.bellasi@arm.com (mailing list archive)
State	Deferred
Headers	Return-Path: <linux-pm-owner@kernel.org>
	From: Patrick Bellasi <patrick.bellasi@arm.com>
	To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
	Cc: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>, "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>, Viresh Kumar <viresh.kumar@linaro.org>, Vincent Guittot <vincent.guittot@linaro.org>, Juri Lelli <juri.lelli@arm.com>, Joel Fernandes <joelaf@google.com>, Andres Oportus <andresoportus@google.com>, Todd Kjos <tkjos@android.com>, Morten Rasmussen <morten.rasmussen@arm.com>, Dietmar Eggemann <dietmar.eggemann@arm.com>
	Subject: [PATCH v2 1/6] cpufreq: schedutil: ignore sugov kthreads
	Date: Tue, 4 Jul 2017 18:34:06 +0100
	Message-Id: <1499189651-18797-2-git-send-email-patrick.bellasi@arm.com>
	In-Reply-To: <1499189651-18797-1-git-send-email-patrick.bellasi@arm.com>
	References: <1499189651-18797-1-git-send-email-patrick.bellasi@arm.com>
	Sender: linux-pm-owner@vger.kernel.org
	Precedence: bulk

Patrick Bellasi July 4, 2017, 5:34 p.m. UTC

In systems where multiple CPUs share the same frequency domain, a small
workload on a CPU can still be subject to frequency spikes generated by
the activation of the sugov kthread.

Since the sugov kthread is a special RT task whose only goal is to
activate a frequency transition, it does not make sense for it to bias
schedutil's frequency selection policy.

This patch exploits the information related to the current task to silently
ignore cpufreq_update_this_cpu() calls, coming from the RT scheduler, while
the sugov kthread is running.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes from v1:
- move check before policy spinlock (JuriL)
---
 kernel/sched/cpufreq_schedutil.c | 8 ++++++++
 1 file changed, 8 insertions(+)
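
The mechanism described in the commit message can be sketched in user space. This is a simplified model under stated assumptions, not the kernel code: every `model_*` and `MODEL_*` name is hypothetical, the model ignores rate limiting, iowait boost and staleness handling, and it assumes that an update flagged as RT/DL on any CPU of a shared domain forces the maximum frequency. Only the comparison against the policy's kthread mirrors the check the patch adds (`current == sg_policy->thread`).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel definitions; not the real ones. */
#define MODEL_FLAG_RT_DL 0x1u	/* models an "RT/DL task is running" flag */
#define MODEL_NR_CPUS    2	/* CPUs sharing one frequency domain */

struct model_task {
	const char *comm;
};

struct model_policy {
	const struct model_task *thread;	/* models sg_policy->thread */
	unsigned int max_freq;			/* kHz */
	unsigned long max_cap;			/* utilization scale, must be non-zero */
	unsigned int flags[MODEL_NR_CPUS];	/* flags from the last update of each CPU */
	unsigned long util[MODEL_NR_CPUS];	/* utilization from the last update of each CPU */
};

/* Pick one frequency for the whole domain from the per-CPU state. */
static unsigned int model_next_freq(const struct model_policy *p)
{
	unsigned long util = 0;

	for (size_t i = 0; i < MODEL_NR_CPUS; i++) {
		/* Any CPU last updated by an RT/DL task forces the max OPP. */
		if (p->flags[i] & MODEL_FLAG_RT_DL)
			return p->max_freq;
		if (p->util[i] > util)
			util = p->util[i];
	}
	return (unsigned int)((unsigned long long)p->max_freq * util / p->max_cap);
}

/*
 * Model of the update handler. Returns the selected frequency, or 0 when
 * the update is ignored because it was generated by the sugov kthread.
 */
static unsigned int model_update(struct model_policy *p, size_t cpu,
				 const struct model_task *curr,
				 unsigned int flags, unsigned long util,
				 bool skip_sugov)
{
	/* The proposed check: drop updates issued while sugov is current. */
	if (skip_sugov && curr == p->thread)
		return 0;

	p->flags[cpu] = flags;
	p->util[cpu] = util;
	return model_next_freq(p);
}
```

In this model, an update issued from the kthread on CPU 0 with `skip_sugov` set to false leaves the RT/DL flag recorded for CPU 0, so a later update from a small FAIR task on CPU 1 still selects `max_freq`. With `skip_sugov` set to true the kthread's update is dropped and the same FAIR update selects a utilization-based frequency.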

Viresh Kumar July 5, 2017, 5 a.m. UTC | #1

On 04-07-17, 18:34, Patrick Bellasi wrote:
> In systems where multiple CPUs share the same frequency domain, a small
> workload on a CPU can still be subject to frequency spikes generated by
> the activation of the sugov kthread.
> 
> Since the sugov kthread is a special RT task whose only goal is to
> activate a frequency transition, it does not make sense for it to bias
> schedutil's frequency selection policy.
> 
> This patch exploits the information related to the current task to silently
> ignore cpufreq_update_this_cpu() calls, coming from the RT scheduler, while
> the sugov kthread is running.
> 
> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
> 
> ---
> Changes from v1:
> - move check before policy spinlock (JuriL)
> ---
>  kernel/sched/cpufreq_schedutil.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index c982dd0..eaba6d6 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -218,6 +218,10 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>  	unsigned int next_f;
>  	bool busy;
>  
> +	/* Skip updates generated by sugov kthreads */
> +	if (unlikely(current == sg_policy->thread))
> +		return;
> +
>  	sugov_set_iowait_boost(sg_cpu, time, flags);
>  	sg_cpu->last_update = time;
>  
> @@ -290,6 +294,10 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  	unsigned long util, max;
>  	unsigned int next_f;
>  
> +	/* Skip updates generated by sugov kthreads */
> +	if (unlikely(current == sg_policy->thread))
> +		return;
> +
>  	sugov_get_util(&util, &max);

Yes, we discussed this last time as well (I looked at those discussions again and
am still a bit confused), but I wanted to clarify one more time.

After the 2nd patch of this series is applied, why will we still have this
problem? As we concluded last time, the problem wouldn't happen while the
sugov RT thread is running (Hint: work_in_progress). And once the sugov
RT thread is gone, one of the other scheduling classes will take over and should
update the flag pretty quickly.

Are we worried about the time between when the sugov RT thread finishes and when
the CFS or IDLE sched class calls the util handler again? If so, then we will
still have that problem for any normal RT/DL task, won't we?

Patrick Bellasi July 5, 2017, 11:38 a.m. UTC | #2

On 05-Jul 10:30, Viresh Kumar wrote:
> Yes, we discussed this last time as well (I looked at those discussions again and
> am still a bit confused), but I wanted to clarify one more time.
> 
> After the 2nd patch of this series is applied, why will we still have this
> problem? As we concluded last time, the problem wouldn't happen while the
> sugov RT thread is running (Hint: work_in_progress). And once the sugov
> RT thread is gone, one of the other scheduling classes will take over and should
> update the flag pretty quickly.
> 
> Are we worried about the time between when the sugov RT thread finishes and when
> the CFS or IDLE sched class calls the util handler again? If so, then we will
> still have that problem for any normal RT/DL task, won't we?

Yes, we are worried about that time. Without this patch we can generate
spikes to the max OPP even when only relatively small FAIR tasks are
running.

The same problem does not exist for the other "normal RT/DL" tasks,
because for those tasks this is the expected behavior: we want to go to
max.

The sugov kthread, on the contrary, although it is an RT task, exists only
to make the "machinery" work; it's an actuator. Thus, IMO, it makes no
sense from a design standpoint for it to interfere in any way with what
the "machinery" is doing.

Finally, the second patch of this series fixes a somewhat symmetrical
issue: while this one avoids going to max OPP, the next one avoids
staying at max OPP once it is no longer needed.

Cheers,
Patrick

Viresh Kumar July 6, 2017, 4:50 a.m. UTC | #3

On 05-07-17, 12:38, Patrick Bellasi wrote:
> On 05-Jul 10:30, Viresh Kumar wrote:
> > Yes, we discussed this last time as well (I looked at those discussions again and
> > am still a bit confused), but I wanted to clarify one more time.
> > 
> > After the 2nd patch of this series is applied, why will we still have this
> > problem? As we concluded last time, the problem wouldn't happen while the
> > sugov RT thread is running (Hint: work_in_progress). And once the sugov
> > RT thread is gone, one of the other scheduling classes will take over and should
> > update the flag pretty quickly.
> > 
> > Are we worried about the time between when the sugov RT thread finishes and when
> > the CFS or IDLE sched class calls the util handler again? If so, then we will
> > still have that problem for any normal RT/DL task, won't we?
> 
> Yes, we are worried about that time,

But isn't that a very small amount of time? That is, as soon as the RT thread
has finished, we will select the next task from CFS or go to the IDLE class
(provided, of course, that nothing is left in DL/RT). And this should happen
very quickly. Are we sure we really see problems in that short time? Sure, it
can happen, but it looks like an extreme corner case, and I just wanted to check
whether it really happened for you after the 2nd patch.

> without this we can generate
> spikes to the max OPP even when only relatively small FAIR tasks are
> running.
> 
> The same problem is not there for the other "normal RT/DL" tasks, just
> because for those tasks this is the expected behavior: we wanna go to
> max.

By "same problem" I meant that after the last RT task has finished and before
pick_next_task of the IDLE_CLASS (or CFS) is called, we can still get a callback
into schedutil, and that may raise the frequency to MAX. It's a similar kind of
problem, but yes, we never wanted the freq to go to max for the sugov thread.

> To the contrary the sugov kthread, although being a RT task, is just
> functional to the "machinery" to work, it's an actuator. Thus, IMO it
> makes no sense from a design standpoint for it to interfere whatsoever
> with what the "machinery" is doing.

I think everyone agrees on this. I was just exploring whether that can be achieved
without any special code like what this patch proposes.

I was also wondering what will happen in a case where we have two RT tasks
(one of them being the sugov thread) and, when we land in schedutil, the current
task is sugov. With this patch we will not set the flag, even though we actually
have another task which is RT.
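
The two-RT-task scenario can be written down as a small user-space sketch. It is only an illustration of the question raised above, not kernel code: the `model_*` types and helpers are hypothetical, and the count of runnable RT tasks is assumed to include the sugov kthread only when it is the current task.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct model_task {
	const char *comm;
};

struct model_rq {
	const struct model_task *curr;	/* task currently running on the CPU */
	unsigned int rt_nr_running;	/* runnable RT tasks, curr included if RT */
};

struct model_policy {
	const struct model_task *thread;	/* models sg_policy->thread */
};

/* The check proposed by the patch, applied to the model. */
static bool model_update_ignored(const struct model_rq *rq,
				 const struct model_policy *p)
{
	return rq->curr == p->thread;
}

/* True when an RT task other than the sugov kthread is runnable. */
static bool model_other_rt_runnable(const struct model_rq *rq,
				    const struct model_policy *p)
{
	unsigned int n = rq->rt_nr_running;

	/* Assumption: sugov is counted only while it is the current task. */
	if (rq->curr == p->thread && n > 0)
		n--;
	return n > 0;
}
```

For a run queue whose current task is the sugov kthread and whose `rt_nr_running` is 2, both helpers return true: the update is ignored although another RT task is waiting to run, which is the case being asked about.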

Rafael J. Wysocki July 6, 2017, 10:18 p.m. UTC | #4

On Wednesday, July 05, 2017 12:38:34 PM Patrick Bellasi wrote:
> On 05-Jul 10:30, Viresh Kumar wrote:
> > Yes, we discussed this last time as well (I looked at those discussions again and
> > am still a bit confused), but I wanted to clarify one more time.
> > 
> > After the 2nd patch of this series is applied, why will we still have this
> > problem? As we concluded last time, the problem wouldn't happen while the
> > sugov RT thread is running (Hint: work_in_progress). And once the sugov
> > RT thread is gone, one of the other scheduling classes will take over and should
> > update the flag pretty quickly.
> > 
> > Are we worried about the time between when the sugov RT thread finishes and when
> > the CFS or IDLE sched class calls the util handler again? If so, then we will
> > still have that problem for any normal RT/DL task, won't we?
> 
> Yes, we are worried about that time. Without this patch we can generate
> spikes to the max OPP even when only relatively small FAIR tasks are
> running.
> 
> The same problem does not exist for the other "normal RT/DL" tasks,
> because for those tasks this is the expected behavior: we want to go to
> max.
> 
> The sugov kthread, on the contrary, although it is an RT task, exists only
> to make the "machinery" work; it's an actuator. Thus, IMO, it makes no
> sense from a design standpoint for it to interfere in any way with what
> the "machinery" is doing.

How is this related to Juri's series?

Thanks,
Rafael

Saravana Kannan July 11, 2017, 7:08 p.m. UTC | #5

On 07/04/2017 10:34 AM, Patrick Bellasi wrote:
> In systems where multiple CPUs share the same frequency domain, a small
> workload on a CPU can still be subject to frequency spikes generated by
> the activation of the sugov kthread.
>
> Since the sugov kthread is a special RT task whose only goal is to
> activate a frequency transition, it does not make sense for it to bias
> schedutil's frequency selection policy.
>
> This patch exploits the information related to the current task to silently
> ignore cpufreq_update_this_cpu() calls, coming from the RT scheduler, while
> the sugov kthread is running.
>
> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
>
> ---
> Changes from v1:
> - move check before policy spinlock (JuriL)
> ---
>   kernel/sched/cpufreq_schedutil.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index c982dd0..eaba6d6 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -218,6 +218,10 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>   	unsigned int next_f;
>   	bool busy;
>
> +	/* Skip updates generated by sugov kthreads */
> +	if (unlikely(current == sg_policy->thread))
> +		return;
> +
>   	sugov_set_iowait_boost(sg_cpu, time, flags);
>   	sg_cpu->last_update = time;
>
> @@ -290,6 +294,10 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>   	unsigned long util, max;
>   	unsigned int next_f;
>
> +	/* Skip updates generated by sugov kthreads */
> +	if (unlikely(current == sg_policy->thread))
> +		return;
> +

This seems super racy, especially when combined with rate_limit_us.
Deciding not to update the frequency for a policy just because the
callback happened in the context of the kthread is not right, especially
when combined with the remote CPU callback patches Viresh is
putting out (which I think is a well-intended patch series).

-Saravana
