diff mbox

[5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged

Message ID 1462828814-32530-6-git-send-email-smuckle@linaro.org (mailing list archive)
State Changes Requested, archived
Headers show

Commit Message

Steve Muckle May 9, 2016, 9:20 p.m. UTC
The rate limit timestamp (last_freq_update_time) is currently advanced
anytime schedutil re-evaluates the policy regardless of whether the CPU
frequency is changed or not. This means that utilization updates which
have no effect can cause much more significant utilization updates
(which require a large increase or decrease in CPU frequency) to be
delayed due to rate limiting.

Instead only update the rate limiting timstamp when the requested
target-supported frequency changes. The rate limit will now apply to
the rate of CPU frequency changes rather than the rate of
re-evaluations of the policy frequency.

Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Rafael J. Wysocki May 18, 2016, 11:44 p.m. UTC | #1
On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> The rate limit timestamp (last_freq_update_time) is currently advanced
> anytime schedutil re-evaluates the policy regardless of whether the CPU
> frequency is changed or not. This means that utilization updates which
> have no effect can cause much more significant utilization updates
> (which require a large increase or decrease in CPU frequency) to be
> delayed due to rate limiting.
>
> Instead only update the rate limiting timstamp when the requested
> target-supported frequency changes. The rate limit will now apply to
> the rate of CPU frequency changes rather than the rate of
> re-evaluations of the policy frequency.
>
> Signed-off-by: Steve Muckle <smuckle@linaro.org>

I'm sort of divided here to be honest.

> ---
>  kernel/sched/cpufreq_schedutil.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index e185075fcb5c..4d2907c8a142 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -117,12 +117,11 @@ static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
>         struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>         struct cpufreq_policy *policy = sg_policy->policy;
>
> -       sg_policy->last_freq_update_time = time;
> -
>         if (sg_policy->next_freq == next_freq) {
>                 trace_cpu_frequency(policy->cur, cpu);

You should at least rate limit the trace_cpu_frequency() thing here if
you don't want to advance the last update time I think, or you may
easily end up with the trace buffer flooded by irrelevant stuff.

>                 return;
>         }
> +       sg_policy->last_freq_update_time = time;
>         sg_policy->next_freq = next_freq;
>
>         if (sugov_queue_remote_callback(sg_policy, cpu))
> --
> 2.4.10
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Muckle May 19, 2016, 7:46 p.m. UTC | #2
On Thu, May 19, 2016 at 01:44:36AM +0200, Rafael J. Wysocki wrote:
> On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > The rate limit timestamp (last_freq_update_time) is currently advanced
> > anytime schedutil re-evaluates the policy regardless of whether the CPU
> > frequency is changed or not. This means that utilization updates which
> > have no effect can cause much more significant utilization updates
> > (which require a large increase or decrease in CPU frequency) to be
> > delayed due to rate limiting.
> >
> > Instead only update the rate limiting timstamp when the requested
> > target-supported frequency changes. The rate limit will now apply to
> > the rate of CPU frequency changes rather than the rate of
> > re-evaluations of the policy frequency.
> >
> > Signed-off-by: Steve Muckle <smuckle@linaro.org>
> 
> I'm sort of divided here to be honest.

It is true that this means we'll do more frequency re-evaluations, they
will occur until an actual frequency change is requested.

But the way it stands now, with a system's typical background activity
there are so many minor events that it is very common for throttling to
be in effect, causing major events to be ignored. 
> 
> > ---
> >  kernel/sched/cpufreq_schedutil.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index e185075fcb5c..4d2907c8a142 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -117,12 +117,11 @@ static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
> >         struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> >         struct cpufreq_policy *policy = sg_policy->policy;
> >
> > -       sg_policy->last_freq_update_time = time;
> > -
> >         if (sg_policy->next_freq == next_freq) {
> >                 trace_cpu_frequency(policy->cur, cpu);
> 
> You should at least rate limit the trace_cpu_frequency() thing here if
> you don't want to advance the last update time I think, or you may
> easily end up with the trace buffer flooded by irrelevant stuff.

Going back to the reason this tracepoint exists, is it known why
powertop thinks the CPU is idle when this tracepoint is removed? Maybe
it's possible to get rid of this tracepoint altogether.

Thanks for reviewing the series.

thanks,
Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki May 19, 2016, 9:15 p.m. UTC | #3
On Thu, May 19, 2016 at 9:46 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Thu, May 19, 2016 at 01:44:36AM +0200, Rafael J. Wysocki wrote:
>> On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> > The rate limit timestamp (last_freq_update_time) is currently advanced
>> > anytime schedutil re-evaluates the policy regardless of whether the CPU
>> > frequency is changed or not. This means that utilization updates which
>> > have no effect can cause much more significant utilization updates
>> > (which require a large increase or decrease in CPU frequency) to be
>> > delayed due to rate limiting.
>> >
>> > Instead only update the rate limiting timstamp when the requested
>> > target-supported frequency changes. The rate limit will now apply to
>> > the rate of CPU frequency changes rather than the rate of
>> > re-evaluations of the policy frequency.
>> >
>> > Signed-off-by: Steve Muckle <smuckle@linaro.org>
>>
>> I'm sort of divided here to be honest.
>
> It is true that this means we'll do more frequency re-evaluations, they
> will occur until an actual frequency change is requested.
>
> But the way it stands now, with a system's typical background activity
> there are so many minor events that it is very common for throttling to
> be in effect, causing major events to be ignored.
>>
>> > ---
>> >  kernel/sched/cpufreq_schedutil.c | 3 +--
>> >  1 file changed, 1 insertion(+), 2 deletions(-)
>> >
>> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> > index e185075fcb5c..4d2907c8a142 100644
>> > --- a/kernel/sched/cpufreq_schedutil.c
>> > +++ b/kernel/sched/cpufreq_schedutil.c
>> > @@ -117,12 +117,11 @@ static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
>> >         struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>> >         struct cpufreq_policy *policy = sg_policy->policy;
>> >
>> > -       sg_policy->last_freq_update_time = time;
>> > -
>> >         if (sg_policy->next_freq == next_freq) {
>> >                 trace_cpu_frequency(policy->cur, cpu);
>>
>> You should at least rate limit the trace_cpu_frequency() thing here if
>> you don't want to advance the last update time I think, or you may
>> easily end up with the trace buffer flooded by irrelevant stuff.
>
> Going back to the reason this tracepoint exists, is it known why
> powertop thinks the CPU is idle when this tracepoint is removed? Maybe
> it's possible to get rid of this tracepoint altogether.

I'm not sure ATM.  It seems to go by the time stamps and declare idle
if it doesn't see updates for long enough.

I was hoping to be able to make cpufreq stats usable for the fast
switch case, but that appears to mean some major surgery in there.

But anyway this change again seems to be an optimization that might be
done later to me.

I guess there are many things that might be optimized in schedutil,
but I'd prefer to address one item at a time, maybe going after the
ones that appear most relevant first?
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Muckle May 19, 2016, 11:34 p.m. UTC | #4
On Thu, May 19, 2016 at 11:15:52PM +0200, Rafael J. Wysocki wrote:
> But anyway this change again seems to be an optimization that might be
> done later to me.
> 
> I guess there are many things that might be optimized in schedutil,
> but I'd prefer to address one item at a time, maybe going after the
> ones that appear most relevant first?

Calling the last two patches in this series optimizations is a stretch
IMO. Issuing frequency change requests that result in the same
target-supported frequency is clearly unnecessary and ends up delaying
more urgent frequency changes, which I think is more like a bug. These
patches are also needed in conjunction with the first three to address
the remote wakeup delay.

Are there specific items you want to see addressed before these patches
could go in? I'm aware of the RT/DL support that needs improving, though
that should be orthogonal.

Also if it helps, I can provide a test case and/or traces to show the
need for the last two patches.

thanks,
Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki May 20, 2016, 12:24 a.m. UTC | #5
On Fri, May 20, 2016 at 1:34 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Thu, May 19, 2016 at 11:15:52PM +0200, Rafael J. Wysocki wrote:
>> But anyway this change again seems to be an optimization that might be
>> done later to me.
>>
>> I guess there are many things that might be optimized in schedutil,
>> but I'd prefer to address one item at a time, maybe going after the
>> ones that appear most relevant first?
>
> Calling the last two patches in this series optimizations is a stretch
> IMO. Issuing frequency change requests that result in the same
> target-supported frequency is clearly unnecessary and ends up delaying
> more urgent frequency changes, which I think is more like a bug.

The [4/5] is pulling stuff where it doesn't belong.  Simple as that.
Frequency tables don't belong in schedutil, so don't pull them in
there.

If you want to do that cleanly, add a call to the driver that will
tell you what frequency would be selected by it if it were given a
particular target.

I actually do agree with the direction of it and the [5/5], but I
don't like cutting corners. :-)

> These patches are also needed in conjunction with the first three to address
> the remote wakeup delay.

Well, does this mean that without the [4-5/5] the rest of the series
doesn't provide as much benefit as initially expected?

> Are there specific items you want to see addressed before these patches could go in?

Do you mean in addition to what I already said in my comments?

> I'm aware of the RT/DL support that needs improving, though
> that should be orthogonal.
>
> Also if it helps, I can provide a test case and/or traces to show the
> need for the last two patches.

Yes, that will help.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki May 20, 2016, 12:37 a.m. UTC | #6
On Fri, May 20, 2016 at 2:24 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, May 20, 2016 at 1:34 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> On Thu, May 19, 2016 at 11:15:52PM +0200, Rafael J. Wysocki wrote:
>>> But anyway this change again seems to be an optimization that might be
>>> done later to me.
>>>
>>> I guess there are many things that might be optimized in schedutil,
>>> but I'd prefer to address one item at a time, maybe going after the
>>> ones that appear most relevant first?
>>
>> Calling the last two patches in this series optimizations is a stretch
>> IMO. Issuing frequency change requests that result in the same
>> target-supported frequency is clearly unnecessary and ends up delaying
>> more urgent frequency changes, which I think is more like a bug.
>
> The [4/5] is pulling stuff where it doesn't belong.  Simple as that.
> Frequency tables don't belong in schedutil, so don't pull them in
> there.
>
> If you want to do that cleanly, add a call to the driver that will
> tell you what frequency would be selected by it if it were given a
> particular target.

Also I think that it would be good to avoid walking the frequency
table twice in case we end up wanting to update the frequency after
all.  With the [4/5] we'd do it once in get_next_freq() and then once
more in cpufreq_driver_fast_switch(), for example, and walking the
frequency table may be more expensive that doing the switch in the
first place.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Muckle May 20, 2016, 12:37 a.m. UTC | #7
On Fri, May 20, 2016 at 02:24:19AM +0200, Rafael J. Wysocki wrote:
> On Fri, May 20, 2016 at 1:34 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > On Thu, May 19, 2016 at 11:15:52PM +0200, Rafael J. Wysocki wrote:
> >> But anyway this change again seems to be an optimization that might be
> >> done later to me.
> >>
> >> I guess there are many things that might be optimized in schedutil,
> >> but I'd prefer to address one item at a time, maybe going after the
> >> ones that appear most relevant first?
> >
> > Calling the last two patches in this series optimizations is a stretch
> > IMO. Issuing frequency change requests that result in the same
> > target-supported frequency is clearly unnecessary and ends up delaying
> > more urgent frequency changes, which I think is more like a bug.
> 
> The [4/5] is pulling stuff where it doesn't belong.  Simple as that.
> Frequency tables don't belong in schedutil, so don't pull them in
> there.
> 
> If you want to do that cleanly, add a call to the driver that will
> tell you what frequency would be selected by it if it were given a
> particular target.
> 
> I actually do agree with the direction of it and the [5/5], but I
> don't like cutting corners. :-)

Okay sure, makes sense and I'll work on a change to do that.

> 
> > These patches are also needed in conjunction with the first three to address
> > the remote wakeup delay.
> 
> Well, does this mean that without the [4-5/5] the rest of the series
> doesn't provide as much benefit as initially expected?

At least in my testing the last two patches were required to show the
improvement with the unit test I had developed. This is because all the
minor system activity would keep pushing the rate limit out so when the
remote cb came in it had no effect.

So the last two patches aren't a hard requirement to theoretically solve
the problem but in practice I don't think a benefit will be seen without
them.

> > Are there specific items you want to see addressed before these patches could go in?
> 
> Do you mean in addition to what I already said in my comments?

I was referring to the other possible optimizations/changes you
mentioned in schedutil that might serve as cause to defer these patches.

> 
> > I'm aware of the RT/DL support that needs improving, though
> > that should be orthogonal.
> >
> > Also if it helps, I can provide a test case and/or traces to show the
> > need for the last two patches.
> 
> Yes, that will help.

Okay I'll include a pointer to my unit test program and some trace data
in my next post.

thanks,
Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Muckle May 20, 2016, 12:40 a.m. UTC | #8
On Fri, May 20, 2016 at 02:37:17AM +0200, Rafael J. Wysocki wrote:
> Also I think that it would be good to avoid walking the frequency
> table twice in case we end up wanting to update the frequency after
> all.  With the [4/5] we'd do it once in get_next_freq() and then once
> more in cpufreq_driver_fast_switch(), for example, and walking the
> frequency table may be more expensive that doing the switch in the
> first place.

If a driver API is added to return the platform frequency associated
with a target frequency, what do you think about requiring the
fast_switch API to take a target-supported frequency?
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki May 20, 2016, 12:46 a.m. UTC | #9
On Fri, May 20, 2016 at 2:40 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Fri, May 20, 2016 at 02:37:17AM +0200, Rafael J. Wysocki wrote:
>> Also I think that it would be good to avoid walking the frequency
>> table twice in case we end up wanting to update the frequency after
>> all.  With the [4/5] we'd do it once in get_next_freq() and then once
>> more in cpufreq_driver_fast_switch(), for example, and walking the
>> frequency table may be more expensive that doing the switch in the
>> first place.
>
> If a driver API is added to return the platform frequency associated
> with a target frequency, what do you think about requiring the
> fast_switch API to take a target-supported frequency?

That doesn't help much, because it generally would need to find a
table entry corresponding to it anyway, to find the actual command
value to write to a register, for example.

But the driver could be smart and cache the value returned from the
new callback along with the command value associated with it.  If
invoked with that particular frequency, it would use the cached
command.  Otherwise, it would walk the table.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki May 20, 2016, 12:55 a.m. UTC | #10
On Fri, May 20, 2016 at 2:37 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Fri, May 20, 2016 at 02:24:19AM +0200, Rafael J. Wysocki wrote:
>> On Fri, May 20, 2016 at 1:34 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> > On Thu, May 19, 2016 at 11:15:52PM +0200, Rafael J. Wysocki wrote:
>> >> But anyway this change again seems to be an optimization that might be
>> >> done later to me.
>> >>
>> >> I guess there are many things that might be optimized in schedutil,
>> >> but I'd prefer to address one item at a time, maybe going after the
>> >> ones that appear most relevant first?
>> >
>> > Calling the last two patches in this series optimizations is a stretch
>> > IMO. Issuing frequency change requests that result in the same
>> > target-supported frequency is clearly unnecessary and ends up delaying
>> > more urgent frequency changes, which I think is more like a bug.
>>
>> The [4/5] is pulling stuff where it doesn't belong.  Simple as that.
>> Frequency tables don't belong in schedutil, so don't pull them in
>> there.
>>
>> If you want to do that cleanly, add a call to the driver that will
>> tell you what frequency would be selected by it if it were given a
>> particular target.
>>
>> I actually do agree with the direction of it and the [5/5], but I
>> don't like cutting corners. :-)
>
> Okay sure, makes sense and I'll work on a change to do that.
>
>>
>> > These patches are also needed in conjunction with the first three to address
>> > the remote wakeup delay.
>>
>> Well, does this mean that without the [4-5/5] the rest of the series
>> doesn't provide as much benefit as initially expected?
>
> At least in my testing the last two patches were required to show the
> improvement with the unit test I had developed. This is because all the
> minor system activity would keep pushing the rate limit out so when the
> remote cb came in it had no effect.
>
> So the last two patches aren't a hard requirement to theoretically solve
> the problem but in practice I don't think a benefit will be seen without
> them.

Fair enough.

>> > Are there specific items you want to see addressed before these patches could go in?
>>
>> Do you mean in addition to what I already said in my comments?
>
> I was referring to the other possible optimizations/changes you
> mentioned in schedutil that might serve as cause to defer these patches.

That is sort of orthogonal, but we need to be able to pass hints from
the scheduler via cpufreq_update_util(), for a couple of reasons, and
maybe use those to trigger immediate updates ignoring the rate limit
in some cases.

But to avoid extending the list of arguments to pass to
cpufreq_update_util(), it would be good to teach schedutil to read the
utilization data directly first (Peter had a patch to do that some
time ago, but it would need to be rebased on top of the current code).
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki May 20, 2016, 11:39 a.m. UTC | #11
On Fri, May 20, 2016 at 2:46 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, May 20, 2016 at 2:40 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> On Fri, May 20, 2016 at 02:37:17AM +0200, Rafael J. Wysocki wrote:
>>> Also I think that it would be good to avoid walking the frequency
>>> table twice in case we end up wanting to update the frequency after
>>> all.  With the [4/5] we'd do it once in get_next_freq() and then once
>>> more in cpufreq_driver_fast_switch(), for example, and walking the
>>> frequency table may be more expensive that doing the switch in the
>>> first place.
>>
>> If a driver API is added to return the platform frequency associated
>> with a target frequency, what do you think about requiring the
>> fast_switch API to take a target-supported frequency?
>
> That doesn't help much, because it generally would need to find a
> table entry corresponding to it anyway, to find the actual command
> value to write to a register, for example.
>
> But the driver could be smart and cache the value returned from the
> new callback along with the command value associated with it.  If
> invoked with that particular frequency, it would use the cached
> command.  Otherwise, it would walk the table.

It also makes sense to save both the "raw" value computed by
get_next_freq() and the corresponding "driver" value, because if the
current "raw" value is equal to the previous "raw" value, it shouldn't
be necessary to walk the frequency table at all (as the previous
"driver" value would then be equal to the current "driver" value too).

So maybe the "driver" value should only be checked after the "raw"
value check in sugov_update_commit() or equivalent?
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki May 20, 2016, 11:54 a.m. UTC | #12
On Fri, May 20, 2016 at 1:39 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, May 20, 2016 at 2:46 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Fri, May 20, 2016 at 2:40 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
>>> On Fri, May 20, 2016 at 02:37:17AM +0200, Rafael J. Wysocki wrote:
>>>> Also I think that it would be good to avoid walking the frequency
>>>> table twice in case we end up wanting to update the frequency after
>>>> all.  With the [4/5] we'd do it once in get_next_freq() and then once
>>>> more in cpufreq_driver_fast_switch(), for example, and walking the
>>>> frequency table may be more expensive that doing the switch in the
>>>> first place.
>>>
>>> If a driver API is added to return the platform frequency associated
>>> with a target frequency, what do you think about requiring the
>>> fast_switch API to take a target-supported frequency?
>>
>> That doesn't help much, because it generally would need to find a
>> table entry corresponding to it anyway, to find the actual command
>> value to write to a register, for example.
>>
>> But the driver could be smart and cache the value returned from the
>> new callback along with the command value associated with it.  If
>> invoked with that particular frequency, it would use the cached
>> command.  Otherwise, it would walk the table.
>
> It also makes sense to save both the "raw" value computed by
> get_next_freq() and the corresponding "driver" value, because if the
> current "raw" value is equal to the previous "raw" value, it shouldn't
> be necessary to walk the frequency table at all (as the previous
> "driver" value would then be equal to the current "driver" value too).
>
> So maybe the "driver" value should only be checked after the "raw"
> value check in sugov_update_commit() or equivalent?

Moreover, you need to be careful about policy->min/max changes,
because both cpufreq_driver_fast_switch() and
__cpufreq_driver_target() clamp the target frequency between those and
if they change in the meantime, you may end up having to use a
different frequency at the driver level even if you get the same "raw"
value as last time.

It looks like we don't do the right thing here in the current code even ...
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki May 20, 2016, 11:59 a.m. UTC | #13
On Fri, May 20, 2016 at 1:54 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, May 20, 2016 at 1:39 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Fri, May 20, 2016 at 2:46 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>>> On Fri, May 20, 2016 at 2:40 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
>>>> On Fri, May 20, 2016 at 02:37:17AM +0200, Rafael J. Wysocki wrote:
>>>>> Also I think that it would be good to avoid walking the frequency
>>>>> table twice in case we end up wanting to update the frequency after
>>>>> all.  With the [4/5] we'd do it once in get_next_freq() and then once
>>>>> more in cpufreq_driver_fast_switch(), for example, and walking the
>>>>> frequency table may be more expensive that doing the switch in the
>>>>> first place.
>>>>
>>>> If a driver API is added to return the platform frequency associated
>>>> with a target frequency, what do you think about requiring the
>>>> fast_switch API to take a target-supported frequency?
>>>
>>> That doesn't help much, because it generally would need to find a
>>> table entry corresponding to it anyway, to find the actual command
>>> value to write to a register, for example.
>>>
>>> But the driver could be smart and cache the value returned from the
>>> new callback along with the command value associated with it.  If
>>> invoked with that particular frequency, it would use the cached
>>> command.  Otherwise, it would walk the table.
>>
>> It also makes sense to save both the "raw" value computed by
>> get_next_freq() and the corresponding "driver" value, because if the
>> current "raw" value is equal to the previous "raw" value, it shouldn't
>> be necessary to walk the frequency table at all (as the previous
>> "driver" value would then be equal to the current "driver" value too).
>>
>> So maybe the "driver" value should only be checked after the "raw"
>> value check in sugov_update_commit() or equivalent?
>
> Moreover, you need to be careful about policy->min/max changes,
> because both cpufreq_driver_fast_switch() and
> __cpufreq_driver_target() clamp the target frequency between those and
> if they change in the meantime, you may end up having to use a
> different frequency at the driver level even if you get the same "raw"
> value as last time.
>
> It looks like we don't do the right thing here in the current code even ...

Scratch that, sorry.  We'll get the "limits" notification and the
need_freq_update thing will cause next_freq to become zero then.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e185075fcb5c..4d2907c8a142 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -117,12 +117,11 @@  static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
 
-	sg_policy->last_freq_update_time = time;
-
 	if (sg_policy->next_freq == next_freq) {
 		trace_cpu_frequency(policy->cur, cpu);
 		return;
 	}
+	sg_policy->last_freq_update_time = time;
 	sg_policy->next_freq = next_freq;
 
 	if (sugov_queue_remote_callback(sg_policy, cpu))