| Message ID | 20180518185501.173552-1-joel@joelfernandes.org (mailing list archive) |
|---|---|
| State | Not Applicable, archived |
| Delegated to | Rafael Wysocki |
On 05/18/2018 11:55 AM, Joel Fernandes (Google.) wrote: > From: "Joel Fernandes (Google)" <joel@joelfernandes.org> > > Currently there is a chance of a schedutil cpufreq update request to be > dropped if there is a pending update request. This pending request can > be delayed if there is a scheduling delay of the irq_work and the wake > up of the schedutil governor kthread. > > A very bad scenario is when a schedutil request was already just made, > such as to reduce the CPU frequency, then a newer request to increase > CPU frequency (even sched deadline urgent frequency increase requests) > can be dropped, even though the rate limits suggest that its Ok to > process a request. This is because of the way the work_in_progress flag > is used. > > This patch improves the situation by allowing new requests to happen > even though the old one is still being processed. Note that in this > approach, if an irq_work was already issued, we just update next_freq > and don't bother to queue another request so there's no extra work being > done to make this happen. > > I had brought up this issue at the OSPM conference and Claudio had a > discussion RFC with an alternate approach [1]. I prefer the approach as > done in the patch below since it doesn't need any new flags and doesn't > cause any other extra overhead. > > [1] https://patchwork.kernel.org/patch/10384261/ > > LGTMed-by: Viresh Kumar <viresh.kumar@linaro.org> > LGTMed-by: Juri Lelli <juri.lelli@redhat.com> > CC: Viresh Kumar <viresh.kumar@linaro.org> > CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > CC: Peter Zijlstra <peterz@infradead.org> > CC: Ingo Molnar <mingo@redhat.com> > CC: Patrick Bellasi <patrick.bellasi@arm.com> > CC: Juri Lelli <juri.lelli@redhat.com> > Cc: Luca Abeni <luca.abeni@santannapisa.it> > CC: Joel Fernandes <joelaf@google.com> > CC: Todd Kjos <tkjos@google.com> > CC: claudio@evidence.eu.com > CC: kernel-team@android.com > CC: linux-pm@vger.kernel.org > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > --- > v1 -> v2: Minor style related changes. > > kernel/sched/cpufreq_schedutil.c | 34 ++++++++++++++++++++++++-------- > 1 file changed, 26 insertions(+), 8 deletions(-) > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c > index e13df951aca7..5c482ec38610 100644 > --- a/kernel/sched/cpufreq_schedutil.c > +++ b/kernel/sched/cpufreq_schedutil.c > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) > !cpufreq_can_do_remote_dvfs(sg_policy->policy)) > return false; > > - if (sg_policy->work_in_progress) > - return false; > - > if (unlikely(sg_policy->need_freq_update)) { > sg_policy->need_freq_update = false; > /* > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time, > > policy->cur = next_freq; > trace_cpu_frequency(next_freq, smp_processor_id()); > - } else { > + } else if (!sg_policy->work_in_progress) { Not really something you added, but if you are modifying it: Do we really need this work_in_progress flag? irq_work_queue() already checks if the work is pending and then returns true/false. Wouldn't the issue you are trying to fix be resolved just by dropping this flag check entirely? 
> sg_policy->work_in_progress = true; > irq_work_queue(&sg_policy->irq_work); > } > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, > > ignore_dl_rate_limit(sg_cpu, sg_policy); > > + /* > + * For slow-switch systems, single policy requests can't run at the > + * moment if update is in progress, unless we acquire update_lock. > + */ > + if (sg_policy->work_in_progress) > + return; > + > if (!sugov_should_update_freq(sg_policy, time)) > return; > > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags) > static void sugov_work(struct kthread_work *work) > { > struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work); > + unsigned int freq; > + unsigned long flags; > + > + /* > + * Hold sg_policy->update_lock shortly to handle the case where: > + * incase sg_policy->next_freq is read here, and then updated by > + * sugov_update_shared just before work_in_progress is set to false > + * here, we may miss queueing the new update. > + * > + * Note: If a work was queued after the update_lock is released, > + * sugov_work will just be called again by kthread_work code; and the > + * request will be proceed before the sugov thread sleeps. > + */ > + raw_spin_lock_irqsave(&sg_policy->update_lock, flags); > + freq = sg_policy->next_freq; > + sg_policy->work_in_progress = false; > + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags); > > mutex_lock(&sg_policy->work_lock); > - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, > - CPUFREQ_RELATION_L); > + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L); > mutex_unlock(&sg_policy->work_lock); > - > - sg_policy->work_in_progress = false; > } > > static void sugov_irq_work(struct irq_work *irq_work) > -Saravana
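For context on the irq_work behavior Saravana refers to: irq_work_queue() claims the work with an atomic test-and-set on its pending state and returns false when the work is already queued, so duplicate queue attempts are cheap no-ops. A minimal userspace sketch of that claim/dedup pattern follows; the fake_work names are hypothetical stand-ins, not the kernel implementation:

---8<---
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct irq_work's pending state. */
struct fake_work {
	atomic_bool pending;
};

/*
 * Mirrors the claim step: only the caller that flips pending from
 * false to true actually enqueues; concurrent callers get false back.
 */
static bool fake_work_queue(struct fake_work *w)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&w->pending, &expected, true))
		return false;	/* already queued: nothing more to do */

	printf("queued\n");	/* real code would enqueue and raise an IPI */
	return true;
}

int main(void)
{
	struct fake_work w = { .pending = ATOMIC_VAR_INIT(false) };

	fake_work_queue(&w);	/* queues the work */
	fake_work_queue(&w);	/* deduplicated: returns false */
	return 0;
}
---8<---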
On Fri, May 18, 2018 at 11:13 PM, Saravana Kannan <skannan@codeaurora.org> wrote: > On 05/18/2018 11:55 AM, Joel Fernandes (Google.) wrote: >> >> From: "Joel Fernandes (Google)" <joel@joelfernandes.org> >> >> Currently there is a chance of a schedutil cpufreq update request to be >> dropped if there is a pending update request. This pending request can >> be delayed if there is a scheduling delay of the irq_work and the wake >> up of the schedutil governor kthread. >> >> A very bad scenario is when a schedutil request was already just made, >> such as to reduce the CPU frequency, then a newer request to increase >> CPU frequency (even sched deadline urgent frequency increase requests) >> can be dropped, even though the rate limits suggest that its Ok to >> process a request. This is because of the way the work_in_progress flag >> is used. >> >> This patch improves the situation by allowing new requests to happen >> even though the old one is still being processed. Note that in this >> approach, if an irq_work was already issued, we just update next_freq >> and don't bother to queue another request so there's no extra work being >> done to make this happen. >> >> I had brought up this issue at the OSPM conference and Claudio had a >> discussion RFC with an alternate approach [1]. I prefer the approach as >> done in the patch below since it doesn't need any new flags and doesn't >> cause any other extra overhead. >> >> [1] https://patchwork.kernel.org/patch/10384261/ >> >> LGTMed-by: Viresh Kumar <viresh.kumar@linaro.org> >> LGTMed-by: Juri Lelli <juri.lelli@redhat.com> >> CC: Viresh Kumar <viresh.kumar@linaro.org> >> CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> CC: Peter Zijlstra <peterz@infradead.org> >> CC: Ingo Molnar <mingo@redhat.com> >> CC: Patrick Bellasi <patrick.bellasi@arm.com> >> CC: Juri Lelli <juri.lelli@redhat.com> >> Cc: Luca Abeni <luca.abeni@santannapisa.it> >> CC: Joel Fernandes <joelaf@google.com> >> CC: Todd Kjos <tkjos@google.com> >> CC: claudio@evidence.eu.com >> CC: kernel-team@android.com >> CC: linux-pm@vger.kernel.org >> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >> --- >> v1 -> v2: Minor style related changes. >> >> kernel/sched/cpufreq_schedutil.c | 34 ++++++++++++++++++++++++-------- >> 1 file changed, 26 insertions(+), 8 deletions(-) >> >> diff --git a/kernel/sched/cpufreq_schedutil.c >> b/kernel/sched/cpufreq_schedutil.c >> index e13df951aca7..5c482ec38610 100644 >> --- a/kernel/sched/cpufreq_schedutil.c >> +++ b/kernel/sched/cpufreq_schedutil.c >> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy >> *sg_policy, u64 time) >> !cpufreq_can_do_remote_dvfs(sg_policy->policy)) >> return false; >> >> - if (sg_policy->work_in_progress) >> - return false; >> - >> if (unlikely(sg_policy->need_freq_update)) { >> sg_policy->need_freq_update = false; >> /* >> @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy >> *sg_policy, u64 time, >> >> policy->cur = next_freq; >> trace_cpu_frequency(next_freq, smp_processor_id()); >> - } else { >> + } else if (!sg_policy->work_in_progress) { > > > Not really something you added, but if you are modifying it: > Do we really need this work_in_progress flag? irq_work_queue() already > checks if the work is pending and then returns true/false. > > Wouldn't the issue you are trying to fix be resolved just by dropping this > flag check entirely? You've missed the entire discussion on that several days ago, sorry. Thanks, Rafael
On 18-05-18, 11:55, Joel Fernandes (Google.) wrote: > From: "Joel Fernandes (Google)" <joel@joelfernandes.org> > > Currently there is a chance of a schedutil cpufreq update request to be > dropped if there is a pending update request. This pending request can > be delayed if there is a scheduling delay of the irq_work and the wake > up of the schedutil governor kthread. > > A very bad scenario is when a schedutil request was already just made, > such as to reduce the CPU frequency, then a newer request to increase > CPU frequency (even sched deadline urgent frequency increase requests) > can be dropped, even though the rate limits suggest that its Ok to > process a request. This is because of the way the work_in_progress flag > is used. > > This patch improves the situation by allowing new requests to happen > even though the old one is still being processed. Note that in this > approach, if an irq_work was already issued, we just update next_freq > and don't bother to queue another request so there's no extra work being > done to make this happen. Now that this isn't an RFC anymore, you shouldn't have added below paragraph here. It could go to the comments section though. > I had brought up this issue at the OSPM conference and Claudio had a > discussion RFC with an alternate approach [1]. I prefer the approach as > done in the patch below since it doesn't need any new flags and doesn't > cause any other extra overhead. > > [1] https://patchwork.kernel.org/patch/10384261/ > > LGTMed-by: Viresh Kumar <viresh.kumar@linaro.org> > LGTMed-by: Juri Lelli <juri.lelli@redhat.com> Looks like a Tag you just invented ? :) > CC: Viresh Kumar <viresh.kumar@linaro.org> > CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > CC: Peter Zijlstra <peterz@infradead.org> > CC: Ingo Molnar <mingo@redhat.com> > CC: Patrick Bellasi <patrick.bellasi@arm.com> > CC: Juri Lelli <juri.lelli@redhat.com> > Cc: Luca Abeni <luca.abeni@santannapisa.it> > CC: Joel Fernandes <joelaf@google.com> > CC: Todd Kjos <tkjos@google.com> > CC: claudio@evidence.eu.com > CC: kernel-team@android.com > CC: linux-pm@vger.kernel.org > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > --- > v1 -> v2: Minor style related changes. > > kernel/sched/cpufreq_schedutil.c | 34 ++++++++++++++++++++++++-------- > 1 file changed, 26 insertions(+), 8 deletions(-) > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c > index e13df951aca7..5c482ec38610 100644 > --- a/kernel/sched/cpufreq_schedutil.c > +++ b/kernel/sched/cpufreq_schedutil.c > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) > !cpufreq_can_do_remote_dvfs(sg_policy->policy)) > return false; > > - if (sg_policy->work_in_progress) > - return false; > - > if (unlikely(sg_policy->need_freq_update)) { > sg_policy->need_freq_update = false; > /* > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time, > > policy->cur = next_freq; > trace_cpu_frequency(next_freq, smp_processor_id()); > - } else { > + } else if (!sg_policy->work_in_progress) { > sg_policy->work_in_progress = true; > irq_work_queue(&sg_policy->irq_work); > } > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, > > ignore_dl_rate_limit(sg_cpu, sg_policy); > > + /* > + * For slow-switch systems, single policy requests can't run at the > + * moment if update is in progress, unless we acquire update_lock. 
> + */ > + if (sg_policy->work_in_progress) > + return; > + I would still want this to go away :) @Rafael, will it be fine to get locking in place for unshared policy platforms ? > if (!sugov_should_update_freq(sg_policy, time)) > return; > > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags) > static void sugov_work(struct kthread_work *work) > { > struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work); > + unsigned int freq; > + unsigned long flags; > + > + /* > + * Hold sg_policy->update_lock shortly to handle the case where: > + * incase sg_policy->next_freq is read here, and then updated by > + * sugov_update_shared just before work_in_progress is set to false > + * here, we may miss queueing the new update. > + * > + * Note: If a work was queued after the update_lock is released, > + * sugov_work will just be called again by kthread_work code; and the > + * request will be proceed before the sugov thread sleeps. > + */ > + raw_spin_lock_irqsave(&sg_policy->update_lock, flags); > + freq = sg_policy->next_freq; > + sg_policy->work_in_progress = false; > + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags); > > mutex_lock(&sg_policy->work_lock); > - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, > - CPUFREQ_RELATION_L); > + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L); > mutex_unlock(&sg_policy->work_lock); > - > - sg_policy->work_in_progress = false; > } > > static void sugov_irq_work(struct irq_work *irq_work) Fix the commit log and you can add my Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
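To make the pattern under review easier to follow outside the kernel tree, here is a simplified pthread model of what the patch does on the slow-switch path: the update side always records the newest next_freq and queues the worker only when none is in flight, while the worker snapshots the frequency and clears the flag under the same lock before doing the slow switch. This is a sketch only; struct policy and the helpers are stand-ins, not the schedutil code:

---8<---
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for struct sugov_policy. */
struct policy {
	pthread_mutex_t update_lock;
	unsigned int next_freq;
	bool work_in_progress;
};

/* Update side: always record the newest request; only queue the
 * worker when none is in flight (mirrors sugov_update_commit). */
static void commit(struct policy *p, unsigned int freq)
{
	pthread_mutex_lock(&p->update_lock);
	p->next_freq = freq;
	if (!p->work_in_progress) {
		p->work_in_progress = true;
		/* real code: irq_work_queue() -> wakes the kthread */
		printf("worker queued\n");
	}
	pthread_mutex_unlock(&p->update_lock);
}

/* Worker side: snapshot the latest request and clear the flag under
 * the lock, then do the slow switch outside of it. */
static void work(struct policy *p)
{
	unsigned int freq;

	pthread_mutex_lock(&p->update_lock);
	freq = p->next_freq;
	p->work_in_progress = false;
	pthread_mutex_unlock(&p->update_lock);

	printf("switching to %u\n", freq); /* __cpufreq_driver_target() */
}

int main(void)
{
	struct policy p = { PTHREAD_MUTEX_INITIALIZER, 0, false };

	commit(&p, 1000000);	/* queues the worker */
	commit(&p, 2000000);	/* coalesced: just updates next_freq */
	work(&p);		/* switches straight to 2000000 */
	return 0;
}
---8<---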
On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote: > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote: >> From: "Joel Fernandes (Google)" <joel@joelfernandes.org> >> >> Currently there is a chance of a schedutil cpufreq update request to be >> dropped if there is a pending update request. This pending request can >> be delayed if there is a scheduling delay of the irq_work and the wake >> up of the schedutil governor kthread. >> >> A very bad scenario is when a schedutil request was already just made, >> such as to reduce the CPU frequency, then a newer request to increase >> CPU frequency (even sched deadline urgent frequency increase requests) >> can be dropped, even though the rate limits suggest that its Ok to >> process a request. This is because of the way the work_in_progress flag >> is used. >> >> This patch improves the situation by allowing new requests to happen >> even though the old one is still being processed. Note that in this >> approach, if an irq_work was already issued, we just update next_freq >> and don't bother to queue another request so there's no extra work being >> done to make this happen. > > Now that this isn't an RFC anymore, you shouldn't have added below > paragraph here. It could go to the comments section though. > >> I had brought up this issue at the OSPM conference and Claudio had a >> discussion RFC with an alternate approach [1]. I prefer the approach as >> done in the patch below since it doesn't need any new flags and doesn't >> cause any other extra overhead. >> >> [1] https://patchwork.kernel.org/patch/10384261/ >> >> LGTMed-by: Viresh Kumar <viresh.kumar@linaro.org> >> LGTMed-by: Juri Lelli <juri.lelli@redhat.com> > > Looks like a Tag you just invented ? :) Yeah. The LGTM from Juri can be converned into an ACK silently IMO. That said I have added Looks-good-to: tags to a couple of commits. :-) >> CC: Viresh Kumar <viresh.kumar@linaro.org> >> CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> CC: Peter Zijlstra <peterz@infradead.org> >> CC: Ingo Molnar <mingo@redhat.com> >> CC: Patrick Bellasi <patrick.bellasi@arm.com> >> CC: Juri Lelli <juri.lelli@redhat.com> >> Cc: Luca Abeni <luca.abeni@santannapisa.it> >> CC: Joel Fernandes <joelaf@google.com> >> CC: Todd Kjos <tkjos@google.com> >> CC: claudio@evidence.eu.com >> CC: kernel-team@android.com >> CC: linux-pm@vger.kernel.org >> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >> --- >> v1 -> v2: Minor style related changes. 
>> >> kernel/sched/cpufreq_schedutil.c | 34 ++++++++++++++++++++++++-------- >> 1 file changed, 26 insertions(+), 8 deletions(-) >> >> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c >> index e13df951aca7..5c482ec38610 100644 >> --- a/kernel/sched/cpufreq_schedutil.c >> +++ b/kernel/sched/cpufreq_schedutil.c >> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) >> !cpufreq_can_do_remote_dvfs(sg_policy->policy)) >> return false; >> >> - if (sg_policy->work_in_progress) >> - return false; >> - >> if (unlikely(sg_policy->need_freq_update)) { >> sg_policy->need_freq_update = false; >> /* >> @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time, >> >> policy->cur = next_freq; >> trace_cpu_frequency(next_freq, smp_processor_id()); >> - } else { >> + } else if (!sg_policy->work_in_progress) { >> sg_policy->work_in_progress = true; >> irq_work_queue(&sg_policy->irq_work); >> } >> @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, >> >> ignore_dl_rate_limit(sg_cpu, sg_policy); >> >> + /* >> + * For slow-switch systems, single policy requests can't run at the >> + * moment if update is in progress, unless we acquire update_lock. >> + */ >> + if (sg_policy->work_in_progress) >> + return; >> + > > I would still want this to go away :) > > @Rafael, will it be fine to get locking in place for unshared policy > platforms ? As long as it doesn't affect the fast switch path in any way. > >> if (!sugov_should_update_freq(sg_policy, time)) >> return; >> >> @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags) >> static void sugov_work(struct kthread_work *work) >> { >> struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work); >> + unsigned int freq; >> + unsigned long flags; >> + >> + /* >> + * Hold sg_policy->update_lock shortly to handle the case where: >> + * incase sg_policy->next_freq is read here, and then updated by >> + * sugov_update_shared just before work_in_progress is set to false >> + * here, we may miss queueing the new update. >> + * >> + * Note: If a work was queued after the update_lock is released, >> + * sugov_work will just be called again by kthread_work code; and the >> + * request will be proceed before the sugov thread sleeps. >> + */ >> + raw_spin_lock_irqsave(&sg_policy->update_lock, flags); >> + freq = sg_policy->next_freq; >> + sg_policy->work_in_progress = false; >> + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags); >> >> mutex_lock(&sg_policy->work_lock); >> - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, >> - CPUFREQ_RELATION_L); >> + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L); >> mutex_unlock(&sg_policy->work_lock); >> - >> - sg_policy->work_in_progress = false; >> } >> >> static void sugov_irq_work(struct irq_work *irq_work) > > Fix the commit log and you can add my I can fix it up. > Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Thanks, Rafael
On 21/05/18 10:29, Rafael J. Wysocki wrote: > On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote: > >> From: "Joel Fernandes (Google)" <joel@joelfernandes.org> > >> > >> Currently there is a chance of a schedutil cpufreq update request to be > >> dropped if there is a pending update request. This pending request can > >> be delayed if there is a scheduling delay of the irq_work and the wake > >> up of the schedutil governor kthread. > >> > >> A very bad scenario is when a schedutil request was already just made, > >> such as to reduce the CPU frequency, then a newer request to increase > >> CPU frequency (even sched deadline urgent frequency increase requests) > >> can be dropped, even though the rate limits suggest that its Ok to > >> process a request. This is because of the way the work_in_progress flag > >> is used. > >> > >> This patch improves the situation by allowing new requests to happen > >> even though the old one is still being processed. Note that in this > >> approach, if an irq_work was already issued, we just update next_freq > >> and don't bother to queue another request so there's no extra work being > >> done to make this happen. > > > > Now that this isn't an RFC anymore, you shouldn't have added below > > paragraph here. It could go to the comments section though. > > > >> I had brought up this issue at the OSPM conference and Claudio had a > >> discussion RFC with an alternate approach [1]. I prefer the approach as > >> done in the patch below since it doesn't need any new flags and doesn't > >> cause any other extra overhead. > >> > >> [1] https://patchwork.kernel.org/patch/10384261/ > >> > >> LGTMed-by: Viresh Kumar <viresh.kumar@linaro.org> > >> LGTMed-by: Juri Lelli <juri.lelli@redhat.com> > > > > Looks like a Tag you just invented ? :) > > Yeah. > > The LGTM from Juri can be converned into an ACK silently IMO. That Sure! :) Thanks, - Juri
On 18-May 11:55, Joel Fernandes (Google.) wrote: > From: "Joel Fernandes (Google)" <joel@joelfernandes.org> > > Currently there is a chance of a schedutil cpufreq update request to be > dropped if there is a pending update request. This pending request can > be delayed if there is a scheduling delay of the irq_work and the wake > up of the schedutil governor kthread. > > A very bad scenario is when a schedutil request was already just made, > such as to reduce the CPU frequency, then a newer request to increase > CPU frequency (even sched deadline urgent frequency increase requests) > can be dropped, even though the rate limits suggest that its Ok to > process a request. This is because of the way the work_in_progress flag > is used. > > This patch improves the situation by allowing new requests to happen > even though the old one is still being processed. Note that in this > approach, if an irq_work was already issued, we just update next_freq > and don't bother to queue another request so there's no extra work being > done to make this happen. Maybe I'm missing something but... is not this patch just a partial mitigation of the issue you descrive above? If a DL freq increase is queued, with this patch we store the request but we don't actually increase the frequency until the next schedutil update, which can be one tick away... isn't it? If that's the case, maybe something like the following can complete the cure? ---8<--- #define SUGOV_FREQ_NONE 0 static unsigned int sugov_work_update(struct sugov_policy *sg_policy, unsigned int prev_freq) { unsigned long irq_flags; bool update_freq = true; unsigned int next_freq; /* * Hold sg_policy->update_lock shortly to handle the case where: * incase sg_policy->next_freq is read here, and then updated by * sugov_update_shared just before work_in_progress is set to false * here, we may miss queueing the new update. * * Note: If a work was queued after the update_lock is released, * sugov_work will just be called again by kthread_work code; and the * request will be proceed before the sugov thread sleeps. */ raw_spin_lock_irqsave(&sg_policy->update_lock, irq_flags); next_freq = sg_policy->next_freq; sg_policy->work_in_progress = false; if (prev_freq == next_freq) update_freq = false; raw_spin_unlock_irqrestore(&sg_policy->update_lock, irq_flags); /* * Update the frequency only if it has changed since the last call. */ if (update_freq) { mutex_lock(&sg_policy->work_lock); __cpufreq_driver_target(sg_policy->policy, next_freq, CPUFREQ_RELATION_L); mutex_unlock(&sg_policy->work_lock); return next_freq; } return SUGOV_FREQ_NONE; } static void sugov_work(struct kthread_work *work) { struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work); unsigned int prev_freq = 0; /* * Keep updating the frequency until we end up with a frequency which * satisfies the most recent request we got meanwhile. */ do { prev_freq = sugov_work_update(sg_policy, prev_freq); } while (prev_freq != SUGOV_FREQ_NONE); } ---8<---
On Mon, May 21, 2018 at 11:50:55AM +0100, Patrick Bellasi wrote: > On 18-May 11:55, Joel Fernandes (Google.) wrote: > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org> > > > > Currently there is a chance of a schedutil cpufreq update request to be > > dropped if there is a pending update request. This pending request can > > be delayed if there is a scheduling delay of the irq_work and the wake > > up of the schedutil governor kthread. > > > > A very bad scenario is when a schedutil request was already just made, > > such as to reduce the CPU frequency, then a newer request to increase > > CPU frequency (even sched deadline urgent frequency increase requests) > > can be dropped, even though the rate limits suggest that its Ok to > > process a request. This is because of the way the work_in_progress flag > > is used. > > > > This patch improves the situation by allowing new requests to happen > > even though the old one is still being processed. Note that in this > > approach, if an irq_work was already issued, we just update next_freq > > and don't bother to queue another request so there's no extra work being > > done to make this happen. > > Maybe I'm missing something but... is not this patch just a partial > mitigation of the issue you descrive above? > > If a DL freq increase is queued, with this patch we store the request > but we don't actually increase the frequency until the next schedutil > update, which can be one tick away... isn't it? > > If that's the case, maybe something like the following can complete > the cure? We already discussed this and thought of this case, I think you missed a previous thread [1]. The outer loop in the kthread_work subsystem will take care of calling sugov_work again incase another request was queued which we happen to miss. So I don't think more complexity is needed to handle the case you're bringing up. thanks! - Joel [1] https://lkml.org/lkml/2018/5/17/668
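A toy model of the kthread_worker behavior Joel is relying on here: the worker loop dequeues an item before invoking its callback, so a request that re-queues the work mid-execution simply makes the loop run the callback once more. The toy_* names below are made up for illustration, and the model is single-threaded for clarity:

---8<---
#include <stdbool.h>
#include <stdio.h>

struct toy_work {
	bool queued;
	void (*fn)(struct toy_work *);
};

static void toy_queue(struct toy_work *w)
{
	w->queued = true;	/* kthread_queue_work() analogue */
}

static void toy_worker_loop(struct toy_work *w)
{
	/* kthread_worker_fn() analogue: keep going while work is queued. */
	while (w->queued) {
		w->queued = false;	/* dequeue before running the callback */
		w->fn(w);		/* the callback may re-queue w */
	}
}

static int runs;

static void callback(struct toy_work *w)
{
	printf("run %d\n", ++runs);
	if (runs == 1)
		toy_queue(w);	/* a request arriving mid-execution */
}

int main(void)
{
	struct toy_work w = { .queued = false, .fn = callback };

	toy_queue(&w);
	toy_worker_loop(&w);	/* prints "run 1" then "run 2" */
	return 0;
}
---8<---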
On Mon, May 21, 2018 at 10:29:52AM +0200, Rafael J. Wysocki wrote: > On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote: > >> From: "Joel Fernandes (Google)" <joel@joelfernandes.org> > >> > >> Currently there is a chance of a schedutil cpufreq update request to be > >> dropped if there is a pending update request. This pending request can > >> be delayed if there is a scheduling delay of the irq_work and the wake > >> up of the schedutil governor kthread. > >> > >> A very bad scenario is when a schedutil request was already just made, > >> such as to reduce the CPU frequency, then a newer request to increase > >> CPU frequency (even sched deadline urgent frequency increase requests) > >> can be dropped, even though the rate limits suggest that its Ok to > >> process a request. This is because of the way the work_in_progress flag > >> is used. > >> > >> This patch improves the situation by allowing new requests to happen > >> even though the old one is still being processed. Note that in this > >> approach, if an irq_work was already issued, we just update next_freq > >> and don't bother to queue another request so there's no extra work being > >> done to make this happen. > > > > Now that this isn't an RFC anymore, you shouldn't have added below > > paragraph here. It could go to the comments section though. > > > >> I had brought up this issue at the OSPM conference and Claudio had a > >> discussion RFC with an alternate approach [1]. I prefer the approach as > >> done in the patch below since it doesn't need any new flags and doesn't > >> cause any other extra overhead. > >> > >> [1] https://patchwork.kernel.org/patch/10384261/ > >> > >> LGTMed-by: Viresh Kumar <viresh.kumar@linaro.org> > >> LGTMed-by: Juri Lelli <juri.lelli@redhat.com> > > > > Looks like a Tag you just invented ? :) > > Yeah. > > The LGTM from Juri can be converned into an ACK silently IMO. That > said I have added Looks-good-to: tags to a couple of commits. :-) Cool, I'll covert them to Acks :-) [..] > >> v1 -> v2: Minor style related changes. > >> > >> kernel/sched/cpufreq_schedutil.c | 34 ++++++++++++++++++++++++-------- > >> 1 file changed, 26 insertions(+), 8 deletions(-) > >> > >> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c > >> index e13df951aca7..5c482ec38610 100644 > >> --- a/kernel/sched/cpufreq_schedutil.c > >> +++ b/kernel/sched/cpufreq_schedutil.c > >> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) > >> !cpufreq_can_do_remote_dvfs(sg_policy->policy)) > >> return false; > >> > >> - if (sg_policy->work_in_progress) > >> - return false; > >> - > >> if (unlikely(sg_policy->need_freq_update)) { > >> sg_policy->need_freq_update = false; > >> /* > >> @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time, > >> > >> policy->cur = next_freq; > >> trace_cpu_frequency(next_freq, smp_processor_id()); > >> - } else { > >> + } else if (!sg_policy->work_in_progress) { > >> sg_policy->work_in_progress = true; > >> irq_work_queue(&sg_policy->irq_work); > >> } > >> @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, > >> > >> ignore_dl_rate_limit(sg_cpu, sg_policy); > >> > >> + /* > >> + * For slow-switch systems, single policy requests can't run at the > >> + * moment if update is in progress, unless we acquire update_lock. 
> >> + */ > >> + if (sg_policy->work_in_progress) > >> + return; > >> + > > > > I would still want this to go away :) > > > > @Rafael, will it be fine to get locking in place for unshared policy > > platforms ? > > As long as it doesn't affect the fast switch path in any way. I just realized that on a single policy switch that uses the governor thread, there will be 1 thread per-CPU. The sugov_update_single will be called on the same CPU with interrupts disabled. In sugov_work, we are doing a raw_spin_lock_irqsave which also disables interrupts. So I don't think there's any possibility of a race happening on the same CPU between the frequency update request and the sugov_work executing. In other words, I feel we can drop the above if (..) statement for single policies completely and only keep the changes for the shared policy. Viresh since you brought up the single policy issue initially which made me add this if statememnt, could you let me know if you agree with what I just said? > >> if (!sugov_should_update_freq(sg_policy, time)) > >> return; > >> > >> @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags) > >> static void sugov_work(struct kthread_work *work) > >> { > >> struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work); > >> + unsigned int freq; > >> + unsigned long flags; > >> + > >> + /* > >> + * Hold sg_policy->update_lock shortly to handle the case where: > >> + * incase sg_policy->next_freq is read here, and then updated by > >> + * sugov_update_shared just before work_in_progress is set to false > >> + * here, we may miss queueing the new update. > >> + * > >> + * Note: If a work was queued after the update_lock is released, > >> + * sugov_work will just be called again by kthread_work code; and the > >> + * request will be proceed before the sugov thread sleeps. > >> + */ > >> + raw_spin_lock_irqsave(&sg_policy->update_lock, flags); > >> + freq = sg_policy->next_freq; > >> + sg_policy->work_in_progress = false; > >> + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags); > >> > >> mutex_lock(&sg_policy->work_lock); > >> - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, > >> - CPUFREQ_RELATION_L); > >> + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L); > >> mutex_unlock(&sg_policy->work_lock); > >> - > >> - sg_policy->work_in_progress = false; > >> } > >> > >> static void sugov_irq_work(struct irq_work *irq_work) > > > > Fix the commit log and you can add my > > I can fix it up. > > > Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Thanks! - Joel
On 21-May 08:49, Joel Fernandes wrote: > On Mon, May 21, 2018 at 11:50:55AM +0100, Patrick Bellasi wrote: > > On 18-May 11:55, Joel Fernandes (Google.) wrote: > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org> > > > > > > Currently there is a chance of a schedutil cpufreq update request to be > > > dropped if there is a pending update request. This pending request can > > > be delayed if there is a scheduling delay of the irq_work and the wake > > > up of the schedutil governor kthread. > > > > > > A very bad scenario is when a schedutil request was already just made, > > > such as to reduce the CPU frequency, then a newer request to increase > > > CPU frequency (even sched deadline urgent frequency increase requests) > > > can be dropped, even though the rate limits suggest that its Ok to > > > process a request. This is because of the way the work_in_progress flag > > > is used. > > > > > > This patch improves the situation by allowing new requests to happen > > > even though the old one is still being processed. Note that in this > > > approach, if an irq_work was already issued, we just update next_freq > > > and don't bother to queue another request so there's no extra work being > > > done to make this happen. > > > > Maybe I'm missing something but... is not this patch just a partial > > mitigation of the issue you descrive above? > > > > If a DL freq increase is queued, with this patch we store the request > > but we don't actually increase the frequency until the next schedutil > > update, which can be one tick away... isn't it? > > > > If that's the case, maybe something like the following can complete > > the cure? > > We already discussed this and thought of this case, I think you missed a > previous thread [1]. The outer loop in the kthread_work subsystem will take > care of calling sugov_work again incase another request was queued which we > happen to miss. Ok, I missed that thread... my bad. However, [1] made me noticing that your solution works under the assumption that we keep queuing a new kworker job for each request we get, isn't it? If that's the case, this means that if, for example, during a frequency switch you get a request to reduce the frequency (e.g. deadline task passing the 0-lag time) and right after a request to increase the frequency (e.g. the current FAIR task tick)... you will enqueue a freq drop followed by a freq increase and actually do two frequency hops? > So I don't think more complexity is needed to handle the case > you're bringing up. > > thanks! > > - Joel > > [1] https://lkml.org/lkml/2018/5/17/668 >
Hi Patrick, On Mon, May 21, 2018 at 06:00:50PM +0100, Patrick Bellasi wrote: > On 21-May 08:49, Joel Fernandes wrote: > > On Mon, May 21, 2018 at 11:50:55AM +0100, Patrick Bellasi wrote: > > > On 18-May 11:55, Joel Fernandes (Google.) wrote: > > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org> > > > > > > > > Currently there is a chance of a schedutil cpufreq update request to be > > > > dropped if there is a pending update request. This pending request can > > > > be delayed if there is a scheduling delay of the irq_work and the wake > > > > up of the schedutil governor kthread. > > > > > > > > A very bad scenario is when a schedutil request was already just made, > > > > such as to reduce the CPU frequency, then a newer request to increase > > > > CPU frequency (even sched deadline urgent frequency increase requests) > > > > can be dropped, even though the rate limits suggest that its Ok to > > > > process a request. This is because of the way the work_in_progress flag > > > > is used. > > > > > > > > This patch improves the situation by allowing new requests to happen > > > > even though the old one is still being processed. Note that in this > > > > approach, if an irq_work was already issued, we just update next_freq > > > > and don't bother to queue another request so there's no extra work being > > > > done to make this happen. > > > > > > Maybe I'm missing something but... is not this patch just a partial > > > mitigation of the issue you descrive above? > > > > > > If a DL freq increase is queued, with this patch we store the request > > > but we don't actually increase the frequency until the next schedutil > > > update, which can be one tick away... isn't it? > > > > > > If that's the case, maybe something like the following can complete > > > the cure? > > > > We already discussed this and thought of this case, I think you missed a > > previous thread [1]. The outer loop in the kthread_work subsystem will take > > care of calling sugov_work again incase another request was queued which we > > happen to miss. > > Ok, I missed that thread... my bad. Sure no problem, sorry I was just pointing out the thread, not blaming you for not reading it ;) > However, [1] made me noticing that your solution works under the > assumption that we keep queuing a new kworker job for each request we > get, isn't it? Not at each request, but each request after work_in_progress was cleared by the sugov_work. Any requests that happen between work_in_progress is set and cleared only result in updating of the next_freq. > If that's the case, this means that if, for example, during a > frequency switch you get a request to reduce the frequency (e.g. > deadline task passing the 0-lag time) and right after a request to > increase the frequency (e.g. the current FAIR task tick)... you will > enqueue a freq drop followed by a freq increase and actually do two > frequency hops? Yes possibly, I see your point but I'm not sure if the tight loop around that is worth the complexity, or atleast is within the scope of my patch. Perhaps the problem you describe can be looked at as a future enhancement? thanks, - Joel
On 21-May 10:20, Joel Fernandes wrote: > Hi Patrick, > > On Mon, May 21, 2018 at 06:00:50PM +0100, Patrick Bellasi wrote: > > On 21-May 08:49, Joel Fernandes wrote: > > > On Mon, May 21, 2018 at 11:50:55AM +0100, Patrick Bellasi wrote: > > > > On 18-May 11:55, Joel Fernandes (Google.) wrote: > > > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org> > > > > > > > > > > Currently there is a chance of a schedutil cpufreq update request to be > > > > > dropped if there is a pending update request. This pending request can > > > > > be delayed if there is a scheduling delay of the irq_work and the wake > > > > > up of the schedutil governor kthread. > > > > > > > > > > A very bad scenario is when a schedutil request was already just made, > > > > > such as to reduce the CPU frequency, then a newer request to increase > > > > > CPU frequency (even sched deadline urgent frequency increase requests) > > > > > can be dropped, even though the rate limits suggest that its Ok to > > > > > process a request. This is because of the way the work_in_progress flag > > > > > is used. > > > > > > > > > > This patch improves the situation by allowing new requests to happen > > > > > even though the old one is still being processed. Note that in this > > > > > approach, if an irq_work was already issued, we just update next_freq > > > > > and don't bother to queue another request so there's no extra work being > > > > > done to make this happen. > > > > > > > > Maybe I'm missing something but... is not this patch just a partial > > > > mitigation of the issue you descrive above? > > > > > > > > If a DL freq increase is queued, with this patch we store the request > > > > but we don't actually increase the frequency until the next schedutil > > > > update, which can be one tick away... isn't it? > > > > > > > > If that's the case, maybe something like the following can complete > > > > the cure? > > > > > > We already discussed this and thought of this case, I think you missed a > > > previous thread [1]. The outer loop in the kthread_work subsystem will take > > > care of calling sugov_work again incase another request was queued which we > > > happen to miss. > > > > Ok, I missed that thread... my bad. > > Sure no problem, sorry I was just pointing out the thread, not blaming you > for not reading it ;) Sure, np here too ;) > > However, [1] made me noticing that your solution works under the > > assumption that we keep queuing a new kworker job for each request we > > get, isn't it? > > Not at each request, but each request after work_in_progress was cleared by the > sugov_work. Any requests that happen between work_in_progress is set and > cleared only result in updating of the next_freq. I see, so we enqueue for the time of: mutex_lock(&sg_policy->work_lock); __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L); mutex_unlock(&sg_policy->work_lock); > > If that's the case, this means that if, for example, during a > > frequency switch you get a request to reduce the frequency (e.g. > > deadline task passing the 0-lag time) and right after a request to > > increase the frequency (e.g. the current FAIR task tick)... you will > > enqueue a freq drop followed by a freq increase and actually do two > > frequency hops? > > Yes possibly, Not sure about the time window above, I can try to get some measurements tomorrow. > I see your point but I'm not sure if the tight loop around that > is worth the complexity, or atleast is within the scope of my patch. 
> Perhaps the problem you describe can be looked at as a future enhancement? Sure, I already have it as a patch on top of yours. I can post it afterwards and we can discuss whether it makes sense or not. I still have to check more thoroughly, but I think that maybe we can skip the queueing altogether if some work is already pending... in case we wanna go for a dedicated inner loop like the one I was proposing. Apart from that, I think that your patch already fixes 90% of the issue we have now. > thanks, > > - Joel
On Mon, May 21, 2018 at 11:50:55AM +0100, Patrick Bellasi wrote: > On 18-May 11:55, Joel Fernandes (Google.) wrote: > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org> > > > > Currently there is a chance of a schedutil cpufreq update request to be > > dropped if there is a pending update request. This pending request can > > be delayed if there is a scheduling delay of the irq_work and the wake > > up of the schedutil governor kthread. > > > > A very bad scenario is when a schedutil request was already just made, > > such as to reduce the CPU frequency, then a newer request to increase > > CPU frequency (even sched deadline urgent frequency increase requests) > > can be dropped, even though the rate limits suggest that its Ok to > > process a request. This is because of the way the work_in_progress flag > > is used. > > > > This patch improves the situation by allowing new requests to happen > > even though the old one is still being processed. Note that in this > > approach, if an irq_work was already issued, we just update next_freq > > and don't bother to queue another request so there's no extra work being > > done to make this happen. > > Maybe I'm missing something but... is not this patch just a partial > mitigation of the issue you descrive above? > > If a DL freq increase is queued, with this patch we store the request > but we don't actually increase the frequency until the next schedutil > update, which can be one tick away... isn't it? > > If that's the case, maybe something like the following can complete > the cure? > > ---8<--- > #define SUGOV_FREQ_NONE 0 > > static unsigned int sugov_work_update(struct sugov_policy *sg_policy, > unsigned int prev_freq) > { > unsigned long irq_flags; > bool update_freq = true; > unsigned int next_freq; > > /* > * Hold sg_policy->update_lock shortly to handle the case where: > * incase sg_policy->next_freq is read here, and then updated by > * sugov_update_shared just before work_in_progress is set to false > * here, we may miss queueing the new update. > * > * Note: If a work was queued after the update_lock is released, > * sugov_work will just be called again by kthread_work code; and the > * request will be proceed before the sugov thread sleeps. > */ > raw_spin_lock_irqsave(&sg_policy->update_lock, irq_flags); > next_freq = sg_policy->next_freq; > sg_policy->work_in_progress = false; > if (prev_freq == next_freq) > update_freq = false; About this patch on top of mine, I believe this check is already being done by sugov_update_commit? : static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time, unsigned int next_freq) { struct cpufreq_policy *policy = sg_policy->policy; if (sg_policy->next_freq == next_freq) return; sg_policy->next_freq = next_freq; sg_policy->last_freq_update_time = time; ---- thanks, - Joel
On Mon, May 21, 2018 at 6:13 PM, Joel Fernandes <joel@joelfernandes.org> wrote: > On Mon, May 21, 2018 at 10:29:52AM +0200, Rafael J. Wysocki wrote: >> On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote: >> > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote: >> >> From: "Joel Fernandes (Google)" <joel@joelfernandes.org> >> >> >> >> Currently there is a chance of a schedutil cpufreq update request to be >> >> dropped if there is a pending update request. This pending request can >> >> be delayed if there is a scheduling delay of the irq_work and the wake >> >> up of the schedutil governor kthread. >> >> >> >> A very bad scenario is when a schedutil request was already just made, >> >> such as to reduce the CPU frequency, then a newer request to increase >> >> CPU frequency (even sched deadline urgent frequency increase requests) >> >> can be dropped, even though the rate limits suggest that its Ok to >> >> process a request. This is because of the way the work_in_progress flag >> >> is used. >> >> >> >> This patch improves the situation by allowing new requests to happen >> >> even though the old one is still being processed. Note that in this >> >> approach, if an irq_work was already issued, we just update next_freq >> >> and don't bother to queue another request so there's no extra work being >> >> done to make this happen. >> > >> > Now that this isn't an RFC anymore, you shouldn't have added below >> > paragraph here. It could go to the comments section though. >> > >> >> I had brought up this issue at the OSPM conference and Claudio had a >> >> discussion RFC with an alternate approach [1]. I prefer the approach as >> >> done in the patch below since it doesn't need any new flags and doesn't >> >> cause any other extra overhead. >> >> >> >> [1] https://patchwork.kernel.org/patch/10384261/ >> >> >> >> LGTMed-by: Viresh Kumar <viresh.kumar@linaro.org> >> >> LGTMed-by: Juri Lelli <juri.lelli@redhat.com> >> > >> > Looks like a Tag you just invented ? :) >> >> Yeah. >> >> The LGTM from Juri can be converned into an ACK silently IMO. That >> said I have added Looks-good-to: tags to a couple of commits. :-) > > Cool, I'll covert them to Acks :-) So it looks like I should expect an update of this patch, right? Or do you prefer the current one to be applied and work on top of it? > [..] >> >> v1 -> v2: Minor style related changes. 
>> >> >> >> kernel/sched/cpufreq_schedutil.c | 34 ++++++++++++++++++++++++-------- >> >> 1 file changed, 26 insertions(+), 8 deletions(-) >> >> >> >> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c >> >> index e13df951aca7..5c482ec38610 100644 >> >> --- a/kernel/sched/cpufreq_schedutil.c >> >> +++ b/kernel/sched/cpufreq_schedutil.c >> >> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) >> >> !cpufreq_can_do_remote_dvfs(sg_policy->policy)) >> >> return false; >> >> >> >> - if (sg_policy->work_in_progress) >> >> - return false; >> >> - >> >> if (unlikely(sg_policy->need_freq_update)) { >> >> sg_policy->need_freq_update = false; >> >> /* >> >> @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time, >> >> >> >> policy->cur = next_freq; >> >> trace_cpu_frequency(next_freq, smp_processor_id()); >> >> - } else { >> >> + } else if (!sg_policy->work_in_progress) { >> >> sg_policy->work_in_progress = true; >> >> irq_work_queue(&sg_policy->irq_work); >> >> } >> >> @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, >> >> >> >> ignore_dl_rate_limit(sg_cpu, sg_policy); >> >> >> >> + /* >> >> + * For slow-switch systems, single policy requests can't run at the >> >> + * moment if update is in progress, unless we acquire update_lock. >> >> + */ >> >> + if (sg_policy->work_in_progress) >> >> + return; >> >> + >> > >> > I would still want this to go away :) >> > >> > @Rafael, will it be fine to get locking in place for unshared policy >> > platforms ? >> >> As long as it doesn't affect the fast switch path in any way. > > I just realized that on a single policy switch that uses the governor thread, > there will be 1 thread per-CPU. The sugov_update_single will be called on the > same CPU with interrupts disabled. sugov_update_single() doesn't have to run on the target CPU. > In sugov_work, we are doing a > raw_spin_lock_irqsave which also disables interrupts. So I don't think > there's any possibility of a race happening on the same CPU between the > frequency update request and the sugov_work executing. In other words, I feel > we can drop the above if (..) statement for single policies completely and > only keep the changes for the shared policy. Viresh since you brought up the > single policy issue initially which made me add this if statememnt, could you > let me know if you agree with what I just said? Which is why you need the spinlock too.
On 21-05-18, 10:20, Joel Fernandes wrote: > On Mon, May 21, 2018 at 06:00:50PM +0100, Patrick Bellasi wrote: > > If that's the case, this means that if, for example, during a > > frequency switch you get a request to reduce the frequency (e.g. > > deadline task passing the 0-lag time) and right after a request to > > increase the frequency (e.g. the current FAIR task tick)... you will > > enqueue a freq drop followed by a freq increase and actually do two > > frequency hops? I don't think so. Consider the kthread as running currently and has just cleared the work_in_progress flag. Sched update comes at that time and we decide to reduce the frequency, we queue another work and update next_freq. Now if another sched update comes before the kthread finishes its previous loop, we will simply update next_freq and return. So when the next time kthread runs, it will pick the most recent update. Where is the problem both of you see ?
On 21-May 11:05, Joel Fernandes wrote: > On Mon, May 21, 2018 at 11:50:55AM +0100, Patrick Bellasi wrote: > > On 18-May 11:55, Joel Fernandes (Google.) wrote: > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org> > > > > > > Currently there is a chance of a schedutil cpufreq update request to be > > > dropped if there is a pending update request. This pending request can > > > be delayed if there is a scheduling delay of the irq_work and the wake > > > up of the schedutil governor kthread. > > > > > > A very bad scenario is when a schedutil request was already just made, > > > such as to reduce the CPU frequency, then a newer request to increase > > > CPU frequency (even sched deadline urgent frequency increase requests) > > > can be dropped, even though the rate limits suggest that its Ok to > > > process a request. This is because of the way the work_in_progress flag > > > is used. > > > > > > This patch improves the situation by allowing new requests to happen > > > even though the old one is still being processed. Note that in this > > > approach, if an irq_work was already issued, we just update next_freq > > > and don't bother to queue another request so there's no extra work being > > > done to make this happen. > > > > Maybe I'm missing something but... is not this patch just a partial > > mitigation of the issue you descrive above? > > > > If a DL freq increase is queued, with this patch we store the request > > but we don't actually increase the frequency until the next schedutil > > update, which can be one tick away... isn't it? > > > > If that's the case, maybe something like the following can complete > > the cure? > > > > ---8<--- > > #define SUGOV_FREQ_NONE 0 > > > > static unsigned int sugov_work_update(struct sugov_policy *sg_policy, > > unsigned int prev_freq) > > { > > unsigned long irq_flags; > > bool update_freq = true; > > unsigned int next_freq; > > > > /* > > * Hold sg_policy->update_lock shortly to handle the case where: > > * incase sg_policy->next_freq is read here, and then updated by > > * sugov_update_shared just before work_in_progress is set to false > > * here, we may miss queueing the new update. > > * > > * Note: If a work was queued after the update_lock is released, > > * sugov_work will just be called again by kthread_work code; and the > > * request will be proceed before the sugov thread sleeps. > > */ > > raw_spin_lock_irqsave(&sg_policy->update_lock, irq_flags); > > next_freq = sg_policy->next_freq; > > sg_policy->work_in_progress = false; > > if (prev_freq == next_freq) > > update_freq = false; > > About this patch on top of mine, I believe this check is already being done > by sugov_update_commit? : No, that check is different... > > static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time, > unsigned int next_freq) > { > struct cpufreq_policy *policy = sg_policy->policy; > > if (sg_policy->next_freq == next_freq) > return; > > sg_policy->next_freq = next_freq; > sg_policy->last_freq_update_time = time; > ---- ... in my snippet the check is required to verify if, once a freq swich has been completed by the kthread, the sugov_update_commit has actually committed a new and different frequency wrt the one the kthread has just configured. It means we will have two async paths: 1. sugov_update_commit() which updates sg_policy->next_freq 2. 
sugov_work_update() which will run in a loop until the last freq it configures matches the current value of sg_policy->next_freq. But again, as we were discussing yesterday, we can add these additional bits in a follow-up patch on top of yours.
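A condensed pthread sketch of the two asynchronous paths described above, with the worker looping until the frequency it last configured matches the current next_freq. Illustrative only: the names follow Patrick's earlier snippet, and the work_in_progress queueing dedup is elided:

---8<---
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t update_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int next_freq;	/* written by the update path */

/* Path 1: sugov_update_commit() analogue -- just publish the request. */
static void commit(unsigned int freq)
{
	pthread_mutex_lock(&update_lock);
	next_freq = freq;
	pthread_mutex_unlock(&update_lock);
}

/* Path 2: the worker loops until the frequency it configured matches
 * the most recent request, so an update landing mid-switch is handled
 * right away instead of waiting for the next schedutil update. */
static void worker(void)
{
	unsigned int prev = 0, freq;

	for (;;) {
		pthread_mutex_lock(&update_lock);
		freq = next_freq;
		pthread_mutex_unlock(&update_lock);

		if (freq == prev)
			break;		/* stable: no newer request */
		printf("switch to %u\n", freq);	/* slow frequency switch */
		prev = freq;
	}
}

int main(void)
{
	commit(1200000);
	commit(1800000);	/* arrives before the worker runs */
	worker();		/* one hop straight to 1800000 */
	return 0;
}
---8<---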
Okay, me and Rafael were discussing this patch, locking and races around this. On 18-05-18, 11:55, Joel Fernandes (Google.) wrote: > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c > index e13df951aca7..5c482ec38610 100644 > --- a/kernel/sched/cpufreq_schedutil.c > +++ b/kernel/sched/cpufreq_schedutil.c > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) > !cpufreq_can_do_remote_dvfs(sg_policy->policy)) > return false; > > - if (sg_policy->work_in_progress) > - return false; > - > if (unlikely(sg_policy->need_freq_update)) { > sg_policy->need_freq_update = false; > /* > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time, > > policy->cur = next_freq; > trace_cpu_frequency(next_freq, smp_processor_id()); > - } else { > + } else if (!sg_policy->work_in_progress) { > sg_policy->work_in_progress = true; > irq_work_queue(&sg_policy->irq_work); > } > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, > > ignore_dl_rate_limit(sg_cpu, sg_policy); > > + /* > + * For slow-switch systems, single policy requests can't run at the > + * moment if update is in progress, unless we acquire update_lock. > + */ > + if (sg_policy->work_in_progress) > + return; > + > if (!sugov_should_update_freq(sg_policy, time)) > return; > > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags) > static void sugov_work(struct kthread_work *work) > { > struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work); > + unsigned int freq; > + unsigned long flags; > + > + /* > + * Hold sg_policy->update_lock shortly to handle the case where: > + * incase sg_policy->next_freq is read here, and then updated by > + * sugov_update_shared just before work_in_progress is set to false > + * here, we may miss queueing the new update. > + * > + * Note: If a work was queued after the update_lock is released, > + * sugov_work will just be called again by kthread_work code; and the > + * request will be proceed before the sugov thread sleeps. > + */ > + raw_spin_lock_irqsave(&sg_policy->update_lock, flags); > + freq = sg_policy->next_freq; > + sg_policy->work_in_progress = false; > + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags); > > mutex_lock(&sg_policy->work_lock); > - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, > - CPUFREQ_RELATION_L); > + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L); > mutex_unlock(&sg_policy->work_lock); > - > - sg_policy->work_in_progress = false; > } And I do see a race here for single policy systems doing slow switching. Kthread Sched update sugov_work() sugov_update_single() lock(); // The CPU is free to rearrange below // two in any order, so it may clear // the flag first and then read next // freq. Lets assume it does. work_in_progress = false if (work_in_progress) return; sg_policy->next_freq = 0; freq = sg_policy->next_freq; sg_policy->next_freq = real-next-freq; unlock(); Is the above theory right or am I day dreaming ? :)
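For readers unfamiliar with the primitives that come up in the reply below: on scalar types, the kernel's READ_ONCE()/WRITE_ONCE() boil down to volatile accesses, which force exactly one untorn load or store and keep the compiler from reordering the marked accesses against each other; they do not order the CPU, which is what the update_lock is for. A rough userspace approximation using the GCC-style typeof extension, a sketch only since the real macros handle more cases:

---8<---
#include <stdio.h>

/*
 * Rough userspace approximation of the kernel's READ_ONCE/WRITE_ONCE:
 * a volatile access forces a single, untorn load/store and keeps the
 * compiler from reordering or eliding it. It does NOT order the CPU;
 * cross-CPU ordering still needs the lock or explicit barriers.
 */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

static unsigned int next_freq;
static int work_in_progress;

int main(void)
{
	unsigned int freq;

	/*
	 * With plain accesses the compiler may clear the flag before
	 * reading the frequency -- the reordering in the race above.
	 * The marked accesses pin their relative order at compile time.
	 */
	WRITE_ONCE(next_freq, 1000000);
	freq = READ_ONCE(next_freq);
	WRITE_ONCE(work_in_progress, 0);

	printf("%u %d\n", freq, work_in_progress);
	return 0;
}
---8<---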
Hi Viresh,
thanks for clarifying...

On 22-May 15:53, Viresh Kumar wrote:
> On 21-05-18, 10:20, Joel Fernandes wrote:
> > On Mon, May 21, 2018 at 06:00:50PM +0100, Patrick Bellasi wrote:
> > > If that's the case, this means that if, for example, during a
> > > frequency switch you get a request to reduce the frequency (e.g.
> > > deadline task passing the 0-lag time) and right after a request to
> > > increase the frequency (e.g. the current FAIR task tick)... you will
> > > enqueue a freq drop followed by a freq increase and actually do two
> > > frequency hops?
>
> I don't think so.
>
> Consider the kthread as running currently and has just cleared the
> work_in_progress flag. Sched update comes at that time and we decide
> to reduce the frequency, we queue another work and update next_freq.
> Now if another sched update comes before the kthread finishes its
> previous loop, we will simply update next_freq and return. So when the
> next time kthread runs, it will pick the most recent update.

Mmm... right... looking better at the two execution contexts:

   // A) Frequency update requests
   sugov_update_commit() {
      sg_policy->next_freq = next_freq;
      if (!sg_policy->work_in_progress) {
         sg_policy->work_in_progress = true;
         irq_work_queue(&sg_policy->irq_work);
      }
   }

   // B) Actual frequency updates
   sugov_work() {
      freq = sg_policy->next_freq;
      sg_policy->work_in_progress = false;
      __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
   }

It's true that A will enqueue only one B at the first next_freq update,
and then it will just keep updating next_freq. Thus, we should be sure to
always have just one kwork pending in the queue.

> Where is the problem both of you see ?

Perhaps the confusion comes just from the naming of "work_in_progress",
which is confusing since we now use it to represent that we have enqueued
a frequency change and are waiting for the kwork to pick it up.

Maybe it would help to rename it to something like kwork_queued,
update_pending, update_queued... ?
On 22-May 16:04, Viresh Kumar wrote:
> Okay, me and Rafael were discussing this patch, locking and races around this.
>
> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> >  static void sugov_work(struct kthread_work *work)
> >  {
> >  	struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> > +	unsigned int freq;
> > +	unsigned long flags;
> > +
> > +	/*
> > +	 * Hold sg_policy->update_lock shortly to handle the case where:
> > +	 * incase sg_policy->next_freq is read here, and then updated by
> > +	 * sugov_update_shared just before work_in_progress is set to false
> > +	 * here, we may miss queueing the new update.
> > +	 *
> > +	 * Note: If a work was queued after the update_lock is released,
> > +	 * sugov_work will just be called again by kthread_work code; and the
> > +	 * request will be proceed before the sugov thread sleeps.
> > +	 */
> > +	raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> > +	freq = sg_policy->next_freq;
> > +	sg_policy->work_in_progress = false;
> > +	raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
> >  
> >  	mutex_lock(&sg_policy->work_lock);
> > -	__cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> > -				CPUFREQ_RELATION_L);
> > +	__cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> >  	mutex_unlock(&sg_policy->work_lock);
> > -
> > -	sg_policy->work_in_progress = false;
> >  }
>
> And I do see a race here for single policy systems doing slow switching.
>
> 	Kthread					Sched update
>
> 	sugov_work()				sugov_update_single()
>
> 	lock();
> 	// The CPU is free to rearrange below
> 	// two in any order, so it may clear
> 	// the flag first and then read next
> 	// freq. Lets assume it does.
> 	work_in_progress = false
>
> 						if (work_in_progress)
> 							return;
>
> 						sg_policy->next_freq = 0;
> 	freq = sg_policy->next_freq;
> 						sg_policy->next_freq = real-next-freq;
> 	unlock();
>
> Is the above theory right or am I day dreaming ? :)

It could happen, but using:

	raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
	freq = READ_ONCE(sg_policy->next_freq);
	WRITE_ONCE(sg_policy->work_in_progress, false);
	raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);

	if (!READ_ONCE(sg_policy->work_in_progress)) {
		WRITE_ONCE(sg_policy->work_in_progress, true);
		irq_work_queue(&sg_policy->irq_work);
	}

should fix it by enforcing the ordering as well as documenting the
concurrent access.

However, in the "sched update" side, where do we have the sequence:

	sg_policy->next_freq = 0;
	sg_policy->next_freq = real-next-freq;

AFAICS we always use locals for next_freq and do one single assignment
in sugov_update_commit(), isn't it?
On 22-05-18, 11:51, Patrick Bellasi wrote:
> It could happen, but using:
>
> 	raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> 	freq = READ_ONCE(sg_policy->next_freq);
> 	WRITE_ONCE(sg_policy->work_in_progress, false);
> 	raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>
> 	if (!READ_ONCE(sg_policy->work_in_progress)) {
> 		WRITE_ONCE(sg_policy->work_in_progress, true);
> 		irq_work_queue(&sg_policy->irq_work);
> 	}

I think it's better to get locking in place for the non-fast-switching
case on single-policy systems right now.

> should fix it by enforcing the ordering as well as documenting the
> concurrent access.
>
> However, in the "sched update" side, where do we have the sequence:
>
> 	sg_policy->next_freq = 0;
> 	sg_policy->next_freq = real-next-freq;

Ah, that was just an example of what a compiler may do (though it
shouldn't do).
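As an aside on what READ_ONCE()/WRITE_ONCE() buy in this exchange: at
heart they are volatile accesses, which forbid the compiler from tearing,
refetching, or inventing intermediate stores like the next_freq = 0
example above; they do not order accesses at the CPU level, which is what
the lock provides here. A minimal model — the MY_ names are made up for
illustration, and the real kernel macros in <linux/compiler.h> do
additional size and type checking:

	/* Illustrative model only -- not the kernel's definitions.
	 * typeof is GNU C, matching the kernel's dialect. */
	#define MY_WRITE_ONCE(x, val)	(*(volatile typeof(x) *)&(x) = (val))
	#define MY_READ_ONCE(x)		(*(volatile typeof(x) *)&(x))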
On Tuesday, May 22, 2018 12:02:24 PM CEST Rafael J. Wysocki wrote:
> On Mon, May 21, 2018 at 6:13 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
> > On Mon, May 21, 2018 at 10:29:52AM +0200, Rafael J. Wysocki wrote:
> >> On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >> > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> >> >> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> >> >>
> >> >> Currently there is a chance of a schedutil cpufreq update request to be
> >> >> dropped if there is a pending update request. This pending request can
> >> >> be delayed if there is a scheduling delay of the irq_work and the wake
> >> >> up of the schedutil governor kthread.
> >> >>
> >> >> A very bad scenario is when a schedutil request was already just made,
> >> >> such as to reduce the CPU frequency, then a newer request to increase
> >> >> CPU frequency (even sched deadline urgent frequency increase requests)
> >> >> can be dropped, even though the rate limits suggest that its Ok to
> >> >> process a request. This is because of the way the work_in_progress flag
> >> >> is used.
> >> >>
> >> >> This patch improves the situation by allowing new requests to happen
> >> >> even though the old one is still being processed. Note that in this
> >> >> approach, if an irq_work was already issued, we just update next_freq
> >> >> and don't bother to queue another request so there's no extra work being
> >> >> done to make this happen.
> >> >
> >> > Now that this isn't an RFC anymore, you shouldn't have added below
> >> > paragraph here. It could go to the comments section though.
> >> >
> >> >> I had brought up this issue at the OSPM conference and Claudio had a
> >> >> discussion RFC with an alternate approach [1]. I prefer the approach as
> >> >> done in the patch below since it doesn't need any new flags and doesn't
> >> >> cause any other extra overhead.
> >> >>
> >> >> [1] https://patchwork.kernel.org/patch/10384261/
> >> >>
> >> >> LGTMed-by: Viresh Kumar <viresh.kumar@linaro.org>
> >> >> LGTMed-by: Juri Lelli <juri.lelli@redhat.com>
> >> >
> >> > Looks like a Tag you just invented ? :)
> >>
> >> Yeah.
> >>
> >> The LGTM from Juri can be converted into an ACK silently IMO. That
> >> said I have added Looks-good-to: tags to a couple of commits. :-)
> >
> > Cool, I'll convert them to Acks :-)
>
> So it looks like I should expect an update of this patch, right?
>
> Or do you prefer the current one to be applied and work on top of it?

Well, sorry, I can't apply this one as it is racy.
On Tue, May 22, 2018 at 12:02 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Mon, May 21, 2018 at 6:13 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
>> On Mon, May 21, 2018 at 10:29:52AM +0200, Rafael J. Wysocki wrote:
>>> On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>> > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
>>> >> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
>>> >>
>>> >> Currently there is a chance of a schedutil cpufreq update request to be
>>> >> dropped if there is a pending update request. This pending request can
>>> >> be delayed if there is a scheduling delay of the irq_work and the wake
>>> >> up of the schedutil governor kthread.
>>> >>
>>> >> A very bad scenario is when a schedutil request was already just made,
>>> >> such as to reduce the CPU frequency, then a newer request to increase
>>> >> CPU frequency (even sched deadline urgent frequency increase requests)
>>> >> can be dropped, even though the rate limits suggest that its Ok to
>>> >> process a request. This is because of the way the work_in_progress flag
>>> >> is used.
>>> >>
>>> >> This patch improves the situation by allowing new requests to happen
>>> >> even though the old one is still being processed. Note that in this
>>> >> approach, if an irq_work was already issued, we just update next_freq
>>> >> and don't bother to queue another request so there's no extra work being
>>> >> done to make this happen.
>>> >
>>> > Now that this isn't an RFC anymore, you shouldn't have added below
>>> > paragraph here. It could go to the comments section though.
>>> >
>>> >> I had brought up this issue at the OSPM conference and Claudio had a
>>> >> discussion RFC with an alternate approach [1]. I prefer the approach as
>>> >> done in the patch below since it doesn't need any new flags and doesn't
>>> >> cause any other extra overhead.
>>> >>
>>> >> [1] https://patchwork.kernel.org/patch/10384261/
>>> >>
>>> >> LGTMed-by: Viresh Kumar <viresh.kumar@linaro.org>
>>> >> LGTMed-by: Juri Lelli <juri.lelli@redhat.com>
>>> >
>>> > Looks like a Tag you just invented ? :)
>>>
>>> Yeah.
>>>
>>> The LGTM from Juri can be converted into an ACK silently IMO. That
>>> said I have added Looks-good-to: tags to a couple of commits. :-)
>>
>> Cool, I'll convert them to Acks :-)
>
> So it looks like I should expect an update of this patch, right?
>
> Or do you prefer the current one to be applied and work on top of it?

[cut]

>> I just realized that on a single policy switch that uses the governor thread,
>> there will be 1 thread per-CPU. The sugov_update_single will be called on the
>> same CPU with interrupts disabled.
>
> sugov_update_single() doesn't have to run on the target CPU.

Which sadly is a bug IMO. :-/

>> In sugov_work, we are doing a
>> raw_spin_lock_irqsave which also disables interrupts. So I don't think
>> there's any possibility of a race happening on the same CPU between the
>> frequency update request and the sugov_work executing. In other words, I feel
>> we can drop the above if (..) statement for single policies completely and
>> only keep the changes for the shared policy. Viresh since you brought up the
>> single policy issue initially which made me add this if statement, could you
>> let me know if you agree with what I just said?
>
> Which is why you need the spinlock too.

And you totally have a point.
With the above bug fixed, disabling interrupts should be sufficient to prevent concurrent updates from occurring in the one-CPU policy case and the work_in_progress check in sugov_update_single() isn't necessary.
On Tue, May 22, 2018 at 5:30 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, May 22, 2018 at 12:02 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Mon, May 21, 2018 at 6:13 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
>>> On Mon, May 21, 2018 at 10:29:52AM +0200, Rafael J. Wysocki wrote:
>>>> On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>>> > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
>>>> >> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
>>>> >>
>>>> >> Currently there is a chance of a schedutil cpufreq update request to be
>>>> >> dropped if there is a pending update request. This pending request can
>>>> >> be delayed if there is a scheduling delay of the irq_work and the wake
>>>> >> up of the schedutil governor kthread.
>>>> >>
>>>> >> A very bad scenario is when a schedutil request was already just made,
>>>> >> such as to reduce the CPU frequency, then a newer request to increase
>>>> >> CPU frequency (even sched deadline urgent frequency increase requests)
>>>> >> can be dropped, even though the rate limits suggest that its Ok to
>>>> >> process a request. This is because of the way the work_in_progress flag
>>>> >> is used.
>>>> >>
>>>> >> This patch improves the situation by allowing new requests to happen
>>>> >> even though the old one is still being processed. Note that in this
>>>> >> approach, if an irq_work was already issued, we just update next_freq
>>>> >> and don't bother to queue another request so there's no extra work being
>>>> >> done to make this happen.
>>>> >
>>>> > Now that this isn't an RFC anymore, you shouldn't have added below
>>>> > paragraph here. It could go to the comments section though.
>>>> >
>>>> >> I had brought up this issue at the OSPM conference and Claudio had a
>>>> >> discussion RFC with an alternate approach [1]. I prefer the approach as
>>>> >> done in the patch below since it doesn't need any new flags and doesn't
>>>> >> cause any other extra overhead.
>>>> >>
>>>> >> [1] https://patchwork.kernel.org/patch/10384261/
>>>> >>
>>>> >> LGTMed-by: Viresh Kumar <viresh.kumar@linaro.org>
>>>> >> LGTMed-by: Juri Lelli <juri.lelli@redhat.com>
>>>> >
>>>> > Looks like a Tag you just invented ? :)
>>>>
>>>> Yeah.
>>>>
>>>> The LGTM from Juri can be converted into an ACK silently IMO. That
>>>> said I have added Looks-good-to: tags to a couple of commits. :-)
>>>
>>> Cool, I'll convert them to Acks :-)
>>
>> So it looks like I should expect an update of this patch, right?
>>
>> Or do you prefer the current one to be applied and work on top of it?
>
> [cut]
>
>>> I just realized that on a single policy switch that uses the governor thread,
>>> there will be 1 thread per-CPU. The sugov_update_single will be called on the
>>> same CPU with interrupts disabled.
>>
>> sugov_update_single() doesn't have to run on the target CPU.
>
> Which sadly is a bug IMO. :-/

My bad.

sugov_update_single() runs under rq->lock, so it need not run on a target
CPU so long as the CPU running it can update the frequency for the target,
and there is the requisite check for that in sugov_should_update_freq().

That means that sugov_update_single() will not run concurrently on two
different CPUs for the same target, but it may be running concurrently
with the kthread (as pointed out by Viresh).

>>> In sugov_work, we are doing a
>>> raw_spin_lock_irqsave which also disables interrupts. So I don't think
>>> there's any possibility of a race happening on the same CPU between the
>>> frequency update request and the sugov_work executing. In other words, I feel
>>> we can drop the above if (..) statement for single policies completely and
>>> only keep the changes for the shared policy. Viresh since you brought up the
>>> single policy issue initially which made me add this if statement, could you
>>> let me know if you agree with what I just said?
>>
>> Which is why you need the spinlock too.
>
> And you totally have a point.

Not really, sorry about that.

It is necessary to take the spinlock in the non-fast-switch case, because
of the possible race with the kthread, so something like my patch at
https://patchwork.kernel.org/patch/10418551/ is needed after all.
On Tue, May 22, 2018 at 04:04:15PM +0530, Viresh Kumar wrote:
> Okay, me and Rafael were discussing this patch, locking and races around this.
>
> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index e13df951aca7..5c482ec38610 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >  	    !cpufreq_can_do_remote_dvfs(sg_policy->policy))
> >  		return false;
> >  
> > -	if (sg_policy->work_in_progress)
> > -		return false;
> > -
> >  	if (unlikely(sg_policy->need_freq_update)) {
> >  		sg_policy->need_freq_update = false;
> >  		/*
> > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> >  
> >  		policy->cur = next_freq;
> >  		trace_cpu_frequency(next_freq, smp_processor_id());
> > -	} else {
> > +	} else if (!sg_policy->work_in_progress) {
> >  		sg_policy->work_in_progress = true;
> >  		irq_work_queue(&sg_policy->irq_work);
> >  	}
> > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >  
> >  	ignore_dl_rate_limit(sg_cpu, sg_policy);
> >  
> > +	/*
> > +	 * For slow-switch systems, single policy requests can't run at the
> > +	 * moment if update is in progress, unless we acquire update_lock.
> > +	 */
> > +	if (sg_policy->work_in_progress)
> > +		return;
> > +
> >  	if (!sugov_should_update_freq(sg_policy, time))
> >  		return;
> >  
> > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> >  static void sugov_work(struct kthread_work *work)
> >  {
> >  	struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> > +	unsigned int freq;
> > +	unsigned long flags;
> > +
> > +	/*
> > +	 * Hold sg_policy->update_lock shortly to handle the case where:
> > +	 * incase sg_policy->next_freq is read here, and then updated by
> > +	 * sugov_update_shared just before work_in_progress is set to false
> > +	 * here, we may miss queueing the new update.
> > +	 *
> > +	 * Note: If a work was queued after the update_lock is released,
> > +	 * sugov_work will just be called again by kthread_work code; and the
> > +	 * request will be proceed before the sugov thread sleeps.
> > +	 */
> > +	raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> > +	freq = sg_policy->next_freq;
> > +	sg_policy->work_in_progress = false;
> > +	raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
> >  
> >  	mutex_lock(&sg_policy->work_lock);
> > -	__cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> > -				CPUFREQ_RELATION_L);
> > +	__cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> >  	mutex_unlock(&sg_policy->work_lock);
> > -
> > -	sg_policy->work_in_progress = false;
> >  }
>
> And I do see a race here for single policy systems doing slow switching.
>
> 	Kthread					Sched update
>
> 	sugov_work()				sugov_update_single()
>
> 	lock();
> 	// The CPU is free to rearrange below
> 	// two in any order, so it may clear
> 	// the flag first and then read next
> 	// freq. Lets assume it does.
> 	work_in_progress = false
>
> 						if (work_in_progress)
> 							return;
>
> 						sg_policy->next_freq = 0;
> 	freq = sg_policy->next_freq;
> 						sg_policy->next_freq = real-next-freq;
> 	unlock();
>

I agree with the race you describe for single policy slow-switch. Good find :)

The mainline sugov_work could also do such reordering, I think. Even with
the mutex_unlock in mainline's sugov_work, that work_in_progress write
could be reordered by the CPU to happen before the read of next_freq.
AIUI, mutex_unlock is expected to be only a release-barrier.

Although to be safe, I could just put an smp_mb() there. I believe with
that, no locking would be needed for such case.

I'll send out a v3 with Acks for the original patch, and then send out
the smp_mb() as a separate patch if that's Ok.

thanks,

- Joel
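For reference, a sketch of where such a barrier would sit in mainline's
pre-patch sugov_work() — an assumption about the intent here, not a
posted patch:

	static void sugov_work(struct kthread_work *work)
	{
		struct sugov_policy *sg_policy =
			container_of(work, struct sugov_policy, work);

		mutex_lock(&sg_policy->work_lock);
		__cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
					CPUFREQ_RELATION_L);
		mutex_unlock(&sg_policy->work_lock);

		/*
		 * Full barrier: the read of next_freq above must complete
		 * before other CPUs can observe work_in_progress == false.
		 * The unlock alone is only a release, so the store below
		 * could otherwise move up past the read.
		 */
		smp_mb();
		sg_policy->work_in_progress = false;
	}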
On Wed, May 23, 2018 at 12:09 AM, Joel Fernandes <joel@joelfernandes.org> wrote:
> On Tue, May 22, 2018 at 04:04:15PM +0530, Viresh Kumar wrote:
>> Okay, me and Rafael were discussing this patch, locking and races around this.
>>
>> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
>> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> > index e13df951aca7..5c482ec38610 100644
>> > --- a/kernel/sched/cpufreq_schedutil.c
>> > +++ b/kernel/sched/cpufreq_schedutil.c
>> > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>> >  	    !cpufreq_can_do_remote_dvfs(sg_policy->policy))
>> >  		return false;
>> >  
>> > -	if (sg_policy->work_in_progress)
>> > -		return false;
>> > -
>> >  	if (unlikely(sg_policy->need_freq_update)) {
>> >  		sg_policy->need_freq_update = false;
>> >  		/*
>> > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>> >  
>> >  		policy->cur = next_freq;
>> >  		trace_cpu_frequency(next_freq, smp_processor_id());
>> > -	} else {
>> > +	} else if (!sg_policy->work_in_progress) {
>> >  		sg_policy->work_in_progress = true;
>> >  		irq_work_queue(&sg_policy->irq_work);
>> >  	}
>> > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>> >  
>> >  	ignore_dl_rate_limit(sg_cpu, sg_policy);
>> >  
>> > +	/*
>> > +	 * For slow-switch systems, single policy requests can't run at the
>> > +	 * moment if update is in progress, unless we acquire update_lock.
>> > +	 */
>> > +	if (sg_policy->work_in_progress)
>> > +		return;
>> > +
>> >  	if (!sugov_should_update_freq(sg_policy, time))
>> >  		return;
>> >  
>> > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
>> >  static void sugov_work(struct kthread_work *work)
>> >  {
>> >  	struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
>> > +	unsigned int freq;
>> > +	unsigned long flags;
>> > +
>> > +	/*
>> > +	 * Hold sg_policy->update_lock shortly to handle the case where:
>> > +	 * incase sg_policy->next_freq is read here, and then updated by
>> > +	 * sugov_update_shared just before work_in_progress is set to false
>> > +	 * here, we may miss queueing the new update.
>> > +	 *
>> > +	 * Note: If a work was queued after the update_lock is released,
>> > +	 * sugov_work will just be called again by kthread_work code; and the
>> > +	 * request will be proceed before the sugov thread sleeps.
>> > +	 */
>> > +	raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
>> > +	freq = sg_policy->next_freq;
>> > +	sg_policy->work_in_progress = false;
>> > +	raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>> >  
>> >  	mutex_lock(&sg_policy->work_lock);
>> > -	__cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
>> > -				CPUFREQ_RELATION_L);
>> > +	__cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
>> >  	mutex_unlock(&sg_policy->work_lock);
>> > -
>> > -	sg_policy->work_in_progress = false;
>> >  }
>>
>> And I do see a race here for single policy systems doing slow switching.
>>
>> 	Kthread					Sched update
>>
>> 	sugov_work()				sugov_update_single()
>>
>> 	lock();
>> 	// The CPU is free to rearrange below
>> 	// two in any order, so it may clear
>> 	// the flag first and then read next
>> 	// freq. Lets assume it does.
>> 	work_in_progress = false
>>
>> 						if (work_in_progress)
>> 							return;
>>
>> 						sg_policy->next_freq = 0;
>> 	freq = sg_policy->next_freq;
>> 						sg_policy->next_freq = real-next-freq;
>> 	unlock();
>>
>
> I agree with the race you describe for single policy slow-switch. Good find :)
>
> The mainline sugov_work could also do such reordering, I think. Even with
> the mutex_unlock in mainline's sugov_work, that work_in_progress write
> could be reordered by the CPU to happen before the read of next_freq.
> AIUI, mutex_unlock is expected to be only a release-barrier.
>
> Although to be safe, I could just put an smp_mb() there. I believe with
> that, no locking would be needed for such case.

Yes, but leaving the work_in_progress check in sugov_update_single()
means that the original problem is still there in the one-CPU policy
case.

Namely, utilization updates coming in between setting work_in_progress
in sugov_update_commit() and clearing it in sugov_work() will be
discarded in the one-CPU policy case, but not in the shared policy case.

> I'll send out a v3 with Acks for the original patch,

OK

> and then send out the smp_mb() as a separate patch if that's Ok.

I would prefer to use a spinlock in the one-CPU policy non-fast-switch
case and remove the work_in_progress check from sugov_update_single().

I can do a patch on top of yours for that. In fact, I've done that
already. :-)

Thanks,
Rafael
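A sketch of the direction described here — this is an assumption about
the shape of the follow-up, not the actual patch at the link above, and
sugov_next_freq() is a hypothetical helper standing in for the usual
utilization-to-frequency math:

	/*
	 * Sketch only: one-CPU policies that cannot fast switch take
	 * update_lock around the commit, so the kthread's snapshot-and-
	 * clear in sugov_work() cannot interleave with it; the early
	 * work_in_progress bail-out then goes away entirely.
	 */
	static void sugov_update_single(struct update_util_data *hook,
					u64 time, unsigned int flags)
	{
		struct sugov_cpu *sg_cpu =
			container_of(hook, struct sugov_cpu, update_util);
		struct sugov_policy *sg_policy = sg_cpu->sg_policy;
		unsigned int next_f;

		if (!sugov_should_update_freq(sg_policy, time))
			return;

		/* hypothetical helper: compute the target frequency */
		next_f = sugov_next_freq(sg_cpu, time);

		if (sg_policy->policy->fast_switch_enabled) {
			sugov_update_commit(sg_policy, time, next_f);
			return;
		}

		/*
		 * Slow switch: serialize against sugov_work(), as the
		 * shared path already does. Plain raw_spin_lock() is
		 * enough because update_util hooks run under rq->lock
		 * with interrupts disabled.
		 */
		raw_spin_lock(&sg_policy->update_lock);
		sugov_update_commit(sg_policy, time, next_f);
		raw_spin_unlock(&sg_policy->update_lock);
	}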
On 22-05-18, 15:09, Joel Fernandes wrote:
> I agree with the race you describe for single policy slow-switch. Good find :)
>
> The mainline sugov_work could also do such reordering, I think. Even with
> the mutex_unlock in mainline's sugov_work, that work_in_progress write
> could be reordered by the CPU to happen before the read of next_freq.
> AIUI, mutex_unlock is expected to be only a release-barrier.
>
> Although to be safe, I could just put an smp_mb() there. I believe with
> that, no locking would be needed for such case.
>
> I'll send out a v3 with Acks for the original patch, and then send out
> the smp_mb() as a separate patch if that's Ok.

Maybe it would be better to get the fix (with smp_mb) in first, and then
this optimization patch on top? That would mean that the fix can get
applied to stable kernels easily.
On May 23, 2018 2:01:01 AM PDT, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 22-05-18, 15:09, Joel Fernandes wrote:
>> I agree with the race you describe for single policy slow-switch. Good find :)
>>
>> The mainline sugov_work could also do such reordering, I think. Even with
>> the mutex_unlock in mainline's sugov_work, that work_in_progress write
>> could be reordered by the CPU to happen before the read of next_freq.
>> AIUI, mutex_unlock is expected to be only a release-barrier.
>>
>> Although to be safe, I could just put an smp_mb() there. I believe with
>> that, no locking would be needed for such case.
>>
>> I'll send out a v3 with Acks for the original patch, and then send out
>> the smp_mb() as a separate patch if that's Ok.
>
> Maybe it would be better to get the fix (with smp_mb) in first, and then
> this optimization patch on top? That would mean that the fix can get
> applied to stable kernels easily.

Probably. But then Rafael is changing single policy to use the lock, so
the barrier wouldn't be needed at all. In that case, both mine and
Rafael's new patch can go into stable, which handles your race
(optimization == fix in this case :P)

thanks,

- Joel
On 23-05-18, 02:42, Joel Fernandes wrote:
> Probably. But then Rafael is changing single policy to use the lock, so
> the barrier wouldn't be needed at all. In that case, both mine and
> Rafael's new patch can go into stable, which handles your race
> (optimization == fix in this case :P)

Yeah, we discussed that offline.

Go get some sleep. There are no barriers in this world :)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e13df951aca7..5c482ec38610 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 	    !cpufreq_can_do_remote_dvfs(sg_policy->policy))
 		return false;
 
-	if (sg_policy->work_in_progress)
-		return false;
-
 	if (unlikely(sg_policy->need_freq_update)) {
 		sg_policy->need_freq_update = false;
 		/*
@@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
 
 		policy->cur = next_freq;
 		trace_cpu_frequency(next_freq, smp_processor_id());
-	} else {
+	} else if (!sg_policy->work_in_progress) {
 		sg_policy->work_in_progress = true;
 		irq_work_queue(&sg_policy->irq_work);
 	}
@@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 
 	ignore_dl_rate_limit(sg_cpu, sg_policy);
 
+	/*
+	 * For slow-switch systems, single policy requests can't run at the
+	 * moment if update is in progress, unless we acquire update_lock.
+	 */
+	if (sg_policy->work_in_progress)
+		return;
+
 	if (!sugov_should_update_freq(sg_policy, time))
 		return;
 
@@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
 static void sugov_work(struct kthread_work *work)
 {
 	struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
+	unsigned int freq;
+	unsigned long flags;
+
+	/*
+	 * Hold sg_policy->update_lock shortly to handle the case where:
+	 * incase sg_policy->next_freq is read here, and then updated by
+	 * sugov_update_shared just before work_in_progress is set to false
+	 * here, we may miss queueing the new update.
+	 *
+	 * Note: If a work was queued after the update_lock is released,
+	 * sugov_work will just be called again by kthread_work code; and the
+	 * request will be proceed before the sugov thread sleeps.
+	 */
+	raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
+	freq = sg_policy->next_freq;
+	sg_policy->work_in_progress = false;
+	raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
 
 	mutex_lock(&sg_policy->work_lock);
-	__cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
-				CPUFREQ_RELATION_L);
+	__cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
 	mutex_unlock(&sg_policy->work_lock);
-
-	sg_policy->work_in_progress = false;
 }
 
 static void sugov_irq_work(struct irq_work *irq_work)