Message ID | 20220411061702.22978-1-kuyo.chang@mediatek.com |
---|---|
State | New, archived |
Series | [1/1] sched/pelt: Refine the enqueue_load_avg calculate method |
On Mon, Apr 11, 2022 at 02:16:56PM +0800, Kuyo Chang wrote: > From: kuyo chang <kuyo.chang@mediatek.com> > > I meet the warning message at cfs_rq_is_decayed at below code. > > SCHED_WARN_ON(cfs_rq->avg.load_avg || > cfs_rq->avg.util_avg || > cfs_rq->avg.runnable_avg) > > Following is the calltrace. > > Call trace: > __update_blocked_fair > update_blocked_averages > newidle_balance > pick_next_task_fair > __schedule > schedule > pipe_read > vfs_read > ksys_read > > After code analyzing and some debug messages, I found it exits a corner > case at attach_entity_load_avg which will cause load_sum is zero and > load_avg is not. > Consider se_weight is 88761 according by sched_prio_to_weight table. > And assume the get_pelt_divider() is 47742, se->avg.load_avg is 1. > By the calculating for se->avg.load_sum as following will become zero > as following. > se->avg.load_sum = > div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se)); > se->avg.load_sum = 1*47742/88761 = 0. > > After enqueue_load_avg code as below. > cfs_rq->avg.load_avg += se->avg.load_avg; > cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum; > > Then the load_sum for cfs_rq will be 1 while the load_sum for cfs_rq is 0. > So it will hit the warning message. > > After all, I refer the following commit patch to do the similar thing at > enqueue_load_avg. > sched/pelt: Relax the sync of load_sum with load_avg 2d02fa8cc21a ("sched/pelt: Relax the sync of load_sum with load_avg") > > After long time testing, the kernel warning was gone and the system runs > as well as before. > Could probably do with a Fixes: tag too > Signed-off-by: kuyo chang <kuyo.chang@mediatek.com> > --- > kernel/sched/fair.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index d4bd299d67ab..30d8b6dba249 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -3074,8 +3074,10 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se) > static inline void > enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) > { > - cfs_rq->avg.load_avg += se->avg.load_avg; > - cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum; > + add_positive(&cfs_rq->avg.load_avg, se->avg.load_avg); > + add_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum); > + cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum, > + cfs_rq->avg.load_avg * PELT_MIN_DIVIDER); > } Yes,. that looks right. Looking at this again though, I wonder if we should perhaps clean is all up a bit. Does something like the below on top make sense? --- --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3071,23 +3071,37 @@ account_entity_dequeue(struct cfs_rq *cf } while (0) #ifdef CONFIG_SMP + +/* + * Because of rounding, se->util_sum might ends up being +1 more than + * cfs->util_sum. Although this is not a problem by itself, detaching + * a lot of tasks with the rounding problem between 2 updates of + * util_avg (~1ms) can make cfs->util_sum becoming null whereas + * cfs_util_avg is not. + * + * Check that util_sum is still above its lower bound for the new + * util_avg. 
Given that period_contrib might have moved since the last + * sync, we are only sure that util_sum must be above or equal to + * util_avg * minimum possible divider + */ +#define __update_sa(sa, name, delta_avg, delta_sum) do { \ + add_positive(&(sa)->name##_avg, delta_avg); \ + add_positive(&(sa)->name##_sum, delta_sum); \ + (sa)->name##_sum = max_t(typeof((sa)->name##_sum), \ + (sa)->name##_sum, \ + (sa)->name##_avg * PELT_MIN_DIVIDER); \ +} while (0) + static inline void enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) { - add_positive(&cfs_rq->avg.load_avg, se->avg.load_avg); - add_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum); - cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum, - cfs_rq->avg.load_avg * PELT_MIN_DIVIDER); + __update_sa(&cfs_rq->avg, load, se->avg.load_avg, se->avg.load_sum); } static inline void dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) { - sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg); - sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum); - /* See update_cfs_rq_load_avg() */ - cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum, - cfs_rq->avg.load_avg * PELT_MIN_DIVIDER); + __update_sa(&cfs_rq->avg, load, -se->avg.load_avg, -se->avg.load_sum); } #else static inline void @@ -3529,12 +3543,7 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq se->avg.util_sum = new_sum; /* Update parent cfs_rq utilization */ - add_positive(&cfs_rq->avg.util_avg, delta_avg); - add_positive(&cfs_rq->avg.util_sum, delta_sum); - - /* See update_cfs_rq_load_avg() */ - cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum, - cfs_rq->avg.util_avg * PELT_MIN_DIVIDER); + __update_sa(&cfs_rq->avg, util, delta_avg, delta_sum); } static inline void @@ -3560,11 +3569,7 @@ update_tg_cfs_runnable(struct cfs_rq *cf se->avg.runnable_sum = new_sum; /* Update parent cfs_rq runnable */ - add_positive(&cfs_rq->avg.runnable_avg, delta_avg); - add_positive(&cfs_rq->avg.runnable_sum, delta_sum); - /* See update_cfs_rq_load_avg() */ - cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum, - cfs_rq->avg.runnable_avg * PELT_MIN_DIVIDER); + __update_sa(&cfs_rq->avg, runnable, delta_avg, delta_sum); } static inline void @@ -3628,11 +3633,7 @@ update_tg_cfs_load(struct cfs_rq *cfs_rq se->avg.load_sum = runnable_sum; se->avg.load_avg = load_avg; - add_positive(&cfs_rq->avg.load_avg, delta_avg); - add_positive(&cfs_rq->avg.load_sum, delta_sum); - /* See update_cfs_rq_load_avg() */ - cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum, - cfs_rq->avg.load_avg * PELT_MIN_DIVIDER); + __update_sa(&cfs_rq->avg, load, delta_avg, delta_sum); } static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum) @@ -3747,33 +3748,13 @@ update_cfs_rq_load_avg(u64 now, struct c raw_spin_unlock(&cfs_rq->removed.lock); r = removed_load; - sub_positive(&sa->load_avg, r); - sub_positive(&sa->load_sum, r * divider); - /* See sa->util_sum below */ - sa->load_sum = max_t(u32, sa->load_sum, sa->load_avg * PELT_MIN_DIVIDER); + __update_sa(sa, load, r, r*divider); r = removed_util; - sub_positive(&sa->util_avg, r); - sub_positive(&sa->util_sum, r * divider); - /* - * Because of rounding, se->util_sum might ends up being +1 more than - * cfs->util_sum. Although this is not a problem by itself, detaching - * a lot of tasks with the rounding problem between 2 updates of - * util_avg (~1ms) can make cfs->util_sum becoming null whereas - * cfs_util_avg is not. 
- * Check that util_sum is still above its lower bound for the new - * util_avg. Given that period_contrib might have moved since the last - * sync, we are only sure that util_sum must be above or equal to - * util_avg * minimum possible divider - */ - sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * PELT_MIN_DIVIDER); + __update_sa(sa, util, r, r*divider); r = removed_runnable; - sub_positive(&sa->runnable_avg, r); - sub_positive(&sa->runnable_sum, r * divider); - /* See sa->util_sum above */ - sa->runnable_sum = max_t(u32, sa->runnable_sum, - sa->runnable_avg * PELT_MIN_DIVIDER); + __update_sa(sa, runnable, r, r*divider); /* * removed_runnable is the unweighted version of removed_load so we @@ -3861,17 +3842,8 @@ static void attach_entity_load_avg(struc static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) { dequeue_load_avg(cfs_rq, se); - sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg); - sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum); - /* See update_cfs_rq_load_avg() */ - cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum, - cfs_rq->avg.util_avg * PELT_MIN_DIVIDER); - - sub_positive(&cfs_rq->avg.runnable_avg, se->avg.runnable_avg); - sub_positive(&cfs_rq->avg.runnable_sum, se->avg.runnable_sum); - /* See update_cfs_rq_load_avg() */ - cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum, - cfs_rq->avg.runnable_avg * PELT_MIN_DIVIDER); + __update_sa(&cfs_rq->avg, util, se->avg.util_avg, se->avg.util_sum); + __update_sa(&cfs_rq->avg, runnable, se->avg.runnable_avg, se->avg.runnable_sum); add_tg_cfs_propagate(cfs_rq, -se->avg.load_sum);
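For reference, a minimal userspace sketch of the add-then-clamp pattern that enqueue_load_avg() and the proposed __update_sa macro implement: add the entity contribution, then keep _sum at or above _avg times the minimum possible divider. The struct, helper name and PELT_MIN_DIVIDER_DEMO value below are illustrative stand-ins, not the kernel definitions.

/*
 * Illustrative only: mirrors the add-then-clamp step with the values
 * from the report (se->avg.load_avg == 1, se->avg.load_sum == 0,
 * se_weight(se) == 88761). PELT_MIN_DIVIDER_DEMO is a placeholder.
 */
#include <stdio.h>
#include <stdint.h>

#define PELT_MIN_DIVIDER_DEMO	46718	/* stand-in for PELT_MIN_DIVIDER */

struct demo_avg {
	unsigned long load_avg;
	uint64_t load_sum;
};

static void demo_enqueue_load(struct demo_avg *cfs, unsigned long se_weight,
			      unsigned long se_load_avg, uint64_t se_load_sum)
{
	cfs->load_avg += se_load_avg;
	cfs->load_sum += (uint64_t)se_weight * se_load_sum;

	/* clamp: _sum must stay >= _avg * minimum possible divider */
	uint64_t floor = (uint64_t)cfs->load_avg * PELT_MIN_DIVIDER_DEMO;
	if (cfs->load_sum < floor)
		cfs->load_sum = floor;
}

int main(void)
{
	struct demo_avg cfs = { 0, 0 };

	demo_enqueue_load(&cfs, 88761, 1, 0);

	/*
	 * Without the clamp this would leave load_avg == 1 while
	 * load_sum == 0, which is what trips SCHED_WARN_ON() in
	 * cfs_rq_is_decayed(); with it, load_sum is non-zero whenever
	 * load_avg is.
	 */
	printf("load_avg=%lu load_sum=%llu\n",
	       cfs.load_avg, (unsigned long long)cfs.load_sum);
	return 0;
}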
On Mon, 11 Apr 2022 at 08:17, Kuyo Chang <kuyo.chang@mediatek.com> wrote: > > From: kuyo chang <kuyo.chang@mediatek.com> > > I meet the warning message at cfs_rq_is_decayed at below code. > > SCHED_WARN_ON(cfs_rq->avg.load_avg || > cfs_rq->avg.util_avg || > cfs_rq->avg.runnable_avg) > > Following is the calltrace. > > Call trace: > __update_blocked_fair > update_blocked_averages > newidle_balance > pick_next_task_fair > __schedule > schedule > pipe_read > vfs_read > ksys_read > > After code analyzing and some debug messages, I found it exits a corner > case at attach_entity_load_avg which will cause load_sum is zero and > load_avg is not. > Consider se_weight is 88761 according by sched_prio_to_weight table. > And assume the get_pelt_divider() is 47742, se->avg.load_avg is 1. > By the calculating for se->avg.load_sum as following will become zero > as following. > se->avg.load_sum = > div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se)); > se->avg.load_sum = 1*47742/88761 = 0. The root problem is there, se->avg.load_sum must not be null if se->avg.load_avg is not null because the correct relation between _avg and _sum is: load_avg = weight * load_sum / divider. so the fix should be attach_entity_load_avg() and probably the below is enough se->avg.load_sum = div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se)) + 1; > > After enqueue_load_avg code as below. > cfs_rq->avg.load_avg += se->avg.load_avg; > cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum; > > Then the load_sum for cfs_rq will be 1 while the load_sum for cfs_rq is 0. > So it will hit the warning message. > > After all, I refer the following commit patch to do the similar thing at > enqueue_load_avg. > sched/pelt: Relax the sync of load_sum with load_avg > > After long time testing, the kernel warning was gone and the system runs > as well as before. > > Signed-off-by: kuyo chang <kuyo.chang@mediatek.com> > --- > kernel/sched/fair.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index d4bd299d67ab..30d8b6dba249 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -3074,8 +3074,10 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se) > static inline void > enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) > { > - cfs_rq->avg.load_avg += se->avg.load_avg; > - cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum; > + add_positive(&cfs_rq->avg.load_avg, se->avg.load_avg); > + add_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum); > + cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum, > + cfs_rq->avg.load_avg * PELT_MIN_DIVIDER); > } > > static inline void > -- > 2.18.0 >
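A short standalone illustration of the rounding Vincent points at, using the numbers from the report; the variable names are local to this sketch. With load_avg == 1, divider == 47742 and weight == 88761 the truncating division yields 0, and the suggested "+ 1" keeps load_sum non-zero whenever load_avg is:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t load_avg  = 1;		/* se->avg.load_avg from the report   */
	uint64_t divider   = 47742;	/* get_pelt_divider() from the report */
	uint64_t se_weight = 88761;	/* se_weight(se) from the report      */

	/* current attach_entity_load_avg(): 1 * 47742 / 88761 == 0 */
	uint64_t truncated = (load_avg * divider) / se_weight;

	/* suggested variant: at least 1 whenever load_avg is non-zero */
	uint64_t with_one  = (load_avg * divider) / se_weight + 1;

	printf("truncated=%llu with_plus_one=%llu\n",
	       (unsigned long long)truncated, (unsigned long long)with_one);
	return 0;
}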
On Mon, 11 Apr 2022 at 10:25, Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, Apr 11, 2022 at 02:16:56PM +0800, Kuyo Chang wrote: > > From: kuyo chang <kuyo.chang@mediatek.com> > > > > I meet the warning message at cfs_rq_is_decayed at below code. > > > > SCHED_WARN_ON(cfs_rq->avg.load_avg || > > cfs_rq->avg.util_avg || > > cfs_rq->avg.runnable_avg) > > > > Following is the calltrace. > > > > Call trace: > > __update_blocked_fair > > update_blocked_averages > > newidle_balance > > pick_next_task_fair > > __schedule > > schedule > > pipe_read > > vfs_read > > ksys_read > > > > After code analyzing and some debug messages, I found it exits a corner > > case at attach_entity_load_avg which will cause load_sum is zero and > > load_avg is not. > > Consider se_weight is 88761 according by sched_prio_to_weight table. > > And assume the get_pelt_divider() is 47742, se->avg.load_avg is 1. > > By the calculating for se->avg.load_sum as following will become zero > > as following. > > se->avg.load_sum = > > div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se)); > > se->avg.load_sum = 1*47742/88761 = 0. > > > > After enqueue_load_avg code as below. > > cfs_rq->avg.load_avg += se->avg.load_avg; > > cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum; > > > > Then the load_sum for cfs_rq will be 1 while the load_sum for cfs_rq is 0. > > So it will hit the warning message. > > > > After all, I refer the following commit patch to do the similar thing at > > enqueue_load_avg. > > sched/pelt: Relax the sync of load_sum with load_avg > > 2d02fa8cc21a ("sched/pelt: Relax the sync of load_sum with load_avg") > > > > > After long time testing, the kernel warning was gone and the system runs > > as well as before. > > > > Could probably do with a Fixes: tag too > > > Signed-off-by: kuyo chang <kuyo.chang@mediatek.com> > > --- > > kernel/sched/fair.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index d4bd299d67ab..30d8b6dba249 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -3074,8 +3074,10 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se) > > static inline void > > enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) > > { > > - cfs_rq->avg.load_avg += se->avg.load_avg; > > - cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum; > > + add_positive(&cfs_rq->avg.load_avg, se->avg.load_avg); > > + add_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum); > > + cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum, > > + cfs_rq->avg.load_avg * PELT_MIN_DIVIDER); > > } > > Yes,. that looks right. Looking at this again though, I wonder if we > should perhaps clean is all up a bit. adding se->load_avg|sum in cfs_rq should not create a situation where cfs->load_sum is null but not cfs->load_avg unless se->load_avg|sum are wrong from the beginning which seems to be the case here > > Does something like the below on top make sense? > > > --- > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -3071,23 +3071,37 @@ account_entity_dequeue(struct cfs_rq *cf > } while (0) > > #ifdef CONFIG_SMP > + > +/* > + * Because of rounding, se->util_sum might ends up being +1 more than > + * cfs->util_sum. Although this is not a problem by itself, detaching > + * a lot of tasks with the rounding problem between 2 updates of > + * util_avg (~1ms) can make cfs->util_sum becoming null whereas > + * cfs_util_avg is not. 
> + * > + * Check that util_sum is still above its lower bound for the new > + * util_avg. Given that period_contrib might have moved since the last > + * sync, we are only sure that util_sum must be above or equal to > + * util_avg * minimum possible divider > + */ > +#define __update_sa(sa, name, delta_avg, delta_sum) do { \ > + add_positive(&(sa)->name##_avg, delta_avg); \ > + add_positive(&(sa)->name##_sum, delta_sum); \ > + (sa)->name##_sum = max_t(typeof((sa)->name##_sum), \ > + (sa)->name##_sum, \ > + (sa)->name##_avg * PELT_MIN_DIVIDER); \ > +} while (0) > + > static inline void > enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) > { > - add_positive(&cfs_rq->avg.load_avg, se->avg.load_avg); > - add_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum); > - cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum, > - cfs_rq->avg.load_avg * PELT_MIN_DIVIDER); > + __update_sa(&cfs_rq->avg, load, se->avg.load_avg, se->avg.load_sum); > } > > static inline void > dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) > { > - sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg); > - sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum); > - /* See update_cfs_rq_load_avg() */ > - cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum, > - cfs_rq->avg.load_avg * PELT_MIN_DIVIDER); > + __update_sa(&cfs_rq->avg, load, -se->avg.load_avg, -se->avg.load_sum); > } > #else > static inline void > @@ -3529,12 +3543,7 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq > se->avg.util_sum = new_sum; > > /* Update parent cfs_rq utilization */ > - add_positive(&cfs_rq->avg.util_avg, delta_avg); > - add_positive(&cfs_rq->avg.util_sum, delta_sum); > - > - /* See update_cfs_rq_load_avg() */ > - cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum, > - cfs_rq->avg.util_avg * PELT_MIN_DIVIDER); > + __update_sa(&cfs_rq->avg, util, delta_avg, delta_sum); > } > > static inline void > @@ -3560,11 +3569,7 @@ update_tg_cfs_runnable(struct cfs_rq *cf > se->avg.runnable_sum = new_sum; > > /* Update parent cfs_rq runnable */ > - add_positive(&cfs_rq->avg.runnable_avg, delta_avg); > - add_positive(&cfs_rq->avg.runnable_sum, delta_sum); > - /* See update_cfs_rq_load_avg() */ > - cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum, > - cfs_rq->avg.runnable_avg * PELT_MIN_DIVIDER); > + __update_sa(&cfs_rq->avg, runnable, delta_avg, delta_sum); > } > > static inline void > @@ -3628,11 +3633,7 @@ update_tg_cfs_load(struct cfs_rq *cfs_rq > > se->avg.load_sum = runnable_sum; > se->avg.load_avg = load_avg; > - add_positive(&cfs_rq->avg.load_avg, delta_avg); > - add_positive(&cfs_rq->avg.load_sum, delta_sum); > - /* See update_cfs_rq_load_avg() */ > - cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum, > - cfs_rq->avg.load_avg * PELT_MIN_DIVIDER); > + __update_sa(&cfs_rq->avg, load, delta_avg, delta_sum); > } > > static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum) > @@ -3747,33 +3748,13 @@ update_cfs_rq_load_avg(u64 now, struct c > raw_spin_unlock(&cfs_rq->removed.lock); > > r = removed_load; > - sub_positive(&sa->load_avg, r); > - sub_positive(&sa->load_sum, r * divider); > - /* See sa->util_sum below */ > - sa->load_sum = max_t(u32, sa->load_sum, sa->load_avg * PELT_MIN_DIVIDER); > + __update_sa(sa, load, r, r*divider); > > r = removed_util; > - sub_positive(&sa->util_avg, r); > - sub_positive(&sa->util_sum, r * divider); > - /* > - * Because of rounding, se->util_sum might ends up being +1 more than > - * cfs->util_sum. 
Although this is not a problem by itself, detaching > - * a lot of tasks with the rounding problem between 2 updates of > - * util_avg (~1ms) can make cfs->util_sum becoming null whereas > - * cfs_util_avg is not. > - * Check that util_sum is still above its lower bound for the new > - * util_avg. Given that period_contrib might have moved since the last > - * sync, we are only sure that util_sum must be above or equal to > - * util_avg * minimum possible divider > - */ > - sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * PELT_MIN_DIVIDER); > + __update_sa(sa, util, r, r*divider); > > r = removed_runnable; > - sub_positive(&sa->runnable_avg, r); > - sub_positive(&sa->runnable_sum, r * divider); > - /* See sa->util_sum above */ > - sa->runnable_sum = max_t(u32, sa->runnable_sum, > - sa->runnable_avg * PELT_MIN_DIVIDER); > + __update_sa(sa, runnable, r, r*divider); > > /* > * removed_runnable is the unweighted version of removed_load so we > @@ -3861,17 +3842,8 @@ static void attach_entity_load_avg(struc > static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) > { > dequeue_load_avg(cfs_rq, se); > - sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg); > - sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum); > - /* See update_cfs_rq_load_avg() */ > - cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum, > - cfs_rq->avg.util_avg * PELT_MIN_DIVIDER); > - > - sub_positive(&cfs_rq->avg.runnable_avg, se->avg.runnable_avg); > - sub_positive(&cfs_rq->avg.runnable_sum, se->avg.runnable_sum); > - /* See update_cfs_rq_load_avg() */ > - cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum, > - cfs_rq->avg.runnable_avg * PELT_MIN_DIVIDER); > + __update_sa(&cfs_rq->avg, util, se->avg.util_avg, se->avg.util_sum); > + __update_sa(&cfs_rq->avg, runnable, se->avg.runnable_avg, se->avg.runnable_sum); > > add_tg_cfs_propagate(cfs_rq, -se->avg.load_sum); >
On Mon, 2022-04-11 at 10:39 +0200, Vincent Guittot wrote: > On Mon, 11 Apr 2022 at 08:17, Kuyo Chang <kuyo.chang@mediatek.com> > wrote: > > > > From: kuyo chang <kuyo.chang@mediatek.com> > > > > I meet the warning message at cfs_rq_is_decayed at below code. > > > > SCHED_WARN_ON(cfs_rq->avg.load_avg || > > cfs_rq->avg.util_avg || > > cfs_rq->avg.runnable_avg) > > > > Following is the calltrace. > > > > Call trace: > > __update_blocked_fair > > update_blocked_averages > > newidle_balance > > pick_next_task_fair > > __schedule > > schedule > > pipe_read > > vfs_read > > ksys_read > > > > After code analyzing and some debug messages, I found it exits a > > corner > > case at attach_entity_load_avg which will cause load_sum is zero > > and > > load_avg is not. > > Consider se_weight is 88761 according by sched_prio_to_weight > > table. > > And assume the get_pelt_divider() is 47742, se->avg.load_avg is 1. > > By the calculating for se->avg.load_sum as following will become > > zero > > as following. > > se->avg.load_sum = > > div_u64(se->avg.load_avg * se->avg.load_sum, > > se_weight(se)); > > se->avg.load_sum = 1*47742/88761 = 0. > > The root problem is there, se->avg.load_sum must not be null if > se->avg.load_avg is not null because the correct relation between > _avg > and _sum is: > > load_avg = weight * load_sum / divider. > > so the fix should be attach_entity_load_avg() and probably the below > is enough > > se->avg.load_sum = div_u64(se->avg.load_avg * se->avg.load_sum, > se_weight(se)) + 1; Thanks for your kindly suggestion. +1 would make the calcuation for load_sum may be overestimate? How about the below code make sense for fix the corner case? --- --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3832,7 +3832,8 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s se->avg.load_sum = divider; if (se_weight(se)) { se->avg.load_sum = - div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se)); + (se->avg.load_avg * se->avg.load_sum > se_weight(se)) ? + div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se)) : 1; } enqueue_load_avg(cfs_rq, se);
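On the overestimation question: in the corner case from the report the "+ 1" form gives 0 + 1 = 1 and the ternary form gives 1 directly, so both keep load_sum non-zero there; they only differ when the division does not truncate to zero, where the "+ 1" form is one unit larger. A tiny comparison, names local to this sketch:

#include <stdio.h>
#include <stdint.h>

static uint64_t with_plus_one(uint64_t avg, uint64_t div, uint64_t w)
{
	return (avg * div) / w + 1;			/* Vincent's first suggestion */
}

static uint64_t with_ternary(uint64_t avg, uint64_t div, uint64_t w)
{
	return (avg * div > w) ? (avg * div) / w : 1;	/* Kuyo's proposal above */
}

int main(void)
{
	/* corner case from the report: both give 1 */
	printf("%llu %llu\n",
	       (unsigned long long)with_plus_one(1, 47742, 88761),
	       (unsigned long long)with_ternary(1, 47742, 88761));

	/* a case where the division does not truncate: 23872 vs 23871 */
	printf("%llu %llu\n",
	       (unsigned long long)with_plus_one(512, 47742, 1024),
	       (unsigned long long)with_ternary(512, 47742, 1024));
	return 0;
}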
Le mardi 12 avril 2022 à 10:51:23 (+0800), Kuyo Chang a écrit : > On Mon, 2022-04-11 at 10:39 +0200, Vincent Guittot wrote: > > On Mon, 11 Apr 2022 at 08:17, Kuyo Chang <kuyo.chang@mediatek.com> > > wrote: > > > > > > From: kuyo chang <kuyo.chang@mediatek.com> > > > > > > I meet the warning message at cfs_rq_is_decayed at below code. > > > > > > SCHED_WARN_ON(cfs_rq->avg.load_avg || > > > cfs_rq->avg.util_avg || > > > cfs_rq->avg.runnable_avg) > > > > > > Following is the calltrace. > > > > > > Call trace: > > > __update_blocked_fair > > > update_blocked_averages > > > newidle_balance > > > pick_next_task_fair > > > __schedule > > > schedule > > > pipe_read > > > vfs_read > > > ksys_read > > > > > > After code analyzing and some debug messages, I found it exits a > > > corner > > > case at attach_entity_load_avg which will cause load_sum is zero > > > and > > > load_avg is not. > > > Consider se_weight is 88761 according by sched_prio_to_weight > > > table. > > > And assume the get_pelt_divider() is 47742, se->avg.load_avg is 1. > > > By the calculating for se->avg.load_sum as following will become > > > zero > > > as following. > > > se->avg.load_sum = > > > div_u64(se->avg.load_avg * se->avg.load_sum, > > > se_weight(se)); > > > se->avg.load_sum = 1*47742/88761 = 0. > > > > The root problem is there, se->avg.load_sum must not be null if > > se->avg.load_avg is not null because the correct relation between > > _avg > > and _sum is: > > > > load_avg = weight * load_sum / divider. > > > > so the fix should be attach_entity_load_avg() and probably the below > > is enough > > > > se->avg.load_sum = div_u64(se->avg.load_avg * se->avg.load_sum, > > se_weight(se)) + 1; > > Thanks for your kindly suggestion. > +1 would make the calcuation for load_sum may be overestimate? > How about the below code make sense for fix the corner case? > > --- > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -3832,7 +3832,8 @@ static void attach_entity_load_avg(struct cfs_rq > *cfs_rq, struct sched_entity *s > se->avg.load_sum = divider; > if (se_weight(se)) { > se->avg.load_sum = > - div_u64(se->avg.load_avg * se->avg.load_sum, > se_weight(se)); > + (se->avg.load_avg * se->avg.load_sum > > se_weight(se)) ? > + div_u64(se->avg.load_avg * se->avg.load_sum, > se_weight(se)) : 1; > } > > enqueue_load_avg(cfs_rq, se); > -- > 2.18.0 In this case, the below is easier to read diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1658a9428d96..2c685474db23 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3836,10 +3836,12 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s se->avg.runnable_sum = se->avg.runnable_avg * divider; - se->avg.load_sum = divider; - if (se_weight(se)) { + se->avg.load_sum = se->avg.load_avg * divider; + if (se_weight(se) < se->avg.load_sum) { se->avg.load_sum = - div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se)); + div_u64(se->avg.load_sum, se_weight(se)); + } else { + se->avg.load_sum = 1; } enqueue_load_avg(cfs_rq, se); > > > > > > > > After enqueue_load_avg code as below. > > > cfs_rq->avg.load_avg += se->avg.load_avg; > > > cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum; > > > > > > Then the load_sum for cfs_rq will be 1 while the load_sum for > > > cfs_rq is 0. > > > So it will hit the warning message. > > > > > > After all, I refer the following commit patch to do the similar > > > thing at > > > enqueue_load_avg. 
> > > sched/pelt: Relax the sync of load_sum with load_avg > > > > > > After long time testing, the kernel warning was gone and the system > > > runs > > > as well as before. > > > > > > Signed-off-by: kuyo chang <kuyo.chang@mediatek.com> > > > --- > > > kernel/sched/fair.c | 6 ++++-- > > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > index d4bd299d67ab..30d8b6dba249 100644 > > > --- a/kernel/sched/fair.c > > > +++ b/kernel/sched/fair.c > > > @@ -3074,8 +3074,10 @@ account_entity_dequeue(struct cfs_rq > > > *cfs_rq, struct sched_entity *se) > > > static inline void > > > enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) > > > { > > > - cfs_rq->avg.load_avg += se->avg.load_avg; > > > - cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum; > > > + add_positive(&cfs_rq->avg.load_avg, se->avg.load_avg); > > > + add_positive(&cfs_rq->avg.load_sum, se_weight(se) * se- > > > >avg.load_sum); > > > + cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum, > > > + cfs_rq->avg.load_avg * > > > PELT_MIN_DIVIDER); > > > } > > > > > > static inline void > > > -- > > > 2.18.0 > > > >
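A standalone sketch of the load_sum computation in the attach_entity_load_avg() rework Vincent suggests above: multiply first, divide only when the product exceeds the weight, otherwise pin load_sum to 1. The constants are the report's values and the helper name is local to this sketch.

#include <stdio.h>
#include <stdint.h>

static uint64_t attach_load_sum(uint64_t load_avg, uint64_t divider,
				uint64_t weight)
{
	uint64_t load_sum = load_avg * divider;

	if (weight < load_sum)
		load_sum /= weight;	/* usual case: scale back down by weight */
	else
		load_sum = 1;		/* keep load_sum from truncating to zero */

	return load_sum;
}

int main(void)
{
	/* corner case from the report: would previously truncate to 0 */
	printf("%llu\n", (unsigned long long)attach_load_sum(1, 47742, 88761));

	/* a larger load_avg with a smaller weight */
	printf("%llu\n", (unsigned long long)attach_load_sum(512, 47742, 1024));
	return 0;
}

Multiplying before dividing avoids the intermediate truncation, and the explicit else branch makes it clear that load_sum can no longer end up zero while load_avg stays non-zero.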
On Tue, 2022-04-12 at 10:58 +0200, Vincent Guittot wrote: > Le mardi 12 avril 2022 à 10:51:23 (+0800), Kuyo Chang a écrit : > > On Mon, 2022-04-11 at 10:39 +0200, Vincent Guittot wrote: > > > On Mon, 11 Apr 2022 at 08:17, Kuyo Chang <kuyo.chang@mediatek.com > > > > > > > wrote: > > > > > > > > From: kuyo chang <kuyo.chang@mediatek.com> > > > > > > > > I meet the warning message at cfs_rq_is_decayed at below code. > > > > > > > > SCHED_WARN_ON(cfs_rq->avg.load_avg || > > > > cfs_rq->avg.util_avg || > > > > cfs_rq->avg.runnable_avg) > > > > > > > > Following is the calltrace. > > > > > > > > Call trace: > > > > __update_blocked_fair > > > > update_blocked_averages > > > > newidle_balance > > > > pick_next_task_fair > > > > __schedule > > > > schedule > > > > pipe_read > > > > vfs_read > > > > ksys_read > > > > > > > > After code analyzing and some debug messages, I found it exits > > > > a > > > > corner > > > > case at attach_entity_load_avg which will cause load_sum is > > > > zero > > > > and > > > > load_avg is not. > > > > Consider se_weight is 88761 according by sched_prio_to_weight > > > > table. > > > > And assume the get_pelt_divider() is 47742, se->avg.load_avg is > > > > 1. > > > > By the calculating for se->avg.load_sum as following will > > > > become > > > > zero > > > > as following. > > > > se->avg.load_sum = > > > > div_u64(se->avg.load_avg * se->avg.load_sum, > > > > se_weight(se)); > > > > se->avg.load_sum = 1*47742/88761 = 0. > > > > > > The root problem is there, se->avg.load_sum must not be null if > > > se->avg.load_avg is not null because the correct relation between > > > _avg > > > and _sum is: > > > > > > load_avg = weight * load_sum / divider. > > > > > > so the fix should be attach_entity_load_avg() and probably the > > > below > > > is enough > > > > > > se->avg.load_sum = div_u64(se->avg.load_avg * se->avg.load_sum, > > > se_weight(se)) + 1; > > > > Thanks for your kindly suggestion. > > +1 would make the calcuation for load_sum may be overestimate? > > How about the below code make sense for fix the corner case? > > > > --- > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -3832,7 +3832,8 @@ static void attach_entity_load_avg(struct > > cfs_rq > > *cfs_rq, struct sched_entity *s > > se->avg.load_sum = divider; > > if (se_weight(se)) { > > se->avg.load_sum = > > - div_u64(se->avg.load_avg * se->avg.load_sum, > > se_weight(se)); > > + (se->avg.load_avg * se->avg.load_sum > > > se_weight(se)) ? > > + div_u64(se->avg.load_avg * se->avg.load_sum, > > se_weight(se)) : 1; > > } > > > > enqueue_load_avg(cfs_rq, se); > > -- > > 2.18.0 > > In this case, the below is easier to read > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 1658a9428d96..2c685474db23 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -3836,10 +3836,12 @@ static void attach_entity_load_avg(struct > cfs_rq *cfs_rq, struct sched_entity *s > > se->avg.runnable_sum = se->avg.runnable_avg * divider; > > - se->avg.load_sum = divider; > - if (se_weight(se)) { > + se->avg.load_sum = se->avg.load_avg * divider; > + if (se_weight(se) < se->avg.load_sum) { > se->avg.load_sum = > - div_u64(se->avg.load_avg * se->avg.load_sum, > se_weight(se)); > + div_u64(se->avg.load_sum, se_weight(se)); > + } else { > + se->avg.load_sum = 1; > } > > enqueue_load_avg(cfs_rq, se); It really easier to read. Thanks for your kindly suggestion. > > > > > > > > > > > > > > After enqueue_load_avg code as below. 
> > > > cfs_rq->avg.load_avg += se->avg.load_avg; > > > > cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum; > > > > > > > > Then the load_sum for cfs_rq will be 1 while the load_sum for > > > > cfs_rq is 0. > > > > So it will hit the warning message. > > > > > > > > After all, I refer the following commit patch to do the similar > > > > thing at > > > > enqueue_load_avg. > > > > sched/pelt: Relax the sync of load_sum with load_avg > > > > > > > > After long time testing, the kernel warning was gone and the > > > > system > > > > runs > > > > as well as before. > > > > > > > > Signed-off-by: kuyo chang <kuyo.chang@mediatek.com> > > > > --- > > > > kernel/sched/fair.c | 6 ++++-- > > > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > > index d4bd299d67ab..30d8b6dba249 100644 > > > > --- a/kernel/sched/fair.c > > > > +++ b/kernel/sched/fair.c > > > > @@ -3074,8 +3074,10 @@ account_entity_dequeue(struct cfs_rq > > > > *cfs_rq, struct sched_entity *se) > > > > static inline void > > > > enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity > > > > *se) > > > > { > > > > - cfs_rq->avg.load_avg += se->avg.load_avg; > > > > - cfs_rq->avg.load_sum += se_weight(se) * se- > > > > >avg.load_sum; > > > > + add_positive(&cfs_rq->avg.load_avg, se->avg.load_avg); > > > > + add_positive(&cfs_rq->avg.load_sum, se_weight(se) * se- > > > > > avg.load_sum); > > > > > > > > + cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum, > > > > + cfs_rq->avg.load_avg > > > > * > > > > PELT_MIN_DIVIDER); > > > > } > > > > > > > > static inline void > > > > -- > > > > 2.18.0 > > > >
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d4bd299d67ab..30d8b6dba249 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3074,8 +3074,10 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se) static inline void enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) { - cfs_rq->avg.load_avg += se->avg.load_avg; - cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum; + add_positive(&cfs_rq->avg.load_avg, se->avg.load_avg); + add_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum); + cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum, + cfs_rq->avg.load_avg * PELT_MIN_DIVIDER); } static inline void