Message ID | 1585904125-2819-1-git-send-email-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | psi: fix randomized calculation in record_times() | expand |
On Fri, Apr 03, 2020 at 04:55:25AM -0400, Yafang Shao wrote: > @@ -690,7 +690,10 @@ static void psi_group_change(struct psi_group *group, int cpu, > */ > write_seqcount_begin(&groupc->seq); > > - record_times(groupc, cpu, false); > + if (groupc->total_tasks) > + record_times(groupc, cpu, false); > + else > + groupc->state_start = cpu_clock(cpu); This change appears is a no-op. If there are no tasks, groupc->state_mask is 0, and the only thing record_times() does is groupc->state_start = cpu_clock(cpu); Did you encounter actual problems, or are you just reading the code?
On Fri, Apr 3, 2020 at 11:20 PM Johannes Weiner <hannes@cmpxchg.org> wrote: > > On Fri, Apr 03, 2020 at 04:55:25AM -0400, Yafang Shao wrote: > > @@ -690,7 +690,10 @@ static void psi_group_change(struct psi_group *group, int cpu, > > */ > > write_seqcount_begin(&groupc->seq); > > > > - record_times(groupc, cpu, false); > > + if (groupc->total_tasks) > > + record_times(groupc, cpu, false); > > + else > > + groupc->state_start = cpu_clock(cpu); > > This change appears is a no-op. If there are no tasks, > groupc->state_mask is 0, and the only thing record_times() does is > > groupc->state_start = cpu_clock(cpu); > > Did you encounter actual problems, or are you just reading the code? No real issues. Just reading the code and found it looks like a little weird. But as you explained that state_mask can also guarantee it then we don't need to make this change. Thanks for the explanation. Thanks Yafang
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h index 4b7258495a04..b42cbfdb15e9 100644 --- a/include/linux/psi_types.h +++ b/include/linux/psi_types.h @@ -69,6 +69,8 @@ struct psi_group_cpu { /* States of the tasks belonging to this group */ unsigned int tasks[NR_PSI_TASK_COUNTS]; + /* Sum of above array members */ + unsigned int total_tasks; /* Aggregate pressure state derived from the tasks */ u32 state_mask; diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c index 8f45cdb6463b..7061529dc406 100644 --- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c @@ -690,7 +690,10 @@ static void psi_group_change(struct psi_group *group, int cpu, */ write_seqcount_begin(&groupc->seq); - record_times(groupc, cpu, false); + if (groupc->total_tasks) + record_times(groupc, cpu, false); + else + groupc->state_start = cpu_clock(cpu); for (t = 0, m = clear; m; m &= ~(1 << t), t++) { if (!(m & (1 << t))) @@ -703,11 +706,15 @@ static void psi_group_change(struct psi_group *group, int cpu, psi_bug = 1; } groupc->tasks[t]--; + groupc->total_tasks--; } - for (t = 0; set; set &= ~(1 << t), t++) - if (set & (1 << t)) + for (t = 0; set; set &= ~(1 << t), t++) { + if (set & (1 << t)) { groupc->tasks[t]++; + groupc->total_tasks++; + } + } /* Calculate state mask representing active states */ for (s = 0; s < NR_PSI_STATES; s++) {
In record_times() we use 'now' and groupc->state_start to calculate the delta as bellow, delta = now - groupc->state_start; But note that groupc->state_start may be not initialized yet, IOW, the state_start may be 0 currently. If state_start is 0, this calculation is same with assigning the lower 32-bit of 'now' to delta, that is a random value. To fix this value, we should initialize groupc->state_start before. After we calculate the delta, we will assign 'now' to groupc->state_start then, groupc->state_start = now; This will cause the same issue if groupc->state_start will not be used in a long period. Let's take an example. We create a cgroup foo and run tasks in it. Some of these tasks enter into memstall and state_start is set. Then we move all of these tasks out of cgroup foo for more than (1 << 32) nsecs, and then move them in. That will cause the same issue as above. The root cause of these issues is that we don't initialize the state_start properly. To fix it, we should record how many tasks in this per cpu psi_group. If there's no task in it, we just set state_start and don't calculate the delta, that means it is the begin of the pressure. To avoid redundant calculating the total number of tasks in this per cpu psi_group, a new member 'total_tasks' is introduced in struct psi_group_cpu, which is the sum of array members in tasks[]. Fixes: eb414681d5a0 ("psi: pressure stall information for CPU, memory, and IO") Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Daniel Drake <drake@endlessm.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> --- include/linux/psi_types.h | 2 ++ kernel/sched/psi.c | 13 ++++++++++--- 2 files changed, 12 insertions(+), 3 deletions(-)