| Message ID | 20210505111525.001031466@infradead.org (mailing list archive) |
|---|---|
| State | New, archived |
| Series | sched,delayacct: Some cleanups |
On Wed, 2021-05-05 at 12:59 +0200, Peter Zijlstra wrote:
> Like all scheduler statistics, use sched_clock() based time.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Looks like this works even on messed-up systems, because the delayacct
code is called from the same CPU at sleep time and at wakeup time,
before a task is migrated.

Reviewed-by: Rik van Riel <riel@surriel.com>
On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> @@ -42,10 +42,9 @@ void __delayacct_tsk_init(struct task_st
>   * Finish delay accounting for a statistic using its timestamps (@start),
>   * accumalator (@total) and @count
>   */
> -static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total,
> -			  u32 *count)
> +static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count)
>  {
> -	s64 ns = ktime_get_ns() - *start;
> +	s64 ns = local_clock() - *start;

I don't think this is safe. These time sections have preemption and
migration enabled and so might span multiple CPUs. local_clock() could
end up behind *start, AFAICS.
On Thu, May 06, 2021 at 09:59:11AM -0400, Johannes Weiner wrote:
> On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> > @@ -42,10 +42,9 @@ void __delayacct_tsk_init(struct task_st
> >   * Finish delay accounting for a statistic using its timestamps (@start),
> >   * accumalator (@total) and @count
> >   */
> > -static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total,
> > -			  u32 *count)
> > +static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count)
> >  {
> > -	s64 ns = ktime_get_ns() - *start;
> > +	s64 ns = local_clock() - *start;
>
> I don't think this is safe. These time sections have preemption and
> migration enabled and so might span multiple CPUs. local_clock() could
> end up behind *start, AFAICS.

Only if you have really crummy hardware, and in that case the drift is
bounded by around 1 tick. Also, this function actually checks: ns > 0.

And if you do have crummy hardware like that, ktime_get_ns() is the very
last thing you want to call at any frequency, because it'll be the HPET.
On Thu, May 06, 2021 at 04:17:33PM +0200, Peter Zijlstra wrote:
> On Thu, May 06, 2021 at 09:59:11AM -0400, Johannes Weiner wrote:
> > On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> > > @@ -42,10 +42,9 @@ void __delayacct_tsk_init(struct task_st
> > >   * Finish delay accounting for a statistic using its timestamps (@start),
> > >   * accumalator (@total) and @count
> > >   */
> > > -static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total,
> > > -			  u32 *count)
> > > +static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count)
> > >  {
> > > -	s64 ns = ktime_get_ns() - *start;
> > > +	s64 ns = local_clock() - *start;
> >
> > I don't think this is safe. These time sections have preemption and
> > migration enabled and so might span multiple CPUs. local_clock() could
> > end up behind *start, AFAICS.
>
> Only if you have really crummy hardware, and in that case the drift is
> bounded by around 1 tick. Also, this function actually checks: ns > 0.

Oh, I didn't realize it was that close. I just went off the dramatic
warnings on cpu_clock() :-)

But yeah, that seems plenty accurate for this purpose.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
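[Editor's note] The reason a bounded cross-CPU drift is harmless is the ns > 0
check in delayacct_end(), which simply drops a sample whose end timestamp came
out behind its start. Below is a minimal, self-contained userspace sketch of
that guard pattern; read_cpu_clock(), delay_start() and delay_end() are
hypothetical names standing in for the kernel's local_clock() and the
delayacct helpers, so this only illustrates the idea, it is not the kernel
code.

#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for local_clock(): a fast, possibly per-CPU clock. */
static uint64_t read_cpu_clock(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

struct delay_stat {
	uint64_t start;		/* timestamp taken before blocking */
	uint64_t total;		/* accumulated delay, in nanoseconds */
	uint32_t count;		/* number of accounted delays */
};

static void delay_start(struct delay_stat *d)
{
	d->start = read_cpu_clock();
}

static void delay_end(struct delay_stat *d)
{
	int64_t ns = (int64_t)(read_cpu_clock() - d->start);

	/* Drop the sample if the clock appears to have gone backwards. */
	if (ns > 0) {
		d->total += (uint64_t)ns;
		d->count++;
	}
}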
On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> Like all scheduler statistics, use sched_clock() based time.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Going by your comment that preemption safety is not a concern, the
patch looks good.

Acked-by: Balbir Singh <bsingharora@gmail.com>
On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> Like all scheduler statistics, use sched_clock() based time.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Mel Gorman <mgorman@suse.de>
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -7,9 +7,9 @@
 #include <linux/sched.h>
 #include <linux/sched/task.h>
 #include <linux/sched/cputime.h>
+#include <linux/sched/clock.h>
 #include <linux/slab.h>
 #include <linux/taskstats.h>
-#include <linux/time.h>
 #include <linux/sysctl.h>
 #include <linux/delayacct.h>
 #include <linux/module.h>
@@ -42,10 +42,9 @@ void __delayacct_tsk_init(struct task_st
  * Finish delay accounting for a statistic using its timestamps (@start),
  * accumalator (@total) and @count
  */
-static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total,
-			  u32 *count)
+static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count)
 {
-	s64 ns = ktime_get_ns() - *start;
+	s64 ns = local_clock() - *start;
 	unsigned long flags;
 
 	if (ns > 0) {
@@ -58,7 +57,7 @@ static void delayacct_end(raw_spinlock_t
 
 void __delayacct_blkio_start(void)
 {
-	current->delays->blkio_start = ktime_get_ns();
+	current->delays->blkio_start = local_clock();
 }
 
 /*
@@ -151,21 +150,20 @@ __u64 __delayacct_blkio_ticks(struct tas
 
 void __delayacct_freepages_start(void)
 {
-	current->delays->freepages_start = ktime_get_ns();
+	current->delays->freepages_start = local_clock();
 }
 
 void __delayacct_freepages_end(void)
 {
-	delayacct_end(
-		&current->delays->lock,
-		&current->delays->freepages_start,
-		&current->delays->freepages_delay,
-		&current->delays->freepages_count);
+	delayacct_end(&current->delays->lock,
+		      &current->delays->freepages_start,
+		      &current->delays->freepages_delay,
+		      &current->delays->freepages_count);
 }
 
 void __delayacct_thrashing_start(void)
 {
-	current->delays->thrashing_start = ktime_get_ns();
+	current->delays->thrashing_start = local_clock();
 }
 
 void __delayacct_thrashing_end(void)
Like all scheduler statistics, use sched_clock() based time.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/delayacct.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)
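[Editor's note] As a usage illustration of the start()/end() pairs converted in
the diff above, and of Rik's point that the two calls bracket a sleep, here is
a hedged sketch continuing the toy delay_stat model from the earlier example.
blocking_wait() and reclaim_like_path() are hypothetical names, not kernel
functions; the task may be preempted and migrated between the two hooks, which
is exactly the window the ns > 0 check tolerates.

#include <time.h>

/* Hypothetical stand-in for a path that blocks (e.g. waiting on IO or reclaim). */
static void blocking_wait(void)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };	/* ~1 ms */

	nanosleep(&ts, NULL);
}

/* Bracket the wait with the toy delay_start()/delay_end() hooks from above. */
static void reclaim_like_path(struct delay_stat *d)
{
	delay_start(d);		/* timestamp read via the CPU-local clock */
	blocking_wait();	/* may sleep; wakeup can land on another CPU */
	delay_end(d);		/* accumulate, discarding a tiny negative delta */
}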