Message ID | 20241009-devel-anna-maria-b4-timers-ptp-timekeeping-v2-13-554456a44a15@linutronix.de (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | timekeeping: Rework to prepare support of indenpendent PTP clocks | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Wed, Oct 9, 2024 at 1:29 AM Anna-Maria Behnsen <anna-maria@linutronix.de> wrote: > > From: Anna-Maria Behnsen <anna-maria@linutronix.de> > > timekeeping_advance() is the only optimized function which uses > shadow_timekeeper for updating the real timekeeper to keep the sequence > counter protected region as small as possible. > > To be able to transform timekeeper updates in other functions to use the > same logic, split out functionality into a separate function > timekeeper_update_staged(). > > While at it, document the reason why the sequence counter must be write > held over the call to timekeeping_update() and the copying to the real > timekeeper and why using a pointer based update is suboptimal. > > No functional change. > > Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Acked-by: John Stultz <jstultz@google.com>
On Wed, Oct 9, 2024 at 1:29 AM Anna-Maria Behnsen <anna-maria@linutronix.de> wrote: > > From: Anna-Maria Behnsen <anna-maria@linutronix.de> > > timekeeping_advance() is the only optimized function which uses > shadow_timekeeper for updating the real timekeeper to keep the sequence > counter protected region as small as possible. > > To be able to transform timekeeper updates in other functions to use the > same logic, split out functionality into a separate function > timekeeper_update_staged(). > > While at it, document the reason why the sequence counter must be write > held over the call to timekeeping_update() and the copying to the real > timekeeper and why using a pointer based update is suboptimal. > > No functional change. > > Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> > --- > kernel/time/timekeeping.c | 43 +++++++++++++++++++++++++++---------------- > 1 file changed, 27 insertions(+), 16 deletions(-) > > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > index 878f9606946d..fcb2b8b232d2 100644 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -780,7 +780,32 @@ static void timekeeping_update(struct tk_data *tkd, struct timekeeper *tk, unsig > * timekeeper structure on the next update with stale data > */ > if (action & TK_MIRROR) > - memcpy(&tk_core.shadow_timekeeper, &tk_core.timekeeper, sizeof(tk_core.timekeeper)); > + memcpy(&tkd->shadow_timekeeper, tk, sizeof(*tk)); > +} > + > +static void timekeeping_update_staged(struct tk_data *tkd, unsigned int action) Minor nit I realized as I saw how this was used later on: timekeeping_update_staged() isn't super clear right off. Maybe timekeeping_update_from_shadow() might make it more clear? thanks -john
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 878f9606946d..fcb2b8b232d2 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -780,7 +780,32 @@ static void timekeeping_update(struct tk_data *tkd, struct timekeeper *tk, unsig * timekeeper structure on the next update with stale data */ if (action & TK_MIRROR) - memcpy(&tk_core.shadow_timekeeper, &tk_core.timekeeper, sizeof(tk_core.timekeeper)); + memcpy(&tkd->shadow_timekeeper, tk, sizeof(*tk)); +} + +static void timekeeping_update_staged(struct tk_data *tkd, unsigned int action) +{ + /* + * Block out readers before invoking timekeeping_update() because + * that updates VDSO and other time related infrastructure. Not + * blocking the readers might let a reader see time going backwards + * when reading from the VDSO after the VDSO update and then + * reading in the kernel from the timekeeper before that got updated. + */ + write_seqcount_begin(&tkd->seq); + + timekeeping_update(tkd, &tkd->shadow_timekeeper, action); + + /* + * Update the real timekeeper. + * + * We could avoid this memcpy() by switching pointers, but that has + * the downside that the reader side does not longer benefit from + * the cacheline optimized data layout of the timekeeper and requires + * another indirection. + */ + memcpy(&tkd->timekeeper, &tkd->shadow_timekeeper, sizeof(tkd->shadow_timekeeper)); + write_seqcount_end(&tkd->seq); } /** @@ -2333,21 +2358,7 @@ static bool timekeeping_advance(enum timekeeping_adv_mode mode) */ clock_set |= accumulate_nsecs_to_secs(tk); - write_seqcount_begin(&tk_core.seq); - /* - * Update the real timekeeper. - * - * We could avoid this memcpy by switching pointers, but that - * requires changes to all other timekeeper usage sites as - * well, i.e. move the timekeeper pointer getter into the - * spinlocked/seqcount protected sections. And we trade this - * memcpy under the tk_core.seq against one before we start - * updating. - */ - timekeeping_update(&tk_core, tk, clock_set); - memcpy(real_tk, tk, sizeof(*tk)); - /* The memcpy must come last. Do not put anything here! */ - write_seqcount_end(&tk_core.seq); + timekeeping_update_staged(&tk_core, clock_set); return !!clock_set; }