[RFC,04/11] timekeeping: Export the boot clock in snapshots

Message ID	20240805173234.3542917-5-vdonnefort@google.com (mailing list archive)
State	Superseded
Headers
Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 80ECC15F31D for <linux-trace-kernel@vger.kernel.org>; Mon, 5 Aug 2024 17:33:43 +0000 (UTC)
Date: Mon, 5 Aug 2024 18:32:27 +0100
In-Reply-To: <20240805173234.3542917-1-vdonnefort@google.com>
Precedence: bulk
Mime-Version: 1.0
References: <20240805173234.3542917-1-vdonnefort@google.com>
Message-ID: <20240805173234.3542917-5-vdonnefort@google.com>
Subject: [RFC PATCH 04/11] timekeeping: Export the boot clock in snapshots
From: Vincent Donnefort <vdonnefort@google.com>
To: rostedt@goodmis.org, mhiramat@kernel.org, linux-trace-kernel@vger.kernel.org, maz@kernel.org, oliver.upton@linux.dev
Cc: kvmarm@lists.linux.dev, will@kernel.org, qperret@google.com, kernel-team@android.com, Vincent Donnefort <vdonnefort@google.com>
Content-Type: text/plain; charset="UTF-8"
Series	Tracefs support for pKVM
  [RFC,00/11] Tracefs support for pKVM
  [RFC,01/11] ring-buffer: Check for empty ring-buffer with rb_num_of_entries()
  [RFC,02/11] ring-buffer: Introducing ring-buffer writer
  [RFC,03/11] ring-buffer: Expose buffer_data_page material
  [RFC,04/11] timekeeping: Export the boot clock in snapshots
  [RFC,05/11] KVM: arm64: Support unaligned fixmap in the nVHE hyp
  [RFC,06/11] KVM: arm64: Add clock support in the nVHE hyp
  [RFC,07/11] KVM: arm64: Add tracing support for the pKVM hyp
  [RFC,08/11] KVM: arm64: Add hyp tracing to tracefs
  [RFC,09/11] KVM: arm64: Add raw interface for hyp tracefs
  [RFC,10/11] KVM: arm64: Add support for hyp events
  [RFC,11/11] KVM: arm64: Add kselftest for tracefs hyp tracefs

Vincent Donnefort Aug. 5, 2024, 5:32 p.m. UTC

On arm64 systems, the arch timer can be accessible by both EL1 and EL2.
This means when running with nVHE or protected KVM, it is easy to
generate clock values from the hypervisor, synchronized with the kernel.

For tracing purpose, the boot clock is interesting as it doesn't stop on
suspend. Export it as part of the time snapshot. This will later allow
the hypervisor to add boot clock timestamps to its events.

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

Marc Zyngier Aug. 22, 2024, 9:13 a.m. UTC | #1

On Mon, 05 Aug 2024 18:32:27 +0100,
Vincent Donnefort <vdonnefort@google.com> wrote:
> 
> On arm64 systems, the arch timer can be accessible by both EL1 and EL2.
> This means when running with nVHE or protected KVM, it is easy to
> generate clock values from the hypervisor, synchronized with the kernel.

When you say "arch_timer" here, are you talking about the data
structure describing the timer? Or about the actual *counter*, a
system register provided by the HW?

I'm not sure the architecture-specific details are massively relevant,
given that this is an arch-agnostic change.

>
> For tracing purpose, the boot clock is interesting as it doesn't stop on
> suspend. Export it as part of the time snapshot. This will later allow
> the hypervisor to add boot clock timestamps to its events.

Isn't that the actual description of the change? By getting the boot
time as well as the parameters to compute an increment, you allow any
subsystem able to perform a snapshot to compute a delta from boot time
as long as they have access to the counter source.
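
A minimal sketch of that computation in plain userspace C, for illustration only. The struct `snap_view` and the helper `snap_boot_ns()` are hypothetical names that mirror the fields this patch adds; the sketch also ignores the counter mask and the sub-nanosecond remainder that the kernel's own conversion tracks, so it is an approximation rather than the kernel's logic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace mirror of the fields this patch adds to
 * struct system_time_snapshot; this is not the kernel definition.
 */
struct snap_view {
	uint64_t cycles;     /* counter value at snapshot time */
	int64_t  boot_ns;    /* boot clock at snapshot time, in ns */
	uint32_t mono_mult;  /* cycles -> ns multiplier */
	uint32_t mono_shift; /* cycles -> ns shift */
};

/* Boot time for a later counter read: base + ((delta * mult) >> shift). */
static int64_t snap_boot_ns(const struct snap_view *s, uint64_t cycles_now)
{
	uint64_t delta = cycles_now - s->cycles;

	/* Keep delta * mult within 64 bits in this sketch. */
	assert(s->mono_mult == 0 || delta <= UINT64_MAX / s->mono_mult);

	return s->boot_ns + (int64_t)((delta * s->mono_mult) >> s->mono_shift);
}
```

With a counter running at 25 MHz (40 ns per cycle), a mult of `40 << 24` and a shift of 24 would turn a delta of 250 cycles into 10000 ns on top of the snapshot's boot value.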

> 
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> 
> diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
> index fc12a9ba2c88..0fc6a61d64bd 100644
> --- a/include/linux/timekeeping.h
> +++ b/include/linux/timekeeping.h
> @@ -275,18 +275,24 @@ struct ktime_timestamps {
>   *				 counter value
>   * @cycles:	Clocksource counter value to produce the system times
>   * @real:	Realtime system time
> + * @boot:	Boot time
>   * @raw:	Monotonic raw system time
>   * @cs_id:	Clocksource ID
>   * @clock_was_set_seq:	The sequence number of clock-was-set events
>   * @cs_was_changed_seq:	The sequence number of clocksource change events
> + * @mono_shift:	The monotonic clock slope shift
> + * @mono_mult:	The monotonic clock slope mult
>   */
>  struct system_time_snapshot {
>  	u64			cycles;
>  	ktime_t			real;
> +	ktime_t			boot;
>  	ktime_t			raw;
>  	enum clocksource_ids	cs_id;
>  	unsigned int		clock_was_set_seq;
>  	u8			cs_was_changed_seq;
> +	u32			mono_shift;
> +	u32			mono_mult;
>  };
>  
>  /**
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 2fa87dcfeda9..6d0488a555a7 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -1057,9 +1057,11 @@ noinstr time64_t __ktime_get_real_seconds(void)
>  void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
>  {
>  	struct timekeeper *tk = &tk_core.timekeeper;
> +	u32 mono_mult, mono_shift;
>  	unsigned int seq;
>  	ktime_t base_raw;
>  	ktime_t base_real;
> +	ktime_t base_boot;
>  	u64 nsec_raw;
>  	u64 nsec_real;
>  	u64 now;
> @@ -1074,14 +1076,21 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
>  		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
>  		base_real = ktime_add(tk->tkr_mono.base,
>  				      tk_core.timekeeper.offs_real);
> +		base_boot = ktime_add(tk->tkr_mono.base,
> +				      tk_core.timekeeper.offs_boot);
>  		base_raw = tk->tkr_raw.base;
>  		nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
>  		nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
> +		mono_mult = tk->tkr_mono.mult;
> +		mono_shift = tk->tkr_mono.shift;
>  	} while (read_seqcount_retry(&tk_core.seq, seq));
>  
>  	systime_snapshot->cycles = now;
>  	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
> +	systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
>  	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
> +	systime_snapshot->mono_shift = mono_shift;
> +	systime_snapshot->mono_mult = mono_mult;
>  }
>  EXPORT_SYMBOL_GPL(ktime_get_snapshot);
>  

This looks good to me, but you should probably Cc the timekeeping
maintainers (tglx, John Stultz, and Stephen Boyd).

Thanks,

	M.

John Stultz Aug. 22, 2024, 6:13 p.m. UTC | #2

On Mon, Aug 5, 2024 at 10:33 AM 'Vincent Donnefort' via kernel-team
<kernel-team@android.com> wrote:
>
> On arm64 systems, the arch timer can be accessible by both EL1 and EL2.
> This means when running with nVHE or protected KVM, it is easy to
> generate clock values from the hypervisor, synchronized with the kernel.
>
> For tracing purpose, the boot clock is interesting as it doesn't stop on
> suspend. Export it as part of the time snapshot. This will later allow
> the hypervisor to add boot clock timestamps to its events.
>
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
>
> diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
> index fc12a9ba2c88..0fc6a61d64bd 100644
> --- a/include/linux/timekeeping.h
> +++ b/include/linux/timekeeping.h
> @@ -275,18 +275,24 @@ struct ktime_timestamps {
>   *                              counter value
>   * @cycles:    Clocksource counter value to produce the system times
>   * @real:      Realtime system time
> + * @boot:      Boot time

So, adding the boottime to this kernel-internal snapshot seems reasonable to me.

>   * @raw:       Monotonic raw system time
>   * @cs_id:     Clocksource ID
>   * @clock_was_set_seq: The sequence number of clock-was-set events
>   * @cs_was_changed_seq:        The sequence number of clocksource change events
> + * @mono_shift:        The monotonic clock slope shift
> + * @mono_mult: The monotonic clock slope mult

This bit, including the mult/shift pair however, isn't well explained
and is a little more worrying.

> @@ -1074,14 +1076,21 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
>                 systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
>                 base_real = ktime_add(tk->tkr_mono.base,
>                                       tk_core.timekeeper.offs_real);
> +               base_boot = ktime_add(tk->tkr_mono.base,
> +                                     tk_core.timekeeper.offs_boot);
>                 base_raw = tk->tkr_raw.base;
>                 nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
>                 nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
> +               mono_mult = tk->tkr_mono.mult;
> +               mono_shift = tk->tkr_mono.shift;
>         } while (read_seqcount_retry(&tk_core.seq, seq));
>
>         systime_snapshot->cycles = now;
>         systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
> +       systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
>         systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
> +       systime_snapshot->mono_shift = mono_shift;
> +       systime_snapshot->mono_mult = mono_mult;
>  }
>  EXPORT_SYMBOL_GPL(ktime_get_snapshot);

So this looks like you're trying to stuff kernel timekeeping internal
values into the snapshot so you can skirt around the timekeeping
subsystem and generate your own timestamps.

This ends up duplicating logic, but in an incomplete way.  For
instance, you don't have things like ntp state, etc, so the timestamps
you generate will not exactly match the kernel, and may have
discontinuities. :(

Now for many cases "close enough" is fine. But the difficulty is the
expectation bar always raises, and eventually "close enough" isn't and
we have a broken interface that has to be fixed.

That said, I do get that the need for something like this is
legitimate. There have been a number of cases where external hardware
(PTP timestamps from NICs) or contexts (virt) are able to record
hardware clocksource timestamps on their own, and want to be able to
map that back to the kernel's (or maybe "a kernel's" if there are
multiple VMs) sense of time.  Sometimes even wanting to do this quite
a bit later after the timestamp was recorded. The ktime_get_snapshot()
logic was added in the first place for this reason.

Some more aggressive approaches try to dump a bunch of the internal
kernel timekeeping state out to userland and call it an api.
See https://lore.kernel.org/lkml/410bbef9771ef8aa51704994a70d5965e367e2ce.camel@infradead.org/
for a recent (and thorough) effort there.

I'm very much not a fan of this approach, as it mimics older efforts
for userspace time calculations that were done before we settled on
VDSOs, which were very fragile and required years of keeping backwards
compatibility logic to map the current kernel state back to separate
structures and expensive conversions to different units that userland
expected.

The benefit with VDSO interface is while the data is exposed to
userland, the structure is not, and the logic is still kernel
controlled, so changes to internal state can be done without breaking
userland.

Something I have been thinking about is maybe it would be beneficial
to rework the timekeeping core so that given a clocksource timestamp,
it could calculate the time for that timestamp. While existing apis
would still do a new read of the clocksource, so the timestamps would
always increase, an old timestamp could be used to retro-calculate a
past time.  The thing that prevents this now is that the timekeeping
core doesn't keep any history, so we can't correctly back-calculate
times before the last state change. But potentially we could keep a
buffer of timekeeper states associated with clocksource intervals, and
so we could find the right state to use for a given clocksource
timestamp. Now, this would still only work to a point, as we don't
want to keep tons of historical state.  But then with this, maybe we
could switch to something more VDSO-like where the PTP drivers or host
systems could request a time given a timestamp (and probably some
clocksource id so we can sanity check everyone is using the same
clock), and we could still provide what they want without having to
expose all of our state.
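
One way to picture the bounded history described above is a small ring of conversion states, each valid from a given counter value onward. This is a userspace sketch only; `tk_state`, `tk_history`, `hist_push()` and `hist_cycles_to_ns()` are hypothetical names and do not exist in the timekeeping core, and the ring length is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIST_LEN 4 /* arbitrary; the history is deliberately bounded */

/* Hypothetical record of the conversion state valid from start_cycles on. */
struct tk_state {
	uint64_t start_cycles;
	int64_t  base_ns;
	uint32_t mult;
	uint32_t shift;
};

struct tk_history {
	struct tk_state ring[HIST_LEN];
	size_t head;  /* index of the next slot to overwrite */
	size_t count; /* number of valid entries, at most HIST_LEN */
};

/* Record a new state; states must arrive in increasing counter order. */
static void hist_push(struct tk_history *h, struct tk_state s)
{
	size_t newest = (h->head + HIST_LEN - 1) % HIST_LEN;

	assert(h->count == 0 || s.start_cycles >= h->ring[newest].start_cycles);

	h->ring[h->head] = s;
	h->head = (h->head + 1) % HIST_LEN;
	if (h->count < HIST_LEN)
		h->count++;
}

/* Returns 0 and fills *ns, or -1 if the timestamp predates the kept history. */
static int hist_cycles_to_ns(const struct tk_history *h, uint64_t cycles,
			     int64_t *ns)
{
	for (size_t i = 0; i < h->count; i++) {
		/* Walk from the newest state to the oldest. */
		const struct tk_state *s =
			&h->ring[(h->head + HIST_LEN - 1 - i) % HIST_LEN];

		if (cycles >= s->start_cycles) {
			uint64_t delta = cycles - s->start_cycles;

			*ns = s->base_ns + (int64_t)((delta * s->mult) >> s->shift);
			return 0;
		}
	}
	return -1;
}
```

A caller holding an old counter value gets the time computed with the state that was in force when the value was read, and an error once that state has been overwritten.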

Unfortunately though, this is all hand waving and pontificating on my
part, as it would be a large rework. But it seems something closer
where we share opaque kernel state along with logic with proper
syscall like APIs to do the calculations, would be a much better
approach over just exporting more kernel state as an API.

For a more short term approach, since you can't be exact outside of
the timekeeping logic, why not interpolate from the data
ktime_get_snapshot already provides to calculate your own sense of the
frequency?
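
A rough sketch of such an interpolation, in userspace C and under stated assumptions: two samples are taken some time apart, each pairing the snapshot's counter value with a clock value in nanoseconds (here the boot clock, assuming the @boot field is available). The names `sample`, `est_mult()`, `est_boot_ns()` and the `EST_SHIFT` precision are hypothetical. As noted above, the result will not exactly match the kernel's own timestamps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of values taken from one ktime_get_snapshot() call. */
struct sample {
	uint64_t cycles;  /* snapshot counter value */
	int64_t  boot_ns; /* boot clock value in ns */
};

#define EST_SHIFT 20u /* arbitrary fixed-point precision for this sketch */

/* Estimate ns-per-cycle as a fixed-point multiplier from two samples. */
static uint32_t est_mult(const struct sample *a, const struct sample *b)
{
	uint64_t dc = b->cycles - a->cycles;
	uint64_t dn = (uint64_t)(b->boot_ns - a->boot_ns);

	/* The samples must be distinct and the shifted delta must fit. */
	assert(dc > 0 && dn <= (UINT64_MAX >> EST_SHIFT));

	return (uint32_t)((dn << EST_SHIFT) / dc);
}

/* Map a counter value to boot time, using one sample as the base. */
static int64_t est_boot_ns(const struct sample *base, uint32_t mult,
			   uint64_t cycles)
{
	uint64_t delta = cycles - base->cycles;

	return base->boot_ns + (int64_t)((delta * mult) >> EST_SHIFT);
}
```

Refreshing the two samples periodically would bound the drift between this estimate and the kernel's clock, since frequency adjustments made in between are only picked up at the next pair of samples.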

thanks
-john

Thomas Gleixner Aug. 22, 2024, 9:41 p.m. UTC | #3

On Thu, Aug 22 2024 at 11:13, John Stultz wrote:
> On Mon, Aug 5, 2024 at 10:33 AM 'Vincent Donnefort' via kernel-team
> <kernel-team@android.com> wrote:
>> diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
>> index fc12a9ba2c88..0fc6a61d64bd 100644
>> --- a/include/linux/timekeeping.h
>> +++ b/include/linux/timekeeping.h
>> @@ -275,18 +275,24 @@ struct ktime_timestamps {
>>   *                              counter value
>>   * @cycles:    Clocksource counter value to produce the system times
>>   * @real:      Realtime system time
>> + * @boot:      Boot time
>
> So, adding the boottime to this kernel-internal snapshot seems reasonable to me.

Maybe for you, but I have zero context to this as this submission
obviously failed to CC the relevant mailing lists and maintainers...

Documentation/process is there for a reason...

Thanks,

        tglx

Vincent Donnefort Sept. 5, 2024, 1:04 p.m. UTC | #4

On Thu, Aug 22, 2024 at 10:13:34AM +0100, Marc Zyngier wrote:
> On Mon, 05 Aug 2024 18:32:27 +0100,
> Vincent Donnefort <vdonnefort@google.com> wrote:
> > 
> > On arm64 systems, the arch timer can be accessible by both EL1 and EL2.
> > This means when running with nVHE or protected KVM, it is easy to
> > generate clock values from the hypervisor, synchronized with the kernel.
> 
> When you say "arch_timer" here, are you talking about the data
> structure describing the timer? Or about the actual *counter*, a
> system register provided by the HW?
> 
> I'm not sure the architecture-specific details are massively relevant,
> given that this is an arch-agnostic change.

I meant the counter but happy to drop this entire paragraph and just keep the
following one!

> 
> >
> > For tracing purpose, the boot clock is interesting as it doesn't stop on
> > suspend. Export it as part of the time snapshot. This will later allow
> > the hypervisor to add boot clock timestamps to its events.
> 
> Isn't that the actual description of the change? By getting the boot
> time as well as the parameters to compute an increment, you allow any
> subsystem able to perform a snapshot to compute a delta from boot time
> as long as they have access to the counter source.
> 
> > 
> > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> > 
> > diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
> > index fc12a9ba2c88..0fc6a61d64bd 100644
> > --- a/include/linux/timekeeping.h
> > +++ b/include/linux/timekeeping.h
> > @@ -275,18 +275,24 @@ struct ktime_timestamps {
> >   *				 counter value
> >   * @cycles:	Clocksource counter value to produce the system times
> >   * @real:	Realtime system time
> > + * @boot:	Boot time
> >   * @raw:	Monotonic raw system time
> >   * @cs_id:	Clocksource ID
> >   * @clock_was_set_seq:	The sequence number of clock-was-set events
> >   * @cs_was_changed_seq:	The sequence number of clocksource change events
> > + * @mono_shift:	The monotonic clock slope shift
> > + * @mono_mult:	The monotonic clock slope mult
> >   */
> >  struct system_time_snapshot {
> >  	u64			cycles;
> >  	ktime_t			real;
> > +	ktime_t			boot;
> >  	ktime_t			raw;
> >  	enum clocksource_ids	cs_id;
> >  	unsigned int		clock_was_set_seq;
> >  	u8			cs_was_changed_seq;
> > +	u32			mono_shift;
> > +	u32			mono_mult;
> >  };
> >  
> >  /**
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 2fa87dcfeda9..6d0488a555a7 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -1057,9 +1057,11 @@ noinstr time64_t __ktime_get_real_seconds(void)
> >  void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
> >  {
> >  	struct timekeeper *tk = &tk_core.timekeeper;
> > +	u32 mono_mult, mono_shift;
> >  	unsigned int seq;
> >  	ktime_t base_raw;
> >  	ktime_t base_real;
> > +	ktime_t base_boot;
> >  	u64 nsec_raw;
> >  	u64 nsec_real;
> >  	u64 now;
> > @@ -1074,14 +1076,21 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
> >  		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
> >  		base_real = ktime_add(tk->tkr_mono.base,
> >  				      tk_core.timekeeper.offs_real);
> > +		base_boot = ktime_add(tk->tkr_mono.base,
> > +				      tk_core.timekeeper.offs_boot);
> >  		base_raw = tk->tkr_raw.base;
> >  		nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
> >  		nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
> > +		mono_mult = tk->tkr_mono.mult;
> > +		mono_shift = tk->tkr_mono.shift;
> >  	} while (read_seqcount_retry(&tk_core.seq, seq));
> >  
> >  	systime_snapshot->cycles = now;
> >  	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
> > +	systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
> >  	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
> > +	systime_snapshot->mono_shift = mono_shift;
> > +	systime_snapshot->mono_mult = mono_mult;
> >  }
> >  EXPORT_SYMBOL_GPL(ktime_get_snapshot);
> >  
> 
> This looks good to me, but you should probably Cc the timekeeping
> maintainers (tglx, John Stultz, and Stephen Boyd).

Yep, my bad!

> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.

Vincent Donnefort Sept. 5, 2024, 1:17 p.m. UTC | #5

On Thu, Aug 22, 2024 at 11:13:11AM -0700, John Stultz wrote:
> On Mon, Aug 5, 2024 at 10:33 AM 'Vincent Donnefort' via kernel-team
> <kernel-team@android.com> wrote:
> >
> > On arm64 systems, the arch timer can be accessible by both EL1 and EL2.
> > This means when running with nVHE or protected KVM, it is easy to
> > generate clock values from the hypervisor, synchronized with the kernel.
> >
> > For tracing purpose, the boot clock is interesting as it doesn't stop on
> > suspend. Export it as part of the time snapshot. This will later allow
> > the hypervisor to add boot clock timestamps to its events.
> >
> > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> >
> > diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
> > index fc12a9ba2c88..0fc6a61d64bd 100644
> > --- a/include/linux/timekeeping.h
> > +++ b/include/linux/timekeeping.h
> > @@ -275,18 +275,24 @@ struct ktime_timestamps {
> >   *                              counter value
> >   * @cycles:    Clocksource counter value to produce the system times
> >   * @real:      Realtime system time
> > + * @boot:      Boot time
> 
> So, adding the boottime to this kernel-internal snapshot seems reasonable to me.
> 
> >   * @raw:       Monotonic raw system time
> >   * @cs_id:     Clocksource ID
> >   * @clock_was_set_seq: The sequence number of clock-was-set events
> >   * @cs_was_changed_seq:        The sequence number of clocksource change events
> > + * @mono_shift:        The monotonic clock slope shift
> > + * @mono_mult: The monotonic clock slope mult
> 
> 
> This bit, including the mult/shift pair however, isn't well explained
> and is a little more worrying.
> 
> 
> > @@ -1074,14 +1076,21 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
> >                 systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
> >                 base_real = ktime_add(tk->tkr_mono.base,
> >                                       tk_core.timekeeper.offs_real);
> > +               base_boot = ktime_add(tk->tkr_mono.base,
> > +                                     tk_core.timekeeper.offs_boot);
> >                 base_raw = tk->tkr_raw.base;
> >                 nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
> >                 nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
> > +               mono_mult = tk->tkr_mono.mult;
> > +               mono_shift = tk->tkr_mono.shift;
> >         } while (read_seqcount_retry(&tk_core.seq, seq));
> >
> >         systime_snapshot->cycles = now;
> >         systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
> > +       systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
> >         systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
> > +       systime_snapshot->mono_shift = mono_shift;
> > +       systime_snapshot->mono_mult = mono_mult;
> >  }
> >  EXPORT_SYMBOL_GPL(ktime_get_snapshot);
> 
> So this looks like you're trying to stuff kernel timekeeping internal
> values into the snapshot so you can skirt around the timekeeping
> subsystem and generate your own timestamps.
> 
> This ends up duplicating logic, but in an incomplete way.  For
> instance, you don't have things like ntp state, etc, so the timestamps
> you generate will not exactly match the kernel, and may have
> discontinuities. :(
> 
> Now for many cases "close enough" is fine. But the difficulty is the
> expectation bar always raises, and eventually "close enough" isn't and
> we have a broken interface that has to be fixed.
> 
> That said, I do get that the need for something like this is
> legitimate. There have been a number of cases where external hardware
> (PTP timestamps from NICs) or contexts (virt) are able to record
> hardware clocksource timestamps on their own, and want to be able to
> map that back to the kernel's (or maybe "a kernel's" if there are
> multiple VMs) sense of time.  Sometimes even wanting to do this quite
> a bit later after the timestamp was recorded. The ktime_get_snapshot()
> logic was added in the first place for this reason.
> 
> Some more aggressive approaches try to dump a bunch of the internal
> kernel timekeeping state out to userland and call it an api.
> See https://lore.kernel.org/lkml/410bbef9771ef8aa51704994a70d5965e367e2ce.camel@infradead.org/
> for a recent (and thorough) effort there.
> 
> I'm very much not a fan of this approach, as it mimics older efforts
> for userspace time calculations that were done before we settled on
> VDSOs, which were very fragile and required years of keeping backwards
> compatibility logic to map the current kernel state back to separate
> structures and expensive conversions to different units that userland
> expected.
> 
> The benefit with VDSO interface is while the data is exposed to
> userland, the structure is not, and the logic is still kernel
> controlled, so changes to internal state can be done without breaking
> userland.
> 
> Something I have been thinking about is maybe it would be beneficial
> to rework the timekeeping core so that given a clocksource timestamp,
> it could calculate the time for that timestamp. While existing apis
> would still do a new read of the clocksource, so the timestamps would
> always increase, an old timestamp could be used to retro-calculate a
> past time.  The thing that prevents this now is that the timekeeping
> core doesn't keep any history, so we can't correctly back-calculate
> times before the last state change. But potentially we could keep a
> buffer of timekeeper states associated with clocksource intervals, and
> so we could find the right state to use for a given clocksource
> timestamp. Now, this would still only work to a point, as we don't
> want to keep tons of historical state.  But then with this, maybe we
> could switch to something more VDSO-like where the PTP drivers or host
> systems could request a time given a timestamp (and probably some
> clocksource id so we can sanity check everyone is using the same
> clock), and we could still provide what they want without having to
> expose all of our state.
> 
> Unfortunately though, this is all hand waving and pontificating on my
> part, as it would be a large rework. But it seems something closer
> where we share opaque kernel state along with logic with proper
> syscall like APIs to do the calculations, would be a much better
> approach over just exporting more kernel state as an API.
> 
> For a more short term approach, since you can't be exact outside of
> the timekeeping logic, why not interpolate from the data
> ktime_get_snapshot already provides to calculate your own sense of the
> frequency?

Understood, I shouldn't sneak mult and shift out. For the next version,
I'll just use the boot clock value and compute my own mult and shift.

Thanks for having a look at the change!

> 
> thanks
> -john
