rtc: Allow rtc drivers to specify the tv_nsec value for ntp

Message ID	20171013175433.GA22062@obsidianresearch.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> Date: Fri, 13 Oct 2017 11:54:33 -0600 From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> To: Russell King - ARM Linux <linux@armlinux.org.uk>, John Stultz <john.stultz@linaro.org> Subject: [PATCH] rtc: Allow rtc drivers to specify the tv_nsec value for ntp Message-ID: <20171013175433.GA22062@obsidianresearch.com> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.24 (2015-08-30) Precedence: list Cc: Alessandro Zummo <a.zummo@towertech.it>, Thomas Gleixner <tglx@linutronix.de>, Stephen Boyd <sboyd@codeaurora.org>, Alexandre Belloni <alexandre.belloni@free-electrons.com>, linux-arm-kernel@lists.infradead.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Jason Gunthorpe Oct. 13, 2017, 5:54 p.m. UTC

ntp is currently hardwired to try and call the rtc set when wall clock
tv_nsec is 0.5 seconds. This historical behaviour works well with certain
PC RTCs, but is not universal to all rtc hardware.

Change how this works by introducing the driver specific concept of
set_offset_nsec, the delay between current wall clock time and the target
time to set (with a 0 tv_nsecs).

For x86-style CMOS set_offset_nsec should be -0.5 s which causes the last
second to be written 0.5 s after it has started.

For compat with the old rtc_set_ntp_time, the value is defaulted to
+ 0.5 s, which causes the next second to be written 0.5s before it starts,
as things were before this patch.

Testing shows many non-x86 RTCs would like set_offset_nsec ~= 0,
so ultimately each RTC driver should set the set_offset_nsec according
to its needs, and non x86 architectures should stop using
update_persistent_clock64 in order to access this feature.
Future patches will revise the drivers as needed.

Since CMOS and RTC now have very different handling they are split
into two dedicated code paths, sharing the support code, and ifdefs
are replaced with IS_ENABLED.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/rtc/class.c   |   3 +
 drivers/rtc/systohc.c |  53 +++++++++++-----
 include/linux/rtc.h   |  43 ++++++++++++-
 kernel/time/ntp.c     | 166 ++++++++++++++++++++++++++++++++++----------------
 4 files changed, 196 insertions(+), 69 deletions(-)

Hello All,

Here is the latest version of this patch that was circulating between
RMK and myself. I have done a few more minor changed and been able to
test it myself on x86 KVM and on the ARM system that motivated the
original CONFIG_RTC_SYSTOHC patch.

From all my testing, this patch does not change the existing behavior
at all, but provides the base infrastructure to fix individual RTCs
one at a time.

There are a few followup patches that will set the set_offset_nsec
value for various RTC drivers, and I still have to look at RMK's
hrtimer patch, but those are all incremental on top of this.

Alexandre Belloni Nov. 23, 2017, 9:54 a.m. UTC | #1

Hi,

On 13/10/2017 at 11:54:33 -0600, Jason Gunthorpe wrote:
> ntp is currently hardwired to try and call the rtc set when wall clock
> tv_nsec is 0.5 seconds. This historical behaviour works well with certain
> PC RTCs, but is not universal to all rtc hardware.
> 
> Change how this works by introducing the driver specific concept of
> set_offset_nsec, the delay between current wall clock time and the target
> time to set (with a 0 tv_nsecs).
> 
> For x86-style CMOS set_offset_nsec should be -0.5 s which causes the last
> second to be written 0.5 s after it has started.
> 
> For compat with the old rtc_set_ntp_time, the value is defaulted to
> + 0.5 s, which causes the next second to be written 0.5s before it starts,
> as things were before this patch.
> 
> Testing shows many non-x86 RTCs would like set_offset_nsec ~= 0,
> so ultimately each RTC driver should set the set_offset_nsec according
> to its needs, and non x86 architectures should stop using
> update_persistent_clock64 in order to access this feature.
> Future patches will revise the drivers as needed.
> 
> Since CMOS and RTC now have very different handling they are split
> into two dedicated code paths, sharing the support code, and ifdefs
> are replaced with IS_ENABLED.
> 
> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> ---
>  drivers/rtc/class.c   |   3 +
>  drivers/rtc/systohc.c |  53 +++++++++++-----
>  include/linux/rtc.h   |  43 ++++++++++++-
>  kernel/time/ntp.c     | 166 ++++++++++++++++++++++++++++++++++----------------
>  4 files changed, 196 insertions(+), 69 deletions(-)
> 
> Hello All,
> 
> Here is the latest version of this patch that was circulating between
> RMK and myself. I have done a few more minor changed and been able to
> test it myself on x86 KVM and on the ARM system that motivated the
> original CONFIG_RTC_SYSTOHC patch.
> 
> From all my testing, this patch does not change the existing behavior
> at all, but provides the base infrastructure to fix individual RTCs
> one at a time.
> 
> There are a few followup patches that will set the set_offset_nsec
> value for various RTC drivers, and I still have to look at RMK's
> hrtimer patch, but those are all incremental on top of this.
> 

So I'm discovering that this patch made it upstream silently. While it
somewhat solves the issue for some RTCs, it is not a proper solution for
most.

With this patch, set_offset_nsec will be hardcoded in the driver and
this basically work for the mc146818 but many RTCs are connected on a
slow bus (let's say i2c) and that bus may have various latencies
depending on the platform so the value actually depends on the platform
rather than on the RTC itself.

As noted by Russell in another thread, there is already a proper
solution: do it from userspace as hwclock will already handle most
issues.

I really think that we should simply consider dropping hctosys,
systohc and update_persistent_clock. Most distributions have been
handling it from userspace for a long time. Openembedded has a
startup/shutdown script.
Even better, systemd has a timesyncing daemon (but it doesn't yet do the
correct offset calculations).

I know we've had it in the kernel since 1.3.31 but this feature has
nothing to do in the kernel and the policy is best handled by userspace
and the whole thing is confusing users.


> diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
> index 2ed970d61da140..722d683e0b0f5f 100644
> --- a/drivers/rtc/class.c
> +++ b/drivers/rtc/class.c
> @@ -161,6 +161,9 @@ static struct rtc_device *rtc_allocate_device(void)
>  
>  	device_initialize(&rtc->dev);
>  
> +	/* Drivers can revise this default after allocating the device. */
> +	rtc->set_offset_nsec =  NSEC_PER_SEC / 2;
> +
>  	rtc->irq_freq = 1;
>  	rtc->max_user_freq = 64;
>  	rtc->dev.class = rtc_class;
> diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
> index b4a68ffcd06bb8..0c177647ea6c71 100644
> --- a/drivers/rtc/systohc.c
> +++ b/drivers/rtc/systohc.c
> @@ -10,6 +10,7 @@
>  /**
>   * rtc_set_ntp_time - Save NTP synchronized time to the RTC
>   * @now: Current time of day
> + * @target_nsec: pointer for desired now->tv_nsec value
>   *
>   * Replacement for the NTP platform function update_persistent_clock64
>   * that stores time for later retrieval by rtc_hctosys.
> @@ -18,30 +19,52 @@
>   * possible at all, and various other -errno for specific temporary failure
>   * cases.
>   *
> + * -EPROTO is returned if now.tv_nsec is not close enough to *target_nsec.
> + (
>   * If temporary failure is indicated the caller should try again 'soon'
>   */
> -int rtc_set_ntp_time(struct timespec64 now)
> +int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec)
>  {
>  	struct rtc_device *rtc;
>  	struct rtc_time tm;
> +	struct timespec64 to_set;
>  	int err = -ENODEV;
> -
> -	if (now.tv_nsec < (NSEC_PER_SEC >> 1))
> -		rtc_time64_to_tm(now.tv_sec, &tm);
> -	else
> -		rtc_time64_to_tm(now.tv_sec + 1, &tm);
> +	bool ok;
>  
>  	rtc = rtc_class_open(CONFIG_RTC_SYSTOHC_DEVICE);
> -	if (rtc) {
> -		/* rtc_hctosys exclusively uses UTC, so we call set_time here,
> -		 * not set_mmss. */
> -		if (rtc->ops &&
> -		    (rtc->ops->set_time ||
> -		     rtc->ops->set_mmss64 ||
> -		     rtc->ops->set_mmss))
> -			err = rtc_set_time(rtc, &tm);
> -		rtc_class_close(rtc);
> +	if (!rtc)
> +		goto out_err;
> +
> +	if (!rtc->ops || (!rtc->ops->set_time && !rtc->ops->set_mmss64 &&
> +			  !rtc->ops->set_mmss))
> +		goto out_close;
> +
> +	/* Compute the value of tv_nsec we require the caller to supply in
> +	 * now.tv_nsec.  This is the value such that (now +
> +	 * set_offset_nsec).tv_nsec == 0.
> +	 */
> +	set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec);
> +	*target_nsec = to_set.tv_nsec;
> +
> +	/* The ntp code must call this with the correct value in tv_nsec, if
> +	 * it does not we update target_nsec and return EPROTO to make the ntp
> +	 * code try again later.
> +	 */
> +	ok = rtc_tv_nsec_ok(rtc->set_offset_nsec, &to_set, &now);
> +	if (!ok) {
> +		err = -EPROTO;
> +		goto out_close;
>  	}
>  
> +	rtc_time64_to_tm(to_set.tv_sec, &tm);
> +
> +	/* rtc_hctosys exclusively uses UTC, so we call set_time here, not
> +	 * set_mmss.
> +	 */
> +	err = rtc_set_time(rtc, &tm);
> +
> +out_close:
> +	rtc_class_close(rtc);
> +out_err:
>  	return err;
>  }
> diff --git a/include/linux/rtc.h b/include/linux/rtc.h
> index e6d0f9c1cafd81..5b13fa029fd6ae 100644
> --- a/include/linux/rtc.h
> +++ b/include/linux/rtc.h
> @@ -135,6 +135,14 @@ struct rtc_device {
>  	/* Some hardware can't support UIE mode */
>  	int uie_unsupported;
>  
> +	/* Number of nsec it takes to set the RTC clock. This influences when
> +	 * the set ops are called. An offset:
> +	 *   - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s
> +	 *   - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s
> +	 *   - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s
> +	 */
> +	long set_offset_nsec;
> +
>  	bool registered;
>  
>  	struct nvmem_config *nvmem_config;
> @@ -172,7 +180,7 @@ extern void devm_rtc_device_unregister(struct device *dev,
>  
>  extern int rtc_read_time(struct rtc_device *rtc, struct rtc_time *tm);
>  extern int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm);
> -extern int rtc_set_ntp_time(struct timespec64 now);
> +extern int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec);
>  int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm);
>  extern int rtc_read_alarm(struct rtc_device *rtc,
>  			struct rtc_wkalrm *alrm);
> @@ -221,6 +229,39 @@ static inline bool is_leap_year(unsigned int year)
>  	return (!(year % 4) && (year % 100)) || !(year % 400);
>  }
>  
> +/* Determine if we can call to driver to set the time. Drivers can only be
> + * called to set a second aligned time value, and the field set_offset_nsec
> + * specifies how far away from the second aligned time to call the driver.
> + *
> + * This also computes 'to_set' which is the time we are trying to set, and has
> + * a zero in tv_nsecs, such that:
> + *    to_set - set_delay_nsec == now +/- FUZZ
> + *
> + */
> +static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec,
> +				  struct timespec64 *to_set,
> +				  const struct timespec64 *now)
> +{
> +	/* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */
> +	const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5;
> +	struct timespec64 delay = {.tv_sec = 0,
> +				   .tv_nsec = set_offset_nsec};
> +
> +	*to_set = timespec64_add(*now, delay);
> +
> +	if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) {
> +		to_set->tv_nsec = 0;
> +		return true;
> +	}
> +
> +	if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) {
> +		to_set->tv_sec++;
> +		to_set->tv_nsec = 0;
> +		return true;
> +	}
> +	return false;
> +}
> +
>  #define rtc_register_device(device) \
>  	__rtc_register_device(THIS_MODULE, device)
>  
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index edf19cc5314043..bc19de1a06835e 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -492,6 +492,67 @@ int second_overflow(time64_t secs)
>  	return leap;
>  }
>  
> +static void sync_hw_clock(struct work_struct *work);
> +static DECLARE_DELAYED_WORK(sync_work, sync_hw_clock);
> +
> +static void sched_sync_hw_clock(struct timespec64 now,
> +				unsigned long target_nsec, bool fail)
> +
> +{
> +	struct timespec64 next;
> +
> +	getnstimeofday64(&next);
> +	if (!fail)
> +		next.tv_sec = 659;
> +	else {
> +		/*
> +		 * Try again as soon as possible. Delaying long periods
> +		 * decreases the accuracy of the work queue timer. Due to this
> +		 * the algorithm is very likely to require a short-sleep retry
> +		 * after the above long sleep to synchronize ts_nsec.
> +		 */
> +		next.tv_sec = 0;
> +	}
> +
> +	/* Compute the needed delay that will get to tv_nsec == target_nsec */
> +	next.tv_nsec = target_nsec - next.tv_nsec;
> +	if (next.tv_nsec <= 0)
> +		next.tv_nsec += NSEC_PER_SEC;
> +	if (next.tv_nsec >= NSEC_PER_SEC) {
> +		next.tv_sec++;
> +		next.tv_nsec -= NSEC_PER_SEC;
> +	}
> +
> +	queue_delayed_work(system_power_efficient_wq, &sync_work,
> +			   timespec64_to_jiffies(&next));
> +}
> +
> +static void sync_rtc_clock(void)
> +{
> +	unsigned long target_nsec;
> +	struct timespec64 adjust, now;
> +	int rc;
> +
> +	if (!IS_ENABLED(CONFIG_RTC_SYSTOHC))
> +		return;
> +
> +	getnstimeofday64(&now);
> +
> +	adjust = now;
> +	if (persistent_clock_is_local)
> +		adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
> +
> +	/*
> +	 * The current RTC in use will provide the target_nsec it wants to be
> +	 * called at, and does rtc_tv_nsec_ok internally.
> +	 */
> +	rc = rtc_set_ntp_time(adjust, &target_nsec);
> +	if (rc == -ENODEV)
> +		return;
> +
> +	sched_sync_hw_clock(now, target_nsec, rc);
> +}
> +
>  #ifdef CONFIG_GENERIC_CMOS_UPDATE
>  int __weak update_persistent_clock(struct timespec now)
>  {
> @@ -507,76 +568,75 @@ int __weak update_persistent_clock64(struct timespec64 now64)
>  }
>  #endif
>  
> -#if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
> -static void sync_cmos_clock(struct work_struct *work);
> -
> -static DECLARE_DELAYED_WORK(sync_cmos_work, sync_cmos_clock);
> -
> -static void sync_cmos_clock(struct work_struct *work)
> +static bool sync_cmos_clock(void)
>  {
> +	static bool no_cmos;
>  	struct timespec64 now;
> -	struct timespec64 next;
> -	int fail = 1;
> +	struct timespec64 adjust;
> +	int rc = -EPROTO;
> +	long target_nsec = NSEC_PER_SEC / 2;
> +
> +	if (!IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE))
> +		return false;
> +
> +	if (no_cmos)
> +		return false;
>  
>  	/*
> -	 * If we have an externally synchronized Linux clock, then update
> -	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
> -	 * called as close as possible to 500 ms before the new second starts.
> -	 * This code is run on a timer.  If the clock is set, that timer
> -	 * may not expire at the correct time.  Thus, we adjust...
> -	 * We want the clock to be within a couple of ticks from the target.
> +	 * Historically update_persistent_clock64() has followed x86
> +	 * semantics, which match the MC146818A/etc RTC. This RTC will store
> +	 * 'adjust' and then in .5s it will advance once second.
> +	 *
> +	 * Architectures are strongly encouraged to use rtclib and not
> +	 * implement this legacy API.
>  	 */
> -	if (!ntp_synced()) {
> -		/*
> -		 * Not synced, exit, do not restart a timer (if one is
> -		 * running, let it run out).
> -		 */
> -		return;
> -	}
> -
>  	getnstimeofday64(&now);
> -	if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) {
> -		struct timespec64 adjust = now;
> -
> -		fail = -ENODEV;
> +	if (rtc_tv_nsec_ok(-1 * target_nsec, &adjust, &now)) {
>  		if (persistent_clock_is_local)
>  			adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
> -#ifdef CONFIG_GENERIC_CMOS_UPDATE
> -		fail = update_persistent_clock64(adjust);
> -#endif
> -
> -#ifdef CONFIG_RTC_SYSTOHC
> -		if (fail == -ENODEV)
> -			fail = rtc_set_ntp_time(adjust);
> -#endif
> +		rc = update_persistent_clock64(adjust);
> +		/*
> +		 * The machine does not support update_persistent_clock64 even
> +		 * though it defines CONFIG_GENERIC_CMOS_UPDATE.
> +		 */
> +		if (rc == -ENODEV) {
> +			no_cmos = true;
> +			return false;
> +		}
>  	}
>  
> -	next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec - (TICK_NSEC / 2);
> -	if (next.tv_nsec <= 0)
> -		next.tv_nsec += NSEC_PER_SEC;
> +	sched_sync_hw_clock(now, target_nsec, rc);
> +	return true;
> +}
>  
> -	if (!fail || fail == -ENODEV)
> -		next.tv_sec = 659;
> -	else
> -		next.tv_sec = 0;
> +/*
> + * If we have an externally synchronized Linux clock, then update RTC clock
> + * accordingly every ~11 minutes. Generally RTCs can only store second
> + * precision, but many RTCs will adjust the phase of their second tick to
> + * match the moment of update. This infrastructure arranges to call to the RTC
> + * set at the correct moment to phase synchronize the RTC second tick over
> + * with the kernel clock.
> + */
> +static void sync_hw_clock(struct work_struct *work)
> +{
> +	if (!ntp_synced())
> +		return;
>  
> -	if (next.tv_nsec >= NSEC_PER_SEC) {
> -		next.tv_sec++;
> -		next.tv_nsec -= NSEC_PER_SEC;
> -	}
> -	queue_delayed_work(system_power_efficient_wq,
> -			   &sync_cmos_work, timespec64_to_jiffies(&next));
> +	if (sync_cmos_clock())
> +		return;
> +
> +	sync_rtc_clock();
>  }
>  
>  void ntp_notify_cmos_timer(void)
>  {
> -	queue_delayed_work(system_power_efficient_wq, &sync_cmos_work, 0);
> -}
> -
> -#else
> -void ntp_notify_cmos_timer(void) { }
> -#endif
> +	if (!ntp_synced())
> +		return;
>  
> +	if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
> +	    IS_ENABLED(CONFIG_RTC_SYSTOHC))
> +		queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
> +}
>  
>  /*
>   * Propagate a new txc->status value into the NTP state:
> -- 
> 2.7.4

Russell King (Oracle) Nov. 23, 2017, 11:23 a.m. UTC | #2

On Thu, Nov 23, 2017 at 10:54:56AM +0100, Alexandre Belloni wrote:
> Hi,
> 
> On 13/10/2017 at 11:54:33 -0600, Jason Gunthorpe wrote:
> > ntp is currently hardwired to try and call the rtc set when wall clock
> > tv_nsec is 0.5 seconds. This historical behaviour works well with certain
> > PC RTCs, but is not universal to all rtc hardware.
> > 
> > Change how this works by introducing the driver specific concept of
> > set_offset_nsec, the delay between current wall clock time and the target
> > time to set (with a 0 tv_nsecs).
> > 
> > For x86-style CMOS set_offset_nsec should be -0.5 s which causes the last
> > second to be written 0.5 s after it has started.
> > 
> > For compat with the old rtc_set_ntp_time, the value is defaulted to
> > + 0.5 s, which causes the next second to be written 0.5s before it starts,
> > as things were before this patch.
> > 
> > Testing shows many non-x86 RTCs would like set_offset_nsec ~= 0,
> > so ultimately each RTC driver should set the set_offset_nsec according
> > to its needs, and non x86 architectures should stop using
> > update_persistent_clock64 in order to access this feature.
> > Future patches will revise the drivers as needed.
> > 
> > Since CMOS and RTC now have very different handling they are split
> > into two dedicated code paths, sharing the support code, and ifdefs
> > are replaced with IS_ENABLED.
> > 
> > Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> > ---
> >  drivers/rtc/class.c   |   3 +
> >  drivers/rtc/systohc.c |  53 +++++++++++-----
> >  include/linux/rtc.h   |  43 ++++++++++++-
> >  kernel/time/ntp.c     | 166 ++++++++++++++++++++++++++++++++++----------------
> >  4 files changed, 196 insertions(+), 69 deletions(-)
> > 
> > Hello All,
> > 
> > Here is the latest version of this patch that was circulating between
> > RMK and myself. I have done a few more minor changed and been able to
> > test it myself on x86 KVM and on the ARM system that motivated the
> > original CONFIG_RTC_SYSTOHC patch.
> > 
> > From all my testing, this patch does not change the existing behavior
> > at all, but provides the base infrastructure to fix individual RTCs
> > one at a time.
> > 
> > There are a few followup patches that will set the set_offset_nsec
> > value for various RTC drivers, and I still have to look at RMK's
> > hrtimer patch, but those are all incremental on top of this.
> > 
> 
> So I'm discovering that this patch made it upstream silently. While it
> somewhat solves the issue for some RTCs, it is not a proper solution for
> most.
> 
> With this patch, set_offset_nsec will be hardcoded in the driver and
> this basically work for the mc146818 but many RTCs are connected on a
> slow bus (let's say i2c) and that bus may have various latencies
> depending on the platform so the value actually depends on the platform
> rather than on the RTC itself.
> 
> As noted by Russell in another thread, there is already a proper
> solution: do it from userspace as hwclock will already handle most
> issues.

That's incorrect.  hwclock does not have the information to know when
it should set the time - that is hardware specific.  Some RTCs require
it set .5s before the second flips over, others require it at other
times.

hwclock has hard-coded the assumption that it's writing to a standard
PC RTC, so it does the .5s thing, which doesn't give good results on
various ARM platforms.

Accurately reading the current time is way simpler - hwclock does this
by watching for the RTC's seconds changing (via the update interrupt
or emulation.)  That's way easier than setting the time.

> I really think that we should simply consider dropping hctosys,
> systohc and update_persistent_clock. Most distributions have been
> handling it from userspace for a long time. Openembedded has a
> startup/shutdown script.

Not every system does though - you have to adjust debian's
configuration for it to happen at shutdown.

However, that's not the point of the kernel updating the RTC time -
the point of the RTC updates while synchronised is to reduce the
disruption that a crash/reboot causes on NTP by allowing the system
time to be restored as close as possible to the real time as possible.
If the system has crashed with your idea, the RTC will not have been
synchronised since who-knows-when, and the RTC could be way out,
especially if the system has been running for more than a month.

> Even better, systemd has a timesyncing daemon (but it doesn't yet do the
> correct offset calculations).

No thanks.  systemd's timesyncing daemon replaces ntpd, so it forces
you to use systemd if you want this facility.  What if you want this
facility but also facilities from ntpd (eg, for synchronising with
a reference clock?)

The solution that Alexander and myself have come up with is, IMHO, the
best solution, even on buses such as I2C buses.  I've a bunch of
follow-up patches that add set_offset_nsec for pcf8583 and armada38x,
export controls for adjusting that value, and for disabling the NTP
update.

The idea behind the patches that export those controls is to allow
set_offset_nsec to be finely adjusted - in order to do that:

1. you need to synchronise the machine's time keeping to a stable
   reference, let ntp settle.

2. disable NTP updates of the RTC, measure the RTC drift over a long
   period (eg, a day) and trim the RTC for minimal drift.

3. enable NTP updates, wait for an update, and measure the offset
   between real time and RTC time, and use that to adjust
   set_offset_nsec.

You only need to do the full procedure if you really care about
accurate time keeping (eg, you're trying to do something that
requires stable time.)  The point is, these patches _allow_ you to
do this if you wish.  Without them, it's completely impossible to
use Linux for accurately timestamped monitoring allocations - the
answer is not "just run ntpd" because if the system time is out,
it takes ntpd several *hours* to stabilise the systems timekeeping.

Alexandre Belloni Nov. 23, 2017, 12:04 p.m. UTC | #3

(Correcting Jason's email)

On 23/11/2017 at 11:23:38 +0000, Russell King - ARM Linux wrote:
> > So I'm discovering that this patch made it upstream silently. While it
> > somewhat solves the issue for some RTCs, it is not a proper solution for
> > most.
> > 
> > With this patch, set_offset_nsec will be hardcoded in the driver and
> > this basically work for the mc146818 but many RTCs are connected on a
> > slow bus (let's say i2c) and that bus may have various latencies
> > depending on the platform so the value actually depends on the platform
> > rather than on the RTC itself.
> > 
> > As noted by Russell in another thread, there is already a proper
> > solution: do it from userspace as hwclock will already handle most
> > issues.
> 
> That's incorrect.  hwclock does not have the information to know when
> it should set the time - that is hardware specific.  Some RTCs require
> it set .5s before the second flips over, others require it at other
> times.
> 
> hwclock has hard-coded the assumption that it's writing to a standard
> PC RTC, so it does the .5s thing, which doesn't give good results on
> various ARM platforms.
> 

The 0.5s hardcoding depends on the version of hwclock you use (the
busybox one doesn't do it anymore).

I thought it was handling it better than that and I was indeed
incorrect.

What about setting the time on the RTC once, then using UIE to get the
offset between the set time and the real time then set the time on the
RTC again after accounting for the offset. That would work nicely for
most RTCs.

> Accurately reading the current time is way simpler - hwclock does this
> by watching for the RTC's seconds changing (via the update interrupt
> or emulation.)  That's way easier than setting the time.
> 
> > I really think that we should simply consider dropping hctosys,
> > systohc and update_persistent_clock. Most distributions have been
> > handling it from userspace for a long time. Openembedded has a
> > startup/shutdown script.
> 
> Not every system does though - you have to adjust debian's
> configuration for it to happen at shutdown.
> 
> However, that's not the point of the kernel updating the RTC time -
> the point of the RTC updates while synchronised is to reduce the
> disruption that a crash/reboot causes on NTP by allowing the system
> time to be restored as close as possible to the real time as possible.
> If the system has crashed with your idea, the RTC will not have been
> synchronised since who-knows-when, and the RTC could be way out,
> especially if the system has been running for more than a month.
> 

But nothing prevents you from using hwclock every 11 minutes from
userspace. I really don't think this should be done from the kernel.

Even better, from userspace you can actually chose the time interval you
want. To me, this all seems to be policy encoded in the kernel.

> > Even better, systemd has a timesyncing daemon (but it doesn't yet do the
> > correct offset calculations).
> 
> No thanks.  systemd's timesyncing daemon replaces ntpd, so it forces
> you to use systemd if you want this facility.  What if you want this
> facility but also facilities from ntpd (eg, for synchronising with
> a reference clock?)
> 
> The solution that Alexander and myself have come up with is, IMHO, the
> best solution, even on buses such as I2C buses.  I've a bunch of
> follow-up patches that add set_offset_nsec for pcf8583 and armada38x,
> export controls for adjusting that value, and for disabling the NTP
> update.
> 
> The idea behind the patches that export those controls is to allow
> set_offset_nsec to be finely adjusted - in order to do that:
> 
> 1. you need to synchronise the machine's time keeping to a stable
>    reference, let ntp settle.
> 
> 2. disable NTP updates of the RTC, measure the RTC drift over a long
>    period (eg, a day) and trim the RTC for minimal drift.
> 
> 3. enable NTP updates, wait for an update, and measure the offset
>    between real time and RTC time, and use that to adjust
>    set_offset_nsec.
> 
> You only need to do the full procedure if you really care about
> accurate time keeping (eg, you're trying to do something that
> requires stable time.)  The point is, these patches _allow_ you to
> do this if you wish.  Without them, it's completely impossible to
> use Linux for accurately timestamped monitoring allocations - the
> answer is not "just run ntpd" because if the system time is out,
> it takes ntpd several *hours* to stabilise the systems timekeeping.
> 

I really don't think you currently need help from the kernel to do any
of it (apart from adjusting the oscillator obviously). Nothing currently
prevents you to keep a set_offset_nsec in userspace so you can actually
set the time as close as possible to the current time.

I didn't have time to draft a PoC and that is why I didn't reply on the
patch yet.

What I think is needed is a better tool, maybe a daemon, that would
handle both keeping tabs on the needed offset and handle the oscillator
trimming as it may need to get back and forth between two close values.

I still think we need to drop the SYSTOHC and HCTOSYS stuff. I agree it
can't happen overnight as some people are currently relying on it and
they need to migrate.

But having users wondering whether they should keep hwclock or use
SYSTOHC/HCTOSYS is fucked up as SYSTOHC probably doesn't do what they
want if they don't use NTP (and so they still need to be able to set
time from userspace).

Russell King (Oracle) Nov. 23, 2017, 12:53 p.m. UTC | #4

On Thu, Nov 23, 2017 at 01:04:51PM +0100, Alexandre Belloni wrote:
> (Correcting Jason's email)
> 
> On 23/11/2017 at 11:23:38 +0000, Russell King - ARM Linux wrote:
> > > So I'm discovering that this patch made it upstream silently. While it
> > > somewhat solves the issue for some RTCs, it is not a proper solution for
> > > most.
> > > 
> > > With this patch, set_offset_nsec will be hardcoded in the driver and
> > > this basically work for the mc146818 but many RTCs are connected on a
> > > slow bus (let's say i2c) and that bus may have various latencies
> > > depending on the platform so the value actually depends on the platform
> > > rather than on the RTC itself.
> > > 
> > > As noted by Russell in another thread, there is already a proper
> > > solution: do it from userspace as hwclock will already handle most
> > > issues.
> > 
> > That's incorrect.  hwclock does not have the information to know when
> > it should set the time - that is hardware specific.  Some RTCs require
> > it set .5s before the second flips over, others require it at other
> > times.
> > 
> > hwclock has hard-coded the assumption that it's writing to a standard
> > PC RTC, so it does the .5s thing, which doesn't give good results on
> > various ARM platforms.
> > 
> 
> The 0.5s hardcoding depends on the version of hwclock you use (the
> busybox one doesn't do it anymore).

Right, so it'll be correct for a different set of RTCs and wrong
for others.  So, forget current versions of hwclock, none of them
know how to correctly set the time on all RTCs.  It fundamentally
lacks the information required, because there's no way for it to
know that information at present.

> I thought it was handling it better than that and I was indeed
> incorrect.
> 
> What about setting the time on the RTC once, then using UIE to get the
> offset between the set time and the real time then set the time on the
> RTC again after accounting for the offset. That would work nicely for
> most RTCs.

That could work, but you're looking at making every hwclock based
write take maybe at least two seconds, unless you cache that
information somewhere.  If you cache that, then you end up with a
problem when someone (like I do) copies the rootfs between different
platforms.  Sometimes I copy SD cards.

Sometimes I even NFS boot the same export on different platforms
(but obviously not at the same time.)

> > Accurately reading the current time is way simpler - hwclock does this
> > by watching for the RTC's seconds changing (via the update interrupt
> > or emulation.)  That's way easier than setting the time.
> > 
> > > I really think that we should simply consider dropping hctosys,
> > > systohc and update_persistent_clock. Most distributions have been
> > > handling it from userspace for a long time. Openembedded has a
> > > startup/shutdown script.
> > 
> > Not every system does though - you have to adjust debian's
> > configuration for it to happen at shutdown.
> > 
> > However, that's not the point of the kernel updating the RTC time -
> > the point of the RTC updates while synchronised is to reduce the
> > disruption that a crash/reboot causes on NTP by allowing the system
> > time to be restored as close as possible to the real time as possible.
> > If the system has crashed with your idea, the RTC will not have been
> > synchronised since who-knows-when, and the RTC could be way out,
> > especially if the system has been running for more than a month.
> > 
> 
> But nothing prevents you from using hwclock every 11 minutes from
> userspace. I really don't think this should be done from the kernel.

It's not just about running hwclock every 11 minutes.  It's about
running hwclock when NTP sync'd.  If the local clock is not sync'd
you don't want to be running hwclock, especially if you've trimmed
the RTC.  So merely throwing hwclock -uw into a cron job really
doesn't solve it.

A way around that would be to install adjtimex, so that the kernel's
NTP flags can be read out.  However, that comes with its own set of
problems.

On Debian, installing adjtimex will disrupt the timekeeping because
of the post-install scripts debian runs.  It seems Debian assumes
that if you install something, it has the right to modify the system
timekeeping parameters immediately, screwing up ntpd in the process,
if it's running.  The thought that you're installing adjtimex because
you want to _inspect_ the kernel ntp parameters is not one that
Debian folk appear to have considered as being a reason for installing
the package.

> I really don't think you currently need help from the kernel to do any
> of it (apart from adjusting the oscillator obviously). Nothing currently
> prevents you to keep a set_offset_nsec in userspace so you can actually
> set the time as close as possible to the current time.
> 
> I didn't have time to draft a PoC and that is why I didn't reply on the
> patch yet.
> 
> What I think is needed is a better tool, maybe a daemon, that would
> handle both keeping tabs on the needed offset and handle the oscillator
> trimming as it may need to get back and forth between two close values.
> 
> I still think we need to drop the SYSTOHC and HCTOSYS stuff. I agree it
> can't happen overnight as some people are currently relying on it and
> they need to migrate.
> 
> But having users wondering whether they should keep hwclock or use
> SYSTOHC/HCTOSYS is fucked up as SYSTOHC probably doesn't do what they
> want if they don't use NTP (and so they still need to be able to set
> time from userspace).

Do not go there.  Having run systems with no local RTC, I can tell you
that they're a right pain if system time is not properly set before
userspace starts.  For example, you sometimes can't get a "ps" listing
because a procfs file contains a "btime" of zero, and it complains
and errors out on that.  Other problems have been NFS XIDs which end
up always starting from the same value, so the NFS server thinks they
are being replayed by the client (despite it being an entirely different
kernel being run.)  That still exists if you reboot quickly enough.

So no, removing hctosys would be a big mistake.

systohc is more questionable, and always has been.  The "push it out
to userspace" approach has been tried in the past, yet we seem to
have it back in the kernel.  IIRC, it was originally decided that
rtclib would not support it, and that it was going to be done in
userspace.

However, the userspace part never appeared, and instead rtclib
eventually gained support for it, because it's what people want to
see.

I'm not yet convinced by the "lets push it all into userspace" argument
because that's vapourware - and we've been there before.  It's "nice"
to state but if no one steps up and does it, it's of no benefit and
just results in people carrying local hacks (eg in vendor kernels), or
working around those who state it.

I also don't particularly like the concept of trying to measure the
RTC's time-set offset.  My userspace programs that measure the RTC
offset show that to get to millisecond resolution, it isn't trivial
because of system timing "noise" - you need to calibrate the reading
side first so you can get an estimation of when the RTC second flips.
You'd then need to set the RTC time, and then re-measure when the RTC
second flips, wait for the appropriate system time and then write the
RTC.  You're look at between 2 and 4 seconds for that, and a window
where a perfectly good RTC could be set to an offset time.

Jason Gunthorpe Nov. 23, 2017, 3:04 p.m. UTC | #5

On Thu, Nov 23, 2017 at 10:54:56AM +0100, Alexandre Belloni wrote:

> So I'm discovering that this patch made it upstream silently. While it
> somewhat solves the issue for some RTCs, it is not a proper solution for
> most.

It was cc'd all the right lists?

> With this patch, set_offset_nsec will be hardcoded in the driver and
> this basically work for the mc146818 but many RTCs are connected on a
> slow bus (let's say i2c) and that bus may have various latencies
> depending on the platform so the value actually depends on the platform
> rather than on the RTC itself.

The intention of the patch is for the actual RTC driver to set the
value to what it needs.

If some RTCs are dealing with variable latency busses then they should
do an bus IO to measure their bus latency then tweak their value.

In any event, having the correct offset for the RTC chip (eg 0.5 vs 0)
is a big improvement, even if there are some small I2C latencies.

> I really think that we should simply consider dropping hctosys,
> systohc and update_persistent_clock. Most distributions have been

Like Russell has said already, the kernel is the only place that could
actually have all the information needed. User space would have to use
some kind of trail and error approach to figure out what the offset
should be.

You can't really measure the offset without doing a time set, many
embedded RTCs do not hook up interrupts, etc.

Jason

Alexandre Belloni Nov. 23, 2017, 3:36 p.m. UTC | #6

On 23/11/2017 at 08:04:39 -0700, Jason Gunthorpe wrote:
> On Thu, Nov 23, 2017 at 10:54:56AM +0100, Alexandre Belloni wrote:
> 
> > So I'm discovering that this patch made it upstream silently. While it
> > somewhat solves the issue for some RTCs, it is not a proper solution for
> > most.
> 
> It was cc'd all the right lists?
> 

It wasn't sent to the RTC ML but that's fine.

> > With this patch, set_offset_nsec will be hardcoded in the driver and
> > this basically work for the mc146818 but many RTCs are connected on a
> > slow bus (let's say i2c) and that bus may have various latencies
> > depending on the platform so the value actually depends on the platform
> > rather than on the RTC itself.
> 
> The intention of the patch is for the actual RTC driver to set the
> value to what it needs.
> 
> If some RTCs are dealing with variable latency busses then they should
> do an bus IO to measure their bus latency then tweak their value.
> 
> In any event, having the correct offset for the RTC chip (eg 0.5 vs 0)
> is a big improvement, even if there are some small I2C latencies.
> 
> > I really think that we should simply consider dropping hctosys,
> > systohc and update_persistent_clock. Most distributions have been
> 
> Like Russell has said already, the kernel is the only place that could
> actually have all the information needed. User space would have to use
> some kind of trail and error approach to figure out what the offset
> should be.
> 

And I don't see how this is different from doing it in the kernel. You
currently don't give any feedback loop so the RTC driver doesn't know
whether it's own offset is correct or not.

> You can't really measure the offset without doing a time set, many
> embedded RTCs do not hook up interrupts, etc.
> 

If you don't hook up the interrupt, then all you did is completely
pointless because you will never be able to read the correct time from
the RTC, you'll have up to a 1s offset.

Russell King (Oracle) Nov. 23, 2017, 4:10 p.m. UTC | #7

On Thu, Nov 23, 2017 at 04:36:37PM +0100, Alexandre Belloni wrote:
> On 23/11/2017 at 08:04:39 -0700, Jason Gunthorpe wrote:
> > On Thu, Nov 23, 2017 at 10:54:56AM +0100, Alexandre Belloni wrote:
> > You can't really measure the offset without doing a time set, many
> > embedded RTCs do not hook up interrupts, etc.
> 
> If you don't hook up the interrupt, then all you did is completely
> pointless because you will never be able to read the correct time from
> the RTC, you'll have up to a 1s offset.

That is incorrect.  The kernel provides emulation for the update
interrupt - it polls the RTC every 1/HZ checking it for a change
in seconds.

So, you end up with up to a 1/HZ offset where no interrupt is
present.

Please don't base your assumptions on the output of hwclock(8) -
the output of that which comes after the timezone specifier is
not documented in the man page, and appears on the face of it to
be rather random:

root@clearfog21:~# hwclock -r --noadjfile --utc
Thu 23 Nov 2017 15:52:35 GMT  -0.547531 seconds
root@clearfog21:~# hwclock -r --noadjfile --utc
Thu 23 Nov 2017 15:52:37 GMT  -0.399859 seconds
root@clearfog21:~# hwclock -r --noadjfile --utc
Thu 23 Nov 2017 15:52:38 GMT  -0.275654 seconds

Using my program instead:

root@clearfog21:~# ./test-rtc

Warning: NTP synchronisation is enabled, measurements may be affected

RTC is currently trimmed to -39112 ppb
Takes average of 65127ns to read RTC
Took 64411ns to read RTC on second change

RTC Time: 23-11-2017 15:52:50
System Time was:     15:52:50.000

root@clearfog21:~# ./test-rtc

Warning: NTP synchronisation is enabled, measurements may be affected

RTC is currently trimmed to -39112 ppb
Takes average of 64560ns to read RTC
Took 63884ns to read RTC on second change

RTC Time: 23-11-2017 15:53:14
System Time was:     15:53:14.000
root@clearfog21:~# ./test-rtc

Warning: NTP synchronisation is enabled, measurements may be affected

RTC is currently trimmed to -39112 ppb
Takes average of 64467ns to read RTC
Took 74642ns to read RTC on second change

RTC Time: 23-11-2017 15:53:17
System Time was:     15:53:17.000

root@clearfog21:~#

So, the RTC is closely synchronised with the system time, and is
certainly not variable as hwclock(8) _appears_ to suggest.  So the
figure it comes out with after printing the time is _not_ the
difference betwen system time and RTC time.

It seems to be the negative amount of time from an arbitary point
after hwclock() starts execution to the point just after waiting
for the update interrupt:

root@clearfog21:~# strace -tt -T hwclock -r --noadjfile --utc --debug
15:58:32.411876 execve("/sbin/hwclock", ["hwclock", "-r", "--noadjfile", "--utc", "--debug"], [/* 16 vars */]) = 0 <0.000834>
...
15:58:32.425566 gettimeofday({1511452712, 425707}, NULL) = 0 <0.000134>
...
15:58:32.436914 write(1, "Waiting for clock tick...\n", 26Waiting for clock tick...
) = 26 <0.000233>
15:58:32.437303 ioctl(3, PHN_SET_REGS or RTC_UIE_ON, 0) = 0 <0.000391>
15:58:32.437859 select(4, [3], NULL, NULL, {10, 0}) = 1 (in [3], left {9, 438030}) <0.562095>
15:58:33.002175 ioctl(3, PHN_NOT_OH or RTC_UIE_OFF, 0) = 0 <0.000186>
15:58:33.002592 write(1, "...got clock tick\n", 18...got clock tick
) = 18 <0.000314>
15:58:33.003217 gettimeofday({1511452713, 3378}, NULL) = 0 <0.000159>
...
15:58:33.009146 write(1, "Thu 23 Nov 2017 15:58:33 GMT  -0"..., 48Thu 23 Nov 2017 15:58:33 GMT  -0.577671 seconds

1511452713.003378 - 1511452712.425707 = 0.577671s - the figure printed
by hwclock(8) after the time.

Russell King (Oracle) Nov. 23, 2017, 4:25 p.m. UTC | #8

On Thu, Nov 23, 2017 at 04:10:40PM +0000, Russell King - ARM Linux wrote:
> Using my program instead:
> 
> root@clearfog21:~# ./test-rtc
> 
> Warning: NTP synchronisation is enabled, measurements may be affected
> 
> RTC is currently trimmed to -39112 ppb
> Takes average of 65127ns to read RTC
> Took 64411ns to read RTC on second change
> 
> RTC Time: 23-11-2017 15:52:50
> System Time was:     15:52:50.000

PS, my program works by:

- switching to SCHED_FIFO scheduling
- timing how long it takes to read the RTC 100 times.
- timing how long it takes to read the RTC for real across a change
  in the number of seconds.
    if this is different by more than 25us from the average, then
    the entire measurement cycle is restarted until we hit this target.
- reporting the RTC time and the compensated system time half-way
  through the ioctl() to read the RTC, based on the average time
  taken to read the RTC.

This has the advantage that it's indepdendent of the kernel's UIE
implementation and RTC interrupt latency, and so can be much more
accurate than 1/HZ.

Its measurements have sufficient accuracy (when printed to microsecond
resolution) to show the effect of the RTC's trimming corrections where
RTCs apply trimming pulses every minute rather than every second.

Jason Gunthorpe Nov. 23, 2017, 4:33 p.m. UTC | #9

On Thu, Nov 23, 2017 at 04:36:37PM +0100, Alexandre Belloni wrote:

> It wasn't sent to the RTC ML but that's fine.

Okay, my mistake.

> > Like Russell has said already, the kernel is the only place that could
> > actually have all the information needed. User space would have to use
> > some kind of trail and error approach to figure out what the offset
> > should be.
>
> And I don't see how this is different from doing it in the kernel.

Well, the kernel is closer to the RTC and can issue a single I2C
operation to directly measure bus latency, so it is in a far better
place to do this than user space...

> You currently don't give any feedback loop so the RTC driver doesn't
> know whether it's own offset is correct or not.

Feedback would be very complicated, that seems reasonable to push it
to userspace if people care enough to trim the RTC set very finely.
Allowing userspace to trim set offset from the driver would be
trivial.

Remember, the main goal here is to get bulk accuracy, by supporting
the RTC HW specific offset, which is currently wrong by 0.5s or more
on a lot of RTC hardware.

Jason

J William Piggott Nov. 24, 2017, 12:13 a.m. UTC | #10

On 11/23/2017 07:53 AM, Russell King - ARM Linux wrote:
> On Thu, Nov 23, 2017 at 01:04:51PM +0100, Alexandre Belloni wrote:
>>

 8<

>> But nothing prevents you from using hwclock every 11 minutes from
>> userspace. I really don't think this should be done from the kernel.
> 
> It's not just about running hwclock every 11 minutes.  It's about
> running hwclock when NTP sync'd.  If the local clock is not sync'd
> you don't want to be running hwclock, especially if you've trimmed
> the RTC.  So merely throwing hwclock -uw into a cron job really
> doesn't solve it.
> 
> A way around that would be to install adjtimex, so that the kernel's
> NTP flags can be read out.  However, that comes with its own set of
> problems.
> 
> On Debian, installing adjtimex will disrupt the timekeeping because
> of the post-install scripts debian runs.  It seems Debian assumes
> that if you install something, it has the right to modify the system
> timekeeping parameters immediately, screwing up ntpd in the process,
> if it's running.  The thought that you're installing adjtimex because
> you want to _inspect_ the kernel ntp parameters is not one that
> Debian folk appear to have considered as being a reason for installing
> the package.
> 

IMO, adjtimex is broken anyway. Use ntptime, it should be included
in the ntp package:

 $ /usr/sbin/ntptime | grep status
  status 0x40 (UNSYNC),

'ntptime -f ppm' allows correcting the system clock. So adjtimex really
isn't needed anymore.

Russell King (Oracle) Nov. 27, 2017, 3:43 p.m. UTC | #11

On Thu, Nov 23, 2017 at 08:04:39AM -0700, Jason Gunthorpe wrote:
> Like Russell has said already, the kernel is the only place that could
> actually have all the information needed. User space would have to use
> some kind of trail and error approach to figure out what the offset
> should be.

Jason,

Now that -rc1 is out, and I'm rebasing the branches, what I find is
the patch that was merged is not the patch that I last tested.  The
last patch I had from you was 22 Sep 2017.

The most obvious difference is in drivers/rtc/class.c:

-+      /* Drivers can revise this default after allocating the device. */
++      /* Drivers can revise this default after allocating the device. It
++       * should be the value of wallclock tv_nsec that the driver needs in
++       * order to synchronize the second tick over during set.
++       */

but there's also considerable differences in kernel/time/ntp.c, which
seems to miss out on some fixes for bugs I reported along the way:

 +      getnstimeofday64(&now);
 +
-+      adjust = now;
++      now = adjust;

What's going on?

I've got a rebase mess here to sort out, and it's a horrid mess.

I think what's in -rc1 is rather fscked.

John Stultz Nov. 27, 2017, 5:35 p.m. UTC | #12

On Thu, Nov 23, 2017 at 1:54 AM, Alexandre Belloni
<alexandre.belloni@free-electrons.com> wrote:
> Hi,
>
> On 13/10/2017 at 11:54:33 -0600, Jason Gunthorpe wrote:
>> ntp is currently hardwired to try and call the rtc set when wall clock
>> tv_nsec is 0.5 seconds. This historical behaviour works well with certain
>> PC RTCs, but is not universal to all rtc hardware.
>>
>> Change how this works by introducing the driver specific concept of
>> set_offset_nsec, the delay between current wall clock time and the target
>> time to set (with a 0 tv_nsecs).
>>
>> For x86-style CMOS set_offset_nsec should be -0.5 s which causes the last
>> second to be written 0.5 s after it has started.
>>
>> For compat with the old rtc_set_ntp_time, the value is defaulted to
>> + 0.5 s, which causes the next second to be written 0.5s before it starts,
>> as things were before this patch.
>>
>> Testing shows many non-x86 RTCs would like set_offset_nsec ~= 0,
>> so ultimately each RTC driver should set the set_offset_nsec according
>> to its needs, and non x86 architectures should stop using
>> update_persistent_clock64 in order to access this feature.
>> Future patches will revise the drivers as needed.
>>
>> Since CMOS and RTC now have very different handling they are split
>> into two dedicated code paths, sharing the support code, and ifdefs
>> are replaced with IS_ENABLED.
>>
>> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
>> ---
>>  drivers/rtc/class.c   |   3 +
>>  drivers/rtc/systohc.c |  53 +++++++++++-----
>>  include/linux/rtc.h   |  43 ++++++++++++-
>>  kernel/time/ntp.c     | 166 ++++++++++++++++++++++++++++++++++----------------
>>  4 files changed, 196 insertions(+), 69 deletions(-)
>>
>> Hello All,
>>
>> Here is the latest version of this patch that was circulating between
>> RMK and myself. I have done a few more minor changed and been able to
>> test it myself on x86 KVM and on the ARM system that motivated the
>> original CONFIG_RTC_SYSTOHC patch.
>>
>> From all my testing, this patch does not change the existing behavior
>> at all, but provides the base infrastructure to fix individual RTCs
>> one at a time.
>>
>> There are a few followup patches that will set the set_offset_nsec
>> value for various RTC drivers, and I still have to look at RMK's
>> hrtimer patch, but those are all incremental on top of this.
>>
>
> So I'm discovering that this patch made it upstream silently. While it
> somewhat solves the issue for some RTCs, it is not a proper solution for
> most.

Sorry. I thought the discussion had finished up and it looked ok, so I
queued it up. Apologies.

Shall I revert it?

thanks
-john

Russell King (Oracle) Nov. 27, 2017, 5:43 p.m. UTC | #13

On Mon, Nov 27, 2017 at 03:43:16PM +0000, Russell King - ARM Linux wrote:
> On Thu, Nov 23, 2017 at 08:04:39AM -0700, Jason Gunthorpe wrote:
> > Like Russell has said already, the kernel is the only place that could
> > actually have all the information needed. User space would have to use
> > some kind of trail and error approach to figure out what the offset
> > should be.
> 
> Jason,
> 
> Now that -rc1 is out, and I'm rebasing the branches, what I find is
> the patch that was merged is not the patch that I last tested.  The
> last patch I had from you was 22 Sep 2017.
> 
> The most obvious difference is in drivers/rtc/class.c:
> 
> -+      /* Drivers can revise this default after allocating the device. */
> ++      /* Drivers can revise this default after allocating the device. It
> ++       * should be the value of wallclock tv_nsec that the driver needs in
> ++       * order to synchronize the second tick over during set.
> ++       */
> 
> but there's also considerable differences in kernel/time/ntp.c, which
> seems to miss out on some fixes for bugs I reported along the way:
> 
>  +      getnstimeofday64(&now);
>  +
> -+      adjust = now;
> ++      now = adjust;
> 
> What's going on?
> 
> I've got a rebase mess here to sort out, and it's a horrid mess.
> 
> I think what's in -rc1 is rather fscked.

Having been through the conflict mess and come out with a resolution,
I think it might be okay, with the exception of the missing explanation
in drivers/rtc/class.c.  I do need to run some tests on it to make sure
though.

Russell King (Oracle) Nov. 27, 2017, 5:52 p.m. UTC | #14

On Mon, Nov 27, 2017 at 09:35:30AM -0800, John Stultz wrote:
> On Thu, Nov 23, 2017 at 1:54 AM, Alexandre Belloni
> <alexandre.belloni@free-electrons.com> wrote:
> > Hi,
> >
> > On 13/10/2017 at 11:54:33 -0600, Jason Gunthorpe wrote:
> >> ntp is currently hardwired to try and call the rtc set when wall clock
> >> tv_nsec is 0.5 seconds. This historical behaviour works well with certain
> >> PC RTCs, but is not universal to all rtc hardware.
> >>
> >> Change how this works by introducing the driver specific concept of
> >> set_offset_nsec, the delay between current wall clock time and the target
> >> time to set (with a 0 tv_nsecs).
> >>
> >> For x86-style CMOS set_offset_nsec should be -0.5 s which causes the last
> >> second to be written 0.5 s after it has started.
> >>
> >> For compat with the old rtc_set_ntp_time, the value is defaulted to
> >> + 0.5 s, which causes the next second to be written 0.5s before it starts,
> >> as things were before this patch.
> >>
> >> Testing shows many non-x86 RTCs would like set_offset_nsec ~= 0,
> >> so ultimately each RTC driver should set the set_offset_nsec according
> >> to its needs, and non x86 architectures should stop using
> >> update_persistent_clock64 in order to access this feature.
> >> Future patches will revise the drivers as needed.
> >>
> >> Since CMOS and RTC now have very different handling they are split
> >> into two dedicated code paths, sharing the support code, and ifdefs
> >> are replaced with IS_ENABLED.
> >>
> >> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> >> ---
> >>  drivers/rtc/class.c   |   3 +
> >>  drivers/rtc/systohc.c |  53 +++++++++++-----
> >>  include/linux/rtc.h   |  43 ++++++++++++-
> >>  kernel/time/ntp.c     | 166 ++++++++++++++++++++++++++++++++++----------------
> >>  4 files changed, 196 insertions(+), 69 deletions(-)
> >>
> >> Hello All,
> >>
> >> Here is the latest version of this patch that was circulating between
> >> RMK and myself. I have done a few more minor changed and been able to
> >> test it myself on x86 KVM and on the ARM system that motivated the
> >> original CONFIG_RTC_SYSTOHC patch.
> >>
> >> From all my testing, this patch does not change the existing behavior
> >> at all, but provides the base infrastructure to fix individual RTCs
> >> one at a time.
> >>
> >> There are a few followup patches that will set the set_offset_nsec
> >> value for various RTC drivers, and I still have to look at RMK's
> >> hrtimer patch, but those are all incremental on top of this.
> >>
> >
> > So I'm discovering that this patch made it upstream silently. While it
> > somewhat solves the issue for some RTCs, it is not a proper solution for
> > most.
> 
> Sorry. I thought the discussion had finished up and it looked ok, so I
> queued it up. Apologies.

I'm actually rather disappointed that Alexandre Belloni has only now
brought up his dis-satisfaction with the approach after all the effort
that Jason and myself have put in to it.  It's not like Alexandre was
not copied on the patches and discussion.

If Alexandre could not be bothered to bring up his concerns while the
discussion was on-going in September, and didn't bother raising them
in October, I'd say that Alexandre's opinion at this point doesn't
count for much - if it wasn't important to state at the time or for
a couple of months after, why does it become important to state after
the thing has been merged.

Maybe the idea here is basically to waste people's time letting them
develop a patch for an approach, and then object at the last minute
to that approach.  Hardly seems fair or even reasonable.

Alexandre Belloni Nov. 27, 2017, 6:44 p.m. UTC | #15

On 27/11/2017 at 17:52:54 +0000, Russell King - ARM Linux wrote:
> I'm actually rather disappointed that Alexandre Belloni has only now
> brought up his dis-satisfaction with the approach after all the effort
> that Jason and myself have put in to it.  It's not like Alexandre was
> not copied on the patches and discussion.
> 
> If Alexandre could not be bothered to bring up his concerns while the
> discussion was on-going in September, and didn't bother raising them
> in October, I'd say that Alexandre's opinion at this point doesn't
> count for much - if it wasn't important to state at the time or for
> a couple of months after, why does it become important to state after
> the thing has been merged.
> 
> Maybe the idea here is basically to waste people's time letting them
> develop a patch for an approach, and then object at the last minute
> to that approach.  Hardly seems fair or even reasonable.
> 

How unfair that is! Really, you are not in a position to make that kind
of comment because you are not even replying to patches in your own
subsystem. But maybe my time doesn't count as much as yours.

I was going to go for a middle ground but now, I'm really more inclined
to just revert the whole stuff.

Alexandre Belloni Nov. 27, 2017, 6:48 p.m. UTC | #16

On 23/11/2017 at 16:10:40 +0000, Russell King - ARM Linux wrote:
> On Thu, Nov 23, 2017 at 04:36:37PM +0100, Alexandre Belloni wrote:
> > On 23/11/2017 at 08:04:39 -0700, Jason Gunthorpe wrote:
> > > On Thu, Nov 23, 2017 at 10:54:56AM +0100, Alexandre Belloni wrote:
> > > You can't really measure the offset without doing a time set, many
> > > embedded RTCs do not hook up interrupts, etc.
> > 
> > If you don't hook up the interrupt, then all you did is completely
> > pointless because you will never be able to read the correct time from
> > the RTC, you'll have up to a 1s offset.
> 
> That is incorrect.  The kernel provides emulation for the update
> interrupt - it polls the RTC every 1/HZ checking it for a change
> in seconds.
> 
> So, you end up with up to a 1/HZ offset where no interrupt is
> present.
> 

Yes, I forgot about the UIE emulation.

> Please don't base your assumptions on the output of hwclock(8) -
> the output of that which comes after the timezone specifier is
> not documented in the man page, and appears on the face of it to
> be rather random:
> 

I'm not basing myself on hwclock's output because it is indeed quite
random.

Russell King (Oracle) Nov. 27, 2017, 6:53 p.m. UTC | #17

On Mon, Nov 27, 2017 at 07:44:11PM +0100, Alexandre Belloni wrote:
> On 27/11/2017 at 17:52:54 +0000, Russell King - ARM Linux wrote:
> > I'm actually rather disappointed that Alexandre Belloni has only now
> > brought up his dis-satisfaction with the approach after all the effort
> > that Jason and myself have put in to it.  It's not like Alexandre was
> > not copied on the patches and discussion.
> > 
> > If Alexandre could not be bothered to bring up his concerns while the
> > discussion was on-going in September, and didn't bother raising them
> > in October, I'd say that Alexandre's opinion at this point doesn't
> > count for much - if it wasn't important to state at the time or for
> > a couple of months after, why does it become important to state after
> > the thing has been merged.
> > 
> > Maybe the idea here is basically to waste people's time letting them
> > develop a patch for an approach, and then object at the last minute
> > to that approach.  Hardly seems fair or even reasonable.
> > 
> 
> How unfair that is! Really, you are not in a position to make that kind
> of comment because you are not even replying to patches in your own
> subsystem. But maybe my time doesn't count as much as yours.

You are, yet again, wrong.

I am in a position to make the comment because it was me who identified
the problem, put in the hours to work on, develop and extensively test
Jason's patch.  So, it's partly my time that you seem to be wasting,
and that gives me every right to complain at this point.

You, on the other hand, were copied with every single email, and did
nothing to discuss the issue except for the "easy" bits when I posted
a relatively smaller patch - but you ignored the bigger issue.

Now that the patch was merged, you throw your toys out of the pram and
start blaming everyone else for "silently" merging the patch and how it
wasn't sent to the right email addresses.

And now that someone dare criticise your abilities, you decide to revert
the change and restore Linux back to a crippled state.

Honestly, I don't _care_ if you revert it and if you want to cripple
the kernel as a result in regards to this issue, I can carry the patch
ad infinitum, no skin off my back.  You're only going to be hurting
yourself and other people through your spite by doing that revert.

I suggest you take a good long hard look at what you're about to do and
ask whether you are being reasonable, given that it's taken you over
two months whole months to raise any _technical_ issues with the approach
that Jason and myself came up with.

Jason Gunthorpe Nov. 27, 2017, 6:59 p.m. UTC | #18

On Mon, Nov 27, 2017 at 05:43:23PM +0000, Russell King - ARM Linux wrote:

> Having been through the conflict mess and come out with a resolution,
> I think it might be okay, with the exception of the missing explanation
> in drivers/rtc/class.c.  I do need to run some tests on it to make sure
> though.

Okay thanks, let me know if more is needed.

Jason

Russell King (Oracle) Nov. 27, 2017, 7:18 p.m. UTC | #19

+Linus

Background:

Basically, Jason and myself developed an approach to fix a problem in
the RTC layer - https://marc.info/?t=150590661600002&r=1&w=3

John Stultz saw that discussion had ceased, and decided to merge the
cross-subsystem patch for 4.15.  Then, last week, Alexandre spots it
in mainline and decides he doesn't like it on fairly weak technical
grounds - see second message in:

https://marc.info/?t=150791745000009&r=1&w=3

I believe the grounds that Alexandre objects on are fairly weak as I
detailed in my replies to his objection.

What I can't fathom is why it would take someone two months to come
up with weak arguments against an approach, and then when I complain
about that and the prospect of it being reverted on those grounds,
the reaction is to revert it.

IMHO, maintainers have a duty to respond within a reasonable timescale,
and if patches get merged inspite of them, they need to consider
whether they're really doing a good job in the first place, and whether
they have *really good* reasons to object.

If maintainers are going to sit back, and let people work on problems,
and then have patches reverted only after they hit mainline, people are
just going to stop working on trying to solve problems - because no one
will know whether their approach is acceptable or whether they're merely
wasting their time.  That would be /very/ bad for Linux.

On Mon, Nov 27, 2017 at 06:53:52PM +0000, Russell King - ARM Linux wrote:
> On Mon, Nov 27, 2017 at 07:44:11PM +0100, Alexandre Belloni wrote:
> > On 27/11/2017 at 17:52:54 +0000, Russell King - ARM Linux wrote:
> > > I'm actually rather disappointed that Alexandre Belloni has only now
> > > brought up his dis-satisfaction with the approach after all the effort
> > > that Jason and myself have put in to it.  It's not like Alexandre was
> > > not copied on the patches and discussion.
> > > 
> > > If Alexandre could not be bothered to bring up his concerns while the
> > > discussion was on-going in September, and didn't bother raising them
> > > in October, I'd say that Alexandre's opinion at this point doesn't
> > > count for much - if it wasn't important to state at the time or for
> > > a couple of months after, why does it become important to state after
> > > the thing has been merged.
> > > 
> > > Maybe the idea here is basically to waste people's time letting them
> > > develop a patch for an approach, and then object at the last minute
> > > to that approach.  Hardly seems fair or even reasonable.
> > > 
> > 
> > How unfair that is! Really, you are not in a position to make that kind
> > of comment because you are not even replying to patches in your own
> > subsystem. But maybe my time doesn't count as much as yours.
> 
> You are, yet again, wrong.
> 
> I am in a position to make the comment because it was me who identified
> the problem, put in the hours to work on, develop and extensively test
> Jason's patch.  So, it's partly my time that you seem to be wasting,
> and that gives me every right to complain at this point.
> 
> You, on the other hand, were copied with every single email, and did
> nothing to discuss the issue except for the "easy" bits when I posted
> a relatively smaller patch - but you ignored the bigger issue.
> 
> Now that the patch was merged, you throw your toys out of the pram and
> start blaming everyone else for "silently" merging the patch and how it
> wasn't sent to the right email addresses.
> 
> And now that someone dare criticise your abilities, you decide to revert
> the change and restore Linux back to a crippled state.
> 
> Honestly, I don't _care_ if you revert it and if you want to cripple
> the kernel as a result in regards to this issue, I can carry the patch
> ad infinitum, no skin off my back.  You're only going to be hurting
> yourself and other people through your spite by doing that revert.
> 
> I suggest you take a good long hard look at what you're about to do and
> ask whether you are being reasonable, given that it's taken you over
> two months whole months to raise any _technical_ issues with the approach
> that Jason and myself came up with.
> 
> -- 
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
> According to speedtest.net: 8.21Mbps down 510kbps up
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Alexandre Belloni Nov. 27, 2017, 7:31 p.m. UTC | #20

On 27/11/2017 at 18:53:52 +0000, Russell King - ARM Linux wrote:
> On Mon, Nov 27, 2017 at 07:44:11PM +0100, Alexandre Belloni wrote:
> > On 27/11/2017 at 17:52:54 +0000, Russell King - ARM Linux wrote:
> > > I'm actually rather disappointed that Alexandre Belloni has only now
> > > brought up his dis-satisfaction with the approach after all the effort
> > > that Jason and myself have put in to it.  It's not like Alexandre was
> > > not copied on the patches and discussion.
> > > 
> > > If Alexandre could not be bothered to bring up his concerns while the
> > > discussion was on-going in September, and didn't bother raising them
> > > in October, I'd say that Alexandre's opinion at this point doesn't
> > > count for much - if it wasn't important to state at the time or for
> > > a couple of months after, why does it become important to state after
> > > the thing has been merged.
> > > 
> > > Maybe the idea here is basically to waste people's time letting them
> > > develop a patch for an approach, and then object at the last minute
> > > to that approach.  Hardly seems fair or even reasonable.
> > > 
> > 
> > How unfair that is! Really, you are not in a position to make that kind
> > of comment because you are not even replying to patches in your own
> > subsystem. But maybe my time doesn't count as much as yours.
> 
> You are, yet again, wrong.
> 
> I am in a position to make the comment because it was me who identified
> the problem, put in the hours to work on, develop and extensively test
> Jason's patch.  So, it's partly my time that you seem to be wasting,
> and that gives me every right to complain at this point.
> 
> You, on the other hand, were copied with every single email, and did
> nothing to discuss the issue except for the "easy" bits when I posted
> a relatively smaller patch - but you ignored the bigger issue.
> 

And this is exactly what you do with other people patches/time when you
don't like their changes.
You simply ignore the patch series until they go away.

> Now that the patch was merged, you throw your toys out of the pram and
> start blaming everyone else for "silently" merging the patch and how it
> wasn't sent to the right email addresses.
> 

I would really expect people merging code in any subsystem to wait for
the ack of the maintainer of that subsystem.

I didn't complain about any missing email addresses, I said the RTC ML
was not copied but that is didn't matter.

You're not even happy about the patch that was merged because it was the
wrong one!

> And now that someone dare criticise your abilities, you decide to revert
> the change and restore Linux back to a crippled state.
> 
> Honestly, I don't _care_ if you revert it and if you want to cripple
> the kernel as a result in regards to this issue, I can carry the patch
> ad infinitum, no skin off my back.  You're only going to be hurting
> yourself and other people through your spite by doing that revert.
> 
> I suggest you take a good long hard look at what you're about to do and
> ask whether you are being reasonable, given that it's taken you over
> two months whole months to raise any _technical_ issues with the approach
> that Jason and myself came up with.
> 

What I don't get is that it has been broken for almost 5 years and now
you seem to think it has to be fixed urgently.

On my side, I want to take the opportunity to think about systohc before
adding an ABI that we will maybe regret later.

Again, I agree the 0.5s offset is crap and basically only works for
mc146818 and that has to be fixed somehow.

Maybe I could have replied earlier but that has been my intent from the
start but I didn't have the time to look at the history of it before.

It was not my intent to waste anyone's time.

Alexandre Belloni Nov. 27, 2017, 8:18 p.m. UTC | #21

On 23/11/2017 at 12:53:00 +0000, Russell King - ARM Linux wrote:
> On Thu, Nov 23, 2017 at 01:04:51PM +0100, Alexandre Belloni wrote:
> > What about setting the time on the RTC once, then using UIE to get the
> > offset between the set time and the real time then set the time on the
> > RTC again after accounting for the offset. That would work nicely for
> > most RTCs.
> 
> That could work, but you're looking at making every hwclock based
> write take maybe at least two seconds, unless you cache that
> information somewhere.  If you cache that, then you end up with a
> problem when someone (like I do) copies the rootfs between different
> platforms.  Sometimes I copy SD cards.
> 
> Sometimes I even NFS boot the same export on different platforms
> (but obviously not at the same time.)
> 

I don't think it is really an issue to have setting the RTC time take a
few seconds.

[...]

> > But nothing prevents you from using hwclock every 11 minutes from
> > userspace. I really don't think this should be done from the kernel.
> 
> It's not just about running hwclock every 11 minutes.  It's about
> running hwclock when NTP sync'd.  If the local clock is not sync'd
> you don't want to be running hwclock, especially if you've trimmed
> the RTC.  So merely throwing hwclock -uw into a cron job really
> doesn't solve it.
> 

Yes and I would hope userspace knows that it is using NTP to sync the
local clock...

> A way around that would be to install adjtimex, so that the kernel's
> NTP flags can be read out.  However, that comes with its own set of
> problems.
> 
> On Debian, installing adjtimex will disrupt the timekeeping because
> of the post-install scripts debian runs.  It seems Debian assumes
> that if you install something, it has the right to modify the system
> timekeeping parameters immediately, screwing up ntpd in the process,
> if it's running.  The thought that you're installing adjtimex because
> you want to _inspect_ the kernel ntp parameters is not one that
> Debian folk appear to have considered as being a reason for installing
> the package.
> 

Aren't those mainly userspace/integration problems that should be fixed
in Debian?

> > I really don't think you currently need help from the kernel to do any
> > of it (apart from adjusting the oscillator obviously). Nothing currently
> > prevents you to keep a set_offset_nsec in userspace so you can actually
> > set the time as close as possible to the current time.
> > 
> > I didn't have time to draft a PoC and that is why I didn't reply on the
> > patch yet.
> > 
> > What I think is needed is a better tool, maybe a daemon, that would
> > handle both keeping tabs on the needed offset and handle the oscillator
> > trimming as it may need to get back and forth between two close values.
> > 
> > I still think we need to drop the SYSTOHC and HCTOSYS stuff. I agree it
> > can't happen overnight as some people are currently relying on it and
> > they need to migrate.
> > 
> > But having users wondering whether they should keep hwclock or use
> > SYSTOHC/HCTOSYS is fucked up as SYSTOHC probably doesn't do what they
> > want if they don't use NTP (and so they still need to be able to set
> > time from userspace).
> 
> Do not go there.  Having run systems with no local RTC, I can tell you
> that they're a right pain if system time is not properly set before
> userspace starts.  For example, you sometimes can't get a "ps" listing
> because a procfs file contains a "btime" of zero, and it complains
> and errors out on that.  Other problems have been NFS XIDs which end
> up always starting from the same value, so the NFS server thinks they
> are being replayed by the client (despite it being an entirely different
> kernel being run.)  That still exists if you reboot quickly enough.
> 
> So no, removing hctosys would be a big mistake.
> 

Ok, that is a good point to keep hctosys.


> systohc is more questionable, and always has been.  The "push it out
> to userspace" approach has been tried in the past, yet we seem to
> have it back in the kernel.  IIRC, it was originally decided that
> rtclib would not support it, and that it was going to be done in
> userspace.
> 
> However, the userspace part never appeared, and instead rtclib
> eventually gained support for it, because it's what people want to
> see.
> 
> I'm not yet convinced by the "lets push it all into userspace" argument
> because that's vapourware - and we've been there before.  It's "nice"
> to state but if no one steps up and does it, it's of no benefit and
> just results in people carrying local hacks (eg in vendor kernels), or
> working around those who state it.
> 

What I see is that people that care about precisely setting the RTC are
not using systohc because of the 0.5s offset issue or because it doesn't
do what they want and instead are using their own userspace tool to set
the RTC time.

They have the right to keep those tools closed if they desire so.

The necessary ABI for userspace to do the right thing is already there
and you are going to add a new one.

Also, fixing systohc will not fix the offset for people using hwclock or
any other tools (e.g. systemd).

I pretty much prefer trying to fix existing userspace tools or if this
is not possible have a new tool to handle reading/setting the RTC time.

> I also don't particularly like the concept of trying to measure the
> RTC's time-set offset.  My userspace programs that measure the RTC
> offset show that to get to millisecond resolution, it isn't trivial
> because of system timing "noise" - you need to calibrate the reading
> side first so you can get an estimation of when the RTC second flips.
> You'd then need to set the RTC time, and then re-measure when the RTC
> second flips, wait for the appropriate system time and then write the
> RTC.  You're look at between 2 and 4 seconds for that, and a window
> where a perfectly good RTC could be set to an offset time.

Jason Gunthorpe Nov. 27, 2017, 8:29 p.m. UTC | #22

On Mon, Nov 27, 2017 at 09:18:12PM +0100, Alexandre Belloni wrote:

> Also, fixing systohc will not fix the offset for people using hwclock or
> any other tools (e.g. systemd).

When I was talking this over with Russel originally I imagined the
systohc fix as a first step, and we could eventually add an rtc API to
let user space do a precise write to the RTC, with the kernel taking
care of the required sleep, etc.

Jason

Russell King (Oracle) Nov. 28, 2017, 10:03 a.m. UTC | #23

On Mon, Nov 27, 2017 at 09:18:12PM +0100, Alexandre Belloni wrote:
> On 23/11/2017 at 12:53:00 +0000, Russell King - ARM Linux wrote:
> > systohc is more questionable, and always has been.  The "push it out
> > to userspace" approach has been tried in the past, yet we seem to
> > have it back in the kernel.  IIRC, it was originally decided that
> > rtclib would not support it, and that it was going to be done in
> > userspace.
> > 
> > However, the userspace part never appeared, and instead rtclib
> > eventually gained support for it, because it's what people want to
> > see.
> > 
> > I'm not yet convinced by the "lets push it all into userspace" argument
> > because that's vapourware - and we've been there before.  It's "nice"
> > to state but if no one steps up and does it, it's of no benefit and
> > just results in people carrying local hacks (eg in vendor kernels), or
> > working around those who state it.
> > 
> 
> What I see is that people that care about precisely setting the RTC are
> not using systohc because of the 0.5s offset issue or because it doesn't
> do what they want and instead are using their own userspace tool to set
> the RTC time.

If that is the case, where are these tools?

> They have the right to keep those tools closed if they desire so.
> 
> The necessary ABI for userspace to do the right thing is already there
> and you are going to add a new one.

Sorry, I don't think userspace has the information to do a good job.
The idea of setting the RTC to measure the offset is a hack - it's
something that is going to have to be measured each and every time
that the RTC is set, and would need to be done by a real-time process
to ensure that other system noise does not affect it.

> Also, fixing systohc will not fix the offset for people using hwclock or
> any other tools (e.g. systemd).

As has already been said, Jason's suggestion is to export this
information to userspace so that hwclock can be fixed in a similar
way.  hwclock isn't too relevant to the NTP question though, because
it isn't used while ntpd is running.

> I pretty much prefer trying to fix existing userspace tools or if this
> is not possible have a new tool to handle reading/setting the RTC time.

I suspect that won't happen, because there won't be enough interest
in fixing it.  As I've said, this issue had come up years ago, and
the same suggestions were made at that time.  Here we are today,
still without a userspace tool to do it.  What's different this
time around?

Are you going to put the effort in yourself to solve this problem,
or are you going to leave it to some ghostly apperition to do,
resulting in yet another vapourware tool that never actually
materialises?

I'll repeat that I have no interest in creating such a tool as I
already have a working in-kernel solution that doesn't need me to
create any userspace tooling, and I don't care that it isn't in
mainline.

Russell King (Oracle) Nov. 28, 2017, 10:20 a.m. UTC | #24

On Mon, Nov 27, 2017 at 08:31:35PM +0100, Alexandre Belloni wrote:
> On 27/11/2017 at 18:53:52 +0000, Russell King - ARM Linux wrote:
> > You are, yet again, wrong.
> > 
> > I am in a position to make the comment because it was me who identified
> > the problem, put in the hours to work on, develop and extensively test
> > Jason's patch.  So, it's partly my time that you seem to be wasting,
> > and that gives me every right to complain at this point.
> > 
> > You, on the other hand, were copied with every single email, and did
> > nothing to discuss the issue except for the "easy" bits when I posted
> > a relatively smaller patch - but you ignored the bigger issue.
> 
> And this is exactly what you do with other people patches/time when you
> don't like their changes.
> You simply ignore the patch series until they go away.

That is not intentional.

> I would really expect people merging code in any subsystem to wait for
> the ack of the maintainer of that subsystem.
> 
> I didn't complain about any missing email addresses, I said the RTC ML
> was not copied but that is didn't matter.

A mailing list is an email address.

> You're not even happy about the patch that was merged because it was the
> wrong one!

That's not what I said.  I was concerned that an _old_ patch had been
merged, but as it turns out, it was not an old patch, but a newer
patch that I'd missed.

> > And now that someone dare criticise your abilities, you decide to revert
> > the change and restore Linux back to a crippled state.
> > 
> > Honestly, I don't _care_ if you revert it and if you want to cripple
> > the kernel as a result in regards to this issue, I can carry the patch
> > ad infinitum, no skin off my back.  You're only going to be hurting
> > yourself and other people through your spite by doing that revert.
> > 
> > I suggest you take a good long hard look at what you're about to do and
> > ask whether you are being reasonable, given that it's taken you over
> > two months whole months to raise any _technical_ issues with the approach
> > that Jason and myself came up with.
> > 
> 
> What I don't get is that it has been broken for almost 5 years and now
> you seem to think it has to be fixed urgently.

Now you're making crap up.  I've not made any comment about this being
something that needs fixing urgently.  In fact, as I've said several
times already, I really don't care what you do with the mainline kernel,
because I have a fix here locally that I intend to use and maintain
into the future.  For me, the issue is fixed and resolved, and I intend
to spend no further time developing any fixes for it.

My comments are about the _two months_ its taken to get to the stage of
finding out that you don't like the approach.

The result of this is, most likely, one of:
1. you'll revert the patch (which, incidentally, has no real effect until
   my other patches get merged) and everyone will either be stuck with a
   kernel that sets their RTC time wrong when they have NTP installed

2. you'll remove the systohc code, people won't realise that their kernel
   no longer sync's the RTC time, and systems that are on the Internet
   (eg, providing stratum 1 NTP servers) will be unreliable if they are
   rebooted even more so than they are today.

> On my side, I want to take the opportunity to think about systohc before
> adding an ABI that we will maybe regret later.

Fine, but bear in mind that I don't care anymore, because I have
perfectly good fixes locally, and I'm not wasting any further time
in this issue.

While you think about it, I would encourage anyone who wishing to run
or running a public NTP server using rtclib to ditch Linux and use
FreeBSD instead to avoid their NTP server supplying false time, even
when synchronised to a reference clock.  A reboot of such a server
will supply false time for a period of a few hours depending on how
far the time is out (which can be up to a couple of seconds) after
such an event.  In such a scenario, NTP will not step-correct the
time but will gradually slew it instead.

This goes for systems that some ISPs deploy as stratum 1 NTP servers,
which today tend to be Raspberry Pis.

Alexandre Belloni Nov. 30, 2017, 7:39 p.m. UTC | #25

On 28/11/2017 at 10:20:25 +0000, Russell King - ARM Linux wrote:
> On Mon, Nov 27, 2017 at 08:31:35PM +0100, Alexandre Belloni wrote:
> > On 27/11/2017 at 18:53:52 +0000, Russell King - ARM Linux wrote:
> > > You are, yet again, wrong.
> > > 
> > > I am in a position to make the comment because it was me who identified
> > > the problem, put in the hours to work on, develop and extensively test
> > > Jason's patch.  So, it's partly my time that you seem to be wasting,
> > > and that gives me every right to complain at this point.
> > > 
> > > You, on the other hand, were copied with every single email, and did
> > > nothing to discuss the issue except for the "easy" bits when I posted
> > > a relatively smaller patch - but you ignored the bigger issue.
> > 
> > And this is exactly what you do with other people patches/time when you
> > don't like their changes.
> > You simply ignore the patch series until they go away.
> 
> That is not intentional.
> 

So please, don't assume that I replied late intentionally.

> > I would really expect people merging code in any subsystem to wait for
> > the ack of the maintainer of that subsystem.
> > 
> > I didn't complain about any missing email addresses, I said the RTC ML
> > was not copied but that is didn't matter.
> 
> A mailing list is an email address.
> 

Yes and again, this was a non issue, simply that the patch was not in
the RTC patchwork but that is completely irrelevant to the matter.

> Now you're making crap up.  I've not made any comment about this being
> something that needs fixing urgently.  In fact, as I've said several
> times already, I really don't care what you do with the mainline kernel,
> because I have a fix here locally that I intend to use and maintain
> into the future.  For me, the issue is fixed and resolved, and I intend
> to spend no further time developing any fixes for it.
> 
> My comments are about the _two months_ its taken to get to the stage of
> finding out that you don't like the approach.
> 

Again, sorry, this was not intentional.

> The result of this is, most likely, one of:
> 1. you'll revert the patch (which, incidentally, has no real effect until
>    my other patches get merged) and everyone will either be stuck with a
>    kernel that sets their RTC time wrong when they have NTP installed
> 

Ok, let's not waste everything. I think the main issue is that I still
don't get how this can actually improve the situation with regards to
userspace. Can you share your series so I could maybe understand? I'm
especially interested in how you will handle the pcf8523 (if you did
that of course).

Alexandre Belloni Nov. 30, 2017, 7:39 p.m. UTC | #26

On 28/11/2017 at 10:20:25 +0000, Russell King - ARM Linux wrote:
> On Mon, Nov 27, 2017 at 08:31:35PM +0100, Alexandre Belloni wrote:
> > On 27/11/2017 at 18:53:52 +0000, Russell King - ARM Linux wrote:
> > > You are, yet again, wrong.
> > > 
> > > I am in a position to make the comment because it was me who identified
> > > the problem, put in the hours to work on, develop and extensively test
> > > Jason's patch.  So, it's partly my time that you seem to be wasting,
> > > and that gives me every right to complain at this point.
> > > 
> > > You, on the other hand, were copied with every single email, and did
> > > nothing to discuss the issue except for the "easy" bits when I posted
> > > a relatively smaller patch - but you ignored the bigger issue.
> > 
> > And this is exactly what you do with other people patches/time when you
> > don't like their changes.
> > You simply ignore the patch series until they go away.
> 
> That is not intentional.
> 

But that is exactly what you are currently reproaching me right now. I
didn't have the time to have a good look a it so I didn't reply it
wasn't intentional either.

You sometimes don't reply to patch series, I sometimes take time to
reply, I don't see how this is different then.

> > I would really expect people merging code in any subsystem to wait for
> > the ack of the maintainer of that subsystem.
> > 
> > I didn't complain about any missing email addresses, I said the RTC ML
> > was not copied but that is didn't matter.
> 
> A mailing list is an email address.
> 

Yes and I said that it didn't matter it was not copied (the patch didn't
make it in patchwork though).

> > What I don't get is that it has been broken for almost 5 years and now
> > you seem to think it has to be fixed urgently.
> 
> Now you're making crap up.  I've not made any comment about this being
> something that needs fixing urgently.  In fact, as I've said several
> times already, I really don't care what you do with the mainline kernel,
> because I have a fix here locally that I intend to use and maintain
> into the future.  For me, the issue is fixed and resolved, and I intend
> to spend no further time developing any fixes for it.
> 

Oh great, that's really the way to go to make progress.

> My comments are about the _two months_ its taken to get to the stage of
> finding out that you don't like the approach.
> 

While I can understand some of the frustration, I don't get why you are
giving up so easily.

> The result of this is, most likely, one of:
> 1. you'll revert the patch (which, incidentally, has no real effect until
>    my other patches get merged) and everyone will either be stuck with a
>    kernel that sets their RTC time wrong when they have NTP installed
> 

Or when using hwclock from util-linux

> 2. you'll remove the systohc code, people won't realise that their kernel
>    no longer sync's the RTC time, and systems that are on the Internet
>    (eg, providing stratum 1 NTP servers) will be unreliable if they are
>    rebooted even more so than they are today.
> 
> > On my side, I want to take the opportunity to think about systohc before
> > adding an ABI that we will maybe regret later.
> 
> Fine, but bear in mind that I don't care anymore, because I have
> perfectly good fixes locally, and I'm not wasting any further time
> in this issue.
> 
> While you think about it, I would encourage anyone who wishing to run
> or running a public NTP server using rtclib to ditch Linux and use
> FreeBSD instead to avoid their NTP server supplying false time, even
> when synchronised to a reference clock.  A reboot of such a server
> will supply false time for a period of a few hours depending on how
> far the time is out (which can be up to a couple of seconds) after
> such an event.  In such a scenario, NTP will not step-correct the
> time but will gradually slew it instead.
> 
> This goes for systems that some ISPs deploy as stratum 1 NTP servers,
> which today tend to be Raspberry Pis.

Alexandre Belloni Nov. 30, 2017, 7:43 p.m. UTC | #27

Please disregard this email, this was a draft that should not have been
sent, written while I was still too annoyed about the whole situation.

On 30/11/2017 at 20:39:59 +0100, Alexandre Belloni wrote:
> On 28/11/2017 at 10:20:25 +0000, Russell King - ARM Linux wrote:
> > On Mon, Nov 27, 2017 at 08:31:35PM +0100, Alexandre Belloni wrote:
> > > On 27/11/2017 at 18:53:52 +0000, Russell King - ARM Linux wrote:
> > > > You are, yet again, wrong.
> > > > 
> > > > I am in a position to make the comment because it was me who identified
> > > > the problem, put in the hours to work on, develop and extensively test
> > > > Jason's patch.  So, it's partly my time that you seem to be wasting,
> > > > and that gives me every right to complain at this point.
> > > > 
> > > > You, on the other hand, were copied with every single email, and did
> > > > nothing to discuss the issue except for the "easy" bits when I posted
> > > > a relatively smaller patch - but you ignored the bigger issue.
> > > 
> > > And this is exactly what you do with other people patches/time when you
> > > don't like their changes.
> > > You simply ignore the patch series until they go away.
> > 
> > That is not intentional.
> > 
> 
> But that is exactly what you are currently reproaching me right now. I
> didn't have the time to have a good look a it so I didn't reply it
> wasn't intentional either.
> 
> You sometimes don't reply to patch series, I sometimes take time to
> reply, I don't see how this is different then.
> 
> > > I would really expect people merging code in any subsystem to wait for
> > > the ack of the maintainer of that subsystem.
> > > 
> > > I didn't complain about any missing email addresses, I said the RTC ML
> > > was not copied but that is didn't matter.
> > 
> > A mailing list is an email address.
> > 
> 
> Yes and I said that it didn't matter it was not copied (the patch didn't
> make it in patchwork though).
> 
> > > What I don't get is that it has been broken for almost 5 years and now
> > > you seem to think it has to be fixed urgently.
> > 
> > Now you're making crap up.  I've not made any comment about this being
> > something that needs fixing urgently.  In fact, as I've said several
> > times already, I really don't care what you do with the mainline kernel,
> > because I have a fix here locally that I intend to use and maintain
> > into the future.  For me, the issue is fixed and resolved, and I intend
> > to spend no further time developing any fixes for it.
> > 
> 
> Oh great, that's really the way to go to make progress.
> 
> > My comments are about the _two months_ its taken to get to the stage of
> > finding out that you don't like the approach.
> > 
> 
> While I can understand some of the frustration, I don't get why you are
> giving up so easily.
> 
> > The result of this is, most likely, one of:
> > 1. you'll revert the patch (which, incidentally, has no real effect until
> >    my other patches get merged) and everyone will either be stuck with a
> >    kernel that sets their RTC time wrong when they have NTP installed
> > 
> 
> Or when using hwclock from util-linux
> 
> > 2. you'll remove the systohc code, people won't realise that their kernel
> >    no longer sync's the RTC time, and systems that are on the Internet
> >    (eg, providing stratum 1 NTP servers) will be unreliable if they are
> >    rebooted even more so than they are today.
> > 
> > > On my side, I want to take the opportunity to think about systohc before
> > > adding an ABI that we will maybe regret later.
> > 
> > Fine, but bear in mind that I don't care anymore, because I have
> > perfectly good fixes locally, and I'm not wasting any further time
> > in this issue.
> > 
> > While you think about it, I would encourage anyone who wishing to run
> > or running a public NTP server using rtclib to ditch Linux and use
> > FreeBSD instead to avoid their NTP server supplying false time, even
> > when synchronised to a reference clock.  A reboot of such a server
> > will supply false time for a period of a few hours depending on how
> > far the time is out (which can be up to a couple of seconds) after
> > such an event.  In such a scenario, NTP will not step-correct the
> > time but will gradually slew it instead.
> > 
> > This goes for systems that some ISPs deploy as stratum 1 NTP servers,
> > which today tend to be Raspberry Pis.
> 
> -- 
> Alexandre Belloni, Free Electrons
> Embedded Linux and Kernel engineering
> http://free-electrons.com

rtc: Allow rtc drivers to specify the tv_nsec value for ntp

Commit Message

Comments

Patch