diff mbox

On NTP, RTCs and accurately setting their time

Message ID 20170921220510.GA11395@obsidianresearch.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jason Gunthorpe Sept. 21, 2017, 10:05 p.m. UTC
On Thu, Sep 21, 2017 at 10:32:19AM +0100, Russell King - ARM Linux
wrote:

I've combined several of your messages together into this reply...

> static inline bool rtc_tv_nsec_ok(struct rtc_device *rtc,
> 				  struct timespec64 *now)
> {
> 	long diff;
> 
> 	diff = (now->tv_nsec - rtc->time_set_nsec) % NSEC_PER_SEC;
> 	/* diff must be within [-NSEC_PER_SEC/2..NSEC_PER_SEC/2) */
> 	if (diff >= NSEC_PER_SEC / 2)
> 		diff -= NSEC_PER_SEC;
> 	else if (diff < -NSEC_PER_SEC / 2)
> 		diff += NSEC_PER_SEC;

I like your concept of allowing a negative time_set_nsec, I was
focused on a positive value and botched the first draft by not casting
to unsigned for the maths.. It seems much better because it
incorporates the rounding calculation into the common code instead of
shifting it to the driver.

I revised this idea a few times and came up with something a bit
simpler to understand by changing from a time_set_nsec to its inverse
the set_offset_nsec, that makes the maths simpler and we can avoid
troublesome signed modulos by using time maths functions instead of
open coding them.

See full revised patch below, this time I did test the
rtc_tv_nsec_ok() using your test vectors and it seems OK. Still have
not tested the whole patch...

So with v2 of this patch, the driver would specify positive 1s, and
then the set will be called when tv_nsec == 0, but the tv_sec will
be +1.

Similarly the existing case of 0.5s offset will work more like
before, where it will be called with tv_nsec == .5s and the tv_sec
will be +1 - before I think this happened reliably 'by accident' due
to the rounding.

> Most drivers use rtc_device_register() or devm_rtc_device_register()
> rather than rtc_allocate_device() (which is static.)  This does not
> give RTC drivers a chance to set rtc->time_set_nsec before the RTC
> is registered with the kernel.

You are correct about the race, of course. I left it like that
deliberately, I'm not sure it is worth solving since I felt nothing
should be setting the RTC within a few CPU instructions of registering
it.

As you say, adding a pre-registration init op will solve the issue if
you think it is worth solving.

Jason

From 566712cb27596836469637b0e50e7fd973368df8 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Date: Thu, 21 Sep 2017 15:55:33 -0600
Subject: [PATCH v2] rtc: Allow rtc drivers to specify the tv_nsec value for ntp

ntp is currently hardwired to try and call the rtc set when wall clock
tv_nsec is 0.5 seconds. This historical behaviour works well with certain
PC RTCs, but is not universal to all rtc hardware.

Change how this works by introducing the driver specific concept of
a required delay between current wall clock time and the target time
to set (with a 0 tv_nsecs). When this delay is set to 0.5 seconds then
the behaviour will be the same as today, at wall clock time 0.5 sec the
RTC set will be called to set 1.0 sec.

Each RTC driver should set the set_offset_nsec according to its needs.

This also polices the incoming set time ts_nsec from NTP, if it is not
within +-1ms of the required time then the set is deferred. The target
time to set is 'snapped' to the nearest value.

The calculation of the sleep time for ntp is also revised to use modern
helper functions, and to more directly and safely compute a relative
jiffies delay that will result in the correct tv_nsec. If for some reason
the timer does not meet the requirement then it does a short sleep
and tries again.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/rtc/class.c   |  6 ++++
 drivers/rtc/systohc.c | 85 ++++++++++++++++++++++++++++++++++++++++++---------
 include/linux/rtc.h   | 10 +++++-
 kernel/time/ntp.c     | 62 ++++++++++++++++++++++++-------------
 4 files changed, 125 insertions(+), 38 deletions(-)

Comments

Russell King (Oracle) Sept. 21, 2017, 10:45 p.m. UTC | #1
On Thu, Sep 21, 2017 at 04:05:10PM -0600, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2017 at 10:32:19AM +0100, Russell King - ARM Linux
> wrote:
> 
> I've combined several of your messages together into this reply...

Sorry about that, I'm currently ill and sleep deprived, so it's been
a struggle to pick up on everything together.

> So with v2 of this patch, the driver would specify positive 1s, and
> then the set will be called when tv_nsec == 0, but the tv_sec will
> be +1.
> 
> Similarly the existing case of 0.5s offset will work more like
> before, where it will be called with tv_nsec == .5s and the tv_sec
> will be +1 - before I think this happened reliably 'by accident' due
> to the rounding.

It seems to mean (from reading the code) that I'd need to specify
-470ms for the PCF8523, if I'm understand what you're saying and
the code correctly.

I should point out, however, that I'm running a modified version
of your previous patch with 470ms specified - I added a debug
printk() for the failure cases, and I seem to be getting quite a
number:

[  416.966909] rtc_tv_nsec_ok: fail: ts=1506001533.510194975
[  427.974142] rtc_tv_nsec_ok: fail: ts=1506001544.518073728
[  768.946856] rtc_tv_nsec_ok: fail: ts=1506001885.510091123
[  779.954207] rtc_tv_nsec_ok: fail: ts=1506001896.518068045
...
[31391.230806] rtc_tv_nsec_ok: fail: ts=1506032509.510048412
[31402.238221] rtc_tv_nsec_ok: fail: ts=1506032520.518080991

176 so far in that window, which equates to an average failure rate
of one failure every 3m5s.  This is on a kernel with:

CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_HZ=250

From what I can see, we have failures that are 11s apart and 340s
apart.  If we assume that we had a successful write at 1506001555s,
then that means we only waited 329s before we next tried to set the
RTC, not the 11 minutes that we should have.  Not sure yet what's
going on there, but it seems to suggest that we're struggling to
set the RTC, taking multiple attempts to achieve it.

Also... don't we want to move the call to rtc_set_ntp_time() from out
of the "if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) {"
check, otherwise how could it be called except at about .5s?

I'm actually not sure if we are ending up setting the time - my test
program reports:

RTC Time: 21-09-2017 22:40:45
System Time was:     22:40:44.997

so the RTC has drifted off by 3ms wrt system time, and that suggests
it hasn't been set.  It's difficult to know.  Given that tick_nsec is
likely to be 4ms (I've not checked) it would suggest that we're never
successfully setting the RTC - since either the test in ntp.c fails
or the test in systohc.c fails.  Both can't succeed given these
constraints.
Jason Gunthorpe Sept. 21, 2017, 11:20 p.m. UTC | #2
On Thu, Sep 21, 2017 at 11:45:41PM +0100, Russell King - ARM Linux wrote:
> > I've combined several of your messages together into this reply...
> 
> Sorry about that, I'm currently ill and sleep deprived, so it's been
> a struggle to pick up on everything together.

No worries, your replys were easy to follow, thanks

> > So with v2 of this patch, the driver would specify positive 1s, and
> > then the set will be called when tv_nsec == 0, but the tv_sec will
> > be +1.
> > 
> > Similarly the existing case of 0.5s offset will work more like
> > before, where it will be called with tv_nsec == .5s and the tv_sec
> > will be +1 - before I think this happened reliably 'by accident' due
> > to the rounding.
> 
> It seems to mean (from reading the code) that I'd need to specify
> -470ms for the PCF8523, if I'm understand what you're saying and
> the code correctly.

Okay.. but I guessed a negative number is probably not what most RTCs would
want, eg a positive number would compensate for delays executing an
I2C command, assuming the RTC zeros its divider when it sets the time.

> I should point out, however, that I'm running a modified version
> of your previous patch with 470ms specified - I added a debug
> printk() for the failure cases, and I seem to be getting quite a
> number:
> 
> [  416.966909] rtc_tv_nsec_ok: fail: ts=1506001533.510194975

> then that means we only waited 329s before we next tried to set the
> RTC, not the 11 minutes that we should have.  Not sure yet what's
> going on there, but it seems to suggest that we're struggling to
> set the RTC, taking multiple attempts to achieve it.

Ah..

> Also... don't we want to move the call to rtc_set_ntp_time() from out
> of the "if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) {"
> check, otherwise how could it be called except at about .5s?

Yes, definitely. If we do not call through to rtc_set_ntp_time then it
does not update target_nsec, and we go back to the 0.5 offset not the
offset you want, which will certainly cause random rtc_tv_nsec_ok
failures.

My bad to not notice that.. But I'm not sure what the revision should
be.. I think this is using tick_nsec similarly to TIME_SET_NSEC_FUZZ,
but I'm not sure where tick_nsec comes from or if we should push it
down in to TIME_SET_NSEC_FUZZ, or something else.

> I'm actually not sure if we are ending up setting the time - my test
> program reports:

Probably not, given the above!

Jason
Russell King (Oracle) Sept. 22, 2017, 9:57 a.m. UTC | #3
On Thu, Sep 21, 2017 at 05:20:40PM -0600, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2017 at 11:45:41PM +0100, Russell King - ARM Linux wrote:
> > > I've combined several of your messages together into this reply...
> > 
> > Sorry about that, I'm currently ill and sleep deprived, so it's been
> > a struggle to pick up on everything together.
> 
> No worries, your replys were easy to follow, thanks

Feeling better today, thankfully.

> > > So with v2 of this patch, the driver would specify positive 1s, and
> > > then the set will be called when tv_nsec == 0, but the tv_sec will
> > > be +1.
> > > 
> > > Similarly the existing case of 0.5s offset will work more like
> > > before, where it will be called with tv_nsec == .5s and the tv_sec
> > > will be +1 - before I think this happened reliably 'by accident' due
> > > to the rounding.
> > 
> > It seems to mean (from reading the code) that I'd need to specify
> > -470ms for the PCF8523, if I'm understand what you're saying and
> > the code correctly.
> 
> Okay.. but I guessed a negative number is probably not what most RTCs would
> want, eg a positive number would compensate for delays executing an
> I2C command, assuming the RTC zeros its divider when it sets the time.

Okay, I thought I'd investigate a few other systems:

With the mainline code, which sets the RTC to tv_sec + 1 when tv_nsec
>= 500ms or tv_sec when tv_nsec < 500ms, at about tv_nsec = 500ms:

Dove - drifting about 4-5ms per minute:
RTC Time: 22-09-2017 09:06:38
System Time was:     09:06:37.464
RTC Time: 22-09-2017 09:07:38
System Time was:     09:07:37.508
RTC Time: 22-09-2017 09:08:39
System Time was:     09:08:38.504
...
RTC Time: 22-09-2017 09:17:48
System Time was:     09:17:47.464
RTC Time: 22-09-2017 09:18:48
System Time was:     09:18:47.500

I suspect this wants the time to be set at or around tv_nsec = 0.


Armada 388 - drifting at about 2-3ms per minute:
RTC Time: 22-09-2017 08:22:25
System Time was:     08:22:25.498
...
RTC Time: 22-09-2017 08:30:42
System Time was:     08:30:42.516
RTC Time: 22-09-2017 08:31:43
System Time was:     08:31:43.518
RTC Time: 22-09-2017 08:32:45
System Time was:     08:32:44.520
RTC Time: 22-09-2017 08:33:46
System Time was:     08:33:45.522
...
RTC Time: 22-09-2017 08:41:54
System Time was:     08:41:53.540
RTC Time: 22-09-2017 08:42:55
System Time was:     08:42:54.542
RTC Time: 22-09-2017 08:43:56
System Time was:     08:43:55.520
RTC Time: 22-09-2017 08:44:57
System Time was:     08:44:56.522

This looks the same as the Dove case - although there's some flucuation
in the second offset, probably caused by the 500ms threshold in
rtc_set_ntp_time().

> > Also... don't we want to move the call to rtc_set_ntp_time() from out
> > of the "if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) {"
> > check, otherwise how could it be called except at about .5s?
> 
> Yes, definitely. If we do not call through to rtc_set_ntp_time then it
> does not update target_nsec, and we go back to the 0.5 offset not the
> offset you want, which will certainly cause random rtc_tv_nsec_ok
> failures.
> 
> My bad to not notice that.. But I'm not sure what the revision should
> be.. I think this is using tick_nsec similarly to TIME_SET_NSEC_FUZZ,
> but I'm not sure where tick_nsec comes from or if we should push it
> down in to TIME_SET_NSEC_FUZZ, or something else.

I think tick_nsec is the length of one tick in nanoseconds, adjusted
by NTP (see ntp_update_frequency() where it's calculated.)

It's probably easiest to think about it in terms of the old timekeeping
model, where we had a periodic tick whose period was fixed and defined
the point at which time incremented.  tick_nsec would be the amount of
time for one period.

I think that's still relevant in a modern kernel, especially when it's
using a clock event in periodic mode, as that's when we get the regular
interrupts which guarantee a scheduling point (other scheduling points
are available!) and hence when the workqueue can be run.

The only places it's used is by the NTP code, and also by a few
architectures to that still have gettimeoffset() functionality (pre-
clocksource) as it can be used to derive the scaling factor to convert
timer count ticks to wall time.

I also don't see an obvious rearrangement of the code to fix this -
the problem is we need to call update_persistent_clock64() to know
whether we should call rtc_set_ntp_time(), but we can only call
update_persistent_clock64() on tv_nsec=500ms.

It looks like the users of update_persistent_clock*() are:
arch/mips/kernel/time.c
arch/powerpc/kernel/time.c
arch/sh/kernel/time.c
arch/x86/kernel/rtc.c

I thought SH was introduced after rtclib was born?

All of these have some kind of architecture specific multiplexing of
the update_persistent_clock() call - switch statements depending on
rtc type, function pointers set according to the rtc type, etc.  They
all look like legacy code that no one wants to update to rtclib.

Maybe the solution here is to split two forms of ntp RTC synchronisation
so:
- If you enable CONFIG_GENERIC_CMOS_UPDATE, then you get the
  update_persistent_clock*() methods called.
- If you enable CONFIG_RTC_SYSTOHC, rtc_set_ntp_time() gets called.
- If you enable both, then both get called at their appropriate times.

This would certainly simplify sync_cmos_clock().
Russell King (Oracle) Sept. 22, 2017, 12:24 p.m. UTC | #4
On Fri, Sep 22, 2017 at 10:57:13AM +0100, Russell King - ARM Linux wrote:
> Maybe the solution here is to split two forms of ntp RTC synchronisation
> so:
> - If you enable CONFIG_GENERIC_CMOS_UPDATE, then you get the
>   update_persistent_clock*() methods called.
> - If you enable CONFIG_RTC_SYSTOHC, rtc_set_ntp_time() gets called.
> - If you enable both, then both get called at their appropriate times.
> 
> This would certainly simplify sync_cmos_clock().

I've just tried that out, and it doesn't work very well (not wrapping
the kernel messages):

[   18.827144] sync_rtc_clock: adjust=1506081346.710933001 target=470000000 next=1506081357.470000000 now=1506081346.710953334 delta=10.759046666 jiffies=2690
[   29.662077] sync_rtc_clock: adjust=1506081357.546474534 target=470000000 next=1506081368.470000000 now=1506081357.546487201 delta=10.923512799 jiffies=2731
[   40.671669] sync_rtc_clock: adjust=1506081368.556684059 target=470000000 next=1506081379.470000000 now=1506081368.556696727 delta=10.913303273 jiffies=2729
[   51.675016] sync_rtc_clock: adjust=1506081379.560649235 target=470000000 next=1506081390.470000000 now=1506081379.560662236 delta=10.909337764 jiffies=2728
[   62.682385] sync_rtc_clock: adjust=1506081390.568637970 target=470000000 next=1506081401.470000000 now=1506081390.568648637 delta=10.901351363 jiffies=2726
[   73.689831] sync_rtc_clock: adjust=1506081401.576696708 target=470000000 next=1506081412.470000000 now=1506081401.576708376 delta=10.893291624 jiffies=2724
[   84.697233] sync_rtc_clock: adjust=1506081412.584695110 target=470000000 next=1506081423.470000000 now=1506081412.584723778 delta=10.885276222 jiffies=2722
[   95.704624] sync_rtc_clock: adjust=1506081423.592724847 target=470000000 next=1506081434.470000000 now=1506081423.592736847 delta=10.877263153 jiffies=2720
[  106.711941] sync_rtc_clock: adjust=1506081434.600651911 target=470000000 next=1506081445.470000000 now=1506081434.600665579 delta=10.869334421 jiffies=2718
[  117.719334] sync_rtc_clock: adjust=1506081445.608659647 target=470000000 next=1506081456.470000000 now=1506081445.608673981 delta=10.861326019 jiffies=2716
[  128.726723] sync_rtc_clock: adjust=1506081456.616661715 target=470000000 next=1506081467.470000000 now=1506081456.616676049 delta=10.853323951 jiffies=2714
[  139.734103] sync_rtc_clock: adjust=1506081467.624659450 target=470000000 next=1506081478.470000000 now=1506081467.624672451 delta=10.845327549 jiffies=2712
[  150.741489] sync_rtc_clock: adjust=1506081478.632661185 target=470000000 next=1506081489.470000000 now=1506081478.632673519 delta=10.837326481 jiffies=2710
[  161.748870] sync_rtc_clock: adjust=1506081489.640655587 target=470000000 next=1506081500.470000000 now=1506081489.640668254 delta=10.829331746 jiffies=2708
[  172.756352] sync_rtc_clock: adjust=1506081500.648758661 target=470000000 next=1506081511.470000000 now=1506081500.648772995 delta=10.821227005 jiffies=2706
[  183.763642] sync_rtc_clock: adjust=1506081511.656658057 target=470000000 next=1506081522.470000000 now=1506081511.656671391 delta=10.813328609 jiffies=2704
[  194.771043] sync_rtc_clock: adjust=1506081522.664678127 target=470000000 next=1506081533.470000000 now=1506081522.664689794 delta=10.805310206 jiffies=2702
[  205.778402] sync_rtc_clock: adjust=1506081533.672649194 target=470000000 next=1506081544.470000000 now=1506081533.672662528 delta=10.797337472 jiffies=2700
[  216.785780] sync_rtc_clock: adjust=1506081544.680643928 target=470000000 next=1506081555.470000000 now=1506081544.680656929 delta=10.789343071 jiffies=2698
[  227.792447] sync_rtc_clock: adjust=1506081555.688650369 target=470000000 next=1506081566.470000000 now=1506081555.688664037 delta=10.781335963 jiffies=2696
[  238.798968] sync_rtc_clock: adjust=1506081566.696674887 target=470000000 next=1506081577.470000000 now=1506081566.696687889 delta=10.773312111 jiffies=2694
[  249.805483] sync_rtc_clock: adjust=1506081577.704652923 target=470000000 next=1506081588.470000000 now=1506081577.704666259 delta=10.765333741 jiffies=2692
[  260.812078] sync_rtc_clock: adjust=1506081588.712679979 target=470000000 next=1506081599.470000000 now=1506081588.712693647 delta=10.757306353 jiffies=2690
[  271.818656] sync_rtc_clock: adjust=1506081599.720654429 target=470000000 next=1506081610.470000000 now=1506081599.720668431 delta=10.749331569 jiffies=2688
[  282.825311] sync_rtc_clock: adjust=1506081610.728677710 target=470000000 next=1506081621.470000000 now=1506081610.728690045 delta=10.741309955 jiffies=2686
[  293.575979] sync_rtc_clock: adjust=1506081621.480653995 target=470000000 next=1506081632.470000000 now=1506081621.480666996 delta=10.989333004 jiffies=2748
[  304.582667] sync_rtc_clock: adjust=1506081632.488651088 target=470000000 next=1506081643.470000000 now=1506081632.488663756 delta=10.981336244 jiffies=2746
[  315.589372] sync_rtc_clock: adjust=1506081643.496641761 target=470000000 next=1506081654.470000000 now=1506081643.496653429 delta=10.973346571 jiffies=2744

As previous mentioned, this has HZ=250, so one jiffy is about 4ms.  What
this shows is that we're failing every single time to hit the desired
window - the closest we got in the 5 minutes of uptime to 470ms was
480653995ns.

The comments in linux/workqueue.h refer to a gmane link -
http://thread.gmane.org/gmane.linux.kernel/1480396 but this seems to be
dead - "Archived At Nothing found - bye".  Looks like using gmane URLs
in kernel code is not a good idea, they don't seem to be persistent!

It's interesting to watch what's happening there - despite reducing the
number of jiffies each time, we still seem to be woken later and later
each time, with an approximate increase of 8ms or two jiffies each time.
Eg:

[  293.575979] sync_rtc_clock: adjust=1506081621.480653995 target=470000000
 next=1506081632.470000000 now=1506081621.480666996 delta=10.989333004
 jiffies=2748
[  304.582667] sync_rtc_clock: adjust=1506081632.488651088 target=470000000
 next=1506081643.470000000 now=1506081632.488663756 delta=10.981336244
 jiffies=2746
[  315.589372] sync_rtc_clock: adjust=1506081643.496641761 target=470000000
 next=1506081654.470000000 now=1506081643.496653429 delta=10.973346571
 jiffies=2744

If we do the math, then:

 2748 jiffies is 10.992s, which is 10.989333004 rounded up to a jiffy.
 10.992 + 1506081621.480666996 gives 1506081632.472666996, but we're
 next called at 1506081632.488651088, which is almost 4 jiffies later
 than requested.

 2746 jiffies is 10.984s, which is 10.981336244 rounded up to a jiffy.
 10.984 + 1506081632.488663756 gives 1506081643.472663756, but we're
 next called at 1506081643.496641761, which is almost 6 jiffies later
 than requested.

I'm not sure where those extra jiffies are coming from... but maybe
using the power efficient wq is not the best for accuracy?

Also notice that the target time seems to always be about 2.6ms late.
based on the requested jiffies, which in itself would mean we miss
the 1ms window in the systohc.c code.
Russell King (Oracle) Sept. 22, 2017, 4:28 p.m. UTC | #5
On Fri, Sep 22, 2017 at 01:24:20PM +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 22, 2017 at 10:57:13AM +0100, Russell King - ARM Linux wrote:
> > Maybe the solution here is to split two forms of ntp RTC synchronisation
> > so:
> > - If you enable CONFIG_GENERIC_CMOS_UPDATE, then you get the
> >   update_persistent_clock*() methods called.
> > - If you enable CONFIG_RTC_SYSTOHC, rtc_set_ntp_time() gets called.
> > - If you enable both, then both get called at their appropriate times.
> > 
> > This would certainly simplify sync_cmos_clock().
> 
> I've just tried that out, and it doesn't work very well (not wrapping
> the kernel messages):
> 
> [   18.827144] sync_rtc_clock: adjust=1506081346.710933001 target=470000000 next=1506081357.470000000 now=1506081346.710953334 delta=10.759046666 jiffies=2690
> [   29.662077] sync_rtc_clock: adjust=1506081357.546474534 target=470000000 next=1506081368.470000000 now=1506081357.546487201 delta=10.923512799 jiffies=2731
> [   40.671669] sync_rtc_clock: adjust=1506081368.556684059 target=470000000 next=1506081379.470000000 now=1506081368.556696727 delta=10.913303273 jiffies=2729
> [   51.675016] sync_rtc_clock: adjust=1506081379.560649235 target=470000000 next=1506081390.470000000 now=1506081379.560662236 delta=10.909337764 jiffies=2728
> [   62.682385] sync_rtc_clock: adjust=1506081390.568637970 target=470000000 next=1506081401.470000000 now=1506081390.568648637 delta=10.901351363 jiffies=2726
> [   73.689831] sync_rtc_clock: adjust=1506081401.576696708 target=470000000 next=1506081412.470000000 now=1506081401.576708376 delta=10.893291624 jiffies=2724
> [   84.697233] sync_rtc_clock: adjust=1506081412.584695110 target=470000000 next=1506081423.470000000 now=1506081412.584723778 delta=10.885276222 jiffies=2722
> [   95.704624] sync_rtc_clock: adjust=1506081423.592724847 target=470000000 next=1506081434.470000000 now=1506081423.592736847 delta=10.877263153 jiffies=2720
> [  106.711941] sync_rtc_clock: adjust=1506081434.600651911 target=470000000 next=1506081445.470000000 now=1506081434.600665579 delta=10.869334421 jiffies=2718
> [  117.719334] sync_rtc_clock: adjust=1506081445.608659647 target=470000000 next=1506081456.470000000 now=1506081445.608673981 delta=10.861326019 jiffies=2716
> [  128.726723] sync_rtc_clock: adjust=1506081456.616661715 target=470000000 next=1506081467.470000000 now=1506081456.616676049 delta=10.853323951 jiffies=2714
> [  139.734103] sync_rtc_clock: adjust=1506081467.624659450 target=470000000 next=1506081478.470000000 now=1506081467.624672451 delta=10.845327549 jiffies=2712
> [  150.741489] sync_rtc_clock: adjust=1506081478.632661185 target=470000000 next=1506081489.470000000 now=1506081478.632673519 delta=10.837326481 jiffies=2710
> [  161.748870] sync_rtc_clock: adjust=1506081489.640655587 target=470000000 next=1506081500.470000000 now=1506081489.640668254 delta=10.829331746 jiffies=2708
> [  172.756352] sync_rtc_clock: adjust=1506081500.648758661 target=470000000 next=1506081511.470000000 now=1506081500.648772995 delta=10.821227005 jiffies=2706
> [  183.763642] sync_rtc_clock: adjust=1506081511.656658057 target=470000000 next=1506081522.470000000 now=1506081511.656671391 delta=10.813328609 jiffies=2704
> [  194.771043] sync_rtc_clock: adjust=1506081522.664678127 target=470000000 next=1506081533.470000000 now=1506081522.664689794 delta=10.805310206 jiffies=2702
> [  205.778402] sync_rtc_clock: adjust=1506081533.672649194 target=470000000 next=1506081544.470000000 now=1506081533.672662528 delta=10.797337472 jiffies=2700
> [  216.785780] sync_rtc_clock: adjust=1506081544.680643928 target=470000000 next=1506081555.470000000 now=1506081544.680656929 delta=10.789343071 jiffies=2698
> [  227.792447] sync_rtc_clock: adjust=1506081555.688650369 target=470000000 next=1506081566.470000000 now=1506081555.688664037 delta=10.781335963 jiffies=2696
> [  238.798968] sync_rtc_clock: adjust=1506081566.696674887 target=470000000 next=1506081577.470000000 now=1506081566.696687889 delta=10.773312111 jiffies=2694
> [  249.805483] sync_rtc_clock: adjust=1506081577.704652923 target=470000000 next=1506081588.470000000 now=1506081577.704666259 delta=10.765333741 jiffies=2692
> [  260.812078] sync_rtc_clock: adjust=1506081588.712679979 target=470000000 next=1506081599.470000000 now=1506081588.712693647 delta=10.757306353 jiffies=2690
> [  271.818656] sync_rtc_clock: adjust=1506081599.720654429 target=470000000 next=1506081610.470000000 now=1506081599.720668431 delta=10.749331569 jiffies=2688
> [  282.825311] sync_rtc_clock: adjust=1506081610.728677710 target=470000000 next=1506081621.470000000 now=1506081610.728690045 delta=10.741309955 jiffies=2686
> [  293.575979] sync_rtc_clock: adjust=1506081621.480653995 target=470000000 next=1506081632.470000000 now=1506081621.480666996 delta=10.989333004 jiffies=2748
> [  304.582667] sync_rtc_clock: adjust=1506081632.488651088 target=470000000 next=1506081643.470000000 now=1506081632.488663756 delta=10.981336244 jiffies=2746
> [  315.589372] sync_rtc_clock: adjust=1506081643.496641761 target=470000000 next=1506081654.470000000 now=1506081643.496653429 delta=10.973346571 jiffies=2744
> 
> As previous mentioned, this has HZ=250, so one jiffy is about 4ms.  What
> this shows is that we're failing every single time to hit the desired
> window - the closest we got in the 5 minutes of uptime to 470ms was
> 480653995ns.
> 
> The comments in linux/workqueue.h refer to a gmane link -
> http://thread.gmane.org/gmane.linux.kernel/1480396 but this seems to be
> dead - "Archived At Nothing found - bye".  Looks like using gmane URLs
> in kernel code is not a good idea, they don't seem to be persistent!
> 
> It's interesting to watch what's happening there - despite reducing the
> number of jiffies each time, we still seem to be woken later and later
> each time, with an approximate increase of 8ms or two jiffies each time.
> Eg:
> 
> [  293.575979] sync_rtc_clock: adjust=1506081621.480653995 target=470000000
>  next=1506081632.470000000 now=1506081621.480666996 delta=10.989333004
>  jiffies=2748
> [  304.582667] sync_rtc_clock: adjust=1506081632.488651088 target=470000000
>  next=1506081643.470000000 now=1506081632.488663756 delta=10.981336244
>  jiffies=2746
> [  315.589372] sync_rtc_clock: adjust=1506081643.496641761 target=470000000
>  next=1506081654.470000000 now=1506081643.496653429 delta=10.973346571
>  jiffies=2744
> 
> If we do the math, then:
> 
>  2748 jiffies is 10.992s, which is 10.989333004 rounded up to a jiffy.
>  10.992 + 1506081621.480666996 gives 1506081632.472666996, but we're
>  next called at 1506081632.488651088, which is almost 4 jiffies later
>  than requested.
> 
>  2746 jiffies is 10.984s, which is 10.981336244 rounded up to a jiffy.
>  10.984 + 1506081632.488663756 gives 1506081643.472663756, but we're
>  next called at 1506081643.496641761, which is almost 6 jiffies later
>  than requested.
> 
> I'm not sure where those extra jiffies are coming from... but maybe
> using the power efficient wq is not the best for accuracy?
> 
> Also notice that the target time seems to always be about 2.6ms late.
> based on the requested jiffies, which in itself would mean we miss
> the 1ms window in the systohc.c code.

Okay, the problem seems to be the new timer wheel code - see
https://lwn.net/Articles/646950/

This means that timers set way in the future (like more than a few
milliseconds) can be delayed - they no longer expire "on the specified
jiffies".  Something to bear in mind for the future, as this affects
everything that uses normal timers.

I think the reason this doesn't show up for the original code is
because of your change from a 0-1s retry to a 10s retry - the further
the expiry into the future, the more delayed the timers can get.

I guess we could go back to the 0-1s retry and keep using the delayed
work queues, but I wonder if normal timers will get even less accurate
in the future (if tglx does more work there.)

The suggestion from Arjan van de Ven is to use hrtimers for this - but
we still need the workqueue (as we may sleep) so we need a hrtimer-
delayed workqueue.  On the face of it, that seems a contradiction in
terms to use a hrtimer with a workqueue.

I do have an implementation for the hrtimer suggestion, which I'll try
to sort out during the remainder of the day, once I've ironed out a
few remaining quirks.
diff mbox

Patch

diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
index 2ed970d61da140..eef4123a573504 100644
--- a/drivers/rtc/class.c
+++ b/drivers/rtc/class.c
@@ -161,6 +161,12 @@  static struct rtc_device *rtc_allocate_device(void)
 
 	device_initialize(&rtc->dev);
 
+	/* Drivers can revise this default after allocating the device. It
+	 * should be the value of wallclock tv_nsec that the driver needs in
+	 * order to synchronize the second tick over during set.
+	 */
+	rtc->set_offset_nsec =  NSEC_PER_SEC / 2;
+
 	rtc->irq_freq = 1;
 	rtc->max_user_freq = 64;
 	rtc->dev.class = rtc_class;
diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
index b4a68ffcd06bb8..a7d0bbe2577110 100644
--- a/drivers/rtc/systohc.c
+++ b/drivers/rtc/systohc.c
@@ -7,9 +7,42 @@ 
 #include <linux/rtc.h>
 #include <linux/time.h>
 
+/* Determine if we can call to driver to set the time. Drivers can only be
+ * called to set a second aligned time value, and the field set_offset_nsec
+ * specifies how far away from the second aligned time to call the driver.
+ *
+ * This also compute 'to_set' which is the time we are trying to set, and has
+ * a zero in tv_nsecs, such that:
+ *    to_set - set_delay_nsec == now +/- FUZZ
+ *
+ */
+#define TIME_SET_NSEC_FUZZ (1000 * 1000)
+static inline bool rtc_tv_nsec_ok(struct rtc_device *rtc,
+				  struct timespec64 *to_set,
+				  const struct timespec64 *now)
+{
+	struct timespec64 delay = {.tv_sec = 0,
+				   .tv_nsec = rtc->set_offset_nsec};
+
+	*to_set = timespec64_add(*now, delay);
+
+	if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) {
+		to_set->tv_nsec = 0;
+		return true;
+	}
+
+	if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) {
+		to_set->tv_sec++;
+		to_set->tv_nsec = 0;
+		return true;
+	}
+	return false;
+}
+
 /**
  * rtc_set_ntp_time - Save NTP synchronized time to the RTC
  * @now: Current time of day
+ * @target_nsec: pointer for desired now->tv_nsec value
  *
  * Replacement for the NTP platform function update_persistent_clock64
  * that stores time for later retrieval by rtc_hctosys.
@@ -18,30 +51,52 @@ 
  * possible at all, and various other -errno for specific temporary failure
  * cases.
  *
+ * -EPROTO is returned if now.tv_nsec is not close enough to *target_nsec.
+ (
  * If temporary failure is indicated the caller should try again 'soon'
  */
-int rtc_set_ntp_time(struct timespec64 now)
+int rtc_set_ntp_time(struct timespec64 now, long *target_nsec)
 {
 	struct rtc_device *rtc;
 	struct rtc_time tm;
+	struct timespec64 to_set;
 	int err = -ENODEV;
-
-	if (now.tv_nsec < (NSEC_PER_SEC >> 1))
-		rtc_time64_to_tm(now.tv_sec, &tm);
-	else
-		rtc_time64_to_tm(now.tv_sec + 1, &tm);
+	bool ok;
 
 	rtc = rtc_class_open(CONFIG_RTC_SYSTOHC_DEVICE);
-	if (rtc) {
-		/* rtc_hctosys exclusively uses UTC, so we call set_time here,
-		 * not set_mmss. */
-		if (rtc->ops &&
-		    (rtc->ops->set_time ||
-		     rtc->ops->set_mmss64 ||
-		     rtc->ops->set_mmss))
-			err = rtc_set_time(rtc, &tm);
-		rtc_class_close(rtc);
+	if (!rtc)
+		goto out_err;
+
+	if (!rtc->ops || (!rtc->ops->set_time && !rtc->ops->set_mmss64 &&
+			  !rtc->ops->set_mmss))
+		goto out_close;
+
+	/* Compute the value of tv_nsec we require the caller to supply in
+	 * now.tv_nsec.  This is the value such that (now +
+	 * set_offset_nsec).tv_nsec == 0.
+	 */
+	set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec);
+	*target_nsec = to_set.tv_nsec;
+
+	/* The ntp code must call this with the correct value in tv_nsec, if
+	 * it does not we update target_nsec and return EPROTO to make the ntp
+	 * code try again later.
+	 */
+	ok = rtc_tv_nsec_ok(rtc, &to_set, &now);
+	if (!ok) {
+		err = -EPROTO;
+		goto out_close;
 	}
 
+	rtc_time64_to_tm(to_set.tv_sec, &tm);
+
+	/* rtc_hctosys exclusively uses UTC, so we call set_time here, not
+	 * set_mmss.
+	 */
+	err = rtc_set_time(rtc, &tm);
+
+out_close:
+	rtc_class_close(rtc);
+out_err:
 	return err;
 }
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index 0a0f0d14a5fba5..7d526550a700f8 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -137,6 +137,14 @@  struct rtc_device {
 	/* Some hardware can't support UIE mode */
 	int uie_unsupported;
 
+	/* Number of nsec it takes to set the RTC clock. This influences when
+	 * the set ops are called. An offset:
+	 *   - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s
+	 *   - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s
+	 *   - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s
+	 */
+	long set_offset_nsec;
+
 	bool registered;
 
 	struct nvmem_config *nvmem_config;
@@ -174,7 +182,7 @@  extern void devm_rtc_device_unregister(struct device *dev,
 
 extern int rtc_read_time(struct rtc_device *rtc, struct rtc_time *tm);
 extern int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm);
-extern int rtc_set_ntp_time(struct timespec64 now);
+extern int rtc_set_ntp_time(struct timespec64 now, long *target_nsec);
 int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm);
 extern int rtc_read_alarm(struct rtc_device *rtc,
 			struct rtc_wkalrm *alrm);
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index edf19cc5314043..6d2f521dd97748 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -514,17 +514,19 @@  static DECLARE_DELAYED_WORK(sync_cmos_work, sync_cmos_clock);
 
 static void sync_cmos_clock(struct work_struct *work)
 {
-	struct timespec64 now;
-	struct timespec64 next;
+	struct timespec64 now, next, delta;
 	int fail = 1;
+	long target_nsec = NSEC_PER_SEC / 2;
 
 	/*
-	 * If we have an externally synchronized Linux clock, then update
-	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
-	 * called as close as possible to 500 ms before the new second starts.
-	 * This code is run on a timer.  If the clock is set, that timer
-	 * may not expire at the correct time.  Thus, we adjust...
-	 * We want the clock to be within a couple of ticks from the target.
+	 * If we have an externally synchronized Linux clock, then update CMOS
+	 * clock accordingly every ~11 minutes.  Histiocally Set_rtc_mmss()
+	 * has to be called as close as possible to 500 ms (target_nsec)
+	 * before the new second starts, but new RTC drivers can select a
+	 * different value.  This code is run on a timer.  If the clock is
+	 * set, that timer may not expire at the correct time.  Thus, we
+	 * adjust...  We want the clock to be within a couple of ticks from
+	 * the target.
 	 */
 	if (!ntp_synced()) {
 		/*
@@ -547,25 +549,41 @@  static void sync_cmos_clock(struct work_struct *work)
 
 #ifdef CONFIG_RTC_SYSTOHC
 		if (fail == -ENODEV)
-			fail = rtc_set_ntp_time(adjust);
+			fail = rtc_set_ntp_time(adjust, &target_nsec);
 #endif
 	}
 
-	next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec - (TICK_NSEC / 2);
-	if (next.tv_nsec <= 0)
-		next.tv_nsec += NSEC_PER_SEC;
+	do {
+		/*
+		 * Compute the next wall clock time to try and set the
+		 * clock
+		 */
+		next = now;
+		if (!fail || fail == -ENODEV)
+			timespec64_add_ns(&next, 659ULL * NSEC_PER_SEC);
+		else
+			/* Update failed, try again in about 10 seconds */
+			timespec64_add_ns(&next, 10ULL * NSEC_PER_SEC);
 
-	if (!fail || fail == -ENODEV)
-		next.tv_sec = 659;
-	else
-		next.tv_sec = 0;
+		/*
+		 * The next call to sync_cmos_clock needs to have have a wall
+		 * clock tv_nsec value equal to target_nsec.
+		 */
+		if (next.tv_nsec > target_nsec)
+			next.tv_sec++;
+		next.tv_nsec = target_nsec;
 
-	if (next.tv_nsec >= NSEC_PER_SEC) {
-		next.tv_sec++;
-		next.tv_nsec -= NSEC_PER_SEC;
-	}
-	queue_delayed_work(system_power_efficient_wq,
-			   &sync_cmos_work, timespec64_to_jiffies(&next));
+		/*
+		 * Convert to a relative delay. If time set took a really long
+		 * time, or the wall clock was changed, this might become
+		 * negative, so try again.
+		 */
+		getnstimeofday64(&now);
+		delta = timespec64_sub(next, now);
+	} while (delta.tv_sec <= 0);
+
+	queue_delayed_work(system_power_efficient_wq, &sync_cmos_work,
+			   timespec64_to_jiffies(&delta));
 }
 
 void ntp_notify_cmos_timer(void)