Message ID | 20170920112152.GL20805@n2100.armlinux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Sep 20, 2017 at 12:21:52PM +0100, Russell King - ARM Linux wrote: > Hi, > > It's common for systems to be time synchronised using programs such as > chrony or ntpd. In such situations, when we are properly synchronised, > the kernel writes the current time to the RTC every 11 seconds. That should be 11 minutes. > > However, assumptions are made about the RTC: > > 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the > time at around 500ms into the second. > > 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms, > then we want to set the _next_ second. > > This leads to RTCs being set with an offset time, where the offset > depends on the RTC. For example, the PCF8523 ends up reliably set > around 970ms in the future: > > RTC Time: 19-09-2017 14:27:51 > System Time was: 14:27:50.031 > > What this means is that when the RTC ticked from 50 to 51 seconds, > system time was at 50.031s. Hence, the RTC in this case is 969ms > in the future. > > For Armada 388, the situation is different: > > RTC Time: 19-09-2017 14:48:17 > System Time was: 14:48:16.521 > > Here, the RTC is being set 479ms in the future. > > The SNVS RTC in imx6 is different again, although I have no definitive > figures yet. > > Why does this matter - if you introduce a 1s offset in system time > (eg, you reboot a machine running NTP) then it takes at least four hours > after the reboot for ntpd to settle down, with the PPM swinging faster > and slower than normal as it compensates for the offset. This means if > you are using a Linux system for time measurement purposes, it is > useless for those four hours. > > Being able to accurately save and restore system time helps to reduce > the disruptive effect of a reboot. > > Currently, there are no controls in the kernel over this mechanism - if > you're running a multi-platform kernel which has support for writing the > RTC if ntp-sync'd, then you're stuck with the kernel writing the RTC > every 11 seconds. Not only is this detrimental due to the enforced RTC and here too. > dependent offset, but it also means that if you want to trim your RTC > to a synchronised source of time (for which the kernel must not write > the RTC, but you do want it to ntp sync), the only way to do it is to > either modify the kernel, or disable CONFIG_RTC_SYSTOHC in the multi- > platform kernel. > > > The kernel can do better - elimination of the rounding-up in systohc.c > gives a better result for PCF8523: > > RTC Time: 19-09-2017 16:30:38 > System Time was: 16:30:38.034 > > and the remaining offset can be reduced by adjusting the 500ms offset > in ntp.c to 470ms. This is specific to the offset that PCF8523 wants. > We know that MC146818 RTCs found in PCs want a 500ms offset (without > the addition of one second that systohc.c does.) The Armada 388 RTC > wants to be set on the exact second. > > We need some way to cater for these differing requirements (eg, RTC > drivers provide some properties to rtclib and the ntp code to describe > how long it takes for a written time to "take"), or we decide (as I > think was decided in the past) that the kernel should not be setting > the RTC, but userspace should be responsible for performing that > function. Either way, we need to know about these RTC specific > properties in order to set their time accurately. > > One of the issues here, however, is that RTC datasheets do not give this > information - this can only be found out by experimentation and > measurement. > > Currently I'm using the hacky patch below to be able to (a) disable the > regular RTC write during runtime, so I can measure the RTC drift and > trim it, and (b) provide a knob to adjust how far past the second the > RTC will receive its write. For a properly trimmed PCF8523 (measured > over about 12 hours to have 0.1ppm drift - which is better than its > associated crystal is rated for), setting this knob to 470ms results > in the following (from two samples this morning): > > RTC Time: 20-09-2017 10:41:30 > System Time was: 10:41:30.000 > RTC Time: 20-09-2017 11:17:38 > System Time was: 11:17:38.000 > > There's probably some noise in the setting of this due to the workqueue, > but getting it within 10ms is definitely an improvement over being > almost a second out. > > So, the question is... how should these differences in rtc requirements > be handled? > > drivers/rtc/systohc.c | 6 +++--- > kernel/sysctl.c | 21 +++++++++++++++++++++ > kernel/time/ntp.c | 9 ++++++--- > 3 files changed, 30 insertions(+), 6 deletions(-) > > diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c > index b4a68ffcd06b..df804597de71 100644 > --- a/drivers/rtc/systohc.c > +++ b/drivers/rtc/systohc.c > @@ -26,10 +26,7 @@ int rtc_set_ntp_time(struct timespec64 now) > struct rtc_time tm; > int err = -ENODEV; > > - if (now.tv_nsec < (NSEC_PER_SEC >> 1)) > - rtc_time64_to_tm(now.tv_sec, &tm); > - else > - rtc_time64_to_tm(now.tv_sec + 1, &tm); > + rtc_time64_to_tm(now.tv_sec, &tm); > > rtc = rtc_class_open(CONFIG_RTC_SYSTOHC_DEVICE); > if (rtc) { > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 6648fbbb8157..c2ce802f7ab9 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -98,6 +98,9 @@ > > #if defined(CONFIG_SYSCTL) > > +extern unsigned int sysctl_ntp_rtc_offset; > +extern unsigned int sysctl_ntp_rtc_sync; > + > /* External variables not in a header file. */ > extern int suid_dumpable; > #ifdef CONFIG_COREDUMP > @@ -303,6 +306,24 @@ static int max_extfrag_threshold = 1000; > #endif > > static struct ctl_table kern_table[] = { > +#if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC) > + { > + .procname = "ntp_rtc_offset", > + .data = &sysctl_ntp_rtc_offset, > + .maxlen = sizeof(sysctl_ntp_rtc_offset), > + .mode = 0644, > + .proc_handler = proc_douintvec, > + }, > + { > + .procname = "ntp_rtc_sync", > + .data = &sysctl_ntp_rtc_sync, > + .maxlen = sizeof(sysctl_ntp_rtc_sync), > + .mode = 0644, > + .proc_handler = proc_douintvec_minmax, > + .extra1 = &zero, > + .extra2 = &one, > + }, > +#endif > { > .procname = "sched_child_runs_first", > .data = &sysctl_sched_child_runs_first, > diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c > index edf19cc53140..674c45d30561 100644 > --- a/kernel/time/ntp.c > +++ b/kernel/time/ntp.c > @@ -508,6 +508,9 @@ int __weak update_persistent_clock64(struct timespec64 now64) > #endif > > #if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC) > +unsigned long sysctl_ntp_rtc_offset = NSEC_PER_SEC / 2; > +unsigned int sysctl_ntp_rtc_sync = true; > + > static void sync_cmos_clock(struct work_struct *work); > > static DECLARE_DELAYED_WORK(sync_cmos_work, sync_cmos_clock); > @@ -526,7 +529,7 @@ static void sync_cmos_clock(struct work_struct *work) > * may not expire at the correct time. Thus, we adjust... > * We want the clock to be within a couple of ticks from the target. > */ > - if (!ntp_synced()) { > + if (!ntp_synced() || !sysctl_ntp_rtc_sync) { > /* > * Not synced, exit, do not restart a timer (if one is > * running, let it run out). > @@ -535,7 +538,7 @@ static void sync_cmos_clock(struct work_struct *work) > } > > getnstimeofday64(&now); > - if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) { > + if (abs(now.tv_nsec - sysctl_ntp_rtc_offset) <= tick_nsec * 5) { > struct timespec64 adjust = now; > > fail = -ENODEV; > @@ -551,7 +554,7 @@ static void sync_cmos_clock(struct work_struct *work) > #endif > } > > - next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec - (TICK_NSEC / 2); > + next.tv_nsec = sysctl_ntp_rtc_offset - now.tv_nsec - (TICK_NSEC / 2); > if (next.tv_nsec <= 0) > next.tv_nsec += NSEC_PER_SEC; > > > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up > According to speedtest.net: 8.21Mbps down 510kbps up > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Wed, Sep 20, 2017 at 12:21:52PM +0100, Russell King - ARM Linux wrote: > However, assumptions are made about the RTC: > > 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the > time at around 500ms into the second. > > 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms, > then we want to set the _next_ second. I looked at these issues when I did the sys to HC patches and I concluced the first problem was that the RTC read functions generally did not return sub second resolution, either in sense of directly returning ts_nsec, or the sense of delaying the read until a clock tick over event. Eg if I repeatedly booted my various embedded ARM/PPC systems, and recorded the time offset from NTP, I got a fairly random distribution of offsets. Given that we couldn't seem to read back sub second resolution on my systems, storing it more accurately did not seem important. I think we also did some experiments with a few of the RTCs we were using and some of them did not adjust the seconds clock phase on write, so they seemed incapable of storing sub second data anyhow. So.. My feeling was that we'd need driver support in each RTC driver to enable sub section resolution. Do you know differently? Our pragmatic solution in our products was to have the initial time sync from NTP step the clock even if the offset is small. > So, the question is... how should these differences in rtc requirements > be handled? I think patch wise, this is something I would rather see handled internally via the drivers and perhaps with input from DT, not via sysctl knobs. The HW driver should know how to read and write with sub second resolution. If it works best with a certain value in the ts_nsec field, then it should set something inside rtc_chip that causes the systohc code to try and call it with that tv_nsec. It would probably also make sense to add new ntp ops for sub second get/set that includes the full timespec. This way the RTC driver can provide the right adjustments and we can get rid of the +1 in rtc_set_ntp_time and the +0.5 in rtc_hctosys for sub sec aware drivers.. Jason
On Wed, Sep 20, 2017 at 10:22:08AM -0600, Jason Gunthorpe wrote: > On Wed, Sep 20, 2017 at 12:21:52PM +0100, Russell King - ARM Linux wrote: > > > However, assumptions are made about the RTC: > > > > 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the > > time at around 500ms into the second. > > > > 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms, > > then we want to set the _next_ second. > > I looked at these issues when I did the sys to HC patches and I > concluced the first problem was that the RTC read functions generally > did not return sub second resolution, either in sense of directly > returning ts_nsec, or the sense of delaying the read until a clock > tick over event. The boot time problem can be resolved by using hwclock to set the system time in userspace - there are distros that do exactly that. For example, debian has a udev rule: # Set the System Time from the Hardware Clock and set the kernel's timezone # value to the local timezone when the kernel clock module is loaded. KERNEL=="rtc0", RUN+="/lib/udev/hwclock-set $root/$name" and that script uses hwclock to set the system time from the RTC. hwclock doesn't just read the RTC and set the system time, it tries to read the RTC at a whole second, and apply any known RTC drift correction. So I wouldn't worry about the kernel's RTC read being inaccurate, that's been worked around in distros for years. Embedded distros may not care so much about this (since they care more about boot speed) except if they're reliant on having correct time, in which case they may /choose/ to use hwclock in a blocking manner to ensure that system time is accurately set from the RTC. However, all that effort is for nowt if the RTC isn't set correctly in the first place. > I think we also did some experiments with a few of the RTCs we were > using and some of them did not adjust the seconds clock phase on > write, so they seemed incapable of storing sub second data anyhow. I can imagine that there are RTCs out there which do not reset the seconds pre-scaler, but just because there are some like that is no reason to cripple those which are more inteligent about it. > So.. My feeling was that we'd need driver support in each RTC driver > to enable sub section resolution. > > Do you know differently? See above... > Our pragmatic solution in our products was to have the initial time > sync from NTP step the clock even if the offset is small. That's all very well, but consider when your time source is GPS and you're not on a network connection that would allow ntpdate to do any better (eg, a 3G mobile data card...) I have exactly that setup with a remote monitoring system. If I reboot the gateway with the 3G and GPS, NTP makes big PPM adjustments while trying to correct for the 970ms time shift, and that madness is transmitted by ntpd to the stratum 2 servers. > > So, the question is... how should these differences in rtc requirements > > be handled? > > I think patch wise, this is something I would rather see handled > internally via the drivers and perhaps with input from DT, not via > sysctl knobs. I sort-of agree as far as the time offset information goes, but there's a complication that we only open the RTC to set the time at the point in time that we want to set it - while the RTC is closed, the RTC driver module could be removed and replaced by a different RTC driver which replaces the existing device. Don't think that's not possible, there are boards out there with multiple RTCs on them, so its entirely possible that the wrong RTC could be selected with random module probe ordering and need manual resolution. For example, the system I mention above has the built-in iMX6 SVNS RTC which gets powered down when power is removed, and a PCF8523 RTC which doesn't... and there's variants of the board where the PCF8523 isn't fitted, so users are reliant on the iMX6 RTC for cross-reboot time. The patch is not meant to be a final solution, so criticising it for using sysctls is not appropriate - it is firstly to prove the point that we are in fact able to correctly set RTCs, and secondly (as I explained in the bits you cut) to provide a knob to turn off the kernel's automatic RTC setting so it's possible to accurately trim the RTC. Many RTCs contain registers that allow fine trimming of their tick rate, and if you care about the time keeping of your RTC, it's something that needs to be calibrated against a known correct time source. Having the kernel repeatedly set the RTC every 11 minutes while you're trying to measure the RTCs drift over a period of 12 hours is not on - and in order to make such a measurement, you want the machine to be NTP synchronised. That's the exact situation that triggers kernels to periodically write the time to the RTC. So, we _do_ need a knob to turn that kernel timekeeping facility on and off in addition to the "are we NTP sync'd" status. > The HW driver should know how to read and write with sub second > resolution. If it works best with a certain value in the ts_nsec > field, then it should set something inside rtc_chip that causes the > systohc code to try and call it with that tv_nsec. The problem is deeper than the systohc.c code - the timing of the call made into the systohc.c code is decided by kernel/time/ntp.c, and currently is within a tick of 500ms past the second. > It would probably also make sense to add new ntp ops for sub second > get/set that includes the full timespec. This way the RTC driver can > provide the right adjustments and we can get rid of the +1 in > rtc_set_ntp_time and the +0.5 in rtc_hctosys for sub sec aware > drivers.. Do we have any sub-second aware drivers? None of my RTCs are, and I don't think it's fair for RTC drivers to sleep when getting a request to set the time. The userspace API doesn't do that, and the workqueue involved in setting NTP time is probably run using shared system_power_efficient_wq resources, so blocking there will be detrimental to other works queued on that. Maybe the NTP code should call into RTCs to enable/disable the time setting feature and leave RTC drivers to do that, but that seems to me like a lot of code in each driver as well as something of a layering issue.
On Wed, Sep 20, 2017 at 05:51:41PM +0100, Russell King - ARM Linux wrote: > On Wed, Sep 20, 2017 at 10:22:08AM -0600, Jason Gunthorpe wrote: > > On Wed, Sep 20, 2017 at 12:21:52PM +0100, Russell King - ARM Linux wrote: > > > > > However, assumptions are made about the RTC: > > > > > > 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the > > > time at around 500ms into the second. > > > > > > 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms, > > > then we want to set the _next_ second. > > > > I looked at these issues when I did the sys to HC patches and I > > concluced the first problem was that the RTC read functions generally > > did not return sub second resolution, either in sense of directly > > returning ts_nsec, or the sense of delaying the read until a clock > > tick over event. > > The boot time problem can be resolved by using hwclock to set the > system time in userspace - there are distros that do exactly that. > For example, debian has a udev rule: Okay, that part I didn't know, thanks. > > I think patch wise, this is something I would rather see handled > > internally via the drivers and perhaps with input from DT, not via > > sysctl knobs. > > I sort-of agree as far as the time offset information goes, but there's > a complication that we only open the RTC to set the time at the point in > time that we want to set it - while the RTC is closed, the RTC driver > module could be removed and replaced by a different RTC driver which > replaces the existing device. Yes, the ntp code would have to open the rtc initially and record the offset. Each time it goes to write it would have to check the offset it got against what the device wants and, if necessary, reschedule the update if the ts_nsec is not correct. That should handle infrequent dynamic changes.. > So, we _do_ need a knob to turn that kernel timekeeping facility on and > off in addition to the "are we NTP sync'd" status. Sure, that makes sense as a sysctl > > The HW driver should know how to read and write with sub second > > resolution. If it works best with a certain value in the ts_nsec > > field, then it should set something inside rtc_chip that causes the > > systohc code to try and call it with that tv_nsec. > > The problem is deeper than the systohc.c code - the timing of the call > made into the systohc.c code is decided by kernel/time/ntp.c, and > currently is within a tick of 500ms past the second. Right, I was thinking about the entire systohc process, including the part in ntp.c > Do we have any sub-second aware drivers? None of my RTCs are, and I > don't think it's fair for RTC drivers to sleep when getting a request > to set the time. I would call any RTC that can change the phase of the seconds clock as sub-second capable. Currently, I think that only the CMOS driver is really a sub-second aware driver, and the rest of the RTC stack has been sort of hardcoded around what it does.. My thinking was adding a new sub-second entry point would let us keep the existing assumptions above while having a cleaner entry point that allows fixing the RTC drivers gradually. Eg you have tested PCF, so it could use the subsecond entry and define the proper ts_nsec value/etc. > The userspace API doesn't do that, and the workqueue involved in > setting NTP time is probably run using shared system_power_efficient_wq > resources, so blocking there will be detrimental to other works queued > on that. I think the NTP path should be non blocking, and rely on the ntp.c code calling into the driver with the desired tv_nsec. However, it also makes sense to provide a blocking sub-second user space API that would have the required sleep to make setting the time work properly. This way we can hide the desired tv_nsec implementation detail from userspace completely. eg I assume hwclock also has the built in 0.5s assumption when trying to write to the RTC? The same would be true for read, I think it would be cleaner to have a kernel uapi that reads the time with subsecond accuracy than to have hwclock implement a spinning loop in userspace. That gives drivers more options in how they measure the second tick over. The common code would just provide a simple loop like we see in hwclock. Jason
diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c index b4a68ffcd06b..df804597de71 100644 --- a/drivers/rtc/systohc.c +++ b/drivers/rtc/systohc.c @@ -26,10 +26,7 @@ int rtc_set_ntp_time(struct timespec64 now) struct rtc_time tm; int err = -ENODEV; - if (now.tv_nsec < (NSEC_PER_SEC >> 1)) - rtc_time64_to_tm(now.tv_sec, &tm); - else - rtc_time64_to_tm(now.tv_sec + 1, &tm); + rtc_time64_to_tm(now.tv_sec, &tm); rtc = rtc_class_open(CONFIG_RTC_SYSTOHC_DEVICE); if (rtc) { diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 6648fbbb8157..c2ce802f7ab9 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -98,6 +98,9 @@ #if defined(CONFIG_SYSCTL) +extern unsigned int sysctl_ntp_rtc_offset; +extern unsigned int sysctl_ntp_rtc_sync; + /* External variables not in a header file. */ extern int suid_dumpable; #ifdef CONFIG_COREDUMP @@ -303,6 +306,24 @@ static int max_extfrag_threshold = 1000; #endif static struct ctl_table kern_table[] = { +#if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC) + { + .procname = "ntp_rtc_offset", + .data = &sysctl_ntp_rtc_offset, + .maxlen = sizeof(sysctl_ntp_rtc_offset), + .mode = 0644, + .proc_handler = proc_douintvec, + }, + { + .procname = "ntp_rtc_sync", + .data = &sysctl_ntp_rtc_sync, + .maxlen = sizeof(sysctl_ntp_rtc_sync), + .mode = 0644, + .proc_handler = proc_douintvec_minmax, + .extra1 = &zero, + .extra2 = &one, + }, +#endif { .procname = "sched_child_runs_first", .data = &sysctl_sched_child_runs_first, diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index edf19cc53140..674c45d30561 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -508,6 +508,9 @@ int __weak update_persistent_clock64(struct timespec64 now64) #endif #if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC) +unsigned long sysctl_ntp_rtc_offset = NSEC_PER_SEC / 2; +unsigned int sysctl_ntp_rtc_sync = true; + static void sync_cmos_clock(struct work_struct *work); static DECLARE_DELAYED_WORK(sync_cmos_work, sync_cmos_clock); @@ -526,7 +529,7 @@ static void sync_cmos_clock(struct work_struct *work) * may not expire at the correct time. Thus, we adjust... * We want the clock to be within a couple of ticks from the target. */ - if (!ntp_synced()) { + if (!ntp_synced() || !sysctl_ntp_rtc_sync) { /* * Not synced, exit, do not restart a timer (if one is * running, let it run out). @@ -535,7 +538,7 @@ static void sync_cmos_clock(struct work_struct *work) } getnstimeofday64(&now); - if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) { + if (abs(now.tv_nsec - sysctl_ntp_rtc_offset) <= tick_nsec * 5) { struct timespec64 adjust = now; fail = -ENODEV; @@ -551,7 +554,7 @@ static void sync_cmos_clock(struct work_struct *work) #endif } - next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec - (TICK_NSEC / 2); + next.tv_nsec = sysctl_ntp_rtc_offset - now.tv_nsec - (TICK_NSEC / 2); if (next.tv_nsec <= 0) next.tv_nsec += NSEC_PER_SEC;