Message ID | 20240507-b4-sio-ntp-usec-v1-1-15003fc9c2b4@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | ntp: remove accidental integer wrap-around | expand |
On Mon, May 6, 2024 at 9:34 PM Justin Stitt <justinstitt@google.com> wrote: > Let's introduce a new macro and use that against NTP_PHASE_LIMIT to > properly limit the max size of time_maxerror without overflowing during > the check itself. > > Link: https://github.com/llvm/llvm-project/pull/82432 [1] > Closes: https://github.com/KSPP/linux/issues/354 > Cc: linux-hardening@vger.kernel.org > Signed-off-by: Justin Stitt <justinstitt@google.com> > --- > include/linux/timex.h | 1 + > kernel/time/ntp.c | 8 ++++---- > 2 files changed, 5 insertions(+), 4 deletions(-) > > diff --git a/include/linux/timex.h b/include/linux/timex.h > index 3871b06bd302..976490a06915 100644 > --- a/include/linux/timex.h > +++ b/include/linux/timex.h > @@ -138,6 +138,7 @@ unsigned long random_get_entropy_fallback(void); > #define MINSEC 256 /* min interval between updates (s) */ > #define MAXSEC 2048 /* max interval between updates (s) */ > #define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5) /* beyond max. dispersion */ > +#define NTP_MAXFREQ_USEC (MAXFREQ / NSEC_PER_USEC) /* scaled to microseconds */ > > /* > * kernel variables > diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c > index 406dccb79c2b..19027b6d0827 100644 > --- a/kernel/time/ntp.c > +++ b/kernel/time/ntp.c > @@ -454,12 +454,12 @@ int second_overflow(time64_t secs) > } > > > - /* Bump the maxerror field */ > - time_maxerror += MAXFREQ / NSEC_PER_USEC; > - if (time_maxerror > NTP_PHASE_LIMIT) { > + /* Bump the maxerror field, making sure not to exceed NTP_PHASE_LIMIT */ > + if (NTP_PHASE_LIMIT - NTP_MAXFREQ_USEC < time_maxerror) { > time_maxerror = NTP_PHASE_LIMIT; > time_status |= STA_UNSYNC; > - } > + } else > + time_maxerror += NTP_MAXFREQ_USEC; > > /* Compute the phase adjustment for the next second */ > tick_length = tick_length_base; > Looks reasonable to me. Acked-by: John Stultz <jstultz@google.com> thanks -john
On Tue, May 07 2024 at 04:34, Justin Stitt wrote: > Using syzkaller alongside the newly reintroduced signed integer overflow > sanitizer spits out this report: > > [ 138.454979] ------------[ cut here ]------------ > [ 138.458089] UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:461:16 > [ 138.462134] 9223372036854775807 + 500 cannot be represented in type 'long' > [ 138.466234] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc2-00038-gc0a509640e93-dirty #10 > [ 138.471498] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 > [ 138.477110] Call Trace: > [ 138.478657] <IRQ> > [ 138.479964] dump_stack_lvl+0x93/0xd0 > [ 138.482276] handle_overflow+0x171/0x1b0 > [ 138.484699] second_overflow+0x2d6/0x500 > [ 138.487133] accumulate_nsecs_to_secs+0x60/0x160 > [ 138.489931] timekeeping_advance+0x1fe/0x890 > [ 138.492535] update_wall_time+0x10/0x30 Same comment vs. trimming. > Historically, the signed integer overflow sanitizer did not work in the > kernel due to its interaction with `-fwrapv` but this has since been > changed [1] in the newest version of Clang. It was re-enabled in the > kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow > sanitizer"). Again. Irrelevant to the problem. > Let's introduce a new macro and use that against NTP_PHASE_LIMIT to > properly limit the max size of time_maxerror without overflowing during > the check itself. This fails to tell what is causing the issue and just talks about what the patch is doing. The latter can be seen from the patch itself, no? Something like this: On second overflow time_maxerror is unconditionally incremented and the result is checked against NTP_PHASE_LIMIT, but the increment can overflow into negative space. Prevent this by checking the overflow condition before incrementing. Hmm? But that obviously begs the question why this can happen at all. #define MAXPHASE 500000000L #define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5) ==> NTP_PHASE_LIMIT = 1.6e+07 = 0xf42400 #define MAXFREQ 500000 So how can 0xf42400 + 500000/1000 overflow in the first place? It can't unless time_maxerror is somehow initialized to a bogus value and indeed it is: process_adjtimex_modes() .... if (txc->modes & ADJ_MAXERROR) time_maxerror = txc->maxerror; So that wants to be fixed and not the symptom. Thanks, tglx
Hi, On Tue, May 14, 2024 at 3:38 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Tue, May 07 2024 at 04:34, Justin Stitt wrote: > > Using syzkaller alongside the newly reintroduced signed integer overflow > > sanitizer spits out this report: > > > > [ 138.454979] ------------[ cut here ]------------ > > [ 138.458089] UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:461:16 > > [ 138.462134] 9223372036854775807 + 500 cannot be represented in type 'long' > > [ 138.466234] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc2-00038-gc0a509640e93-dirty #10 > > [ 138.471498] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 > > [ 138.477110] Call Trace: > > [ 138.478657] <IRQ> > > [ 138.479964] dump_stack_lvl+0x93/0xd0 > > [ 138.482276] handle_overflow+0x171/0x1b0 > > [ 138.484699] second_overflow+0x2d6/0x500 > > [ 138.487133] accumulate_nsecs_to_secs+0x60/0x160 > > [ 138.489931] timekeeping_advance+0x1fe/0x890 > > [ 138.492535] update_wall_time+0x10/0x30 > > Same comment vs. trimming. Gotcha, in the next version this will be trimmed. > > > Historically, the signed integer overflow sanitizer did not work in the > > kernel due to its interaction with `-fwrapv` but this has since been > > changed [1] in the newest version of Clang. It was re-enabled in the > > kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow > > sanitizer"). > > Again. Irrelevant to the problem. Right, I'll move it below the fold. > > > Let's introduce a new macro and use that against NTP_PHASE_LIMIT to > > properly limit the max size of time_maxerror without overflowing during > > the check itself. > > This fails to tell what is causing the issue and just talks about what > the patch is doing. The latter can be seen from the patch itself, no? > > Something like this: > > On second overflow time_maxerror is unconditionally incremented and > the result is checked against NTP_PHASE_LIMIT, but the increment can > overflow into negative space. > > Prevent this by checking the overflow condition before incrementing. > > Hmm? Sounds better :thumbs_up: I'll use this! > > But that obviously begs the question why this can happen at all. > > #define MAXPHASE 500000000L > #define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5) > > ==> NTP_PHASE_LIMIT = 1.6e+07 = 0xf42400 > > #define MAXFREQ 500000 > > So how can 0xf42400 + 500000/1000 overflow in the first place? > > It can't unless time_maxerror is somehow initialized to a bogus > value and indeed it is: > > process_adjtimex_modes() > .... > if (txc->modes & ADJ_MAXERROR) > time_maxerror = txc->maxerror; > > So that wants to be fixed and not the symptom. Isn't this usually supplied from the user and can be some pretty random stuff? Are you suggesting we update timekeeping_validate_timex() to include a check to limit the maxerror field to (NTP_PHASE_LIMIT-(MAXFREQ / NSEC_PER_USEC))? It seems like we should handle the overflow case where it happens: in second_overflow(). The clear intent of the existing code was to saturate at NTP_PHASE_LIMIT, they just did it in a way where the check itself triggers overflow sanitizers. > > Thanks, > > tglx Thanks Justin
On Thu, May 16, 2024 at 4:40 PM Justin Stitt <justinstitt@google.com> wrote: > Isn't this usually supplied from the user and can be some pretty > random stuff? Are you suggesting we update > timekeeping_validate_timex() to include a check to limit the maxerror > field to (NTP_PHASE_LIMIT-(MAXFREQ / NSEC_PER_USEC))? It seems like we > should handle the overflow case where it happens: in > second_overflow(). Or, I suppose we could add a check to timekeeping_validate_timex() like: if (txc->modes & ADJ_MAXERROR) { if (txc->maxerror < 0 || txc->maxerror > NTP_PHASE_LIMIT) return -EINVAL; } > Thanks > Justin
On Thu, May 16 2024 at 16:40, Justin Stitt wrote: > On Tue, May 14, 2024 at 3:38 AM Thomas Gleixner <tglx@linutronix.de> wrote: >> So how can 0xf42400 + 500000/1000 overflow in the first place? >> >> It can't unless time_maxerror is somehow initialized to a bogus >> value and indeed it is: >> >> process_adjtimex_modes() >> .... >> if (txc->modes & ADJ_MAXERROR) >> time_maxerror = txc->maxerror; >> >> So that wants to be fixed and not the symptom. > > Isn't this usually supplied from the user and can be some pretty > random stuff? Sure it comes from user space and can contain random nonsense as syzkaller demonstrated. > Are you suggesting we update timekeeping_validate_timex() to include a > check to limit the maxerror field to (NTP_PHASE_LIMIT-(MAXFREQ / > NSEC_PER_USEC))? It seems like we should handle the overflow case > where it happens: in second_overflow(). > > The clear intent of the existing code was to saturate at > NTP_PHASE_LIMIT, they just did it in a way where the check itself > triggers overflow sanitizers. The clear intent of the code is to do saturation of a bound value. Clearly the user space interface fails to validate the input to be in a sane range and that makes you go and prevent the resulting overflow at the usage site. Seriously? Obviously the sanitizer detects the stupid in second_overflow(), but that does not mean that the proper solution is to add overflow protection to that code. Tools are good to pin-point symptoms, but they are by definition patently bad in root cause analysis. Otherwise we could just let the tool write the "fix". The obvious first question in such a case is to ask _WHY_ does time_maxerror have a bogus value, which clearly cannot be achieved from regular operation. Once you figured out that the only way to set time_maxerror to a bogus value is the user space interface the obvious follow up question is whether such a value has to be considered as valid or not. As it is obviously invalid the logical consequence is to add a sanity check and reject that nonsense at that boundary, no? Thanks, tglx
diff --git a/include/linux/timex.h b/include/linux/timex.h index 3871b06bd302..976490a06915 100644 --- a/include/linux/timex.h +++ b/include/linux/timex.h @@ -138,6 +138,7 @@ unsigned long random_get_entropy_fallback(void); #define MINSEC 256 /* min interval between updates (s) */ #define MAXSEC 2048 /* max interval between updates (s) */ #define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5) /* beyond max. dispersion */ +#define NTP_MAXFREQ_USEC (MAXFREQ / NSEC_PER_USEC) /* scaled to microseconds */ /* * kernel variables diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 406dccb79c2b..19027b6d0827 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -454,12 +454,12 @@ int second_overflow(time64_t secs) } - /* Bump the maxerror field */ - time_maxerror += MAXFREQ / NSEC_PER_USEC; - if (time_maxerror > NTP_PHASE_LIMIT) { + /* Bump the maxerror field, making sure not to exceed NTP_PHASE_LIMIT */ + if (NTP_PHASE_LIMIT - NTP_MAXFREQ_USEC < time_maxerror) { time_maxerror = NTP_PHASE_LIMIT; time_status |= STA_UNSYNC; - } + } else + time_maxerror += NTP_MAXFREQ_USEC; /* Compute the phase adjustment for the next second */ tick_length = tick_length_base;
Using syzkaller alongside the newly reintroduced signed integer overflow sanitizer spits out this report: [ 138.454979] ------------[ cut here ]------------ [ 138.458089] UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:461:16 [ 138.462134] 9223372036854775807 + 500 cannot be represented in type 'long' [ 138.466234] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc2-00038-gc0a509640e93-dirty #10 [ 138.471498] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 138.477110] Call Trace: [ 138.478657] <IRQ> [ 138.479964] dump_stack_lvl+0x93/0xd0 [ 138.482276] handle_overflow+0x171/0x1b0 [ 138.484699] second_overflow+0x2d6/0x500 [ 138.487133] accumulate_nsecs_to_secs+0x60/0x160 [ 138.489931] timekeeping_advance+0x1fe/0x890 [ 138.492535] update_wall_time+0x10/0x30 ... Historically, the signed integer overflow sanitizer did not work in the kernel due to its interaction with `-fwrapv` but this has since been changed [1] in the newest version of Clang. It was re-enabled in the kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow sanitizer"). Let's introduce a new macro and use that against NTP_PHASE_LIMIT to properly limit the max size of time_maxerror without overflowing during the check itself. Link: https://github.com/llvm/llvm-project/pull/82432 [1] Closes: https://github.com/KSPP/linux/issues/354 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> --- include/linux/timex.h | 1 + kernel/time/ntp.c | 8 ++++---- 2 files changed, 5 insertions(+), 4 deletions(-) --- base-commit: 0106679839f7c69632b3b9833c3268c316c0a9fc change-id: 20240507-b4-sio-ntp-usec-1a3ab67bdce1 Best regards, -- Justin Stitt <justinstitt@google.com>