Message ID | 1476125277-6061-1-git-send-email-dianders@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi, On Mon, Oct 10, 2016 at 11:47:57AM -0700, Douglas Anderson wrote: > Users of usleep_range() expect that it will _never_ return in less time > than the minimum passed parameter. However, nothing in any of the code > ensures this. Specifically: Ouch (aka: good catch! - by the reporter, though... ;) > diff --git a/kernel/time/timer.c b/kernel/time/timer.c > index 32bf6f75a8fe..ab03c7e403a4 100644 > --- a/kernel/time/timer.c > +++ b/kernel/time/timer.c > @@ -1898,12 +1898,29 @@ EXPORT_SYMBOL(msleep_interruptible); > > static void __sched do_usleep_range(unsigned long min, unsigned long max) > { > + ktime_t start, end; > ktime_t kmin; > u64 delta; > + unsigned long elapsed = 0; > + int ret; > + > + start = ktime_get(); > + do { > + kmin = ktime_set(0, min * NSEC_PER_USEC); > + delta = (u64)(max - min) * NSEC_PER_USEC; > + ret = schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL); > + > + /* > + * If schedule_hrtimeout_range() returns 0 then we actually > + * hit the timeout. If not then we need to re-calculate the > + * new timeout ourselves. > + */ > + if (ret == 0) > + break; > > - kmin = ktime_set(0, min * NSEC_PER_USEC); > - delta = (u64)(max - min) * NSEC_PER_USEC; > - schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL); > + end = ktime_get(); > + elapsed = ktime_to_us(ktime_sub(end, start)); > + } while (elapsed < min); Some thoughts: - max >= min pre-condition is validated somewhere, I'd hope? (somewhere in outer API frame?) > + delta = (u64)(max - min) * NSEC_PER_USEC; - there is a domain transition in there, **within** the repeated loop: > + elapsed = ktime_to_us(ktime_sub(end, start)); --> > + } while (elapsed < min); should likely be made to be able to directly compare a *non-converted* "elapsed" value (IIRC there's an API such as ktime_before()). One could argue that the "repeat" case is the non-likely case (thus the conversion should possibly not be done before actually being required), however I guess it's exactly hitting the non-likely cases which will contribute towards non-predictable, non-deterministic latency behaviour of a code path (think RT requirements) Thanks, Andreas Mohr
Hi Doug, On Mon, Oct 10, 2016 at 11:47:57AM -0700, Doug Anderson wrote: > diff --git a/kernel/time/timer.c b/kernel/time/timer.c > index 32bf6f75a8fe..ab03c7e403a4 100644 > --- a/kernel/time/timer.c > +++ b/kernel/time/timer.c > @@ -1898,12 +1898,29 @@ EXPORT_SYMBOL(msleep_interruptible); > > static void __sched do_usleep_range(unsigned long min, unsigned long max) > { > + ktime_t start, end; > ktime_t kmin; > u64 delta; > + unsigned long elapsed = 0; > + int ret; > + > + start = ktime_get(); > + do { > + kmin = ktime_set(0, min * NSEC_PER_USEC); I believe 'min' is unmodified throughout, and therefore 'kmin' is computed to be the same minimum timeout in each loop. Shouldn't this be decreasing on each iteration of the loop? (i.e., either your compute 'kmin' differently here, or you recompute 'min' based on the elapsed time?) > + delta = (u64)(max - min) * NSEC_PER_USEC; > + ret = schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL); > + > + /* > + * If schedule_hrtimeout_range() returns 0 then we actually > + * hit the timeout. If not then we need to re-calculate the > + * new timeout ourselves. > + */ > + if (ret == 0) > + break; > > - kmin = ktime_set(0, min * NSEC_PER_USEC); > - delta = (u64)(max - min) * NSEC_PER_USEC; > - schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL); > + end = ktime_get(); > + elapsed = ktime_to_us(ktime_sub(end, start)); > + } while (elapsed < min); I think Andreas had similar comments, but it seemed to me like ktime_before() might be nicer. But (as Andreas did) you might get into (premature?) micro-optimizations down that path, and I'm certainly not an expert there. > } > > /** Brian
Hi, On Mon, Oct 10, 2016 at 1:04 PM, Brian Norris <briannorris@chromium.org> wrote: > Hi Doug, > > On Mon, Oct 10, 2016 at 11:47:57AM -0700, Doug Anderson wrote: >> diff --git a/kernel/time/timer.c b/kernel/time/timer.c >> index 32bf6f75a8fe..ab03c7e403a4 100644 >> --- a/kernel/time/timer.c >> +++ b/kernel/time/timer.c >> @@ -1898,12 +1898,29 @@ EXPORT_SYMBOL(msleep_interruptible); >> >> static void __sched do_usleep_range(unsigned long min, unsigned long max) >> { >> + ktime_t start, end; >> ktime_t kmin; >> u64 delta; >> + unsigned long elapsed = 0; >> + int ret; >> + >> + start = ktime_get(); >> + do { >> + kmin = ktime_set(0, min * NSEC_PER_USEC); > > I believe 'min' is unmodified throughout, and therefore 'kmin' is > computed to be the same minimum timeout in each loop. Shouldn't this be > decreasing on each iteration of the loop? (i.e., either your compute > 'kmin' differently here, or you recompute 'min' based on the elapsed > time?) Yes, I stupidly changed something at the last second and then didn't test again after my stupid change. Fix coming soon with all comments addressed. Sorry for posting broken code. :( :( :( -Doug
On Mon, Oct 10, 2016 at 01:12:39PM -0700, Doug Anderson wrote: > Hi, > > On Mon, Oct 10, 2016 at 1:04 PM, Brian Norris <briannorris@chromium.org> wrote: > > I believe 'min' is unmodified throughout, and therefore 'kmin' is > > computed to be the same minimum timeout in each loop. Shouldn't this be > > decreasing on each iteration of the loop? (i.e., either your compute > > 'kmin' differently here, or you recompute 'min' based on the elapsed > > time?) > > Yes, I stupidly changed something at the last second and then didn't > test again after my stupid change. Fix coming soon with all comments > addressed. Sorry for posting broken code. :( :( :( With a loop style that is actively re-calculating things, such implementations should then not fall into the trap of basing the "next" value on "current" time, thereby bogusly accumulating scheduling-based delays with each new loop iteration etc. (i.e., things should still be based on hard, precise termination according to an *initially* calculated, *absolute*, *minimum* expiry time). Andreas Mohr
Hi, On Mon, Oct 10, 2016 at 01:04:01PM -0700, Brian Norris wrote: > Hi Doug, > > On Mon, Oct 10, 2016 at 11:47:57AM -0700, Doug Anderson wrote: > > + end = ktime_get(); > > + elapsed = ktime_to_us(ktime_sub(end, start)); > > + } while (elapsed < min); > > I think Andreas had similar comments, but it seemed to me like > ktime_before() might be nicer. But (as Andreas did) you might get into > (premature?) micro-optimizations down that path, and I'm certainly not > an expert there. I'd justify it this way: orientation/type of input data is rather irrelevant - all that matters is how you want to treat things in your *inner handling* - and if that happens to be ktime-based (as is usually - if not pretty much always - the case within kernel code), then it ought to be that way, *consistently* (to avoid conversion transition overhead). Andreas Mohr
Hi, On Mon, Oct 10, 2016 at 12:53 PM, Andreas Mohr <andi@lisas.de> wrote: > Some thoughts: > - max >= min pre-condition is validated somewhere, I'd hope? > (somewhere in outer API frame?) I don't think this is validated, but it's not a new problem. My patch doesn't attempt to solve this. A future patch could, though. You can speculate on whether a WARN_ON would be better or some type of friendly warning. -Doug
diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 32bf6f75a8fe..ab03c7e403a4 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1898,12 +1898,29 @@ EXPORT_SYMBOL(msleep_interruptible); static void __sched do_usleep_range(unsigned long min, unsigned long max) { + ktime_t start, end; ktime_t kmin; u64 delta; + unsigned long elapsed = 0; + int ret; + + start = ktime_get(); + do { + kmin = ktime_set(0, min * NSEC_PER_USEC); + delta = (u64)(max - min) * NSEC_PER_USEC; + ret = schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL); + + /* + * If schedule_hrtimeout_range() returns 0 then we actually + * hit the timeout. If not then we need to re-calculate the + * new timeout ourselves. + */ + if (ret == 0) + break; - kmin = ktime_set(0, min * NSEC_PER_USEC); - delta = (u64)(max - min) * NSEC_PER_USEC; - schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL); + end = ktime_get(); + elapsed = ktime_to_us(ktime_sub(end, start)); + } while (elapsed < min); } /**
Users of usleep_range() expect that it will _never_ return in less time than the minimum passed parameter. However, nothing in any of the code ensures this. Specifically: usleep_range() => do_usleep_range() => schedule_hrtimeout_range() => schedule_hrtimeout_range_clock() just ends up calling schedule() with an appropriate timeout set using the hrtimer. If someone else happens to wake up our task then we'll happily return from usleep_range() early. msleep() already has code to handle this case since it will loop as long as there was still time left. usleep_range() had no such loop. The problem is is easily demonstrated with a small bit of test code: static int usleep_test_task(void *data) { atomic_t *done = data; ktime_t start, end; start = ktime_get(); usleep_range(50000, 100000); end = ktime_get(); pr_info("Requested 50000 - 100000 us. Actually slept for %llu us\n", (unsigned long long)ktime_to_us(ktime_sub(end, start))); atomic_set(done, 1); return 0; } static void run_usleep_test(void) { struct task_struct *t; atomic_t done; atomic_set(&done, 0); t = kthread_run(usleep_test_task, &done, "usleep_test_task"); while (!atomic_read(&done)) { wake_up_process(t); udelay(1000); } kthread_stop(t); } If you run the above code without this patch you get things like: Requested 50000 - 100000 us. Actually slept for 967 us If you run the above code _with_ this patch, you get: Requested 50000 - 100000 us. Actually slept for 50001 us Presumably this problem was not detected before because: - It's not terribly common to use wake_up_process() directly. - Other ways for processes to wake up are not typically mixed with usleep_range(). - There aren't lotsof places that use usleep_range(), since many people call either msleep() or udelay(). Reported-by: Tao Huang <huangtao@rock-chips.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> --- kernel/time/timer.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-)