Message ID | fe235a1f65bb6c86d2afcdf52d85f80ae728dcc5.1683722688.git.geert+renesas@glider.be (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Headers | show |
Series | iopoll: Busy loop and timeout improvements | expand |
* Geert Uytterhoeven <geert+renesas@glider.be> [230510 13:23]: > It is considered good practice to call cpu_relax() in busy loops, see > Documentation/process/volatile-considered-harmful.rst. This can not > only lower CPU power consumption or yield to a hyperthreaded twin > processor, but also allows an architecture to mitigate hardware issues > (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the > architecture-specific cpu_relax() implementation. > > In addition, cpu_relax() is also a compiler barrier. It is not > immediately obvious that the @op argument "function" will result in an > actual function call (e.g. in case of inlining). > > Where a function call is a C sequence point, this is lost on inlining. > Therefore, with agressive enough optimization it might be possible for > the compiler to hoist the: > > (val) = op(args); > > "load" out of the loop because it doesn't see the value changing. The > addition of cpu_relax() would inhibit this. > > As the iopoll helpers lack calls to cpu_relax(), people are sometimes > reluctant to use them, and may fall back to open-coded polling loops > (including cpu_relax() calls) instead. > > Fix this by adding calls to cpu_relax() to the iopoll helpers: > - For the non-atomic case, it is sufficient to call cpu_relax() in > case of a zero sleep-between-reads value, as a call to > usleep_range() is a safe barrier otherwise. However, it doesn't > hurt to add the call regardless, for simplicity, and for similarity > with the atomic case below. > - For the atomic case, cpu_relax() must be called regardless of the > sleep-between-reads value, as there is no guarantee all > architecture-specific implementations of udelay() handle this. Reviewed-by: Tony Lindgren <tony@atomide.com>
On Wed, 10 May 2023 at 15:23, Geert Uytterhoeven <geert+renesas@glider.be> wrote: > > It is considered good practice to call cpu_relax() in busy loops, see > Documentation/process/volatile-considered-harmful.rst. This can not > only lower CPU power consumption or yield to a hyperthreaded twin > processor, but also allows an architecture to mitigate hardware issues > (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the > architecture-specific cpu_relax() implementation. > > In addition, cpu_relax() is also a compiler barrier. It is not > immediately obvious that the @op argument "function" will result in an > actual function call (e.g. in case of inlining). > > Where a function call is a C sequence point, this is lost on inlining. > Therefore, with agressive enough optimization it might be possible for > the compiler to hoist the: > > (val) = op(args); > > "load" out of the loop because it doesn't see the value changing. The > addition of cpu_relax() would inhibit this. > > As the iopoll helpers lack calls to cpu_relax(), people are sometimes > reluctant to use them, and may fall back to open-coded polling loops > (including cpu_relax() calls) instead. > > Fix this by adding calls to cpu_relax() to the iopoll helpers: > - For the non-atomic case, it is sufficient to call cpu_relax() in > case of a zero sleep-between-reads value, as a call to > usleep_range() is a safe barrier otherwise. However, it doesn't > hurt to add the call regardless, for simplicity, and for similarity > with the atomic case below. > - For the atomic case, cpu_relax() must be called regardless of the > sleep-between-reads value, as there is no guarantee all > architecture-specific implementations of udelay() handle this. > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> > Acked-by: Arnd Bergmann <arnd@arndb.de> Makes sense to me! Feel free to add: Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Kind regards Uffe > --- > v2: > - Add Acked-by, > - Add compiler barrier and inlining explanation (thanks, Peter!). > --- > include/linux/iopoll.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h > index 2c8860e406bd8cae..0417360a6db9b0d6 100644 > --- a/include/linux/iopoll.h > +++ b/include/linux/iopoll.h > @@ -53,6 +53,7 @@ > } \ > if (__sleep_us) \ > usleep_range((__sleep_us >> 2) + 1, __sleep_us); \ > + cpu_relax(); \ > } \ > (cond) ? 0 : -ETIMEDOUT; \ > }) > @@ -95,6 +96,7 @@ > } \ > if (__delay_us) \ > udelay(__delay_us); \ > + cpu_relax(); \ > } \ > (cond) ? 0 : -ETIMEDOUT; \ > }) > -- > 2.34.1 >
From: Tony Lindgren > Sent: 11 May 2023 07:49 > > * Geert Uytterhoeven <geert+renesas@glider.be> [230510 13:23]: > > It is considered good practice to call cpu_relax() in busy loops, see > > Documentation/process/volatile-considered-harmful.rst. This can not > > only lower CPU power consumption or yield to a hyperthreaded twin > > processor, but also allows an architecture to mitigate hardware issues > > (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the > > architecture-specific cpu_relax() implementation. Don't you also need to call cond_resched() (at least some times). Otherwise the process can't be pre-empted and a RT process that last ran on that cpu will never be scheduled. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Hi David, On Thu, May 11, 2023 at 12:49 PM David Laight <David.Laight@aculab.com> wrote: > > * Geert Uytterhoeven <geert+renesas@glider.be> [230510 13:23]: > > > It is considered good practice to call cpu_relax() in busy loops, see > > > Documentation/process/volatile-considered-harmful.rst. This can not > > > only lower CPU power consumption or yield to a hyperthreaded twin > > > processor, but also allows an architecture to mitigate hardware issues > > > (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the > > > architecture-specific cpu_relax() implementation. > > Don't you also need to call cond_resched() (at least some times). > Otherwise the process can't be pre-empted and a RT process > that last ran on that cpu will never be scheduled. According to [1], cond_resched() must be called at least once per few tens of milliseconds. read_poll_timeout() uses usleep_range(), which calls schedule_hrtimeout_range(). read_poll_timeout_atomic() should not be used with multi-ms timeouts anyway. So I guess we're OK? [1] https://elixir.bootlin.com/linux/latest/source/Documentation/RCU/Design/Requirements/Requirements.rst#L2348 Gr{oetje,eeting}s, Geert
From: Geert Uytterhoeven > Sent: 23 May 2023 08:30 > > Hi David, > > On Thu, May 11, 2023 at 12:49 PM David Laight <David.Laight@aculab.com> wrote: > > > * Geert Uytterhoeven <geert+renesas@glider.be> [230510 13:23]: > > > > It is considered good practice to call cpu_relax() in busy loops, see > > > > Documentation/process/volatile-considered-harmful.rst. This can not > > > > only lower CPU power consumption or yield to a hyperthreaded twin > > > > processor, but also allows an architecture to mitigate hardware issues > > > > (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the > > > > architecture-specific cpu_relax() implementation. > > > > Don't you also need to call cond_resched() (at least some times). > > Otherwise the process can't be pre-empted and a RT process > > that last ran on that cpu will never be scheduled. > > According to [1], cond_resched() must be called at least once per few > tens of milliseconds. Hmmm.... tens of milliseconds is really much too long for RT threads. A limit nearer 1ms would be barely acceptable. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h index 2c8860e406bd8cae..0417360a6db9b0d6 100644 --- a/include/linux/iopoll.h +++ b/include/linux/iopoll.h @@ -53,6 +53,7 @@ } \ if (__sleep_us) \ usleep_range((__sleep_us >> 2) + 1, __sleep_us); \ + cpu_relax(); \ } \ (cond) ? 0 : -ETIMEDOUT; \ }) @@ -95,6 +96,7 @@ } \ if (__delay_us) \ udelay(__delay_us); \ + cpu_relax(); \ } \ (cond) ? 0 : -ETIMEDOUT; \ })