Message ID | 20230905232757.36459-1-wahrenst@gmx.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [V2] hwrng: bcm2835: Fix hwrng throughput regression | expand |
On Wed, Sep 06, 2023 at 01:27:57AM +0200, Stefan Wahren wrote: > The last RCU stall fix caused a massive throughput regression of the > hwrng on Raspberry Pi 0 - 3. hwrng_msleep doesn't sleep precisely enough > and usleep_range doesn't allow scheduling. So try to restore the > best possible throughput by introducing hwrng_yield which interruptable > sleeps for one jiffy. > > Some performance measurements on Raspberry Pi 3B+ (arm64/defconfig): > > sudo dd if=/dev/hwrng of=/dev/null count=1 bs=10000 > > cpu_relax ~138025 Bytes / sec > hwrng_msleep(1000) ~13 Bytes / sec > hwrng_yield ~2510 Bytes / sec > > Fixes: 96cb9d055445 ("hwrng: bcm2835 - use hwrng_msleep() instead of cpu_relax()") > Link: https://lore.kernel.org/linux-arm-kernel/bc97ece5-44a3-4c4e-77da-2db3eb66b128@gmx.net/ > Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Thanks, looks good to me. Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
On Wed, Sep 06, 2023 at 01:27:57AM +0200, Stefan Wahren wrote: > The last RCU stall fix caused a massive throughput regression of the > hwrng on Raspberry Pi 0 - 3. hwrng_msleep doesn't sleep precisely enough > and usleep_range doesn't allow scheduling. So try to restore the > best possible throughput by introducing hwrng_yield which interruptable > sleeps for one jiffy. > > Some performance measurements on Raspberry Pi 3B+ (arm64/defconfig): > > sudo dd if=/dev/hwrng of=/dev/null count=1 bs=10000 > > cpu_relax ~138025 Bytes / sec > hwrng_msleep(1000) ~13 Bytes / sec > hwrng_yield ~2510 Bytes / sec > > Fixes: 96cb9d055445 ("hwrng: bcm2835 - use hwrng_msleep() instead of cpu_relax()") > Link: https://lore.kernel.org/linux-arm-kernel/bc97ece5-44a3-4c4e-77da-2db3eb66b128@gmx.net/ > Signed-off-by: Stefan Wahren <wahrenst@gmx.net> > --- > > Changes in V2: > - introduce hwrng_yield and use it > > drivers/char/hw_random/bcm2835-rng.c | 2 +- > drivers/char/hw_random/core.c | 6 ++++++ > include/linux/hw_random.h | 1 + > 3 files changed, 8 insertions(+), 1 deletion(-) Patch applied. Thanks.
On 15.09.23 12:45, Herbert Xu wrote: > On Wed, Sep 06, 2023 at 01:27:57AM +0200, Stefan Wahren wrote: >> The last RCU stall fix caused a massive throughput regression of the >> hwrng on Raspberry Pi 0 - 3. hwrng_msleep doesn't sleep precisely enough >> and usleep_range doesn't allow scheduling. So try to restore the >> best possible throughput by introducing hwrng_yield which interruptable >> sleeps for one jiffy. >> >> Some performance measurements on Raspberry Pi 3B+ (arm64/defconfig): >> >> sudo dd if=/dev/hwrng of=/dev/null count=1 bs=10000 >> >> cpu_relax ~138025 Bytes / sec >> hwrng_msleep(1000) ~13 Bytes / sec >> hwrng_yield ~2510 Bytes / sec >> >> Fixes: 96cb9d055445 ("hwrng: bcm2835 - use hwrng_msleep() instead of cpu_relax()") >> Link: https://lore.kernel.org/linux-arm-kernel/bc97ece5-44a3-4c4e-77da-2db3eb66b128@gmx.net/ >> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> >> --- >> >> Changes in V2: >> - introduce hwrng_yield and use it >> >> drivers/char/hw_random/bcm2835-rng.c | 2 +- >> drivers/char/hw_random/core.c | 6 ++++++ >> include/linux/hw_random.h | 1 + >> 3 files changed, 8 insertions(+), 1 deletion(-) > > Patch applied. Thanks. Hi Herbert, I there a strong reason why you merged this to what from here looks like the branch that targets the next merge window? The patch fixes a regression introduced during the last 12 months and thus normally should not wait for the next merge window. For details see "Expectations and best practices for fixing regressions" in Documentation/process/handling-regressions.rst; or if you want to hear it from Linus directly, check these out: https://lore.kernel.org/all/CAHk-=wis_qQy4oDNynNKi5b7Qhosmxtoj1jxo5wmB6SRUwQUBQ@mail.gmail.com/ https://lore.kernel.org/all/CAHk-=wgD98pmSK3ZyHk_d9kZ2bhgN6DuNZMAJaV0WTtbkf=RDw@mail.gmail.com/ Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
On Tue, Oct 03, 2023 at 01:35:29PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote: > > Hi Herbert, I there a strong reason why you merged this to what from > here looks like the branch that targets the next merge window? The patch > fixes a regression introduced during the last 12 months and thus > normally should not wait for the next merge window. For details see > "Expectations and best practices for fixing regressions" in > Documentation/process/handling-regressions.rst; or if you want to hear > it from Linus directly, check these out: > https://lore.kernel.org/all/CAHk-=wis_qQy4oDNynNKi5b7Qhosmxtoj1jxo5wmB6SRUwQUBQ@mail.gmail.com/ > https://lore.kernel.org/all/CAHk-=wgD98pmSK3ZyHk_d9kZ2bhgN6DuNZMAJaV0WTtbkf=RDw@mail.gmail.com/ This is a one-year-old regression, and the fix is not a trivial reversion. So there is no urgency in pushing this out. Cheers,
[CCing Linus in case he wants to voice in to clarify how fixes for regressions that are nearly a year old should be handled] On 04.10.23 03:18, Herbert Xu wrote: > On Tue, Oct 03, 2023 at 01:35:29PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote: >> >> Hi Herbert, I there a strong reason why you merged this to what from >> here looks like the branch that targets the next merge window? The patch >> fixes a regression introduced during the last 12 months and thus >> normally should not wait for the next merge window. For details see >> "Expectations and best practices for fixing regressions" in >> Documentation/process/handling-regressions.rst; or if you want to hear >> it from Linus directly, check these out: >> https://lore.kernel.org/all/CAHk-=wis_qQy4oDNynNKi5b7Qhosmxtoj1jxo5wmB6SRUwQUBQ@mail.gmail.com/ >> https://lore.kernel.org/all/CAHk-=wgD98pmSK3ZyHk_d9kZ2bhgN6DuNZMAJaV0WTtbkf=RDw@mail.gmail.com/ > > This is a one-year-old regression, From day the culprit was authored and merged to mainline: yes, nearly; but it's only roughly 9 months from the release of 6.1 till the report and the posting of the fix you applied. But that's splitting hairs; pretty sure Linus meant those 12 months just as a rough estimate. Other things also matter, among them how many people are affected and how risky the fix is; but to my untrained eyes this one doesn't look very dangerous (but I of course might be totally wrong with that). > and the fix is not a trivial reversion. What makes you think the strategy outlined earlier is only relevant for reverts? Yes, one of those links above points to a discussion about a revert, but the other is not about it. > So there is no urgency in pushing this out. I don't care to much about this particular case, but it would help my work to know how Linus think about it. That's why I CCed him, that way he can comment on this if he wants to. Linus, FWIW, the thread starts here with the fix for the regression: https://lore.kernel.org/all/20230905232757.36459-1-wahrenst@gmx.net/ Is also in -next for a while as b58a36008bfa1a. Quoting from there below for convenience. Ciao, Thorsten """> From b58a36008bfa1aadf55f516bcbfae40c779eb54b Mon Sep 17 00:00:00 2001 > From: Stefan Wahren <wahrenst@gmx.net> > Date: Wed, 6 Sep 2023 01:27:57 +0200 > Subject: hwrng: bcm2835 - Fix hwrng throughput regression > > The last RCU stall fix caused a massive throughput regression of the > hwrng on Raspberry Pi 0 - 3. hwrng_msleep doesn't sleep precisely enough > and usleep_range doesn't allow scheduling. So try to restore the > best possible throughput by introducing hwrng_yield which interruptable > sleeps for one jiffy. > > Some performance measurements on Raspberry Pi 3B+ (arm64/defconfig): > > sudo dd if=/dev/hwrng of=/dev/null count=1 bs=10000 > > cpu_relax ~138025 Bytes / sec > hwrng_msleep(1000) ~13 Bytes / sec > hwrng_yield ~2510 Bytes / sec > > Fixes: 96cb9d055445 ("hwrng: bcm2835 - use hwrng_msleep() instead of cpu_relax()") > Link: https://lore.kernel.org/linux-arm-kernel/bc97ece5-44a3-4c4e-77da-2db3eb66b128@gmx.net/ > Signed-off-by: Stefan Wahren <wahrenst@gmx.net> > Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > --- > drivers/char/hw_random/bcm2835-rng.c | 2 +- > drivers/char/hw_random/core.c | 6 ++++++ > include/linux/hw_random.h | 1 + > 3 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/char/hw_random/bcm2835-rng.c b/drivers/char/hw_random/bcm2835-rng.c > index eb04b12f9f01b..b03e803006275 100644 > --- a/drivers/char/hw_random/bcm2835-rng.c > +++ b/drivers/char/hw_random/bcm2835-rng.c > @@ -70,7 +70,7 @@ static int bcm2835_rng_read(struct hwrng *rng, void *buf, size_t max, > while ((rng_readl(priv, RNG_STATUS) >> 24) == 0) { > if (!wait) > return 0; > - hwrng_msleep(rng, 1000); > + hwrng_yield(rng); > } > > num_words = rng_readl(priv, RNG_STATUS) >> 24; > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index e3598ec9cfca8..420f155d251fb 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -678,6 +678,12 @@ long hwrng_msleep(struct hwrng *rng, unsigned int msecs) > } > EXPORT_SYMBOL_GPL(hwrng_msleep); > > +long hwrng_yield(struct hwrng *rng) > +{ > + return wait_for_completion_interruptible_timeout(&rng->dying, 1); > +} > +EXPORT_SYMBOL_GPL(hwrng_yield); > + > static int __init hwrng_modinit(void) > { > int ret; > diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h > index 8a3115516a1ba..136e9842120e8 100644 > --- a/include/linux/hw_random.h > +++ b/include/linux/hw_random.h > @@ -63,5 +63,6 @@ extern void hwrng_unregister(struct hwrng *rng); > extern void devm_hwrng_unregister(struct device *dve, struct hwrng *rng); > > extern long hwrng_msleep(struct hwrng *rng, unsigned int msecs); > +extern long hwrng_yield(struct hwrng *rng); > > #endif /* LINUX_HWRANDOM_H_ */ """
diff --git a/drivers/char/hw_random/bcm2835-rng.c b/drivers/char/hw_random/bcm2835-rng.c index e98fcac578d6..634eab4776f3 100644 --- a/drivers/char/hw_random/bcm2835-rng.c +++ b/drivers/char/hw_random/bcm2835-rng.c @@ -71,7 +71,7 @@ static int bcm2835_rng_read(struct hwrng *rng, void *buf, size_t max, while ((rng_readl(priv, RNG_STATUS) >> 24) == 0) { if (!wait) return 0; - hwrng_msleep(rng, 1000); + hwrng_yield(rng); } num_words = rng_readl(priv, RNG_STATUS) >> 24; diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index f34d356fe2c0..599a4bc2c548 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -679,6 +679,12 @@ long hwrng_msleep(struct hwrng *rng, unsigned int msecs) } EXPORT_SYMBOL_GPL(hwrng_msleep); +long hwrng_yield(struct hwrng *rng) +{ + return wait_for_completion_interruptible_timeout(&rng->dying, 1); +} +EXPORT_SYMBOL_GPL(hwrng_yield); + static int __init hwrng_modinit(void) { int ret; diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h index 8a3115516a1b..136e9842120e 100644 --- a/include/linux/hw_random.h +++ b/include/linux/hw_random.h @@ -63,5 +63,6 @@ extern void hwrng_unregister(struct hwrng *rng); extern void devm_hwrng_unregister(struct device *dve, struct hwrng *rng); extern long hwrng_msleep(struct hwrng *rng, unsigned int msecs); +extern long hwrng_yield(struct hwrng *rng); #endif /* LINUX_HWRANDOM_H_ */
The last RCU stall fix caused a massive throughput regression of the hwrng on Raspberry Pi 0 - 3. hwrng_msleep doesn't sleep precisely enough and usleep_range doesn't allow scheduling. So try to restore the best possible throughput by introducing hwrng_yield which interruptable sleeps for one jiffy. Some performance measurements on Raspberry Pi 3B+ (arm64/defconfig): sudo dd if=/dev/hwrng of=/dev/null count=1 bs=10000 cpu_relax ~138025 Bytes / sec hwrng_msleep(1000) ~13 Bytes / sec hwrng_yield ~2510 Bytes / sec Fixes: 96cb9d055445 ("hwrng: bcm2835 - use hwrng_msleep() instead of cpu_relax()") Link: https://lore.kernel.org/linux-arm-kernel/bc97ece5-44a3-4c4e-77da-2db3eb66b128@gmx.net/ Signed-off-by: Stefan Wahren <wahrenst@gmx.net> --- Changes in V2: - introduce hwrng_yield and use it drivers/char/hw_random/bcm2835-rng.c | 2 +- drivers/char/hw_random/core.c | 6 ++++++ include/linux/hw_random.h | 1 + 3 files changed, 8 insertions(+), 1 deletion(-) -- 2.34.1