Message ID | 20221010150607.720600-1-Jason@zx2c4.com (mailing list archive) |
---|---|
State | New, archived |
Series | hw_random: bcm2835: use hwrng_msleep() instead of cpu_relax() |
On Mon, Oct 10, 2022 at 09:06:07AM -0600, Jason A. Donenfeld wrote:
> Rather than busy looping, yield back to the scheduler and sleep for a
> bit in the event that there's no data. This should hopefully prevent the
> stalls that Mark reported:
>
> <6>[ 3.362859] Freeing initrd memory: 16196K
> <3>[ 23.160131] rcu: INFO: rcu_sched self-detected stall on CPU
> <3>[ 23.166057] rcu: 0-....: (2099 ticks this GP) idle=03b4/1/0x40000002 softirq=28/28 fqs=1050
> <4>[ 23.174895] (t=2101 jiffies g=-1147 q=2353 ncpus=4)
> <4>[ 23.180203] CPU: 0 PID: 49 Comm: hwrng Not tainted 6.0.0 #1
> <4>[ 23.186125] Hardware name: BCM2835
> <4>[ 23.189837] PC is at bcm2835_rng_read+0x30/0x6c
> <4>[ 23.194709] LR is at hwrng_fillfn+0x71/0xf4
> <4>[ 23.199218] pc : [<c07ccdc8>] lr : [<c07cb841>] psr: 40000033
> <4>[ 23.205840] sp : f093df70 ip : 00000000 fp : 00000000
> <4>[ 23.211404] r10: c3c7e800 r9 : 00000000 r8 : c17e6b20
> <4>[ 23.216968] r7 : c17e6b64 r6 : c18b0a74 r5 : c07ccd99 r4 : c3f171c0
> <4>[ 23.223855] r3 : 000fffff r2 : 00000040 r1 : c3c7e800 r0 : c3f171c0
> <4>[ 23.230743] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment none
> <4>[ 23.238426] Control: 50c5387d Table: 0020406a DAC: 00000051
> <4>[ 23.244519] CPU: 0 PID: 49 Comm: hwrng Not tainted 6.0.0 #1
>
> Link: https://lore.kernel.org/all/Y0QJLauamRnCDUef@sirena.org.uk/
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> I haven't tested this. Somebody with access to that kernel CI infra that
> triggered this will need to test.

So I succeeded at testing this, sort of.

I was able to reproduce the hang on a CONFIG_PREEMPT=n kernel with this
diff:

diff --git a/drivers/net/wireguard/main.c b/drivers/net/wireguard/main.c
index ee4da9ab8013..19e1186f0db0 100644
--- a/drivers/net/wireguard/main.c
+++ b/drivers/net/wireguard/main.c
@@ -15,12 +15,29 @@
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/genetlink.h>
+#include <linux/hw_random.h>
 #include <net/rtnetlink.h>
 
+static int derp_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
+{
+	if (wait) {
+		for (;;)
+			cpu_relax();
+	}
+	return 0;
+}
+
+static struct hwrng derp_ops = {
+	.name = "flurpderp",
+	.read = derp_rng_read,
+};
+
 static int __init wg_mod_init(void)
 {
 	int ret;
 
+	hwrng_register(&derp_ops);
+
 	ret = wg_allowedips_slab_init();
 	if (ret < 0)
 		goto err_allowedips;

Next, I changed the cpu_relax() into hwrng_msleep(), as this patch does:

diff --git a/drivers/net/wireguard/main.c b/drivers/net/wireguard/main.c
index ee4da9ab8013..19e1186f0db0 100644
--- a/drivers/net/wireguard/main.c
+++ b/drivers/net/wireguard/main.c
@@ -15,12 +15,29 @@
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/genetlink.h>
+#include <linux/hw_random.h>
 #include <net/rtnetlink.h>
 
+static int derp_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
+{
+	if (wait) {
+		for (;;)
+			hwrng_msleep(rng, 1000);
+	}
+	return 0;
+}
+
+static struct hwrng derp_ops = {
+	.name = "flurpderp",
+	.read = derp_rng_read,
+};
+
 static int __init wg_mod_init(void)
 {
 	int ret;
 
+	hwrng_register(&derp_ops);
+
 	ret = wg_allowedips_slab_init();
 	if (ret < 0)
 		goto err_allowedips;

And then the problem went away. So I think this patch is a good one.

Jason
On 10/10/22 08:06, 'Jason A. Donenfeld' via BCM-KERNEL-FEEDBACK-LIST,PDL wrote:
> Rather than busy looping, yield back to the scheduler and sleep for a
> bit in the event that there's no data. This should hopefully prevent the
> stalls that Mark reported:
>
> <6>[ 3.362859] Freeing initrd memory: 16196K
> <3>[ 23.160131] rcu: INFO: rcu_sched self-detected stall on CPU
> <3>[ 23.166057] rcu: 0-....: (2099 ticks this GP) idle=03b4/1/0x40000002 softirq=28/28 fqs=1050
> <4>[ 23.174895] (t=2101 jiffies g=-1147 q=2353 ncpus=4)
> <4>[ 23.180203] CPU: 0 PID: 49 Comm: hwrng Not tainted 6.0.0 #1
> <4>[ 23.186125] Hardware name: BCM2835
> <4>[ 23.189837] PC is at bcm2835_rng_read+0x30/0x6c
> <4>[ 23.194709] LR is at hwrng_fillfn+0x71/0xf4
> <4>[ 23.199218] pc : [<c07ccdc8>] lr : [<c07cb841>] psr: 40000033
> <4>[ 23.205840] sp : f093df70 ip : 00000000 fp : 00000000
> <4>[ 23.211404] r10: c3c7e800 r9 : 00000000 r8 : c17e6b20
> <4>[ 23.216968] r7 : c17e6b64 r6 : c18b0a74 r5 : c07ccd99 r4 : c3f171c0
> <4>[ 23.223855] r3 : 000fffff r2 : 00000040 r1 : c3c7e800 r0 : c3f171c0
> <4>[ 23.230743] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment none
> <4>[ 23.238426] Control: 50c5387d Table: 0020406a DAC: 00000051
> <4>[ 23.244519] CPU: 0 PID: 49 Comm: hwrng Not tainted 6.0.0 #1
>
> Link: https://lore.kernel.org/all/Y0QJLauamRnCDUef@sirena.org.uk/
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>
On Mon, Oct 10, 2022 at 09:06:07AM -0600, Jason A. Donenfeld wrote:
> Rather than busy looping, yield back to the scheduler and sleep for a
> bit in the event that there's no data. This should hopefully prevent the
> stalls that Mark reported:
>
> <6>[ 3.362859] Freeing initrd memory: 16196K
> <3>[ 23.160131] rcu: INFO: rcu_sched self-detected stall on CPU
> <3>[ 23.166057] rcu: 0-....: (2099 ticks this GP) idle=03b4/1/0x40000002 softirq=28/28 fqs=1050
> <4>[ 23.174895] (t=2101 jiffies g=-1147 q=2353 ncpus=4)
> <4>[ 23.180203] CPU: 0 PID: 49 Comm: hwrng Not tainted 6.0.0 #1
> <4>[ 23.186125] Hardware name: BCM2835
> <4>[ 23.189837] PC is at bcm2835_rng_read+0x30/0x6c
> <4>[ 23.194709] LR is at hwrng_fillfn+0x71/0xf4
> <4>[ 23.199218] pc : [<c07ccdc8>] lr : [<c07cb841>] psr: 40000033
> <4>[ 23.205840] sp : f093df70 ip : 00000000 fp : 00000000
> <4>[ 23.211404] r10: c3c7e800 r9 : 00000000 r8 : c17e6b20
> <4>[ 23.216968] r7 : c17e6b64 r6 : c18b0a74 r5 : c07ccd99 r4 : c3f171c0
> <4>[ 23.223855] r3 : 000fffff r2 : 00000040 r1 : c3c7e800 r0 : c3f171c0
> <4>[ 23.230743] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment none
> <4>[ 23.238426] Control: 50c5387d Table: 0020406a DAC: 00000051
> <4>[ 23.244519] CPU: 0 PID: 49 Comm: hwrng Not tainted 6.0.0 #1
>
> Link: https://lore.kernel.org/all/Y0QJLauamRnCDUef@sirena.org.uk/
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> I haven't tested this. Somebody with access to that kernel CI infra that
> triggered this will need to test.
>
> drivers/char/hw_random/bcm2835-rng.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

Patch applied. Thanks.
diff --git a/drivers/char/hw_random/bcm2835-rng.c b/drivers/char/hw_random/bcm2835-rng.c
index e7dd457e9b22..e98fcac578d6 100644
--- a/drivers/char/hw_random/bcm2835-rng.c
+++ b/drivers/char/hw_random/bcm2835-rng.c
@@ -71,7 +71,7 @@ static int bcm2835_rng_read(struct hwrng *rng, void *buf, size_t max,
 	while ((rng_readl(priv, RNG_STATUS) >> 24) == 0) {
 		if (!wait)
 			return 0;
-		cpu_relax();
+		hwrng_msleep(rng, 1000);
 	}
 
 	num_words = rng_readl(priv, RNG_STATUS) >> 24;
Rather than busy looping, yield back to the scheduler and sleep for a
bit in the event that there's no data. This should hopefully prevent the
stalls that Mark reported:

<6>[ 3.362859] Freeing initrd memory: 16196K
<3>[ 23.160131] rcu: INFO: rcu_sched self-detected stall on CPU
<3>[ 23.166057] rcu: 0-....: (2099 ticks this GP) idle=03b4/1/0x40000002 softirq=28/28 fqs=1050
<4>[ 23.174895] (t=2101 jiffies g=-1147 q=2353 ncpus=4)
<4>[ 23.180203] CPU: 0 PID: 49 Comm: hwrng Not tainted 6.0.0 #1
<4>[ 23.186125] Hardware name: BCM2835
<4>[ 23.189837] PC is at bcm2835_rng_read+0x30/0x6c
<4>[ 23.194709] LR is at hwrng_fillfn+0x71/0xf4
<4>[ 23.199218] pc : [<c07ccdc8>] lr : [<c07cb841>] psr: 40000033
<4>[ 23.205840] sp : f093df70 ip : 00000000 fp : 00000000
<4>[ 23.211404] r10: c3c7e800 r9 : 00000000 r8 : c17e6b20
<4>[ 23.216968] r7 : c17e6b64 r6 : c18b0a74 r5 : c07ccd99 r4 : c3f171c0
<4>[ 23.223855] r3 : 000fffff r2 : 00000040 r1 : c3c7e800 r0 : c3f171c0
<4>[ 23.230743] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment none
<4>[ 23.238426] Control: 50c5387d Table: 0020406a DAC: 00000051
<4>[ 23.244519] CPU: 0 PID: 49 Comm: hwrng Not tainted 6.0.0 #1

Link: https://lore.kernel.org/all/Y0QJLauamRnCDUef@sirena.org.uk/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
I haven't tested this. Somebody with access to that kernel CI infra that
triggered this will need to test.

 drivers/char/hw_random/bcm2835-rng.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
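
For readers without a kernel tree handy, the shape of the patched read loop can be approximated in userspace. This is only a rough sketch under stated assumptions, not the driver code: `struct fake_rng`, `fake_rng_words_ready()`, `fake_rng_read()` and `sleep_msecs()` are hypothetical names invented here, `nanosleep()` stands in for `hwrng_msleep()`, and the 1 ms interval is chosen only to keep the example quick (the patch sleeps for 1000 ms). The point it illustrates is the one from the commit message: when the status poll reports no data, a waiting reader gives up the CPU between polls instead of spinning, and a non-waiting reader returns 0 immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the RNG hardware (not a kernel structure). */
struct fake_rng {
	unsigned int polls_until_ready; /* status polls that still report "empty" */
	unsigned int sleeps;            /* how many times the reader slept */
};

/* Loosely mirrors reading RNG_STATUS >> 24: number of words available. */
static unsigned int fake_rng_words_ready(struct fake_rng *rng)
{
	if (rng->polls_until_ready > 0) {
		rng->polls_until_ready--;
		return 0;
	}
	return 4;
}

/* Sleep for msecs milliseconds, handing the CPU back to the scheduler. */
static void sleep_msecs(unsigned int msecs)
{
	struct timespec ts = {
		.tv_sec = msecs / 1000,
		.tv_nsec = (long)(msecs % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Returns the number of bytes written to buf, or 0 when no data is
 * available and the caller did not ask to wait.
 */
static int fake_rng_read(struct fake_rng *rng, uint32_t *buf, size_t max,
			 bool wait)
{
	size_t max_words = max / sizeof(uint32_t);
	unsigned int num_words;
	unsigned int i;

	assert(rng != NULL && buf != NULL);

	while ((num_words = fake_rng_words_ready(rng)) == 0) {
		if (!wait)
			return 0;
		/* Sleep between polls instead of busy looping. */
		sleep_msecs(1);
		rng->sleeps++;
	}

	if (num_words > max_words)
		num_words = (unsigned int)max_words;

	for (i = 0; i < num_words; i++)
		buf[i] = 0xdeadbeefu; /* placeholder value, not random data */

	return (int)(num_words * sizeof(uint32_t));
}
```

A caller would set `polls_until_ready` to some small number and call `fake_rng_read()` with `wait` set; the `sleeps` counter then records how many times the reader yielded before data showed up. In the kernel, `hwrng_msleep()` additionally takes the `struct hwrng *`, which a plain `nanosleep()` cannot model.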