diff mbox series

s390/archrandom: remove CPACF trng invocations in interrupt context

Message ID 20220712100829.128574-1-freude@linux.ibm.com (mailing list archive)
State Not Applicable
Delegated to: Herbert Xu
Headers show
Series s390/archrandom: remove CPACF trng invocations in interrupt context | expand

Commit Message

Harald Freudenberger July 12, 2022, 10:08 a.m. UTC
This patch introduces two things:
1) The arch_get_random_seed_int() implementation now always
   returns false. There is no user in the whole kernel using
   this function.
2) For the arch_get_random_seed_long() make sure the CPACF trng
   instruction is never called in any interrupt context. This
   is done by adding an additional condition in_task().

Justification for 2):

There are some constrains to satisfy for the invocation of the
arch_get_random_seed_long() function:
- It should provide good random data during kernel initialization.
- It should not be called in interrupt context as the instruction
  is relatively heavy weight and may for example make some network
  loads cause to timeout and buck.

However, it was not clear what kind of interrupt context is exactly
encountered during kernel init or network traffic eventually calling
arch_get_random_seed_long().

After some days of investigations it is clear that the s390
start_kernel function is not running in any interrupt context and
so the trng is called:

Jul 11 18:33:39 t35lp54 kernel:  [<00000001064e90ca>] arch_get_random_seed_long.part.0+0x32/0x70
Jul 11 18:33:39 t35lp54 kernel:  [<000000010715f246>] random_init+0xf6/0x238
Jul 11 18:33:39 t35lp54 kernel:  [<000000010712545c>] start_kernel+0x4a4/0x628
Jul 11 18:33:39 t35lp54 kernel:  [<000000010590402a>] startup_continue+0x2a/0x40

The condition in_task() is true and the CPACF trng provides random data
during kernel startup.

The network traffic however, is more difficult. A typical call stack
looks like this:

Jul 06 17:37:07 t35lp54 kernel:  [<000000008b5600fc>] extract_entropy.constprop.0+0x23c/0x240
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b560136>] crng_reseed+0x36/0xd8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b5604b8>] crng_make_state+0x78/0x340
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b5607e0>] _get_random_bytes+0x60/0xf8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b56108a>] get_random_u32+0xda/0x248
Jul 06 17:37:07 t35lp54 kernel:  [<000000008aefe7a8>] kfence_guarded_alloc+0x48/0x4b8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008aeff35e>] __kfence_alloc+0x18e/0x1b8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008aef7f10>] __kmalloc_node_track_caller+0x368/0x4d8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b611eac>] kmalloc_reserve+0x44/0xa0
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b611f98>] __alloc_skb+0x90/0x178
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b6120dc>] __napi_alloc_skb+0x5c/0x118
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b8f06b4>] qeth_extract_skb+0x13c/0x680
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b8f6526>] qeth_poll+0x256/0x3f8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b63d76e>] __napi_poll.constprop.0+0x46/0x2f8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b63dbec>] net_rx_action+0x1cc/0x408
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b937302>] __do_softirq+0x132/0x6b0
Jul 06 17:37:07 t35lp54 kernel:  [<000000008abf46ce>] __irq_exit_rcu+0x13e/0x170
Jul 06 17:37:07 t35lp54 kernel:  [<000000008abf531a>] irq_exit_rcu+0x22/0x50
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b922506>] do_io_irq+0xe6/0x198
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b935826>] io_int_handler+0xd6/0x110
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b9358a6>] psw_idle_exit+0x0/0xa
Jul 06 17:37:07 t35lp54 kernel: ([<000000008ab9c59a>] arch_cpu_idle+0x52/0xe0)
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b933cfe>] default_idle_call+0x6e/0xd0
Jul 06 17:37:07 t35lp54 kernel:  [<000000008ac59f4e>] do_idle+0xf6/0x1b0
Jul 06 17:37:07 t35lp54 kernel:  [<000000008ac5a28e>] cpu_startup_entry+0x36/0x40
Jul 06 17:37:07 t35lp54 kernel:  [<000000008abb0d90>] smp_start_secondary+0x148/0x158
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b935b9e>] restart_int_handler+0x6e/0x90

which confirms that the call is in softirq context. So in_task() covers exactly
the cases where we want to have CPACF trng called: not in nmi, not in hard irq,
not in soft irq but in normal task context and during kernel init.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
---
 arch/s390/include/asm/archrandom.h | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

Comments

Jason A. Donenfeld July 12, 2022, 10:23 a.m. UTC | #1
Hi Harald,

On Tue, Jul 12, 2022 at 12:08:29PM +0200, Harald Freudenberger wrote:
> This patch introduces two things:
> 1) The arch_get_random_seed_int() implementation now always
>    returns false. There is no user in the whole kernel using
>    this function.

Please do not do this. It has nothing to do with the rest of the patch,
but also this isn't really the right place to decide on that. As we
discussed last week with the arch_get_random_words{,_seed} branch of the
conversation - https://lore.kernel.org/all/YsQ%2FvZSkzWPLwIte@zx2c4.com/
- there are a few things that might be suboptimal about the API. When we
fix these, I'd prefer for it to be done in some coherent step. What
you're doing here is just gimping the present API, which preemptively
rots the entire thing and *forces* us to remove it for all architectures
since it would become non-dependable. And I don't like having our hands
be forced here. I'd much rather carefully consider this.

So please remove this snippet.

> 2) For the arch_get_random_seed_long() make sure the CPACF trng
>    instruction is never called in any interrupt context.

I don't object overly loudly to this. However, based on your comment in
https://lore.kernel.org/all/7e65130c6e66ce7a9f9eb469eb7e64e0@linux.ibm.com/
, I was under the impression that this wasn't necessary. If you think it
is, it'd be useful to show some measured latency numbers on actual
systems. Otherwise it seems like premature optimization? Anyway, if you
have solid rationale, I'm fine with this as I mentioned in the other
thread. I'm just a bit confused now on the particulars of the "why" part
given your earlier comment.

> This is done by adding an additional condition in_task().

That doesn't seem right. Instead use `!in_hardirq()`, or perhaps
`!in_nmi() && !in_hardirq()`? Otherwise you also disallow this when
serving softirqs, which based on the discussion, I don't think you
really want to do. Or do you? Without actual latency measurements and a
real world look at the implications, it's hard to see what we're after.
 
> which confirms that the call is in softirq context. So in_task() covers exactly
> the cases where we want to have CPACF trng called: not in nmi, not in hard irq,
> not in soft irq but in normal task context and during kernel init.

You've gone through the troubles of confirming experimentally what
in_task() does, but that doesn't answer *why* it should be disallowed
variously in each one of these contexts.

Regards,
Jason
Harald Freudenberger July 12, 2022, 12:09 p.m. UTC | #2
On 2022-07-12 12:23, Jason A. Donenfeld wrote:
> Hi Harald,
> 
> On Tue, Jul 12, 2022 at 12:08:29PM +0200, Harald Freudenberger wrote:
>> This patch introduces two things:
>> 1) The arch_get_random_seed_int() implementation now always
>>    returns false. There is no user in the whole kernel using
>>    this function.
> 
> Please do not do this. It has nothing to do with the rest of the patch,
> but also this isn't really the right place to decide on that. As we
> discussed last week with the arch_get_random_words{,_seed} branch of 
> the
> conversation - 
> https://lore.kernel.org/all/YsQ%2FvZSkzWPLwIte@zx2c4.com/
> - there are a few things that might be suboptimal about the API. When 
> we
> fix these, I'd prefer for it to be done in some coherent step. What
> you're doing here is just gimping the present API, which preemptively
> rots the entire thing and *forces* us to remove it for all 
> architectures
> since it would become non-dependable. And I don't like having our hands
> be forced here. I'd much rather carefully consider this.
> 
> So please remove this snippet.

Ok, will do.

> 
>> 2) For the arch_get_random_seed_long() make sure the CPACF trng
>>    instruction is never called in any interrupt context.
> 
> I don't object overly loudly to this. However, based on your comment in
> https://lore.kernel.org/all/7e65130c6e66ce7a9f9eb469eb7e64e0@linux.ibm.com/
> , I was under the impression that this wasn't necessary. If you think 
> it
> is, it'd be useful to show some measured latency numbers on actual
> systems. Otherwise it seems like premature optimization? Anyway, if you
> have solid rationale, I'm fine with this as I mentioned in the other
> thread. I'm just a bit confused now on the particulars of the "why" 
> part
> given your earlier comment.
> 
>> This is done by adding an additional condition in_task().
> 
> That doesn't seem right. Instead use `!in_hardirq()`, or perhaps
> `!in_nmi() && !in_hardirq()`? Otherwise you also disallow this when
> serving softirqs, which based on the discussion, I don't think you
> really want to do. Or do you? Without actual latency measurements and a
> real world look at the implications, it's hard to see what we're after.
> 
>> which confirms that the call is in softirq context. So in_task() 
>> covers exactly
>> the cases where we want to have CPACF trng called: not in nmi, not in 
>> hard irq,
>> not in soft irq but in normal task context and during kernel init.
> 
> You've gone through the troubles of confirming experimentally what
> in_task() does, but that doesn't answer *why* it should be disallowed
> variously in each one of these contexts.

I think, I showed this. The only real occurrences remaining for the
arch_get_random_seed_long() call is within softirq context when the
network layer tries to allocate some skb buffers. My personal feeling
about this is that it does not hurt - but I asked our network guys
and their feedback is clear: no way - every delay there may cause
high bandwidth traffic to stumble and this is to be absolutely avoided.
However, they can't give me any measurements.

So yes, the intention is now with checking for in_task() to prevent
the trng call in hard and soft interrupt context. But still I'd like
to meet your condition to provide good random at kernel startup.

> 
> Regards,
> Jason

Regards,
Harald Freudenberger
Jason A. Donenfeld July 12, 2022, 12:27 p.m. UTC | #3
Hi Harald,

On Tue, Jul 12, 2022 at 02:09:35PM +0200, Harald Freudenberger wrote:
> > You've gone through the troubles of confirming experimentally what
> > in_task() does, but that doesn't answer *why* it should be disallowed
> > variously in each one of these contexts.
> 
> I think, I showed this. The only real occurrences remaining for the
> arch_get_random_seed_long() call is within softirq context when the
> network layer tries to allocate some skb buffers. My personal feeling
> about this is that it does not hurt - but I asked our network guys
> and their feedback is clear: no way - every delay there may cause
> high bandwidth traffic to stumble and this is to be absolutely avoided.
> However, they can't give me any measurements.
> 
> So yes, the intention is now with checking for in_task() to prevent
> the trng call in hard and soft interrupt context. But still I'd like
> to meet your condition to provide good random at kernel startup.

That's too bad, but okay.

Final question: do you see any of the in_task() vs in_whatever()
semantics changing if arch_get_random_words{,_seed}() is ever
implemented, which would reduce the current multitude of calls to the
trng to a single call?

Jason
Harald Freudenberger July 12, 2022, 2:35 p.m. UTC | #4
On 2022-07-12 14:27, Jason A. Donenfeld wrote:
> Hi Harald,
> 
> On Tue, Jul 12, 2022 at 02:09:35PM +0200, Harald Freudenberger wrote:
>> > You've gone through the troubles of confirming experimentally what
>> > in_task() does, but that doesn't answer *why* it should be disallowed
>> > variously in each one of these contexts.
>> 
>> I think, I showed this. The only real occurrences remaining for the
>> arch_get_random_seed_long() call is within softirq context when the
>> network layer tries to allocate some skb buffers. My personal feeling
>> about this is that it does not hurt - but I asked our network guys
>> and their feedback is clear: no way - every delay there may cause
>> high bandwidth traffic to stumble and this is to be absolutely 
>> avoided.
>> However, they can't give me any measurements.
>> 
>> So yes, the intention is now with checking for in_task() to prevent
>> the trng call in hard and soft interrupt context. But still I'd like
>> to meet your condition to provide good random at kernel startup.
> 
> That's too bad, but okay.
> 
> Final question: do you see any of the in_task() vs in_whatever()
> semantics changing if arch_get_random_words{,_seed}() is ever
> implemented, which would reduce the current multitude of calls to the
> trng to a single call?
> 
> Jason

Hm, no, I can't see a way to provide trng random data in any whatever
interrupt context for the next future. The only enabler would be to
use a buffer ... I started to get in contact with our hardware guys
to make the trng data internally buffered and this the invocation
could be in no time give back random data. But this may be a
hardware development thing for the next machine generation.

Thanks Jason for your work

Regards
Harald Freudenberger
Jason A. Donenfeld July 12, 2022, 2:44 p.m. UTC | #5
Hi Harald,

On Tue, Jul 12, 2022 at 04:35:41PM +0200, Harald Freudenberger wrote:
> On 2022-07-12 14:27, Jason A. Donenfeld wrote:
> > Hi Harald,
> > 
> > On Tue, Jul 12, 2022 at 02:09:35PM +0200, Harald Freudenberger wrote:
> >> > You've gone through the troubles of confirming experimentally what
> >> > in_task() does, but that doesn't answer *why* it should be disallowed
> >> > variously in each one of these contexts.
> >> 
> >> I think, I showed this. The only real occurrences remaining for the
> >> arch_get_random_seed_long() call is within softirq context when the
> >> network layer tries to allocate some skb buffers. My personal feeling
> >> about this is that it does not hurt - but I asked our network guys
> >> and their feedback is clear: no way - every delay there may cause
> >> high bandwidth traffic to stumble and this is to be absolutely 
> >> avoided.
> >> However, they can't give me any measurements.
> >> 
> >> So yes, the intention is now with checking for in_task() to prevent
> >> the trng call in hard and soft interrupt context. But still I'd like
> >> to meet your condition to provide good random at kernel startup.
> > 
> > That's too bad, but okay.
> > 
> > Final question: do you see any of the in_task() vs in_whatever()
> > semantics changing if arch_get_random_words{,_seed}() is ever
> > implemented, which would reduce the current multitude of calls to the
> > trng to a single call?
> > 
> > Jason
> 
> Hm, no, I can't see a way to provide trng random data in any whatever
> interrupt context for the next future. The only enabler would be to
> use a buffer ... I started to get in contact with our hardware guys
> to make the trng data internally buffered and this the invocation
> could be in no time give back random data. But this may be a
> hardware development thing for the next machine generation.

Alrightie then. Well, I'll Ack a v2 that keeps the _int variant. Sounds
like then we'll be done here.

Tangential topic: would be nice to add the TRNG instruction to QEMU's
TCG so that VMs in CI have a bit of randomness available.

Jason
diff mbox series

Patch

diff --git a/arch/s390/include/asm/archrandom.h b/arch/s390/include/asm/archrandom.h
index 2c6e1c6ecbe7..9662c9fa7683 100644
--- a/arch/s390/include/asm/archrandom.h
+++ b/arch/s390/include/asm/archrandom.h
@@ -2,7 +2,7 @@ 
 /*
  * Kernel interface for the s390 arch_random_* functions
  *
- * Copyright IBM Corp. 2017, 2020
+ * Copyright IBM Corp. 2017, 2022
  *
  * Author: Harald Freudenberger <freude@de.ibm.com>
  *
@@ -14,6 +14,7 @@ 
 #ifdef CONFIG_ARCH_RANDOM
 
 #include <linux/static_key.h>
+#include <linux/preempt.h>
 #include <linux/atomic.h>
 #include <asm/cpacf.h>
 
@@ -33,20 +34,17 @@  static inline bool __must_check arch_get_random_int(unsigned int *v)
 static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
 {
 	if (static_branch_likely(&s390_arch_random_available)) {
-		cpacf_trng(NULL, 0, (u8 *)v, sizeof(*v));
-		atomic64_add(sizeof(*v), &s390_arch_random_counter);
-		return true;
+		if (in_task()) {
+			cpacf_trng(NULL, 0, (u8 *)v, sizeof(*v));
+			atomic64_add(sizeof(*v), &s390_arch_random_counter);
+			return true;
+		}
 	}
 	return false;
 }
 
 static inline bool __must_check arch_get_random_seed_int(unsigned int *v)
 {
-	if (static_branch_likely(&s390_arch_random_available)) {
-		cpacf_trng(NULL, 0, (u8 *)v, sizeof(*v));
-		atomic64_add(sizeof(*v), &s390_arch_random_counter);
-		return true;
-	}
 	return false;
 }