Message ID | 20180413013046.404-1-tytso@mit.edu (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
Headers | show |
Am Freitag, 13. April 2018, 03:30:42 CEST schrieb Theodore Ts'o: Hi Theodore, > The crng_init variable has three states: > > 0: The CRNG is not initialized at all > 1: The CRNG has a small amount of entropy, hopefully good enough for > early-boot, non-cryptographical use cases > 2: The CRNG is fully initialized and we are sure it is safe for > cryptographic use cases. > > The crng_ready() function should only return true once we are in the > last state. > Do I see that correctly that getrandom(2) will now unblock after the input_pool has obtained 128 bits of entropy? Similarly for get_random_bytes_wait. As this seems to be the only real use case for crng_ready (apart from logging), what is the purpose of crng_init == 1? Ciao Stephan
On Fri, Apr 13, 2018 at 07:38:30AM +0200, Stephan Mueller wrote: > > The crng_init variable has three states: > > > > 0: The CRNG is not initialized at all > > 1: The CRNG has a small amount of entropy, hopefully good enough for > > early-boot, non-cryptographical use cases > > 2: The CRNG is fully initialized and we are sure it is safe for > > cryptographic use cases. > > > > The crng_ready() function should only return true once we are in the > > last state. > > > Do I see that correctly that getrandom(2) will now unblock after the > input_pool has obtained 128 bits of entropy? Similarly for > get_random_bytes_wait. Correct. > As this seems to be the only real use case for crng_ready (apart from > logging), what is the purpose of crng_init == 1? Immediately after boot (crng_init state 0), we funnel the harvested entropy from add_interrupt_entropy() directly to the CRNG pool. The goal is to get at least some amount of entropy into the CRNG is it's not blatently predictable. When we have accomplished this, by mixing in 2 * CHACHA20_KEY_SIZE bytes of input from add_input_randomness, we enter state crng_init state 1. In crng_init state 1, we start funnelling the harvested entropy to the input_pool. Once we have accumulated 128 bits of entropy in the input pool, we then reseed the CRNG from the input_pool, and we consider the CRNG to be fully intialized. During crng_init state 1, the CRNG is basically in a "good enough for government work" state. It's going to get used by things like initial TCP sequence numbers, and UUID's by udev, et. al, but I wouldn't want to use it for anything that has strong cryptographic use cases. It would be preferable if nothing in the kernel and early systemd startup used get_random_bytes() or /dev/urandom or getrandom(2) w/ GRND_NONBLOCK. Unfortunately, (a) there is a lot of legacy code which still uses /dev/urandom and not getrandom(2) and (b) in some cases, such as initialization of the Stack Canary, the kernel simply can't wait for the CRNG to be fully initialized. Or if the kernel needs enough of the networking stack to mount an NFS root file system, for example. This was always the original design intent, but I screwed up and no one noticed until Jann reached out to be and said, "Hey.... this doesn't seem to make much sense". Regards, - Ted
Am Freitag, 13. April 2018, 14:53:13 CEST schrieb Theodore Y. Ts'o: Hi Theodore, > > This was always the original design intent, but I screwed up and no > one noticed until Jann reached out to be and said, "Hey.... this > doesn't seem to make much sense". I disagree, but I guess you would have expected that. But this is not the issue. What I would like to point out that more and more folks change to getrandom(2). As this call will now unblock much later in the boot cycle, these systems see a significant departure from the current system behavior. E.g. an sshd using getrandom(2) would be ready shortly after the boot finishes as of now. Now it can be a matter minutes before it responds. Thus, is such change in the kernel behavior something for stable? Ciao Stephan
On Fri, Apr 13, 2018 at 03:05:01PM +0200, Stephan Mueller wrote: > > What I would like to point out that more and more folks change to > getrandom(2). As this call will now unblock much later in the boot cycle, > these systems see a significant departure from the current system behavior. > > E.g. an sshd using getrandom(2) would be ready shortly after the boot finishes > as of now. Now it can be a matter minutes before it responds. Thus, is such > change in the kernel behavior something for stable? It will have some change on the kernel behavior, but not as much as you might think. That's because in older kernels, we were *already* blocking until crng_init > 2 --- if the getrandom(2) call happened while crng_init was in state 0. Even before this patch series, we didn't wake up a process blocked on crng_init_wait until crng_init state 2 is reached: static void crng_reseed(struct crng_state *crng, struct entropy_store *r) { ... if (crng == &primary_crng && crng_init < 2) { invalidate_batched_entropy(); crng_init = 2; process_random_ready_list(); wake_up_interruptible(&crng_init_wait); pr_notice("random: crng init done\n"); } } This is the reason why there are reports like this: "Boot delayed for about 90 seconds until 'random: crng init done'"[1] [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1685794 So we have the problem already. There will be more cases of this after this patch series is applied, true. But what we have already is an inconsistent state where if you call getrandom(2) while the kernel is in crng_init state 0, you will block until crng_init state 2, but if you are in crng_init state 1, you will assume the CRNG is fully initialized. Given the documentation of how getrandom(2) works what its documented guarantees are, I think it does justify making its behavior both more consistent with itself, and more consistent what the security guarantees we have promised people. I was a little worried that on VM's this could end up causing things to block for a long time, but an experiment on a GCE VM shows that isn't a problem: [ 0.000000] Linux version 4.16.0-rc3-ext4-00009-gf6b302ebca85 (tytso@cwcc) (gcc version 7.3.0 (Debian 7.3.0-15)) #16 SMP Thu Apr 12 16:57:17 EDT 2018 [ 1.282220] random: fast init done [ 3.987092] random: crng init done [ 4.376787] EXT4-fs (sda1): re-mounted. Opts: (null) There are some desktops where the "crng_init done" report doesn't happen until 45-90 seconds into the boot. I don't think I've seen reports where it takes _minutes_ however. Can you give me some examples of such cases? - Ted P.S. Of course, in a VM environment, if the host supports virtio-rng, the boot delay problem is completely not an issue. You just have to enable virtio-rng in the guest kernel, which I believe is already the case for most distro kernels. BTW, for KVM, it's fairly simple to set it the host-side support for virtio-rng. Just add to the kvm command-line options: -object rng-random,filename=/dev/urandom,id=rng0 \ -device virtio-rng-pci,rng=rng0
Hi Ted, On Fri, Apr 13, 2018 at 3:30 AM, Theodore Ts'o <tytso@mit.edu> wrote: > The crng_init variable has three states: > > 0: The CRNG is not initialized at all > 1: The CRNG has a small amount of entropy, hopefully good enough for > early-boot, non-cryptographical use cases > 2: The CRNG is fully initialized and we are sure it is safe for > cryptographic use cases. > > The crng_ready() function should only return true once we are in the > last state. > > Reported-by: Jann Horn <jannh@google.com> > Fixes: e192be9d9a30 ("random: replace non-blocking pool...") > Cc: stable@kernel.org # 4.8+ > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > Reviewed-by: Jann Horn <jannh@google.com> Since commit 43838a23a05fbd13 ("random: fix crng_ready() test"), all (few) remaining users of %p are printing too early, leading to "(ptrval)" strings instead of actual hashed pointer values. Sample timings on two platforms (arm / arm64) booting with lots of debug ingo: [ 28.521158] random: crng init done [ 17.792705] random: crng init done Gr{oetje,eeting}s, Geert
On 4/13/18 10:00 AM, Theodore Y. Ts'o wrote: > On Fri, Apr 13, 2018 at 03:05:01PM +0200, Stephan Mueller wrote: >> >> What I would like to point out that more and more folks change to >> getrandom(2). As this call will now unblock much later in the boot cycle, >> these systems see a significant departure from the current system behavior. >> >> E.g. an sshd using getrandom(2) would be ready shortly after the boot finishes >> as of now. Now it can be a matter minutes before it responds. Thus, is such >> change in the kernel behavior something for stable? [...] > I was a little worried that on VM's this could end up causing things > to block for a long time, but an experiment on a GCE VM shows that > isn't a problem: > > [ 0.000000] Linux version 4.16.0-rc3-ext4-00009-gf6b302ebca85 (tytso@cwcc) (gcc version 7.3.0 (Debian 7.3.0-15)) #16 SMP Thu Apr 12 16:57:17 EDT 2018 > [ 1.282220] random: fast init done > [ 3.987092] random: crng init done > [ 4.376787] EXT4-fs (sda1): re-mounted. Opts: (null) > > There are some desktops where the "crng_init done" report doesn't > happen until 45-90 seconds into the boot. I don't think I've seen > reports where it takes _minutes_ however. Can you give me some > examples of such cases? On a Photon OS VM running on VMware ESXi, this patch causes a boot speed regression of 5 minutes :-( [ The VM doesn't have haveged or rng-tools (rngd) installed. ] [ 1.420246] EXT4-fs (sda2): re-mounted. Opts: barrier,noacl,data=ordered [ 1.469722] tsc: Refined TSC clocksource calibration: 1900.002 MHz [ 1.470707] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x36c65c1a9e1, max_idle_ns: 881590695311 ns [ 1.474249] clocksource: Switched to clocksource tsc [ 1.584427] systemd-journald[216]: Received request to flush runtime journal from PID 1 [ 346.620718] random: crng init done Interestingly, the boot delay is exacerbated on VMs with large amounts of RAM. For example, the delay is not so noticeable (< 30 seconds) on a VM with 2GB memory, but goes up to 5 minutes on an 8GB VM. Also, cloud-init-local.service seems to be the one blocking for entropy here. systemd-analyze critical-chain shows: The time after the unit is active or started is printed after the "@" character. The time the unit takes to start is printed after the "+" character. multi-user.target @6min 1.283s └─vmtoolsd.service @6min 1.282s └─cloud-final.service @6min 366ms +914ms └─cloud-config.service @5min 59.174s +1.190s └─cloud-config.target @5min 59.172s └─cloud-init.service @5min 47.423s +11.744s └─systemd-networkd-wait-online.service @5min 45.999s +1.420s └─systemd-networkd.service @5min 45.975s +21ms └─network-pre.target @5min 45.973s └─cloud-init-local.service @241ms +5min 45.687s └─systemd-remount-fs.service @222ms +13ms └─systemd-fsck-root.service @193ms +26ms └─systemd-journald.socket @188ms └─-.mount @151ms └─system.slice @161ms └─-.slice @151ms It would be great if this CVE can be fixed somehow without causing boot speed to spike from ~20 seconds to 5 minutes, as that makes the system pretty much unusable. I can workaround this by installing haveged, but ideally an in-kernel fix would be better. If you need any other info about my setup or if you have a patch that I can test, please let me know! Thank you very much! Regards, Srivatsa VMware Photon OS
Le 13/04/2018 à 19:00, Theodore Y. Ts'o a écrit : > On Fri, Apr 13, 2018 at 03:05:01PM +0200, Stephan Mueller wrote: >> >> What I would like to point out that more and more folks change to >> getrandom(2). As this call will now unblock much later in the boot cycle, >> these systems see a significant departure from the current system behavior. >> >> E.g. an sshd using getrandom(2) would be ready shortly after the boot finishes >> as of now. Now it can be a matter minutes before it responds. Thus, is such >> change in the kernel behavior something for stable? > > It will have some change on the kernel behavior, but not as much as > you might think. That's because in older kernels, we were *already* > blocking until crng_init > 2 --- if the getrandom(2) call happened > while crng_init was in state 0. > > Even before this patch series, we didn't wake up a process blocked on > crng_init_wait until crng_init state 2 is reached: > > static void crng_reseed(struct crng_state *crng, struct entropy_store *r) > { > ... > if (crng == &primary_crng && crng_init < 2) { > invalidate_batched_entropy(); > crng_init = 2; > process_random_ready_list(); > wake_up_interruptible(&crng_init_wait); > pr_notice("random: crng init done\n"); > } > } > > This is the reason why there are reports like this: "Boot delayed for > about 90 seconds until 'random: crng init done'"[1] > > [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1685794 > > > So we have the problem already. There will be more cases of this > after this patch series is applied, true. But what we have already is > an inconsistent state where if you call getrandom(2) while the kernel > is in crng_init state 0, you will block until crng_init state 2, but > if you are in crng_init state 1, you will assume the CRNG is fully > initialized. > > Given the documentation of how getrandom(2) works what its documented > guarantees are, I think it does justify making its behavior both more > consistent with itself, and more consistent what the security > guarantees we have promised people. > > I was a little worried that on VM's this could end up causing things > to block for a long time, but an experiment on a GCE VM shows that > isn't a problem: > > [ 0.000000] Linux version 4.16.0-rc3-ext4-00009-gf6b302ebca85 (tytso@cwcc) (gcc version 7.3.0 (Debian 7.3.0-15)) #16 SMP Thu Apr 12 16:57:17 EDT 2018 > [ 1.282220] random: fast init done > [ 3.987092] random: crng init done > [ 4.376787] EXT4-fs (sda1): re-mounted. Opts: (null) > > There are some desktops where the "crng_init done" report doesn't > happen until 45-90 seconds into the boot. I don't think I've seen > reports where it takes _minutes_ however. Can you give me some > examples of such cases? On a powerpc embedded board which has an mpc8xx processor running at 133Mhz, I now get the startup done in more than 7 minutes instead of 30 seconds. This is due to the webserver blocking on read on /dev/random until we get 'random: crng init done': [ 0.000000] Linux version 4.17.0-rc4-00415-gd2f75d40072d (root@localhost) (gcc version 5.4.0 (GCC)) #203 PREEMPT Wed May 16 16:32:02 CEST 2018 [ 0.295453] random: get_random_u32 called from bucket_table_alloc+0x84/0x1bc with crng_init=0 [ 1.030472] device: 'random': device_add [ 1.031279] device: 'urandom': device_add [ 1.420069] device: 'hw_random': device_add [ 2.156853] random: fast init done [ 462.007776] random: crng init done This has become really critical, is there anything that can be done ? Christophe > > - Ted > > P.S. Of course, in a VM environment, if the host supports virtio-rng, > the boot delay problem is completely not an issue. You just have to > enable virtio-rng in the guest kernel, which I believe is already the > case for most distro kernels. > > BTW, for KVM, it's fairly simple to set it the host-side support for > virtio-rng. Just add to the kvm command-line options: > > -object rng-random,filename=/dev/urandom,id=rng0 \ > -device virtio-rng-pci,rng=rng0 >
On Wed, May 16, 2018 at 05:07:08PM -0700, Srivatsa S. Bhat wrote: > > On a Photon OS VM running on VMware ESXi, this patch causes a boot speed > regression of 5 minutes :-( [ The VM doesn't have haveged or rng-tools > (rngd) installed. ] > > [ 1.420246] EXT4-fs (sda2): re-mounted. Opts: barrier,noacl,data=ordered > [ 1.469722] tsc: Refined TSC clocksource calibration: 1900.002 MHz > [ 1.470707] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x36c65c1a9e1, max_idle_ns: 881590695311 ns > [ 1.474249] clocksource: Switched to clocksource tsc > [ 1.584427] systemd-journald[216]: Received request to flush runtime journal from PID 1 > [ 346.620718] random: crng init done > > Interestingly, the boot delay is exacerbated on VMs with large amounts > of RAM. For example, the delay is not so noticeable (< 30 seconds) on a > VM with 2GB memory, but goes up to 5 minutes on an 8GB VM. > > Also, cloud-init-local.service seems to be the one blocking for entropy > here. So the first thing I'd ask you to investigate is what the heck cloud-init-local.service is doing, and why it really needs cryptographic grade random numbers? > It would be great if this CVE can be fixed somehow without causing boot speed > to spuike from ~20 seconds to 5 minutes, as that makes the system pretty much > unusable. I can workaround this by installing haveged, but ideally an in-kernel > fix would be better. If you need any other info about my setup or if you have > a patch that I can test, please let me know! So the question is why is strong random numbers needed by cloud-init-local, and how much do you trust either haveged and/or RDRAND (which is what you will be depending upon if you install rng-tools). If you believe that Intel and/or the NSA hasn't backdoored RDRAND[1], or you believe that because Intel processor's internal cache architecture isn't publically documented, and it's Soooooooo complicated that no one can figure it out (which is what you will be depending upon if you if you choose haveged), be my guest. I personally consider the latter to be "security via obscu7rity", but others have different opinions. [1] As an aside, read the best paper from the 37th IEEE Symposium on Security and Privacy and weep: https://www.computerworld.com/article/3079417/security/researchers-built-devious-undetectable-hardware-level-backdoor-in-computer-chips.html If it turns out that the security if your VM is critically dependent on what cloud-init-local.service is doing, and you don't like making those assumptions, then you may need to ask VMWare ESXi to make virtio-rng available to guest OS's, and then make Photon OS depend on a secure RNG available to the host OS. Best regards, - Ted
On Thu, May 17, 2018 at 08:01:04AM +0200, Christophe LEROY wrote: > > On a powerpc embedded board which has an mpc8xx processor running at 133Mhz, > I now get the startup done in more than 7 minutes instead of 30 seconds. > This is due to the webserver blocking on read on /dev/random until we get > 'random: crng init done': > > [ 0.000000] Linux version 4.17.0-rc4-00415-gd2f75d40072d (root@localhost) > (gcc version 5.4.0 (GCC)) #203 PREEMPT Wed May 16 16:32:02 CEST 2018 > [ 0.295453] random: get_random_u32 called from > bucket_table_alloc+0x84/0x1bc with crng_init=0 > [ 1.030472] device: 'random': device_add > [ 1.031279] device: 'urandom': device_add > [ 1.420069] device: 'hw_random': device_add > [ 2.156853] random: fast init done > [ 462.007776] random: crng init done > > This has become really critical, is there anything that can be done ? Figure out why the webserver needs to read /dev/random and is it for a security critical purpose? A kernel patch which makes the kernel do a "lalalalala I'm secure" when it really isn't secure is a simple "solution", but would it really make you happy? - Ted
diff --git a/drivers/char/random.c b/drivers/char/random.c index e027e7fa1472..c8ec1e70abde 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -427,7 +427,7 @@ struct crng_state primary_crng = { * its value (from 0->1->2). */ static int crng_init = 0; -#define crng_ready() (likely(crng_init > 0)) +#define crng_ready() (likely(crng_init > 1)) static int crng_init_cnt = 0; #define CRNG_INIT_CNT_THRESH (2*CHACHA20_KEY_SIZE) static void _extract_crng(struct crng_state *crng, @@ -794,7 +794,7 @@ static int crng_fast_load(const char *cp, size_t len) if (!spin_trylock_irqsave(&primary_crng.lock, flags)) return 0; - if (crng_ready()) { + if (crng_init != 0) { spin_unlock_irqrestore(&primary_crng.lock, flags); return 0; } @@ -856,7 +856,7 @@ static void _extract_crng(struct crng_state *crng, { unsigned long v, flags; - if (crng_init > 1 && + if (crng_ready() && time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL)) crng_reseed(crng, crng == &primary_crng ? &input_pool : NULL); spin_lock_irqsave(&crng->lock, flags); @@ -1139,7 +1139,7 @@ void add_interrupt_randomness(int irq, int irq_flags) fast_mix(fast_pool); add_interrupt_bench(cycles); - if (!crng_ready()) { + if (unlikely(crng_init == 0)) { if ((fast_pool->count >= 64) && crng_fast_load((char *) fast_pool->pool, sizeof(fast_pool->pool))) { @@ -2212,7 +2212,7 @@ void add_hwgenerator_randomness(const char *buffer, size_t count, { struct entropy_store *poolp = &input_pool; - if (!crng_ready()) { + if (unlikely(crng_init == 0)) { crng_fast_load(buffer, count); return; }