Message ID | 20220721104950.434544-1-Jason@zx2c4.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v6] hw/i386: pass RNG seed via setup_data entry | expand |
On Thu, Jul 21, 2022 at 12:49:50PM +0200, Jason A. Donenfeld wrote: > Tiny machines optimized for fast boot time generally don't use EFI, > which means a random seed has to be supplied some other way. For this > purpose, Linux (≥5.20) supports passing a seed in the setup_data table > with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and > specialized bootloaders. The linked commit shows the upstream kernel > implementation. > > Link: https://git.kernel.org/tip/tip/c/68b8e9713c8 > Cc: Michael S. Tsirkin <mst@redhat.com> > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Richard Henderson <richard.henderson@linaro.org> > Cc: Eduardo Habkost <eduardo@habkost.net> > Cc: Peter Maydell <peter.maydell@linaro.org> > Cc: Philippe Mathieu-Daudé <f4bug@amsat.org> > Cc: Laurent Vivier <laurent@vivier.eu> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Well why not. Reviewed-by: Michael S. Tsirkin <mst@redhat.com> who's merging this? Paolo me or you? > --- > hw/i386/x86.c | 21 +++++++++++++++++--- > include/standard-headers/asm-x86/bootparam.h | 1 + > 2 files changed, 19 insertions(+), 3 deletions(-) > > diff --git a/hw/i386/x86.c b/hw/i386/x86.c > index 6003b4b2df..56896cb4b2 100644 > --- a/hw/i386/x86.c > +++ b/hw/i386/x86.c > @@ -26,6 +26,7 @@ > #include "qemu/cutils.h" > #include "qemu/units.h" > #include "qemu/datadir.h" > +#include "qemu/guest-random.h" > #include "qapi/error.h" > #include "qapi/qmp/qerror.h" > #include "qapi/qapi-visit-common.h" > @@ -774,7 +775,7 @@ void x86_load_linux(X86MachineState *x86ms, > int dtb_size, setup_data_offset; > uint32_t initrd_max; > uint8_t header[8192], *setup, *kernel; > - hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0; > + hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, first_setup_data = 0; > FILE *f; > char *vmode; > MachineState *machine = MACHINE(x86ms); > @@ -784,6 +785,7 @@ void x86_load_linux(X86MachineState *x86ms, > const char *dtb_filename = machine->dtb; > const char *kernel_cmdline = machine->kernel_cmdline; > SevKernelLoaderContext sev_load_ctx = {}; > + enum { RNG_SEED_LENGTH = 32 }; > > /* Align to 16 bytes as a paranoia measure */ > cmdline_size = (strlen(kernel_cmdline) + 16) & ~15; > @@ -1063,16 +1065,29 @@ void x86_load_linux(X86MachineState *x86ms, > kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size; > kernel = g_realloc(kernel, kernel_size); > > - stq_p(header + 0x250, prot_addr + setup_data_offset); > > setup_data = (struct setup_data *)(kernel + setup_data_offset); > - setup_data->next = 0; > + setup_data->next = cpu_to_le64(first_setup_data); > + first_setup_data = prot_addr + setup_data_offset; > setup_data->type = cpu_to_le32(SETUP_DTB); > setup_data->len = cpu_to_le32(dtb_size); > > load_image_size(dtb_filename, setup_data->data, dtb_size); > } > > + setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16); > + kernel_size = setup_data_offset + sizeof(struct setup_data) + RNG_SEED_LENGTH; > + kernel = g_realloc(kernel, kernel_size); > + setup_data = (struct setup_data *)(kernel + setup_data_offset); > + setup_data->next = cpu_to_le64(first_setup_data); > + first_setup_data = prot_addr + setup_data_offset; > + setup_data->type = cpu_to_le32(SETUP_RNG_SEED); > + setup_data->len = cpu_to_le32(RNG_SEED_LENGTH); > + qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH); > + > + /* Offset 0x250 is a pointer to the first setup_data link. */ > + stq_p(header + 0x250, first_setup_data); > + > /* > * If we're starting an encrypted VM, it will be OVMF based, which uses the > * efi stub for booting and doesn't require any values to be placed in the > diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h > index 072e2ed546..b2aaad10e5 100644 > --- a/include/standard-headers/asm-x86/bootparam.h > +++ b/include/standard-headers/asm-x86/bootparam.h > @@ -10,6 +10,7 @@ > #define SETUP_EFI 4 > #define SETUP_APPLE_PROPERTIES 5 > #define SETUP_JAILHOUSE 6 > +#define SETUP_RNG_SEED 9 > > #define SETUP_INDIRECT (1<<31) > > -- > 2.35.1
On Thu, Jul 21, 2022 at 1:00 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Jul 21, 2022 at 12:49:50PM +0200, Jason A. Donenfeld wrote: > > Tiny machines optimized for fast boot time generally don't use EFI, > > which means a random seed has to be supplied some other way. For this > > purpose, Linux (≥5.20) supports passing a seed in the setup_data table > > with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and > > specialized bootloaders. The linked commit shows the upstream kernel > > implementation. > > > > Link: https://git.kernel.org/tip/tip/c/68b8e9713c8 > > Cc: Michael S. Tsirkin <mst@redhat.com> > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > Cc: Richard Henderson <richard.henderson@linaro.org> > > Cc: Eduardo Habkost <eduardo@habkost.net> > > Cc: Peter Maydell <peter.maydell@linaro.org> > > Cc: Philippe Mathieu-Daudé <f4bug@amsat.org> > > Cc: Laurent Vivier <laurent@vivier.eu> > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> > > Well why not. > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com> > > who's merging this? Paolo me or you? How about you go ahead and merge it. Thanks, Jason
On 7/21/22 13:00, Michael S. Tsirkin wrote: > Well why not. > > Reviewed-by: Michael S. Tsirkin<mst@redhat.com> > > who's merging this? Paolo me or you? I don't think this should be merged as is. The linuxboot ROM takes the data from fw_cfg, and (with the exception of ACPI tables) that data is not migrated. Because reading it into the guest is not atomic, both sides must match. This version of the patches at least doesn't move the preexisting DTB entry of the setup_data, but it still has a mismatching size and that can be a problem when migrating backwards. So for example there is a race window where the guest has read the list head that points to the random number seed entry, but the entry isn't there on the destination. Note that I _do_ agree that this is theoretical and basically impossible to hit. On the other hand, it is there and it's just not the way that we've handled guest ABI compatibility: migration issues are already not fun to debug without having to keep track of which differences are intentional and "harmless" differences and which are bugs. So for this particular case the structure of fw_cfg data MUST match on the source and destination given a machine type and options. Thinking more about it, it's trivial to hit it if you use TCG record-replay, because if the record and replay side will get different sizes fw_cfg and then everything goes south. This shows why this mindset is the correct one even for issues that are theoretical in the case migration: if we hadn't, record-replay would have been much harder to implement. Of course I'm not going to introduce a whole machinery to migrate the seed: if you want determinism (as in the record-replay case) there is the -seed option, if you don't then it's not a huge deal to have a seed that could theoretically be generated half on the source half on the destination. And in fact, after some refactoring, the changes aren't hard to do. My patch is 35(+) 0(-). Paolo
Hi Paolo, On Thu, Jul 21, 2022 at 01:47:02PM +0200, Paolo Bonzini wrote: > On 7/21/22 13:00, Michael S. Tsirkin wrote: > > Well why not. > > > > Reviewed-by: Michael S. Tsirkin<mst@redhat.com> > > > > who's merging this? Paolo me or you? > > I don't think this should be merged as is. > > The linuxboot ROM takes the data from fw_cfg, and (with the exception of > ACPI tables) that data is not migrated. Because reading it into the > guest is not atomic, both sides must match. This version of the patches > at least doesn't move the preexisting DTB entry of the setup_data, but > it still has a mismatching size and that can be a problem when migrating > backwards. As discussed online, this seems absolutely preposterous and will never happen anywhere real ever at all. Trying to account for it is adding needless complexity for no real world benefit; it's the type of thinking that results in a mess. Further, conditionalizing the RNG seed on something else means fewer users receive the increased security of having an early boottime seed. This seems like a silly direction go go in. But to assess things in the open here: - On upgrades, there's no problem because the old bytes don't move. - On downgrades, there's mostly no problem because next will point to 0. - On downgrade there could be some ridiculous theoretical problem if the reader has already read a non-zero next. But this will never actually happen in practice. So we really should just stick with the simple and straight forward path that this v6 accomplishes, and not muck things up with stupidity. Jason
On Thu, Jul 21, 2022 at 02:16:53PM +0200, Jason A. Donenfeld wrote: > Hi Paolo, > > On Thu, Jul 21, 2022 at 01:47:02PM +0200, Paolo Bonzini wrote: > > On 7/21/22 13:00, Michael S. Tsirkin wrote: > > > Well why not. > > > > > > Reviewed-by: Michael S. Tsirkin<mst@redhat.com> > > > > > > who's merging this? Paolo me or you? > > > > I don't think this should be merged as is. > > > > The linuxboot ROM takes the data from fw_cfg, and (with the exception of > > ACPI tables) that data is not migrated. Because reading it into the > > guest is not atomic, both sides must match. This version of the patches > > at least doesn't move the preexisting DTB entry of the setup_data, but > > it still has a mismatching size and that can be a problem when migrating > > backwards. > > As discussed online, this seems absolutely preposterous and will never > happen anywhere real ever at all. Trying to account for it is adding > needless complexity for no real world benefit; it's the type of thinking > that results in a mess. Further, conditionalizing the RNG seed on > something else means fewer users receive the increased security of > having an early boottime seed. This seems like a silly direction go go > in. As mentioned previously, few users are going to benefit from this anyway, because most public cloud VMs don't use direct kernel boot, the guest firmware loads the kernel from the guest /boot partition. Regardless though, what Paolo described with a machine type specific property won't have a significant impact on availablity. This is NOT requiring users to opt-in to the seed in general. Tieing settings to the machine type means that newly provisioned guests will get it enabled out of the box, as they'll typically use the latest machine type. Pre-existing guests which have merely upgraded their QEMU instance won't get the feature, because they'll be fixed on the old machine type to guarantee no guest ABI change. This isn't a problem, as such pre-existing guests likely won't have the new Linux code to consume the seed anyway. With regards, Daniel
On 7/21/22 14:27, Daniel P. Berrangé wrote: > > Pre-existing guests which have merely upgraded their QEMU instance > won't get the feature, because they'll be fixed on the old machine > type to guarantee no guest ABI change. This isn't a problem, as > such pre-existing guests likely won't have the new Linux code to > consume the seed anyway. In fact this isn't a given either and depends on how these pre-existing guests are managed. People that do not use Libvirt, or that just start their guests with "virsh create" and some on-disk XML file, _will_ get the new feature. The way QEMU does things is that fw_cfg is part of guest ABI, and this is _not_ up for discussion. You're not the first one to be confused[1] and you probably will not be the last, so this means that the whole concept of guest ABI should be documented better. (By the way, even though I agree that the failure is completely theoretical apart from the record-replay case, it can actually be reproduced easily by sticking a long-enough sleep in pc-bios/optionrom/linuxboot_dma.c). Paolo
Hi Paolo, Daniel, Okay, patch incoming. Jason
diff --git a/hw/i386/x86.c b/hw/i386/x86.c index 6003b4b2df..56896cb4b2 100644 --- a/hw/i386/x86.c +++ b/hw/i386/x86.c @@ -26,6 +26,7 @@ #include "qemu/cutils.h" #include "qemu/units.h" #include "qemu/datadir.h" +#include "qemu/guest-random.h" #include "qapi/error.h" #include "qapi/qmp/qerror.h" #include "qapi/qapi-visit-common.h" @@ -774,7 +775,7 @@ void x86_load_linux(X86MachineState *x86ms, int dtb_size, setup_data_offset; uint32_t initrd_max; uint8_t header[8192], *setup, *kernel; - hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0; + hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, first_setup_data = 0; FILE *f; char *vmode; MachineState *machine = MACHINE(x86ms); @@ -784,6 +785,7 @@ void x86_load_linux(X86MachineState *x86ms, const char *dtb_filename = machine->dtb; const char *kernel_cmdline = machine->kernel_cmdline; SevKernelLoaderContext sev_load_ctx = {}; + enum { RNG_SEED_LENGTH = 32 }; /* Align to 16 bytes as a paranoia measure */ cmdline_size = (strlen(kernel_cmdline) + 16) & ~15; @@ -1063,16 +1065,29 @@ void x86_load_linux(X86MachineState *x86ms, kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size; kernel = g_realloc(kernel, kernel_size); - stq_p(header + 0x250, prot_addr + setup_data_offset); setup_data = (struct setup_data *)(kernel + setup_data_offset); - setup_data->next = 0; + setup_data->next = cpu_to_le64(first_setup_data); + first_setup_data = prot_addr + setup_data_offset; setup_data->type = cpu_to_le32(SETUP_DTB); setup_data->len = cpu_to_le32(dtb_size); load_image_size(dtb_filename, setup_data->data, dtb_size); } + setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16); + kernel_size = setup_data_offset + sizeof(struct setup_data) + RNG_SEED_LENGTH; + kernel = g_realloc(kernel, kernel_size); + setup_data = (struct setup_data *)(kernel + setup_data_offset); + setup_data->next = cpu_to_le64(first_setup_data); + first_setup_data = prot_addr + setup_data_offset; + setup_data->type = cpu_to_le32(SETUP_RNG_SEED); + setup_data->len = cpu_to_le32(RNG_SEED_LENGTH); + qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH); + + /* Offset 0x250 is a pointer to the first setup_data link. */ + stq_p(header + 0x250, first_setup_data); + /* * If we're starting an encrypted VM, it will be OVMF based, which uses the * efi stub for booting and doesn't require any values to be placed in the diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h index 072e2ed546..b2aaad10e5 100644 --- a/include/standard-headers/asm-x86/bootparam.h +++ b/include/standard-headers/asm-x86/bootparam.h @@ -10,6 +10,7 @@ #define SETUP_EFI 4 #define SETUP_APPLE_PROPERTIES 5 #define SETUP_JAILHOUSE 6 +#define SETUP_RNG_SEED 9 #define SETUP_INDIRECT (1<<31)
Tiny machines optimized for fast boot time generally don't use EFI, which means a random seed has to be supplied some other way. For this purpose, Linux (≥5.20) supports passing a seed in the setup_data table with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and specialized bootloaders. The linked commit shows the upstream kernel implementation. Link: https://git.kernel.org/tip/tip/c/68b8e9713c8 Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Eduardo Habkost <eduardo@habkost.net> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Philippe Mathieu-Daudé <f4bug@amsat.org> Cc: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> --- hw/i386/x86.c | 21 +++++++++++++++++--- include/standard-headers/asm-x86/bootparam.h | 1 + 2 files changed, 19 insertions(+), 3 deletions(-)