mbox series

[RFC,v2,0/4] QEMU changes to do PVH boot

Message ID 1545422632-24444-1-git-send-email-liam.merwick@oracle.com (mailing list archive)
Headers show
Series QEMU changes to do PVH boot | expand

Message

Liam Merwick Dec. 21, 2018, 8:03 p.m. UTC
For certain applications it is desirable to rapidly boot a KVM virtual
machine. In cases where legacy hardware and software support within the
guest is not needed, QEMU should be able to boot directly into the
uncompressed Linux kernel binary with minimal firmware involvement.

There already exists an ABI to allow this for Xen PVH guests and the ABI
is supported by Linux and FreeBSD:

   https://xenbits.xen.org/docs/unstable/misc/pvh.html

Details on the Linux changes (v9 staged for 4.21): https://lkml.org/lkml/2018/12/14/1330
qboot pull request: https://github.com/bonzini/qboot/pull/17 

This patch series provides QEMU support to read the ELF header of an
uncompressed kernel binary and get the 32-bit PVH kernel entry point
from an ELF Note.  In load_linux() a call is made to load_elfboot()
so see if the header matches that of an uncompressed kernel binary (ELF)
and if so, loads the binary and determines the kernel entry address
from an ELF Note in the binary.  Then qboot does futher initialisation
of the guest (e820, etc.) and jumps to the kernel entry address and
boots the guest.

changes v1 -> v2
- Based on feedback from Stefan Hajnoczi
- The reading of the PVH entry point is now done in a single pass during
  elf_load() which results in Patch2 in v1 being split into Patches 1&2 in v2
  and considerably reworked.
- Patch1 adds a new optional function pointer to parse the ELF note type
  (the type is passed in via the existing translate_opaque arg - the
  function already had 11 args so I didn't want to add more than one new arg).
- Patch2 adds a function to elf_ops.h to find an ELF note
  matching a specific type 
- Patch3 just has a line added to the commit message to state that the Xen
  repo is the canonical location
- Patch4 (that does the PVH boot) is mainly equivalent to Patch3 in v1
  just minor load_elfboot() changes and the addition of a
  read_pvh_start_addr() helper function for load_elf()


Usіng the method/scripts documented by the NEMU team at

   https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
   https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html

below are some timings measured (vmlinux and bzImage from the same build)
Time to get to kernel start is almost halved (95ṁs -> 48ms)

QEMU + qboot + vmlinux (PVH + 4.20-rc4)
 qemu_init_end: 41.550521
 fw_start: 41.667139 (+0.116618)
 fw_do_boot: 47.448495 (+5.781356)
 linux_startup_64: 47.720785 (+0.27229)
 linux_start_kernel: 48.399541 (+0.678756)
 linux_start_user: 296.952056 (+248.552515)

QEMU + qboot + bzImage:
 qemu_init_end: 29.209276
 fw_start: 29.317342 (+0.108066)
 linux_start_boot: 36.679362 (+7.36202)
 linux_startup_64: 94.531349 (+57.851987)
 linux_start_kernel: 94.900913 (+0.369564)
 linux_start_user: 401.060971 (+306.160058)

QEMU + bzImage:
 qemu_init_end: 30.424430
 linux_startup_64: 893.770334 (+863.345904)
 linux_start_kernel: 894.17049 (+0.400156)
 linux_start_user: 1208.679768 (+314.509278)


Liam Merwick (4):
  elf: Add optional function ptr to load_elf() to parse ELF notes
  elf-ops.h: Add get_elf_note_type()
  pvh: Add x86/HVM direct boot ABI header file
  pvh: Boot uncompressed kernel using direct boot ABI

 hw/alpha/dp264.c               |   4 +-
 hw/arm/armv7m.c                |   3 +-
 hw/arm/boot.c                  |   2 +-
 hw/core/generic-loader.c       |   2 +-
 hw/core/loader.c               |  24 ++++---
 hw/cris/boot.c                 |   3 +-
 hw/hppa/machine.c              |   6 +-
 hw/i386/multiboot.c            |   2 +-
 hw/i386/pc.c                   | 131 +++++++++++++++++++++++++++++++++++-
 hw/lm32/lm32_boards.c          |   6 +-
 hw/lm32/milkymist.c            |   3 +-
 hw/m68k/an5206.c               |   2 +-
 hw/m68k/mcf5208.c              |   2 +-
 hw/microblaze/boot.c           |   7 +-
 hw/mips/mips_fulong2e.c        |   5 +-
 hw/mips/mips_malta.c           |   5 +-
 hw/mips/mips_mipssim.c         |   5 +-
 hw/mips/mips_r4k.c             |   5 +-
 hw/moxie/moxiesim.c            |   2 +-
 hw/nios2/boot.c                |   7 +-
 hw/openrisc/openrisc_sim.c     |   2 +-
 hw/pci-host/prep.c             |   2 +-
 hw/ppc/e500.c                  |   3 +-
 hw/ppc/mac_newworld.c          |   5 +-
 hw/ppc/mac_oldworld.c          |   5 +-
 hw/ppc/ppc440_bamboo.c         |   2 +-
 hw/ppc/sam460ex.c              |   3 +-
 hw/ppc/spapr.c                 |   7 +-
 hw/ppc/virtex_ml507.c          |   2 +-
 hw/riscv/sifive_e.c            |   2 +-
 hw/riscv/sifive_u.c            |   2 +-
 hw/riscv/spike.c               |   2 +-
 hw/riscv/virt.c                |   2 +-
 hw/s390x/ipl.c                 |   9 ++-
 hw/sparc/leon3.c               |   3 +-
 hw/sparc/sun4m.c               |   6 +-
 hw/sparc64/sun4u.c             |   4 +-
 hw/tricore/tricore_testboard.c |   2 +-
 hw/xtensa/sim.c                |  12 ++--
 hw/xtensa/xtfpga.c             |   2 +-
 include/elf.h                  |  10 +++
 include/hw/elf_ops.h           |  72 ++++++++++++++++++++
 include/hw/loader.h            |   9 ++-
 include/hw/xen/start_info.h    | 146 +++++++++++++++++++++++++++++++++++++++++
 44 files changed, 469 insertions(+), 71 deletions(-)
 create mode 100644 include/hw/xen/start_info.h

Comments

no-reply@patchew.org Dec. 26, 2018, 5:08 p.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/1545422632-24444-1-git-send-email-liam.merwick@oracle.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 1545422632-24444-1-git-send-email-liam.merwick@oracle.com
Type: series
Subject: [Qemu-devel] [RFC v2 0/4] QEMU changes to do PVH boot

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
ae98c15 pvh: Boot uncompressed kernel using direct boot ABI
b99fe71 pvh: Add x86/HVM direct boot ABI header file
721dd87 elf-ops.h: Add get_elf_note_type()
4b86551 elf: Add optional function ptr to load_elf() to parse ELF notes

=== OUTPUT BEGIN ===
Checking PATCH 1/4: elf: Add optional function ptr to load_elf() to parse ELF notes...
Checking PATCH 2/4: elf-ops.h: Add get_elf_note_type()...
WARNING: Block comments use a leading /* on a separate line
#21: FILE: include/hw/elf_ops.h:268:
+/* Given 'nhdr', a pointer to a range of ELF Notes, search through them

WARNING: Block comments use a leading /* on a separate line
#49: FILE: include/hw/elf_ops.h:296:
+        /* If the offset calculated in this iteration exceeds the

ERROR: code indent should never use tabs
#50: FILE: include/hw/elf_ops.h:297:
+^I * supplied size, we are done and no matching note was found.$

ERROR: code indent should never use tabs
#51: FILE: include/hw/elf_ops.h:298:
+^I */$

ERROR: code indent should never use tabs
#71: FILE: include/hw/elf_ops.h:558:
+^I    /* Search the ELF notes to find one with a type matching the$

WARNING: Block comments use a leading /* on a separate line
#71: FILE: include/hw/elf_ops.h:558:
+           /* Search the ELF notes to find one with a type matching the

ERROR: code indent should never use tabs
#72: FILE: include/hw/elf_ops.h:559:
+^I     * value passed in via 'translate_opaque'$

ERROR: code indent should never use tabs
#73: FILE: include/hw/elf_ops.h:560:
+^I     */$

ERROR: code indent should never use tabs
#75: FILE: include/hw/elf_ops.h:562:
+^I    assert(translate_opaque != NULL);$

total: 6 errors, 3 warnings, 62 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/4: pvh: Add x86/HVM direct boot ABI header file...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#22: 
new file mode 100644

WARNING: architecture specific defines should be avoided
#49: FILE: include/hw/xen/start_info.h:23:
+#ifndef __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__

total: 0 errors, 2 warnings, 146 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 4/4: pvh: Boot uncompressed kernel using direct boot ABI...
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/1545422632-24444-1-git-send-email-liam.merwick@oracle.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Stefano Garzarella Jan. 3, 2019, 5:22 p.m. UTC | #2
Hi Liam, Hi Maran,
I'm writing the optionrom to do PVH boot also with SeaBIOS.
It is almost complete and I'm testing it, but I have some issue with
QEMU -initrd parameter.
(It works correctly without -initrd and using a kernel with all needed
modules compiled statically)

Linux boots correctly, but it is not able to find the ramdisk. (I have
the same behavior with qboot)
Looking at Linux, QEMU, and qboot patches, I understood that the first
module pointed by 'modlist_paddr' in the 'hvm_start_info' should be
used to pass the ramdisk address and size to the kernel, but I didn't
understand who load it in RAM. (I guess QEMU directly or the firmware
by fw_cfg interface)

Can you give me some suggestions?

Thanks,
Stefano

On Fri, Dec 21, 2018 at 9:07 PM Liam Merwick <liam.merwick@oracle.com> wrote:
>
> For certain applications it is desirable to rapidly boot a KVM virtual
> machine. In cases where legacy hardware and software support within the
> guest is not needed, QEMU should be able to boot directly into the
> uncompressed Linux kernel binary with minimal firmware involvement.
>
> There already exists an ABI to allow this for Xen PVH guests and the ABI
> is supported by Linux and FreeBSD:
>
>    https://xenbits.xen.org/docs/unstable/misc/pvh.html
>
> Details on the Linux changes (v9 staged for 4.21): https://lkml.org/lkml/2018/12/14/1330
> qboot pull request: https://github.com/bonzini/qboot/pull/17
>
> This patch series provides QEMU support to read the ELF header of an
> uncompressed kernel binary and get the 32-bit PVH kernel entry point
> from an ELF Note.  In load_linux() a call is made to load_elfboot()
> so see if the header matches that of an uncompressed kernel binary (ELF)
> and if so, loads the binary and determines the kernel entry address
> from an ELF Note in the binary.  Then qboot does futher initialisation
> of the guest (e820, etc.) and jumps to the kernel entry address and
> boots the guest.
>
> changes v1 -> v2
> - Based on feedback from Stefan Hajnoczi
> - The reading of the PVH entry point is now done in a single pass during
>   elf_load() which results in Patch2 in v1 being split into Patches 1&2 in v2
>   and considerably reworked.
> - Patch1 adds a new optional function pointer to parse the ELF note type
>   (the type is passed in via the existing translate_opaque arg - the
>   function already had 11 args so I didn't want to add more than one new arg).
> - Patch2 adds a function to elf_ops.h to find an ELF note
>   matching a specific type
> - Patch3 just has a line added to the commit message to state that the Xen
>   repo is the canonical location
> - Patch4 (that does the PVH boot) is mainly equivalent to Patch3 in v1
>   just minor load_elfboot() changes and the addition of a
>   read_pvh_start_addr() helper function for load_elf()
>
>
> Usіng the method/scripts documented by the NEMU team at
>
>    https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
>    https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html
>
> below are some timings measured (vmlinux and bzImage from the same build)
> Time to get to kernel start is almost halved (95ṁs -> 48ms)
>
> QEMU + qboot + vmlinux (PVH + 4.20-rc4)
>  qemu_init_end: 41.550521
>  fw_start: 41.667139 (+0.116618)
>  fw_do_boot: 47.448495 (+5.781356)
>  linux_startup_64: 47.720785 (+0.27229)
>  linux_start_kernel: 48.399541 (+0.678756)
>  linux_start_user: 296.952056 (+248.552515)
>
> QEMU + qboot + bzImage:
>  qemu_init_end: 29.209276
>  fw_start: 29.317342 (+0.108066)
>  linux_start_boot: 36.679362 (+7.36202)
>  linux_startup_64: 94.531349 (+57.851987)
>  linux_start_kernel: 94.900913 (+0.369564)
>  linux_start_user: 401.060971 (+306.160058)
>
> QEMU + bzImage:
>  qemu_init_end: 30.424430
>  linux_startup_64: 893.770334 (+863.345904)
>  linux_start_kernel: 894.17049 (+0.400156)
>  linux_start_user: 1208.679768 (+314.509278)
>
>
> Liam Merwick (4):
>   elf: Add optional function ptr to load_elf() to parse ELF notes
>   elf-ops.h: Add get_elf_note_type()
>   pvh: Add x86/HVM direct boot ABI header file
>   pvh: Boot uncompressed kernel using direct boot ABI
>
>  hw/alpha/dp264.c               |   4 +-
>  hw/arm/armv7m.c                |   3 +-
>  hw/arm/boot.c                  |   2 +-
>  hw/core/generic-loader.c       |   2 +-
>  hw/core/loader.c               |  24 ++++---
>  hw/cris/boot.c                 |   3 +-
>  hw/hppa/machine.c              |   6 +-
>  hw/i386/multiboot.c            |   2 +-
>  hw/i386/pc.c                   | 131 +++++++++++++++++++++++++++++++++++-
>  hw/lm32/lm32_boards.c          |   6 +-
>  hw/lm32/milkymist.c            |   3 +-
>  hw/m68k/an5206.c               |   2 +-
>  hw/m68k/mcf5208.c              |   2 +-
>  hw/microblaze/boot.c           |   7 +-
>  hw/mips/mips_fulong2e.c        |   5 +-
>  hw/mips/mips_malta.c           |   5 +-
>  hw/mips/mips_mipssim.c         |   5 +-
>  hw/mips/mips_r4k.c             |   5 +-
>  hw/moxie/moxiesim.c            |   2 +-
>  hw/nios2/boot.c                |   7 +-
>  hw/openrisc/openrisc_sim.c     |   2 +-
>  hw/pci-host/prep.c             |   2 +-
>  hw/ppc/e500.c                  |   3 +-
>  hw/ppc/mac_newworld.c          |   5 +-
>  hw/ppc/mac_oldworld.c          |   5 +-
>  hw/ppc/ppc440_bamboo.c         |   2 +-
>  hw/ppc/sam460ex.c              |   3 +-
>  hw/ppc/spapr.c                 |   7 +-
>  hw/ppc/virtex_ml507.c          |   2 +-
>  hw/riscv/sifive_e.c            |   2 +-
>  hw/riscv/sifive_u.c            |   2 +-
>  hw/riscv/spike.c               |   2 +-
>  hw/riscv/virt.c                |   2 +-
>  hw/s390x/ipl.c                 |   9 ++-
>  hw/sparc/leon3.c               |   3 +-
>  hw/sparc/sun4m.c               |   6 +-
>  hw/sparc64/sun4u.c             |   4 +-
>  hw/tricore/tricore_testboard.c |   2 +-
>  hw/xtensa/sim.c                |  12 ++--
>  hw/xtensa/xtfpga.c             |   2 +-
>  include/elf.h                  |  10 +++
>  include/hw/elf_ops.h           |  72 ++++++++++++++++++++
>  include/hw/loader.h            |   9 ++-
>  include/hw/xen/start_info.h    | 146 +++++++++++++++++++++++++++++++++++++++++
>  44 files changed, 469 insertions(+), 71 deletions(-)
>  create mode 100644 include/hw/xen/start_info.h
>
> --
> 1.8.3.1
>

--
Stefano Garzarella
Red Hat
Liam Merwick Jan. 8, 2019, 2:47 p.m. UTC | #3
Hi Stefano,

[ Catching up on mail after vacation ]

On 03/01/2019 17:22, Stefano Garzarella wrote:
> Hi Liam, Hi Maran,
> I'm writing the optionrom to do PVH boot also with SeaBIOS.
> It is almost complete and I'm testing it, but I have some issue with
> QEMU -initrd parameter.
> (It works correctly without -initrd and using a kernel with all needed
> modules compiled statically)
> 
> Linux boots correctly, but it is not able to find the ramdisk. (I have
> the same behavior with qboot)
> Looking at Linux, QEMU, and qboot patches, I understood that the first
> module pointed by 'modlist_paddr' in the 'hvm_start_info' should be
> used to pass the ramdisk address and size to the kernel, but I didn't
> understand who load it in RAM. (I guess QEMU directly or the firmware
> by fw_cfg interface)
> 
> Can you give me some suggestions?
> 


QEMU sets the hvm_modlist_entry in load_linux() after the call to 
load_elfboot() and then qboot loads it in boot_pvh_from_fw_cfg()

But the current PVH patches don't handle initrd (they have 
start_info.nr_modules == 1).

During (or after) the call to load_elfboot() it looks like we'd need to 
do something like what load_multiboot() does below (along with the 
associated initialisation)

400     fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, ADDR_MBI);
401     fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, sizeof(bootinfo));
402     fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, mb_bootinfo_data,
403                      sizeof(bootinfo));

I'm checking to see if that has any implications for the kernel side.

Regards,
Liam

> 
> On Fri, Dec 21, 2018 at 9:07 PM Liam Merwick <liam.merwick@oracle.com> wrote:
>>
>> For certain applications it is desirable to rapidly boot a KVM virtual
>> machine. In cases where legacy hardware and software support within the
>> guest is not needed, QEMU should be able to boot directly into the
>> uncompressed Linux kernel binary with minimal firmware involvement.
>>
>> There already exists an ABI to allow this for Xen PVH guests and the ABI
>> is supported by Linux and FreeBSD:
>>
>>     https://xenbits.xen.org/docs/unstable/misc/pvh.html
>>
>> Details on the Linux changes (v9 staged for 4.21): https://lkml.org/lkml/2018/12/14/1330
>> qboot pull request: https://github.com/bonzini/qboot/pull/17
>>
>> This patch series provides QEMU support to read the ELF header of an
>> uncompressed kernel binary and get the 32-bit PVH kernel entry point
>> from an ELF Note.  In load_linux() a call is made to load_elfboot()
>> so see if the header matches that of an uncompressed kernel binary (ELF)
>> and if so, loads the binary and determines the kernel entry address
>> from an ELF Note in the binary.  Then qboot does futher initialisation
>> of the guest (e820, etc.) and jumps to the kernel entry address and
>> boots the guest.
>>
>> changes v1 -> v2
>> - Based on feedback from Stefan Hajnoczi
>> - The reading of the PVH entry point is now done in a single pass during
>>    elf_load() which results in Patch2 in v1 being split into Patches 1&2 in v2
>>    and considerably reworked.
>> - Patch1 adds a new optional function pointer to parse the ELF note type
>>    (the type is passed in via the existing translate_opaque arg - the
>>    function already had 11 args so I didn't want to add more than one new arg).
>> - Patch2 adds a function to elf_ops.h to find an ELF note
>>    matching a specific type
>> - Patch3 just has a line added to the commit message to state that the Xen
>>    repo is the canonical location
>> - Patch4 (that does the PVH boot) is mainly equivalent to Patch3 in v1
>>    just minor load_elfboot() changes and the addition of a
>>    read_pvh_start_addr() helper function for load_elf()
>>
>>
>> Usіng the method/scripts documented by the NEMU team at
>>
>>     https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
>>     https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html
>>
>> below are some timings measured (vmlinux and bzImage from the same build)
>> Time to get to kernel start is almost halved (95ṁs -> 48ms)
>>
>> QEMU + qboot + vmlinux (PVH + 4.20-rc4)
>>   qemu_init_end: 41.550521
>>   fw_start: 41.667139 (+0.116618)
>>   fw_do_boot: 47.448495 (+5.781356)
>>   linux_startup_64: 47.720785 (+0.27229)
>>   linux_start_kernel: 48.399541 (+0.678756)
>>   linux_start_user: 296.952056 (+248.552515)
>>
>> QEMU + qboot + bzImage:
>>   qemu_init_end: 29.209276
>>   fw_start: 29.317342 (+0.108066)
>>   linux_start_boot: 36.679362 (+7.36202)
>>   linux_startup_64: 94.531349 (+57.851987)
>>   linux_start_kernel: 94.900913 (+0.369564)
>>   linux_start_user: 401.060971 (+306.160058)
>>
>> QEMU + bzImage:
>>   qemu_init_end: 30.424430
>>   linux_startup_64: 893.770334 (+863.345904)
>>   linux_start_kernel: 894.17049 (+0.400156)
>>   linux_start_user: 1208.679768 (+314.509278)
>>
>>
>> Liam Merwick (4):
>>    elf: Add optional function ptr to load_elf() to parse ELF notes
>>    elf-ops.h: Add get_elf_note_type()
>>    pvh: Add x86/HVM direct boot ABI header file
>>    pvh: Boot uncompressed kernel using direct boot ABI
>>
>>   hw/alpha/dp264.c               |   4 +-
>>   hw/arm/armv7m.c                |   3 +-
>>   hw/arm/boot.c                  |   2 +-
>>   hw/core/generic-loader.c       |   2 +-
>>   hw/core/loader.c               |  24 ++++---
>>   hw/cris/boot.c                 |   3 +-
>>   hw/hppa/machine.c              |   6 +-
>>   hw/i386/multiboot.c            |   2 +-
>>   hw/i386/pc.c                   | 131 +++++++++++++++++++++++++++++++++++-
>>   hw/lm32/lm32_boards.c          |   6 +-
>>   hw/lm32/milkymist.c            |   3 +-
>>   hw/m68k/an5206.c               |   2 +-
>>   hw/m68k/mcf5208.c              |   2 +-
>>   hw/microblaze/boot.c           |   7 +-
>>   hw/mips/mips_fulong2e.c        |   5 +-
>>   hw/mips/mips_malta.c           |   5 +-
>>   hw/mips/mips_mipssim.c         |   5 +-
>>   hw/mips/mips_r4k.c             |   5 +-
>>   hw/moxie/moxiesim.c            |   2 +-
>>   hw/nios2/boot.c                |   7 +-
>>   hw/openrisc/openrisc_sim.c     |   2 +-
>>   hw/pci-host/prep.c             |   2 +-
>>   hw/ppc/e500.c                  |   3 +-
>>   hw/ppc/mac_newworld.c          |   5 +-
>>   hw/ppc/mac_oldworld.c          |   5 +-
>>   hw/ppc/ppc440_bamboo.c         |   2 +-
>>   hw/ppc/sam460ex.c              |   3 +-
>>   hw/ppc/spapr.c                 |   7 +-
>>   hw/ppc/virtex_ml507.c          |   2 +-
>>   hw/riscv/sifive_e.c            |   2 +-
>>   hw/riscv/sifive_u.c            |   2 +-
>>   hw/riscv/spike.c               |   2 +-
>>   hw/riscv/virt.c                |   2 +-
>>   hw/s390x/ipl.c                 |   9 ++-
>>   hw/sparc/leon3.c               |   3 +-
>>   hw/sparc/sun4m.c               |   6 +-
>>   hw/sparc64/sun4u.c             |   4 +-
>>   hw/tricore/tricore_testboard.c |   2 +-
>>   hw/xtensa/sim.c                |  12 ++--
>>   hw/xtensa/xtfpga.c             |   2 +-
>>   include/elf.h                  |  10 +++
>>   include/hw/elf_ops.h           |  72 ++++++++++++++++++++
>>   include/hw/loader.h            |   9 ++-
>>   include/hw/xen/start_info.h    | 146 +++++++++++++++++++++++++++++++++++++++++
>>   44 files changed, 469 insertions(+), 71 deletions(-)
>>   create mode 100644 include/hw/xen/start_info.h
>>
>> --
>> 1.8.3.1
>>
> 
> --
> Stefano Garzarella
> Red Hat
>
Stefano Garzarella Jan. 9, 2019, 11:53 a.m. UTC | #4
Hi Liam,

On Tue, Jan 8, 2019 at 3:47 PM Liam Merwick <liam.merwick@oracle.com> wrote:
> QEMU sets the hvm_modlist_entry in load_linux() after the call to
> load_elfboot() and then qboot loads it in boot_pvh_from_fw_cfg()
>
> But the current PVH patches don't handle initrd (they have
> start_info.nr_modules == 1).

Looking in the linux kernel (arch/x86/platform/pvh/enlighten.c) I saw:
    /* The first module is always ramdisk. */
    if (pvh_start_info.nr_modules) {
        struct hvm_modlist_entry *modaddr =
            __va(pvh_start_info.modlist_paddr);
        pvh_bootparams.hdr.ramdisk_image = modaddr->paddr;
        pvh_bootparams.hdr.ramdisk_size = modaddr->size;
    }

So, putting start_info.nr_modules = 1, means that the first
hvm_modlist_entry should have the ramdisk paddr and size. Is it
correct?


>
> During (or after) the call to load_elfboot() it looks like we'd need to
> do something like what load_multiboot() does below (along with the
> associated initialisation)
>
> 400     fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, ADDR_MBI);
> 401     fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, sizeof(bootinfo));
> 402     fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, mb_bootinfo_data,
> 403                      sizeof(bootinfo));
>

In this case I think they used the FW_CFG_INITRD_* to pass bootinfo
varibales to the guest, maybe we could add something like what
linux_load() does:

    /* load initrd */
    if (initrd_filename) {
        ...
        initrd_addr = (initrd_max-initrd_size) & ~4095;

        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
        fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data, initrd_size);
        ...
    }

Then we can load the initrd in qboot or in the optionrom that I'm writing.

What do you think?

Thanks,
Stefano

> I'm checking to see if that has any implications for the kernel side.
>
> Regards,
> Liam
>
Boris Ostrovsky Jan. 9, 2019, 7:53 p.m. UTC | #5
On 1/9/19 6:53 AM, Stefano Garzarella wrote:
> Hi Liam,
>
> On Tue, Jan 8, 2019 at 3:47 PM Liam Merwick <liam.merwick@oracle.com> wrote:
>> QEMU sets the hvm_modlist_entry in load_linux() after the call to
>> load_elfboot() and then qboot loads it in boot_pvh_from_fw_cfg()
>>
>> But the current PVH patches don't handle initrd (they have
>> start_info.nr_modules == 1).
> Looking in the linux kernel (arch/x86/platform/pvh/enlighten.c) I saw:
>     /* The first module is always ramdisk. */
>     if (pvh_start_info.nr_modules) {
>         struct hvm_modlist_entry *modaddr =
>             __va(pvh_start_info.modlist_paddr);
>         pvh_bootparams.hdr.ramdisk_image = modaddr->paddr;
>         pvh_bootparams.hdr.ramdisk_size = modaddr->size;
>     }
>
> So, putting start_info.nr_modules = 1, means that the first
> hvm_modlist_entry should have the ramdisk paddr and size. Is it
> correct?
>
>
>> During (or after) the call to load_elfboot() it looks like we'd need to
>> do something like what load_multiboot() does below (along with the
>> associated initialisation)
>>
>> 400     fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, ADDR_MBI);
>> 401     fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, sizeof(bootinfo));
>> 402     fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, mb_bootinfo_data,
>> 403                      sizeof(bootinfo));
>>
> In this case I think they used the FW_CFG_INITRD_* to pass bootinfo
> varibales to the guest, maybe we could add something like what
> linux_load() does:
>
>     /* load initrd */
>     if (initrd_filename) {
>         ...
>         initrd_addr = (initrd_max-initrd_size) & ~4095;
>
>         fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
>         fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
>         fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data, initrd_size);
>         ...
>     }
>
> Then we can load the initrd in qboot or in the optionrom that I'm writing.
>
> What do you think?


Why not specify this in pvh_start_info? This will be much faster for
everyone, no need to go through fw_cfg.

-boris
Maran Wilson Jan. 9, 2019, 9:18 p.m. UTC | #6
On 1/9/2019 11:53 AM, Boris Ostrovsky wrote:
> On 1/9/19 6:53 AM, Stefano Garzarella wrote:
>> Hi Liam,
>>
>> On Tue, Jan 8, 2019 at 3:47 PM Liam Merwick <liam.merwick@oracle.com> wrote:
>>> QEMU sets the hvm_modlist_entry in load_linux() after the call to
>>> load_elfboot() and then qboot loads it in boot_pvh_from_fw_cfg()
>>>
>>> But the current PVH patches don't handle initrd (they have
>>> start_info.nr_modules == 1).
>> Looking in the linux kernel (arch/x86/platform/pvh/enlighten.c) I saw:
>>      /* The first module is always ramdisk. */
>>      if (pvh_start_info.nr_modules) {
>>          struct hvm_modlist_entry *modaddr =
>>              __va(pvh_start_info.modlist_paddr);
>>          pvh_bootparams.hdr.ramdisk_image = modaddr->paddr;
>>          pvh_bootparams.hdr.ramdisk_size = modaddr->size;
>>      }
>>
>> So, putting start_info.nr_modules = 1, means that the first
>> hvm_modlist_entry should have the ramdisk paddr and size. Is it
>> correct?

That's my understanding.

I think what's missing, is that we just need Qemu or qboot/seabios to 
properly populate the pvh_start_info.modlist_paddr with the address (as 
usable by the guest) of the hvm_modlist_entry which correctly defines 
the details of the initrd that has already been loaded into memory that 
is accessible by the guest.

-Maran

>>> During (or after) the call to load_elfboot() it looks like we'd need to
>>> do something like what load_multiboot() does below (along with the
>>> associated initialisation)
>>>
>>> 400     fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, ADDR_MBI);
>>> 401     fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, sizeof(bootinfo));
>>> 402     fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, mb_bootinfo_data,
>>> 403                      sizeof(bootinfo));
>>>
>> In this case I think they used the FW_CFG_INITRD_* to pass bootinfo
>> varibales to the guest, maybe we could add something like what
>> linux_load() does:
>>
>>      /* load initrd */
>>      if (initrd_filename) {
>>          ...
>>          initrd_addr = (initrd_max-initrd_size) & ~4095;
>>
>>          fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
>>          fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
>>          fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data, initrd_size);
>>          ...
>>      }
>>
>> Then we can load the initrd in qboot or in the optionrom that I'm writing.
>>
>> What do you think?
>
> Why not specify this in pvh_start_info? This will be much faster for
> everyone, no need to go through fw_cfg.
>
> -boris
>
Stefano Garzarella Jan. 10, 2019, 3:12 p.m. UTC | #7
On Wed, Jan 09, 2019 at 01:18:12PM -0800, Maran Wilson wrote:
> On 1/9/2019 11:53 AM, Boris Ostrovsky wrote:
> > On 1/9/19 6:53 AM, Stefano Garzarella wrote:
> > > Hi Liam,
> > > 
> > > On Tue, Jan 8, 2019 at 3:47 PM Liam Merwick <liam.merwick@oracle.com> wrote:
> > > > QEMU sets the hvm_modlist_entry in load_linux() after the call to
> > > > load_elfboot() and then qboot loads it in boot_pvh_from_fw_cfg()
> > > > 
> > > > But the current PVH patches don't handle initrd (they have
> > > > start_info.nr_modules == 1).
> > > Looking in the linux kernel (arch/x86/platform/pvh/enlighten.c) I saw:
> > >      /* The first module is always ramdisk. */
> > >      if (pvh_start_info.nr_modules) {
> > >          struct hvm_modlist_entry *modaddr =
> > >              __va(pvh_start_info.modlist_paddr);
> > >          pvh_bootparams.hdr.ramdisk_image = modaddr->paddr;
> > >          pvh_bootparams.hdr.ramdisk_size = modaddr->size;
> > >      }
> > > 
> > > So, putting start_info.nr_modules = 1, means that the first
> > > hvm_modlist_entry should have the ramdisk paddr and size. Is it
> > > correct?
> 
> That's my understanding.
> 
> I think what's missing, is that we just need Qemu or qboot/seabios to
> properly populate the pvh_start_info.modlist_paddr with the address (as
> usable by the guest) of the hvm_modlist_entry which correctly defines the
> details of the initrd that has already been loaded into memory that is
> accessible by the guest.
> 
> -Maran
> 

I tried and it works, I modified QEMU to load the initrd and to expose it
through fw_cfg, then qboot loads it and set correctly the hvm_modlist_entry.

You can find the patch of QEMU at the end of this email and the qboot
patch here: https://github.com/stefano-garzarella/qboot/commit/41e1fd765c8419e270fd79d9b3af5d53576e88a8

Do you think can be a good approach? If you want, you can add this patch
to your series.

Thanks,
Stefano


From d5c0d51768f5a8fb214be6c2bb0cb7e86e9917b7 Mon Sep 17 00:00:00 2001
From: Stefano Garzarella <sgarzare@redhat.com>
Date: Thu, 10 Jan 2019 15:16:44 +0100
Subject: [PATCH] pvh: load initrd and expose it through fw_cfg

When initrd is specified, load and expose it to the guest firmware
through fw_cfg. The firmware will fill the hvm_start_info for the
kernel.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Based-on: <1545422632-24444-5-git-send-email-liam.merwick@oracle.com>
---
 hw/i386/pc.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 06bce6a101..f6721f51be 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -986,25 +986,45 @@ static void load_linux(PCMachineState *pcms,
          */
         if (load_elfboot(kernel_filename, kernel_size,
                          header, pvh_start_addr, fw_cfg)) {
-            struct hvm_modlist_entry ramdisk_mod = { 0 };
-
             fclose(f);
 
             fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
                 strlen(kernel_cmdline) + 1);
             fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
 
-            assert(machine->device_memory != NULL);
-            ramdisk_mod.paddr = machine->device_memory->base;
-            ramdisk_mod.size =
-                memory_region_size(&machine->device_memory->mr);
-
-            fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod,
-                             sizeof(ramdisk_mod));
             fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
             fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
                              header, sizeof(header));
 
+            /* load initrd */
+            if (initrd_filename) {
+                gsize initrd_size;
+                gchar *initrd_data;
+                GError *gerr = NULL;
+
+                if (!g_file_get_contents(initrd_filename, &initrd_data,
+                            &initrd_size, &gerr)) {
+                    fprintf(stderr, "qemu: error reading initrd %s: %s\n",
+                            initrd_filename, gerr->message);
+                    exit(1);
+                }
+
+                initrd_max = pcms->below_4g_mem_size - pcmc->acpi_data_size - 1;
+                if (initrd_size >= initrd_max) {
+                    fprintf(stderr, "qemu: initrd is too large, cannot support."
+                            "(max: %"PRIu32", need %"PRId64")\n",
+                            initrd_max, (uint64_t)initrd_size);
+                    exit(1);
+                }
+
+                initrd_addr = (initrd_max - initrd_size) & ~4095;
+
+                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
+                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
+                fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data,
+                                 initrd_size);
+            }
+
             return;
         }
         /* This looks like a multiboot kernel. If it is, let's stop
Liam Merwick Jan. 15, 2019, 9:51 a.m. UTC | #8
Hi Stefano,

On 10/01/2019 15:12, Stefano Garzarella wrote:
> On Wed, Jan 09, 2019 at 01:18:12PM -0800, Maran Wilson wrote:
>> On 1/9/2019 11:53 AM, Boris Ostrovsky wrote:
>>> On 1/9/19 6:53 AM, Stefano Garzarella wrote:
>>>> Hi Liam,
>>>>
>>>> On Tue, Jan 8, 2019 at 3:47 PM Liam Merwick <liam.merwick@oracle.com> wrote:
>>>>> QEMU sets the hvm_modlist_entry in load_linux() after the call to
>>>>> load_elfboot() and then qboot loads it in boot_pvh_from_fw_cfg()
>>>>>
>>>>> But the current PVH patches don't handle initrd (they have
>>>>> start_info.nr_modules == 1).
>>>> Looking in the linux kernel (arch/x86/platform/pvh/enlighten.c) I saw:
>>>>       /* The first module is always ramdisk. */
>>>>       if (pvh_start_info.nr_modules) {
>>>>           struct hvm_modlist_entry *modaddr =
>>>>               __va(pvh_start_info.modlist_paddr);
>>>>           pvh_bootparams.hdr.ramdisk_image = modaddr->paddr;
>>>>           pvh_bootparams.hdr.ramdisk_size = modaddr->size;
>>>>       }
>>>>
>>>> So, putting start_info.nr_modules = 1, means that the first
>>>> hvm_modlist_entry should have the ramdisk paddr and size. Is it
>>>> correct?
>>
>> That's my understanding.
>>
>> I think what's missing, is that we just need Qemu or qboot/seabios to
>> properly populate the pvh_start_info.modlist_paddr with the address (as
>> usable by the guest) of the hvm_modlist_entry which correctly defines the
>> details of the initrd that has already been loaded into memory that is
>> accessible by the guest.
>>
>> -Maran
>>
> 
> I tried and it works, I modified QEMU to load the initrd and to expose it
> through fw_cfg, then qboot loads it and set correctly the hvm_modlist_entry.
> 
> You can find the patch of QEMU at the end of this email and the qboot
> patch here: https://github.com/stefano-garzarella/qboot/commit/41e1fd765c8419e270fd79d9b3af5d53576e88a8
> 
> Do you think can be a good approach? If you want, you can add this patch
> to your series.

Code looks good to me. I'll include it with v3 of my QEMU patches.

Regards,
Liam


> 
> Thanks,
> Stefano
> 
> 
>  From d5c0d51768f5a8fb214be6c2bb0cb7e86e9917b7 Mon Sep 17 00:00:00 2001
> From: Stefano Garzarella <sgarzare@redhat.com>
> Date: Thu, 10 Jan 2019 15:16:44 +0100
> Subject: [PATCH] pvh: load initrd and expose it through fw_cfg
> 
> When initrd is specified, load and expose it to the guest firmware
> through fw_cfg. The firmware will fill the hvm_start_info for the
> kernel.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> Based-on: <1545422632-24444-5-git-send-email-liam.merwick@oracle.com>
> ---
>   hw/i386/pc.c | 38 +++++++++++++++++++++++++++++---------
>   1 file changed, 29 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 06bce6a101..f6721f51be 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -986,25 +986,45 @@ static void load_linux(PCMachineState *pcms,
>            */
>           if (load_elfboot(kernel_filename, kernel_size,
>                            header, pvh_start_addr, fw_cfg)) {
> -            struct hvm_modlist_entry ramdisk_mod = { 0 };
> -
>               fclose(f);
>   
>               fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
>                   strlen(kernel_cmdline) + 1);
>               fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
>   
> -            assert(machine->device_memory != NULL);
> -            ramdisk_mod.paddr = machine->device_memory->base;
> -            ramdisk_mod.size =
> -                memory_region_size(&machine->device_memory->mr);
> -
> -            fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod,
> -                             sizeof(ramdisk_mod));
>               fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
>               fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
>                                header, sizeof(header));
>   
> +            /* load initrd */
> +            if (initrd_filename) {
> +                gsize initrd_size;
> +                gchar *initrd_data;
> +                GError *gerr = NULL;
> +
> +                if (!g_file_get_contents(initrd_filename, &initrd_data,
> +                            &initrd_size, &gerr)) {
> +                    fprintf(stderr, "qemu: error reading initrd %s: %s\n",
> +                            initrd_filename, gerr->message);
> +                    exit(1);
> +                }
> +
> +                initrd_max = pcms->below_4g_mem_size - pcmc->acpi_data_size - 1;
> +                if (initrd_size >= initrd_max) {
> +                    fprintf(stderr, "qemu: initrd is too large, cannot support."
> +                            "(max: %"PRIu32", need %"PRId64")\n",
> +                            initrd_max, (uint64_t)initrd_size);
> +                    exit(1);
> +                }
> +
> +                initrd_addr = (initrd_max - initrd_size) & ~4095;
> +
> +                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
> +                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data,
> +                                 initrd_size);
> +            }
> +
>               return;
>           }
>           /* This looks like a multiboot kernel. If it is, let's stop
>