mbox series

[v5,00/19] s390x: Add Full Boot Order Support

Message ID 20241020012953.1380075-1-jrossi@linux.ibm.com (mailing list archive)
Headers show
Series s390x: Add Full Boot Order Support | expand

Message

Jared Rossi Oct. 20, 2024, 1:29 a.m. UTC
From: Jared Rossi <jrossi@linux.ibm.com>

changes v4 -> v5:
- Fix a bug with per-deice loadparm support:
    The machine loadparm is no longer overwritten by device values, which now
    allows an empty machine loadparm to propagate to later devices even if
    the primary boot device set an initial loadparm
- Fix two instances where changes were squashed into wrong patch
- Fix an instance where NULL_BLOCK_NR was returned instead of ERROR_BLOCK_NR
- Fix an instance of logical AND being used instead of bitwise AND
- Standardize all error values to be negative in all device type paths
- Minor stylistic changes and code simplification


changes v3 -> v4:
- Ensure signed-ness of return values is appropriate
- Add missing newline character in replacements of sclp_print_int()
- Add a missing return in a SCSI error path
- Restore break that was incorrectly removed for Virtio CU devices
- Remove an extra/early return that caused probing to end early
- Convert "good" device to scsi-cd in a cdrom-test for better coverage
- Minor stylistic clean-ups and variable name clarifications

Jared Rossi (19):
  hw/s390x/ipl: Provide more memory to the s390-ccw.img firmware
  pc-bios/s390-ccw: Use the libc from SLOF and remove sclp prints
  pc-bios/s390-ccw: Link the netboot code into the main s390-ccw.img
    binary
  hw/s390x: Remove the possibility to load the s390-netboot.img binary
  pc-bios/s390-ccw: Merge netboot.mak into the main Makefile
  docs/system/s390x/bootdevices: Update the documentation about network
    booting
  pc-bios/s390-ccw: Remove panics from ISO IPL path
  pc-bios/s390-ccw: Remove panics from ECKD IPL path
  pc-bios/s390-ccw: Remove panics from SCSI IPL path
  pc-bios/s390-ccw: Remove panics from DASD IPL path
  pc-bios/s390-ccw: Remove panics from Netboot IPL path
  pc-bios/s390-ccw: Enable failed IPL to return after error
  include/hw/s390x: Add include files for common IPL structs
  s390x: Add individual loadparm assignment to CCW device
  hw/s390x: Build an IPLB for each boot device
  s390x: Rebuild IPLB for SCSI device directly from DIAG308
  pc-bios/s390x: Enable multi-device boot loop
  docs/system: Update documentation for s390x IPL
  tests/qtest: Add s390x boot order tests to cdrom-test.c

 docs/system/bootindex.rst         |   7 +-
 docs/system/s390x/bootdevices.rst |  29 +-
 pc-bios/s390-ccw/netboot.mak      |  62 ----
 hw/s390x/ccw-device.h             |   2 +
 hw/s390x/ipl.h                    | 123 +-------
 include/hw/s390x/ipl/qipl.h       | 127 +++++++++
 pc-bios/s390-ccw/bootmap.h        |  20 +-
 pc-bios/s390-ccw/cio.h            |   2 +
 pc-bios/s390-ccw/dasd-ipl.h       |   2 +-
 pc-bios/s390-ccw/iplb.h           | 108 ++-----
 pc-bios/s390-ccw/libc.h           |  89 ------
 pc-bios/s390-ccw/s390-ccw.h       |  36 +--
 pc-bios/s390-ccw/virtio.h         |   3 +-
 hw/s390x/ccw-device.c             |  46 +++
 hw/s390x/ipl.c                    | 282 +++++++++---------
 hw/s390x/s390-virtio-ccw.c        |  28 +-
 hw/s390x/sclp.c                   |   9 +-
 pc-bios/s390-ccw/bootmap.c        | 455 ++++++++++++++++++++----------
 pc-bios/s390-ccw/cio.c            |  81 +++---
 pc-bios/s390-ccw/dasd-ipl.c       |  71 ++---
 pc-bios/s390-ccw/jump2ipl.c       |  22 +-
 pc-bios/s390-ccw/libc.c           |  88 ------
 pc-bios/s390-ccw/main.c           |  97 ++++---
 pc-bios/s390-ccw/menu.c           |  51 ++--
 pc-bios/s390-ccw/netmain.c        |  38 ++-
 pc-bios/s390-ccw/sclp.c           |   7 +-
 pc-bios/s390-ccw/virtio-blkdev.c  |  12 +-
 pc-bios/s390-ccw/virtio-net.c     |   7 +-
 pc-bios/s390-ccw/virtio-scsi.c    | 160 +++++++----
 pc-bios/s390-ccw/virtio.c         |  67 +++--
 target/s390x/diag.c               |   9 +-
 tests/qtest/cdrom-test.c          |  24 ++
 pc-bios/meson.build               |   1 -
 pc-bios/s390-ccw/Makefile         |  69 ++++-
 pc-bios/s390-netboot.img          | Bin 67232 -> 0 bytes
 35 files changed, 1158 insertions(+), 1076 deletions(-)
 delete mode 100644 pc-bios/s390-ccw/netboot.mak
 create mode 100644 include/hw/s390x/ipl/qipl.h
 delete mode 100644 pc-bios/s390-ccw/libc.h
 delete mode 100644 pc-bios/s390-ccw/libc.c
 delete mode 100644 pc-bios/s390-netboot.img

Comments

Thomas Huth Oct. 31, 2024, 3:50 p.m. UTC | #1
On 20/10/2024 03.29, jrossi@linux.ibm.com wrote:
> From: Jared Rossi <jrossi@linux.ibm.com>
> 
> changes v4 -> v5:
> - Fix a bug with per-deice loadparm support:
>      The machine loadparm is no longer overwritten by device values, which now
>      allows an empty machine loadparm to propagate to later devices even if
>      the primary boot device set an initial loadparm
> - Fix two instances where changes were squashed into wrong patch
> - Fix an instance where NULL_BLOCK_NR was returned instead of ERROR_BLOCK_NR
> - Fix an instance of logical AND being used instead of bitwise AND
> - Standardize all error values to be negative in all device type paths
> - Minor stylistic changes and code simplification

  Hi Jared!

Our QE Sebastian also had a try with the patches today, and discovered some 
non-working scenarios:

Try to boot from non-working CD image first, then from a working HD image:

dd if=/dev/zero of=/tmp/zero.dat bs=1M count=10
qemu-system-s390x -nographic -accel kvm -m 2G \
  -drive if=none,id=d1,file=/tmp/zero.dat,format=raw,media=cdrom \
  -device virtio-scsi -device scsi-cd,drive=d1,bootindex=1 \
  -drive if=none,file=good-image.qcow2,id=d2 \
  -device virtio-blk,drive=d2,bootindex=2

This outputs something like the following text, then aborts:

LOADPARM=[        ]

Using virtio-scsi.
SCSI CD-ROM detected.
Failed to IPL this ISO image!
LOADPARM=[        ]

Using virtio-blk.
Failed to IPL this ISO image!
ERROR: No suitable device for IPL. Halting...

Looks like the s390-ccw bios is treating the virtio-blk device as CD-ROM in 
this case?

Almost the same setup, first device is again a non-working CD image, but the 
second device is a virtio-net device - results in the same error message (so 
it's likely the same or at least a very similar problem).

Could you please have a look?

  Thanks!
   Thomas
Jared Rossi Nov. 5, 2024, 4:42 p.m. UTC | #2
Hi Thomas, Sebastian,

It looks like this is simply caused by the "is_cdrom" value only ever 
being set
to true.  I think it is a one-line fix that just makes sure to 
initialize the
value to false each time we try a new device:

diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
index a4d1c05aac..3fdba0bedc 100644
--- a/pc-bios/s390-ccw/main.c
+++ b/pc-bios/s390-ccw/main.c
@@ -214,6 +214,7 @@ static void boot_setup(void)
  static bool find_boot_device(void)
  {
      VDev *vdev = virtio_get_device();
+    vdev->is_cdrom = false;
      bool found = false;

      switch (iplb.pbt) {

I tested it with the two scenarios you mention and with the existing qtests,
and it seems to work correctly now.

Thanks for finding the mistake,
  Jared Rossi

On 10/31/24 11:50 AM, Thomas Huth wrote:
> On 20/10/2024 03.29, jrossi@linux.ibm.com wrote:
>> From: Jared Rossi <jrossi@linux.ibm.com>
>>
>> changes v4 -> v5:
>> - Fix a bug with per-deice loadparm support:
>>      The machine loadparm is no longer overwritten by device values, 
>> which now
>>      allows an empty machine loadparm to propagate to later devices 
>> even if
>>      the primary boot device set an initial loadparm
>> - Fix two instances where changes were squashed into wrong patch
>> - Fix an instance where NULL_BLOCK_NR was returned instead of 
>> ERROR_BLOCK_NR
>> - Fix an instance of logical AND being used instead of bitwise AND
>> - Standardize all error values to be negative in all device type paths
>> - Minor stylistic changes and code simplification
>
>  Hi Jared!
>
> Our QE Sebastian also had a try with the patches today, and discovered 
> some non-working scenarios:
>
> Try to boot from non-working CD image first, then from a working HD 
> image:
>
> dd if=/dev/zero of=/tmp/zero.dat bs=1M count=10
> qemu-system-s390x -nographic -accel kvm -m 2G \
>  -drive if=none,id=d1,file=/tmp/zero.dat,format=raw,media=cdrom \
>  -device virtio-scsi -device scsi-cd,drive=d1,bootindex=1 \
>  -drive if=none,file=good-image.qcow2,id=d2 \
>  -device virtio-blk,drive=d2,bootindex=2
>
> This outputs something like the following text, then aborts:
>
> LOADPARM=[        ]
>
> Using virtio-scsi.
> SCSI CD-ROM detected.
> Failed to IPL this ISO image!
> LOADPARM=[        ]
>
> Using virtio-blk.
> Failed to IPL this ISO image!
> ERROR: No suitable device for IPL. Halting...
>
> Looks like the s390-ccw bios is treating the virtio-blk device as 
> CD-ROM in this case?
>
> Almost the same setup, first device is again a non-working CD image, 
> but the second device is a virtio-net device - results in the same 
> error message (so it's likely the same or at least a very similar 
> problem).
>
> Could you please have a look?
>
>  Thanks!
>   Thomas
>
Thomas Huth Nov. 6, 2024, 11:10 a.m. UTC | #3
On 05/11/2024 17.42, Jared Rossi wrote:
> Hi Thomas, Sebastian,
> 
> It looks like this is simply caused by the "is_cdrom" value only ever being set
> to true.  I think it is a one-line fix that just makes sure to initialize the
> value to false each time we try a new device:
> 
> diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
> index a4d1c05aac..3fdba0bedc 100644
> --- a/pc-bios/s390-ccw/main.c
> +++ b/pc-bios/s390-ccw/main.c
> @@ -214,6 +214,7 @@ static void boot_setup(void)
>   static bool find_boot_device(void)
>   {
>       VDev *vdev = virtio_get_device();
> +    vdev->is_cdrom = false;
>       bool found = false;
> 
>       switch (iplb.pbt) {
> 
> I tested it with the two scenarios you mention and with the existing qtests,
> and it seems to work correctly now.

Agreed, this seems to fix the issue when all devices are properly marked 
with bootindex properties. But it still persists in case the BIOS has to 
scan for the boot device, e.g.:

qemu-system-s390x -accel kvm -m 2G -nographic \
  -drive if=none,id=d1,file=/tmp/bad.dat,format=raw,media=cdrom \
  -device virtio-scsi -device scsi-cd,drive=d1 \
  -drive if=none,id=d2,file=good.qcow2 -device virtio-blk,drive=d2

Isn't there a better place to set the is_cdrom variable?

  Thomas
Jared Rossi Nov. 7, 2024, 8:42 p.m. UTC | #4
On 11/6/24 6:10 AM, Thomas Huth wrote:
> On 05/11/2024 17.42, Jared Rossi wrote:
>> Hi Thomas, Sebastian,
>>
>> It looks like this is simply caused by the "is_cdrom" value only ever 
>> being set
>> to true.  I think it is a one-line fix that just makes sure to 
>> initialize the
>> value to false each time we try a new device:
>>
>> diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
>> index a4d1c05aac..3fdba0bedc 100644
>> --- a/pc-bios/s390-ccw/main.c
>> +++ b/pc-bios/s390-ccw/main.c
>> @@ -214,6 +214,7 @@ static void boot_setup(void)
>>   static bool find_boot_device(void)
>>   {
>>       VDev *vdev = virtio_get_device();
>> +    vdev->is_cdrom = false;
>>       bool found = false;
>>
>>       switch (iplb.pbt) {
>>
>> I tested it with the two scenarios you mention and with the existing 
>> qtests,
>> and it seems to work correctly now.
>
> Agreed, this seems to fix the issue when all devices are properly 
> marked with bootindex properties. But it still persists in case the 
> BIOS has to scan for the boot device, e.g.:
>
> qemu-system-s390x -accel kvm -m 2G -nographic \
>  -drive if=none,id=d1,file=/tmp/bad.dat,format=raw,media=cdrom \
>  -device virtio-scsi -device scsi-cd,drive=d1 \
>  -drive if=none,id=d2,file=good.qcow2 -device virtio-blk,drive=d2
>
> Isn't there a better place to set the is_cdrom variable?
>
>  Thomas
>

Hi Thomas,

What I found is that the original issue with clearing the "is_cdrom" 
value is
just as easily fixed for both indexed devices and probed devices by moving
where "is_cdrom" is set to false:

diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
index a4d1c05aac..7509755e36 100644
--- a/pc-bios/s390-ccw/main.c
+++ b/pc-bios/s390-ccw/main.c
@@ -242,6 +242,7 @@ static bool find_boot_device(void)
  static int virtio_setup(void)
  {
      VDev *vdev = virtio_get_device();
+    vdev->is_cdrom = false;
      int ret;

      switch (vdev->senseid.cu_model) {

Unfortunately when verifying the fix I found another unrelated issue with
probing, which is that only the first device attached to the scsi controller
will be found.  This is because virtio_scsi_setup() itself contains a 
probing
loop to find a LUN when none is specified, and, unsurprisingly, it returns
the first thing it finds attached to the controller.  As a result, the main
device probing loop will move on after trying only one LUN per controller.

Fixing this won't be a simple one-liner because it will require 
implementation
of nested probing for scsi devices, such that all LUNS on the controller are
probed before moving to the next device.

Regards,
  Jared Rossi
Thomas Huth Nov. 8, 2024, 2:37 p.m. UTC | #5
On 07/11/2024 21.42, Jared Rossi wrote:
> 
> 
> On 11/6/24 6:10 AM, Thomas Huth wrote:
>> On 05/11/2024 17.42, Jared Rossi wrote:
>>> Hi Thomas, Sebastian,
>>>
>>> It looks like this is simply caused by the "is_cdrom" value only ever 
>>> being set
>>> to true.  I think it is a one-line fix that just makes sure to initialize 
>>> the
>>> value to false each time we try a new device:
>>>
>>> diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
>>> index a4d1c05aac..3fdba0bedc 100644
>>> --- a/pc-bios/s390-ccw/main.c
>>> +++ b/pc-bios/s390-ccw/main.c
>>> @@ -214,6 +214,7 @@ static void boot_setup(void)
>>>   static bool find_boot_device(void)
>>>   {
>>>       VDev *vdev = virtio_get_device();
>>> +    vdev->is_cdrom = false;
>>>       bool found = false;
>>>
>>>       switch (iplb.pbt) {
>>>
>>> I tested it with the two scenarios you mention and with the existing qtests,
>>> and it seems to work correctly now.
>>
>> Agreed, this seems to fix the issue when all devices are properly marked 
>> with bootindex properties. But it still persists in case the BIOS has to 
>> scan for the boot device, e.g.:
>>
>> qemu-system-s390x -accel kvm -m 2G -nographic \
>>  -drive if=none,id=d1,file=/tmp/bad.dat,format=raw,media=cdrom \
>>  -device virtio-scsi -device scsi-cd,drive=d1 \
>>  -drive if=none,id=d2,file=good.qcow2 -device virtio-blk,drive=d2
>>
>> Isn't there a better place to set the is_cdrom variable?
>>
>>  Thomas
>>
> 
> Hi Thomas,
> 
> What I found is that the original issue with clearing the "is_cdrom" value is
> just as easily fixed for both indexed devices and probed devices by moving
> where "is_cdrom" is set to false:
> 
> diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
> index a4d1c05aac..7509755e36 100644
> --- a/pc-bios/s390-ccw/main.c
> +++ b/pc-bios/s390-ccw/main.c
> @@ -242,6 +242,7 @@ static bool find_boot_device(void)
>   static int virtio_setup(void)
>   {
>       VDev *vdev = virtio_get_device();
> +    vdev->is_cdrom = false;
>       int ret;

That looks like a good spot, indeed! Could you please send it as a proper 
patch (i.e. with a Signed-off-by line)? I think that would still be a good 
fix for QEMU 9.2.

> Unfortunately when verifying the fix I found another unrelated issue with
> probing, which is that only the first device attached to the scsi controller
> will be found.  This is because virtio_scsi_setup() itself contains a probing
> loop to find a LUN when none is specified, and, unsurprisingly, it returns
> the first thing it finds attached to the controller.  As a result, the main
> device probing loop will move on after trying only one LUN per controller.
> 
> Fixing this won't be a simple one-liner because it will require implementation
> of nested probing for scsi devices, such that all LUNS on the controller are
> probed before moving to the next device.

Ok, maybe that change will be too big for QEMU 9.2 anyway (we're in freeze 
now), so it's ok to include this later, I think.

And in case you're interested (it's maybe not so important since it's rather 
unlikely that the users will do this), there is another issue when trying to 
boot from multiple network devices where the first one has a DHCP server but 
no bootfile:

qemu-system-s390x -nographic -accel kvm -m 2G -netdev user,id=n1 \
  -device virtio-net-ccw,netdev=n1,bootindex=1 \
  -netdev user,id=n2,tftp=/boot,bootfile=vmlinuz \
  -device virtio-net-ccw,netdev=n2,bootindex=2 -d guest_errors

The firmware seems to panic while trying to request DHCP information from 
the second boot device.

  Thomas
Thomas Huth Nov. 11, 2024, 12:23 p.m. UTC | #6
On 08/11/2024 15.37, Thomas Huth wrote:
...
> And in case you're interested (it's maybe not so important since it's rather 
> unlikely that the users will do this), there is another issue when trying to 
> boot from multiple network devices where the first one has a DHCP server but 
> no bootfile:
> 
> qemu-system-s390x -nographic -accel kvm -m 2G -netdev user,id=n1 \
>   -device virtio-net-ccw,netdev=n1,bootindex=1 \
>   -netdev user,id=n2,tftp=/boot,bootfile=vmlinuz \
>   -device virtio-net-ccw,netdev=n2,bootindex=2 -d guest_errors
> 
> The firmware seems to panic while trying to request DHCP information from 
> the second boot device.

I had a look at this problem now, and the solution is as simple as this:

diff --git a/pc-bios/s390-ccw/virtio-net.c b/pc-bios/s390-ccw/virtio-net.c
--- a/pc-bios/s390-ccw/virtio-net.c
+++ b/pc-bios/s390-ccw/virtio-net.c
@@ -51,6 +51,8 @@ int virtio_net_init(void *mac_addr)
      void *buf;
      int i;

+    rx_last_idx = 0;
+
      vdev->guest_features[0] = VIRTIO_NET_F_MAC_BIT;
      virtio_setup_ccw(vdev);

Otherwise the rx_last_idx does not get reset to 0 when starting the 2nd 
network boot, so the code tries to access some ring data that contains 
garbage later. I'll send it as a proper patch...

  Thomas