Message ID | 20241020012953.1380075-1-jrossi@linux.ibm.com (mailing list archive) |
---|---|
Headers | show |
Series | s390x: Add Full Boot Order Support | expand |
On 20/10/2024 03.29, jrossi@linux.ibm.com wrote: > From: Jared Rossi <jrossi@linux.ibm.com> > > changes v4 -> v5: > - Fix a bug with per-deice loadparm support: > The machine loadparm is no longer overwritten by device values, which now > allows an empty machine loadparm to propagate to later devices even if > the primary boot device set an initial loadparm > - Fix two instances where changes were squashed into wrong patch > - Fix an instance where NULL_BLOCK_NR was returned instead of ERROR_BLOCK_NR > - Fix an instance of logical AND being used instead of bitwise AND > - Standardize all error values to be negative in all device type paths > - Minor stylistic changes and code simplification Hi Jared! Our QE Sebastian also had a try with the patches today, and discovered some non-working scenarios: Try to boot from non-working CD image first, then from a working HD image: dd if=/dev/zero of=/tmp/zero.dat bs=1M count=10 qemu-system-s390x -nographic -accel kvm -m 2G \ -drive if=none,id=d1,file=/tmp/zero.dat,format=raw,media=cdrom \ -device virtio-scsi -device scsi-cd,drive=d1,bootindex=1 \ -drive if=none,file=good-image.qcow2,id=d2 \ -device virtio-blk,drive=d2,bootindex=2 This outputs something like the following text, then aborts: LOADPARM=[ ] Using virtio-scsi. SCSI CD-ROM detected. Failed to IPL this ISO image! LOADPARM=[ ] Using virtio-blk. Failed to IPL this ISO image! ERROR: No suitable device for IPL. Halting... Looks like the s390-ccw bios is treating the virtio-blk device as CD-ROM in this case? Almost the same setup, first device is again a non-working CD image, but the second device is a virtio-net device - results in the same error message (so it's likely the same or at least a very similar problem). Could you please have a look? Thanks! Thomas
Hi Thomas, Sebastian, It looks like this is simply caused by the "is_cdrom" value only ever being set to true. I think it is a one-line fix that just makes sure to initialize the value to false each time we try a new device: diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c index a4d1c05aac..3fdba0bedc 100644 --- a/pc-bios/s390-ccw/main.c +++ b/pc-bios/s390-ccw/main.c @@ -214,6 +214,7 @@ static void boot_setup(void) static bool find_boot_device(void) { VDev *vdev = virtio_get_device(); + vdev->is_cdrom = false; bool found = false; switch (iplb.pbt) { I tested it with the two scenarios you mention and with the existing qtests, and it seems to work correctly now. Thanks for finding the mistake, Jared Rossi On 10/31/24 11:50 AM, Thomas Huth wrote: > On 20/10/2024 03.29, jrossi@linux.ibm.com wrote: >> From: Jared Rossi <jrossi@linux.ibm.com> >> >> changes v4 -> v5: >> - Fix a bug with per-deice loadparm support: >> The machine loadparm is no longer overwritten by device values, >> which now >> allows an empty machine loadparm to propagate to later devices >> even if >> the primary boot device set an initial loadparm >> - Fix two instances where changes were squashed into wrong patch >> - Fix an instance where NULL_BLOCK_NR was returned instead of >> ERROR_BLOCK_NR >> - Fix an instance of logical AND being used instead of bitwise AND >> - Standardize all error values to be negative in all device type paths >> - Minor stylistic changes and code simplification > > Hi Jared! > > Our QE Sebastian also had a try with the patches today, and discovered > some non-working scenarios: > > Try to boot from non-working CD image first, then from a working HD > image: > > dd if=/dev/zero of=/tmp/zero.dat bs=1M count=10 > qemu-system-s390x -nographic -accel kvm -m 2G \ > -drive if=none,id=d1,file=/tmp/zero.dat,format=raw,media=cdrom \ > -device virtio-scsi -device scsi-cd,drive=d1,bootindex=1 \ > -drive if=none,file=good-image.qcow2,id=d2 \ > -device virtio-blk,drive=d2,bootindex=2 > > This outputs something like the following text, then aborts: > > LOADPARM=[ ] > > Using virtio-scsi. > SCSI CD-ROM detected. > Failed to IPL this ISO image! > LOADPARM=[ ] > > Using virtio-blk. > Failed to IPL this ISO image! > ERROR: No suitable device for IPL. Halting... > > Looks like the s390-ccw bios is treating the virtio-blk device as > CD-ROM in this case? > > Almost the same setup, first device is again a non-working CD image, > but the second device is a virtio-net device - results in the same > error message (so it's likely the same or at least a very similar > problem). > > Could you please have a look? > > Thanks! > Thomas >
On 05/11/2024 17.42, Jared Rossi wrote: > Hi Thomas, Sebastian, > > It looks like this is simply caused by the "is_cdrom" value only ever being set > to true. I think it is a one-line fix that just makes sure to initialize the > value to false each time we try a new device: > > diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c > index a4d1c05aac..3fdba0bedc 100644 > --- a/pc-bios/s390-ccw/main.c > +++ b/pc-bios/s390-ccw/main.c > @@ -214,6 +214,7 @@ static void boot_setup(void) > static bool find_boot_device(void) > { > VDev *vdev = virtio_get_device(); > + vdev->is_cdrom = false; > bool found = false; > > switch (iplb.pbt) { > > I tested it with the two scenarios you mention and with the existing qtests, > and it seems to work correctly now. Agreed, this seems to fix the issue when all devices are properly marked with bootindex properties. But it still persists in case the BIOS has to scan for the boot device, e.g.: qemu-system-s390x -accel kvm -m 2G -nographic \ -drive if=none,id=d1,file=/tmp/bad.dat,format=raw,media=cdrom \ -device virtio-scsi -device scsi-cd,drive=d1 \ -drive if=none,id=d2,file=good.qcow2 -device virtio-blk,drive=d2 Isn't there a better place to set the is_cdrom variable? Thomas
On 11/6/24 6:10 AM, Thomas Huth wrote: > On 05/11/2024 17.42, Jared Rossi wrote: >> Hi Thomas, Sebastian, >> >> It looks like this is simply caused by the "is_cdrom" value only ever >> being set >> to true. I think it is a one-line fix that just makes sure to >> initialize the >> value to false each time we try a new device: >> >> diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c >> index a4d1c05aac..3fdba0bedc 100644 >> --- a/pc-bios/s390-ccw/main.c >> +++ b/pc-bios/s390-ccw/main.c >> @@ -214,6 +214,7 @@ static void boot_setup(void) >> static bool find_boot_device(void) >> { >> VDev *vdev = virtio_get_device(); >> + vdev->is_cdrom = false; >> bool found = false; >> >> switch (iplb.pbt) { >> >> I tested it with the two scenarios you mention and with the existing >> qtests, >> and it seems to work correctly now. > > Agreed, this seems to fix the issue when all devices are properly > marked with bootindex properties. But it still persists in case the > BIOS has to scan for the boot device, e.g.: > > qemu-system-s390x -accel kvm -m 2G -nographic \ > -drive if=none,id=d1,file=/tmp/bad.dat,format=raw,media=cdrom \ > -device virtio-scsi -device scsi-cd,drive=d1 \ > -drive if=none,id=d2,file=good.qcow2 -device virtio-blk,drive=d2 > > Isn't there a better place to set the is_cdrom variable? > > Thomas > Hi Thomas, What I found is that the original issue with clearing the "is_cdrom" value is just as easily fixed for both indexed devices and probed devices by moving where "is_cdrom" is set to false: diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c index a4d1c05aac..7509755e36 100644 --- a/pc-bios/s390-ccw/main.c +++ b/pc-bios/s390-ccw/main.c @@ -242,6 +242,7 @@ static bool find_boot_device(void) static int virtio_setup(void) { VDev *vdev = virtio_get_device(); + vdev->is_cdrom = false; int ret; switch (vdev->senseid.cu_model) { Unfortunately when verifying the fix I found another unrelated issue with probing, which is that only the first device attached to the scsi controller will be found. This is because virtio_scsi_setup() itself contains a probing loop to find a LUN when none is specified, and, unsurprisingly, it returns the first thing it finds attached to the controller. As a result, the main device probing loop will move on after trying only one LUN per controller. Fixing this won't be a simple one-liner because it will require implementation of nested probing for scsi devices, such that all LUNS on the controller are probed before moving to the next device. Regards, Jared Rossi
On 07/11/2024 21.42, Jared Rossi wrote: > > > On 11/6/24 6:10 AM, Thomas Huth wrote: >> On 05/11/2024 17.42, Jared Rossi wrote: >>> Hi Thomas, Sebastian, >>> >>> It looks like this is simply caused by the "is_cdrom" value only ever >>> being set >>> to true. I think it is a one-line fix that just makes sure to initialize >>> the >>> value to false each time we try a new device: >>> >>> diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c >>> index a4d1c05aac..3fdba0bedc 100644 >>> --- a/pc-bios/s390-ccw/main.c >>> +++ b/pc-bios/s390-ccw/main.c >>> @@ -214,6 +214,7 @@ static void boot_setup(void) >>> static bool find_boot_device(void) >>> { >>> VDev *vdev = virtio_get_device(); >>> + vdev->is_cdrom = false; >>> bool found = false; >>> >>> switch (iplb.pbt) { >>> >>> I tested it with the two scenarios you mention and with the existing qtests, >>> and it seems to work correctly now. >> >> Agreed, this seems to fix the issue when all devices are properly marked >> with bootindex properties. But it still persists in case the BIOS has to >> scan for the boot device, e.g.: >> >> qemu-system-s390x -accel kvm -m 2G -nographic \ >> -drive if=none,id=d1,file=/tmp/bad.dat,format=raw,media=cdrom \ >> -device virtio-scsi -device scsi-cd,drive=d1 \ >> -drive if=none,id=d2,file=good.qcow2 -device virtio-blk,drive=d2 >> >> Isn't there a better place to set the is_cdrom variable? >> >> Thomas >> > > Hi Thomas, > > What I found is that the original issue with clearing the "is_cdrom" value is > just as easily fixed for both indexed devices and probed devices by moving > where "is_cdrom" is set to false: > > diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c > index a4d1c05aac..7509755e36 100644 > --- a/pc-bios/s390-ccw/main.c > +++ b/pc-bios/s390-ccw/main.c > @@ -242,6 +242,7 @@ static bool find_boot_device(void) > static int virtio_setup(void) > { > VDev *vdev = virtio_get_device(); > + vdev->is_cdrom = false; > int ret; That looks like a good spot, indeed! Could you please send it as a proper patch (i.e. with a Signed-off-by line)? I think that would still be a good fix for QEMU 9.2. > Unfortunately when verifying the fix I found another unrelated issue with > probing, which is that only the first device attached to the scsi controller > will be found. This is because virtio_scsi_setup() itself contains a probing > loop to find a LUN when none is specified, and, unsurprisingly, it returns > the first thing it finds attached to the controller. As a result, the main > device probing loop will move on after trying only one LUN per controller. > > Fixing this won't be a simple one-liner because it will require implementation > of nested probing for scsi devices, such that all LUNS on the controller are > probed before moving to the next device. Ok, maybe that change will be too big for QEMU 9.2 anyway (we're in freeze now), so it's ok to include this later, I think. And in case you're interested (it's maybe not so important since it's rather unlikely that the users will do this), there is another issue when trying to boot from multiple network devices where the first one has a DHCP server but no bootfile: qemu-system-s390x -nographic -accel kvm -m 2G -netdev user,id=n1 \ -device virtio-net-ccw,netdev=n1,bootindex=1 \ -netdev user,id=n2,tftp=/boot,bootfile=vmlinuz \ -device virtio-net-ccw,netdev=n2,bootindex=2 -d guest_errors The firmware seems to panic while trying to request DHCP information from the second boot device. Thomas
On 08/11/2024 15.37, Thomas Huth wrote: ... > And in case you're interested (it's maybe not so important since it's rather > unlikely that the users will do this), there is another issue when trying to > boot from multiple network devices where the first one has a DHCP server but > no bootfile: > > qemu-system-s390x -nographic -accel kvm -m 2G -netdev user,id=n1 \ > -device virtio-net-ccw,netdev=n1,bootindex=1 \ > -netdev user,id=n2,tftp=/boot,bootfile=vmlinuz \ > -device virtio-net-ccw,netdev=n2,bootindex=2 -d guest_errors > > The firmware seems to panic while trying to request DHCP information from > the second boot device. I had a look at this problem now, and the solution is as simple as this: diff --git a/pc-bios/s390-ccw/virtio-net.c b/pc-bios/s390-ccw/virtio-net.c --- a/pc-bios/s390-ccw/virtio-net.c +++ b/pc-bios/s390-ccw/virtio-net.c @@ -51,6 +51,8 @@ int virtio_net_init(void *mac_addr) void *buf; int i; + rx_last_idx = 0; + vdev->guest_features[0] = VIRTIO_NET_F_MAC_BIT; virtio_setup_ccw(vdev); Otherwise the rx_last_idx does not get reset to 0 when starting the 2nd network boot, so the code tries to access some ring data that contains garbage later. I'll send it as a proper patch... Thomas
From: Jared Rossi <jrossi@linux.ibm.com> changes v4 -> v5: - Fix a bug with per-deice loadparm support: The machine loadparm is no longer overwritten by device values, which now allows an empty machine loadparm to propagate to later devices even if the primary boot device set an initial loadparm - Fix two instances where changes were squashed into wrong patch - Fix an instance where NULL_BLOCK_NR was returned instead of ERROR_BLOCK_NR - Fix an instance of logical AND being used instead of bitwise AND - Standardize all error values to be negative in all device type paths - Minor stylistic changes and code simplification changes v3 -> v4: - Ensure signed-ness of return values is appropriate - Add missing newline character in replacements of sclp_print_int() - Add a missing return in a SCSI error path - Restore break that was incorrectly removed for Virtio CU devices - Remove an extra/early return that caused probing to end early - Convert "good" device to scsi-cd in a cdrom-test for better coverage - Minor stylistic clean-ups and variable name clarifications Jared Rossi (19): hw/s390x/ipl: Provide more memory to the s390-ccw.img firmware pc-bios/s390-ccw: Use the libc from SLOF and remove sclp prints pc-bios/s390-ccw: Link the netboot code into the main s390-ccw.img binary hw/s390x: Remove the possibility to load the s390-netboot.img binary pc-bios/s390-ccw: Merge netboot.mak into the main Makefile docs/system/s390x/bootdevices: Update the documentation about network booting pc-bios/s390-ccw: Remove panics from ISO IPL path pc-bios/s390-ccw: Remove panics from ECKD IPL path pc-bios/s390-ccw: Remove panics from SCSI IPL path pc-bios/s390-ccw: Remove panics from DASD IPL path pc-bios/s390-ccw: Remove panics from Netboot IPL path pc-bios/s390-ccw: Enable failed IPL to return after error include/hw/s390x: Add include files for common IPL structs s390x: Add individual loadparm assignment to CCW device hw/s390x: Build an IPLB for each boot device s390x: Rebuild IPLB for SCSI device directly from DIAG308 pc-bios/s390x: Enable multi-device boot loop docs/system: Update documentation for s390x IPL tests/qtest: Add s390x boot order tests to cdrom-test.c docs/system/bootindex.rst | 7 +- docs/system/s390x/bootdevices.rst | 29 +- pc-bios/s390-ccw/netboot.mak | 62 ---- hw/s390x/ccw-device.h | 2 + hw/s390x/ipl.h | 123 +------- include/hw/s390x/ipl/qipl.h | 127 +++++++++ pc-bios/s390-ccw/bootmap.h | 20 +- pc-bios/s390-ccw/cio.h | 2 + pc-bios/s390-ccw/dasd-ipl.h | 2 +- pc-bios/s390-ccw/iplb.h | 108 ++----- pc-bios/s390-ccw/libc.h | 89 ------ pc-bios/s390-ccw/s390-ccw.h | 36 +-- pc-bios/s390-ccw/virtio.h | 3 +- hw/s390x/ccw-device.c | 46 +++ hw/s390x/ipl.c | 282 +++++++++--------- hw/s390x/s390-virtio-ccw.c | 28 +- hw/s390x/sclp.c | 9 +- pc-bios/s390-ccw/bootmap.c | 455 ++++++++++++++++++++---------- pc-bios/s390-ccw/cio.c | 81 +++--- pc-bios/s390-ccw/dasd-ipl.c | 71 ++--- pc-bios/s390-ccw/jump2ipl.c | 22 +- pc-bios/s390-ccw/libc.c | 88 ------ pc-bios/s390-ccw/main.c | 97 ++++--- pc-bios/s390-ccw/menu.c | 51 ++-- pc-bios/s390-ccw/netmain.c | 38 ++- pc-bios/s390-ccw/sclp.c | 7 +- pc-bios/s390-ccw/virtio-blkdev.c | 12 +- pc-bios/s390-ccw/virtio-net.c | 7 +- pc-bios/s390-ccw/virtio-scsi.c | 160 +++++++---- pc-bios/s390-ccw/virtio.c | 67 +++-- target/s390x/diag.c | 9 +- tests/qtest/cdrom-test.c | 24 ++ pc-bios/meson.build | 1 - pc-bios/s390-ccw/Makefile | 69 ++++- pc-bios/s390-netboot.img | Bin 67232 -> 0 bytes 35 files changed, 1158 insertions(+), 1076 deletions(-) delete mode 100644 pc-bios/s390-ccw/netboot.mak create mode 100644 include/hw/s390x/ipl/qipl.h delete mode 100644 pc-bios/s390-ccw/libc.h delete mode 100644 pc-bios/s390-ccw/libc.c delete mode 100644 pc-bios/s390-netboot.img