Message ID | 7578489af5c7df525d4c82231b68bbb7d955d2b4.1743678257.git-series.marmarek@invisiblethingslab.com (mailing list archive) |
---|---|
State | New |
Series | Several CI cleanups and improvements, plus yet another new runner |
On 03.04.2025 13:04, Marek Marczykowski-Górecki wrote:
> It appears as sometimes it takes more time for Xen even start booting,
> mostly due to firmware and fetching large boot files by grub. In some
> jobs the current timeout is pretty close to the actual time needed, and
> sometimes (rarely for now) test fails due to timeout expiring in the
> middle of dom0 booting. This will be happening more often if the
> initramfs will grow (and with more complex tests).

With that, ...

> This has been observed on some dom0pvh-hvm jobs, at least on runners hw3
> and hw11.
>
> Increase the timeout by yet another 60s (up to 180s now).

... is this little a bump going to be sufficient? How about moving straight
to 5min?

As to observed failing jobs - the PV Dom0 boot failure seen today looks to
also be due to too short a timeout.

Jan
On Thu, Apr 03, 2025 at 01:32:38PM +0200, Jan Beulich wrote:
> On 03.04.2025 13:04, Marek Marczykowski-Górecki wrote:
> > It appears as sometimes it takes more time for Xen even start booting,
> > mostly due to firmware and fetching large boot files by grub. In some
> > jobs the current timeout is pretty close to the actual time needed, and
> > sometimes (rarely for now) test fails due to timeout expiring in the
> > middle of dom0 booting. This will be happening more often if the
> > initramfs will grow (and with more complex tests).
>
> With that, ...
>
> > This has been observed on some dom0pvh-hvm jobs, at least on runners hw3
> > and hw11.
> >
> > Increase the timeout by yet another 60s (up to 180s now).
>
> ... is this little a bump going to be sufficient? How about moving straight
> to 5min?

I don't like this, as many (most) actual failures are visible as timeout
(for example panic that prevents reaching Alpine prompt). One
improvement I can see is splitting this into two separate timeouts: one
before seeing the first line from Xen and then the second one for
reaching Alpine login prompt. The first one can be longer as its mostly
about firmware+fetching boot files and shouldn't hit on crashes (unless
a crash happen before printing anything on the console - but those are
rare).

> As to observed failing jobs - the PV Dom0 boot failure seen today looks to
> also be due to too short a timeout.

As responded on Matrix, I'm not so sure, there is over 1m wait after
"Built 1 zonelists, mobility grouping on. Total pages: 8228487" line from
dom0 (or a bit later, due to buffering by sed), while in successful test
next lines follow instantaneously.
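The two-timeout idea described above could be sketched roughly as follows. This is only an illustration and not code from automation/scripts/qubes-x86-64.sh: the function name `wait_two_stage`, its argument order, and the patterns used in the usage note are all hypothetical. It assumes the console log arrives on stdin and uses plain substring matching.

```shell
#!/bin/bash
# Hypothetical helper (not part of the existing script).
# Reads console output on stdin and applies two separate timeouts:
#   $1 - seconds allowed until the first line containing $3 shows up
#        (firmware + fetching boot files)
#   $2 - seconds allowed after that until a line containing $4 shows up
#        (e.g. the login prompt)
# Returns 0 on success, 1 if stage 1 timed out or hit EOF,
# 2 if stage 2 timed out or hit EOF.
wait_two_stage() {
    local first_timeout=$1 final_timeout=$2 first_pat=$3 final_pat=$4
    local stage=1 line rc remaining
    local deadline=$((SECONDS + first_timeout))

    while :; do
        remaining=$((deadline - SECONDS))
        [ "$remaining" -gt 0 ] || return "$stage"

        # On timeout or EOF, bash still stores any partial line in $line,
        # so a prompt without a trailing newline can be matched below.
        rc=0
        IFS= read -r -t "$remaining" line || rc=$?

        if [ "$stage" -eq 1 ] && [[ $line == *"$first_pat"* ]]; then
            # First hypervisor output seen: switch to the second timeout.
            stage=2
            deadline=$((SECONDS + final_timeout))
        fi
        if [ "$stage" -eq 2 ] && [[ $line == *"$final_pat"* ]]; then
            return 0
        fi

        # read failed (timeout or EOF) without a final match.
        [ "$rc" -eq 0 ] || return "$stage"
    done
}
```

A call might look like `wait_two_stage 300 120 '(XEN)' 'login:' < console.log`, where both timeout values, both patterns and the log file name are placeholders. The distinct return codes make it possible to tell "Xen never started printing" apart from "boot stalled before the prompt".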
On Thu, 3 Apr 2025, Marek Marczykowski-Górecki wrote:
> On Thu, Apr 03, 2025 at 01:32:38PM +0200, Jan Beulich wrote:
> > On 03.04.2025 13:04, Marek Marczykowski-Górecki wrote:
> > > It appears as sometimes it takes more time for Xen even start booting,
> > > mostly due to firmware and fetching large boot files by grub. In some
> > > jobs the current timeout is pretty close to the actual time needed, and
> > > sometimes (rarely for now) test fails due to timeout expiring in the
> > > middle of dom0 booting. This will be happening more often if the
> > > initramfs will grow (and with more complex tests).
> >
> > With that, ...
> >
> > > This has been observed on some dom0pvh-hvm jobs, at least on runners hw3
> > > and hw11.
> > >
> > > Increase the timeout by yet another 60s (up to 180s now).
> >
> > ... is this little a bump going to be sufficient? How about moving straight
> > to 5min?

Hi Marek, would you be up for moving your script to use "expect"?
Something like ./automation/scripts/console.exp?

That way, we would immediately complete the job no matter the timeout
value. It is also nicer :-)

> I don't like this, as many (most) actual failures are visible as timeout
> (for example panic that prevents reaching Alpine prompt). One
> improvement I can see is splitting this into two separate timeouts: one
> before seeing the first line from Xen and then the second one for
> reaching Alpine login prompt. The first one can be longer as its mostly
> about firmware+fetching boot files and shouldn't hit on crashes (unless
> a crash happen before printing anything on the console - but those are
> rare).

This is also something you can very specifically tweak with expect.
On Thu, Apr 03, 2025 at 05:21:56PM -0700, Stefano Stabellini wrote:
> Hi Marek, would you be up for moving your script to use "expect"?
> Something like ./automation/scripts/console.exp?
>
> That way, we would immediately complete the job no matter the timeout
> value. It is also nicer :-)

Oh, this is excellent idea, I'll see how it fits there. It seems I'll
need at least add support for wakeup for the suspend tests, but that
shouldn't be a problem.
diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
index 8e78b7984e98..771c77d6618b 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -17,7 +17,7 @@ test_variant=$1
 ### defaults
 extra_xen_opts=
 wait_and_wakeup=
-timeout=120
+timeout=180
 domU_type="pvh"
 domU_vif="'bridge=xenbr0',"
 domU_extra_config=
It appears as sometimes it takes more time for Xen even start booting,
mostly due to firmware and fetching large boot files by grub. In some
jobs the current timeout is pretty close to the actual time needed, and
sometimes (rarely for now) test fails due to timeout expiring in the
middle of dom0 booting. This will be happening more often if the
initramfs will grow (and with more complex tests).

This has been observed on some dom0pvh-hvm jobs, at least on runners hw3
and hw11.

Increase the timeout by yet another 60s (up to 180s now).

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
 automation/scripts/qubes-x86-64.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)