Message ID | 20231109154922.49118-1-anthony.perard@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [XEN] CI: Rework RISCV smoke test | expand |
Hello Anthony, On Thu, 2023-11-09 at 15:49 +0000, Anthony PERARD wrote: > Currently, the test rely on QEMU and Xen finishing the boot in under > two seconds. That's both very long and very short. Xen usually > managed > to print "All set up" under a second. Unless for some reason we try > to > run the test on a machine that's busy doing something else. > > Rework the test to exit as soon as Xen is done. > > There's two `tail -f`, the first one is there simply to monitor test > progress in GitLab console. The second one is used to detect the test > result as soon as QEMU add it to the file. Both `tail` exit as soon > as > QEMU exit. > > If QEMU fails at start, and exit early, both `tail` will simply exit, > resulting in a failure. > > If the line we are looking for is never printed, the `timeout` on the > second `tail` will force the test to exit with an error. > > Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> > --- > > Notes: > The "machine might be busy" bits refere to having a gitlab-runner > running two jobs, one been a build with no restriction on CPU > resource use. > > We could use "-daemonize" with "-pidfile" to more easly detect > qemu's > initialisation failure, but needs "-display none" instead of > "-nographic" > > automation/scripts/qemu-smoke-riscv64.sh | 32 ++++++++++++++++++---- > -- > 1 file changed, 24 insertions(+), 8 deletions(-) > > diff --git a/automation/scripts/qemu-smoke-riscv64.sh > b/automation/scripts/qemu-smoke-riscv64.sh > index 4008191302..ba7b61db8b 100755 > --- a/automation/scripts/qemu-smoke-riscv64.sh > +++ b/automation/scripts/qemu-smoke-riscv64.sh > @@ -2,19 +2,35 @@ > > set -ex > > -# Run the test > -rm -f smoke.serial > -set +e > +# Truncate or create serial output file > +echo -n > smoke.serial > > -timeout -k 1 2 \ > +# cleanup: kill QEMU when the script exit for any reason > +qemu_pid= > +cleanup() { > + if [ "$qemu_pid" ]; then > + kill -s SIGKILL "$qemu_pid" > + fi > +} > + > +trap 'cleanup' EXIT > + > +# Run the test > qemu-system-riscv64 \ > -M virt \ > -smp 1 \ > -nographic \ > -m 2g \ > + -monitor none \ > + -chardev file,id=serial-out,path=smoke.serial \ > + -serial chardev:serial-out \ > -kernel binaries/xen \ > - |& tee smoke.serial > + & > +qemu_pid=$! > > -set -e > -(grep -q "All set up" smoke.serial) || exit 1 > -exit 0 > +# Monitor test progression until QEMU exit > +tail --pid=$qemu_pid -f smoke.serial & > + > +# Check boot test result > +timeout 60 tail --pid=$qemu_pid -f smoke.serial | \ > + grep -a -q "All set up" I am OK with provided changes. Thanks! Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> ~ Oleksii
On 09/11/2023 3:49 pm, Anthony PERARD wrote: > Currently, the test rely on QEMU and Xen finishing the boot in under > two seconds. That's both very long and very short. Xen usually managed > to print "All set up" under a second. Unless for some reason we try to > run the test on a machine that's busy doing something else. > > Rework the test to exit as soon as Xen is done. > > There's two `tail -f`, the first one is there simply to monitor test > progress in GitLab console. The second one is used to detect the test > result as soon as QEMU add it to the file. Both `tail` exit as soon as > QEMU exit. > > If QEMU fails at start, and exit early, both `tail` will simply exit, > resulting in a failure. > > If the line we are looking for is never printed, the `timeout` on the > second `tail` will force the test to exit with an error. > > Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Looks plausible, but all these qemu-smoke scripts are pretty similar, and copied from one-another. We should make this change consistently to all testing (there's nothing RISC-V specific about why this test is failing on this runner), and it would be really nice if we could try to make it a bit more common than it currently is. ~Andrew
On Thu, 9 Nov 2023, Andrew Cooper wrote: > On 09/11/2023 3:49 pm, Anthony PERARD wrote: > > Currently, the test rely on QEMU and Xen finishing the boot in under > > two seconds. That's both very long and very short. Xen usually managed > > to print "All set up" under a second. Unless for some reason we try to > > run the test on a machine that's busy doing something else. > > > > Rework the test to exit as soon as Xen is done. > > > > There's two `tail -f`, the first one is there simply to monitor test > > progress in GitLab console. The second one is used to detect the test > > result as soon as QEMU add it to the file. Both `tail` exit as soon as > > QEMU exit. > > > > If QEMU fails at start, and exit early, both `tail` will simply exit, > > resulting in a failure. > > > > If the line we are looking for is never printed, the `timeout` on the > > second `tail` will force the test to exit with an error. > > > > Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> > > Looks plausible, but all these qemu-smoke scripts are pretty similar, > and copied from one-another. > > We should make this change consistently to all testing (there's nothing > RISC-V specific about why this test is failing on this runner), and it > would be really nice if we could try to make it a bit more common than > it currently is. One idea you might like is do to the following (this was developed by Oleksandr). --- # Run the test rm -f smoke.serial set +e export qemu_cmd="./binaries/qemu-system-aarch64 \ -machine virtualization=true \ -cpu cortex-a57 -machine type=virt \ -m 2048 -monitor none -serial stdio \ -smp 2 \ -no-reboot \ -device virtio-net-pci,netdev=n0 \ -netdev user,id=n0,tftp=binaries \ -bios /usr/lib/u-boot/qemu_arm64/u-boot.bin" export qemu_arg="virtio scan; dhcp; tftpb 0x40000000 boot.scr; source 0x40000000" export qemu_log="smoke.serial" export LOG_MSG="Run /init as init process" timeout -k 1 60 ./automation/scripts/qemu_key.sh ### qemu_key.sh is using "expect", see below. I think we should be able ### to achieve the same by using expect to close on the expected string ### (instead of waiting for eof) #!/usr/bin/expect -f set timeout -1 log_file -a $env(qemu_log) match_max 10000 eval spawn $env(qemu_cmd) expect "=>" send "$env(qemu_arg)\r" expect "$env(LOG_MSG)\r" send "$env(XEN_CMD)\r" expect eof
On Thu, Nov 09, 2023 at 05:02:08PM -0800, Stefano Stabellini wrote: > ### qemu_key.sh is using "expect", see below. I think we should be able > ### to achieve the same by using expect to close on the expected string > ### (instead of waiting for eof) So, `expect` is just a different kind of language than shell is? Also, `grep -q` doesn't wait for EOF, and just exit as soon as it found the pattern. Cheers,
On Thu, Nov 09, 2023 at 04:52:36PM +0000, Andrew Cooper wrote: > On 09/11/2023 3:49 pm, Anthony PERARD wrote: > > Currently, the test rely on QEMU and Xen finishing the boot in under > > two seconds. That's both very long and very short. Xen usually managed > > to print "All set up" under a second. Unless for some reason we try to > > run the test on a machine that's busy doing something else. > > > > Rework the test to exit as soon as Xen is done. > > > > There's two `tail -f`, the first one is there simply to monitor test > > progress in GitLab console. The second one is used to detect the test > > result as soon as QEMU add it to the file. Both `tail` exit as soon as > > QEMU exit. > > > > If QEMU fails at start, and exit early, both `tail` will simply exit, > > resulting in a failure. > > > > If the line we are looking for is never printed, the `timeout` on the > > second `tail` will force the test to exit with an error. > > > > Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> > > Looks plausible, but all these qemu-smoke scripts are pretty similar, > and copied from one-another. > > We should make this change consistently to all testing (there's nothing > RISC-V specific about why this test is failing on this runner), and it > would be really nice if we could try to make it a bit more common than > it currently is. Yes, it would be nice if a change to a qemu-smoke script was applied to every other one. But making those scripts more generic is a lot more works, which would be useful to apply a change once for all. The problem I'm trying to resolve only appear with this script, because of a timeout been too short, a solution could just be to increase the timeout (or not allowing runners to do more than one thing at a time). BTW, the last time I've been told to apply a change to other things, I never managed to finish it and the change wasn't applied at all (would have result in some containers been smaller). I guess will see if anyone complain about the test randomly failing. :-) Cheers,
+Artem, Oleksandr On Mon, 13 Nov 2023, Anthony PERARD wrote: > On Thu, Nov 09, 2023 at 05:02:08PM -0800, Stefano Stabellini wrote: > > ### qemu_key.sh is using "expect", see below. I think we should be able > > ### to achieve the same by using expect to close on the expected string > > ### (instead of waiting for eof) > > So, `expect` is just a different kind of language than shell is? > Also, `grep -q` doesn't wait for EOF, and just exit as soon as it found > the pattern.
On Mon, 13 Nov 2023, Anthony PERARD wrote: > On Thu, Nov 09, 2023 at 04:52:36PM +0000, Andrew Cooper wrote: > > On 09/11/2023 3:49 pm, Anthony PERARD wrote: > > > Currently, the test rely on QEMU and Xen finishing the boot in under > > > two seconds. That's both very long and very short. Xen usually managed > > > to print "All set up" under a second. Unless for some reason we try to > > > run the test on a machine that's busy doing something else. > > > > > > Rework the test to exit as soon as Xen is done. > > > > > > There's two `tail -f`, the first one is there simply to monitor test > > > progress in GitLab console. The second one is used to detect the test > > > result as soon as QEMU add it to the file. Both `tail` exit as soon as > > > QEMU exit. > > > > > > If QEMU fails at start, and exit early, both `tail` will simply exit, > > > resulting in a failure. > > > > > > If the line we are looking for is never printed, the `timeout` on the > > > second `tail` will force the test to exit with an error. > > > > > > Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> > > > > Looks plausible, but all these qemu-smoke scripts are pretty similar, > > and copied from one-another. > > > > We should make this change consistently to all testing (there's nothing > > RISC-V specific about why this test is failing on this runner), and it > > would be really nice if we could try to make it a bit more common than > > it currently is. > > Yes, it would be nice if a change to a qemu-smoke script was applied to > every other one. But making those scripts more generic is a lot more > works, which would be useful to apply a change once for all. > > The problem I'm trying to resolve only appear with this script, because > of a timeout been too short, a solution could just be to increase the > timeout (or not allowing runners to do more than one thing at a time). > > BTW, the last time I've been told to apply a change to other things, I > never managed to finish it and the change wasn't applied at all (would > have result in some containers been smaller). > > I guess will see if anyone complain about the test randomly failing. :-) Hi Anthony, I think it is OK if you only fix this script for now. As part of the tests we are writing for Xen FuSa we want to improve the basic gitlab testing infrastructure and make the code common like Andrew asked (we haven't done it yet but soon). We should get to it in the next few months.
Hi Stefano, Anthony On Mon, 13 Nov 2023, Anthony PERARD wrote: > On Thu, Nov 09, 2023 at 05:02:08PM -0800, Stefano Stabellini wrote: > > ### qemu_key.sh is using "expect", see below. I think we should be able > > ### to achieve the same by using expect to close on the expected string > > ### (instead of waiting for eof) > > So, `expect` is just a different kind of language than shell is? > Also, `grep -q` doesn't wait for EOF, and just exit as soon as it found > the pattern. Just a brief background on `expect`: it is a nice tool widely used in testing of embedded Linux systems with CLI. It is popular because it is small and available for basically every distro including very old ones, and it is very flexible. We were using it for ages in automated testing, so for me it was no-brainer to pick it when we needed to "interact" with console. One minor drawback though is that `expect` uses Tcl for scripting, not shell :) -- Artem
diff --git a/automation/scripts/qemu-smoke-riscv64.sh b/automation/scripts/qemu-smoke-riscv64.sh index 4008191302..ba7b61db8b 100755 --- a/automation/scripts/qemu-smoke-riscv64.sh +++ b/automation/scripts/qemu-smoke-riscv64.sh @@ -2,19 +2,35 @@ set -ex -# Run the test -rm -f smoke.serial -set +e +# Truncate or create serial output file +echo -n > smoke.serial -timeout -k 1 2 \ +# cleanup: kill QEMU when the script exit for any reason +qemu_pid= +cleanup() { + if [ "$qemu_pid" ]; then + kill -s SIGKILL "$qemu_pid" + fi +} + +trap 'cleanup' EXIT + +# Run the test qemu-system-riscv64 \ -M virt \ -smp 1 \ -nographic \ -m 2g \ + -monitor none \ + -chardev file,id=serial-out,path=smoke.serial \ + -serial chardev:serial-out \ -kernel binaries/xen \ - |& tee smoke.serial + & +qemu_pid=$! -set -e -(grep -q "All set up" smoke.serial) || exit 1 -exit 0 +# Monitor test progression until QEMU exit +tail --pid=$qemu_pid -f smoke.serial & + +# Check boot test result +timeout 60 tail --pid=$qemu_pid -f smoke.serial | \ + grep -a -q "All set up"
Currently, the test rely on QEMU and Xen finishing the boot in under two seconds. That's both very long and very short. Xen usually managed to print "All set up" under a second. Unless for some reason we try to run the test on a machine that's busy doing something else. Rework the test to exit as soon as Xen is done. There's two `tail -f`, the first one is there simply to monitor test progress in GitLab console. The second one is used to detect the test result as soon as QEMU add it to the file. Both `tail` exit as soon as QEMU exit. If QEMU fails at start, and exit early, both `tail` will simply exit, resulting in a failure. If the line we are looking for is never printed, the `timeout` on the second `tail` will force the test to exit with an error. Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> --- Notes: The "machine might be busy" bits refere to having a gitlab-runner running two jobs, one been a build with no restriction on CPU resource use. We could use "-daemonize" with "-pidfile" to more easly detect qemu's initialisation failure, but needs "-display none" instead of "-nographic" automation/scripts/qemu-smoke-riscv64.sh | 32 ++++++++++++++++++------ 1 file changed, 24 insertions(+), 8 deletions(-)