Message ID | 20250307115314.1096373-11-thuth@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [PULL,01/15] tests/functional: remove unused 'bin_prefix' variable | expand |
Hi Thomas, Daniel, On 7/3/25 12:53, Thomas Huth wrote: > From: Daniel P. Berrangé <berrange@redhat.com> > > There are two race conditions in the recently added virtio balloon > test > > * The /dev/vda device node is not ready > * The virtio-balloon driver has not issued the first stats refresh > > To fix the former, monitor dmesg for a line about 'vda'. > > To fix the latter, retry the stats query until seeing fresh data. > > Adding 'quiet' to the kernel command line reduces serial output > which otherwise slows boot, making it less likely to hit the former > race too. > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > Message-ID: <20250304183340.3749797-1-berrange@redhat.com> > Reviewed-by: Thomas Huth <thuth@redhat.com> > Reviewed-by: David Hildenbrand <david@redhat.com> > [thuth: Break long line to avoid checkpatch error] > Signed-off-by: Thomas Huth <thuth@redhat.com> > --- > tests/functional/test_virtio_balloon.py | 26 ++++++++++++++++++++----- > 1 file changed, 21 insertions(+), 5 deletions(-) This fix isn't really working, and is more of band-aid IIUC. With the following patch restricting to KVM accel, do we still want it merged?
On 07/03/2025 13.34, Philippe Mathieu-Daudé wrote: > Hi Thomas, Daniel, > > On 7/3/25 12:53, Thomas Huth wrote: >> From: Daniel P. Berrangé <berrange@redhat.com> >> >> There are two race conditions in the recently added virtio balloon >> test >> >> * The /dev/vda device node is not ready >> * The virtio-balloon driver has not issued the first stats refresh >> >> To fix the former, monitor dmesg for a line about 'vda'. >> >> To fix the latter, retry the stats query until seeing fresh data. >> >> Adding 'quiet' to the kernel command line reduces serial output >> which otherwise slows boot, making it less likely to hit the former >> race too. >> >> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> >> Message-ID: <20250304183340.3749797-1-berrange@redhat.com> >> Reviewed-by: Thomas Huth <thuth@redhat.com> >> Reviewed-by: David Hildenbrand <david@redhat.com> >> [thuth: Break long line to avoid checkpatch error] >> Signed-off-by: Thomas Huth <thuth@redhat.com> >> --- >> tests/functional/test_virtio_balloon.py | 26 ++++++++++++++++++++----- >> 1 file changed, 21 insertions(+), 5 deletions(-) > > This fix isn't really working, and is more of band-aid IIUC. > > With the following patch restricting to KVM accel, do we still want it merged? Yes, I think it still makes this test more robust. Thomas
On 7/3/25 13:39, Thomas Huth wrote: > On 07/03/2025 13.34, Philippe Mathieu-Daudé wrote: >> Hi Thomas, Daniel, >> >> On 7/3/25 12:53, Thomas Huth wrote: >>> From: Daniel P. Berrangé <berrange@redhat.com> >>> >>> There are two race conditions in the recently added virtio balloon >>> test >>> >>> * The /dev/vda device node is not ready >>> * The virtio-balloon driver has not issued the first stats refresh >>> >>> To fix the former, monitor dmesg for a line about 'vda'. >>> >>> To fix the latter, retry the stats query until seeing fresh data. >>> >>> Adding 'quiet' to the kernel command line reduces serial output >>> which otherwise slows boot, making it less likely to hit the former >>> race too. >>> >>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> >>> Message-ID: <20250304183340.3749797-1-berrange@redhat.com> >>> Reviewed-by: Thomas Huth <thuth@redhat.com> >>> Reviewed-by: David Hildenbrand <david@redhat.com> >>> [thuth: Break long line to avoid checkpatch error] >>> Signed-off-by: Thomas Huth <thuth@redhat.com> >>> --- >>> tests/functional/test_virtio_balloon.py | 26 ++++++++++++++++++++----- >>> 1 file changed, 21 insertions(+), 5 deletions(-) >> >> This fix isn't really working, and is more of band-aid IIUC. >> >> With the following patch restricting to KVM accel, do we still want it >> merged? > > Yes, I think it still makes this test more robust. Fine then, thanks.
On Fri, Mar 07, 2025 at 01:39:17PM +0100, Thomas Huth wrote: > On 07/03/2025 13.34, Philippe Mathieu-Daudé wrote: > > Hi Thomas, Daniel, > > > > On 7/3/25 12:53, Thomas Huth wrote: > > > From: Daniel P. Berrangé <berrange@redhat.com> > > > > > > There are two race conditions in the recently added virtio balloon > > > test > > > > > > * The /dev/vda device node is not ready > > > * The virtio-balloon driver has not issued the first stats refresh > > > > > > To fix the former, monitor dmesg for a line about 'vda'. > > > > > > To fix the latter, retry the stats query until seeing fresh data. > > > > > > Adding 'quiet' to the kernel command line reduces serial output > > > which otherwise slows boot, making it less likely to hit the former > > > race too. > > > > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > > > Message-ID: <20250304183340.3749797-1-berrange@redhat.com> > > > Reviewed-by: Thomas Huth <thuth@redhat.com> > > > Reviewed-by: David Hildenbrand <david@redhat.com> > > > [thuth: Break long line to avoid checkpatch error] > > > Signed-off-by: Thomas Huth <thuth@redhat.com> > > > --- > > > tests/functional/test_virtio_balloon.py | 26 ++++++++++++++++++++----- > > > 1 file changed, 21 insertions(+), 5 deletions(-) > > > > This fix isn't really working, and is more of band-aid IIUC. > > > > With the following patch restricting to KVM accel, do we still want it merged? > > Yes, I think it still makes this test more robust. Agreed, even though this patch isn't perfect it is definitely fixing two clear race conditions. With regards, Daniel
diff --git a/tests/functional/test_virtio_balloon.py b/tests/functional/test_virtio_balloon.py index 67b48e1b4e8..082bf08c4e8 100755 --- a/tests/functional/test_virtio_balloon.py +++ b/tests/functional/test_virtio_balloon.py @@ -32,7 +32,7 @@ class VirtioBalloonx86(QemuSystemTest): 'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0') DEFAULT_KERNEL_PARAMS = ('root=/dev/vda1 console=ttyS0 net.ifnames=0 ' - 'rd.rescue') + 'rd.rescue quiet') def wait_for_console_pattern(self, success_message, vm=None): wait_for_console_pattern( @@ -47,6 +47,11 @@ def mount_root(self): prompt = '# ' self.wait_for_console_pattern(prompt) + # Synchronize on virtio-block driver creating the root device + exec_command_and_wait_for_pattern(self, + "while ! (dmesg -c | grep vda:) ; do sleep 1 ; done", + "vda1") + exec_command_and_wait_for_pattern(self, 'mount /dev/vda1 /sysroot', prompt) exec_command_and_wait_for_pattern(self, 'chroot /sysroot', @@ -65,10 +70,21 @@ def assert_initial_stats(self): assert val == UNSET_STATS_VALUE def assert_running_stats(self, then): - ret = self.vm.qmp('qom-get', - {'path': '/machine/peripheral/balloon', - 'property': 'guest-stats'})['return'] - when = ret.get('last-update') + # We told the QEMU to refresh stats every 100ms, but + # there can be a delay between virtio-ballon driver + # being modprobed and seeing the first stats refresh + # Retry a few times for robustness under heavy load + retries = 10 + when = 0 + while when == 0 and retries: + ret = self.vm.qmp('qom-get', + {'path': '/machine/peripheral/balloon', + 'property': 'guest-stats'})['return'] + when = ret.get('last-update') + if when == 0: + retries = retries - 1 + time.sleep(0.5) + now = time.time() assert when > then and when < now