diff mbox series

blktests: fix how we handle waiting for nbd to connect

Message ID 9377610cbdc3568c172cd7c5d2e9d36da8dd2cf4.1716312272.git.josef@toxicpanda.com (mailing list archive)
State New, archived
Headers show
Series blktests: fix how we handle waiting for nbd to connect | expand

Commit Message

Josef Bacik May 21, 2024, 5:24 p.m. UTC
Because NBD has the old style configure the device directly config we
sometimes have spurious failures where the device wasn't quite ready
before the rest of the test continued.

nbd/002 had been using _wait_for_nbd_connect, however this helper waits
for the recv task to show up, which actually happens slightly before the
size is set and we're actually ready to be read from.  This means we
would sometimes fail nbd/002 because the device wasn't quite right.

Additionally nbd/001 has a similar issue where we weren't waiting for
the device to be ready before going ahead with the test, which would
cause spurious failures.

Fix this by adjusting _wait_for_nbd_connect to only exit once the size
for the device is being reported properly, meaning that it's ready to be
read from.

Then add this call to nbd/001 to eliminate the random failures we would
see with this test.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 tests/nbd/001 | 1 +
 tests/nbd/rc  | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

Comments

Shin'ichiro Kawasaki May 22, 2024, 7:02 a.m. UTC | #1
On May 21, 2024 / 13:24, Josef Bacik wrote:
> Because NBD has the old style configure the device directly config we
> sometimes have spurious failures where the device wasn't quite ready
> before the rest of the test continued.
> 
> nbd/002 had been using _wait_for_nbd_connect, however this helper waits
> for the recv task to show up, which actually happens slightly before the
> size is set and we're actually ready to be read from.  This means we
> would sometimes fail nbd/002 because the device wasn't quite right.
> 
> Additionally nbd/001 has a similar issue where we weren't waiting for
> the device to be ready before going ahead with the test, which would
> cause spurious failures.
> 
> Fix this by adjusting _wait_for_nbd_connect to only exit once the size
> for the device is being reported properly, meaning that it's ready to be
> read from.
> 
> Then add this call to nbd/001 to eliminate the random failures we would
> see with this test.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Josef, thank you very much. I've been seeing nbd/001 on my test system and this
patch avoids it :) The change looks good to me. I will leave it for several
days before applying it, in case anyone has some more comments.
Shin'ichiro Kawasaki May 27, 2024, 7:49 a.m. UTC | #2
On May 22, 2024 / 07:02, Shinichiro Kawasaki wrote:
> On May 21, 2024 / 13:24, Josef Bacik wrote:
> > Because NBD has the old style configure the device directly config we
> > sometimes have spurious failures where the device wasn't quite ready
> > before the rest of the test continued.
> > 
> > nbd/002 had been using _wait_for_nbd_connect, however this helper waits
> > for the recv task to show up, which actually happens slightly before the
> > size is set and we're actually ready to be read from.  This means we
> > would sometimes fail nbd/002 because the device wasn't quite right.
> > 
> > Additionally nbd/001 has a similar issue where we weren't waiting for
> > the device to be ready before going ahead with the test, which would
> > cause spurious failures.
> > 
> > Fix this by adjusting _wait_for_nbd_connect to only exit once the size
> > for the device is being reported properly, meaning that it's ready to be
> > read from.
> > 
> > Then add this call to nbd/001 to eliminate the random failures we would
> > see with this test.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> 
> Josef, thank you very much. I've been seeing nbd/001 on my test system and this
> patch avoids it :) The change looks good to me. I will leave it for several
> days before applying it, in case anyone has some more comments.

I've applied it. Thanks!
Shin'ichiro Kawasaki May 31, 2024, 1:47 a.m. UTC | #3
On May 27, 2024 / 07:49, Shinichiro Kawasaki wrote:
> On May 22, 2024 / 07:02, Shinichiro Kawasaki wrote:
> > On May 21, 2024 / 13:24, Josef Bacik wrote:
> > > Because NBD has the old style configure the device directly config we
> > > sometimes have spurious failures where the device wasn't quite ready
> > > before the rest of the test continued.
> > > 
> > > nbd/002 had been using _wait_for_nbd_connect, however this helper waits
> > > for the recv task to show up, which actually happens slightly before the
> > > size is set and we're actually ready to be read from.  This means we
> > > would sometimes fail nbd/002 because the device wasn't quite right.
> > > 
> > > Additionally nbd/001 has a similar issue where we weren't waiting for
> > > the device to be ready before going ahead with the test, which would
> > > cause spurious failures.
> > > 
> > > Fix this by adjusting _wait_for_nbd_connect to only exit once the size
> > > for the device is being reported properly, meaning that it's ready to be
> > > read from.
> > > 
> > > Then add this call to nbd/001 to eliminate the random failures we would
> > > see with this test.
> > > 
> > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > 
> > Josef, thank you very much. I've been seeing nbd/001 on my test system and this
> > patch avoids it :) The change looks good to me. I will leave it for several
> > days before applying it, in case anyone has some more comments.
> 
> I've applied it. Thanks!

To: Yi Zhang

Yi, I guess this fix may avoid the nbd/002 failure that CKI reports recently
[*]. To apply the fix, could you rebase the blktests for CKI runs?

[*] https://datawarehouse.cki-project.org/kcidb/tests/12631448
Yi Zhang May 31, 2024, 2:32 a.m. UTC | #4
On Fri, May 31, 2024 at 9:47 AM Shinichiro Kawasaki
<shinichiro.kawasaki@wdc.com> wrote:
>
> On May 27, 2024 / 07:49, Shinichiro Kawasaki wrote:
> > On May 22, 2024 / 07:02, Shinichiro Kawasaki wrote:
> > > On May 21, 2024 / 13:24, Josef Bacik wrote:
> > > > Because NBD has the old style configure the device directly config we
> > > > sometimes have spurious failures where the device wasn't quite ready
> > > > before the rest of the test continued.
> > > >
> > > > nbd/002 had been using _wait_for_nbd_connect, however this helper waits
> > > > for the recv task to show up, which actually happens slightly before the
> > > > size is set and we're actually ready to be read from.  This means we
> > > > would sometimes fail nbd/002 because the device wasn't quite right.
> > > >
> > > > Additionally nbd/001 has a similar issue where we weren't waiting for
> > > > the device to be ready before going ahead with the test, which would
> > > > cause spurious failures.
> > > >
> > > > Fix this by adjusting _wait_for_nbd_connect to only exit once the size
> > > > for the device is being reported properly, meaning that it's ready to be
> > > > read from.
> > > >
> > > > Then add this call to nbd/001 to eliminate the random failures we would
> > > > see with this test.
> > > >
> > > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > >
> > > Josef, thank you very much. I've been seeing nbd/001 on my test system and this
> > > patch avoids it :) The change looks good to me. I will leave it for several
> > > days before applying it, in case anyone has some more comments.
> >
> > I've applied it. Thanks!
>
> To: Yi Zhang
>
> Yi, I guess this fix may avoid the nbd/002 failure that CKI reports recently
> [*]. To apply the fix, could you rebase the blktests for CKI runs?

Sure, will apply this patch and verify the failure.

>
> [*] https://datawarehouse.cki-project.org/kcidb/tests/12631448
>
diff mbox series

Patch

diff --git a/tests/nbd/001 b/tests/nbd/001
index 5fd0d43..0975af0 100755
--- a/tests/nbd/001
+++ b/tests/nbd/001
@@ -18,6 +18,7 @@  test() {
 	echo "Running ${TEST_NAME}"
 	_start_nbd_server
 	nbd-client -L -N export localhost /dev/nbd0 >> "$FULL" 2>&1
+	_wait_for_nbd_connect
 	udevadm settle
 
 	parted -s /dev/nbd0 print 2>> "$FULL" | grep 'Disk /dev/nbd0'
diff --git a/tests/nbd/rc b/tests/nbd/rc
index 9c1c15b..e96dc61 100644
--- a/tests/nbd/rc
+++ b/tests/nbd/rc
@@ -43,7 +43,8 @@  _have_nbd_netlink() {
 
 _wait_for_nbd_connect() {
 	for ((i = 0; i < 3; i++)); do
-		if [[ -e /sys/kernel/debug/nbd/nbd0/tasks ]]; then
+		sz=$(lsblk --raw --noheadings -o SIZE /dev/nbd0)
+		if [ "$sz" != "0B"  ]; then
 			return 0
 		fi
 		sleep 1