diff mbox series

[blktests,v3] nvme/052: wait for namespace removal before recreating namespace

Message ID 20240902105454.1244406-1-shinichiro.kawasaki@wdc.com (mailing list archive)
State New, archived
Headers show
Series [blktests,v3] nvme/052: wait for namespace removal before recreating namespace | expand

Commit Message

Shin'ichiro Kawasaki Sept. 2, 2024, 10:54 a.m. UTC
The CKI project reported that the test case nvme/052 fails occasionally
with the errors below:

  nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [failed]
      runtime    ...  22.209s
      --- tests/nvme/052.out    2024-07-30 18:38:29.041716566 -0400
      +++
+/mnt/tests/gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/archive/production/kernel-t\
ests-production.zip/storage/blktests/nvme/nvme-loop/blktests
+/results/nodev_tr_loop/nvme/052.out.bad        2024-07-30 18:45:35.438067452 -0400
      @@ -1,2 +1,4 @@
       Running nvme/052
      +cat: /sys/block/nvme1n2/uuid: No such file or directory
      +cat: /sys/block/nvme1n2/uuid: No such file or directory
       Test complete

The test case repeats creating and removing namespaces. When the test
case removes the namespace by echoing 0 to the sysfs enable file, this
echo write does not wait for the completion of the namespace removal.
Before the removal completes, the test case recreates the namespace.
At this point, the sysfs uuid file for the old namespace still exists.
The test case misunderstands that the the sysfs uuid file would be for
the recreated namespace, and tries to read it. However, the removal
process for the old namespace deletes the sysfs uuid file at this point.
Then the read attempt fails and results in the errors.

To avoid the failure, wait for the namespace removal before recreating
the namespace. Modify nvmf_wait_for_ns() so that it can wait for
namespace creation and removal. Call nvmf_wait_for_ns() for removal
after _remove_nvmet_ns() call. While at it, reduce the wait time unit
from 1 second to 0.1 second to shorten test time.

The test case intends to catch the regression fixed by the kernel commit
ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit
from the kernel v6.11-rc4, then confirmed that the test case still can
catch the regression with this change.

Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/
Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
Changes from v2:
* Avoided conditional stderr redirect of _find_nvme_ns

Changes from v1:
* Reused nvmf_wait_for_ns() instead of introuducing nvmf_wait_for_ns_removal()
* Added check of nvme_wait_for_ns() return value

 tests/nvme/052 | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

Comments

Nilay Shroff Sept. 2, 2024, 5:15 p.m. UTC | #1
On 9/2/24 16:24, Shin'ichiro Kawasaki wrote:
> The CKI project reported that the test case nvme/052 fails occasionally
> with the errors below:
> 
>   nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [failed]
>       runtime    ...  22.209s
>       --- tests/nvme/052.out    2024-07-30 18:38:29.041716566 -0400
>       +++
> +/mnt/tests/gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/archive/production/kernel-t\
> ests-production.zip/storage/blktests/nvme/nvme-loop/blktests
> +/results/nodev_tr_loop/nvme/052.out.bad        2024-07-30 18:45:35.438067452 -0400
>       @@ -1,2 +1,4 @@
>        Running nvme/052
>       +cat: /sys/block/nvme1n2/uuid: No such file or directory
>       +cat: /sys/block/nvme1n2/uuid: No such file or directory
>        Test complete
> 
> The test case repeats creating and removing namespaces. When the test
> case removes the namespace by echoing 0 to the sysfs enable file, this
> echo write does not wait for the completion of the namespace removal.
> Before the removal completes, the test case recreates the namespace.
> At this point, the sysfs uuid file for the old namespace still exists.
> The test case misunderstands that the the sysfs uuid file would be for
> the recreated namespace, and tries to read it. However, the removal
> process for the old namespace deletes the sysfs uuid file at this point.
> Then the read attempt fails and results in the errors.
> 
> To avoid the failure, wait for the namespace removal before recreating
> the namespace. Modify nvmf_wait_for_ns() so that it can wait for
> namespace creation and removal. Call nvmf_wait_for_ns() for removal
> after _remove_nvmet_ns() call. While at it, reduce the wait time unit
> from 1 second to 0.1 second to shorten test time.
> 
> The test case intends to catch the regression fixed by the kernel commit
> ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit
> from the kernel v6.11-rc4, then confirmed that the test case still can
> catch the regression with this change.
> 
> Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/
> Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns")
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
> Changes from v2:
> * Avoided conditional stderr redirect of _find_nvme_ns
> 
> Changes from v1:
> * Reused nvmf_wait_for_ns() instead of introuducing nvmf_wait_for_ns_removal()
> * Added check of nvme_wait_for_ns() return value
> 
>  tests/nvme/052 | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/tests/nvme/052 b/tests/nvme/052
> index cf6061a..401f043 100755
> --- a/tests/nvme/052
> +++ b/tests/nvme/052
> @@ -20,23 +20,27 @@ set_conditions() {
>  	_set_nvme_trtype "$@"
>  }
>  
> +# Wait for the namespace with specified uuid to fulfill the specified condtion,
> +# "created" or "removed".
>  nvmf_wait_for_ns() {
>  	local ns
>  	local timeout="5"
>  	local uuid="$1"
> +	local condition="$2"
>  
> -	ns=$(_find_nvme_ns "${uuid}")
> +	ns=$(_find_nvme_ns "${uuid}" 2>> "$FULL")
>  
>  	start_time=$(date +%s)
> -	while [[ -z "$ns" ]]; do
> -		sleep 1
> +	while [[ -z "$ns" && "$condition" == created ]] ||
> +		      [[ -n "$ns" && "$condition" == removed ]]; do
> +		sleep .1
>  		end_time=$(date +%s)
>  		if (( end_time - start_time > timeout )); then
>  			echo "namespace with uuid \"${uuid}\" not " \
> -				"found within ${timeout} seconds"
> +				"${condition} within ${timeout} seconds"
>  			return 1
>  		fi
> -		ns=$(_find_nvme_ns "${uuid}")
> +		ns=$(_find_nvme_ns "${uuid}" 2>> "$FULL")
>  	done
>  
>  	return 0
> @@ -63,14 +67,21 @@ test() {
>  		_create_nvmet_ns "${def_subsysnqn}" "${i}" "$(_nvme_def_file_path).$i" "${uuid}"
>  
>  		# wait until async request is processed and ns is created
> -		nvmf_wait_for_ns "${uuid}"
> -		if [ $? -eq 1 ]; then
> +		if ! nvmf_wait_for_ns "${uuid}" created; then
>  			echo "FAIL"
>  			rm "$(_nvme_def_file_path).$i"
>  			break
>  		fi
>  
>  		_remove_nvmet_ns "${def_subsysnqn}" "${i}"
> +
> +		# wait until async request is processed and ns is removed
> +		if ! nvmf_wait_for_ns "${uuid}" removed; then
> +			echo "FAIL"
> +			rm "$(_nvme_def_file_path).$i"
> +			break
> +		fi
> +
>  		rm "$(_nvme_def_file_path).$i"
>  	}
>  	done

Looks good.
Reviewed-by: Nilay Shroff (nilay@linux.ibm.com)
Shin'ichiro Kawasaki Sept. 6, 2024, 8:22 a.m. UTC | #2
On Sep 02, 2024 / 19:54, Shin'ichiro Kawasaki wrote:
> The CKI project reported that the test case nvme/052 fails occasionally
> with the errors below:
[...]
> To avoid the failure, wait for the namespace removal before recreating
> the namespace. Modify nvmf_wait_for_ns() so that it can wait for
> namespace creation and removal. Call nvmf_wait_for_ns() for removal
> after _remove_nvmet_ns() call. While at it, reduce the wait time unit
> from 1 second to 0.1 second to shorten test time.
> 
> The test case intends to catch the regression fixed by the kernel commit
> ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit
> from the kernel v6.11-rc4, then confirmed that the test case still can
> catch the regression with this change.
> 
> Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/
> Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns")
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

I applied the patch. Nilay, thanks for your help!
Yi Zhang Sept. 7, 2024, 6:37 a.m. UTC | #3
On Fri, Sep 6, 2024 at 4:22 PM Shinichiro Kawasaki
<shinichiro.kawasaki@wdc.com> wrote:
>
> On Sep 02, 2024 / 19:54, Shin'ichiro Kawasaki wrote:
> > The CKI project reported that the test case nvme/052 fails occasionally
> > with the errors below:
> [...]
> > To avoid the failure, wait for the namespace removal before recreating
> > the namespace. Modify nvmf_wait_for_ns() so that it can wait for
> > namespace creation and removal. Call nvmf_wait_for_ns() for removal
> > after _remove_nvmet_ns() call. While at it, reduce the wait time unit
> > from 1 second to 0.1 second to shorten test time.
> >
> > The test case intends to catch the regression fixed by the kernel commit
> > ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit
> > from the kernel v6.11-rc4, then confirmed that the test case still can
> > catch the regression with this change.
> >
> > Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/
> > Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns")
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>
> I applied the patch. Nilay, thanks for your help!
>

Thanks for the fix, I've verified the fix on the reproduced server.
BTW, after the change, the executing time also reduced to 7s from 22s


===============================294
nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [passed]
    runtime  7.759s  ...  7.673s
Shin'ichiro Kawasaki Sept. 8, 2024, 11:36 p.m. UTC | #4
On Sep 07, 2024 / 14:37, Yi Zhang wrote:
> On Fri, Sep 6, 2024 at 4:22 PM Shinichiro Kawasaki
> <shinichiro.kawasaki@wdc.com> wrote:
> >
> > On Sep 02, 2024 / 19:54, Shin'ichiro Kawasaki wrote:
> > > The CKI project reported that the test case nvme/052 fails occasionally
> > > with the errors below:
> > [...]
> > > To avoid the failure, wait for the namespace removal before recreating
> > > the namespace. Modify nvmf_wait_for_ns() so that it can wait for
> > > namespace creation and removal. Call nvmf_wait_for_ns() for removal
> > > after _remove_nvmet_ns() call. While at it, reduce the wait time unit
> > > from 1 second to 0.1 second to shorten test time.
> > >
> > > The test case intends to catch the regression fixed by the kernel commit
> > > ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit
> > > from the kernel v6.11-rc4, then confirmed that the test case still can
> > > catch the regression with this change.
> > >
> > > Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/
> > > Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns")
> > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >
> > I applied the patch. Nilay, thanks for your help!
> >
> 
> Thanks for the fix, I've verified the fix on the reproduced server.

Thanks for the confirmation :)

> BTW, after the change, the executing time also reduced to 7s from 22s

This runtime reduction is expected. I suggested a change for it, and it was
discussed here:

  https://lore.kernel.org/linux-block/0750187c-24ad-4073-9ba1-d47b0ee95062@linux.ibm.com/
diff mbox series

Patch

diff --git a/tests/nvme/052 b/tests/nvme/052
index cf6061a..401f043 100755
--- a/tests/nvme/052
+++ b/tests/nvme/052
@@ -20,23 +20,27 @@  set_conditions() {
 	_set_nvme_trtype "$@"
 }
 
+# Wait for the namespace with specified uuid to fulfill the specified condtion,
+# "created" or "removed".
 nvmf_wait_for_ns() {
 	local ns
 	local timeout="5"
 	local uuid="$1"
+	local condition="$2"
 
-	ns=$(_find_nvme_ns "${uuid}")
+	ns=$(_find_nvme_ns "${uuid}" 2>> "$FULL")
 
 	start_time=$(date +%s)
-	while [[ -z "$ns" ]]; do
-		sleep 1
+	while [[ -z "$ns" && "$condition" == created ]] ||
+		      [[ -n "$ns" && "$condition" == removed ]]; do
+		sleep .1
 		end_time=$(date +%s)
 		if (( end_time - start_time > timeout )); then
 			echo "namespace with uuid \"${uuid}\" not " \
-				"found within ${timeout} seconds"
+				"${condition} within ${timeout} seconds"
 			return 1
 		fi
-		ns=$(_find_nvme_ns "${uuid}")
+		ns=$(_find_nvme_ns "${uuid}" 2>> "$FULL")
 	done
 
 	return 0
@@ -63,14 +67,21 @@  test() {
 		_create_nvmet_ns "${def_subsysnqn}" "${i}" "$(_nvme_def_file_path).$i" "${uuid}"
 
 		# wait until async request is processed and ns is created
-		nvmf_wait_for_ns "${uuid}"
-		if [ $? -eq 1 ]; then
+		if ! nvmf_wait_for_ns "${uuid}" created; then
 			echo "FAIL"
 			rm "$(_nvme_def_file_path).$i"
 			break
 		fi
 
 		_remove_nvmet_ns "${def_subsysnqn}" "${i}"
+
+		# wait until async request is processed and ns is removed
+		if ! nvmf_wait_for_ns "${uuid}" removed; then
+			echo "FAIL"
+			rm "$(_nvme_def_file_path).$i"
+			break
+		fi
+
 		rm "$(_nvme_def_file_path).$i"
 	}
 	done