
[blktests] nvme/052: wait for namespace removal before recreating namespace

Message ID 20240820102013.781794-1-shinichiro.kawasaki@wdc.com (mailing list archive)
State New, archived
Series [blktests] nvme/052: wait for namespace removal before recreating namespace

Commit Message

Shin'ichiro Kawasaki Aug. 20, 2024, 10:20 a.m. UTC
The CKI project reported that the test case nvme/052 fails occasionally
with the errors below:

  nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [failed]
      runtime    ...  22.209s
      --- tests/nvme/052.out    2024-07-30 18:38:29.041716566 -0400
      +++
+/mnt/tests/gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/archive/production/kernel-t\
ests-production.zip/storage/blktests/nvme/nvme-loop/blktests
+/results/nodev_tr_loop/nvme/052.out.bad        2024-07-30 18:45:35.438067452 -0400
      @@ -1,2 +1,4 @@
       Running nvme/052
      +cat: /sys/block/nvme1n2/uuid: No such file or directory
      +cat: /sys/block/nvme1n2/uuid: No such file or directory
       Test complete

The test case repeatedly creates and removes namespaces. When the test
case removes a namespace by echoing 0 to the sysfs enable file, the
echo write does not wait for the completion of the namespace removal.
Before the removal completes, the test case recreates the namespace.
At this point, the sysfs uuid file for the old namespace still exists.
The test case mistakes this sysfs uuid file for the one belonging to
the recreated namespace and tries to read it. However, the removal
process for the old namespace deletes the sysfs uuid file at this point,
so the read attempt fails and results in the errors.
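The race can be reproduced in miniature outside the test harness. The sketch below is an illustrative stand-in, not blktests code: a temp file and a background rm play the roles of the sysfs uuid file and the asynchronous namespace teardown.

```shell
#!/bin/bash
# Stand-in for the race: a file that an asynchronous "removal" deletes
# between the existence check and the read, mirroring the stale
# /sys/block/nvmeXnY/uuid file.
tmp=$(mktemp)
echo "old-uuid" > "$tmp"
( sleep 0.2; rm -f "$tmp" ) &   # asynchronous removal, like echo 0 > enable
if [[ -e "$tmp" ]]; then        # stale file is still visible here...
        sleep 0.4               # ...but removal completes in the window
        cat "$tmp" 2>/dev/null || echo "read failed: file vanished"
fi
wait
```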

To avoid the failure, wait for the namespace removal before recreating
the namespace. For this purpose, add the new helper function
nvmf_wait_for_ns_removal(). To specify the namespace to wait for, get
the name of the namespace from nvmf_wait_for_ns(), and pass it to
nvmf_wait_for_ns_removal().

The test case intends to catch the regression fixed by the kernel commit
ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit
from the kernel v6.11-rc4, then confirmed that the test case still can
catch the regression with this change.

Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/
Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
Nilay, Yi, thank you for the feedback in the discussion
thread at the Link. Here's the formal fix patch.

 tests/nvme/052 | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

Comments

Nilay Shroff Aug. 20, 2024, 4:55 p.m. UTC | #1
On 8/20/24 15:50, Shin'ichiro Kawasaki wrote:
> The CKI project reported that the test case nvme/052 fails occasionally
> with the errors below:
> 
>   nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [failed]
>       runtime    ...  22.209s
>       --- tests/nvme/052.out    2024-07-30 18:38:29.041716566 -0400
>       +++
> +/mnt/tests/gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/archive/production/kernel-t\
> ests-production.zip/storage/blktests/nvme/nvme-loop/blktests
> +/results/nodev_tr_loop/nvme/052.out.bad        2024-07-30 18:45:35.438067452 -0400
>       @@ -1,2 +1,4 @@
>        Running nvme/052
>       +cat: /sys/block/nvme1n2/uuid: No such file or directory
>       +cat: /sys/block/nvme1n2/uuid: No such file or directory
>        Test complete
> 
> The test case repeatedly creates and removes namespaces. When the test
> case removes a namespace by echoing 0 to the sysfs enable file, the
> echo write does not wait for the completion of the namespace removal.
> Before the removal completes, the test case recreates the namespace.
> At this point, the sysfs uuid file for the old namespace still exists.
> The test case mistakes this sysfs uuid file for the one belonging to
> the recreated namespace and tries to read it. However, the removal
> process for the old namespace deletes the sysfs uuid file at this point,
> so the read attempt fails and results in the errors.
> 
> To avoid the failure, wait for the namespace removal before recreating
> the namespace. For this purpose, add the new helper function
> nvmf_wait_for_ns_removal(). To specify the namespace to wait for, get
> the name of the namespace from nvmf_wait_for_ns(), and pass it to
> nvmf_wait_for_ns_removal().
> 
> The test case intends to catch the regression fixed by the kernel commit
> ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit
> from the kernel v6.11-rc4, then confirmed that the test case still can
> catch the regression with this change.
> 
> Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/
> Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns")
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
> Nilay, Yi, thank you for the feedback in the discussion
> thread at the Link. Here's the formal fix patch.
> 
>  tests/nvme/052 | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/nvme/052 b/tests/nvme/052
> index cf6061a..e1ac823 100755
> --- a/tests/nvme/052
> +++ b/tests/nvme/052
> @@ -39,15 +39,32 @@ nvmf_wait_for_ns() {
>  		ns=$(_find_nvme_ns "${uuid}")
>  	done
>  
> +	echo "$ns"
>  	return 0
>  }
>  
> +nvmf_wait_for_ns_removal() {
> +	local ns=$1 i
> +
> +	for ((i = 0; i < 10; i++)); do
> +		if [[ ! -e /dev/$ns ]]; then
> +			return
> +		fi
> +		sleep .1
> +		echo "wait removal of $ns" >> "$FULL"
> +	done
> +
> +	if [[ -e /dev/$ns ]]; then
> +		echo "Failed to remove the namespace $ns"
> +	fi
> +}
> +
Under nvmf_wait_for_ns_removal(), instead of checking the existence of "/dev/$ns",
how about checking the existence of the file "/sys/block/$ns"? As we know, when this
issue manifests, we have a stale entry "/sys/block/$ns/$uuid" lurking around from the
previous iteration for some time, causing the observed symptom. So I think we may reuse
the _find_nvme_ns() function to wait until the stale "/sys/block/$ns/$uuid" file
no longer exists. Maybe something like below:

nvmf_wait_for_ns_removal() {
        local ns
        local timeout="5"
        local uuid="$1"

        ns=$(_find_nvme_ns "${uuid}")

        start_time=$(date +%s)
        while [[ -n "$ns" ]]; do
                sleep 1
                end_time=$(date +%s)
                if (( end_time - start_time > timeout )); then
                        echo "namespace with uuid \"${uuid}\" still " \
                                "not deleted within ${timeout} seconds"
                        return 1
                fi
                echo "Waiting for $ns removal" >> "${FULL}"
                ns=$(_find_nvme_ns "${uuid}")
        done

        return 0
}

Thanks,
--Nilay
Shin'ichiro Kawasaki Aug. 21, 2024, 7:21 a.m. UTC | #2
On Aug 20, 2024 / 22:25, Nilay Shroff wrote:
> 
> 
> On 8/20/24 15:50, Shin'ichiro Kawasaki wrote:
[...]
> > diff --git a/tests/nvme/052 b/tests/nvme/052
> > index cf6061a..e1ac823 100755
> > --- a/tests/nvme/052
> > +++ b/tests/nvme/052
> > @@ -39,15 +39,32 @@ nvmf_wait_for_ns() {
> >  		ns=$(_find_nvme_ns "${uuid}")
> >  	done
> >  
> > +	echo "$ns"
> >  	return 0
> >  }
> >  
> > +nvmf_wait_for_ns_removal() {
> > +	local ns=$1 i
> > +
> > +	for ((i = 0; i < 10; i++)); do
> > +		if [[ ! -e /dev/$ns ]]; then
> > +			return
> > +		fi
> > +		sleep .1
> > +		echo "wait removal of $ns" >> "$FULL"
> > +	done
> > +
> > +	if [[ -e /dev/$ns ]]; then
> > +		echo "Failed to remove the namespace $ns"
> > +	fi
> > +}
> > +
> Under nvmf_wait_for_ns_removal(), instead of checking the existence of "/dev/$ns",
> how about checking the existence of the file "/sys/block/$ns"? As we know, when this
> issue manifests, we have a stale entry "/sys/block/$ns/$uuid" lurking around from the
> previous iteration for some time, causing the observed symptom. So I think we may reuse
> the _find_nvme_ns() function to wait until the stale "/sys/block/$ns/$uuid" file
> no longer exists.

Reusing _find_nvme_ns() sounds like a good idea.

> Maybe something like below:
> 
> nvmf_wait_for_ns_removal() {
>         local ns
>         local timeout="5"
>         local uuid="$1"
> 
>         ns=$(_find_nvme_ns "${uuid}")

I tried this, and found that the _find_nvme_ns call spits out the failure
"cat: /sys/block/nvme1n2/uuid: No such file or directory", because the
delayed namespace removal can happen here. To suppress the error message,
this line should be,

         ns=$(_find_nvme_ns "${uuid}" 2> /dev/null)

> 
>         start_time=$(date +%s)
>         while [[ ! -z "$ns" ]]; do
>                 sleep 1
>                 end_time=$(date +%s)
>                 if (( end_time - start_time > timeout )); then
>                         echo "namespace with uuid \"${uuid}\" still " \
>                                 "not deleted within ${timeout} seconds"
>                         return 1
>                 fi
> 		echo "Waiting for $ns removal" >> ${FULL}
>                 ns=$(_find_nvme_ns "${uuid}")

Same comment as above.

>         done
> 
>         return 0
> }

I found that your nvmf_wait_for_ns_removal() above has a certain amount of
duplication with the existing nvmf_wait_for_ns(). To avoid the duplication,
I suggest reusing nvmf_wait_for_ns() and adding a new argument to control the
wait target event: namespace 'created' or 'removed'. With this idea, I created
the patch below. I confirmed the patch avoids the failure.

One drawback of this new patch based on your suggestion is that it extends the
execution time of the test case from 20+ seconds to 40+ seconds. In most cases,
the while loop condition check in nvmf_wait_for_ns() is true the first time,
and false the second time. So, nvmf_wait_for_ns() takes 1 second for one
"sleep 1". Before applying this patch, it took 20+ seconds for the 20
iterations. After applying the patch, it takes 40+ seconds, since one iteration
calls nvmf_wait_for_ns() twice. So how about reducing the sleep time from 1
second to 0.1 seconds? I tried it and observed that it reduced the runtime from
40+ seconds to 10+ seconds.
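The interval trade-off can be sketched with a generic polling helper. This is illustrative only; wait_for, the predicate string, and the background removal are made up for this example and are not blktests helpers.

```shell
#!/bin/bash
# Generic polling helper: re-checks a predicate every 0.1s, so a condition
# that becomes true quickly is noticed with ~0.1s latency instead of ~1s.
wait_for() {    # wait_for <predicate-command> <timeout-seconds>
        local start end
        start=$(date +%s)
        until eval "$1"; do
                sleep 0.1
                end=$(date +%s)
                if (( end - start > $2 )); then
                        return 1
                fi
        done
        return 0
}

tmp=$(mktemp)
( sleep 0.3; rm -f "$tmp" ) &   # asynchronous "namespace removal"
wait_for "[[ ! -e $tmp ]]" 5 && echo "removed"
wait
```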

diff --git a/tests/nvme/052 b/tests/nvme/052
index cf6061a..22e0bf5 100755
--- a/tests/nvme/052
+++ b/tests/nvme/052
@@ -20,23 +20,35 @@ set_conditions() {
 	_set_nvme_trtype "$@"
 }
 
+find_nvme_ns() {
+	if [[ "$2" == removed ]]; then
+		_find_nvme_ns "$1" 2> /dev/null
+	else
+		_find_nvme_ns "$1"
+	fi
+}
+
+# Wait for the namespace with the specified uuid to fulfill the specified condition,
+# "created" or "removed".
 nvmf_wait_for_ns() {
 	local ns
 	local timeout="5"
 	local uuid="$1"
+	local condition="$2"
 
-	ns=$(_find_nvme_ns "${uuid}")
+	ns=$(find_nvme_ns "${uuid}" "${condition}")
 
 	start_time=$(date +%s)
-	while [[ -z "$ns" ]]; do
+	while [[ -z "$ns" && "$condition" == created  ]] ||
+		      [[ -n "$ns" && "$condition" == removed ]]; do
 		sleep 1
 		end_time=$(date +%s)
 		if (( end_time - start_time > timeout )); then
 			echo "namespace with uuid \"${uuid}\" not " \
-				"found within ${timeout} seconds"
+				"${condition} within ${timeout} seconds"
 			return 1
 		fi
-		ns=$(_find_nvme_ns "${uuid}")
+		ns=$(find_nvme_ns "${uuid}" "${condition}")
 	done
 
 	return 0
@@ -63,7 +75,7 @@ test() {
 		_create_nvmet_ns "${def_subsysnqn}" "${i}" "$(_nvme_def_file_path).$i" "${uuid}"
 
 		# wait until async request is processed and ns is created
-		nvmf_wait_for_ns "${uuid}"
+		nvmf_wait_for_ns "${uuid}" created
 		if [ $? -eq 1 ]; then
 			echo "FAIL"
 			rm "$(_nvme_def_file_path).$i"
@@ -71,6 +83,10 @@ test() {
 		fi
 
 		_remove_nvmet_ns "${def_subsysnqn}" "${i}"
+
+		# wait until async request is processed and ns is removed
+		nvmf_wait_for_ns "${uuid}" removed
+
 		rm "$(_nvme_def_file_path).$i"
 	}
 	done
Nilay Shroff Aug. 21, 2024, 9:28 a.m. UTC | #3
On 8/21/24 12:51, Shinichiro Kawasaki wrote:
>> Under nvmf_wait_for_ns_removal(), instead of checking the existence of "/dev/$ns",
>> how about checking the existence of the file "/sys/block/$ns"? As we know, when this
>> issue manifests, we have a stale entry "/sys/block/$ns/$uuid" lurking around from the
>> previous iteration for some time, causing the observed symptom. So I think we may reuse
>> the _find_nvme_ns() function to wait until the stale "/sys/block/$ns/$uuid" file
>> no longer exists.
> 
> Reusing _find_nvme_ns() sounds like a good idea.
> 
>> Maybe something like below:
>>
>> nvmf_wait_for_ns_removal() {
>>         local ns
>>         local timeout="5"
>>         local uuid="$1"
>>
>>         ns=$(_find_nvme_ns "${uuid}")
> 
> I tried this, and found that the _find_nvme_ns call spits out the failure
> "cat: /sys/block/nvme1n2/uuid: No such file or directory", because the
> delayed namespace removal can happen here. To suppress the error message,
> this line should be,
> 
>          ns=$(_find_nvme_ns "${uuid}" 2> /dev/null)
> 
yeah I agree here. We need to suppress this error.
>>
>>         start_time=$(date +%s)
>>         while [[ ! -z "$ns" ]]; do
>>                 sleep 1
>>                 end_time=$(date +%s)
>>                 if (( end_time - start_time > timeout )); then
>>                         echo "namespace with uuid \"${uuid}\" still " \
>>                                 "not deleted within ${timeout} seconds"
>>                         return 1
>>                 fi
>> 		echo "Waiting for $ns removal" >> ${FULL}
>>                 ns=$(_find_nvme_ns "${uuid}")
> 
> Same comment as above.
> 
>>         done
>>
>>         return 0
>> }
> 
> I found that your nvmf_wait_for_ns_removal() above has a certain amount of
> duplication with the existing nvmf_wait_for_ns(). To avoid the duplication,
> I suggest reusing nvmf_wait_for_ns() and adding a new argument to control the
> wait target event: namespace 'created' or 'removed'. With this idea, I created
> the patch below. I confirmed the patch avoids the failure.
> 
Sounds good!

> One drawback of this new patch based on your suggestion is that it extends the
> execution time of the test case from 20+ seconds to 40+ seconds. In most cases,
> the while loop condition check in nvmf_wait_for_ns() is true the first time,
> and false the second time. So, nvmf_wait_for_ns() takes 1 second for one
> "sleep 1". Before applying this patch, it took 20+ seconds for the 20
> iterations. After applying the patch, it takes 40+ seconds, since one iteration
> calls nvmf_wait_for_ns() twice. So how about reducing the sleep time from 1
> second to 0.1 seconds? I tried it and observed that it reduced the runtime from
> 40+ seconds to 10+ seconds.
> 
If using a sleep of 0.1 seconds saves execution time, then yes, this makes sense.

> diff --git a/tests/nvme/052 b/tests/nvme/052
> index cf6061a..22e0bf5 100755
> --- a/tests/nvme/052
> +++ b/tests/nvme/052
> @@ -20,23 +20,35 @@ set_conditions() {
>  	_set_nvme_trtype "$@"
>  }
>  
> +find_nvme_ns() {
> +	if [[ "$2" == removed ]]; then
> +		_find_nvme_ns "$1" 2> /dev/null
> +	else
> +		_find_nvme_ns "$1"
> +	fi
> +}
> +
> +# Wait for the namespace with the specified uuid to fulfill the specified condition,
> +# "created" or "removed".
>  nvmf_wait_for_ns() {
>  	local ns
>  	local timeout="5"
>  	local uuid="$1"
> +	local condition="$2"
>  
> -	ns=$(_find_nvme_ns "${uuid}")
> +	ns=$(find_nvme_ns "${uuid}" "${condition}")
>  
>  	start_time=$(date +%s)
> -	while [[ -z "$ns" ]]; do
> +	while [[ -z "$ns" && "$condition" == created  ]] ||
> +		      [[ -n "$ns" && "$condition" == removed ]]; do
>  		sleep 1
>  		end_time=$(date +%s)
>  		if (( end_time - start_time > timeout )); then
>  			echo "namespace with uuid \"${uuid}\" not " \
> -				"found within ${timeout} seconds"
> +				"${condition} within ${timeout} seconds"
>  			return 1
>  		fi
> -		ns=$(_find_nvme_ns "${uuid}")
> +		ns=$(find_nvme_ns "${uuid}" "${condition}")
>  	done
>  
>  	return 0
> @@ -63,7 +75,7 @@ test() {
>  		_create_nvmet_ns "${def_subsysnqn}" "${i}" "$(_nvme_def_file_path).$i" "${uuid}"
>  
>  		# wait until async request is processed and ns is created
> -		nvmf_wait_for_ns "${uuid}"
> +		nvmf_wait_for_ns "${uuid}" created
>  		if [ $? -eq 1 ]; then
>  			echo "FAIL"
>  			rm "$(_nvme_def_file_path).$i"
> @@ -71,6 +83,10 @@ test() {
>  		fi
>  
>  		_remove_nvmet_ns "${def_subsysnqn}" "${i}"
> +
> +		# wait until async request is processed and ns is removed
> +		nvmf_wait_for_ns "${uuid}" removed
> +
As nvmf_wait_for_ns() returns either 0 (success) or 1 (failure), I think we
should check the return status of nvmf_wait_for_ns() here and bail out in case
it returns failure, the same as we do above while creating the namespace.
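The suggested bail-out can be sketched with a runnable stand-in; wait_removed is a made-up placeholder for the real wait helper, and the loop body is abbreviated.

```shell
#!/bin/bash
# Stand-in for the suggested check: treat a nonzero status from the wait
# helper as fatal and bail out of the iteration loop.
wait_removed() { return 1; }    # pretend the removal wait timed out

for i in 1 2 3; do
        if ! wait_removed; then
                echo "FAIL at iteration $i"
                break
        fi
        echo "iteration $i done"
done
```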

Another point: I think we may always suppress the error from _find_nvme_ns(),
irrespective of whether it is called while "creating" or "removing" the namespace,
assuming we always check the return status of nvmf_wait_for_ns() in the main
loop. So ideally we shall invoke _find_nvme_ns() from nvmf_wait_for_ns() as below:

ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)

On a cosmetic note: Maybe we can use readonly constants to make the "created" and
"removed" parameters look more elegant/readable.

# Define constants
readonly NS_ADD="added"
readonly NS_DEL="deleted" 
 
Now we may use the above constants instead of "created" and "removed". You may
rename the constants if you don't like the names I used above :)

Thanks,
--Nilay
Shin'ichiro Kawasaki Aug. 22, 2024, 8:33 a.m. UTC | #4
On Aug 21, 2024 / 14:58, Nilay Shroff wrote:
> 
> 
> On 8/21/24 12:51, Shinichiro Kawasaki wrote:
[...]
> > diff --git a/tests/nvme/052 b/tests/nvme/052
> > index cf6061a..22e0bf5 100755
> > --- a/tests/nvme/052
> > +++ b/tests/nvme/052
> > @@ -20,23 +20,35 @@ set_conditions() {
> >  	_set_nvme_trtype "$@"
> >  }
> >  
> > +find_nvme_ns() {
> > +	if [[ "$2" == removed ]]; then
> > +		_find_nvme_ns "$1" 2> /dev/null
> > +	else
> > +		_find_nvme_ns "$1"
> > +	fi
> > +}
> > +
> > +# Wait for the namespace with the specified uuid to fulfill the specified condition,
> > +# "created" or "removed".
> >  nvmf_wait_for_ns() {
> >  	local ns
> >  	local timeout="5"
> >  	local uuid="$1"
> > +	local condition="$2"
> >  
> > -	ns=$(_find_nvme_ns "${uuid}")
> > +	ns=$(find_nvme_ns "${uuid}" "${condition}")
> >  
> >  	start_time=$(date +%s)
> > -	while [[ -z "$ns" ]]; do
> > +	while [[ -z "$ns" && "$condition" == created  ]] ||
> > +		      [[ -n "$ns" && "$condition" == removed ]]; do
> >  		sleep 1
> >  		end_time=$(date +%s)
> >  		if (( end_time - start_time > timeout )); then
> >  			echo "namespace with uuid \"${uuid}\" not " \
> > -				"found within ${timeout} seconds"
> > +				"${condition} within ${timeout} seconds"
> >  			return 1
> >  		fi
> > -		ns=$(_find_nvme_ns "${uuid}")
> > +		ns=$(find_nvme_ns "${uuid}" "${condition}")
> >  	done
> >  
> >  	return 0
> > @@ -63,7 +75,7 @@ test() {
> >  		_create_nvmet_ns "${def_subsysnqn}" "${i}" "$(_nvme_def_file_path).$i" "${uuid}"
> >  
> >  		# wait until async request is processed and ns is created
> > -		nvmf_wait_for_ns "${uuid}"
> > +		nvmf_wait_for_ns "${uuid}" created
> >  		if [ $? -eq 1 ]; then
> >  			echo "FAIL"
> >  			rm "$(_nvme_def_file_path).$i"
> > @@ -71,6 +83,10 @@ test() {
> >  		fi
> >  
> >  		_remove_nvmet_ns "${def_subsysnqn}" "${i}"
> > +
> > +		# wait until async request is processed and ns is removed
> > +		nvmf_wait_for_ns "${uuid}" removed
> > +
> As nvmf_wait_for_ns() returns either 0 (success) or 1 (failure), I think we
> should check the return status of nvmf_wait_for_ns() here and bail out in case
> it returns failure, the same as we do above while creating the namespace.

Agreed, I will add the check to v2.

> 
> Another point: I think we may always suppress the error from _find_nvme_ns(),
> irrespective of whether it is called while "creating" or "removing" the namespace,
> assuming we always check the return status of nvmf_wait_for_ns() in the main
> loop. So ideally we shall invoke _find_nvme_ns() from nvmf_wait_for_ns() as below:
> 
> ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)

It doesn't sound correct to always suppress the _find_nvme_ns errors. We found
the issue because _find_nvme_ns reported the error after the _create_nvmet_ns()
call. If we suppress it, I cannot be confident that this fix avoids the error.

> 
> On a cosmetic note: Maybe we can use readonly constants to make the "created" and
> "removed" parameters look more elegant/readable.
> 
> # Define constants
> readonly NS_ADD="added"
> readonly NS_DEL="deleted" 
>  
> Now we may use the above constants instead of "created" and "removed". You may
> rename the constants if you don't like the names I used above :)

This sounds like too much to me. "created" and "removed" are constant strings, so
I'm not sure about the merit of introducing constant variables. The number of
characters will not change much either.

> 
> Thanks,
> --Nilay
Nilay Shroff Aug. 22, 2024, 11:15 a.m. UTC | #5
On 8/22/24 14:03, Shinichiro Kawasaki wrote:
> On Aug 21, 2024 / 14:58, Nilay Shroff wrote:
>>
>>
>> On 8/21/24 12:51, Shinichiro Kawasaki wrote:
> [...]
>>> diff --git a/tests/nvme/052 b/tests/nvme/052
>>> index cf6061a..22e0bf5 100755
>>> --- a/tests/nvme/052
>>> +++ b/tests/nvme/052
>>> @@ -20,23 +20,35 @@ set_conditions() {
>>>  	_set_nvme_trtype "$@"
>>>  }
>>>  
>>> +find_nvme_ns() {
>>> +	if [[ "$2" == removed ]]; then
>>> +		_find_nvme_ns "$1" 2> /dev/null
>>> +	else
>>> +		_find_nvme_ns "$1"
>>> +	fi
>>> +}
>>> +
>>> +# Wait for the namespace with the specified uuid to fulfill the specified condition,
>>> +# "created" or "removed".
>>>  nvmf_wait_for_ns() {
>>>  	local ns
>>>  	local timeout="5"
>>>  	local uuid="$1"
>>> +	local condition="$2"
>>>  
>>> -	ns=$(_find_nvme_ns "${uuid}")
>>> +	ns=$(find_nvme_ns "${uuid}" "${condition}")
>>>  
>>>  	start_time=$(date +%s)
>>> -	while [[ -z "$ns" ]]; do
>>> +	while [[ -z "$ns" && "$condition" == created  ]] ||
>>> +		      [[ -n "$ns" && "$condition" == removed ]]; do
>>>  		sleep 1
>>>  		end_time=$(date +%s)
>>>  		if (( end_time - start_time > timeout )); then
>>>  			echo "namespace with uuid \"${uuid}\" not " \
>>> -				"found within ${timeout} seconds"
>>> +				"${condition} within ${timeout} seconds"
>>>  			return 1
>>>  		fi
>>> -		ns=$(_find_nvme_ns "${uuid}")
>>> +		ns=$(find_nvme_ns "${uuid}" "${condition}")
>>>  	done
>>>  
>>>  	return 0
>>> @@ -63,7 +75,7 @@ test() {
>>>  		_create_nvmet_ns "${def_subsysnqn}" "${i}" "$(_nvme_def_file_path).$i" "${uuid}"
>>>  
>>>  		# wait until async request is processed and ns is created
>>> -		nvmf_wait_for_ns "${uuid}"
>>> +		nvmf_wait_for_ns "${uuid}" created
>>>  		if [ $? -eq 1 ]; then
>>>  			echo "FAIL"
>>>  			rm "$(_nvme_def_file_path).$i"
>>> @@ -71,6 +83,10 @@ test() {
>>>  		fi
>>>  
>>>  		_remove_nvmet_ns "${def_subsysnqn}" "${i}"
>>> +
>>> +		# wait until async request is processed and ns is removed
>>> +		nvmf_wait_for_ns "${uuid}" removed
>>> +
>> As nvmf_wait_for_ns() returns either 0 (success) or 1 (failure), I think we
>> should check the return status of nvmf_wait_for_ns() here and bail out in case
>> it returns failure, the same as we do above while creating the namespace.
> 
> Agreed, I will add the check to v2.
> 
>>
>> Another point: I think we may always suppress the error from _find_nvme_ns(),
>> irrespective of whether it is called while "creating" or "removing" the namespace,
>> assuming we always check the return status of nvmf_wait_for_ns() in the main
>> loop. So ideally we shall invoke _find_nvme_ns() from nvmf_wait_for_ns() as below:
>>
>> ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)
> 
> It doesn't sound correct to always suppress the _find_nvme_ns errors. We found
> the issue because _find_nvme_ns reported the error after the _create_nvmet_ns()
> call. If we suppress it, I cannot be confident that this fix avoids the error.
>
Agreed, however my thought was that if we always (while creating as well as deleting
a namespace) validate the return value from nvmf_wait_for_ns() in the main loop, then
we should be OK and we need not worry about any error generated from _find_nvme_ns().
The return status from nvmf_wait_for_ns() is good enough for the main loop to proceed
further or bail out. Having said that, it's debatable, as what you pointed out is also
valid. The only thing which looks odd to me is that we have to suppress the error from
_find_nvme_ns() for one case but not for the other.
>>
>> On a cosmetic note: Maybe we can use readonly constants to make "created" and "removed"
>> parameters looks more elegant/readable. 
>>
>> # Define constants
>> readonly NS_ADD="added"
>> readonly NS_DEL="deleted" 
>>  
>> Now we may reuse above constants instead of "created" and "removed". You may rename 
>> constant name if you don't like the name I used above :)
> 
> This sounds like too much to me. "created" and "removed" are constant strings, so
> I'm not sure about the merit of introducing constant variables. The number of
> characters will not change much either.
> 
Alright! Maybe this is more of a personal taste, so I am fine either way.

Thanks,
--Nilay
Shin'ichiro Kawasaki Aug. 22, 2024, 11:59 a.m. UTC | #6
On Aug 22, 2024 / 16:45, Nilay Shroff wrote:
> 
> 
> On 8/22/24 14:03, Shinichiro Kawasaki wrote:
> > On Aug 21, 2024 / 14:58, Nilay Shroff wrote:
[...]
> >> Another point: I think we may always suppress the error from _find_nvme_ns(),
> >> irrespective of whether it is called while "creating" or "removing" the namespace,
> >> assuming we always check the return status of nvmf_wait_for_ns() in the main
> >> loop. So ideally we shall invoke _find_nvme_ns() from nvmf_wait_for_ns() as below:
> >>
> >> ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)
> > 
> > It doesn't sound correct to always suppress the _find_nvme_ns errors. We found
> > the issue because _find_nvme_ns reported the error after the _create_nvmet_ns()
> > call. If we suppress it, I cannot be confident that this fix avoids the error.
> >
> Agreed, however my thought was that if we always (while creating as well as deleting
> a namespace) validate the return value from nvmf_wait_for_ns() in the main loop, then
> we should be OK and we need not worry about any error generated from _find_nvme_ns().
> The return status from nvmf_wait_for_ns() is good enough for the main loop to proceed
> further or bail out. Having said that, it's debatable, as what you pointed out is also
> valid. The only thing which looks odd to me is that we have to suppress the error from
> _find_nvme_ns() for one case but not for the other.

I can agree with this point: it is odd to suppress errors only for the namespace
removal case. I did so to catch other potential errors that _find_nvme_ns() may
return in the future for the namespace creation case. But still this way misses
other potential errors for the namespace removal case. Maybe I was overthinking.
Let's simplify the test by just doing

   ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)

as you suggest. The test case can still detect the kernel regression, and I
think it's good enough. I will reflect this in v2.
Daniel Wagner Aug. 22, 2024, 2:49 p.m. UTC | #7
On Thu, Aug 22, 2024 at 11:59:35AM GMT, Shinichiro Kawasaki wrote:
> I can agree with this point: it is odd to suppress errors only for the namespace
> removal case. I did so to catch other potential errors that _find_nvme_ns() may
> return in the future for the namespace creation case. But still this way misses
> other potential errors for the namespace removal case. Maybe I was overthinking.
> Let's simplify the test with just doing
> 
>    ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)
> 
> as you suggest. Still the test case can detect the kernel regression, and I
> think it's good enough. Will reflect this to v2.

Not sure if this is relevant, but I'd like to see us return error
codes so that the caller can actually decide whether to ignore the
failure or not.
Nilay Shroff Aug. 22, 2024, 3:43 p.m. UTC | #8
On 8/22/24 20:19, Daniel Wagner wrote:
> On Thu, Aug 22, 2024 at 11:59:35AM GMT, Shinichiro Kawasaki wrote:
>> I can agree with this point: it is odd to suppress errors only for the namespace
>> removal case. I did so to catch other potential errors that _find_nvme_ns() may
>> return in the future for the namespace creation case. But still this way misses
>> other potential errors for the namespace removal case. Maybe I was overthinking.
>> Let's simplify the test with just doing
>>
>>    ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)
>>
>> as you suggest. Still the test case can detect the kernel regression, and I
>> think it's good enough. Will reflect this to v2.
> 
> Not sure if this is relevant, but I'd like to see that we return error
> codes so that the caller can actually decide to ignore the failure or
> not.
When _find_nvme_ns() fails to find the relevant namespace, it returns an
empty string; when it finds the namespace, it returns the namespace name
to the caller. The caller (in this case nvmf_wait_for_ns()) then takes
action depending on the return value from _find_nvme_ns().

Thanks,
--Nilay
Shin'ichiro Kawasaki Sept. 2, 2024, 10:54 a.m. UTC | #9
On Aug 22, 2024 / 21:13, Nilay Shroff wrote:
> 
> 
> On 8/22/24 20:19, Daniel Wagner wrote:
> > On Thu, Aug 22, 2024 at 11:59:35AM GMT, Shinichiro Kawasaki wrote:
> >> I can agree with this point: it is odd to suppress errors only for the namespace
> >> removal case. I did so to catch other potential errors that _find_nvme_ns() may
> >> return in the future for the namespace creation case. But still this way misses
> >> other potential errors for the namespace removal case. Maybe I was overthinking.
> >> Let's simplify the test with just doing
> >>
> >>    ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)

While I was preparing v3, I noticed that it would be better to redirect
stderr to $FULL rather than to /dev/null, since it will leave some clue on
failure. I will reflect this idea in v3.
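That idea can be illustrated with a self-contained stand-in; find_ns and the mktemp log are made up here, standing in for _find_nvme_ns and blktests' $FULL log file.

```shell
#!/bin/bash
# Append a helper's stderr to a log file instead of sending it to
# /dev/null, so a failed lookup leaves a diagnostic clue in the log.
FULL=$(mktemp)                          # stand-in for the blktests log file
find_ns() { cat /nonexistent/uuid; }    # stand-in helper that fails

ns=$(find_ns 2>> "$FULL")               # stderr is kept, not discarded
if [[ -z "$ns" ]]; then
        echo "lookup failed; clues kept in $FULL"
fi
```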

> >>
> >> as you suggest. Still the test case can detect the kernel regression, and I
> >> think it's good enough. Will reflect this to v2.
> > 
> > Not sure if this is relevant, but I'd like to see that we return error
> > codes so that the caller can actually decide to ignore the failure or
> > not.
> When _find_nvme_ns() fails to find the relevant namespace, it returns an
> empty string; when it finds the namespace, it returns the namespace name
> to the caller. The caller (in this case nvmf_wait_for_ns()) then takes
> action depending on the return value from _find_nvme_ns().

Even if _find_nvme_ns() returned a different error code for the "cat:
/sys/block/nvmeXnY/uuid: No such file or directory" error, that code would
not determine whether to ignore the error. So I think the "empty string"
check for _find_nvme_ns() is enough.

Anyway, will send out v3 soon.

Patch

diff --git a/tests/nvme/052 b/tests/nvme/052
index cf6061a..e1ac823 100755
--- a/tests/nvme/052
+++ b/tests/nvme/052
@@ -39,15 +39,32 @@  nvmf_wait_for_ns() {
 		ns=$(_find_nvme_ns "${uuid}")
 	done
 
+	echo "$ns"
 	return 0
 }
 
+nvmf_wait_for_ns_removal() {
+	local ns=$1 i
+
+	for ((i = 0; i < 10; i++)); do
+		if [[ ! -e /dev/$ns ]]; then
+			return
+		fi
+		sleep .1
+		echo "wait removal of $ns" >> "$FULL"
+	done
+
+	if [[ -e /dev/$ns ]]; then
+		echo "Failed to remove the namespace $ns"
+	fi
+}
+
 test() {
 	echo "Running ${TEST_NAME}"
 
 	_setup_nvmet
 
-	local iterations=20
+	local iterations=20 ns
 
 	_nvmet_target_setup
 
@@ -63,7 +80,7 @@  test() {
 		_create_nvmet_ns "${def_subsysnqn}" "${i}" "$(_nvme_def_file_path).$i" "${uuid}"
 
 		# wait until async request is processed and ns is created
-		nvmf_wait_for_ns "${uuid}"
+		ns=$(nvmf_wait_for_ns "${uuid}")
 		if [ $? -eq 1 ]; then
 			echo "FAIL"
 			rm "$(_nvme_def_file_path).$i"
@@ -71,6 +88,7 @@  test() {
 		fi
 
 		_remove_nvmet_ns "${def_subsysnqn}" "${i}"
+		nvmf_wait_for_ns_removal "$ns"
 		rm "$(_nvme_def_file_path).$i"
 	}
 	done