diff mbox series

[blktests] nvme/{033-037}: timeout while waiting for nvme passthru namespace device

Message ID 20240924084907.143999-1-nilay@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series [blktests] nvme/{033-037}: timeout while waiting for nvme passthru namespace device | expand

Commit Message

Nilay Shroff Sept. 24, 2024, 8:48 a.m. UTC
Avoid waiting indefinitely for nvme passthru namespace block device
to appear. Wait for up to 5 seconds and during this time if namespace
block device doesn't appear then bail out and FAIL the test.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
Hi,

You may find more details about this issue here[1]. 

I found that blktest nvme/033-037 hangs indefinitely when 
kernel rejects the passthru target namespace due to the 
duplicate IDs. This patch helps address this issue by  
ensuring that we bail out and fail the test if for any 
reason passthru target namspace is not created on the 
host. The relevant kernel patchv2 to fix the issue with 
duplicate IDs while using passthru loop target can be
found here[2].

[1]: https://lore.kernel.org/all/8b17203f-ea4b-403b-a204-4fbc00c261ca@linux.ibm.com/
[2]: https://lore.kernel.org/all/20240921070547.531991-1-nilay@linux.ibm.com/

Thanks!
---
 tests/nvme/033 |  7 +++++--
 tests/nvme/034 |  7 +++++--
 tests/nvme/035 |  6 +++---
 tests/nvme/036 | 14 ++++++++------
 tests/nvme/037 |  6 +++++-
 tests/nvme/rc  | 12 +++++++++++-
 6 files changed, 37 insertions(+), 15 deletions(-)

Comments

Shinichiro Kawasaki Sept. 24, 2024, 12:14 p.m. UTC | #1
On Sep 24, 2024 / 14:18, Nilay Shroff wrote:
> Avoid waiting indefinitely for nvme passthru namespace block device
> to appear. Wait for up to 5 seconds and during this time if namespace
> block device doesn't appear then bail out and FAIL the test.
> 
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
> Hi,
> 
> You may find more details about this issue here[1]. 
> 
> I found that blktest nvme/033-037 hangs indefinitely when 
> kernel rejects the passthru target namespace due to the 
> duplicate IDs. This patch helps address this issue by  
> ensuring that we bail out and fail the test if for any 
> reason passthru target namspace is not created on the 
> host. The relevant kernel patchv2 to fix the issue with 
> duplicate IDs while using passthru loop target can be
> found here[2].
> 
> [1]: https://lore.kernel.org/all/8b17203f-ea4b-403b-a204-4fbc00c261ca@linux.ibm.com/
> [2]: https://lore.kernel.org/all/20240921070547.531991-1-nilay@linux.ibm.com/
> 
> Thanks!

Thanks for the patch :) It's bad that you stumbled into the infinite while loop
in _nvmet_passthru_target_connect(). I agree that it should be fixed.

Please find my comments in line.

> ---
>  tests/nvme/033 |  7 +++++--
>  tests/nvme/034 |  7 +++++--
>  tests/nvme/035 |  6 +++---
>  tests/nvme/036 | 14 ++++++++------
>  tests/nvme/037 |  6 +++++-
>  tests/nvme/rc  | 12 +++++++++++-
>  6 files changed, 37 insertions(+), 15 deletions(-)
> 
> diff --git a/tests/nvme/033 b/tests/nvme/033
> index 5e05175..171974e 100755
> --- a/tests/nvme/033
> +++ b/tests/nvme/033
> @@ -62,8 +62,11 @@ test_device() {
>  	_nvmet_passthru_target_setup
>  
>  	nsdev=$(_nvmet_passthru_target_connect)
> -
> -	compare_dev_info "${nsdev}"
> +	if [[ -z "$nsdev" ]]; then
> +		echo "FAIL"

I think some more specific failure message will be useful. How about
"Failed to connect" or so? Same comment for the other test cases 034-037.

> +	else
> +		compare_dev_info "${nsdev}"
> +	fi
>  
>  	_nvme_disconnect_subsys
>  	_nvmet_passthru_target_cleanup
> diff --git a/tests/nvme/034 b/tests/nvme/034
> index 154fc91..7625204 100755
> --- a/tests/nvme/034
> +++ b/tests/nvme/034
> @@ -32,8 +32,11 @@ test_device() {
>  
>  	_nvmet_passthru_target_setup
>  	nsdev=$(_nvmet_passthru_target_connect)
> -
> -	_run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}"
> +	if [[ -z "$nsdev" ]]; then
> +		echo "FAIL"
> +	else
> +		_run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}"
> +	fi
>  
>  	_nvme_disconnect_subsys
>  	_nvmet_passthru_target_cleanup
> diff --git a/tests/nvme/035 b/tests/nvme/035
> index ff217d6..6ad9c56 100755
> --- a/tests/nvme/035
> +++ b/tests/nvme/035
> @@ -30,13 +30,13 @@ test_device() {
>  
>  	_setup_nvmet
>  
> -	local ctrldev

Good catch :)

>  	local nsdev
>  
>  	_nvmet_passthru_target_setup
>  	nsdev=$(_nvmet_passthru_target_connect)
> -
> -	if ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then
> +	if [[ -z "$nsdev" ]]; then
> +		echo "FAIL"
> +	elif ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then
>  		echo "FAIL: fio verify failed"
>  	fi
>  
> diff --git a/tests/nvme/036 b/tests/nvme/036
> index 442ffe7..a67ca12 100755
> --- a/tests/nvme/036
> +++ b/tests/nvme/036
> @@ -30,13 +30,15 @@ test_device() {

This could be a good chance to add "local nsdev".

>  
>  	_nvmet_passthru_target_setup
>  	nsdev=$(_nvmet_passthru_target_connect)
> -
> -	ctrldev=$(_find_nvme_dev "${def_subsysnqn}")
> -
> -	if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then
> -		echo "ERROR: reset failed"
> +	if [[ -z "$nsdev" ]]; then
> +		echo "FAIL"
> +	else
> +		ctrldev=$(_find_nvme_dev "${def_subsysnqn}")
> +
> +		if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then
> +			echo "ERROR: reset failed"
> +		fi
>  	fi
> -

Nit: unnecessary change here.

>  	_nvme_disconnect_subsys
>  	_nvmet_passthru_target_cleanup
>  
> diff --git a/tests/nvme/037 b/tests/nvme/037
> index f7ddc2d..f0c8a77 100755
> --- a/tests/nvme/037
> +++ b/tests/nvme/037
> @@ -27,7 +27,6 @@ test_device() {
>  
>  	local subsys="blktests-subsystem-"
>  	local iterations=10
> -	local ctrldev

Good catch. And we can add "local nsdev" here.

>  
>  	for ((i = 0; i < iterations; i++)); do
>  		_nvmet_passthru_target_setup --subsysnqn "${subsys}${i}"
> @@ -37,6 +36,11 @@ test_device() {
>  		_nvme_disconnect_subsys \
>  			--subsysnqn "${subsys}${i}" >>"${FULL}" 2>&1
>  		_nvmet_passthru_target_cleanup --subsysnqn "${subsys}${i}"
> +
> +		if [[ -z "$nsdev" ]]; then
> +			echo "FAIL"
> +			break
> +		fi
>  	done
>  
>  	echo "Test complete"
> diff --git a/tests/nvme/rc b/tests/nvme/rc
> index a877de3..3def0d0 100644
> --- a/tests/nvme/rc
> +++ b/tests/nvme/rc
> @@ -394,6 +394,7 @@ _nvmet_passthru_target_setup() {
>  
>  _nvmet_passthru_target_connect() {
>  	local subsysnqn="$def_subsysnqn"
> +	local timeout="5"
>  
>  	while [[ $# -gt 0 ]]; do
>  		case $1 in
> @@ -414,9 +415,18 @@ _nvmet_passthru_target_connect() {
>  	# The following tests can race with the creation
>  	# of the device so ensure the block device exists
>  	# before continuing
> -	while [ ! -b "${nsdev}" ]; do sleep 1; done
> +	start_time=$(date +%s)
> +	while [ ! -b "${nsdev}" ]; do
> +		sleep .1

Is there any reason to have 0.1 second wait duration? When I changed this to
"sleep 1", runtime of the test cases did not change on my test system.

> +		end_time=$(date +%s)
> +		if ((end_time - start_time > timeout)); then
> +			echo ""

This echo doesn't look required.

> +			return 1
> +		fi

If 1 second wait is okay instead of 0.1 second wait, the if block above can be
a bit simpler, like,

              if ((++count >= timeout)); then
                      return 1
              fi

where, "local count=0" should be declared before.

> +	done
>  
>  	echo "${nsdev}"
> +	return 0

This "return 0" looks redundant. The previous echo succeeds always, then 0 is
returned anyway. Also, all of the callers check the failure of this function by
referring to the nsdev string echoed. They do not refer to the return code. So,
it is not so useful to explicitly show that 0 is returned on success.

>  }
>  
>  _nvmet_passthru_target_cleanup() {
> -- 
> 2.45.2
>
Nilay Shroff Sept. 25, 2024, 5:55 a.m. UTC | #2
On 9/24/24 17:44, Shinichiro Kawasaki wrote:
> On Sep 24, 2024 / 14:18, Nilay Shroff wrote:
>> Avoid waiting indefinitely for nvme passthru namespace block device
>> to appear. Wait for up to 5 seconds and during this time if namespace
>> block device doesn't appear then bail out and FAIL the test.
>>
>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
>> ---
>> Hi,
>>
>> You may find more details about this issue here[1]. 
>>
>> I found that blktest nvme/033-037 hangs indefinitely when 
>> kernel rejects the passthru target namespace due to the 
>> duplicate IDs. This patch helps address this issue by  
>> ensuring that we bail out and fail the test if for any 
>> reason passthru target namspace is not created on the 
>> host. The relevant kernel patchv2 to fix the issue with 
>> duplicate IDs while using passthru loop target can be
>> found here[2].
>>
>> [1]: https://lore.kernel.org/all/8b17203f-ea4b-403b-a204-4fbc00c261ca@linux.ibm.com/
>> [2]: https://lore.kernel.org/all/20240921070547.531991-1-nilay@linux.ibm.com/
>>
>> Thanks!
> 
> Thanks for the patch :) It's bad that you stumbled into the infinite while loop
> in _nvmet_passthru_target_connect(). I agree that it should be fixed.
> 
> Please find my comments in line.
> 
Thank you for your review comments!!

>> ---
>>  tests/nvme/033 |  7 +++++--
>>  tests/nvme/034 |  7 +++++--
>>  tests/nvme/035 |  6 +++---
>>  tests/nvme/036 | 14 ++++++++------
>>  tests/nvme/037 |  6 +++++-
>>  tests/nvme/rc  | 12 +++++++++++-
>>  6 files changed, 37 insertions(+), 15 deletions(-)
>>
>> diff --git a/tests/nvme/033 b/tests/nvme/033
>> index 5e05175..171974e 100755
>> --- a/tests/nvme/033
>> +++ b/tests/nvme/033
>> @@ -62,8 +62,11 @@ test_device() {
>>  	_nvmet_passthru_target_setup
>>  
>>  	nsdev=$(_nvmet_passthru_target_connect)
>> -
>> -	compare_dev_info "${nsdev}"
>> +	if [[ -z "$nsdev" ]]; then
>> +		echo "FAIL"
> 
> I think some more specific failure message will be useful. How about
> "Failed to connect" or so? Same comment for the other test cases 034-037.
Agreed, I will add a meaningful error message in next patch.
> 
>> +	else
>> +		compare_dev_info "${nsdev}"
>> +	fi
>>  
>>  	_nvme_disconnect_subsys
>>  	_nvmet_passthru_target_cleanup
>> diff --git a/tests/nvme/034 b/tests/nvme/034
>> index 154fc91..7625204 100755
>> --- a/tests/nvme/034
>> +++ b/tests/nvme/034
>> @@ -32,8 +32,11 @@ test_device() {
>>  
>>  	_nvmet_passthru_target_setup
>>  	nsdev=$(_nvmet_passthru_target_connect)
>> -
>> -	_run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}"
>> +	if [[ -z "$nsdev" ]]; then
>> +		echo "FAIL"
>> +	else
>> +		_run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}"
>> +	fi
>>  
>>  	_nvme_disconnect_subsys
>>  	_nvmet_passthru_target_cleanup
>> diff --git a/tests/nvme/035 b/tests/nvme/035
>> index ff217d6..6ad9c56 100755
>> --- a/tests/nvme/035
>> +++ b/tests/nvme/035
>> @@ -30,13 +30,13 @@ test_device() {
>>  
>>  	_setup_nvmet
>>  
>> -	local ctrldev
> 
> Good catch :)
> 
>>  	local nsdev
>>  
>>  	_nvmet_passthru_target_setup
>>  	nsdev=$(_nvmet_passthru_target_connect)
>> -
>> -	if ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then
>> +	if [[ -z "$nsdev" ]]; then
>> +		echo "FAIL"
>> +	elif ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then
>>  		echo "FAIL: fio verify failed"
>>  	fi
>>  
>> diff --git a/tests/nvme/036 b/tests/nvme/036
>> index 442ffe7..a67ca12 100755
>> --- a/tests/nvme/036
>> +++ b/tests/nvme/036
>> @@ -30,13 +30,15 @@ test_device() {
> 
> This could be a good chance to add "local nsdev".
Yeah will do.
> 
>>  
>>  	_nvmet_passthru_target_setup
>>  	nsdev=$(_nvmet_passthru_target_connect)
>> -
>> -	ctrldev=$(_find_nvme_dev "${def_subsysnqn}")
>> -
>> -	if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then
>> -		echo "ERROR: reset failed"
>> +	if [[ -z "$nsdev" ]]; then
>> +		echo "FAIL"
>> +	else
>> +		ctrldev=$(_find_nvme_dev "${def_subsysnqn}")
>> +
>> +		if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then
>> +			echo "ERROR: reset failed"
>> +		fi
>>  	fi
>> -
> 
> Nit: unnecessary change here.
Okay, will remove this.
> 
>>  	_nvme_disconnect_subsys
>>  	_nvmet_passthru_target_cleanup
>>  
>> diff --git a/tests/nvme/037 b/tests/nvme/037
>> index f7ddc2d..f0c8a77 100755
>> --- a/tests/nvme/037
>> +++ b/tests/nvme/037
>> @@ -27,7 +27,6 @@ test_device() {
>>  
>>  	local subsys="blktests-subsystem-"
>>  	local iterations=10
>> -	local ctrldev
> 
> Good catch. And we can add "local nsdev" here.
yes will do.
> 
>>  
>>  	for ((i = 0; i < iterations; i++)); do
>>  		_nvmet_passthru_target_setup --subsysnqn "${subsys}${i}"
>> @@ -37,6 +36,11 @@ test_device() {
>>  		_nvme_disconnect_subsys \
>>  			--subsysnqn "${subsys}${i}" >>"${FULL}" 2>&1
>>  		_nvmet_passthru_target_cleanup --subsysnqn "${subsys}${i}"
>> +
>> +		if [[ -z "$nsdev" ]]; then
>> +			echo "FAIL"
>> +			break
>> +		fi
>>  	done
>>  
>>  	echo "Test complete"
>> diff --git a/tests/nvme/rc b/tests/nvme/rc
>> index a877de3..3def0d0 100644
>> --- a/tests/nvme/rc
>> +++ b/tests/nvme/rc
>> @@ -394,6 +394,7 @@ _nvmet_passthru_target_setup() {
>>  
>>  _nvmet_passthru_target_connect() {
>>  	local subsysnqn="$def_subsysnqn"
>> +	local timeout="5"
>>  
>>  	while [[ $# -gt 0 ]]; do
>>  		case $1 in
>> @@ -414,9 +415,18 @@ _nvmet_passthru_target_connect() {
>>  	# The following tests can race with the creation
>>  	# of the device so ensure the block device exists
>>  	# before continuing
>> -	while [ ! -b "${nsdev}" ]; do sleep 1; done
>> +	start_time=$(date +%s)
>> +	while [ ! -b "${nsdev}" ]; do
>> +		sleep .1
> 
> Is there any reason to have 0.1 second wait duration? When I changed this to
> "sleep 1", runtime of the test cases did not change on my test system.
> 
yes you're right there won't be any noticeable change in the runtime of 
the test unless we run this test in a loop for multiple number of times.
I would say that the gain would be only a fraction of a second in this case.
So I will use sleep 1 instead of .1 as you suggested.

>> +		end_time=$(date +%s)
>> +		if ((end_time - start_time > timeout)); then
>> +			echo ""
> 
> This echo doesn't look required.
> 
>> +			return 1
>> +		fi
> 
> If 1 second wait is okay instead of 0.1 second wait, the if block above can be
> a bit simpler, like,
> 
>               if ((++count >= timeout)); then
>                       return 1
>               fi
> 
> where, "local count=0" should be declared before.
> 
yeah with sleep 1 we can make it simple. I will update this in next patch.

>> +	done
>>  
>>  	echo "${nsdev}"
>> +	return 0
> 
> This "return 0" looks redundant. The previous echo succeeds always, then 0 is
> returned anyway. Also, all of the callers check the failure of this function by
> referring to the nsdev string echoed. They do not refer to the return code. So,
> it is not so useful to explicitly show that 0 is returned on success.
> 
yes I will incorporate it in the next patch.

Thanks,
--Nilay
diff mbox series

Patch

diff --git a/tests/nvme/033 b/tests/nvme/033
index 5e05175..171974e 100755
--- a/tests/nvme/033
+++ b/tests/nvme/033
@@ -62,8 +62,11 @@  test_device() {
 	_nvmet_passthru_target_setup
 
 	nsdev=$(_nvmet_passthru_target_connect)
-
-	compare_dev_info "${nsdev}"
+	if [[ -z "$nsdev" ]]; then
+		echo "FAIL"
+	else
+		compare_dev_info "${nsdev}"
+	fi
 
 	_nvme_disconnect_subsys
 	_nvmet_passthru_target_cleanup
diff --git a/tests/nvme/034 b/tests/nvme/034
index 154fc91..7625204 100755
--- a/tests/nvme/034
+++ b/tests/nvme/034
@@ -32,8 +32,11 @@  test_device() {
 
 	_nvmet_passthru_target_setup
 	nsdev=$(_nvmet_passthru_target_connect)
-
-	_run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}"
+	if [[ -z "$nsdev" ]]; then
+		echo "FAIL"
+	else
+		_run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}"
+	fi
 
 	_nvme_disconnect_subsys
 	_nvmet_passthru_target_cleanup
diff --git a/tests/nvme/035 b/tests/nvme/035
index ff217d6..6ad9c56 100755
--- a/tests/nvme/035
+++ b/tests/nvme/035
@@ -30,13 +30,13 @@  test_device() {
 
 	_setup_nvmet
 
-	local ctrldev
 	local nsdev
 
 	_nvmet_passthru_target_setup
 	nsdev=$(_nvmet_passthru_target_connect)
-
-	if ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then
+	if [[ -z "$nsdev" ]]; then
+		echo "FAIL"
+	elif ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then
 		echo "FAIL: fio verify failed"
 	fi
 
diff --git a/tests/nvme/036 b/tests/nvme/036
index 442ffe7..a67ca12 100755
--- a/tests/nvme/036
+++ b/tests/nvme/036
@@ -30,13 +30,15 @@  test_device() {
 
 	_nvmet_passthru_target_setup
 	nsdev=$(_nvmet_passthru_target_connect)
-
-	ctrldev=$(_find_nvme_dev "${def_subsysnqn}")
-
-	if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then
-		echo "ERROR: reset failed"
+	if [[ -z "$nsdev" ]]; then
+		echo "FAIL"
+	else
+		ctrldev=$(_find_nvme_dev "${def_subsysnqn}")
+
+		if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then
+			echo "ERROR: reset failed"
+		fi
 	fi
-
 	_nvme_disconnect_subsys
 	_nvmet_passthru_target_cleanup
 
diff --git a/tests/nvme/037 b/tests/nvme/037
index f7ddc2d..f0c8a77 100755
--- a/tests/nvme/037
+++ b/tests/nvme/037
@@ -27,7 +27,6 @@  test_device() {
 
 	local subsys="blktests-subsystem-"
 	local iterations=10
-	local ctrldev
 
 	for ((i = 0; i < iterations; i++)); do
 		_nvmet_passthru_target_setup --subsysnqn "${subsys}${i}"
@@ -37,6 +36,11 @@  test_device() {
 		_nvme_disconnect_subsys \
 			--subsysnqn "${subsys}${i}" >>"${FULL}" 2>&1
 		_nvmet_passthru_target_cleanup --subsysnqn "${subsys}${i}"
+
+		if [[ -z "$nsdev" ]]; then
+			echo "FAIL"
+			break
+		fi
 	done
 
 	echo "Test complete"
diff --git a/tests/nvme/rc b/tests/nvme/rc
index a877de3..3def0d0 100644
--- a/tests/nvme/rc
+++ b/tests/nvme/rc
@@ -394,6 +394,7 @@  _nvmet_passthru_target_setup() {
 
 _nvmet_passthru_target_connect() {
 	local subsysnqn="$def_subsysnqn"
+	local timeout="5"
 
 	while [[ $# -gt 0 ]]; do
 		case $1 in
@@ -414,9 +415,18 @@  _nvmet_passthru_target_connect() {
 	# The following tests can race with the creation
 	# of the device so ensure the block device exists
 	# before continuing
-	while [ ! -b "${nsdev}" ]; do sleep 1; done
+	start_time=$(date +%s)
+	while [ ! -b "${nsdev}" ]; do
+		sleep .1
+		end_time=$(date +%s)
+		if ((end_time - start_time > timeout)); then
+			echo ""
+			return 1
+		fi
+	done
 
 	echo "${nsdev}"
+	return 0
 }
 
 _nvmet_passthru_target_cleanup() {