Message ID | 20240924084907.143999-1-nilay@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [blktests] nvme/{033-037}: timeout while waiting for nvme passthru namespace device | expand |
On Sep 24, 2024 / 14:18, Nilay Shroff wrote: > Avoid waiting indefinitely for nvme passthru namespace block device > to appear. Wait for up to 5 seconds and during this time if namespace > block device doesn't appear then bail out and FAIL the test. > > Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> > --- > Hi, > > You may find more details about this issue here[1]. > > I found that blktest nvme/033-037 hangs indefinitely when > kernel rejects the passthru target namespace due to the > duplicate IDs. This patch helps address this issue by > ensuring that we bail out and fail the test if for any > reason passthru target namspace is not created on the > host. The relevant kernel patchv2 to fix the issue with > duplicate IDs while using passthru loop target can be > found here[2]. > > [1]: https://lore.kernel.org/all/8b17203f-ea4b-403b-a204-4fbc00c261ca@linux.ibm.com/ > [2]: https://lore.kernel.org/all/20240921070547.531991-1-nilay@linux.ibm.com/ > > Thanks! Thanks for the patch :) It's bad that you stumbled into the infinite while loop in _nvmet_passthru_target_connect(). I agree that it should be fixed. Please find my comments in line. > --- > tests/nvme/033 | 7 +++++-- > tests/nvme/034 | 7 +++++-- > tests/nvme/035 | 6 +++--- > tests/nvme/036 | 14 ++++++++------ > tests/nvme/037 | 6 +++++- > tests/nvme/rc | 12 +++++++++++- > 6 files changed, 37 insertions(+), 15 deletions(-) > > diff --git a/tests/nvme/033 b/tests/nvme/033 > index 5e05175..171974e 100755 > --- a/tests/nvme/033 > +++ b/tests/nvme/033 > @@ -62,8 +62,11 @@ test_device() { > _nvmet_passthru_target_setup > > nsdev=$(_nvmet_passthru_target_connect) > - > - compare_dev_info "${nsdev}" > + if [[ -z "$nsdev" ]]; then > + echo "FAIL" I think some more specific failure message will be useful. How about "Failed to connect" or so? Same comment for the other test cases 034-037. > + else > + compare_dev_info "${nsdev}" > + fi > > _nvme_disconnect_subsys > _nvmet_passthru_target_cleanup > diff --git a/tests/nvme/034 b/tests/nvme/034 > index 154fc91..7625204 100755 > --- a/tests/nvme/034 > +++ b/tests/nvme/034 > @@ -32,8 +32,11 @@ test_device() { > > _nvmet_passthru_target_setup > nsdev=$(_nvmet_passthru_target_connect) > - > - _run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}" > + if [[ -z "$nsdev" ]]; then > + echo "FAIL" > + else > + _run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}" > + fi > > _nvme_disconnect_subsys > _nvmet_passthru_target_cleanup > diff --git a/tests/nvme/035 b/tests/nvme/035 > index ff217d6..6ad9c56 100755 > --- a/tests/nvme/035 > +++ b/tests/nvme/035 > @@ -30,13 +30,13 @@ test_device() { > > _setup_nvmet > > - local ctrldev Good catch :) > local nsdev > > _nvmet_passthru_target_setup > nsdev=$(_nvmet_passthru_target_connect) > - > - if ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then > + if [[ -z "$nsdev" ]]; then > + echo "FAIL" > + elif ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then > echo "FAIL: fio verify failed" > fi > > diff --git a/tests/nvme/036 b/tests/nvme/036 > index 442ffe7..a67ca12 100755 > --- a/tests/nvme/036 > +++ b/tests/nvme/036 > @@ -30,13 +30,15 @@ test_device() { This could be a good chance to add "local nsdev". > > _nvmet_passthru_target_setup > nsdev=$(_nvmet_passthru_target_connect) > - > - ctrldev=$(_find_nvme_dev "${def_subsysnqn}") > - > - if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then > - echo "ERROR: reset failed" > + if [[ -z "$nsdev" ]]; then > + echo "FAIL" > + else > + ctrldev=$(_find_nvme_dev "${def_subsysnqn}") > + > + if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then > + echo "ERROR: reset failed" > + fi > fi > - Nit: unnecessary change here. > _nvme_disconnect_subsys > _nvmet_passthru_target_cleanup > > diff --git a/tests/nvme/037 b/tests/nvme/037 > index f7ddc2d..f0c8a77 100755 > --- a/tests/nvme/037 > +++ b/tests/nvme/037 > @@ -27,7 +27,6 @@ test_device() { > > local subsys="blktests-subsystem-" > local iterations=10 > - local ctrldev Good catch. And we can add "local nsdev" here. > > for ((i = 0; i < iterations; i++)); do > _nvmet_passthru_target_setup --subsysnqn "${subsys}${i}" > @@ -37,6 +36,11 @@ test_device() { > _nvme_disconnect_subsys \ > --subsysnqn "${subsys}${i}" >>"${FULL}" 2>&1 > _nvmet_passthru_target_cleanup --subsysnqn "${subsys}${i}" > + > + if [[ -z "$nsdev" ]]; then > + echo "FAIL" > + break > + fi > done > > echo "Test complete" > diff --git a/tests/nvme/rc b/tests/nvme/rc > index a877de3..3def0d0 100644 > --- a/tests/nvme/rc > +++ b/tests/nvme/rc > @@ -394,6 +394,7 @@ _nvmet_passthru_target_setup() { > > _nvmet_passthru_target_connect() { > local subsysnqn="$def_subsysnqn" > + local timeout="5" > > while [[ $# -gt 0 ]]; do > case $1 in > @@ -414,9 +415,18 @@ _nvmet_passthru_target_connect() { > # The following tests can race with the creation > # of the device so ensure the block device exists > # before continuing > - while [ ! -b "${nsdev}" ]; do sleep 1; done > + start_time=$(date +%s) > + while [ ! -b "${nsdev}" ]; do > + sleep .1 Is there any reason to have 0.1 second wait duration? When I changed this to "sleep 1", runtime of the test cases did not change on my test system. > + end_time=$(date +%s) > + if ((end_time - start_time > timeout)); then > + echo "" This echo doesn't look required. > + return 1 > + fi If 1 second wait is okay instead of 0.1 second wait, the if block above can be a bit simpler, like, if ((++count >= timeout)); then return 1 fi where, "local count=0" should be declared before. > + done > > echo "${nsdev}" > + return 0 This "return 0" looks redundant. The previous echo succeeds always, then 0 is returned anyway. Also, all of the callers check the failure of this function by referring to the nsdev string echoed. They do not refer to the return code. So, it is not so useful to explicitly show that 0 is returned on success. > } > > _nvmet_passthru_target_cleanup() { > -- > 2.45.2 >
On 9/24/24 17:44, Shinichiro Kawasaki wrote: > On Sep 24, 2024 / 14:18, Nilay Shroff wrote: >> Avoid waiting indefinitely for nvme passthru namespace block device >> to appear. Wait for up to 5 seconds and during this time if namespace >> block device doesn't appear then bail out and FAIL the test. >> >> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> >> --- >> Hi, >> >> You may find more details about this issue here[1]. >> >> I found that blktest nvme/033-037 hangs indefinitely when >> kernel rejects the passthru target namespace due to the >> duplicate IDs. This patch helps address this issue by >> ensuring that we bail out and fail the test if for any >> reason passthru target namspace is not created on the >> host. The relevant kernel patchv2 to fix the issue with >> duplicate IDs while using passthru loop target can be >> found here[2]. >> >> [1]: https://lore.kernel.org/all/8b17203f-ea4b-403b-a204-4fbc00c261ca@linux.ibm.com/ >> [2]: https://lore.kernel.org/all/20240921070547.531991-1-nilay@linux.ibm.com/ >> >> Thanks! > > Thanks for the patch :) It's bad that you stumbled into the infinite while loop > in _nvmet_passthru_target_connect(). I agree that it should be fixed. > > Please find my comments in line. > Thank you for your review comments!! >> --- >> tests/nvme/033 | 7 +++++-- >> tests/nvme/034 | 7 +++++-- >> tests/nvme/035 | 6 +++--- >> tests/nvme/036 | 14 ++++++++------ >> tests/nvme/037 | 6 +++++- >> tests/nvme/rc | 12 +++++++++++- >> 6 files changed, 37 insertions(+), 15 deletions(-) >> >> diff --git a/tests/nvme/033 b/tests/nvme/033 >> index 5e05175..171974e 100755 >> --- a/tests/nvme/033 >> +++ b/tests/nvme/033 >> @@ -62,8 +62,11 @@ test_device() { >> _nvmet_passthru_target_setup >> >> nsdev=$(_nvmet_passthru_target_connect) >> - >> - compare_dev_info "${nsdev}" >> + if [[ -z "$nsdev" ]]; then >> + echo "FAIL" > > I think some more specific failure message will be useful. How about > "Failed to connect" or so? Same comment for the other test cases 034-037. Agreed, I will add a meaningful error message in next patch. > >> + else >> + compare_dev_info "${nsdev}" >> + fi >> >> _nvme_disconnect_subsys >> _nvmet_passthru_target_cleanup >> diff --git a/tests/nvme/034 b/tests/nvme/034 >> index 154fc91..7625204 100755 >> --- a/tests/nvme/034 >> +++ b/tests/nvme/034 >> @@ -32,8 +32,11 @@ test_device() { >> >> _nvmet_passthru_target_setup >> nsdev=$(_nvmet_passthru_target_connect) >> - >> - _run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}" >> + if [[ -z "$nsdev" ]]; then >> + echo "FAIL" >> + else >> + _run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}" >> + fi >> >> _nvme_disconnect_subsys >> _nvmet_passthru_target_cleanup >> diff --git a/tests/nvme/035 b/tests/nvme/035 >> index ff217d6..6ad9c56 100755 >> --- a/tests/nvme/035 >> +++ b/tests/nvme/035 >> @@ -30,13 +30,13 @@ test_device() { >> >> _setup_nvmet >> >> - local ctrldev > > Good catch :) > >> local nsdev >> >> _nvmet_passthru_target_setup >> nsdev=$(_nvmet_passthru_target_connect) >> - >> - if ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then >> + if [[ -z "$nsdev" ]]; then >> + echo "FAIL" >> + elif ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then >> echo "FAIL: fio verify failed" >> fi >> >> diff --git a/tests/nvme/036 b/tests/nvme/036 >> index 442ffe7..a67ca12 100755 >> --- a/tests/nvme/036 >> +++ b/tests/nvme/036 >> @@ -30,13 +30,15 @@ test_device() { > > This could be a good chance to add "local nsdev". Yeah will do. > >> >> _nvmet_passthru_target_setup >> nsdev=$(_nvmet_passthru_target_connect) >> - >> - ctrldev=$(_find_nvme_dev "${def_subsysnqn}") >> - >> - if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then >> - echo "ERROR: reset failed" >> + if [[ -z "$nsdev" ]]; then >> + echo "FAIL" >> + else >> + ctrldev=$(_find_nvme_dev "${def_subsysnqn}") >> + >> + if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then >> + echo "ERROR: reset failed" >> + fi >> fi >> - > > Nit: unnecessary change here. Okay, will remove this. > >> _nvme_disconnect_subsys >> _nvmet_passthru_target_cleanup >> >> diff --git a/tests/nvme/037 b/tests/nvme/037 >> index f7ddc2d..f0c8a77 100755 >> --- a/tests/nvme/037 >> +++ b/tests/nvme/037 >> @@ -27,7 +27,6 @@ test_device() { >> >> local subsys="blktests-subsystem-" >> local iterations=10 >> - local ctrldev > > Good catch. And we can add "local nsdev" here. yes will do. > >> >> for ((i = 0; i < iterations; i++)); do >> _nvmet_passthru_target_setup --subsysnqn "${subsys}${i}" >> @@ -37,6 +36,11 @@ test_device() { >> _nvme_disconnect_subsys \ >> --subsysnqn "${subsys}${i}" >>"${FULL}" 2>&1 >> _nvmet_passthru_target_cleanup --subsysnqn "${subsys}${i}" >> + >> + if [[ -z "$nsdev" ]]; then >> + echo "FAIL" >> + break >> + fi >> done >> >> echo "Test complete" >> diff --git a/tests/nvme/rc b/tests/nvme/rc >> index a877de3..3def0d0 100644 >> --- a/tests/nvme/rc >> +++ b/tests/nvme/rc >> @@ -394,6 +394,7 @@ _nvmet_passthru_target_setup() { >> >> _nvmet_passthru_target_connect() { >> local subsysnqn="$def_subsysnqn" >> + local timeout="5" >> >> while [[ $# -gt 0 ]]; do >> case $1 in >> @@ -414,9 +415,18 @@ _nvmet_passthru_target_connect() { >> # The following tests can race with the creation >> # of the device so ensure the block device exists >> # before continuing >> - while [ ! -b "${nsdev}" ]; do sleep 1; done >> + start_time=$(date +%s) >> + while [ ! -b "${nsdev}" ]; do >> + sleep .1 > > Is there any reason to have 0.1 second wait duration? When I changed this to > "sleep 1", runtime of the test cases did not change on my test system. > yes you're right there won't be any noticeable change in the runtime of the test unless we run this test in a loop for multiple number of times. I would say that the gain would be only a fraction of a second in this case. So I will use sleep 1 instead of .1 as you suggested. >> + end_time=$(date +%s) >> + if ((end_time - start_time > timeout)); then >> + echo "" > > This echo doesn't look required. > >> + return 1 >> + fi > > If 1 second wait is okay instead of 0.1 second wait, the if block above can be > a bit simpler, like, > > if ((++count >= timeout)); then > return 1 > fi > > where, "local count=0" should be declared before. > yeah with sleep 1 we can make it simple. I will update this in next patch. >> + done >> >> echo "${nsdev}" >> + return 0 > > This "return 0" looks redundant. The previous echo succeeds always, then 0 is > returned anyway. Also, all of the callers check the failure of this function by > referring to the nsdev string echoed. They do not refer to the return code. So, > it is not so useful to explicitly show that 0 is returned on success. > yes I will incorporate it in the next patch. Thanks, --Nilay
diff --git a/tests/nvme/033 b/tests/nvme/033 index 5e05175..171974e 100755 --- a/tests/nvme/033 +++ b/tests/nvme/033 @@ -62,8 +62,11 @@ test_device() { _nvmet_passthru_target_setup nsdev=$(_nvmet_passthru_target_connect) - - compare_dev_info "${nsdev}" + if [[ -z "$nsdev" ]]; then + echo "FAIL" + else + compare_dev_info "${nsdev}" + fi _nvme_disconnect_subsys _nvmet_passthru_target_cleanup diff --git a/tests/nvme/034 b/tests/nvme/034 index 154fc91..7625204 100755 --- a/tests/nvme/034 +++ b/tests/nvme/034 @@ -32,8 +32,11 @@ test_device() { _nvmet_passthru_target_setup nsdev=$(_nvmet_passthru_target_connect) - - _run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}" + if [[ -z "$nsdev" ]]; then + echo "FAIL" + else + _run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}" + fi _nvme_disconnect_subsys _nvmet_passthru_target_cleanup diff --git a/tests/nvme/035 b/tests/nvme/035 index ff217d6..6ad9c56 100755 --- a/tests/nvme/035 +++ b/tests/nvme/035 @@ -30,13 +30,13 @@ test_device() { _setup_nvmet - local ctrldev local nsdev _nvmet_passthru_target_setup nsdev=$(_nvmet_passthru_target_connect) - - if ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then + if [[ -z "$nsdev" ]]; then + echo "FAIL" + elif ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then echo "FAIL: fio verify failed" fi diff --git a/tests/nvme/036 b/tests/nvme/036 index 442ffe7..a67ca12 100755 --- a/tests/nvme/036 +++ b/tests/nvme/036 @@ -30,13 +30,15 @@ test_device() { _nvmet_passthru_target_setup nsdev=$(_nvmet_passthru_target_connect) - - ctrldev=$(_find_nvme_dev "${def_subsysnqn}") - - if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then - echo "ERROR: reset failed" + if [[ -z "$nsdev" ]]; then + echo "FAIL" + else + ctrldev=$(_find_nvme_dev "${def_subsysnqn}") + + if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then + echo "ERROR: reset failed" + fi fi - _nvme_disconnect_subsys _nvmet_passthru_target_cleanup diff --git a/tests/nvme/037 b/tests/nvme/037 index f7ddc2d..f0c8a77 100755 --- a/tests/nvme/037 +++ b/tests/nvme/037 @@ -27,7 +27,6 @@ test_device() { local subsys="blktests-subsystem-" local iterations=10 - local ctrldev for ((i = 0; i < iterations; i++)); do _nvmet_passthru_target_setup --subsysnqn "${subsys}${i}" @@ -37,6 +36,11 @@ test_device() { _nvme_disconnect_subsys \ --subsysnqn "${subsys}${i}" >>"${FULL}" 2>&1 _nvmet_passthru_target_cleanup --subsysnqn "${subsys}${i}" + + if [[ -z "$nsdev" ]]; then + echo "FAIL" + break + fi done echo "Test complete" diff --git a/tests/nvme/rc b/tests/nvme/rc index a877de3..3def0d0 100644 --- a/tests/nvme/rc +++ b/tests/nvme/rc @@ -394,6 +394,7 @@ _nvmet_passthru_target_setup() { _nvmet_passthru_target_connect() { local subsysnqn="$def_subsysnqn" + local timeout="5" while [[ $# -gt 0 ]]; do case $1 in @@ -414,9 +415,18 @@ _nvmet_passthru_target_connect() { # The following tests can race with the creation # of the device so ensure the block device exists # before continuing - while [ ! -b "${nsdev}" ]; do sleep 1; done + start_time=$(date +%s) + while [ ! -b "${nsdev}" ]; do + sleep .1 + end_time=$(date +%s) + if ((end_time - start_time > timeout)); then + echo "" + return 1 + fi + done echo "${nsdev}" + return 0 } _nvmet_passthru_target_cleanup() {
Avoid waiting indefinitely for nvme passthru namespace block device to appear. Wait for up to 5 seconds and during this time if namespace block device doesn't appear then bail out and FAIL the test. Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> --- Hi, You may find more details about this issue here[1]. I found that blktest nvme/033-037 hangs indefinitely when kernel rejects the passthru target namespace due to the duplicate IDs. This patch helps address this issue by ensuring that we bail out and fail the test if for any reason passthru target namspace is not created on the host. The relevant kernel patchv2 to fix the issue with duplicate IDs while using passthru loop target can be found here[2]. [1]: https://lore.kernel.org/all/8b17203f-ea4b-403b-a204-4fbc00c261ca@linux.ibm.com/ [2]: https://lore.kernel.org/all/20240921070547.531991-1-nilay@linux.ibm.com/ Thanks! --- tests/nvme/033 | 7 +++++-- tests/nvme/034 | 7 +++++-- tests/nvme/035 | 6 +++--- tests/nvme/036 | 14 ++++++++------ tests/nvme/037 | 6 +++++- tests/nvme/rc | 12 +++++++++++- 6 files changed, 37 insertions(+), 15 deletions(-)