Message ID: 20240820102013.781794-1-shinichiro.kawasaki@wdc.com (mailing list archive)
State: New, archived
Series: [blktests] nvme/052: wait for namespace removal before recreating namespace
On 8/20/24 15:50, Shin'ichiro Kawasaki wrote:
> The CKI project reported that the test case nvme/052 fails occasionally
> with the errors below:
>
>   nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [failed]
>       runtime  ...  22.209s
>       --- tests/nvme/052.out	2024-07-30 18:38:29.041716566 -0400
>       +++ /mnt/tests/gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/archive/production/kernel-tests-production.zip/storage/blktests/nvme/nvme-loop/blktests/results/nodev_tr_loop/nvme/052.out.bad	2024-07-30 18:45:35.438067452 -0400
>       @@ -1,2 +1,4 @@
>        Running nvme/052
>       +cat: /sys/block/nvme1n2/uuid: No such file or directory
>       +cat: /sys/block/nvme1n2/uuid: No such file or directory
>        Test complete
>
> The test case repeats creating and removing namespaces. When the test
> case removes the namespace by echoing 0 to the sysfs enable file, this
> write does not wait for the completion of the namespace removal.
> Before the removal completes, the test case recreates the namespace.
> At this point, the sysfs uuid file for the old namespace still exists.
> The test case misunderstands that the sysfs uuid file belongs to the
> recreated namespace, and tries to read it. However, the removal
> process for the old namespace deletes the sysfs uuid file at this
> point. Then the read attempt fails and results in the errors.
>
> To avoid the failure, wait for the namespace removal before recreating
> the namespace. For this purpose, add the new helper function
> nvmf_wait_for_ns_removal(). To specify the namespace to wait for, get
> the name of the namespace from nvmf_wait_for_ns(), and pass it to
> nvmf_wait_for_ns_removal().
>
> The test case intends to catch the regression fixed by the kernel commit
> ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit
> from the kernel v6.11-rc4, then confirmed that the test case can still
> catch the regression with this change.
>
> Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/
> Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns")
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
> Nilay, Yi, thank you for the feedback in the discussion thread at the
> Link. Here's the formal fix patch.
>
>  tests/nvme/052 | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/tests/nvme/052 b/tests/nvme/052
> index cf6061a..e1ac823 100755
> --- a/tests/nvme/052
> +++ b/tests/nvme/052
> @@ -39,15 +39,32 @@ nvmf_wait_for_ns() {
>  		ns=$(_find_nvme_ns "${uuid}")
>  	done
>
> +	echo "$ns"
>  	return 0
>  }
>
> +nvmf_wait_for_ns_removal() {
> +	local ns=$1 i
> +
> +	for ((i = 0; i < 10; i++)); do
> +		if [[ ! -e /dev/$ns ]]; then
> +			return
> +		fi
> +		sleep .1
> +		echo "wait removal of $ns" >> "$FULL"
> +	done
> +
> +	if [[ -e /dev/$ns ]]; then
> +		echo "Failed to remove the namespace $ns"
> +	fi
> +}
> +

Under nvmf_wait_for_ns_removal(), instead of checking the existence of
"/dev/$ns", how about checking the existence of the file "/sys/block/$ns"?
As we know, when this issue manifests, we have a stale entry
"/sys/block/$ns/$uuid" lurking around from the previous iteration for some
time, causing the observed symptom. So I think we may reuse the
_find_nvme_ns() function to wait until the stale "/sys/block/$ns/$uuid"
file disappears.

Maybe something like below:

nvmf_wait_for_ns_removal() {
	local ns
	local timeout="5"
	local uuid="$1"

	ns=$(_find_nvme_ns "${uuid}")

	start_time=$(date +%s)
	while [[ ! -z "$ns" ]]; do
		sleep 1
		end_time=$(date +%s)
		if (( end_time - start_time > timeout )); then
			echo "namespace with uuid \"${uuid}\" still " \
				"not deleted within ${timeout} seconds"
			return 1
		fi
		echo "Waiting for $ns removal" >> ${FULL}
		ns=$(_find_nvme_ns "${uuid}")
	done

	return 0
}

Thanks,
--Nilay
On Aug 20, 2024 / 22:25, Nilay Shroff wrote:
> On 8/20/24 15:50, Shin'ichiro Kawasaki wrote:
[...]
> > diff --git a/tests/nvme/052 b/tests/nvme/052
> > index cf6061a..e1ac823 100755
> > --- a/tests/nvme/052
> > +++ b/tests/nvme/052
> > @@ -39,15 +39,32 @@ nvmf_wait_for_ns() {
> >  		ns=$(_find_nvme_ns "${uuid}")
> >  	done
> >
> > +	echo "$ns"
> >  	return 0
> >  }
> >
> > +nvmf_wait_for_ns_removal() {
> > +	local ns=$1 i
> > +
> > +	for ((i = 0; i < 10; i++)); do
> > +		if [[ ! -e /dev/$ns ]]; then
> > +			return
> > +		fi
> > +		sleep .1
> > +		echo "wait removal of $ns" >> "$FULL"
> > +	done
> > +
> > +	if [[ -e /dev/$ns ]]; then
> > +		echo "Failed to remove the namespace $ns"
> > +	fi
> > +}
> > +
> Under nvmf_wait_for_ns_removal(), instead of checking the existence of
> "/dev/$ns", how about checking the existence of the file "/sys/block/$ns"?
> As we know, when this issue manifests, we have a stale entry
> "/sys/block/$ns/$uuid" lurking around from the previous iteration for some
> time, causing the observed symptom. So I think we may reuse the
> _find_nvme_ns() function to wait until the stale "/sys/block/$ns/$uuid"
> file disappears.

It sounds like a good idea to reuse _find_nvme_ns().

> Maybe something like below:
>
> nvmf_wait_for_ns_removal() {
> 	local ns
> 	local timeout="5"
> 	local uuid="$1"
>
> 	ns=$(_find_nvme_ns "${uuid}")

I tried this, and found that the _find_nvme_ns call spits out the failure
"cat: /sys/block/nvme1n2/uuid: No such file or directory", because the
delayed namespace removal can happen here. To suppress the error message,
this line should be,

	ns=$(_find_nvme_ns "${uuid}" 2> /dev/null)

>
> 	start_time=$(date +%s)
> 	while [[ ! -z "$ns" ]]; do
> 		sleep 1
> 		end_time=$(date +%s)
> 		if (( end_time - start_time > timeout )); then
> 			echo "namespace with uuid \"${uuid}\" still " \
> 				"not deleted within ${timeout} seconds"
> 			return 1
> 		fi
> 		echo "Waiting for $ns removal" >> ${FULL}
> 		ns=$(_find_nvme_ns "${uuid}")

Same comment as above.

>
> 	done
>
> 	return 0
> }

I found that your nvmf_wait_for_ns_removal() above has a certain amount of
duplication with the existing nvmf_wait_for_ns(). To avoid the duplication,
I suggest reusing nvmf_wait_for_ns() and adding a new argument to control
the wait target event: namespace 'created' or 'removed'. With this idea, I
created the patch below. I confirmed the patch avoids the failure.

One drawback of this new patch based on your suggestion is that it extends
the execution time of the test case from 20+ seconds to 40+ seconds. In
most cases, the while loop condition check in nvmf_wait_for_ns() is true
the first time, and false the second time. So, nvmf_wait_for_ns() takes 1
second for one "sleep 1". Before applying this patch, it took 20+ seconds
for 20 iterations. After applying the patch, it takes 40+ seconds, since
one iteration calls nvmf_wait_for_ns() twice. So how about reducing the
sleep time from 1 to 0.1? I tried it and observed that it reduced the
runtime from 40+ seconds to 10+ seconds.

diff --git a/tests/nvme/052 b/tests/nvme/052
index cf6061a..22e0bf5 100755
--- a/tests/nvme/052
+++ b/tests/nvme/052
@@ -20,23 +20,35 @@ set_conditions() {
 	_set_nvme_trtype "$@"
 }
 
+find_nvme_ns() {
+	if [[ "$2" == removed ]]; then
+		_find_nvme_ns "$1" 2> /dev/null
+	else
+		_find_nvme_ns "$1"
+	fi
+}
+
+# Wait for the namespace with the specified uuid to fulfill the specified
+# condition, "created" or "removed".
 nvmf_wait_for_ns() {
 	local ns
 	local timeout="5"
 	local uuid="$1"
+	local condition="$2"
 
-	ns=$(_find_nvme_ns "${uuid}")
+	ns=$(find_nvme_ns "${uuid}" "${condition}")
 
 	start_time=$(date +%s)
-	while [[ -z "$ns" ]]; do
+	while [[ -z "$ns" && "$condition" == created ]] ||
+	      [[ -n "$ns" && "$condition" == removed ]]; do
 		sleep 1
 		end_time=$(date +%s)
 		if (( end_time - start_time > timeout )); then
 			echo "namespace with uuid \"${uuid}\" not " \
-				"found within ${timeout} seconds"
+				"${condition} within ${timeout} seconds"
 			return 1
 		fi
-		ns=$(_find_nvme_ns "${uuid}")
+		ns=$(find_nvme_ns "${uuid}" "${condition}")
 	done
 
 	return 0
@@ -63,7 +75,7 @@ test() {
 		_create_nvmet_ns "${def_subsysnqn}" "${i}" "$(_nvme_def_file_path).$i" "${uuid}"
 
 		# wait until async request is processed and ns is created
-		nvmf_wait_for_ns "${uuid}"
+		nvmf_wait_for_ns "${uuid}" created
 		if [ $? -eq 1 ]; then
 			echo "FAIL"
 			rm "$(_nvme_def_file_path).$i"
@@ -71,6 +83,10 @@ test() {
 		fi
 
 		_remove_nvmet_ns "${def_subsysnqn}" "${i}"
+
+		# wait until async request is processed and ns is removed
+		nvmf_wait_for_ns "${uuid}" removed
+
 		rm "$(_nvme_def_file_path).$i"
 	}
 	done
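[Editorial note: the "created"/"removed" wait pattern above can be exercised standalone, without nvme hardware. The sketch below mimics the structure of the refactored nvmf_wait_for_ns(), but `find_thing` and `wait_for_thing` are hypothetical stand-ins that poll a plain file instead of the nvme sysfs tree.]

```shell
#!/bin/bash
# Standalone sketch of the wait-for-condition pattern discussed above.
# find_thing stands in for _find_nvme_ns: it prints a name when the
# object exists and nothing otherwise. All names here are illustrative.

find_thing() {
	[[ -e "$1" ]] && basename "$1"
}

# Wait until the object is "created" or "removed", polling every 0.1s.
wait_for_thing() {
	local path="$1" condition="$2" thing
	local timeout=5 start_time end_time

	thing=$(find_thing "$path")
	start_time=$(date +%s)
	while [[ -z "$thing" && "$condition" == created ]] ||
	      [[ -n "$thing" && "$condition" == removed ]]; do
		sleep 0.1
		end_time=$(date +%s)
		if (( end_time - start_time > timeout )); then
			echo "object not ${condition} within ${timeout} seconds"
			return 1
		fi
		thing=$(find_thing "$path")
	done
	return 0
}

tmp=$(mktemp)
wait_for_thing "$tmp" created && echo "created ok"
(sleep 0.3; rm "$tmp") &	# the removal completes asynchronously
wait_for_thing "$tmp" removed && echo "removed ok"
wait
```

Because the predicate is re-evaluated every 0.1 seconds, a removal that completes asynchronously is still caught well inside the timeout, which is the point of the sub-second sleep discussed in the message above.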
On 8/21/24 12:51, Shinichiro Kawasaki wrote:
>> Under nvmf_wait_for_ns_removal(), instead of checking the existence of
>> "/dev/$ns", how about checking the existence of the file "/sys/block/$ns"?
>> As we know, when this issue manifests, we have a stale entry
>> "/sys/block/$ns/$uuid" lurking around from the previous iteration for some
>> time, causing the observed symptom. So I think we may reuse the
>> _find_nvme_ns() function to wait until the stale "/sys/block/$ns/$uuid"
>> file disappears.
>
> It sounds like a good idea to reuse _find_nvme_ns().
>
>> Maybe something like below:
>>
>> nvmf_wait_for_ns_removal() {
>> 	local ns
>> 	local timeout="5"
>> 	local uuid="$1"
>>
>> 	ns=$(_find_nvme_ns "${uuid}")
>
> I tried this, and found that the _find_nvme_ns call spits out the failure
> "cat: /sys/block/nvme1n2/uuid: No such file or directory", because the
> delayed namespace removal can happen here. To suppress the error message,
> this line should be,
>
> 	ns=$(_find_nvme_ns "${uuid}" 2> /dev/null)
>
yeah I agree here. We need to suppress this error.

>>
>> 	start_time=$(date +%s)
>> 	while [[ ! -z "$ns" ]]; do
>> 		sleep 1
>> 		end_time=$(date +%s)
>> 		if (( end_time - start_time > timeout )); then
>> 			echo "namespace with uuid \"${uuid}\" still " \
>> 				"not deleted within ${timeout} seconds"
>> 			return 1
>> 		fi
>> 		echo "Waiting for $ns removal" >> ${FULL}
>> 		ns=$(_find_nvme_ns "${uuid}")
>
> Same comment as above.
>
>>
>> 	done
>>
>> 	return 0
>> }
>
> I found that your nvmf_wait_for_ns_removal() above has a certain amount of
> duplication with the existing nvmf_wait_for_ns(). To avoid the duplication,
> I suggest reusing nvmf_wait_for_ns() and adding a new argument to control
> the wait target event: namespace 'created' or 'removed'. With this idea, I
> created the patch below. I confirmed the patch avoids the failure.
>
Sounds good!

> One drawback of this new patch based on your suggestion is that it extends
> the execution time of the test case from 20+ seconds to 40+ seconds. In
> most cases, the while loop condition check in nvmf_wait_for_ns() is true
> the first time, and false the second time. So, nvmf_wait_for_ns() takes 1
> second for one "sleep 1". Before applying this patch, it took 20+ seconds
> for 20 iterations. After applying the patch, it takes 40+ seconds, since
> one iteration calls nvmf_wait_for_ns() twice. So how about reducing the
> sleep time from 1 to 0.1? I tried it and observed that it reduced the
> runtime from 40+ seconds to 10+ seconds.
>
If using a sleep of 0.1 second saves execution time then yes, this makes
sense.

> diff --git a/tests/nvme/052 b/tests/nvme/052
> index cf6061a..22e0bf5 100755
> --- a/tests/nvme/052
> +++ b/tests/nvme/052
> @@ -20,23 +20,35 @@ set_conditions() {
>  	_set_nvme_trtype "$@"
>  }
>  
> +find_nvme_ns() {
> +	if [[ "$2" == removed ]]; then
> +		_find_nvme_ns "$1" 2> /dev/null
> +	else
> +		_find_nvme_ns "$1"
> +	fi
> +}
> +
> +# Wait for the namespace with the specified uuid to fulfill the specified
> +# condition, "created" or "removed".
>  nvmf_wait_for_ns() {
>  	local ns
>  	local timeout="5"
>  	local uuid="$1"
> +	local condition="$2"
>  
> -	ns=$(_find_nvme_ns "${uuid}")
> +	ns=$(find_nvme_ns "${uuid}" "${condition}")
>  
>  	start_time=$(date +%s)
> -	while [[ -z "$ns" ]]; do
> +	while [[ -z "$ns" && "$condition" == created ]] ||
> +	      [[ -n "$ns" && "$condition" == removed ]]; do
>  		sleep 1
>  		end_time=$(date +%s)
>  		if (( end_time - start_time > timeout )); then
>  			echo "namespace with uuid \"${uuid}\" not " \
> -				"found within ${timeout} seconds"
> +				"${condition} within ${timeout} seconds"
>  			return 1
>  		fi
> -		ns=$(_find_nvme_ns "${uuid}")
> +		ns=$(find_nvme_ns "${uuid}" "${condition}")
>  	done
>  
>  	return 0
> @@ -63,7 +75,7 @@ test() {
>  		_create_nvmet_ns "${def_subsysnqn}" "${i}" "$(_nvme_def_file_path).$i" "${uuid}"
>  
>  		# wait until async request is processed and ns is created
> -		nvmf_wait_for_ns "${uuid}"
> +		nvmf_wait_for_ns "${uuid}" created
>  		if [ $? -eq 1 ]; then
>  			echo "FAIL"
>  			rm "$(_nvme_def_file_path).$i"
> @@ -71,6 +83,10 @@ test() {
>  		fi
>  
>  		_remove_nvmet_ns "${def_subsysnqn}" "${i}"
> +
> +		# wait until async request is processed and ns is removed
> +		nvmf_wait_for_ns "${uuid}" removed
> +
As nvme_wait_for_ns() returns either 0 (success) or 1 (failure), I think we
should check the return status of nvme_wait_for_ns() here and bail out in
case it returns failure, same as what we do above while creating the
namespace.

Another point: I think we may always suppress errors from _find_nvme_ns(),
irrespective of whether it is called while "creating" or "removing" the
namespace, assuming we always check the return status of nvme_wait_for_ns()
in the main loop. So ideally we shall invoke _find_nvme_ns() from
nvme_wait_for_ns() as below:

	ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)

On a cosmetic note: maybe we can use readonly constants to make the
"created" and "removed" parameters look more elegant/readable.

	# Define constants
	readonly NS_ADD="added"
	readonly NS_DEL="deleted"

Now we may reuse the above constants instead of "created" and "removed".
You may rename the constants if you don't like the names I used above :)

Thanks,
--Nilay
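[Editorial note: the two suggestions in the message above — bail out on the waiter's return status on both paths, and name the condition strings with readonly constants — can be sketched generically. `wait_for_ns` below is a hypothetical stub standing in for nvmf_wait_for_ns(); none of these names are from the actual test.]

```shell
#!/bin/bash
# Sketch of the review suggestions: readonly constants for the condition
# strings, and a return-status check on both the creation and removal
# paths. All names are illustrative.

readonly NS_CREATED="created"
readonly NS_REMOVED="removed"

# Stub standing in for nvmf_wait_for_ns(): returns 0 on success, 1 on
# failure (e.g. a timeout). Here it succeeds for both known conditions.
wait_for_ns() {
	local uuid="$1" condition="$2"
	[[ "$condition" == "$NS_CREATED" || "$condition" == "$NS_REMOVED" ]]
}

if ! wait_for_ns "some-uuid" "$NS_CREATED"; then
	echo "FAIL: namespace not created"
	exit 1
fi

# Mirror the same bail-out on the removal path, as suggested above.
if ! wait_for_ns "some-uuid" "$NS_REMOVED"; then
	echo "FAIL: namespace not removed"
	exit 1
fi
```

The `if ! wait_for_ns …` form also avoids the separate `[ $? -eq 1 ]` test used in the quoted diff, though the thread does not settle on that style.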
On Aug 21, 2024 / 14:58, Nilay Shroff wrote:
> On 8/21/24 12:51, Shinichiro Kawasaki wrote:
[...]
> > @@ -71,6 +83,10 @@ test() {
> >  		fi
> >  
> >  		_remove_nvmet_ns "${def_subsysnqn}" "${i}"
> > +
> > +		# wait until async request is processed and ns is removed
> > +		nvmf_wait_for_ns "${uuid}" removed
> > +
> As nvme_wait_for_ns() returns either 0 (success) or 1 (failure), I think we
> should check the return status of nvme_wait_for_ns() here and bail out in
> case it returns failure, same as what we do above while creating the
> namespace.

Agreed, I will add the check to v2.

> Another point: I think we may always suppress errors from _find_nvme_ns(),
> irrespective of whether it is called while "creating" or "removing" the
> namespace, assuming we always check the return status of nvme_wait_for_ns()
> in the main loop. So ideally we shall invoke _find_nvme_ns() from
> nvme_wait_for_ns() as below:
>
> 	ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)

It doesn't sound correct to suppress the _find_nvme_ns errors always. We
found the issue since _find_nvme_ns reported the error after the
_create_nvmet_ns() call. If we suppress it, I cannot be confident that this
fix avoids the error.

> On a cosmetic note: maybe we can use readonly constants to make the
> "created" and "removed" parameters look more elegant/readable.
>
> 	# Define constants
> 	readonly NS_ADD="added"
> 	readonly NS_DEL="deleted"
>
> Now we may reuse the above constants instead of "created" and "removed".
> You may rename the constants if you don't like the names I used above :)

This sounds like too much for me. "created" and "removed" are constant
strings, so I'm not sure about the merit of introducing the constant
variables. The number of characters will not change much either.

> Thanks,
> --Nilay
On 8/22/24 14:03, Shinichiro Kawasaki wrote:
> On Aug 21, 2024 / 14:58, Nilay Shroff wrote:
>> On 8/21/24 12:51, Shinichiro Kawasaki wrote:
> [...]
>> Another point: I think we may always suppress errors from _find_nvme_ns(),
>> irrespective of whether it is called while "creating" or "removing" the
>> namespace, assuming we always check the return status of nvme_wait_for_ns()
>> in the main loop. So ideally we shall invoke _find_nvme_ns() from
>> nvme_wait_for_ns() as below:
>>
>> 	ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)
>
> It doesn't sound correct to suppress the _find_nvme_ns errors always. We
> found the issue since _find_nvme_ns reported the error after the
> _create_nvmet_ns() call. If we suppress it, I cannot be confident that this
> fix avoids the error.
>
Agreed, however my thought was that if we always (while creating as well as
deleting the namespace) validate the return value from nvme_wait_for_ns()
in the main loop, then we should be OK and we may not need to worry about
any error generated from _find_nvme_ns(). The return status from
nvme_wait_for_ns() is good enough for the main loop to proceed further or
bail out. Having said that, it's debatable, as what you pointed out is also
valid. The only thing which looks odd to me is that we have to suppress the
error from _find_nvme_ns() for one case but not for the other.

>> On a cosmetic note: maybe we can use readonly constants to make the
>> "created" and "removed" parameters look more elegant/readable.
[...]
>
> This sounds like too much for me. "created" and "removed" are constant
> strings, so I'm not sure about the merit of introducing the constant
> variables. The number of characters will not change much either.
>
Alright! Maybe this is more a matter of personal taste, so I am fine either
way.

Thanks,
--Nilay
On Aug 22, 2024 / 16:45, Nilay Shroff wrote:
> On 8/22/24 14:03, Shinichiro Kawasaki wrote:
> > On Aug 21, 2024 / 14:58, Nilay Shroff wrote:
[...]
> >> Another point: I think we may always suppress errors from _find_nvme_ns(),
> >> irrespective of whether it is called while "creating" or "removing" the
> >> namespace, assuming we always check the return status of nvme_wait_for_ns()
> >> in the main loop. So ideally we shall invoke _find_nvme_ns() from
> >> nvme_wait_for_ns() as below:
> >>
> >> 	ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)
> >
> > It doesn't sound correct to suppress the _find_nvme_ns errors always. We
> > found the issue since _find_nvme_ns reported the error after the
> > _create_nvmet_ns() call. If we suppress it, I cannot be confident that
> > this fix avoids the error.
> >
> Agreed, however my thought was that if we always (while creating as well as
> deleting the namespace) validate the return value from nvme_wait_for_ns()
> in the main loop, then we should be OK and we may not need to worry about
> any error generated from _find_nvme_ns(). The return status from
> nvme_wait_for_ns() is good enough for the main loop to proceed further or
> bail out. Having said that, it's debatable, as what you pointed out is also
> valid. The only thing which looks odd to me is that we have to suppress the
> error from _find_nvme_ns() for one case but not for the other.

I can agree with this point: it is odd to suppress errors only for the
namespace removal case. I did so to catch other potential errors that
_find_nvme_ns() may return in the future for the namespace creation case.
But still, this way misses other potential errors for the namespace removal
case. Maybe I was overthinking. Let's simplify the test by just doing

	ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)

as you suggest. The test case can still detect the kernel regression, and I
think it's good enough. Will reflect this in v2.
On Thu, Aug 22, 2024 at 11:59:35AM GMT, Shinichiro Kawasaki wrote:
> I can agree with this point: it is odd to suppress errors only for the
> namespace removal case. I did so to catch other potential errors that
> _find_nvme_ns() may return in the future for the namespace creation case.
> But still, this way misses other potential errors for the namespace removal
> case. Maybe I was overthinking. Let's simplify the test by just doing
>
> 	ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)
>
> as you suggest. The test case can still detect the kernel regression, and I
> think it's good enough. Will reflect this in v2.

Not sure if this is relevant, but I'd like to see that we return error
codes so that the caller can actually decide whether to ignore the failure
or not.
On 8/22/24 20:19, Daniel Wagner wrote:
> On Thu, Aug 22, 2024 at 11:59:35AM GMT, Shinichiro Kawasaki wrote:
[...]
>
> Not sure if this is relevant, but I'd like to see that we return error
> codes so that the caller can actually decide whether to ignore the failure
> or not.

When _find_nvme_ns() fails to find the relevant namespace, it returns an
empty string, and when it finds the namespace, it returns the namespace
name to the caller. The caller (in this case nvmf_wait_for_ns()) then takes
action depending on the value returned from _find_nvme_ns().

Thanks,
--Nilay
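[Editorial note: the convention described above — a lookup helper that signals "not found" with empty output rather than a distinct exit code, and a caller that branches on that output — can be sketched standalone. `find_ns_by_uuid` and the lookup table are hypothetical, not part of blktests.]

```shell
#!/bin/bash
# Sketch of the empty-string-on-failure convention: the helper prints the
# found name (or nothing), and the caller tests for empty output.

declare -A ns_table=([uuid-1]="nvme1n1")

find_ns_by_uuid() {
	echo "${ns_table[$1]}"
}

ns=$(find_ns_by_uuid "uuid-1")
if [[ -z "$ns" ]]; then
	echo "namespace not found"
else
	echo "found $ns"
fi
```

This is why the thread concludes that a separate error code adds little here: the caller's decision hinges entirely on whether the output is empty.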
On Aug 22, 2024 / 21:13, Nilay Shroff wrote:
> On 8/22/24 20:19, Daniel Wagner wrote:
> > On Thu, Aug 22, 2024 at 11:59:35AM GMT, Shinichiro Kawasaki wrote:
> >> I can agree with this point: it is odd to suppress errors only for the
> >> namespace removal case. I did so to catch other potential errors that
> >> _find_nvme_ns() may return in the future for the namespace creation
> >> case. But still, this way misses other potential errors for the
> >> namespace removal case. Maybe I was overthinking. Let's simplify the
> >> test by just doing
> >>
> >> 	ns=$(_find_nvme_ns "${uuid}" 2>/dev/null)

While I was preparing v3, I noticed that it would be better to redirect the
stderr to $FULL than to /dev/null, since it will leave some clue on
failure. Will reflect this idea in v3.

> >>
> >> as you suggest. The test case can still detect the kernel regression,
> >> and I think it's good enough. Will reflect this in v2.
> >
> > Not sure if this is relevant, but I'd like to see that we return error
> > codes so that the caller can actually decide whether to ignore the
> > failure or not.
> When _find_nvme_ns() fails to find the relevant namespace, it returns an
> empty string, and when it finds the namespace, it returns the namespace
> name to the caller. The caller (in this case nvmf_wait_for_ns()) then
> takes action depending on the value returned from _find_nvme_ns().

Even if _find_nvme_ns() returned a different error code for the
"cat: /sys/block/nvmeXnY/uuid: No such file or directory" error, that would
not decide the condition to ignore the error or not. So I think the "empty
string" check for _find_nvme_ns() is enough. Anyway, will send out v3 soon.
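[Editorial note: the v3 idea above — redirect the helper's stderr into the test's log file instead of discarding it — can be shown with a generic sketch. `lookup` is a hypothetical stand-in for _find_nvme_ns(), and FULL is just a temp file here rather than the blktests log path.]

```shell
#!/bin/bash
# Sketch of keeping diagnostics instead of discarding them: stderr from
# the lookup is appended to a log file (what blktests calls $FULL),
# not sent to /dev/null.

FULL=$(mktemp)

lookup() {
	# Stand-in for _find_nvme_ns; cat-ing a missing file produces the
	# kind of stderr noise discussed in this thread.
	cat /nonexistent/uuid 2>> "$FULL"
}

ns=$(lookup)
[[ -z "$ns" ]] && echo "lookup failed; clue kept in $FULL"
grep -q "No such file or directory" "$FULL" && echo "diagnostic preserved"
rm "$FULL"
```

The test output stays clean (so the .out comparison passes), while the log file retains the underlying error for debugging a failure.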
diff --git a/tests/nvme/052 b/tests/nvme/052
index cf6061a..e1ac823 100755
--- a/tests/nvme/052
+++ b/tests/nvme/052
@@ -39,15 +39,32 @@ nvmf_wait_for_ns() {
 		ns=$(_find_nvme_ns "${uuid}")
 	done
 
+	echo "$ns"
 	return 0
 }
 
+nvmf_wait_for_ns_removal() {
+	local ns=$1 i
+
+	for ((i = 0; i < 10; i++)); do
+		if [[ ! -e /dev/$ns ]]; then
+			return
+		fi
+		sleep .1
+		echo "wait removal of $ns" >> "$FULL"
+	done
+
+	if [[ -e /dev/$ns ]]; then
+		echo "Failed to remove the namespace $ns"
+	fi
+}
+
 test() {
 	echo "Running ${TEST_NAME}"
 
 	_setup_nvmet
 
-	local iterations=20
+	local iterations=20 ns
 
 	_nvmet_target_setup
 
@@ -63,7 +80,7 @@ test() {
 		_create_nvmet_ns "${def_subsysnqn}" "${i}" "$(_nvme_def_file_path).$i" "${uuid}"
 
 		# wait until async request is processed and ns is created
-		nvmf_wait_for_ns "${uuid}"
+		ns=$(nvmf_wait_for_ns "${uuid}")
 		if [ $? -eq 1 ]; then
 			echo "FAIL"
 			rm "$(_nvme_def_file_path).$i"
@@ -71,6 +88,7 @@ test() {
 		fi
 
 		_remove_nvmet_ns "${def_subsysnqn}" "${i}"
+		nvmf_wait_for_ns_removal "$ns"
 		rm "$(_nvme_def_file_path).$i"
 	}
 	done
The CKI project reported that the test case nvme/052 fails occasionally
with the errors below:

  nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [failed]
      runtime  ...  22.209s
      --- tests/nvme/052.out	2024-07-30 18:38:29.041716566 -0400
      +++ /mnt/tests/gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/archive/production/kernel-tests-production.zip/storage/blktests/nvme/nvme-loop/blktests/results/nodev_tr_loop/nvme/052.out.bad	2024-07-30 18:45:35.438067452 -0400
      @@ -1,2 +1,4 @@
       Running nvme/052
      +cat: /sys/block/nvme1n2/uuid: No such file or directory
      +cat: /sys/block/nvme1n2/uuid: No such file or directory
       Test complete

The test case repeats creating and removing namespaces. When the test case
removes the namespace by echoing 0 to the sysfs enable file, this write
does not wait for the completion of the namespace removal. Before the
removal completes, the test case recreates the namespace. At this point,
the sysfs uuid file for the old namespace still exists. The test case
misunderstands that the sysfs uuid file belongs to the recreated namespace,
and tries to read it. However, the removal process for the old namespace
deletes the sysfs uuid file at this point. Then the read attempt fails and
results in the errors.

To avoid the failure, wait for the namespace removal before recreating the
namespace. For this purpose, add the new helper function
nvmf_wait_for_ns_removal(). To specify the namespace to wait for, get the
name of the namespace from nvmf_wait_for_ns(), and pass it to
nvmf_wait_for_ns_removal().

The test case intends to catch the regression fixed by the kernel commit
ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit
from the kernel v6.11-rc4, then confirmed that the test case can still
catch the regression with this change.

Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/
Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
Nilay, Yi, thank you for the feedback in the discussion thread at the Link.
Here's the formal fix patch.

 tests/nvme/052 | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)
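[Editorial note: the race described in the commit message can be simulated without nvme hardware. In the sketch below a mock "sysfs" tree is a temp directory, and removal completes asynchronously after the remover returns, just as echoing 0 to the nvmet enable file returns before the namespace is gone. All names and paths are illustrative.]

```shell
#!/bin/bash
# Standalone simulation of the race fixed by this patch: the old
# namespace's attribute file may still be visible after removal is
# requested, so a reader that does not wait can catch it mid-deletion.

sysfs=$(mktemp -d)

create_ns() { mkdir -p "$sysfs/$1" && echo "$1-uuid" > "$sysfs/$1/uuid"; }
# Removal returns immediately; the actual deletion happens later.
remove_ns_async() { (sleep 0.2; rm -rf "$sysfs/$1") & }

create_ns nvme1n2
remove_ns_async nvme1n2

# Without waiting, the stale uuid file may still be present here ...
[[ -e "$sysfs/nvme1n2/uuid" ]] && echo "stale uuid file still present"

# ... so wait for the removal to complete before recreating, which is
# exactly what nvmf_wait_for_ns_removal() does in the patch.
while [[ -e "$sysfs/nvme1n2" ]]; do sleep 0.05; done
create_ns nvme1n2
echo "recreated safely"
wait
rm -rf "$sysfs"
```

Reading the uuid file between the removal request and its completion is the window where the original test printed "cat: ... No such file or directory".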