Message ID | 20230811012334.3392220-1-shinichiro.kawasaki@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [blktests] nvme/rc: fix nvme device readiness check after _nvme_connect_subsys | expand |
On 8/10/2023 6:23 PM, Shin'ichiro Kawasaki wrote: > The helper function _nvme_connect_subsys() creates a nvme device. It may > take some time after the function call until the device gets ready for > I/O. So it is expected that the test cases call _find_nvme_dev() after > _nvme_connect_subsys() before I/O. _find_nvme_dev() returns the path of > the created device, and it also waits for uuid and wwid sysfs attributes > of the created device get ready. This wait works as the wait for the > device I/O readiness. > > However, this wait by _find_nvme_dev() has two problems. The first > problem is missing call of _find_nvme_dev(). The test case nvme/047 > calls _nvme_connect_subsys() twice, but _find_nvme_dev() is called only > for the first _nvme_connect_subsys() call. This causes too early I/O to > the device with tcp transport [1]. Fix this by moving the code for > device readiness wait from _find_nvme_dev() to _nvme_connect_subsys(). > > The second problem is wrong paths for the sysfs attributes. The paths > do not include namespace index, so the check for the attributes always > fail. Still _find_nvme_dev() does 1 second wait and allows the device > get ready for I/O in most cases, but this is not intended behavior. > Fix the paths by adding the namespace index. > > On top of the checks for sysfs attributes, add check for the created > device file. This ensures that the create device is ready for I/O. > > [1] https://lore.kernel.org/linux-block/CAHj4cs9GNohGUjohNw93jrr8JGNcRYC-ienAZz+sa7az1RK77w@mail.gmail.com/ > > Fixes: c766fccf3aff ("Make the NVMe tests more reliable") > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > --- Thanks for fixing this, looks good to me. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> -ck
On Fri, Aug 11, 2023 at 10:23:34AM +0900, Shin'ichiro Kawasaki wrote: > --- a/tests/nvme/rc > +++ b/tests/nvme/rc > @@ -425,6 +425,7 @@ _nvme_connect_subsys() { > local keep_alive_tmo="" > local reconnect_delay="" > local ctrl_loss_tmo="" > + local dev i > > while [[ $# -gt 0 ]]; do > case $1 in > @@ -529,6 +530,16 @@ _nvme_connect_subsys() { > fi > > nvme connect "${ARGS[@]}" 2> /dev/null > + > + dev=$(_find_nvme_dev "$subsysnqn") > + for ((i = 0; i < 10; i++)); do > + if [[ -b /dev/${dev}n1 && > + -e /sys/block/${dev}n1/uuid && > + -e /sys/block/${dev}n1/wwid ]]; then > + return > + fi > + sleep .1 > + done > } Not sure if this going to work for the passthru case as intended. If you look at the _find_nvme_passthru_loop_dev() function, there is a logic to figure out which namespace to use. _nvmet_passthru_target_connect() is also using _nvme_connect_subsys() so it is possible that the test device for the passthru case uses not namespace 1. If namespace 1 doesn't exist we just loop for 1 second. So in this particular case nothing changes. Still not nice. Thinking about it, shouldn't we log that we couldn't find the device/uuid/wwid at the end of the loop?
> The helper function _nvme_connect_subsys() creates a nvme device. It may > take some time after the function call until the device gets ready for > I/O. So it is expected that the test cases call _find_nvme_dev() after > _nvme_connect_subsys() before I/O. _find_nvme_dev() returns the path of > the created device, and it also waits for uuid and wwid sysfs attributes > of the created device get ready. This wait works as the wait for the > device I/O readiness. Shouldn't this call udevadm --settle or something?
On Aug 11, 2023 / 08:37, Daniel Wagner wrote: > On Fri, Aug 11, 2023 at 10:23:34AM +0900, Shin'ichiro Kawasaki wrote: > > --- a/tests/nvme/rc > > +++ b/tests/nvme/rc > > @@ -425,6 +425,7 @@ _nvme_connect_subsys() { > > local keep_alive_tmo="" > > local reconnect_delay="" > > local ctrl_loss_tmo="" > > + local dev i > > > > while [[ $# -gt 0 ]]; do > > case $1 in > > @@ -529,6 +530,16 @@ _nvme_connect_subsys() { > > fi > > > > nvme connect "${ARGS[@]}" 2> /dev/null > > + > > + dev=$(_find_nvme_dev "$subsysnqn") > > + for ((i = 0; i < 10; i++)); do > > + if [[ -b /dev/${dev}n1 && > > + -e /sys/block/${dev}n1/uuid && > > + -e /sys/block/${dev}n1/wwid ]]; then > > + return > > + fi > > + sleep .1 > > + done > > } > > Not sure if this going to work for the passthru case as intended. IMO, the check above is not for the passthru case. The function _nvmet_passthru_target_connect() has the code below: while [ ! -b "${nsdev}" ]; do sleep 1; done This checks readiness of the device for the passthru case. > If you > look at the _find_nvme_passthru_loop_dev() function, there is a logic to > figure out which namespace to use. _nvmet_passthru_target_connect() > is also using _nvme_connect_subsys() so it is possible that the > test device for the passthru case uses not namespace 1. > > If namespace 1 doesn't exist we just loop for 1 second. So in this > particular case nothing changes. Still not nice. For that case, the check in _nvmet_passthru_target_connect() ensures readiness of the passthru device. The drawback is that the check for namespace 1 consumes 1 second for nothing. > > Thinking about it, shouldn't we log that we couldn't find the > device/uuid/wwid at the end of the loop? Not sure. For the non-passthru case, it will be helpful. But for passthru case, check result log for namespace 1 can be confusing. I can think of two other fix approaches below, but they did not look better than this patch for me. What do you think? 1) Go back to the fix approach to add another _find_nvme_dev() in nvme/047. I worried this will leave the chance that we will fall into the same issue when we will add a new test case with multiple _nvme_connect_subsys calls. 2) Rework _find_nvme_dev into two new functions _find_nvme_ctrl_dev and _find_nvme_ns_dev, and do the readiness check in _find_nvme_ns_dev. IMO, this confusion comes from the fact that _find_nvme_dev returns control device, but some test cases use it to operate namespaces by adding "n1" to the control device name. If a test case uses namespace device, it's the better to call _find_nvme_ns_dev. But I worry this approach may be too much.
On Aug 14, 2023 / 12:37, Sagi Grimberg wrote: > > > The helper function _nvme_connect_subsys() creates a nvme device. It may > > take some time after the function call until the device gets ready for > > I/O. So it is expected that the test cases call _find_nvme_dev() after > > _nvme_connect_subsys() before I/O. _find_nvme_dev() returns the path of > > the created device, and it also waits for uuid and wwid sysfs attributes > > of the created device get ready. This wait works as the wait for the > > device I/O readiness. > > Shouldn't this call udevadm --settle or something? Ah, it will be more stable to add "udevadm settle" before the readiness check. Will add it.
On Wed, Aug 16, 2023 at 12:04:24PM +0000, Shinichiro Kawasaki wrote: > > Not sure if this going to work for the passthru case as intended. > > IMO, the check above is not for the passthru case. The function > _nvmet_passthru_target_connect() has the code below: > > while [ ! -b "${nsdev}" ]; do sleep 1; done > > This checks readiness of the device for the passthru case. I am worried about the case that in _nvmet_passthru_target_connect we call first _nvme_connet_subsys and then we do also the above loop. > For that case, the check in _nvmet_passthru_target_connect() ensures > readiness of the passthru device. The drawback is that the check for > namespace 1 consumes 1 second for nothing. And this is something we should not do on purpose, IMO. > > Thinking about it, shouldn't we log that we couldn't find the > > device/uuid/wwid at the end of the loop? > > Not sure. For the non-passthru case, it will be helpful. But for passthru case, > check result log for namespace 1 can be confusing. > > I can think of two other fix approaches below, but they did not look better than > this patch for me. What do you think? > > 1) Go back to the fix approach to add another _find_nvme_dev() in nvme/047. > I worried this will leave the chance that we will fall into the same issue > when we will add a new test case with multiple _nvme_connect_subsys > calls. I'd rather not go down this route. Ideally, the infrastructure code does the right thing and we don't have to deal in each test case with this problem. > 2) Rework _find_nvme_dev into two new functions _find_nvme_ctrl_dev and > _find_nvme_ns_dev, and do the readiness check in _find_nvme_ns_dev. > IMO, this confusion comes from the fact that _find_nvme_dev returns control > device, but some test cases use it to operate namespaces by adding "n1" to > the control device name. If a test case uses namespace device, it's the > better to call _find_nvme_ns_dev. But I worry this approach may be too much. As we already have an argument parser in _nvme_connect_subsys, we could also introduce a new option which allows to select the wait type. With this _nvmet_passtrhu_target_connect could be something like _nvmet_passthru_target_connect() { [...] _nvme_connect_subsys "${trtype}" "${subsys_name}" \ --wait-for=device || return [...] } and for the rest of the test cases we just set the default for --wait-for to ns.
On Wed, Aug 16, 2023 at 8:11 PM Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> wrote: > > On Aug 14, 2023 / 12:37, Sagi Grimberg wrote: > > > > > The helper function _nvme_connect_subsys() creates a nvme device. It may > > > take some time after the function call until the device gets ready for > > > I/O. So it is expected that the test cases call _find_nvme_dev() after > > > _nvme_connect_subsys() before I/O. _find_nvme_dev() returns the path of > > > the created device, and it also waits for uuid and wwid sysfs attributes > > > of the created device get ready. This wait works as the wait for the > > > device I/O readiness. > > > > Shouldn't this call udevadm --settle or something? > > Ah, it will be more stable to add "udevadm settle" before the readiness check. > Will add it. > Yes, I just tried to add "udevadm --settle" and it works well.
On Aug 16, 2023 / 16:14, Daniel Wagner wrote: > On Wed, Aug 16, 2023 at 12:04:24PM +0000, Shinichiro Kawasaki wrote: [...] > > 2) Rework _find_nvme_dev into two new functions _find_nvme_ctrl_dev and > > _find_nvme_ns_dev, and do the readiness check in _find_nvme_ns_dev. > > IMO, this confusion comes from the fact that _find_nvme_dev returns control > > device, but some test cases use it to operate namespaces by adding "n1" to > > the control device name. If a test case uses namespace device, it's the > > better to call _find_nvme_ns_dev. But I worry this approach may be too much. > > As we already have an argument parser in _nvme_connect_subsys, we could > also introduce a new option which allows to select the wait type. With > this _nvmet_passtrhu_target_connect could be something like > > _nvmet_passthru_target_connect() { > [...] > > _nvme_connect_subsys "${trtype}" "${subsys_name}" \ > --wait-for=device || return > > [...] > } > > and for the rest of the test cases we just set the default for > --wait-for to ns. Thanks. This idea sounds good. It will fix the failure and avoid the 1 second wait. Will implement this and send out as v2.
diff --git a/tests/nvme/rc b/tests/nvme/rc index 4f3a994..d09e7b4 100644 --- a/tests/nvme/rc +++ b/tests/nvme/rc @@ -425,6 +425,7 @@ _nvme_connect_subsys() { local keep_alive_tmo="" local reconnect_delay="" local ctrl_loss_tmo="" + local dev i while [[ $# -gt 0 ]]; do case $1 in @@ -529,6 +530,16 @@ _nvme_connect_subsys() { fi nvme connect "${ARGS[@]}" 2> /dev/null + + dev=$(_find_nvme_dev "$subsysnqn") + for ((i = 0; i < 10; i++)); do + if [[ -b /dev/${dev}n1 && + -e /sys/block/${dev}n1/uuid && + -e /sys/block/${dev}n1/wwid ]]; then + return + fi + sleep .1 + done } _nvme_discover() { @@ -739,13 +750,6 @@ _find_nvme_dev() { subsysnqn="$(cat "/sys/class/nvme/${dev}/subsysnqn")" if [[ "$subsysnqn" == "$subsys" ]]; then echo "$dev" - for ((i = 0; i < 10; i++)); do - if [[ -e /sys/block/$dev/uuid && - -e /sys/block/$dev/wwid ]]; then - return - fi - sleep .1 - done fi done }
The helper function _nvme_connect_subsys() creates a nvme device. It may take some time after the function call until the device gets ready for I/O. So it is expected that the test cases call _find_nvme_dev() after _nvme_connect_subsys() before I/O. _find_nvme_dev() returns the path of the created device, and it also waits for uuid and wwid sysfs attributes of the created device get ready. This wait works as the wait for the device I/O readiness. However, this wait by _find_nvme_dev() has two problems. The first problem is missing call of _find_nvme_dev(). The test case nvme/047 calls _nvme_connect_subsys() twice, but _find_nvme_dev() is called only for the first _nvme_connect_subsys() call. This causes too early I/O to the device with tcp transport [1]. Fix this by moving the code for device readiness wait from _find_nvme_dev() to _nvme_connect_subsys(). The second problem is wrong paths for the sysfs attributes. The paths do not include namespace index, so the check for the attributes always fail. Still _find_nvme_dev() does 1 second wait and allows the device get ready for I/O in most cases, but this is not intended behavior. Fix the paths by adding the namespace index. On top of the checks for sysfs attributes, add check for the created device file. This ensures that the create device is ready for I/O. [1] https://lore.kernel.org/linux-block/CAHj4cs9GNohGUjohNw93jrr8JGNcRYC-ienAZz+sa7az1RK77w@mail.gmail.com/ Fixes: c766fccf3aff ("Make the NVMe tests more reliable") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> --- tests/nvme/rc | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)