diff mbox series

[blktests] tests/nvme/031: fix connecting faiure

Message ID 20230907034423.3928010-1-yi.zhang@redhat.com (mailing list archive)
State New, archived
Headers show
Series [blktests] tests/nvme/031: fix connecting faiure | expand

Commit Message

Yi Zhang Sept. 7, 2023, 3:44 a.m. UTC
allow_any_host was disabled during _create_nvmet_subsystem, call
_create_nvmet_host before connecting to allow the host to connect.

[76096.420586] nvmet: adding nsid 1 to subsystem blktests-subsystem-0
[76096.440595] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[76096.491344] nvmet: connect by host nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 for subsystem blktests-subsystem-0 not allowed
[76096.505049] nvme nvme2: Connect for subsystem blktests-subsystem-0 is not allowed, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[76096.519609] nvme nvme2: failed to connect queue: 0 ret=16772

Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
---
 tests/nvme/031 | 2 ++
 1 file changed, 2 insertions(+)

Comments

Shin'ichiro Kawasaki Sept. 8, 2023, 11:26 a.m. UTC | #1
On Sep 07, 2023 / 11:44, Yi Zhang wrote:
> allow_any_host was disabled during _create_nvmet_subsystem, call
> _create_nvmet_host before connecting to allow the host to connect.
> 
> [76096.420586] nvmet: adding nsid 1 to subsystem blktests-subsystem-0
> [76096.440595] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [76096.491344] nvmet: connect by host nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 for subsystem blktests-subsystem-0 not allowed
> [76096.505049] nvme nvme2: Connect for subsystem blktests-subsystem-0 is not allowed, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> [76096.519609] nvme nvme2: failed to connect queue: 0 ret=16772
> 
> Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

Thanks for the catching this. I looked back the past changes and found that the
commit c32b233b7dd6 ("nvme/rc: Add helper for adding/removing to allow list")
triggered the connection failure. So, I think a Fixes tag with this commit is
required (I can add when this patch is applied).

Even after the commit, the test case still passes. That's why I did not notice
the connection failure. I think _nvme_connect_subsys() should check exit status
of "nvme connect" command and print an error message on failure. This will help
to catch similar connection failures in future.

As for this patch, I will wait some days for further comments. Also will do some
confirmation test runs.

> ---
>  tests/nvme/031 | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tests/nvme/031 b/tests/nvme/031
> index d5c2561..696db2d 100755
> --- a/tests/nvme/031
> +++ b/tests/nvme/031
> @@ -42,10 +42,12 @@ test() {
>  	for ((i = 0; i < iterations; i++)); do
>  		_create_nvmet_subsystem "${subsys}$i" "${loop_dev}"
>  		_add_nvmet_subsys_to_port "${port}" "${subsys}$i"
> +		_create_nvmet_host "${subsys}$i" "${def_hostnqn}"
>  		_nvme_connect_subsys "${nvme_trtype}" "${subsys}$i"
>  		_nvme_disconnect_subsys "${subsys}$i" >> "${FULL}" 2>&1
>  		_remove_nvmet_subsystem_from_port "${port}" "${subsys}$i"
>  		_remove_nvmet_subsystem "${subsys}$i"
> +		_remove_nvmet_host "${def_hostnqn}"
>  	done
>  
>  	_remove_nvmet_port "${port}"
> -- 
> 2.34.3
>
Daniel Wagner Sept. 8, 2023, 1:09 p.m. UTC | #2
On Fri, Sep 08, 2023 at 11:26:31AM +0000, Shinichiro Kawasaki wrote:
> On Sep 07, 2023 / 11:44, Yi Zhang wrote:
> > allow_any_host was disabled during _create_nvmet_subsystem, call
> > _create_nvmet_host before connecting to allow the host to connect.
> >
> > [76096.420586] nvmet: adding nsid 1 to subsystem blktests-subsystem-0
> > [76096.440595] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> > [76096.491344] nvmet: connect by host nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 for subsystem blktests-subsystem-0 not allowed
> > [76096.505049] nvme nvme2: Connect for subsystem blktests-subsystem-0 is not allowed, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> > [76096.519609] nvme nvme2: failed to connect queue: 0 ret=16772
> >
> > Signed-off-by: Yi Zhang <yi.zhang@redhat.com>

Reviewed-by: Daniel Wagner <dwagner@suse.de>

> Thanks for the catching this. I looked back the past changes and found that the
> commit c32b233b7dd6 ("nvme/rc: Add helper for adding/removing to allow list")
> triggered the connection failure. So, I think a Fixes tag with this commit is
> required (I can add when this patch is applied).
>
> Even after the commit, the test case still passes. That's why I did not notice
> the connection failure. I think _nvme_connect_subsys() should check exit status
> of "nvme connect" command and print an error message on failure. This will help
> to catch similar connection failures in future.

I was running into a similiar problem for (not yet existing) nvme/050
test case [1]:

nvmf_wait_for_state() {
       local def_state_timeout=5
       local subsys_name="$1"
       local state="$2"
       local timeout="${3:-$def_state_timeout}"
       local nvmedev
       local state_file
       local start_time
       local end_time

       nvmedev=$(_find_nvme_dev "${subsys_name}")
       state_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/state"

       start_time=$(date +%s)
       while ! grep -q "${state}" "${state_file}"; do
               sleep 1
               end_time=$(date +%s)
               if (( end_time - start_time > timeout )); then
                       echo "expected state \"${state}\" not " \
                               "reached within ${timeout} seconds"
                       return 1
               fi
       done

       return 0
}

_nvme_connect_subsys "${nvme_trtype}" "${def_subsysnqn}" \
                            --hostnqn "${def_hostnqn}" \
                            --reconnect-delay 1 \
                            --ctrl-loss-tmo 10

nvmf_wait_for_state "${def_subsysnqn}" "live"
nvmedev=$(_find_nvme_dev "${def_subsysnqn}")

We could make this a bit more generic and move it into the connect
helper. What do you think?

[1] https://lore.kernel.org/linux-nvme/20230621155825.20146-2-dwagner@suse.de/
Shin'ichiro Kawasaki Sept. 11, 2023, 11 a.m. UTC | #3
On Sep 08, 2023 / 15:09, Daniel Wagner wrote:
> On Fri, Sep 08, 2023 at 11:26:31AM +0000, Shinichiro Kawasaki wrote:
> > On Sep 07, 2023 / 11:44, Yi Zhang wrote:
> > > allow_any_host was disabled during _create_nvmet_subsystem, call
> > > _create_nvmet_host before connecting to allow the host to connect.
> > >
> > > [76096.420586] nvmet: adding nsid 1 to subsystem blktests-subsystem-0
> > > [76096.440595] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> > > [76096.491344] nvmet: connect by host nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 for subsystem blktests-subsystem-0 not allowed
> > > [76096.505049] nvme nvme2: Connect for subsystem blktests-subsystem-0 is not allowed, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> > > [76096.519609] nvme nvme2: failed to connect queue: 0 ret=16772
> > >
> > > Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
> 
> Reviewed-by: Daniel Wagner <dwagner@suse.de>
> 
> > Thanks for the catching this. I looked back the past changes and found that the
> > commit c32b233b7dd6 ("nvme/rc: Add helper for adding/removing to allow list")
> > triggered the connection failure. So, I think a Fixes tag with this commit is
> > required (I can add when this patch is applied).

I've applied the fix. Thanks!

> >
> > Even after the commit, the test case still passes. That's why I did not notice
> > the connection failure. I think _nvme_connect_subsys() should check exit status
> > of "nvme connect" command and print an error message on failure. This will help
> > to catch similar connection failures in future.
> 
> I was running into a similiar problem for (not yet existing) nvme/050
> test case [1]:
> 
> nvmf_wait_for_state() {
>        local def_state_timeout=5
>        local subsys_name="$1"
>        local state="$2"
>        local timeout="${3:-$def_state_timeout}"
>        local nvmedev
>        local state_file
>        local start_time
>        local end_time
> 
>        nvmedev=$(_find_nvme_dev "${subsys_name}")
>        state_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/state"
> 
>        start_time=$(date +%s)
>        while ! grep -q "${state}" "${state_file}"; do
>                sleep 1
>                end_time=$(date +%s)
>                if (( end_time - start_time > timeout )); then
>                        echo "expected state \"${state}\" not " \
>                                "reached within ${timeout} seconds"
>                        return 1
>                fi
>        done
> 
>        return 0
> }
> 
> _nvme_connect_subsys "${nvme_trtype}" "${def_subsysnqn}" \
>                             --hostnqn "${def_hostnqn}" \
>                             --reconnect-delay 1 \
>                             --ctrl-loss-tmo 10
> 
> nvmf_wait_for_state "${def_subsysnqn}" "live"
> nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
> 
> We could make this a bit more generic and move it into the connect
> helper. What do you think?

This nvme state file check looks a bit complicated, but looks more robust than
"nvme connect" command exit status check. Do you think that "nvme connect" can
fail even when "nvme connect" command returns good status? If so, your approach
will be the way to choose.

> 
> [1] https://lore.kernel.org/linux-nvme/20230621155825.20146-2-dwagner@suse.de/
Daniel Wagner Sept. 11, 2023, 12:16 p.m. UTC | #4
On Mon, Sep 11, 2023 at 11:00:17AM +0000, Shinichiro Kawasaki wrote:
> > nvmf_wait_for_state "${def_subsysnqn}" "live"
> > nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
> > 
> > We could make this a bit more generic and move it into the connect
> > helper. What do you think?
> 
> This nvme state file check looks a bit complicated, but looks more robust than
> "nvme connect" command exit status check. Do you think that "nvme connect" can
> fail even when "nvme connect" command returns good status? If so, your approach
> will be the way to choose.

Currently, 'nvme connect' is a synchronous call for tcp/rdma but not for
fc. 'nvme connect' for tcp/rdma will report an error if something is
wrong but not for fc, because it always return successfully.

The nvme/005 is exposing the behavior differences between the
transports. My long time goal is to address and make all transport
behave the same way (unification of the state machines). But as it
currently stands fc would need someting like this to make sure we are
not blindly reporting success just because the 'nvme connect' call is
successful.

This type of check would make the test suite more robust and better in
detecting errors.
Shin'ichiro Kawasaki Sept. 12, 2023, 12:18 a.m. UTC | #5
On Sep 11, 2023 / 14:16, Daniel Wagner wrote:
> On Mon, Sep 11, 2023 at 11:00:17AM +0000, Shinichiro Kawasaki wrote:
> > > nvmf_wait_for_state "${def_subsysnqn}" "live"
> > > nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
> > > 
> > > We could make this a bit more generic and move it into the connect
> > > helper. What do you think?
> > 
> > This nvme state file check looks a bit complicated, but looks more robust than
> > "nvme connect" command exit status check. Do you think that "nvme connect" can
> > fail even when "nvme connect" command returns good status? If so, your approach
> > will be the way to choose.
> 
> Currently, 'nvme connect' is a synchronous call for tcp/rdma but not for
> fc. 'nvme connect' for tcp/rdma will report an error if something is
> wrong but not for fc, because it always return successfully.
> 
> The nvme/005 is exposing the behavior differences between the
> transports. My long time goal is to address and make all transport
> behave the same way (unification of the state machines). But as it
> currently stands fc would need someting like this to make sure we are
> not blindly reporting success just because the 'nvme connect' call is
> successful.
> 
> This type of check would make the test suite more robust and better in
> detecting errors.

Thanks for the explanation. I agree that the nvme state file check in
_nvme_connect_subsys() it the more robust and the better.
diff mbox series

Patch

diff --git a/tests/nvme/031 b/tests/nvme/031
index d5c2561..696db2d 100755
--- a/tests/nvme/031
+++ b/tests/nvme/031
@@ -42,10 +42,12 @@  test() {
 	for ((i = 0; i < iterations; i++)); do
 		_create_nvmet_subsystem "${subsys}$i" "${loop_dev}"
 		_add_nvmet_subsys_to_port "${port}" "${subsys}$i"
+		_create_nvmet_host "${subsys}$i" "${def_hostnqn}"
 		_nvme_connect_subsys "${nvme_trtype}" "${subsys}$i"
 		_nvme_disconnect_subsys "${subsys}$i" >> "${FULL}" 2>&1
 		_remove_nvmet_subsystem_from_port "${port}" "${subsys}$i"
 		_remove_nvmet_subsystem "${subsys}$i"
+		_remove_nvmet_host "${def_hostnqn}"
 	done
 
 	_remove_nvmet_port "${port}"