Message ID | 20241223054535.295371-1-lizhijian@fujitsu.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [blktests] tests/rnbd: Implement RNBD regression test | expand |
On Dec 23, 2024 / 13:45, Li Zhijian wrote: > This test case has been in my possession for quite some time. > I am upstreaming it now because it has once again detected a regression in > a recent kernel release [1]. > > It's just stupid to connect and disconnect RNBD on localhost and expect > no dmesg exceptions, with some attempts actually succeeding. > > Please note that currently, only RTRS over RXE is supported. > > [1] https://lore.kernel.org/linux-rdma/20241223025700.292536-1-lizhijian@fujitsu.com/ Hello Li, thank you for the patch. I ran the test case that this patch adds with the kernel v6.13-rc4 KASAN enabled. I observed "BUG: KASAN: slab-use-after-free in __list_add_valid_or_report". I confirmed that your kernel patch avoids the message. So, the new blktests test case catches a kernel BUG. Good. Having said that, even with the kernel fix patch, still I observe the test case failure in my QEMU test environment. The count j is almost always zero. I once observed the count became 3, but it is far smaller than the criteria 10. I guess test environment performance factors might affect the pass/fail results. Do we really need to check the start/stop success ratio? I think the BUG can be caught regardless of the check. Also, please find my other reivew comments in line. > > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> > --- > Copy to the RDMA/rtrs guys: > > Cc: Jack Wang <jinpu.wang@ionos.com> > Cc: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Leon Romanovsky <leon@kernel.org> > Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com> > --- > tests/rnbd/001 | 37 ++++++++++++++++++++++++++++ > tests/rnbd/001.out | 2 ++ > tests/rnbd/rc | 60 ++++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 99 insertions(+) > create mode 100755 tests/rnbd/001 > create mode 100644 tests/rnbd/001.out > create mode 100644 tests/rnbd/rc > > diff --git a/tests/rnbd/001 b/tests/rnbd/001 > new file mode 100755 > index 000000000000..220468f0f5b4 > --- /dev/null > +++ b/tests/rnbd/001 > @@ -0,0 +1,37 @@ > +#!/bin/bash > +# SPDX-License-Identifier: GPL-3.0+ > +# Copyright 2024, Fujitsu > +# I suggest to describe shortly here that this test can catch the kernel regression, in same manner as the commit message. > +. tests/rnbd/rc > + > +DESCRIPTION="Start Stop RNBD" > +CHECK_DMESG=1 Do you expect this test completes quickly (around 30 seconds)? If so, I suggest to add QUICK=1 here. > + > +requires() { > + _have_rnbd I suggest to add "_have_loop" here. > +} > + > +test_start_stop() > +{ > + _setup_rnbd || return > + > + local loop_dev i j > + loop_dev="$(losetup -f)" > + > + for ((i=0;i<100;i++)) > + do > + _start_rnbd_client "${loop_dev}" &>/dev/null && > + _stop_rnbd_client &>/dev/null && ((j++)) > + done > + > + # We expect at least 10% start/stop successfully > + if [[ $j -lt 10 ]]; then > + echo "Failed: $j/$i" > + fi > +} > + > +test() { > + echo "Running ${TEST_NAME}" > + test_start_stop > + echo "Test complete" > +} > diff --git a/tests/rnbd/001.out b/tests/rnbd/001.out > new file mode 100644 > index 000000000000..c1f9980d0f7b > --- /dev/null > +++ b/tests/rnbd/001.out > @@ -0,0 +1,2 @@ > +Running rnbd/001 > +Test complete > diff --git a/tests/rnbd/rc b/tests/rnbd/rc > new file mode 100644 > index 000000000000..143ba0b59f38 > --- /dev/null > +++ b/tests/rnbd/rc > @@ -0,0 +1,60 @@ > +#!/bin/bash > +# SPDX-License-Identifier: GPL-3.0+ > +# Copyright 2024, Fujitsu > +# > +# RNBD tests. > + > +. common/rc > +. common/multipath-over-rdma > + > +_have_rnbd() { > + if [[ "$USE_RXE" != 1 ]]; then > + SKIP_REASONS+=("Only USE_RXE=1 is supported") > + return 1 > + fi > + _have_driver rdma_rxe || return > + _have_driver rnbd_server || return > + _have_driver rnbd_client > +} > + > +_setup_rnbd() { > + modprobe -q rnbd_server > + modprobe -q rnbd_client > + start_soft_rdma I was not able to find stop_soft_rdma() in this patch. It might be the better to add _cleanup_rnbd() to call stop_soft_rdma(). > + for i in $(rdma_network_interfaces) > + do > + ipv4_addr=$(get_ipv4_addr "$i") > + if [[ -n "${ipv4_addr}" ]]; then > + def_traddr=${ipv4_addr} > + fi > + done > +} > + > +_start_rnbd_client() { > + local a b > + local blkdev=$1 > + local before after > + > + before=$(ls -d /sys/block/rnbd* 2>/dev/null) > + if ! echo "sessname=blktest path=ip:$def_traddr device_path=$blkdev" > /sys/devices/virtual/rnbd-client/ctl/map_device; then > + return 1 > + fi > + > + # Retrieve the newly added rnbd entry > + after=$(ls -d /sys/block/rnbd* 2>/dev/null) > + for a in $after > + do > + [[ -n "$before" ]] || break > + > + for b in $before > + do > + [[ "$a" = "$b" ]] || break > + done > + done > + > + rnbd_node=$a Nit: this rnbd_node is a global variable. To clarify it, I suggest to use capital letters for its name and declare it at the beginning of this script file, like, declare RNBD_NODE > +} > + > +_stop_rnbd_client() { > + echo "normal" > "$rnbd_node"/rnbd/unmap_device > +} > -- > 2.47.0 >
Shinichiro, On 24/12/2024 10:47, Shinichiro Kawasaki wrote: > On Dec 23, 2024 / 13:45, Li Zhijian wrote: >> This test case has been in my possession for quite some time. >> I am upstreaming it now because it has once again detected a regression in >> a recent kernel release [1]. >> >> It's just stupid to connect and disconnect RNBD on localhost and expect >> no dmesg exceptions, with some attempts actually succeeding. >> >> Please note that currently, only RTRS over RXE is supported. >> >> [1] https://lore.kernel.org/linux-rdma/20241223025700.292536-1-lizhijian@fujitsu.com/ > > Hello Li, thank you for the patch. I ran the test case that this patch adds with > the kernel v6.13-rc4 KASAN enabled. I observed "BUG: KASAN: slab-use-after-free > in __list_add_valid_or_report". I confirmed that your kernel patch avoids > the message. So, the new blktests test case catches a kernel BUG. Good. Thanks for the testing > > Having said that, even with the kernel fix patch, still I observe the test > case failure in my QEMU test environment. The count j is almost always zero. > I once observed the count became 3, but it is far smaller than the criteria 10. > I guess test environment performance factors might affect the pass/fail results. > Do we really need to check the start/stop success ratio? Good question I often observed ~50% success in my QEMU environment, zero success is not expected IMO. I think the BUG can be > caught regardless of the check. > Yes, this is true. > Also, please find my other reivew comments in line. > >> >> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> >> --- >> Copy to the RDMA/rtrs guys: >> >> Cc: Jack Wang <jinpu.wang@ionos.com> >> Cc: Jason Gunthorpe <jgg@ziepe.ca> >> Cc: Leon Romanovsky <leon@kernel.org> >> Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com> >> --- >> tests/rnbd/001 | 37 ++++++++++++++++++++++++++++ >> tests/rnbd/001.out | 2 ++ >> tests/rnbd/rc | 60 ++++++++++++++++++++++++++++++++++++++++++++++ >> 3 files changed, 99 insertions(+) >> create mode 100755 tests/rnbd/001 >> create mode 100644 tests/rnbd/001.out >> create mode 100644 tests/rnbd/rc >> >> diff --git a/tests/rnbd/001 b/tests/rnbd/001 >> new file mode 100755 >> index 000000000000..220468f0f5b4 >> --- /dev/null >> +++ b/tests/rnbd/001 >> @@ -0,0 +1,37 @@ >> +#!/bin/bash >> +# SPDX-License-Identifier: GPL-3.0+ >> +# Copyright 2024, Fujitsu >> +# > > I suggest to describe shortly here that this test can catch the kernel > regression, in same manner as the commit message. Ok, I will update it. > >> +. tests/rnbd/rc >> + >> +DESCRIPTION="Start Stop RNBD" >> +CHECK_DMESG=1 > > Do you expect this test completes quickly (around 30 seconds)? If so, > I suggest to add QUICK=1 here. Yes, QUICK=1 is fine.(It finishes in 15 seconds) > >> + >> +requires() { >> + _have_rnbd > > I suggest to add "_have_loop" here. Ok. > >> +} >> + >> +test_start_stop() >> +{ >> + _setup_rnbd || return >> + >> + local loop_dev i j >> + loop_dev="$(losetup -f)" >> + >> + for ((i=0;i<100;i++)) >> + do >> + _start_rnbd_client "${loop_dev}" &>/dev/null && >> + _stop_rnbd_client &>/dev/null && ((j++))>> + done >> + >> + # We expect at least 10% start/stop successfully I will consider the ratio again. >> + if [[ $j -lt 10 ]]; then >> + echo "Failed: $j/$i" >> + fi >> +} >> + >> +test() { >> + echo "Running ${TEST_NAME}" >> + test_start_stop >> + echo "Test complete" >> +} >> diff --git a/tests/rnbd/001.out b/tests/rnbd/001.out >> new file mode 100644 >> index 000000000000..c1f9980d0f7b >> --- /dev/null >> +++ b/tests/rnbd/001.out >> @@ -0,0 +1,2 @@ >> +Running rnbd/001 >> +Test complete >> diff --git a/tests/rnbd/rc b/tests/rnbd/rc >> new file mode 100644 >> index 000000000000..143ba0b59f38 >> --- /dev/null >> +++ b/tests/rnbd/rc >> @@ -0,0 +1,60 @@ >> +#!/bin/bash >> +# SPDX-License-Identifier: GPL-3.0+ >> +# Copyright 2024, Fujitsu >> +# >> +# RNBD tests. >> + >> +. common/rc >> +. common/multipath-over-rdma >> + >> +_have_rnbd() { >> + if [[ "$USE_RXE" != 1 ]]; then >> + SKIP_REASONS+=("Only USE_RXE=1 is supported") >> + return 1 >> + fi >> + _have_driver rdma_rxe || return >> + _have_driver rnbd_server || return >> + _have_driver rnbd_client >> +} >> + >> +_setup_rnbd() { >> + modprobe -q rnbd_server >> + modprobe -q rnbd_client >> + start_soft_rdma > > I was not able to find stop_soft_rdma() in this patch. It might be the better to > add _cleanup_rnbd() to call stop_soft_rdma(). Good catch, it sounds good to me. > >> + for i in $(rdma_network_interfaces) >> + do >> + ipv4_addr=$(get_ipv4_addr "$i") >> + if [[ -n "${ipv4_addr}" ]]; then >> + def_traddr=${ipv4_addr} >> + fi >> + done >> +} >> + >> +_start_rnbd_client() { >> + local a b >> + local blkdev=$1 >> + local before after >> + >> + before=$(ls -d /sys/block/rnbd* 2>/dev/null) >> + if ! echo "sessname=blktest path=ip:$def_traddr device_path=$blkdev" > /sys/devices/virtual/rnbd-client/ctl/map_device; then >> + return 1 >> + fi >> + >> + # Retrieve the newly added rnbd entry >> + after=$(ls -d /sys/block/rnbd* 2>/dev/null) >> + for a in $after >> + do >> + [[ -n "$before" ]] || break >> + >> + for b in $before >> + do >> + [[ "$a" = "$b" ]] || break >> + done >> + done >> + >> + rnbd_node=$a > > Nit: this rnbd_node is a global variable. To clarify it, I suggest to use > capital letters for its name and declare it at the beginning of this script > file, like, > > declare RNBD_NODE Sounds good. Thanks Zhijian > >> +} >> + >> +_stop_rnbd_client() { >> + echo "normal" > "$rnbd_node"/rnbd/unmap_device >> +} >> -- >> 2.47.0
diff --git a/tests/rnbd/001 b/tests/rnbd/001 new file mode 100755 index 000000000000..220468f0f5b4 --- /dev/null +++ b/tests/rnbd/001 @@ -0,0 +1,37 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-3.0+ +# Copyright 2024, Fujitsu +# +. tests/rnbd/rc + +DESCRIPTION="Start Stop RNBD" +CHECK_DMESG=1 + +requires() { + _have_rnbd +} + +test_start_stop() +{ + _setup_rnbd || return + + local loop_dev i j + loop_dev="$(losetup -f)" + + for ((i=0;i<100;i++)) + do + _start_rnbd_client "${loop_dev}" &>/dev/null && + _stop_rnbd_client &>/dev/null && ((j++)) + done + + # We expect at least 10% start/stop successfully + if [[ $j -lt 10 ]]; then + echo "Failed: $j/$i" + fi +} + +test() { + echo "Running ${TEST_NAME}" + test_start_stop + echo "Test complete" +} diff --git a/tests/rnbd/001.out b/tests/rnbd/001.out new file mode 100644 index 000000000000..c1f9980d0f7b --- /dev/null +++ b/tests/rnbd/001.out @@ -0,0 +1,2 @@ +Running rnbd/001 +Test complete diff --git a/tests/rnbd/rc b/tests/rnbd/rc new file mode 100644 index 000000000000..143ba0b59f38 --- /dev/null +++ b/tests/rnbd/rc @@ -0,0 +1,60 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-3.0+ +# Copyright 2024, Fujitsu +# +# RNBD tests. + +. common/rc +. common/multipath-over-rdma + +_have_rnbd() { + if [[ "$USE_RXE" != 1 ]]; then + SKIP_REASONS+=("Only USE_RXE=1 is supported") + return 1 + fi + _have_driver rdma_rxe || return + _have_driver rnbd_server || return + _have_driver rnbd_client +} + +_setup_rnbd() { + modprobe -q rnbd_server + modprobe -q rnbd_client + start_soft_rdma + for i in $(rdma_network_interfaces) + do + ipv4_addr=$(get_ipv4_addr "$i") + if [[ -n "${ipv4_addr}" ]]; then + def_traddr=${ipv4_addr} + fi + done +} + +_start_rnbd_client() { + local a b + local blkdev=$1 + local before after + + before=$(ls -d /sys/block/rnbd* 2>/dev/null) + if ! echo "sessname=blktest path=ip:$def_traddr device_path=$blkdev" > /sys/devices/virtual/rnbd-client/ctl/map_device; then + return 1 + fi + + # Retrieve the newly added rnbd entry + after=$(ls -d /sys/block/rnbd* 2>/dev/null) + for a in $after + do + [[ -n "$before" ]] || break + + for b in $before + do + [[ "$a" = "$b" ]] || break + done + done + + rnbd_node=$a +} + +_stop_rnbd_client() { + echo "normal" > "$rnbd_node"/rnbd/unmap_device +}
This test case has been in my possession for quite some time. I am upstreaming it now because it has once again detected a regression in a recent kernel release [1]. It's just stupid to connect and disconnect RNBD on localhost and expect no dmesg exceptions, with some attempts actually succeeding. Please note that currently, only RTRS over RXE is supported. [1] https://lore.kernel.org/linux-rdma/20241223025700.292536-1-lizhijian@fujitsu.com/ Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> --- Copy to the RDMA/rtrs guys: Cc: Jack Wang <jinpu.wang@ionos.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com> --- tests/rnbd/001 | 37 ++++++++++++++++++++++++++++ tests/rnbd/001.out | 2 ++ tests/rnbd/rc | 60 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 99 insertions(+) create mode 100755 tests/rnbd/001 create mode 100644 tests/rnbd/001.out create mode 100644 tests/rnbd/rc