Message ID | 20241104192907.21358-1-kch@nvidia.com (mailing list archive)
---|---
State      | New
Series     | nvme: test nvmet-wq sysfs interface
On Mon, Nov 04, 2024 at 11:29:07AM GMT, Chaitanya Kulkarni wrote:
> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
What do you want to test here? What's the objective? I don't see any
block layer related parts, or am I missing something?
On 11/4/24 23:02, Daniel Wagner wrote:
> On Mon, Nov 04, 2024 at 11:29:07AM GMT, Chaitanya Kulkarni wrote:
>> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
>
> What do you want to test here? What's the objective? I don't see any
> block layer related parts, or am I missing something?

For the NVMe-oF target we have exported nvmet-wq to userspace with a
recent patch. This behavior is new, and I don't think any test exists
that runs a fio workload while changing the cpumask randomly to ensure
the stability of nvmet-wq while an application is using the block layer.
Hence we need this test for the latest patch we have added for 6.13.

-ck
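For context, the interface in question boils down to a single workqueue
sysfs attribute. A minimal sketch of exercising it by hand (the mask
value here is just an example; it assumes the "nvmet: make nvmet_wq
visible in sysfs" patch is applied so the attribute exists):

	#!/bin/bash
	# Save, change, verify, and restore the nvmet-wq cpumask.
	mask_path=/sys/devices/virtual/workqueue/nvmet-wq/cpumask

	orig=$(cat "$mask_path")     # save the current mask for restore
	echo 1 > "$mask_path"        # example: pin nvmet-wq workers to CPU 0
	cat "$mask_path"             # read back to confirm the kernel took it
	echo "$orig" > "$mask_path"  # restore the original mask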
On Wed, Nov 06, 2024 at 03:46:26AM GMT, Chaitanya Kulkarni wrote:
> On 11/4/24 23:02, Daniel Wagner wrote:
> > On Mon, Nov 04, 2024 at 11:29:07AM GMT, Chaitanya Kulkarni wrote:
> > >> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
> > > What do you want to test here? What's the objective? I don't see any
> > > block layer related parts, or am I missing something?
> >
> > For the NVMe-oF target we have exported nvmet-wq to userspace with a
> > recent patch. This behavior is new, and I don't think any test exists
> > that runs a fio workload while changing the cpumask randomly to ensure
> > the stability of nvmet-wq while an application is using the block layer.
> > Hence we need this test for the latest patch we have added for 6.13.

Though this is not a performance measurement, just functional testing of
whether the scheduler works. I don't think we should add such tests,
because they will add to the overall runtime for little benefit. I am
pretty sure there are already tests (e.g. kunit) which do more elaborate
scheduler tests.
On Nov 06, 2024 / 13:13, Daniel Wagner wrote:
> On Wed, Nov 06, 2024 at 03:46:26AM GMT, Chaitanya Kulkarni wrote:
> > On 11/4/24 23:02, Daniel Wagner wrote:
> > > On Mon, Nov 04, 2024 at 11:29:07AM GMT, Chaitanya Kulkarni wrote:
> > >> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
> > > What do you want to test here? What's the objective? I don't see any
> > > block layer related parts, or am I missing something?
> >
> > For the NVMe-oF target we have exported nvmet-wq to userspace with a
> > recent patch. This behavior is new, and I don't think any test exists
> > that runs a fio workload while changing the cpumask randomly to ensure
> > the stability of nvmet-wq while an application is using the block layer.
> > Hence we need this test for the latest patch we have added for 6.13.
>
> Though this is not a performance measurement, just functional testing of
> whether the scheduler works. I don't think we should add such tests,
> because they will add to the overall runtime for little benefit. I am
> pretty sure there are already tests (e.g. kunit) which do more elaborate
> scheduler tests.

When I ran the test case on my test system using kernel v6.12-rc6 and the
patch "nvmet: make nvmet_wq visible in sysfs", I observed a kernel INFO
[1]. This INFO looks like it indicates a bug in the nvmet driver. Is this
a known issue? The INFO is stably recreated with the loop, rdma and tcp
transports on my test system.

If this is a new, unknown issue, this test case is a good test to exercise
the nvmet driver. It looks worth adding to blktests. As for the runtime
cost, I think the test case can be modified to reflect the TIMEOUT
variable users set. This will allow users to control the runtime cost to
some extent.

[1]

[ 87.365354][ T963] run blktests nvme/055 at 2024-11-08 09:32:42
[ 87.433572][ T1011] loop0: detected capacity change from 0 to 2097152
[ 87.452511][ T1014] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 87.478018][ T1019] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 87.565565][ T226] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.20.
[ 87.570700][ T1026] nvme nvme1: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port dev.
[ 87.573064][ T1026] nvme nvme1: creating 4 I/O queues.
[ 87.577041][ T1026] nvme nvme1: mapped 4/0/0 default/read/poll queues.
[ 87.580437][ T1026] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nq9
[ 101.337422][ T1073] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[ 101.383814][ T104] INFO: trying to register non-static key.
[ 101.384456][ T104] The code is fine but needs lockdep annotation, or maybe
[ 101.385136][ T104] you didn't initialize this object before use?
[ 101.385758][ T104] turning off the locking correctness validator.
[ 101.386392][ T104] CPU: 1 UID: 0 PID: 104 Comm: kworker/u16:4 Not tainted 6.12.0-rc6+ #361
[ 101.387197][ T104] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/204
[ 101.388109][ T104] Workqueue: nvmet-wq nvmet_tcp_release_queue_work [nvmet_tcp]
[ 101.388868][ T104] Call Trace:
[ 101.389203][ T104]  <TASK>
[ 101.389484][ T104]  dump_stack_lvl+0x6a/0x90
[ 101.389927][ T104]  register_lock_class+0xe2a/0x10a0
[ 101.390444][ T104]  ? __lock_acquire+0xd1b/0x5f20
[ 101.390921][ T104]  ? __pfx_register_lock_class+0x10/0x10
[ 101.391479][ T104]  __lock_acquire+0x81e/0x5f20
[ 101.391939][ T104]  ? lock_is_held_type+0xd5/0x130
[ 101.392434][ T104]  ? find_held_lock+0x2d/0x110
[ 101.392891][ T104]  ? __pfx___lock_acquire+0x10/0x10
[ 101.393399][ T104]  ? lock_release+0x460/0x7a0
[ 101.393853][ T104]  ? __pfx_lock_release+0x10/0x10
[ 101.394351][ T104]  lock_acquire.part.0+0x12d/0x360
[ 101.394841][ T104]  ? xa_erase+0xd/0x30
[ 101.395255][ T104]  ? __pfx_lock_acquire.part.0+0x10/0x10
[ 101.395793][ T104]  ? rcu_is_watching+0x11/0xb0
[ 101.396271][ T104]  ? trace_lock_acquire+0x12f/0x1a0
[ 101.396770][ T104]  ? __pfx___flush_work+0x10/0x10
[ 101.397266][ T104]  ? xa_erase+0xd/0x30
[ 101.397658][ T104]  ? lock_acquire+0x2d/0xc0
[ 101.398089][ T104]  ? xa_erase+0xd/0x30
[ 101.398501][ T104]  _raw_spin_lock+0x2f/0x40
[ 101.398933][ T104]  ? xa_erase+0xd/0x30
[ 101.400167][ T104]  xa_erase+0xd/0x30
[ 101.401359][ T104]  nvmet_ctrl_destroy_pr+0x10e/0x1c0 [nvmet]
[ 101.402767][ T104]  ? __pfx_nvmet_ctrl_destroy_pr+0x10/0x10 [nvmet]
[ 101.404235][ T104]  ? __pfx___might_resched+0x10/0x10
[ 101.405558][ T104]  nvmet_ctrl_free+0x2f0/0x830 [nvmet]
[ 101.406900][ T104]  ? lockdep_hardirqs_on+0x78/0x100
[ 101.408192][ T104]  ? __cancel_work+0x166/0x230
[ 101.409398][ T104]  ? __pfx_nvmet_ctrl_free+0x10/0x10 [nvmet]
[ 101.410726][ T104]  ? rcu_is_watching+0x11/0xb0
[ 101.411938][ T104]  ? kfree+0x13e/0x4a0
[ 101.413073][ T104]  ? lockdep_hardirqs_on+0x78/0x100
[ 101.414318][ T104]  nvmet_sq_destroy+0x1f2/0x3a0 [nvmet]
[ 101.415556][ T104]  nvmet_tcp_release_queue_work+0x4c0/0xe40 [nvmet_tcp]
[ 101.416912][ T104]  process_one_work+0x85a/0x1460
[ 101.418064][ T104]  ? __pfx_lock_acquire.part.0+0x10/0x10
[ 101.419266][ T104]  ? __pfx_process_one_work+0x10/0x10
[ 101.420427][ T104]  ? assign_work+0x16c/0x240
[ 101.421486][ T104]  ? lock_is_held_type+0xd5/0x130
[ 101.422555][ T104]  worker_thread+0x5e2/0xfc0
[ 101.423574][ T104]  ? __pfx_worker_thread+0x10/0x10
[ 101.424639][ T104]  kthread+0x2d1/0x3a0
[ 101.425587][ T104]  ? _raw_spin_unlock_irq+0x24/0x50
[ 101.426644][ T104]  ? __pfx_kthread+0x10/0x10
[ 101.427637][ T104]  ret_from_fork+0x30/0x70
[ 101.428596][ T104]  ? __pfx_kthread+0x10/0x10
[ 101.429562][ T104]  ret_from_fork_asm+0x1a/0x30
[ 101.430535][ T104]  </TASK>
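For reference, the splat fires on controller teardown (per the trace:
nvmet_tcp_release_queue_work -> nvmet_sq_destroy -> nvmet_ctrl_free ->
nvmet_ctrl_destroy_pr -> xa_erase), so in principle it can be reproduced
without blktests. A rough manual sketch, assuming a TCP target is already
configured via configfs the way blktests sets it up (the NQN and address
are taken from the log above; the nvme1n1 device name is an assumption):

	# Connect, generate I/O, then tear the controller down.
	nvme connect -t tcp -a 127.0.0.1 -s 4420 -n blktests-subsystem-1
	fio --name=repro --filename=/dev/nvme1n1 --rw=randread --direct=1 \
		--time_based --runtime=10 &
	nvme disconnect -n blktests-subsystem-1  # teardown path runs here
	dmesg | grep -A 4 'non-static key'       # INFO appears if it reproduces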
On Nov 04, 2024 / 11:29, Chaitanya Kulkarni wrote:
> Add a test that randomly sets the cpumask from available CPUs for
> the nvmet-wq while running the fio workload. This patch has been
> tested on nvme-loop and nvme-tcp transport.
>
> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>

Thanks. As I noted in another mail, this test case generates a kernel
INFO message, and it looks like it is catching a new bug. So I think this
test case is worth adding. Please find my review comments in line.

> ---
>  tests/nvme/055     | 99 ++++++++++++++++++++++++++++++++++++++++++++++
>  tests/nvme/055.out |  3 ++
>  2 files changed, 102 insertions(+)
>  create mode 100755 tests/nvme/055
>  create mode 100644 tests/nvme/055.out
>
> diff --git a/tests/nvme/055 b/tests/nvme/055
> new file mode 100755
> index 0000000..9fe27a3
> --- /dev/null
> +++ b/tests/nvme/055
> @@ -0,0 +1,99 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0+
> +# Copyright (C) 2024 Chaitanya Kulkarni
> +#
> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
> +#
> +
> +. tests/nvme/rc
> +
> +DESCRIPTION="Test nvmet-wq cpumask sysfs attribute with fio on NVMe-oF device"
> +TIMED=1
> +
> +requires() {
> +	_nvme_requires
> +	_have_fio && _have_loop

Nit, an unnecessary space.

> +	_require_nvme_trtype_is_fabrics
> +}

I suggest adding set_conditions() here as below. Without it, the test
case is skipped when users do not set the NVMET_TRTYPES variable. If we
add set_conditions(), the _have_loop call in requires() can be removed.

	set_conditions() {
		_set_nvme_trtype "$@"
	}

> +
> +cleanup_setup() {
> +	_nvme_disconnect_subsys
> +	_nvmet_target_cleanup
> +}
> +
> +test() {
> +	local cpumask_path="/sys/devices/virtual/workqueue/nvmet-wq/cpumask"
> +	local original_cpumask
> +	local min_cpus
> +	local max_cpus
> +	local numbers
> +	local idx
> +	local ns
> +
> +	echo "Running ${TEST_NAME}"
> +
> +	_setup_nvmet
> +	_nvmet_target_setup
> +	_nvme_connect_subsys
> +
> +	if [ ! -f "$cpumask_path" ]; then
> +		SKIP_REASONS+=("nvmet_wq cpumask sysfs attribute not found.")
> +		cleanup_setup
> +		return 1
> +	fi
> +
> +	ns=$(_find_nvme_ns "${def_subsys_uuid}")
> +
> +	original_cpumask=$(cat "$cpumask_path")
> +
> +	num_cpus=$(nproc)
> +	max_cpus=$(( num_cpus < 20 ? num_cpus : 20 ))
> +	min_cpus=0
> +	#shellcheck disable=SC2207
> +	numbers=($(seq $min_cpus $max_cpus))

Nit: the shellcheck error can be avoided with this:

	read -a numbers -d '' < <(seq $min_cpus $max_cpus)

> +
> +	_run_fio_rand_io --filename="/dev/${ns}" --time_based --runtime=130s \

The --time_based and --runtime=130s options are not required, because the
fio helper bash functions will add them.

Instead, let's add this line before the fio command:

	: ${TIMEOUT:=60}

The fio helper functions will reflect this TIMEOUT value in the --runtime
option. This will allow users to control runtime cost as TIMED=1
indicates.

> +		--iodepth=8 --size="${NVME_IMG_SIZE}" &> "$FULL" &
> +
> +	# Let the fio settle down else we will break in the loop for fio check
> +	sleep 1
> +	for ((i = 0; i < max_cpus; i++)); do
> +		if ! pgrep -x fio &> /dev/null ; then

The pgrep command is not in the GNU coreutils. I suggest keeping the fio
process id in a variable and checking it with the "kill -0" command. The
test case block/005 does it.

> +			break
> +		fi
> +
> +		if [[ ${#numbers[@]} -eq 0 ]]; then
> +			break
> +		fi
> +
> +		idx=$((RANDOM % ${#numbers[@]}))
> +
> +		#shellcheck disable=SC2004
> +		cpu_mask=$(printf "%X" $((1 << ${numbers[idx]})))
> +		echo "$cpu_mask" > "$cpumask_path"

When I ran this test case, I observed an error at this line:

	echo: write error: Value too large for defined data type

I think the cpu_mask calculation is wrong, and it should be like this:

	cpu_mask=0
	for ((n = 0; n < numbers[idx]; n++)); do
		cpu_mask=$((cpu_mask + (1 << n)))
	done
	cpu_mask=$(printf "%X" $((cpu_mask)))

> +		if [[ $(cat "$cpumask_path") =~ ^[0,]*${cpu_mask}\n$ ]]; then
> +			echo "Test Failed: cpumask was not set correctly "
> +			echo "Expected ${cpu_mask} found $(cat "$cpumask_path")"
> +			cleanup_setup
> +			return 1
> +		fi
> +		sleep 3
> +		# Remove the selected number
> +		numbers=("${numbers[@]:0:$idx}" "${numbers[@]:$((idx + 1))}")
> +	done
> +
> +	killall fio &> /dev/null

killall is not in GNU coreutils either (blktests already uses it at other
places though...). Does "kill -9" work instead?

> +
> +	# Restore original cpumask
> +	echo "$original_cpumask" > "$cpumask_path"
> +	restored_cpumask=$(cat "$cpumask_path")
> +
> +	if [[ "$restored_cpumask" != "$original_cpumask" ]]; then
> +		echo "Failed to restore original cpumask."
> +		cleanup_setup
> +		return 1
> +	fi
> +
> +	cleanup_setup
> +	echo "Test complete"
> +}
> diff --git a/tests/nvme/055.out b/tests/nvme/055.out
> new file mode 100644
> index 0000000..427dfee
> --- /dev/null
> +++ b/tests/nvme/055.out
> @@ -0,0 +1,3 @@
> +Running nvme/055
> +disconnected 1 controller(s)
> +Test complete
> --
> 2.40.0
>
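For reference, the "kill -0" pattern pointed at above looks roughly like
the following sketch, modeled on block/005 rather than copied from it
(the fio invocation is abbreviated from the patch):

	# Start fio in the background and remember its pid.
	_run_fio_rand_io --filename="/dev/${ns}" --iodepth=8 \
		--size="${NVME_IMG_SIZE}" &> "$FULL" &
	fio_pid=$!

	# Later, inside the loop: kill -0 only checks that the process
	# still exists, so bail out once fio has exited.
	if ! kill -0 "$fio_pid" 2>/dev/null; then
		break
	fi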
On 11/7/24 21:21, Shinichiro Kawasaki wrote:
> On Nov 04, 2024 / 11:29, Chaitanya Kulkarni wrote:
>> Add a test that randomly sets the cpumask from available CPUs for
>> the nvmet-wq while running the fio workload. This patch has been
>> tested on nvme-loop and nvme-tcp transport.
>>
>> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
>
> Thanks. As I noted in another mail, this test case generates a kernel
> INFO message, and it looks like it is catching a new bug. So I think this
> test case is worth adding. Please find my review comments in line.
>
>> ---
>>  tests/nvme/055     | 99 ++++++++++++++++++++++++++++++++++++++++++++++
>>  tests/nvme/055.out |  3 ++
>>  2 files changed, 102 insertions(+)
>>  create mode 100755 tests/nvme/055
>>  create mode 100644 tests/nvme/055.out
>>
>> diff --git a/tests/nvme/055 b/tests/nvme/055
>> new file mode 100755
>> index 0000000..9fe27a3
>> --- /dev/null
>> +++ b/tests/nvme/055
>> @@ -0,0 +1,99 @@
>> +#!/bin/bash
>> +# SPDX-License-Identifier: GPL-2.0+
>> +# Copyright (C) 2024 Chaitanya Kulkarni
>> +#
>> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
>> +#
>> +
>> +. tests/nvme/rc
>> +
>> +DESCRIPTION="Test nvmet-wq cpumask sysfs attribute with fio on NVMe-oF device"
>> +TIMED=1
>> +
>> +requires() {
>> +	_nvme_requires
>> +	_have_fio && _have_loop
>
> Nit, an unnecessary space.
>
>> +	_require_nvme_trtype_is_fabrics
>> +}

done.

> I suggest adding set_conditions() here as below. Without it, the test
> case is skipped when users do not set the NVMET_TRTYPES variable. If we
> add set_conditions(), the _have_loop call in requires() can be removed.
>
> 	set_conditions() {
> 		_set_nvme_trtype "$@"
> 	}

done.

>> +
>> +cleanup_setup() {
>> +	_nvme_disconnect_subsys
>> +	_nvmet_target_cleanup
>> +}
>> +
>> +test() {
>> +	local cpumask_path="/sys/devices/virtual/workqueue/nvmet-wq/cpumask"
>> +	local original_cpumask
>> +	local min_cpus
>> +	local max_cpus
>> +	local numbers
>> +	local idx
>> +	local ns
>> +
>> +	echo "Running ${TEST_NAME}"
>> +
>> +	_setup_nvmet
>> +	_nvmet_target_setup
>> +	_nvme_connect_subsys
>> +
>> +	if [ ! -f "$cpumask_path" ]; then
>> +		SKIP_REASONS+=("nvmet_wq cpumask sysfs attribute not found.")
>> +		cleanup_setup
>> +		return 1
>> +	fi
>> +
>> +	ns=$(_find_nvme_ns "${def_subsys_uuid}")
>> +
>> +	original_cpumask=$(cat "$cpumask_path")
>> +
>> +	num_cpus=$(nproc)
>> +	max_cpus=$(( num_cpus < 20 ? num_cpus : 20 ))
>> +	min_cpus=0
>> +	#shellcheck disable=SC2207
>> +	numbers=($(seq $min_cpus $max_cpus))
>
> Nit: the shellcheck error can be avoided with this:
>
> 	read -a numbers -d '' < <(seq $min_cpus $max_cpus)

done.

>> +
>> +	_run_fio_rand_io --filename="/dev/${ns}" --time_based --runtime=130s \
>
> The --time_based and --runtime=130s options are not required, because the
> fio helper bash functions will add them.
>
> Instead, let's add this line before the fio command:
>
> 	: ${TIMEOUT:=60}
>
> The fio helper functions will reflect this TIMEOUT value in the --runtime
> option. This will allow users to control runtime cost as TIMED=1
> indicates.

done.

>> +		--iodepth=8 --size="${NVME_IMG_SIZE}" &> "$FULL" &
>> +
>> +	# Let the fio settle down else we will break in the loop for fio check
>> +	sleep 1
>> +	for ((i = 0; i < max_cpus; i++)); do
>> +		if ! pgrep -x fio &> /dev/null ; then
>
> The pgrep command is not in the GNU coreutils. I suggest keeping the fio
> process id in a variable and checking it with the "kill -0" command. The
> test case block/005 does it.

thanks for the pointer, done, and it looks much cleaner.

>> +			break
>> +		fi
>> +
>> +		if [[ ${#numbers[@]} -eq 0 ]]; then
>> +			break
>> +		fi
>> +
>> +		idx=$((RANDOM % ${#numbers[@]}))
>> +
>> +		#shellcheck disable=SC2004
>> +		cpu_mask=$(printf "%X" $((1 << ${numbers[idx]})))
>> +		echo "$cpu_mask" > "$cpumask_path"
>
> When I ran this test case, I observed an error at this line:
>
> 	echo: write error: Value too large for defined data type
>
> I think the cpu_mask calculation is wrong, and it should be like this:
>
> 	cpu_mask=0
> 	for ((n = 0; n < numbers[idx]; n++)); do
> 		cpu_mask=$((cpu_mask + (1 << n)))
> 	done
> 	cpu_mask=$(printf "%X" $((cpu_mask)))

I didn't see that error in my execution, but done. This version tries to
use only one CPU at a time; with your suggestion it will now consider
more than one CPU, which I think is useful too. I modified the
calculation to keep it simple for now.

>> +		if [[ $(cat "$cpumask_path") =~ ^[0,]*${cpu_mask}\n$ ]]; then
>> +			echo "Test Failed: cpumask was not set correctly "
>> +			echo "Expected ${cpu_mask} found $(cat "$cpumask_path")"
>> +			cleanup_setup
>> +			return 1
>> +		fi
>> +		sleep 3
>> +		# Remove the selected number
>> +		numbers=("${numbers[@]:0:$idx}" "${numbers[@]:$((idx + 1))}")
>> +	done
>> +
>> +	killall fio &> /dev/null
>
> killall is not in GNU coreutils either (blktests already uses it at other
> places though...). Does "kill -9" work instead?

done.

Thanks for the review comments, I'll send V2 soon.

-ck
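For reference, the loop Shinichiro suggested builds a contiguous mask of
CPUs 0..k-1, i.e. it sums 1<<n for n in 0..k-1, which is exactly 2^k - 1,
so it collapses to a one-liner. A sketch of that simplification (whether
v2 uses exactly this form is an assumption):

	# Hypothetical simplification of the suggested mask calculation.
	# Guard against k == 0, since an all-zero cpumask cannot be
	# applied through sysfs.
	k=${numbers[idx]}
	(( k == 0 )) && k=1
	cpu_mask=$(printf "%X" $(( (1 << k) - 1 )))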
diff --git a/tests/nvme/055 b/tests/nvme/055
new file mode 100755
index 0000000..9fe27a3
--- /dev/null
+++ b/tests/nvme/055
@@ -0,0 +1,99 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0+
+# Copyright (C) 2024 Chaitanya Kulkarni
+#
+# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
+#
+
+. tests/nvme/rc
+
+DESCRIPTION="Test nvmet-wq cpumask sysfs attribute with fio on NVMe-oF device"
+TIMED=1
+
+requires() {
+	_nvme_requires
+	_have_fio && _have_loop
+	_require_nvme_trtype_is_fabrics
+}
+
+cleanup_setup() {
+	_nvme_disconnect_subsys
+	_nvmet_target_cleanup
+}
+
+test() {
+	local cpumask_path="/sys/devices/virtual/workqueue/nvmet-wq/cpumask"
+	local original_cpumask
+	local min_cpus
+	local max_cpus
+	local numbers
+	local idx
+	local ns
+
+	echo "Running ${TEST_NAME}"
+
+	_setup_nvmet
+	_nvmet_target_setup
+	_nvme_connect_subsys
+
+	if [ ! -f "$cpumask_path" ]; then
+		SKIP_REASONS+=("nvmet_wq cpumask sysfs attribute not found.")
+		cleanup_setup
+		return 1
+	fi
+
+	ns=$(_find_nvme_ns "${def_subsys_uuid}")
+
+	original_cpumask=$(cat "$cpumask_path")
+
+	num_cpus=$(nproc)
+	max_cpus=$(( num_cpus < 20 ? num_cpus : 20 ))
+	min_cpus=0
+	#shellcheck disable=SC2207
+	numbers=($(seq $min_cpus $max_cpus))
+
+	_run_fio_rand_io --filename="/dev/${ns}" --time_based --runtime=130s \
+		--iodepth=8 --size="${NVME_IMG_SIZE}" &> "$FULL" &
+
+	# Let the fio settle down else we will break in the loop for fio check
+	sleep 1
+	for ((i = 0; i < max_cpus; i++)); do
+		if ! pgrep -x fio &> /dev/null ; then
+			break
+		fi
+
+		if [[ ${#numbers[@]} -eq 0 ]]; then
+			break
+		fi
+
+		idx=$((RANDOM % ${#numbers[@]}))
+
+		#shellcheck disable=SC2004
+		cpu_mask=$(printf "%X" $((1 << ${numbers[idx]})))
+		echo "$cpu_mask" > "$cpumask_path"
+		if [[ $(cat "$cpumask_path") =~ ^[0,]*${cpu_mask}\n$ ]]; then
+			echo "Test Failed: cpumask was not set correctly "
+			echo "Expected ${cpu_mask} found $(cat "$cpumask_path")"
+			cleanup_setup
+			return 1
+		fi
+		sleep 3
+		# Remove the selected number
+		numbers=("${numbers[@]:0:$idx}" "${numbers[@]:$((idx + 1))}")
+	done
+
+	killall fio &> /dev/null
+
+	# Restore original cpumask
+	echo "$original_cpumask" > "$cpumask_path"
+	restored_cpumask=$(cat "$cpumask_path")
+
+	if [[ "$restored_cpumask" != "$original_cpumask" ]]; then
+		echo "Failed to restore original cpumask."
+		cleanup_setup
+		return 1
+	fi
+
+	cleanup_setup
+	echo "Test complete"
+}
diff --git a/tests/nvme/055.out b/tests/nvme/055.out
new file mode 100644
index 0000000..427dfee
--- /dev/null
+++ b/tests/nvme/055.out
@@ -0,0 +1,3 @@
+Running nvme/055
+disconnected 1 controller(s)
+Test complete
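With the script in place, an invocation would look roughly like the
following (NVMET_TRTYPES and TIMEOUT are the variables discussed in the
thread; passing them via the environment rather than the blktests config
file is assumed to work here):

	# Example run from the blktests root; values are illustrative.
	# NVMET_TRTYPES selects the fabrics transport(s) to exercise,
	# TIMEOUT bounds the fio runtime once the reviewed change lands.
	NVMET_TRTYPES="tcp" TIMEOUT=60 ./check nvme/055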
Add a test that randomly sets the cpumask from available CPUs for
the nvmet-wq while running the fio workload. This patch has been
tested on nvme-loop and nvme-tcp transport.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 tests/nvme/055     | 99 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/nvme/055.out |  3 ++
 2 files changed, 102 insertions(+)
 create mode 100755 tests/nvme/055
 create mode 100644 tests/nvme/055.out