| Message ID | 20190329221432.38726-1-dennis@kernel.org (mailing list archive) |
| --- | --- |
| State | New, archived |
| Series | block: fix fio jobs for 027 and add cgroup support |
On Fri, Mar 29, 2019 at 03:14:32PM -0700, Dennis Zhou wrote:
> Previously, the test was broken as "$fio_jobs" was considered as a
> string instead of additional parameters. This is fixed here.
>
> Second, there was an issue with earlier kernels when request lists
> existed such that request_queues were never being cleaned up because
> non-root blkgs took a reference on the queue. However, blkgs were being
> cleaned up after the last reference to the request_queue was put back.
> This creates a blktest cgroup for the fio jobs to utilize and should be
> useful for catching this scenario in the future.
>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>

Would it be valuable to test this both in a cgroup and not in a cgroup?

>  tests/block/027 | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/tests/block/027 b/tests/block/027
> index b480016..36e88b4 100755
> --- a/tests/block/027
> +++ b/tests/block/027
> @@ -26,6 +26,13 @@ scsi_debug_stress_remove() {
>  		return
>  	fi
>
> +	_init_cgroup2
> +
> +	# setup cgroups
> +	echo "+io" > "/sys/fs/cgroup/cgroup.subtree_control"
> +	echo "+io" > "$CGROUP2_DIR/cgroup.subtree_control"
> +	mkdir "$CGROUP2_DIR/fast"
> +

Why is this called "fast"?

>  	# set higher aio limit
>  	echo 524288 > /proc/sys/fs/aio-max-nr
>
> @@ -45,13 +52,13 @@ scsi_debug_stress_remove() {
>  		((cnt++))
>  	done
>
> -	local num_jobs=4 runtime=21
> +	local num_jobs=4 runtime=12
>  	fio --rw=randread --size=128G --direct=1 --ioengine=libaio \
>  		--iodepth=2048 --numjobs=$num_jobs --bs=4k \
>  		--group_reporting=1 --group_reporting=1 --runtime=$runtime \
> -		--loops=10000 "$fio_jobs" > "$FULL" 2>&1 &
> +		--loops=10000 --cgroup=blktests/fast $fio_jobs > "$FULL" 2>&1 &
>
> -	sleep 7
> +	sleep 6

Is there any particular reason you changed the run time and sleep time
and added the sleep at the end?

>  	local device_path
>  	for dev in "${SCSI_DEBUG_DEVICES[@]}"; do
>  		# shutdown devices in progress
> @@ -61,6 +68,10 @@ scsi_debug_stress_remove() {
>
>  	wait
>
> +	sleep 10
> +
> +	_exit_cgroup2
> +
>  	_exit_scsi_debug
>  }
>
> --
> 2.17.1
>
On Thu, Apr 04, 2019 at 11:02:24AM -0700, Omar Sandoval wrote:
> On Fri, Mar 29, 2019 at 03:14:32PM -0700, Dennis Zhou wrote:
> > Previously, the test was broken as "$fio_jobs" was considered as a
> > string instead of additional parameters. This is fixed here.
> >
> > Second, there was an issue with earlier kernels when request lists
> > existed such that request_queues were never being cleaned up because
> > non-root blkgs took a reference on the queue. However, blkgs were being
> > cleaned up after the last reference to the request_queue was put back.
> > This creates a blktest cgroup for the fio jobs to utilize and should be
> > useful for catching this scenario in the future.
> >
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
>
> Would it be valuable to test this both in a cgroup and not in a cgroup?
>

I lean towards no because testing in a cgroup should subsume testing not
in a cgroup.

> >  tests/block/027 | 17 ++++++++++++++---
> >  1 file changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/tests/block/027 b/tests/block/027
> > index b480016..36e88b4 100755
> > --- a/tests/block/027
> > +++ b/tests/block/027
> > @@ -26,6 +26,13 @@ scsi_debug_stress_remove() {
> >  		return
> >  	fi
> >
> > +	_init_cgroup2
> > +
> > +	# setup cgroups
> > +	echo "+io" > "/sys/fs/cgroup/cgroup.subtree_control"
> > +	echo "+io" > "$CGROUP2_DIR/cgroup.subtree_control"
> > +	mkdir "$CGROUP2_DIR/fast"
> > +
>
> Why is this called "fast"?
>

I just picked a name from Josef's test. I was hesitant to use
${TEST_NAME} because of the /, but there's no reason that's not okay.

> >  	# set higher aio limit
> >  	echo 524288 > /proc/sys/fs/aio-max-nr
> >
> > @@ -45,13 +52,13 @@ scsi_debug_stress_remove() {
> >  		((cnt++))
> >  	done
> >
> > -	local num_jobs=4 runtime=21
> > +	local num_jobs=4 runtime=12
> >  	fio --rw=randread --size=128G --direct=1 --ioengine=libaio \
> >  		--iodepth=2048 --numjobs=$num_jobs --bs=4k \
> >  		--group_reporting=1 --group_reporting=1 --runtime=$runtime \
> > -		--loops=10000 "$fio_jobs" > "$FULL" 2>&1 &
> > +		--loops=10000 --cgroup=blktests/fast $fio_jobs > "$FULL" 2>&1 &
> >
> > -	sleep 7
> > +	sleep 6
>
> Is there any particular reason you changed the run time and sleep time
> and added the sleep at the end?
>

Validating this is a little tricky because we don't export statistics
around blkgs. The change in runtime and sleep here is just to shorten
the test. Enough time has passed that everything is setup and enough
IOs go through and fail to be sure that this all works.

The sleep below should probably be a little shorter, but it's to more
easily validate that the blkg cleanup is due to request_queue cleanup
vs cgroup cleanup. Prior, I made a bad assumption that everything was
okay due to always cleaning up the cgroup when in reality request_queue
cleanup was failing first.

> >  	local device_path
> >  	for dev in "${SCSI_DEBUG_DEVICES[@]}"; do
> >  		# shutdown devices in progress
> > @@ -61,6 +68,10 @@ scsi_debug_stress_remove() {
> >
> >  	wait
> >
> > +	sleep 10
> > +
> > +	_exit_cgroup2
> > +
> >  	_exit_scsi_debug
> >  }
> >
> > --
> > 2.17.1
> >

I'll send a v2 switching the naming to ${TEST_NAME} + shortening the
last sleep.

Thanks,
Dennis
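[Editorial aside: blktests test names such as `block/027` contain a `/`, which a cgroup path would treat as a directory separator, creating a nested cgroup. The thread settles on using `${TEST_NAME}` anyway; purely as an illustration of the concern, here is a hypothetical way to flatten the name with bash parameter expansion. This is not the code from the eventual v2.]

```sh
# Hypothetical sketch: flattening the "/" in $TEST_NAME so it can serve
# as a single cgroup directory name rather than a nested path.
TEST_NAME="block/027"              # illustrative value
cgroup_name="${TEST_NAME//\//_}"   # bash: replace every "/" with "_"
echo "$cgroup_name"                # prints "block_027"
```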
```diff
diff --git a/tests/block/027 b/tests/block/027
index b480016..36e88b4 100755
--- a/tests/block/027
+++ b/tests/block/027
@@ -26,6 +26,13 @@ scsi_debug_stress_remove() {
 		return
 	fi
 
+	_init_cgroup2
+
+	# setup cgroups
+	echo "+io" > "/sys/fs/cgroup/cgroup.subtree_control"
+	echo "+io" > "$CGROUP2_DIR/cgroup.subtree_control"
+	mkdir "$CGROUP2_DIR/fast"
+
 	# set higher aio limit
 	echo 524288 > /proc/sys/fs/aio-max-nr
 
@@ -45,13 +52,13 @@ scsi_debug_stress_remove() {
 		((cnt++))
 	done
 
-	local num_jobs=4 runtime=21
+	local num_jobs=4 runtime=12
 	fio --rw=randread --size=128G --direct=1 --ioengine=libaio \
 		--iodepth=2048 --numjobs=$num_jobs --bs=4k \
 		--group_reporting=1 --group_reporting=1 --runtime=$runtime \
-		--loops=10000 "$fio_jobs" > "$FULL" 2>&1 &
+		--loops=10000 --cgroup=blktests/fast $fio_jobs > "$FULL" 2>&1 &
 
-	sleep 7
+	sleep 6
 
 	local device_path
 	for dev in "${SCSI_DEBUG_DEVICES[@]}"; do
 		# shutdown devices in progress
@@ -61,6 +68,10 @@ scsi_debug_stress_remove() {
 
 	wait
 
+	sleep 10
+
+	_exit_cgroup2
+
 	_exit_scsi_debug
 }
```
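[Editorial aside: the cgroup v2 plumbing the hunk above relies on can be reproduced by hand. The sketch below is an assumption-laden illustration, not the blktests helpers themselves: it assumes cgroup2 is mounted at `/sys/fs/cgroup` and picks an illustrative `CGROUP2_DIR` value; in blktests these steps are wrapped by `_init_cgroup2`/`_exit_cgroup2`.]

```sh
#!/bin/bash
# Standalone sketch of the cgroup v2 setup performed by the patch.
# CGROUP2_DIR's value and the mount point are assumptions for
# illustration only.
CGROUP2_DIR="/sys/fs/cgroup/blktests"

mkdir -p "$CGROUP2_DIR"                       # parent "blktests" cgroup

# Enable the io controller for children of the root and of blktests.
echo "+io" > /sys/fs/cgroup/cgroup.subtree_control
echo "+io" > "$CGROUP2_DIR/cgroup.subtree_control"

mkdir "$CGROUP2_DIR/fast"                     # child cgroup for fio jobs

# A task joins a cgroup by writing its PID to cgroup.procs; fio's
# --cgroup option is expected to do the equivalent for each job.
echo $$ > "$CGROUP2_DIR/fast/cgroup.procs"

# Teardown in the same spirit as _exit_cgroup2: move the shell back to
# the root cgroup, then remove the now-empty directories.
echo $$ > /sys/fs/cgroup/cgroup.procs
rmdir "$CGROUP2_DIR/fast" "$CGROUP2_DIR"
```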
Previously, the test was broken as "$fio_jobs" was considered as a
string instead of additional parameters. This is fixed here.

Second, there was an issue with earlier kernels when request lists
existed such that request_queues were never being cleaned up because
non-root blkgs took a reference on the queue. However, blkgs were being
cleaned up after the last reference to the request_queue was put back.
This creates a blktest cgroup for the fio jobs to utilize and should be
useful for catching this scenario in the future.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 tests/block/027 | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)
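[Editorial aside: the "$fio_jobs" bug described in the commit message is ordinary shell word splitting. A minimal sketch with made-up values (in block/027 the string accumulates `--name`/`--filename` pairs per scsi_debug device in a loop); `printf` stands in for `fio` so the expansion is visible without any devices:]

```sh
#!/bin/bash
# Illustrative value only; the real string is built in a loop.
fio_jobs="--name=dev1 --filename=/dev/sda --name=dev2 --filename=/dev/sdb"

# Quoted: the whole string is passed as ONE argument, which fio
# rejects as a bogus option.
printf '<%s>\n' "$fio_jobs"

# Unquoted (the patch's fix): the shell word-splits the string into
# four separate arguments.
printf '<%s>\n' $fio_jobs

# A bash array is the more robust idiom, avoiding accidental globbing
# and preserving arguments that contain spaces:
fio_args=(--name=dev1 --filename=/dev/sda)
fio_args+=(--name=dev2 --filename=/dev/sdb)
printf '<%s>\n' "${fio_args[@]}"
```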