| Message ID | 20190329221432.38726-1-dennis@kernel.org (mailing list archive) |
| --- | --- |
| State | New, archived |
| Series | block: fix fio jobs for 027 and add cgroup support |
On Fri, Mar 29, 2019 at 03:14:32PM -0700, Dennis Zhou wrote:
> Previously, the test was broken as "$fio_jobs" was considered as a
> string instead of additional parameters. This is fixed here.
>
> Second, there was an issue with earlier kernels when request lists
> existed such that request_queues were never being cleaned up because
> non-root blkgs took a reference on the queue. However, blkgs were being
> cleaned up after the last reference to the request_queue was put back.
> This creates a blktest cgroup for the fio jobs to utilize and should be
> useful for catching this scenario in the future.
>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>

Would it be valuable to test this both in a cgroup and not in a cgroup?

>  tests/block/027 | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/tests/block/027 b/tests/block/027
> index b480016..36e88b4 100755
> --- a/tests/block/027
> +++ b/tests/block/027
> @@ -26,6 +26,13 @@ scsi_debug_stress_remove() {
>  		return
>  	fi
>
> +	_init_cgroup2
> +
> +	# setup cgroups
> +	echo "+io" > "/sys/fs/cgroup/cgroup.subtree_control"
> +	echo "+io" > "$CGROUP2_DIR/cgroup.subtree_control"
> +	mkdir "$CGROUP2_DIR/fast"
> +

Why is this called "fast"?

>  	# set higher aio limit
>  	echo 524288 > /proc/sys/fs/aio-max-nr
>
> @@ -45,13 +52,13 @@ scsi_debug_stress_remove() {
>  		((cnt++))
>  	done
>
> -	local num_jobs=4 runtime=21
> +	local num_jobs=4 runtime=12
>  	fio --rw=randread --size=128G --direct=1 --ioengine=libaio \
>  		--iodepth=2048 --numjobs=$num_jobs --bs=4k \
>  		--group_reporting=1 --group_reporting=1 --runtime=$runtime \
> -		--loops=10000 "$fio_jobs" > "$FULL" 2>&1 &
> +		--loops=10000 --cgroup=blktests/fast $fio_jobs > "$FULL" 2>&1 &
>
> -	sleep 7
> +	sleep 6

Is there any particular reason you changed the run time and sleep time
and added the sleep at the end?

>  	local device_path
>  	for dev in "${SCSI_DEBUG_DEVICES[@]}"; do
>  		# shutdown devices in progress
> @@ -61,6 +68,10 @@ scsi_debug_stress_remove() {
>
>  	wait
>
> +	sleep 10
> +
> +	_exit_cgroup2
> +
>  	_exit_scsi_debug
>  }
>
> --
> 2.17.1
>
On Thu, Apr 04, 2019 at 11:02:24AM -0700, Omar Sandoval wrote:
> On Fri, Mar 29, 2019 at 03:14:32PM -0700, Dennis Zhou wrote:
> > Previously, the test was broken as "$fio_jobs" was considered as a
> > string instead of additional parameters. This is fixed here.
> >
> > Second, there was an issue with earlier kernels when request lists
> > existed such that request_queues were never being cleaned up because
> > non-root blkgs took a reference on the queue. However, blkgs were being
> > cleaned up after the last reference to the request_queue was put back.
> > This creates a blktest cgroup for the fio jobs to utilize and should be
> > useful for catching this scenario in the future.
> >
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
>
> Would it be valuable to test this both in a cgroup and not in a cgroup?
>

I lean towards no because testing in a cgroup should subsume testing not
in a cgroup.

> >  tests/block/027 | 17 ++++++++++++++---
> >  1 file changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/tests/block/027 b/tests/block/027
> > index b480016..36e88b4 100755
> > --- a/tests/block/027
> > +++ b/tests/block/027
> > @@ -26,6 +26,13 @@ scsi_debug_stress_remove() {
> >  		return
> >  	fi
> >
> > +	_init_cgroup2
> > +
> > +	# setup cgroups
> > +	echo "+io" > "/sys/fs/cgroup/cgroup.subtree_control"
> > +	echo "+io" > "$CGROUP2_DIR/cgroup.subtree_control"
> > +	mkdir "$CGROUP2_DIR/fast"
> > +
>
> Why is this called "fast"?
>

I just picked a name from Josef's test. I was hesitant to use
${TEST_NAME} because of the /, but there's no reason that's not okay.

> >  	# set higher aio limit
> >  	echo 524288 > /proc/sys/fs/aio-max-nr
> >
> > @@ -45,13 +52,13 @@ scsi_debug_stress_remove() {
> >  		((cnt++))
> >  	done
> >
> > -	local num_jobs=4 runtime=21
> > +	local num_jobs=4 runtime=12
> >  	fio --rw=randread --size=128G --direct=1 --ioengine=libaio \
> >  		--iodepth=2048 --numjobs=$num_jobs --bs=4k \
> >  		--group_reporting=1 --group_reporting=1 --runtime=$runtime \
> > -		--loops=10000 "$fio_jobs" > "$FULL" 2>&1 &
> > +		--loops=10000 --cgroup=blktests/fast $fio_jobs > "$FULL" 2>&1 &
> >
> > -	sleep 7
> > +	sleep 6
>
> Is there any particular reason you changed the run time and sleep time
> and added the sleep at the end?
>

Validating this is a little tricky because we don't export statistics
around blkgs. The change in runtime and sleep here is just to shorten
the test. Enough time has passed that everything is setup and enough
IOs go through and fail to be sure that this all works.

The sleep below should probably be a little shorter, but it's to more
easily validate that the blkg cleanup is due to request_queue cleanup
vs cgroup cleanup. Prior, I made a bad assumption that everything was
okay due to always cleaning up the cgroup when in reality request_queue
cleanup was failing first.

> >  	local device_path
> >  	for dev in "${SCSI_DEBUG_DEVICES[@]}"; do
> >  		# shutdown devices in progress
> > @@ -61,6 +68,10 @@ scsi_debug_stress_remove() {
> >
> >  	wait
> >
> > +	sleep 10
> > +
> > +	_exit_cgroup2
> > +
> >  	_exit_scsi_debug
> >  }
> >
> > --
> > 2.17.1
> >

I'll send a v2 switching the naming to ${TEST_NAME} + shortening the
last sleep.

Thanks,
Dennis
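[Editorial aside: blktests test names such as `block/027` contain a `/`, which a cgroup path would treat as a directory separator, creating a nested cgroup. The thread settles on using `${TEST_NAME}` anyway; purely as an illustration of the concern, here is a hypothetical way to flatten the name with bash parameter expansion. This is not the code from the eventual v2.]

```sh
# Hypothetical sketch: flattening the "/" in $TEST_NAME so it can serve
# as a single cgroup directory name rather than a nested path.
TEST_NAME="block/027"              # illustrative value
cgroup_name="${TEST_NAME//\//_}"   # bash: replace every "/" with "_"
echo "$cgroup_name"                # prints "block_027"
```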
```diff
diff --git a/tests/block/027 b/tests/block/027
index b480016..36e88b4 100755
--- a/tests/block/027
+++ b/tests/block/027
@@ -26,6 +26,13 @@ scsi_debug_stress_remove() {
 		return
 	fi
 
+	_init_cgroup2
+
+	# setup cgroups
+	echo "+io" > "/sys/fs/cgroup/cgroup.subtree_control"
+	echo "+io" > "$CGROUP2_DIR/cgroup.subtree_control"
+	mkdir "$CGROUP2_DIR/fast"
+
 	# set higher aio limit
 	echo 524288 > /proc/sys/fs/aio-max-nr
 
@@ -45,13 +52,13 @@ scsi_debug_stress_remove() {
 		((cnt++))
 	done
 
-	local num_jobs=4 runtime=21
+	local num_jobs=4 runtime=12
 	fio --rw=randread --size=128G --direct=1 --ioengine=libaio \
 		--iodepth=2048 --numjobs=$num_jobs --bs=4k \
 		--group_reporting=1 --group_reporting=1 --runtime=$runtime \
-		--loops=10000 "$fio_jobs" > "$FULL" 2>&1 &
+		--loops=10000 --cgroup=blktests/fast $fio_jobs > "$FULL" 2>&1 &
 
-	sleep 7
+	sleep 6
 
 	local device_path
 	for dev in "${SCSI_DEBUG_DEVICES[@]}"; do
 		# shutdown devices in progress
@@ -61,6 +68,10 @@ scsi_debug_stress_remove() {
 
 	wait
 
+	sleep 10
+
+	_exit_cgroup2
+
 	_exit_scsi_debug
 }
```
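[Editorial aside: the cgroup v2 plumbing the hunk above relies on can be reproduced by hand. The sketch below is an assumption-laden illustration, not the blktests helpers themselves: it assumes cgroup2 is mounted at `/sys/fs/cgroup` and picks an illustrative `CGROUP2_DIR` value; in blktests these steps are wrapped by `_init_cgroup2`/`_exit_cgroup2`.]

```sh
#!/bin/bash
# Standalone sketch of the cgroup v2 setup performed by the patch.
# CGROUP2_DIR's value and the mount point are assumptions for
# illustration only.
CGROUP2_DIR="/sys/fs/cgroup/blktests"

mkdir -p "$CGROUP2_DIR"                       # parent "blktests" cgroup

# Enable the io controller for children of the root and of blktests.
echo "+io" > /sys/fs/cgroup/cgroup.subtree_control
echo "+io" > "$CGROUP2_DIR/cgroup.subtree_control"

mkdir "$CGROUP2_DIR/fast"                     # child cgroup for fio jobs

# A task joins a cgroup by writing its PID to cgroup.procs; fio's
# --cgroup option is expected to do the equivalent for each job.
echo $$ > "$CGROUP2_DIR/fast/cgroup.procs"

# Teardown in the same spirit as _exit_cgroup2: move the shell back to
# the root cgroup, then remove the now-empty directories.
echo $$ > /sys/fs/cgroup/cgroup.procs
rmdir "$CGROUP2_DIR/fast" "$CGROUP2_DIR"
```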
Previously, the test was broken as "$fio_jobs" was considered as a
string instead of additional parameters. This is fixed here.

Second, there was an issue with earlier kernels when request lists
existed such that request_queues were never being cleaned up because
non-root blkgs took a reference on the queue. However, blkgs were being
cleaned up after the last reference to the request_queue was put back.
This creates a blktest cgroup for the fio jobs to utilize and should be
useful for catching this scenario in the future.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 tests/block/027 | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)
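[Editorial aside: the "$fio_jobs" bug described in the commit message is ordinary shell word splitting. A minimal sketch with made-up values (in block/027 the string accumulates `--name`/`--filename` pairs per scsi_debug device in a loop); `printf` stands in for `fio` so the expansion is visible without any devices:]

```sh
#!/bin/bash
# Illustrative value only; the real string is built in a loop.
fio_jobs="--name=dev1 --filename=/dev/sda --name=dev2 --filename=/dev/sdb"

# Quoted: the whole string is passed as ONE argument, which fio
# rejects as a bogus option.
printf '<%s>\n' "$fio_jobs"

# Unquoted (the patch's fix): the shell word-splits the string into
# four separate arguments.
printf '<%s>\n' $fio_jobs

# A bash array is the more robust idiom, avoiding accidental globbing
# and preserving arguments that contain spaces:
fio_args=(--name=dev1 --filename=/dev/sda)
fio_args+=(--name=dev2 --filename=/dev/sdb)
printf '<%s>\n' "${fio_args[@]}"
```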