diff mbox series

[2/2] loop: Test to detect a race condition between loop detach and open

Message ID 20240521224249.7389-2-gulam.mohamed@oracle.com (mailing list archive)
State New, archived
Headers show
Series [V2,for-6.10/block,1/2] loop: Fix a race between loop detach and loop open | expand

Commit Message

Gulam Mohamed May 21, 2024, 10:42 p.m. UTC
When one process opens a loop device partition and another process detaches
it, there will be a race condition due to which stale loop partitions are
created causing IO errors. This test will detect the race

Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>
---
 tests/loop/010     | 90 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/loop/010.out |  2 ++
 2 files changed, 92 insertions(+)
 create mode 100755 tests/loop/010
 create mode 100644 tests/loop/010.out

Comments

Christoph Hellwig May 23, 2024, 8:12 a.m. UTC | #1
On Tue, May 21, 2024 at 10:42:49PM +0000, Gulam Mohamed wrote:
> When one process opens a loop device partition and another process detaches
> it, there will be a race condition due to which stale loop partitions are
> created causing IO errors. This test will detect the race

This isn't really a 2/2, but a single patch for a separate repository.
Shinichiro Kawasaki May 23, 2024, 11:14 a.m. UTC | #2
On May 21, 2024 / 22:42, Gulam Mohamed wrote:
> When one process opens a loop device partition and another process detaches
> it, there will be a race condition due to which stale loop partitions are
> created causing IO errors. This test will detect the race

Hello Gulam, thanks for the patch. I ran the new test case on my system and
observed it failed. It looks working as expected.

This is the patch not for Linux kernel but for blktests, then it's the better to
separate it out. I suggest to add the subject title prefix "[PATCH blktests]"
to this patch so that reviewers can tell the target repository easily.

I suggest to take a look in the "new" script. It has some guides for blktests,
such as "Indent with tabs" or the global variable "TMPDIR". Also, please run
"make check" which runs shellcheck.

  $ make check
  shellcheck -x -e SC2119 -f gcc check common/* \
          tests/*/rc tests/*/[0-9]*[0-9] src/*.sh
  tests/loop/010:23:17: note: Instead of '[ 1 ]', use 'true'. [SC2161]
  tests/loop/010:32:17: note: Instead of '[ 1 ]', use 'true'. [SC2161]
  tests/loop/010:84:7: note: Double quote to prevent globbing and word splitting. [SC2086]
  make: *** [Makefile:21: check] Error 1

Please address the shellcheck warnings above. It will help us to keep script
quality at some level.

> 
> Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>
> ---
>  tests/loop/010     | 90 ++++++++++++++++++++++++++++++++++++++++++++++
>  tests/loop/010.out |  2 ++
>  2 files changed, 92 insertions(+)
>  create mode 100755 tests/loop/010
>  create mode 100644 tests/loop/010.out
> 
> diff --git a/tests/loop/010 b/tests/loop/010
> new file mode 100755
> index 000000000000..ea93a120d51a
> --- /dev/null
> +++ b/tests/loop/010
> @@ -0,0 +1,90 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-3.0+
> +# Copyright (C) 2024, Oracle and/or its affiliates.
> +#
> +# Test to detect a race between loop detach and loop open which creates
> +# stale loop partitions when one process opens the loop partition and
> +# another process detaches the loop device
> +#
> +. tests/loop/rc
> +DESCRIPTION="check stale loop partition"
> +TIMED=1
> +
> +requires() {
> +        _have_program parted
> +        _have_program mkfs.xfs
> +}
> +
> +image_file="/tmp/loopImg"

$TMPDIR is recommended instead of /tmp. $TMPDIR is cleaned up after test run.

> +line1="partition scan of loop0 failed (rc=-16)"
> +
> +function create_loop()

Nit: the "function" keyword is not used in blktests (there is one exception
though).

> +{
> +        while [ 1 ]
> +        do
> +                loop_device="$(losetup -P -f --show "${image_file}")"
> +                blkid /dev/loop0p1 >> /dev/null 2>&1
> +        done
> +}
> +
> +function detach_loop()
> +{
> +        while [ 1 ]
> +        do
> +                if [ -e /dev/loop0 ]; then
> +                        losetup -d /dev/loop0 > /dev/null 2>&1
> +                fi
> +        done
> +}
> +
> +test() {
> +	echo "Running ${TEST_NAME}"
> +	local failure="/tmp/failure"

Nit: some local variables are declared with "local" keyword, and some other are
not.

> +	touch $failure
> +	echo 0 > $failure

Why the file is required to keep the fail/pass status? Just a bash variable
looks enough.

> +	dmesg -c > /dev/null 2>&1

This kernel ring buffer clear will likely affect test harnesses which run
blktests. To check the kernel message generated during each test case run,
please use _dmesg_since_test_start(): ref nbd/004, nvme/003.

> +
> +	truncate -s 2G "${image_file}"

Other test cases in loop group have 1G file size. Is 2G required for this test?
If not, 1G size would be the better to reduce disk shortage risk.

> +	parted -a none -s "${image_file}" mklabel gpt
> +	loop_device="$(losetup -P -f --show "${image_file}")"
> +	parted -a none -s "${loop_device}" mkpart primary 64s 109051s
> +	
> +	udevadm settle
> +        if [ ! -e "${loop_device}" ]; then
> +		return 1
> +        fi
> +
> +        mkfs.xfs -f "${loop_device}p1" > /dev/null 2>&1
> +
> +        losetup -d "${loop_device}" >  /dev/null 2>&1
> +
> +        create_loop &
> +	create_pid=$!
> +        detach_loop &
> +	detach_pid=$!
> +
> +	sleep "${TIMEOUT:-180}"

The default 180 seconds sounds a bit long. How about 30 seconds?

> +        {
> +                kill -9 $create_pid
> +		kill -9 $detach_pid
> +                wait
> +                sleep 1
> +        } 2>/dev/null
> +
> +        losetup -D > /dev/null 2>&1
> +        dmesg  | while IFS= read -r line2
> +	do
> +        	match=$(echo "$line2" | grep -o "$line1")
> +                if [ "$line1" == "$match" ]; then
> +			echo 1 > "/tmp/failure"
> +			break
> +                fi
> +        done
> +	failed=$(cat $failure)
> +	if [ $failed -eq 0 ]; then
> +		echo "Test complete"
> +	else
> +		echo "Test failed"
> +	fi
> +	rm -f $failure
> +}
> diff --git a/tests/loop/010.out b/tests/loop/010.out
> new file mode 100644
> index 000000000000..64a6aee00b8a
> --- /dev/null
> +++ b/tests/loop/010.out
> @@ -0,0 +1,2 @@
> +Running loop/010
> +Test complete
> -- 
> 2.39.3
>
Gulam Mohamed May 27, 2024, 9:56 a.m. UTC | #3
Hi,

> -----Original Message-----
> From: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Sent: Thursday, May 23, 2024 4:45 PM
> To: Gulam Mohamed <gulam.mohamed@oracle.com>
> Cc: linux-block@vger.kernel.org; linux-kernel@vger.kernel.org;
> axboe@kernel.dk; chaitanyak@nvidia.com; hch@lst.de
> Subject: Re: [PATCH 2/2] loop: Test to detect a race condition between loop
> detach and open
> 
> On May 21, 2024 / 22:42, Gulam Mohamed wrote:
> > When one process opens a loop device partition and another process
> > detaches it, there will be a race condition due to which stale loop
> > partitions are created causing IO errors. This test will detect the
> > race
> 
> Hello Gulam, thanks for the patch. I ran the new test case on my system and
> observed it failed. It looks working as expected.
> 
> This is the patch not for Linux kernel but for blktests, then it's the better to
> separate it out. I suggest to add the subject title prefix "[PATCH blktests]"
> to this patch so that reviewers can tell the target repository easily.
> 
Done. Changed the subject line prefix to "[PATCH blktests]"
 
> I suggest to take a look in the "new" script. It has some guides for blktests,
> such as "Indent with tabs" or the global variable "TMPDIR". Also, please run
> "make check" which runs shellcheck.
> 
>   $ make check
>   shellcheck -x -e SC2119 -f gcc check common/* \
>           tests/*/rc tests/*/[0-9]*[0-9] src/*.sh
>   tests/loop/010:23:17: note: Instead of '[ 1 ]', use 'true'. [SC2161]
>   tests/loop/010:32:17: note: Instead of '[ 1 ]', use 'true'. [SC2161]
>   tests/loop/010:84:7: note: Double quote to prevent globbing and word
> splitting. [SC2086]
>   make: *** [Makefile:21: check] Error 1
> 
> Please address the shellcheck warnings above. It will help us to keep script
> quality at some level.
Done. Resolved the warnings
> 
> >
> > Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>
> > ---
> >  tests/loop/010     | 90
> ++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/loop/010.out |  2 ++
> >  2 files changed, 92 insertions(+)
> >  create mode 100755 tests/loop/010
> >  create mode 100644 tests/loop/010.out
> >
> > diff --git a/tests/loop/010 b/tests/loop/010 new file mode 100755
> > index 000000000000..ea93a120d51a
> > --- /dev/null
> > +++ b/tests/loop/010
> > @@ -0,0 +1,90 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-3.0+
> > +# Copyright (C) 2024, Oracle and/or its affiliates.
> > +#
> > +# Test to detect a race between loop detach and loop open which
> > +creates # stale loop partitions when one process opens the loop
> > +partition and # another process detaches the loop device # .
> > +tests/loop/rc DESCRIPTION="check stale loop partition"
> > +TIMED=1
> > +
> > +requires() {
> > +        _have_program parted
> > +        _have_program mkfs.xfs
> > +}
> > +
> > +image_file="/tmp/loopImg"
> 
> $TMPDIR is recommended instead of /tmp. $TMPDIR is cleaned up after test
> run.
Done. Using $TMPDIR now.
> 
> > +line1="partition scan of loop0 failed (rc=-16)"
> > +
> > +function create_loop()
> 
> Nit: the "function" keyword is not used in blktests (there is one exception
> though).
> 
Done. Removed the keyword "function" now
> > +{
> > +        while [ 1 ]
> > +        do
> > +                loop_device="$(losetup -P -f --show "${image_file}")"
> > +                blkid /dev/loop0p1 >> /dev/null 2>&1
> > +        done
> > +}
> > +
> > +function detach_loop()
> > +{
> > +        while [ 1 ]
> > +        do
> > +                if [ -e /dev/loop0 ]; then
> > +                        losetup -d /dev/loop0 > /dev/null 2>&1
> > +                fi
> > +        done
> > +}
> > +
> > +test() {
> > +	echo "Running ${TEST_NAME}"
> > +	local failure="/tmp/failure"
> 
> Nit: some local variables are declared with "local" keyword, and some other
> are not.
Done. Using the "local" keyword for all local variables now
> 
> > +	touch $failure
> > +	echo 0 > $failure
> 
> Why the file is required to keep the fail/pass status? Just a bash variable looks
> enough.
In the beginning I tried with a bash variable but it was not working as the variable need to be accessed by other two child threads also.
> 
> > +	dmesg -c > /dev/null 2>&1
> 
> This kernel ring buffer clear will likely affect test harnesses which run blktests.
> To check the kernel message generated during each test case run, please use
> _dmesg_since_test_start(): ref nbd/004, nvme/003.
Done. Used the above function
> 
> > +
> > +	truncate -s 2G "${image_file}"
> 
> Other test cases in loop group have 1G file size. Is 2G required for this test?
> If not, 1G size would be the better to reduce disk shortage risk.
> 
Using 1G is not an issue. Done
> > +	parted -a none -s "${image_file}" mklabel gpt
> > +	loop_device="$(losetup -P -f --show "${image_file}")"
> > +	parted -a none -s "${loop_device}" mkpart primary 64s 109051s
> > +
> > +	udevadm settle
> > +        if [ ! -e "${loop_device}" ]; then
> > +		return 1
> > +        fi
> > +
> > +        mkfs.xfs -f "${loop_device}p1" > /dev/null 2>&1
> > +
> > +        losetup -d "${loop_device}" >  /dev/null 2>&1
> > +
> > +        create_loop &
> > +	create_pid=$!
> > +        detach_loop &
> > +	detach_pid=$!
> > +
> > +	sleep "${TIMEOUT:-180}"
> 
> The default 180 seconds sounds a bit long. How about 30 seconds?
Using 180 seconds is just to give more time to reproduce the issue as 30 seconds may not be sufficient for the race condition to occur

Will send the version 2 as soon as possible, with all these changes
> 
> > +        {
> > +                kill -9 $create_pid
> > +		kill -9 $detach_pid
> > +                wait
> > +                sleep 1
> > +        } 2>/dev/null
> > +
> > +        losetup -D > /dev/null 2>&1
> > +        dmesg  | while IFS= read -r line2
> > +	do
> > +        	match=$(echo "$line2" | grep -o "$line1")
> > +                if [ "$line1" == "$match" ]; then
> > +			echo 1 > "/tmp/failure"
> > +			break
> > +                fi
> > +        done
> > +	failed=$(cat $failure)
> > +	if [ $failed -eq 0 ]; then
> > +		echo "Test complete"
> > +	else
> > +		echo "Test failed"
> > +	fi
> > +	rm -f $failure
> > +}
> > diff --git a/tests/loop/010.out b/tests/loop/010.out new file mode
> > 100644 index 000000000000..64a6aee00b8a
> > --- /dev/null
> > +++ b/tests/loop/010.out
> > @@ -0,0 +1,2 @@
> > +Running loop/010
> > +Test complete
> > --
> > 2.39.3
> >
diff mbox series

Patch

diff --git a/tests/loop/010 b/tests/loop/010
new file mode 100755
index 000000000000..ea93a120d51a
--- /dev/null
+++ b/tests/loop/010
@@ -0,0 +1,90 @@ 
+#!/bin/bash
+# SPDX-License-Identifier: GPL-3.0+
+# Copyright (C) 2024, Oracle and/or its affiliates.
+#
+# Test to detect a race between loop detach and loop open which creates
+# stale loop partitions when one process opens the loop partition and
+# another process detaches the loop device
+#
+. tests/loop/rc
+DESCRIPTION="check stale loop partition"
+TIMED=1
+
+requires() {
+        _have_program parted
+        _have_program mkfs.xfs
+}
+
+image_file="/tmp/loopImg"
+line1="partition scan of loop0 failed (rc=-16)"
+
+function create_loop()
+{
+        while [ 1 ]
+        do
+                loop_device="$(losetup -P -f --show "${image_file}")"
+                blkid /dev/loop0p1 >> /dev/null 2>&1
+        done
+}
+
+function detach_loop()
+{
+        while [ 1 ]
+        do
+                if [ -e /dev/loop0 ]; then
+                        losetup -d /dev/loop0 > /dev/null 2>&1
+                fi
+        done
+}
+
+test() {
+	echo "Running ${TEST_NAME}"
+	local failure="/tmp/failure"
+	touch $failure
+	echo 0 > $failure
+	dmesg -c > /dev/null 2>&1
+
+	truncate -s 2G "${image_file}"
+	parted -a none -s "${image_file}" mklabel gpt
+	loop_device="$(losetup -P -f --show "${image_file}")"
+	parted -a none -s "${loop_device}" mkpart primary 64s 109051s
+	
+	udevadm settle
+        if [ ! -e "${loop_device}" ]; then
+		return 1
+        fi
+
+        mkfs.xfs -f "${loop_device}p1" > /dev/null 2>&1
+
+        losetup -d "${loop_device}" >  /dev/null 2>&1
+
+        create_loop &
+	create_pid=$!
+        detach_loop &
+	detach_pid=$!
+
+	sleep "${TIMEOUT:-180}"
+        {
+                kill -9 $create_pid
+		kill -9 $detach_pid
+                wait
+                sleep 1
+        } 2>/dev/null
+
+        losetup -D > /dev/null 2>&1
+        dmesg  | while IFS= read -r line2
+	do
+        	match=$(echo "$line2" | grep -o "$line1")
+                if [ "$line1" == "$match" ]; then
+			echo 1 > "/tmp/failure"
+			break
+                fi
+        done
+	failed=$(cat $failure)
+	if [ $failed -eq 0 ]; then
+		echo "Test complete"
+	else
+		echo "Test failed"
+	fi
+	rm -f $failure
+}
diff --git a/tests/loop/010.out b/tests/loop/010.out
new file mode 100644
index 000000000000..64a6aee00b8a
--- /dev/null
+++ b/tests/loop/010.out
@@ -0,0 +1,2 @@ 
+Running loop/010
+Test complete