Message ID | 20240521224249.7389-2-gulam.mohamed@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [V2,for-6.10/block,1/2] loop: Fix a race between loop detach and loop open | expand |
On Tue, May 21, 2024 at 10:42:49PM +0000, Gulam Mohamed wrote: > When one process opens a loop device partition and another process detaches > it, there will be a race condition due to which stale loop partitions are > created causing IO errors. This test will detect the race This isn't really a 2/2, but a single patch for a separate repository.
On May 21, 2024 / 22:42, Gulam Mohamed wrote: > When one process opens a loop device partition and another process detaches > it, there will be a race condition due to which stale loop partitions are > created causing IO errors. This test will detect the race Hello Gulam, thanks for the patch. I ran the new test case on my system and observed it failed. It looks working as expected. This is the patch not for Linux kernel but for blktests, then it's the better to separate it out. I suggest to add the subject title prefix "[PATCH blktests]" to this patch so that reviewers can tell the target repository easily. I suggest to take a look in the "new" script. It has some guides for blktests, such as "Indent with tabs" or the global variable "TMPDIR". Also, please run "make check" which runs shellcheck. $ make check shellcheck -x -e SC2119 -f gcc check common/* \ tests/*/rc tests/*/[0-9]*[0-9] src/*.sh tests/loop/010:23:17: note: Instead of '[ 1 ]', use 'true'. [SC2161] tests/loop/010:32:17: note: Instead of '[ 1 ]', use 'true'. [SC2161] tests/loop/010:84:7: note: Double quote to prevent globbing and word splitting. [SC2086] make: *** [Makefile:21: check] Error 1 Please address the shellcheck warnings above. It will help us to keep script quality at some level. > > Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com> > --- > tests/loop/010 | 90 ++++++++++++++++++++++++++++++++++++++++++++++ > tests/loop/010.out | 2 ++ > 2 files changed, 92 insertions(+) > create mode 100755 tests/loop/010 > create mode 100644 tests/loop/010.out > > diff --git a/tests/loop/010 b/tests/loop/010 > new file mode 100755 > index 000000000000..ea93a120d51a > --- /dev/null > +++ b/tests/loop/010 > @@ -0,0 +1,90 @@ > +#!/bin/bash > +# SPDX-License-Identifier: GPL-3.0+ > +# Copyright (C) 2024, Oracle and/or its affiliates. > +# > +# Test to detect a race between loop detach and loop open which creates > +# stale loop partitions when one process opens the loop partition and > +# another process detaches the loop device > +# > +. tests/loop/rc > +DESCRIPTION="check stale loop partition" > +TIMED=1 > + > +requires() { > + _have_program parted > + _have_program mkfs.xfs > +} > + > +image_file="/tmp/loopImg" $TMPDIR is recommended instead of /tmp. $TMPDIR is cleaned up after test run. > +line1="partition scan of loop0 failed (rc=-16)" > + > +function create_loop() Nit: the "function" keyword is not used in blktests (there is one exception though). > +{ > + while [ 1 ] > + do > + loop_device="$(losetup -P -f --show "${image_file}")" > + blkid /dev/loop0p1 >> /dev/null 2>&1 > + done > +} > + > +function detach_loop() > +{ > + while [ 1 ] > + do > + if [ -e /dev/loop0 ]; then > + losetup -d /dev/loop0 > /dev/null 2>&1 > + fi > + done > +} > + > +test() { > + echo "Running ${TEST_NAME}" > + local failure="/tmp/failure" Nit: some local variables are declared with "local" keyword, and some other are not. > + touch $failure > + echo 0 > $failure Why the file is required to keep the fail/pass status? Just a bash variable looks enough. > + dmesg -c > /dev/null 2>&1 This kernel ring buffer clear will likely affect test harnesses which run blktests. To check the kernel message generated during each test case run, please use _dmesg_since_test_start(): ref nbd/004, nvme/003. > + > + truncate -s 2G "${image_file}" Other test cases in loop group have 1G file size. Is 2G required for this test? If not, 1G size would be the better to reduce disk shortage risk. > + parted -a none -s "${image_file}" mklabel gpt > + loop_device="$(losetup -P -f --show "${image_file}")" > + parted -a none -s "${loop_device}" mkpart primary 64s 109051s > + > + udevadm settle > + if [ ! -e "${loop_device}" ]; then > + return 1 > + fi > + > + mkfs.xfs -f "${loop_device}p1" > /dev/null 2>&1 > + > + losetup -d "${loop_device}" > /dev/null 2>&1 > + > + create_loop & > + create_pid=$! > + detach_loop & > + detach_pid=$! > + > + sleep "${TIMEOUT:-180}" The default 180 seconds sounds a bit long. How about 30 seconds? > + { > + kill -9 $create_pid > + kill -9 $detach_pid > + wait > + sleep 1 > + } 2>/dev/null > + > + losetup -D > /dev/null 2>&1 > + dmesg | while IFS= read -r line2 > + do > + match=$(echo "$line2" | grep -o "$line1") > + if [ "$line1" == "$match" ]; then > + echo 1 > "/tmp/failure" > + break > + fi > + done > + failed=$(cat $failure) > + if [ $failed -eq 0 ]; then > + echo "Test complete" > + else > + echo "Test failed" > + fi > + rm -f $failure > +} > diff --git a/tests/loop/010.out b/tests/loop/010.out > new file mode 100644 > index 000000000000..64a6aee00b8a > --- /dev/null > +++ b/tests/loop/010.out > @@ -0,0 +1,2 @@ > +Running loop/010 > +Test complete > -- > 2.39.3 >
Hi, > -----Original Message----- > From: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> > Sent: Thursday, May 23, 2024 4:45 PM > To: Gulam Mohamed <gulam.mohamed@oracle.com> > Cc: linux-block@vger.kernel.org; linux-kernel@vger.kernel.org; > axboe@kernel.dk; chaitanyak@nvidia.com; hch@lst.de > Subject: Re: [PATCH 2/2] loop: Test to detect a race condition between loop > detach and open > > On May 21, 2024 / 22:42, Gulam Mohamed wrote: > > When one process opens a loop device partition and another process > > detaches it, there will be a race condition due to which stale loop > > partitions are created causing IO errors. This test will detect the > > race > > Hello Gulam, thanks for the patch. I ran the new test case on my system and > observed it failed. It looks working as expected. > > This is the patch not for Linux kernel but for blktests, then it's the better to > separate it out. I suggest to add the subject title prefix "[PATCH blktests]" > to this patch so that reviewers can tell the target repository easily. > Done. Changed the subject line prefix to "[PATCH blktests]" > I suggest to take a look in the "new" script. It has some guides for blktests, > such as "Indent with tabs" or the global variable "TMPDIR". Also, please run > "make check" which runs shellcheck. > > $ make check > shellcheck -x -e SC2119 -f gcc check common/* \ > tests/*/rc tests/*/[0-9]*[0-9] src/*.sh > tests/loop/010:23:17: note: Instead of '[ 1 ]', use 'true'. [SC2161] > tests/loop/010:32:17: note: Instead of '[ 1 ]', use 'true'. [SC2161] > tests/loop/010:84:7: note: Double quote to prevent globbing and word > splitting. [SC2086] > make: *** [Makefile:21: check] Error 1 > > Please address the shellcheck warnings above. It will help us to keep script > quality at some level. Done. Resolved the warnings > > > > > Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com> > > --- > > tests/loop/010 | 90 > ++++++++++++++++++++++++++++++++++++++++++++++ > > tests/loop/010.out | 2 ++ > > 2 files changed, 92 insertions(+) > > create mode 100755 tests/loop/010 > > create mode 100644 tests/loop/010.out > > > > diff --git a/tests/loop/010 b/tests/loop/010 new file mode 100755 > > index 000000000000..ea93a120d51a > > --- /dev/null > > +++ b/tests/loop/010 > > @@ -0,0 +1,90 @@ > > +#!/bin/bash > > +# SPDX-License-Identifier: GPL-3.0+ > > +# Copyright (C) 2024, Oracle and/or its affiliates. > > +# > > +# Test to detect a race between loop detach and loop open which > > +creates # stale loop partitions when one process opens the loop > > +partition and # another process detaches the loop device # . > > +tests/loop/rc DESCRIPTION="check stale loop partition" > > +TIMED=1 > > + > > +requires() { > > + _have_program parted > > + _have_program mkfs.xfs > > +} > > + > > +image_file="/tmp/loopImg" > > $TMPDIR is recommended instead of /tmp. $TMPDIR is cleaned up after test > run. Done. Using $TMPDIR now. > > > +line1="partition scan of loop0 failed (rc=-16)" > > + > > +function create_loop() > > Nit: the "function" keyword is not used in blktests (there is one exception > though). > Done. Removed the keyword "function" now > > +{ > > + while [ 1 ] > > + do > > + loop_device="$(losetup -P -f --show "${image_file}")" > > + blkid /dev/loop0p1 >> /dev/null 2>&1 > > + done > > +} > > + > > +function detach_loop() > > +{ > > + while [ 1 ] > > + do > > + if [ -e /dev/loop0 ]; then > > + losetup -d /dev/loop0 > /dev/null 2>&1 > > + fi > > + done > > +} > > + > > +test() { > > + echo "Running ${TEST_NAME}" > > + local failure="/tmp/failure" > > Nit: some local variables are declared with "local" keyword, and some other > are not. Done. Using the "local" keyword for all local variables now > > > + touch $failure > > + echo 0 > $failure > > Why the file is required to keep the fail/pass status? Just a bash variable looks > enough. In the beginning I tried with a bash variable but it was not working as the variable need to be accessed by other two child threads also. > > > + dmesg -c > /dev/null 2>&1 > > This kernel ring buffer clear will likely affect test harnesses which run blktests. > To check the kernel message generated during each test case run, please use > _dmesg_since_test_start(): ref nbd/004, nvme/003. Done. Used the above function > > > + > > + truncate -s 2G "${image_file}" > > Other test cases in loop group have 1G file size. Is 2G required for this test? > If not, 1G size would be the better to reduce disk shortage risk. > Using 1G is not an issue. Done > > + parted -a none -s "${image_file}" mklabel gpt > > + loop_device="$(losetup -P -f --show "${image_file}")" > > + parted -a none -s "${loop_device}" mkpart primary 64s 109051s > > + > > + udevadm settle > > + if [ ! -e "${loop_device}" ]; then > > + return 1 > > + fi > > + > > + mkfs.xfs -f "${loop_device}p1" > /dev/null 2>&1 > > + > > + losetup -d "${loop_device}" > /dev/null 2>&1 > > + > > + create_loop & > > + create_pid=$! > > + detach_loop & > > + detach_pid=$! > > + > > + sleep "${TIMEOUT:-180}" > > The default 180 seconds sounds a bit long. How about 30 seconds? Using 180 seconds is just to give more time to reproduce the issue as 30 seconds may not be sufficient for the race condition to occur Will send the version 2 as soon as possible, with all these changes > > > + { > > + kill -9 $create_pid > > + kill -9 $detach_pid > > + wait > > + sleep 1 > > + } 2>/dev/null > > + > > + losetup -D > /dev/null 2>&1 > > + dmesg | while IFS= read -r line2 > > + do > > + match=$(echo "$line2" | grep -o "$line1") > > + if [ "$line1" == "$match" ]; then > > + echo 1 > "/tmp/failure" > > + break > > + fi > > + done > > + failed=$(cat $failure) > > + if [ $failed -eq 0 ]; then > > + echo "Test complete" > > + else > > + echo "Test failed" > > + fi > > + rm -f $failure > > +} > > diff --git a/tests/loop/010.out b/tests/loop/010.out new file mode > > 100644 index 000000000000..64a6aee00b8a > > --- /dev/null > > +++ b/tests/loop/010.out > > @@ -0,0 +1,2 @@ > > +Running loop/010 > > +Test complete > > -- > > 2.39.3 > >
diff --git a/tests/loop/010 b/tests/loop/010 new file mode 100755 index 000000000000..ea93a120d51a --- /dev/null +++ b/tests/loop/010 @@ -0,0 +1,90 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-3.0+ +# Copyright (C) 2024, Oracle and/or its affiliates. +# +# Test to detect a race between loop detach and loop open which creates +# stale loop partitions when one process opens the loop partition and +# another process detaches the loop device +# +. tests/loop/rc +DESCRIPTION="check stale loop partition" +TIMED=1 + +requires() { + _have_program parted + _have_program mkfs.xfs +} + +image_file="/tmp/loopImg" +line1="partition scan of loop0 failed (rc=-16)" + +function create_loop() +{ + while [ 1 ] + do + loop_device="$(losetup -P -f --show "${image_file}")" + blkid /dev/loop0p1 >> /dev/null 2>&1 + done +} + +function detach_loop() +{ + while [ 1 ] + do + if [ -e /dev/loop0 ]; then + losetup -d /dev/loop0 > /dev/null 2>&1 + fi + done +} + +test() { + echo "Running ${TEST_NAME}" + local failure="/tmp/failure" + touch $failure + echo 0 > $failure + dmesg -c > /dev/null 2>&1 + + truncate -s 2G "${image_file}" + parted -a none -s "${image_file}" mklabel gpt + loop_device="$(losetup -P -f --show "${image_file}")" + parted -a none -s "${loop_device}" mkpart primary 64s 109051s + + udevadm settle + if [ ! -e "${loop_device}" ]; then + return 1 + fi + + mkfs.xfs -f "${loop_device}p1" > /dev/null 2>&1 + + losetup -d "${loop_device}" > /dev/null 2>&1 + + create_loop & + create_pid=$! + detach_loop & + detach_pid=$! + + sleep "${TIMEOUT:-180}" + { + kill -9 $create_pid + kill -9 $detach_pid + wait + sleep 1 + } 2>/dev/null + + losetup -D > /dev/null 2>&1 + dmesg | while IFS= read -r line2 + do + match=$(echo "$line2" | grep -o "$line1") + if [ "$line1" == "$match" ]; then + echo 1 > "/tmp/failure" + break + fi + done + failed=$(cat $failure) + if [ $failed -eq 0 ]; then + echo "Test complete" + else + echo "Test failed" + fi + rm -f $failure +} diff --git a/tests/loop/010.out b/tests/loop/010.out new file mode 100644 index 000000000000..64a6aee00b8a --- /dev/null +++ b/tests/loop/010.out @@ -0,0 +1,2 @@ +Running loop/010 +Test complete
When one process opens a loop device partition and another process detaches it, there will be a race condition due to which stale loop partitions are created causing IO errors. This test will detect the race Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com> --- tests/loop/010 | 90 ++++++++++++++++++++++++++++++++++++++++++++++ tests/loop/010.out | 2 ++ 2 files changed, 92 insertions(+) create mode 100755 tests/loop/010 create mode 100644 tests/loop/010.out