
fstests: generic/366: add a new test case to verify if certain fio load will hang the filesystem

Message ID 20241028054121.50985-1-wqu@suse.com (mailing list archive)
State New
Series fstests: generic/366: add a new test case to verify if certain fio load will hang the filesystem

Commit Message

Qu Wenruo Oct. 28, 2024, 5:41 a.m. UTC
[BUG]
While developing the changes that make btrfs pass generic/563 (which
requires btrfs to support partial uptodate folios), generic/095 started
to hang during tests.

The call trace for the hanging process looks like this:

  __switch_to+0xf8/0x168
  __schedule+0x328/0x8a8
  schedule+0x54/0x140
  io_schedule+0x44/0x68
  folio_wait_bit_common+0x198/0x3f8
  __folio_lock+0x24/0x40
  extent_write_cache_pages+0x2e0/0x4c0 [btrfs]
  btrfs_writepages+0x94/0x158 [btrfs]
  do_writepages+0x74/0x190
  filemap_fdatawrite_wbc+0x88/0xc8
  __filemap_fdatawrite_range+0x6c/0xa8
  filemap_fdatawrite_range+0x1c/0x30
  btrfs_start_ordered_extent+0x264/0x2e0 [btrfs]
  btrfs_lock_and_flush_ordered_range+0x8c/0x160 [btrfs]
  __get_extent_map+0xa0/0x220 [btrfs]
  btrfs_do_readpage+0x1bc/0x5d8 [btrfs]
  btrfs_read_folio+0x50/0xa0 [btrfs]
  filemap_read_folio+0x54/0x110
  filemap_update_page+0x2e0/0x3b8
  filemap_get_pages+0x228/0x4d8
  filemap_read+0x11c/0x3b8
  btrfs_file_read_iter+0x74/0x90 [btrfs]
  new_sync_read+0xd0/0x1d0
  vfs_read+0x1a0/0x1f0

[CAUSE]
The root cause is a btrfs-specific behavior: during a folio read, we can
trigger writeback of the same folio, which then tries to lock the folio
that is already locked by the read process.
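
As a rough userspace analogy only (this is not btrfs code; every name
below is a hypothetical stand-in, and a directory created with mkdir plays
the role of the non-recursive folio lock), the pattern can be sketched
like this. A trylock is used so the sketch reports the conflict instead of
actually hanging:

```shell
#!/bin/bash
# Analogy only: mkdir is atomic and fails if the directory already exists,
# so it behaves like a non-recursive trylock on a single "folio".
lockdir="${TMPDIR:-/tmp}/folio-lock.$$"

folio_trylock() { mkdir "$lockdir" 2>/dev/null; }
folio_unlock() { rmdir "$lockdir"; }

# Hypothetical stand-in for the writeback path: it needs the folio lock.
writeback_folio() {
	if folio_trylock; then
		folio_unlock
		echo "writeback done"
	else
		echo "writeback would block"
	fi
}

# Hypothetical stand-in for the read path: it holds the folio lock while
# it triggers writeback of the same folio.
read_folio() {
	folio_trylock || return 1
	writeback_folio
	folio_unlock
}

result=$(read_folio)
echo "$result"
```

With a blocking lock in place of the trylock, the inner lock attempt would
wait forever on a lock held by its own caller, which matches the shape of
the call trace above.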

The fix is already sent to the mailing list:
https://lore.kernel.org/linux-btrfs/62bf73ada7be2888d45a787c2b6fd252103a5d25.1729725088.git.wqu@suse.com/

This problem can only happen if all the following conditions are met:

- The sector size of btrfs is smaller than the page size
  This is required to have partial uptodate folios.

- Btrfs won't read the full folio if the buffered write is block aligned
  This is done by the not yet merged patch:
  https://lore.kernel.org/linux-btrfs/ac2639ec4e9ac176d33e95ef7ecf008fa6be5461.1727833878.git.wqu@suse.com/
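
A minimal sketch of checking the first condition from a shell. The
function name is hypothetical and the 4096-byte sector size is only an
assumed example value; the second condition depends on the kernel patch
being applied and is not checked here:

```shell
#!/bin/bash
# Succeeds if a folio can be partially uptodate, i.e. if the filesystem
# sector size is smaller than the page size. Both arguments are in bytes.
# The function name is hypothetical.
can_have_partial_uptodate() {
	local sectorsize=$1 pagesize=$2
	[ "$sectorsize" -lt "$pagesize" ]
}

pagesize=$(getconf PAGE_SIZE)
sectorsize=4096   # assumed example value, not read from a real filesystem

if can_have_partial_uptodate "$sectorsize" "$pagesize"; then
	echo "sector size < page size: the first condition is met"
else
	echo "sector size >= page size: the first condition is not met"
fi
```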

[TEST CASE]
During the debugging of that generic/095 hang, I extracted a minimal
reproducer which is much smaller and faster, although it still requires
several runs to trigger a hang.

The test case will run the fio workload 32 times by default, which is
more than enough to trigger the hang.
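
The run count can be sketched as below. The 32 * LOAD_FACTOR expression is
taken from the test itself; run_once is a hypothetical placeholder for one
mkfs + mount + fio + unmount cycle, and defaulting LOAD_FACTOR to 1 is an
assumption made so the snippet runs outside fstests:

```shell
#!/bin/bash
# Assumption: default LOAD_FACTOR to 1 when it is not set by the caller.
LOAD_FACTOR=${LOAD_FACTOR:-1}
iterations=$((32 * LOAD_FACTOR))

# Hypothetical placeholder for one mkfs + mount + fio + unmount cycle.
run_once() { :; }

completed=0
for (( i = 0; i < iterations; i++ )); do
	run_once || break
	completed=$((completed + 1))
done
echo "completed $completed/$iterations runs"
```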

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 tests/generic/366     | 80 +++++++++++++++++++++++++++++++++++++++++++
 tests/generic/366.out |  2 ++
 2 files changed, 82 insertions(+)
 create mode 100755 tests/generic/366
 create mode 100644 tests/generic/366.out

Comments

Anand Jain Oct. 28, 2024, 9:35 a.m. UTC | #1
On 28/10/24 13:41, Qu Wenruo wrote:
> [BUG]
> During the development to make btrfs pass generic/563 (which needs to
> make btrfs to support partial updaote folios), generic/095 causes hangs

  s/updaote/uptodate/

> during tests.
> 
> The call trace for the hanging process looks like this:
> 
>    __switch_to+0xf8/0x168
>    __schedule+0x328/0x8a8
>    schedule+0x54/0x140
>    io_schedule+0x44/0x68
>    folio_wait_bit_common+0x198/0x3f8
>    __folio_lock+0x24/0x40
>    extent_write_cache_pages+0x2e0/0x4c0 [btrfs]
>    btrfs_writepages+0x94/0x158 [btrfs]
>    do_writepages+0x74/0x190
>    filemap_fdatawrite_wbc+0x88/0xc8
>    __filemap_fdatawrite_range+0x6c/0xa8
>    filemap_fdatawrite_range+0x1c/0x30
>    btrfs_start_ordered_extent+0x264/0x2e0 [btrfs]
>    btrfs_lock_and_flush_ordered_range+0x8c/0x160 [btrfs]
>    __get_extent_map+0xa0/0x220 [btrfs]
>    btrfs_do_readpage+0x1bc/0x5d8 [btrfs]
>    btrfs_read_folio+0x50/0xa0 [btrfs]
>    filemap_read_folio+0x54/0x110
>    filemap_update_page+0x2e0/0x3b8
>    filemap_get_pages+0x228/0x4d8
>    filemap_read+0x11c/0x3b8
>    btrfs_file_read_iter+0x74/0x90 [btrfs]
>    new_sync_read+0xd0/0x1d0
>    vfs_read+0x1a0/0x1f0
> 
> [CAUSE]
> The root cause is a btrfs specific behavior that during a folio read, we
> can trigger writeback of the same folio, which will try to lock the same
> folio already locked by the read process.
> 
> The fix is already sent to the mailing list:
> https://lore.kernel.org/linux-btrfs/62bf73ada7be2888d45a787c2b6fd252103a5d25.1729725088.git.wqu@suse.com/
> 
> This problem can only happen if all the following conditions are met:
> 
> - The sector size of btrfs is smaller than page size
>    To have partial uptodate folios.
> 
> - Btrfs won't read the full folio if buffered write is block aligned
>    This is done by the not yet merged patch:
>    https://lore.kernel.org/linux-btrfs/ac2639ec4e9ac176d33e95ef7ecf008fa6be5461.1727833878.git.wqu@suse.com/
>
> [TEST CASE]
> During the debugging of that generic/095 hang, I extracted a minimal
> reproducer which is much smaller and faster, although it still requires
> several runs to trigger a hang.
> 
> The test case will run the fio workload 32 times by default, which is
> more than enough to trigger the hang.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   tests/generic/366     | 80 +++++++++++++++++++++++++++++++++++++++++++
>   tests/generic/366.out |  2 ++
>   2 files changed, 82 insertions(+)
>   create mode 100755 tests/generic/366
>   create mode 100644 tests/generic/366.out
> 
> diff --git a/tests/generic/366 b/tests/generic/366
> new file mode 100755
> index 00000000..2ebc5728
> --- /dev/null
> +++ b/tests/generic/366
> @@ -0,0 +1,80 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 366
> +#
> +# Test if certain fio load will hang the filesystem.
> +#

It is a good idea to also specify the type of workload being used
with fio, for example, as in generic/095
----
# Concurrent mixed I/O (buffer I/O, aiodio, mmap, splice) on the same files
----

> +# This is exposed by an incoming btrfs feature, which allows a folio to be
> +# partial uptodate if the buffered write range is block aligned but not yet
> +# full folio aligned.
> +#
> +# Such behavior makes btrfs to hang reliably under generic/095.
> +# This is the extracted minimal reproducer for 4k block size and 64K page size.
> +#




> +. ./common/preamble
> +_begin_fstest auto quick
> +
> +. ./common/filter
> +
> +_require_scratch
> +_require_odirect
> +_require_aio
> +

> +_fixed_by_kernel_commit fa630df665aa \
> +_fixed_by_kernel_commit xxxxxxxxxxxx \
> +	"btrfs: avoid deadlock when reading a partial uptodate folio"

An extra _fixed_by_kernel_commit.
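
A sketch of what the single call might look like once the extra line is
dropped. Which of the two commit IDs should remain is not stated here, so
keeping the xxxxxxxxxxxx placeholder is only an assumption. The stub
function stands in for the fstests helper so that the snippet runs on its
own:

```shell
#!/bin/bash
# Stub standing in for the fstests helper, only so this snippet is
# runnable outside fstests. It just reports what it was called with.
_fixed_by_kernel_commit() {
	echo "commit=$1 subject=$2 nargs=$#"
}

# One call with two arguments: the commit ID and the subject line.
# xxxxxxxxxxxx is the patch's own placeholder, kept as-is (assumption).
out=$(_fixed_by_kernel_commit xxxxxxxxxxxx \
	"btrfs: avoid deadlock when reading a partial uptodate folio")
echo "$out"
```

With the duplicated line left in, the backslash continuation would turn
the second helper name and its commit ID into extra arguments of the first
call.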

> +
> +iterations=$((32 * LOAD_FACTOR))
> +
> +fio_config=$tmp.fio
> +fio_out=$tmp.fio.out
> +blksz=`$here/src/min_dio_alignment $SCRATCH_MNT $SCRATCH_DEV`


> +cat >$fio_config <<EOF
> +[global]
> +bs=8k
> +iodepth=1
> +randrepeat=1
> +size=256k
> +directory=$SCRATCH_MNT
> +numjobs=1
> +[job1]
> +ioengine=sync
> +bs=512
> +direct=1
> +rw=randread
> +filename=file1
> +[job2]
> +ioengine=libaio
> +rw=randwrite
> +direct=1
> +filename=file1
> +[job3]
> +ioengine=posixaio
> +rw=randwrite
> +filename=file1
> +EOF
> +_require_fio $fio_config
> +

  So this fio config drops the splice workload and the extra sync and
  posixaio workloads from generic/095, and iterates the workload.

> +for (( i = 0; i < $iterations; i++)); do
> +	_scratch_mkfs >>$seqres.full 2>&1
> +	_scratch_mount


> +	# There's a known EIO failure to report collisions between directio and buffered
> +	# writes to userspace, refer to upstream linux 5a9d929d6e13. So ignore EIO error
> +	# at here.

  This can be updated with a short snippet of how the workload
  helped reproduce the bug.


> +	$FIO_PROG $fio_config --ignore_error=,EIO --output=$fio_out
> +	# umount before checking dmesg in case umount triggers any WARNING or Oops
> +	_scratch_unmount
> +
> +	_check_dmesg _filter_aiodio_dmesg
> +
> +	echo "=== fio $i/$iterations ===" >> $seqres.full
> +	cat $fio_out >> $seqres.full
> +done
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit

Looks good.

Thanks.


> diff --git a/tests/generic/366.out b/tests/generic/366.out
> new file mode 100644
> index 00000000..1fe90e06
> --- /dev/null
> +++ b/tests/generic/366.out
> @@ -0,0 +1,2 @@
> +QA output created by 366
> +Silence is golden

Patch

diff --git a/tests/generic/366 b/tests/generic/366
new file mode 100755
index 00000000..2ebc5728
--- /dev/null
+++ b/tests/generic/366
@@ -0,0 +1,80 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 366
+#
+# Test if certain fio load will hang the filesystem.
+#
+# This is exposed by an incoming btrfs feature, which allows a folio to be
+# partial uptodate if the buffered write range is block aligned but not yet
+# full folio aligned.
+#
+# Such behavior makes btrfs to hang reliably under generic/095.
+# This is the extracted minimal reproducer for 4k block size and 64K page size.
+#
+. ./common/preamble
+_begin_fstest auto quick
+
+. ./common/filter
+
+_require_scratch
+_require_odirect
+_require_aio
+
+_fixed_by_kernel_commit fa630df665aa \
+_fixed_by_kernel_commit xxxxxxxxxxxx \
+	"btrfs: avoid deadlock when reading a partial uptodate folio"
+
+iterations=$((32 * LOAD_FACTOR))
+
+fio_config=$tmp.fio
+fio_out=$tmp.fio.out
+blksz=`$here/src/min_dio_alignment $SCRATCH_MNT $SCRATCH_DEV`
+cat >$fio_config <<EOF
+[global]
+bs=8k
+iodepth=1
+randrepeat=1
+size=256k
+directory=$SCRATCH_MNT
+numjobs=1
+[job1]
+ioengine=sync
+bs=512
+direct=1
+rw=randread
+filename=file1
+[job2]
+ioengine=libaio
+rw=randwrite
+direct=1
+filename=file1
+[job3]
+ioengine=posixaio
+rw=randwrite
+filename=file1
+EOF
+_require_fio $fio_config
+
+for (( i = 0; i < $iterations; i++)); do
+	_scratch_mkfs >>$seqres.full 2>&1
+	_scratch_mount
+	# There's a known EIO failure to report collisions between directio and buffered
+	# writes to userspace, refer to upstream linux 5a9d929d6e13. So ignore EIO error
+	# at here.
+	$FIO_PROG $fio_config --ignore_error=,EIO --output=$fio_out
+	# umount before checking dmesg in case umount triggers any WARNING or Oops
+	_scratch_unmount
+
+	_check_dmesg _filter_aiodio_dmesg
+
+	echo "=== fio $i/$iterations ===" >> $seqres.full
+	cat $fio_out >> $seqres.full
+done
+
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/366.out b/tests/generic/366.out
new file mode 100644
index 00000000..1fe90e06
--- /dev/null
+++ b/tests/generic/366.out
@@ -0,0 +1,2 @@ 
+QA output created by 366
+Silence is golden