diff mbox series

[RFC] fstests: add fsstress + writeback + debugfs folio split test

Message ID 20240424224649.1494092-1-mcgrof@kernel.org (mailing list archive)
State Accepted, archived
Headers show
Series [RFC] fstests: add fsstress + writeback + debugfs folio split test | expand

Commit Message

Luis Chamberlain April 24, 2024, 10:46 p.m. UTC
Stress test folio splits by using the debugfs interface to a target
a new smaller folio order. This is dangerous at the moment as its using
a debugfs API which requires two out of tree fixes [0] [1] which will
be posted shortly. With these patches applied no crash is possible yet.
However, this test was designed to try to exacerbate races with folio
splits and writeback, at least running generic/447 twice ends up
producing a crash only if large folio splits with minimum folio order
is enabled.

With the known fixes for the debugfs interface, this test produces no
crashes even after 3 hour soaking for 4k and LBS. We should enhance
this test a bit more so to reproduce the issues observed by running
generic/447 twice.

This also begs the question if something like MADV_NOHUGEPAGE might be
desirable from userspace, so to enable userspace to request splits when
possible.

If inspecting more closely, you'll want to enable on your kernel boot:

	dyndbg='file mm/huge_memory.c +p'

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/commit/?h=20240424-lbs&id=80f6df5037fd0ad560526af45bd7f4d779fe03f6
[1] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/commit/?h=20240424-lbs&id=38f6fac5b4283ea48b1876fc56728f062168f8c3
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---

For those that want to help stress test folio splits to an order, hopefully
this will help start to enable this. Perhaps there are better ways to create
more easy targets to stress test folio splits, and in particular try to
reproduce the issue which so far is only possible by running generic/447 twice
on LBS. The issue with generic/447 on LBS is not observed on 4k, and this test
produces no crashes on LBS...

 common/rc             | 20 +++++++++
 tests/generic/745     | 97 +++++++++++++++++++++++++++++++++++++++++++
 tests/generic/745.out |  2 +
 3 files changed, 119 insertions(+)
 create mode 100755 tests/generic/745
 create mode 100644 tests/generic/745.out

Comments

Zorro Lang April 25, 2024, 5:31 a.m. UTC | #1
On Wed, Apr 24, 2024 at 03:46:48PM -0700, Luis Chamberlain wrote:
> Stress test folio splits by using the debugfs interface to a target
> a new smaller folio order. This is dangerous at the moment as its using
> a debugfs API which requires two out of tree fixes [0] [1] which will
> be posted shortly. With these patches applied no crash is possible yet.
> However, this test was designed to try to exacerbate races with folio
> splits and writeback, at least running generic/447 twice ends up
> producing a crash only if large folio splits with minimum folio order
> is enabled.
> 
> With the known fixes for the debugfs interface, this test produces no
> crashes even after 3 hour soaking for 4k and LBS. We should enhance
> this test a bit more so to reproduce the issues observed by running
> generic/447 twice.
> 
> This also begs the question if something like MADV_NOHUGEPAGE might be
> desirable from userspace, so to enable userspace to request splits when
> possible.
> 
> If inspecting more closely, you'll want to enable on your kernel boot:
> 
> 	dyndbg='file mm/huge_memory.c +p'
> 
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/commit/?h=20240424-lbs&id=80f6df5037fd0ad560526af45bd7f4d779fe03f6
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/commit/?h=20240424-lbs&id=38f6fac5b4283ea48b1876fc56728f062168f8c3
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
> 
> For those that want to help stress test folio splits to an order, hopefully
> this will help start to enable this. Perhaps there are better ways to create
> more easy targets to stress test folio splits, and in particular try to
> reproduce the issue which so far is only possible by running generic/447 twice
> on LBS. The issue with generic/447 on LBS is not observed on 4k, and this test
> produces no crashes on LBS...
> 
>  common/rc             | 20 +++++++++
>  tests/generic/745     | 97 +++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/745.out |  2 +
>  3 files changed, 119 insertions(+)
>  create mode 100755 tests/generic/745
>  create mode 100644 tests/generic/745.out
> 
> diff --git a/common/rc b/common/rc
> index d4432f5ce259..1eefb53aa84b 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -127,6 +127,26 @@ _require_compaction()
>  	    _notrun "Need compaction enabled CONFIG_COMPACTION=y"
>  	fi
>  }
> +
> +# Requires CONFIG_DEBUGFS and truncation knobs
> +SPLIT_DEBUGFS="/sys/kernel/debug/split_huge_pages"
> +_require_split_debugfs()
> +{
> +       if [ ! -f $SPLIT_DEBUGFS ]; then
> +           _notrun "Needs CONFIG_DEBUGFS and split_huge_pages"
> +       fi
> +}

The global SPLIT_DEBUGFS isn't necessary, you can just use:

  $DEBUGFS_MNT/split_huge_pages

And this remind me we have a _require_debugfs helper in common/rc, but
it's not used by any one test case. And it looks like not correct:

  _require_debugfs()
  {
      #boot_params always present in debugfs
      [ -d "$DEBUGFS_MNT/boot_params" ] || _notrun "Debugfs not mounted"
  }

I can't find "boot_params" in /sys/kernel/debug/. So we might need to
fix this helper, and then call it in debugfs related test cases.

I'll send another patch to talk about that fix. For this case, it needs
_require_debugfs and $DEBUGFS_MNT.

> +
> +_split_huge_pages_file_full()

May we have a comment to explain what this common helper for? Thanks.

> +{
> +	local file=$1
> +	local offset="0x0"
> +	local len=$(printf "%x" $(stat --format='%s' $file))
> +	local order="0"
> +	local split_cmd="$file,$offset,0x${len},$order"
> +	echo $split_cmd > $SPLIT_DEBUGFS
                           ^^^^^^^^^^^^^
$DEBUGFS_MNT/split_huge_pages is good enough.

> +}
> +
>  # Get hugepagesize in bytes
>  _get_hugepagesize()
>  {
> diff --git a/tests/generic/745 b/tests/generic/745
> new file mode 100755
> index 000000000000..0a30dbee35bd
> --- /dev/null
> +++ b/tests/generic/745
> @@ -0,0 +1,97 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2024 Luis Chamberlain. All Rights Reserved.
> +#
> +# FS QA Test No. 734
> +#
> +# Run fsstress in a loop, and in the background force some writeback and
> +# folio splits for every file. If you're enabling this and want to check
> +# underneath the hood you may want to enable:
> +#
> +# dyndbg='file mm/huge_memory.c +p'
> +. ./common/preamble
> +_begin_fstest long_rw stress soak smoketest dangerous_fuzzers

Just double check, is it a dangerous_fuzzers test, and not good to be in auto group?

> +
> +# Override the default cleanup function.
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +}
> +
> +# Import common functions.
> +. ./common/filter

I didn't see any filters called in this case, so don't need to
import it.

> +
> +# real QA test starts here
> +_supported_fs generic
> +_require_test

I didn't see TEST_DIR or TEST_DEV in this test case, so don't need
_require_test

> +_require_scratch
> +_require_split_debugfs
> +_require_command "$KILLALL_PROG" "killall"
> +
> +echo "Silence is golden"
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount >> $seqres.full 2>&1
> +
> +nr_cpus=$((LOAD_FACTOR * 4))
> +nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
> +
> +fsstress_args=(-w -d $SCRATCH_MNT/test -n $nr_ops -p $nr_cpus)
> +
> +# used to let our loops know when to stop
> +runfile="$tmp.keep.running.loop"
> +touch $runfile
> +
> +# The background ops are out of bounds, the goal is to race with fsstress.
> +
> +# Force folio split if possible, this seems to be screaming for MADV_NOHUGEPAGE
> +# for large folios.
> +while [ -e $runfile ]; do
> +	for i in $(find $SCRATCH_MNT/test \( -type f \) 2>/dev/null); do

May I ask why the "\( .. \)" is needed?

> +		_split_huge_pages_file_full $i >/dev/null 2>&1

Just make sure, don't you want to output to $seqres.full file to get more information
for debug, if something wrong.

> +	done
> +	sleep 2
> +done &
> +split_huge_pages_files_pid=$!

This split_huge_pages_files_pid isn't be used anywhere, you can deal with it in
_cleanup.

> +
> +blocksize=$(_get_file_block_size $SCRATCH_MNT)
> +export XFS_DIO_MIN=$((blocksize * 2))

Oh, I even forgot we have this parameter in fsstress.c :)

> +
> +test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
> +
> +# Our general goal here is to race with ops which can stress folio addition,
> +# removal, edits, and writeback.
> +
> +# zero frequencies for write ops to minimize writeback
> +fsstress_args+=(-R)

But you set "fsstress_args=(-w -d $SCRATCH_MNT/test -n $nr_ops -p $nr_cpus)" above,
so you zeros frequencies of non-write operations (-w) and then zeros frequencies of
write operations (-R). Do you want to use "-z" directly, to zeros frequencies of
all operations ?

> +
> +# XXX: we can improve this, so to increase the chances to allow more
> +# folio splits. Also running generic/447 twice triggers a corner case we can't
> +# capture here on folio splits and write_cache_pages, increasing the chances of
> +# this test to cover that same corner case would be ideal.
> +#
> +# https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df
> +fsstress_args+=(-f creat=1)
> +fsstress_args+=(-f write=1)
> +fsstress_args+=(-f dwrite=1)
> +fsstress_args+=(-f truncate=1)
> +fsstress_args+=(-f zero=1)
> +fsstress_args+=(-f unlink=1)
> +fsstress_args+=(-f fsync=1)
> +fsstress_args+=(-f punch=2)
> +fsstress_args+=(-f copyrange=4)
> +fsstress_args+=(-f clonerange=4)
> +
> +if [[ "$FSTYP" != "xfs" || "$FSTYP" == "ext4" ]]; then
> +	fsstress_args+=(-f collapse=1)

Can you explain why the FALLOC_FL_COLLAPSE_RANGE is special, and not suitable
for xfs?

And I think the range of $FSTYP" != "xfs" contains "$FSTYP" == "ext4", so
this logic makes me confused.

> +fi
> +
> +$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
> +
> +rm -f $runfile
> +wait > /dev/null 2>&1

Better to move this to _cleanup. e.g.

rm -f $runfile
[ -n "$split_huge_pages_files_pid" ] && kill $split_huge_pages_files_pid
$KILLALL_PROG -9 fsstress
wait


Thanks,
Zorro
> +
> +status=0
> +exit
> diff --git a/tests/generic/745.out b/tests/generic/745.out
> new file mode 100644
> index 000000000000..fce6b7f5489d
> --- /dev/null
> +++ b/tests/generic/745.out
> @@ -0,0 +1,2 @@
> +QA output created by 745
> +Silence is golden
> -- 
> 2.43.0
> 
>
Luis Chamberlain April 29, 2024, 4:40 a.m. UTC | #2
On Thu, Apr 25, 2024 at 01:31:50PM +0800, Zorro Lang wrote:
> On Wed, Apr 24, 2024 at 03:46:48PM -0700, Luis Chamberlain wrote:
> > This is dangerous at the moment as its using
> > a debugfs API which requires two out of tree fixes [0] [1] which will
> > be posted shortly.
> I can't find "boot_params" in /sys/kernel/debug/. So we might need to
> fix this helper, and then call it in debugfs related test cases.
> 
> I'll send another patch to talk about that fix. For this case, it needs
> _require_debugfs and $DEBUGFS_MNT.

Sure I'll wait until you send patches to fix this.

> > +_begin_fstest long_rw stress soak smoketest dangerous_fuzzers
> 
> Just double check, is it a dangerous_fuzzers test, and not good to be in auto group?

Yes for the explanation quoted above from the commit log, today the
existing code requires fixes which I just posted. The sscanf bug is also
hard to reproduce but I manged to reproduce it so this simple fix is
still being discussed. So for now only those in the know of what they
are doing should run this test.

> > +blocksize=$(_get_file_block_size $SCRATCH_MNT)
> > +export XFS_DIO_MIN=$((blocksize * 2))
> 
> Oh, I even forgot we have this parameter in fsstress.c :)

Yes, but its not well documented for good reason too, its only for DIO.

> > +test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
> > +
> > +# Our general goal here is to race with ops which can stress folio addition,
> > +# removal, edits, and writeback.
> > +
> > +# zero frequencies for write ops to minimize writeback
> > +fsstress_args+=(-R)
> 
> But you set "fsstress_args=(-w -d $SCRATCH_MNT/test -n $nr_ops -p $nr_cpus)" above,
> so you zeros frequencies of non-write operations (-w) and then zeros frequencies of
> write operations (-R). Do you want to use "-z" directly, to zeros frequencies of
> all operations ?

You know, it took a whileto figure out the right values here but I've
done more experimentation and we can just go with the defaults, what we
also needed is to add compaction so to help reproduce the real crash I
was trying to aim for. Now its cooked up.

For now I'll just post a v2 RFC and leave the cleanup stuff you've
requested for a v3. Right now this is just useful for folks who want
to help debug truncation + writeback + compaction races we are
uncovering with min order.

  Luis
diff mbox series

Patch

diff --git a/common/rc b/common/rc
index d4432f5ce259..1eefb53aa84b 100644
--- a/common/rc
+++ b/common/rc
@@ -127,6 +127,26 @@  _require_compaction()
 	    _notrun "Need compaction enabled CONFIG_COMPACTION=y"
 	fi
 }
+
+# Requires CONFIG_DEBUGFS and truncation knobs
+SPLIT_DEBUGFS="/sys/kernel/debug/split_huge_pages"
+_require_split_debugfs()
+{
+       if [ ! -f $SPLIT_DEBUGFS ]; then
+           _notrun "Needs CONFIG_DEBUGFS and split_huge_pages"
+       fi
+}
+
+_split_huge_pages_file_full()
+{
+	local file=$1
+	local offset="0x0"
+	local len=$(printf "%x" $(stat --format='%s' $file))
+	local order="0"
+	local split_cmd="$file,$offset,0x${len},$order"
+	echo $split_cmd > $SPLIT_DEBUGFS
+}
+
 # Get hugepagesize in bytes
 _get_hugepagesize()
 {
diff --git a/tests/generic/745 b/tests/generic/745
new file mode 100755
index 000000000000..0a30dbee35bd
--- /dev/null
+++ b/tests/generic/745
@@ -0,0 +1,97 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2024 Luis Chamberlain. All Rights Reserved.
+#
+# FS QA Test No. 734
+#
+# Run fsstress in a loop, and in the background force some writeback and
+# folio splits for every file. If you're enabling this and want to check
+# underneath the hood you may want to enable:
+#
+# dyndbg='file mm/huge_memory.c +p'
+. ./common/preamble
+_begin_fstest long_rw stress soak smoketest dangerous_fuzzers
+
+# Override the default cleanup function.
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
+}
+
+# Import common functions.
+. ./common/filter
+
+# real QA test starts here
+_supported_fs generic
+_require_test
+_require_scratch
+_require_split_debugfs
+_require_command "$KILLALL_PROG" "killall"
+
+echo "Silence is golden"
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+nr_cpus=$((LOAD_FACTOR * 4))
+nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
+
+fsstress_args=(-w -d $SCRATCH_MNT/test -n $nr_ops -p $nr_cpus)
+
+# used to let our loops know when to stop
+runfile="$tmp.keep.running.loop"
+touch $runfile
+
+# The background ops are out of bounds, the goal is to race with fsstress.
+
+# Force folio split if possible, this seems to be screaming for MADV_NOHUGEPAGE
+# for large folios.
+while [ -e $runfile ]; do
+	for i in $(find $SCRATCH_MNT/test \( -type f \) 2>/dev/null); do
+		_split_huge_pages_file_full $i >/dev/null 2>&1
+	done
+	sleep 2
+done &
+split_huge_pages_files_pid=$!
+
+blocksize=$(_get_file_block_size $SCRATCH_MNT)
+export XFS_DIO_MIN=$((blocksize * 2))
+
+test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
+
+# Our general goal here is to race with ops which can stress folio addition,
+# removal, edits, and writeback.
+
+# zero frequencies for write ops to minimize writeback
+fsstress_args+=(-R)
+
+# XXX: we can improve this, so to increase the chances to allow more
+# folio splits. Also running generic/447 twice triggers a corner case we can't
+# capture here on folio splits and write_cache_pages, increasing the chances of
+# this test to cover that same corner case would be ideal.
+#
+# https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df
+fsstress_args+=(-f creat=1)
+fsstress_args+=(-f write=1)
+fsstress_args+=(-f dwrite=1)
+fsstress_args+=(-f truncate=1)
+fsstress_args+=(-f zero=1)
+fsstress_args+=(-f unlink=1)
+fsstress_args+=(-f fsync=1)
+fsstress_args+=(-f punch=2)
+fsstress_args+=(-f copyrange=4)
+fsstress_args+=(-f clonerange=4)
+
+if [[ "$FSTYP" != "xfs" || "$FSTYP" == "ext4" ]]; then
+	fsstress_args+=(-f collapse=1)
+fi
+
+$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
+
+rm -f $runfile
+wait > /dev/null 2>&1
+
+status=0
+exit
diff --git a/tests/generic/745.out b/tests/generic/745.out
new file mode 100644
index 000000000000..fce6b7f5489d
--- /dev/null
+++ b/tests/generic/745.out
@@ -0,0 +1,2 @@ 
+QA output created by 745
+Silence is golden