Message ID | 98d2055515cd765b0835a7f751a21cbb72c03621.1671059406.git.boris@bur.io (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: new test for logical inode resolution panic | expand |
Hi Boris, Thanks for providing this reproducer... On Wed, 14 Dec 2022 15:12:01 -0800, Boris Burkov wrote: > If we create a file that has an inline extent followed by a prealloc > extent, then attempt to use the logical to inode ioctl on the prealloc > extent, but in the overwritten range, backref resolution will process > the inline extent. Depending on the leaf eb layout, this can panic. > Add a new test for this condition. In the long run, we can add spew when > we read out-of-bounds fields of inline extent items and simplify this > test to look for dmesg warnings rather than trying to force a fairly > fragile panic (dependent on non-standardized details of leaf layout). > > The test causes a kernel panic unless: > btrfs: fix logical_ino ioctl panic > is applied to the kernel. This could be included as a test hint via _fixed_by_kernel_commit(), but given that failure is a panic, it probably doesn't matter... > Signed-off-by: Boris Burkov <boris@bur.io> > --- > tests/btrfs/279 | 95 +++++++++++++++++++++++++++++++++++++++++++++ > tests/btrfs/279.out | 2 + Looks like a rebase is needed - btrfs/279 is already taken. > 2 files changed, 97 insertions(+) > create mode 100755 tests/btrfs/279 > create mode 100644 tests/btrfs/279.out > > diff --git a/tests/btrfs/279 b/tests/btrfs/279 > new file mode 100755 > index 00000000..ef77f84b > --- /dev/null > +++ b/tests/btrfs/279 > @@ -0,0 +1,95 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (c) 2022 Meta Platforms, Inc. All Rights Reserved. > +# > +# FS QA Test 279 > +# > +# Given a file with extents: > +# [0 : 4096) (inline) > +# [4096 : N] (prealloc) > +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the > +# non-inline extent, it results in reading the offset field of the inline > +# extent, which is meaningless (it is full of user data..). If we are > +# particularly lucky, it can be past the end of the extent buffer, resulting in > +# a crash. > +# > +. ./common/preamble > +_begin_fstest auto quick > + > +# Import common functions. > +. ./common/filter > +. ./common/dmlogwrites > + > +# real QA test starts here > + > +# Modify as appropriate. > +_supported_fs btrfs > +_require_scratch > +_require_xfs_io_command "falloc" > +_require_xfs_io_command "fsync" > +_require_xfs_io_command "pwrite" > + > +dump_tree() { > + $BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV > +} > + > +get_extent_data() { > + local ino=$1 > + dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA " > +} > + > +get_prealloc_offset() { > + local ino=$1 > + get_extent_data $ino | grep "disk byte" | awk '{print $5}' > +} > + > +# This test needs to create conditions s.t. the special inode's inline extent > +# is the first item in a leaf. Therefore, fix a leaf size and add > +# items that are otherwise not necessary to reproduce the inline-prealloc > +# condition to get to such a state. > +# > +# Roughly, the idea for getting the right item fill is to: > +# 1. create an extra file with a variable sized inline extent item > +# 2. create our evil file that will cause the panic > +# 3. create a whole bunch of files to create a bunch of dir/index items > +# 4. size the variable extent item s.t. the evil extent item is item 0 in the > +# next leaf > +# > +# We do it in this somewhat convoluted way because the dir and index items all > +# come before any inode, inode_ref, or extent_data items. So we can predictably > +# create a bunch of them, then sneak in a funny sized extent to round out the > +# difference. > + > +_scratch_mkfs "--nodesize 16k" >/dev/null > +_scratch_mount > + > +f=$SCRATCH_MNT/f > + > +# the variable extra "leaf padding" file > +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad > + > +# the evil file with an inline extent followed by a prealloc extent > +# created by falloc with keep-size, then two non-truncating writes to the front > +touch $f.evil > +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil > +$XFS_IO_PROG -fc fsync $f.evil > +ino=$(stat -c '%i' $f.evil) > +logical=$(get_prealloc_offset $ino) > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil > +$XFS_IO_PROG -fc fsync $f.evil > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil > +$XFS_IO_PROG -fc fsync $f.evil > +sync > + > +# a bunch of inodes to stuff dir items in front of the extent items > +for i in $(seq 122); do nit: a c-style bash loop would avoid the extra seq process > + $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i > +done > +sync > + > +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch > +_scratch_unmount > + > +echo "Silence is golden" > +status=0 > +exit > diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out > new file mode 100644 > index 00000000..c5906093 > --- /dev/null > +++ b/tests/btrfs/279.out > @@ -0,0 +1,2 @@ > +QA output created by 279 > +Silence is golden With the rebase to avoid tests/btrfs/279 collision, this looks good: Reviewed-by: David Disseldorp <ddiss@suse.de>
On Wed, Dec 14, 2022 at 03:12:01PM -0800, Boris Burkov wrote: > If we create a file that has an inline extent followed by a prealloc > extent, then attempt to use the logical to inode ioctl on the prealloc > extent, but in the overwritten range, backref resolution will process > the inline extent. Depending on the leaf eb layout, this can panic. > Add a new test for this condition. In the long run, we can add spew when > we read out-of-bounds fields of inline extent items and simplify this > test to look for dmesg warnings rather than trying to force a fairly > fragile panic (dependent on non-standardized details of leaf layout). > > The test causes a kernel panic unless: > btrfs: fix logical_ino ioctl panic Hi, Thanks for this new case. Is this patch merged, does it has a commit id? BTW, please mark this known issue by _fixed_by_kernel_commit in case source code. > is applied to the kernel. > > Signed-off-by: Boris Burkov <boris@bur.io> > --- > tests/btrfs/279 | 95 +++++++++++++++++++++++++++++++++++++++++++++ > tests/btrfs/279.out | 2 + > 2 files changed, 97 insertions(+) > create mode 100755 tests/btrfs/279 > create mode 100644 tests/btrfs/279.out > > diff --git a/tests/btrfs/279 b/tests/btrfs/279 > new file mode 100755 > index 00000000..ef77f84b > --- /dev/null > +++ b/tests/btrfs/279 btrfs/279 was just token last weekend, please help to rebase to the latest fstests for-next branch, or change the 279 to a big enough number which won't cause conflict (I'll give it a proper number when merge it). > @@ -0,0 +1,95 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (c) 2022 Meta Platforms, Inc. All Rights Reserved. > +# > +# FS QA Test 279 > +# > +# Given a file with extents: > +# [0 : 4096) (inline) > +# [4096 : N] (prealloc) > +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the > +# non-inline extent, it results in reading the offset field of the inline > +# extent, which is meaningless (it is full of user data..). If we are > +# particularly lucky, it can be past the end of the extent buffer, resulting in > +# a crash. > +# > +. ./common/preamble > +_begin_fstest auto quick > + > +# Import common functions. > +. ./common/filter > +. ./common/dmlogwrites Does this case really use helpers in above two included files? > + > +# real QA test starts here > + > +# Modify as appropriate. > +_supported_fs btrfs > +_require_scratch > +_require_xfs_io_command "falloc" Add this case to prealloc or preallocrw group > +_require_xfs_io_command "fsync" > +_require_xfs_io_command "pwrite" > + > +dump_tree() { > + $BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV _require_btrfs_command inspect-internal dump-tree > +} > + > +get_extent_data() { > + local ino=$1 > + dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA " > +} > + > +get_prealloc_offset() { > + local ino=$1 > + get_extent_data $ino | grep "disk byte" | awk '{print $5}' > +} > + > +# This test needs to create conditions s.t. the special inode's inline extent > +# is the first item in a leaf. Therefore, fix a leaf size and add > +# items that are otherwise not necessary to reproduce the inline-prealloc > +# condition to get to such a state. > +# > +# Roughly, the idea for getting the right item fill is to: > +# 1. create an extra file with a variable sized inline extent item > +# 2. create our evil file that will cause the panic > +# 3. create a whole bunch of files to create a bunch of dir/index items > +# 4. size the variable extent item s.t. the evil extent item is item 0 in the > +# next leaf > +# > +# We do it in this somewhat convoluted way because the dir and index items all > +# come before any inode, inode_ref, or extent_data items. So we can predictably > +# create a bunch of them, then sneak in a funny sized extent to round out the > +# difference. > + > +_scratch_mkfs "--nodesize 16k" >/dev/null We recommend calling _fail at here if _scratch_mkfs fails with a *specified* option. _scratch_mkfs "--nodesize 16k" >>$seqres.full || _fail "mkfs failed" > +_scratch_mount > + > +f=$SCRATCH_MNT/f > + > +# the variable extra "leaf padding" file > +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad > + > +# the evil file with an inline extent followed by a prealloc extent > +# created by falloc with keep-size, then two non-truncating writes to the front > +touch $f.evil Not sure if you reall use this touch command as you've used "-f" option for xfs_io. > +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil > +$XFS_IO_PROG -fc fsync $f.evil > +ino=$(stat -c '%i' $f.evil) > +logical=$(get_prealloc_offset $ino) > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil > +$XFS_IO_PROG -fc fsync $f.evil > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil > +$XFS_IO_PROG -fc fsync $f.evil xfs_io commands can be combined in one line, to make the code looks clear (except you hope the fd closed after each command done) E.g: $XFS_IO_PROG -fc "falloc -k 0 1m" -c fsync $f.evil ino=$(stat -c '%i' $f.evil) logical=$(get_prealloc_offset $ino) $XFS_IO_PROG -c "pwrite -q 0 23" -c fsync \ -c "pwrite -q 0 23" -c fsync \ $f.evil > +sync > + > +# a bunch of inodes to stuff dir items in front of the extent items > +for i in $(seq 122); do > + $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i > +done > +sync > + > +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch ^^ $BTRFS_UTIL_PROG _require_btrfs_command inspect-internal logical-resolve The .out file only has "Silence is golden", what's this _filter_scratch used for? > +_scratch_unmount Is the ending _scratch_unmount useful? > + > +echo "Silence is golden" > +status=0 > +exit > diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out > new file mode 100644 > index 00000000..c5906093 > --- /dev/null > +++ b/tests/btrfs/279.out > @@ -0,0 +1,2 @@ > +QA output created by 279 > +Silence is golden > -- > 2.38.1 >
On 2022/12/16 16:46, Zorro Lang wrote: > On Wed, Dec 14, 2022 at 03:12:01PM -0800, Boris Burkov wrote: >> If we create a file that has an inline extent followed by a prealloc >> extent, then attempt to use the logical to inode ioctl on the prealloc >> extent, but in the overwritten range, backref resolution will process >> the inline extent. Depending on the leaf eb layout, this can panic. >> Add a new test for this condition. In the long run, we can add spew when >> we read out-of-bounds fields of inline extent items and simplify this >> test to look for dmesg warnings rather than trying to force a fairly >> fragile panic (dependent on non-standardized details of leaf layout). >> >> The test causes a kernel panic unless: >> btrfs: fix logical_ino ioctl panic > > Hi, > > Thanks for this new case. > > Is this patch merged, does it has a commit id? BTW, please mark this > known issue by _fixed_by_kernel_commit in case source code. This fix is not yet merged, but I believe it would soon get merged: https://patchwork.kernel.org/project/linux-btrfs/patch/80f01f297721481fd0d4cbf03fd2550e25b578fb.1671058852.git.boris@bur.io/ > >> is applied to the kernel. >> >> Signed-off-by: Boris Burkov <boris@bur.io> >> --- >> tests/btrfs/279 | 95 +++++++++++++++++++++++++++++++++++++++++++++ >> tests/btrfs/279.out | 2 + >> 2 files changed, 97 insertions(+) >> create mode 100755 tests/btrfs/279 >> create mode 100644 tests/btrfs/279.out >> >> diff --git a/tests/btrfs/279 b/tests/btrfs/279 >> new file mode 100755 >> index 00000000..ef77f84b >> --- /dev/null >> +++ b/tests/btrfs/279 > > btrfs/279 was just token last weekend, please help to rebase to the latest > fstests for-next branch, or change the 279 to a big enough number which won't > cause conflict (I'll give it a proper number when merge it). > >> @@ -0,0 +1,95 @@ >> +#! /bin/bash >> +# SPDX-License-Identifier: GPL-2.0 >> +# Copyright (c) 2022 Meta Platforms, Inc. All Rights Reserved. >> +# >> +# FS QA Test 279 >> +# >> +# Given a file with extents: >> +# [0 : 4096) (inline) >> +# [4096 : N] (prealloc) >> +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the >> +# non-inline extent, it results in reading the offset field of the inline >> +# extent, which is meaningless (it is full of user data..). If we are >> +# particularly lucky, it can be past the end of the extent buffer, resulting in >> +# a crash. >> +# >> +. ./common/preamble >> +_begin_fstest auto quick >> + >> +# Import common functions. >> +. ./common/filter >> +. ./common/dmlogwrites > > Does this case really use helpers in above two included files? > >> + >> +# real QA test starts here >> + >> +# Modify as appropriate. >> +_supported_fs btrfs >> +_require_scratch >> +_require_xfs_io_command "falloc" > > Add this case to prealloc or preallocrw group > >> +_require_xfs_io_command "fsync" >> +_require_xfs_io_command "pwrite" >> + >> +dump_tree() { >> + $BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV > > _require_btrfs_command inspect-internal dump-tree > >> +} >> + >> +get_extent_data() { >> + local ino=$1 >> + dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA " >> +} >> + >> +get_prealloc_offset() { >> + local ino=$1 >> + get_extent_data $ino | grep "disk byte" | awk '{print $5}' >> +} >> + >> +# This test needs to create conditions s.t. the special inode's inline extent >> +# is the first item in a leaf. Therefore, fix a leaf size and add >> +# items that are otherwise not necessary to reproduce the inline-prealloc >> +# condition to get to such a state. >> +# >> +# Roughly, the idea for getting the right item fill is to: >> +# 1. create an extra file with a variable sized inline extent item >> +# 2. create our evil file that will cause the panic >> +# 3. create a whole bunch of files to create a bunch of dir/index items >> +# 4. size the variable extent item s.t. the evil extent item is item 0 in the >> +# next leaf >> +# >> +# We do it in this somewhat convoluted way because the dir and index items all >> +# come before any inode, inode_ref, or extent_data items. So we can predictably >> +# create a bunch of them, then sneak in a funny sized extent to round out the >> +# difference. >> + >> +_scratch_mkfs "--nodesize 16k" >/dev/null > > We recommend calling _fail at here if _scratch_mkfs fails with a *specified* > option. > > _scratch_mkfs "--nodesize 16k" >>$seqres.full || _fail "mkfs failed" > >> +_scratch_mount >> + >> +f=$SCRATCH_MNT/f >> + >> +# the variable extra "leaf padding" file >> +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad >> + >> +# the evil file with an inline extent followed by a prealloc extent >> +# created by falloc with keep-size, then two non-truncating writes to the front >> +touch $f.evil > > Not sure if you reall use this touch command as you've used "-f" option for > xfs_io. > >> +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil >> +$XFS_IO_PROG -fc fsync $f.evil >> +ino=$(stat -c '%i' $f.evil) >> +logical=$(get_prealloc_offset $ino) >> +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil >> +$XFS_IO_PROG -fc fsync $f.evil >> +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil >> +$XFS_IO_PROG -fc fsync $f.evil > > xfs_io commands can be combined in one line, to make the code looks clear (except > you hope the fd closed after each command done) E.g: > > $XFS_IO_PROG -fc "falloc -k 0 1m" -c fsync $f.evil > ino=$(stat -c '%i' $f.evil) > logical=$(get_prealloc_offset $ino) > $XFS_IO_PROG -c "pwrite -q 0 23" -c fsync \ > -c "pwrite -q 0 23" -c fsync \ > $f.evil > >> +sync >> + >> +# a bunch of inodes to stuff dir items in front of the extent items >> +for i in $(seq 122); do >> + $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i >> +done >> +sync >> + >> +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch > ^^ > $BTRFS_UTIL_PROG > > _require_btrfs_command inspect-internal logical-resolve > > The .out file only has "Silence is golden", what's this _filter_scratch used for? > >> +_scratch_unmount > > Is the ending _scratch_unmount useful? > >> + >> +echo "Silence is golden" >> +status=0 >> +exit >> diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out >> new file mode 100644 >> index 00000000..c5906093 >> --- /dev/null >> +++ b/tests/btrfs/279.out >> @@ -0,0 +1,2 @@ >> +QA output created by 279 >> +Silence is golden >> -- >> 2.38.1 >> >
On Wed, Dec 14, 2022 at 11:15 PM Boris Burkov <boris@bur.io> wrote: > > If we create a file that has an inline extent followed by a prealloc > extent, then attempt to use the logical to inode ioctl on the prealloc > extent, but in the overwritten range, backref resolution will process > the inline extent. Depending on the leaf eb layout, this can panic. > Add a new test for this condition. In the long run, we can add spew when > we read out-of-bounds fields of inline extent items and simplify this > test to look for dmesg warnings rather than trying to force a fairly > fragile panic (dependent on non-standardized details of leaf layout). > > The test causes a kernel panic unless: > btrfs: fix logical_ino ioctl panic > is applied to the kernel. > > Signed-off-by: Boris Burkov <boris@bur.io> So in addition to feedback already received from David and Zorro, I have some comments inlined below. > --- > tests/btrfs/279 | 95 +++++++++++++++++++++++++++++++++++++++++++++ > tests/btrfs/279.out | 2 + > 2 files changed, 97 insertions(+) > create mode 100755 tests/btrfs/279 > create mode 100644 tests/btrfs/279.out > > diff --git a/tests/btrfs/279 b/tests/btrfs/279 > new file mode 100755 > index 00000000..ef77f84b > --- /dev/null > +++ b/tests/btrfs/279 > @@ -0,0 +1,95 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (c) 2022 Meta Platforms, Inc. All Rights Reserved. > +# > +# FS QA Test 279 > +# > +# Given a file with extents: > +# [0 : 4096) (inline) > +# [4096 : N] (prealloc) > +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the > +# non-inline extent, it results in reading the offset field of the inline > +# extent, which is meaningless (it is full of user data..). If we are > +# particularly lucky, it can be past the end of the extent buffer, resulting in > +# a crash. > +# > +. ./common/preamble > +_begin_fstest auto quick > + > +# Import common functions. > +. ./common/filter > +. ./common/dmlogwrites > + > +# real QA test starts here > + > +# Modify as appropriate. > +_supported_fs btrfs > +_require_scratch > +_require_xfs_io_command "falloc" Here it should be: _require_xfs_io_command "falloc" "-k" > +_require_xfs_io_command "fsync" > +_require_xfs_io_command "pwrite" This is the first time I'm seeing a test requiring xfs_io to support the pwrite and fsync commands. Presumably there aren't any because either these commands always existed or they have been around for a very long time. > + > +dump_tree() { > + $BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV > +} > + > +get_extent_data() { > + local ino=$1 > + dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA " > +} > + > +get_prealloc_offset() { > + local ino=$1 > + get_extent_data $ino | grep "disk byte" | awk '{print $5}' In fstests we use $AWK_PROG instead of bare 'awk'. > +} > + > +# This test needs to create conditions s.t. the special inode's inline extent > +# is the first item in a leaf. Therefore, fix a leaf size and add > +# items that are otherwise not necessary to reproduce the inline-prealloc > +# condition to get to such a state. > +# > +# Roughly, the idea for getting the right item fill is to: > +# 1. create an extra file with a variable sized inline extent item > +# 2. create our evil file that will cause the panic > +# 3. create a whole bunch of files to create a bunch of dir/index items > +# 4. size the variable extent item s.t. the evil extent item is item 0 in the > +# next leaf > +# > +# We do it in this somewhat convoluted way because the dir and index items all > +# come before any inode, inode_ref, or extent_data items. So we can predictably > +# create a bunch of them, then sneak in a funny sized extent to round out the > +# difference. > + > +_scratch_mkfs "--nodesize 16k" >/dev/null So this will fail on a machine with a 64K page size (PPC for e.g.) using a kernel or btrfs-progs without subpage sector size support. That will make mkfs output an error to stderr and cause the test to fail. Please use a 64K node size and adapt the number of files below so that we get the problematic leaf layout to trigger the bug. Like this the test will be able to run, and reproduce the bug on an unpatched kernel, on machines with any page size and on kernels without subpage sector size support (thinking of SLE kernels for example). That's generally what we do when we need to exercise a particular leaf layout and allow the test to run on machines with any page size. For example btrfs/239 and btrfs/154 do this. > +_scratch_mount > + > +f=$SCRATCH_MNT/f > + > +# the variable extra "leaf padding" file > +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad > + > +# the evil file with an inline extent followed by a prealloc extent > +# created by falloc with keep-size, then two non-truncating writes to the front > +touch $f.evil > +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil > +$XFS_IO_PROG -fc fsync $f.evil > +ino=$(stat -c '%i' $f.evil) > +logical=$(get_prealloc_offset $ino) > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil > +$XFS_IO_PROG -fc fsync $f.evil > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil > +$XFS_IO_PROG -fc fsync $f.evil > +sync This is complex, and it doesn't provide any comments regarding: 1) Why all this frenzy of fsync and a final sync (which makes the last fsync pointless)? 2) Why do we need to write twice to the range [0, 23)? Wouldn't something more simple like this work too: $XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil # sync to commit the transaction and make sure dump-tree sees the fs tree with # the prealloc extent when we try to get its bytenr. sync ino=$(stat -c '%i' $f.evil) logical=$(get_prealloc_offset $ino) # Do a write that will result in an inline extent. $XFS_IO_PROG -c "pwrite -q 0 23" $f.evil # A bunch of inodes to stuff dir items in front of the file extent items. for i in $(seq 122); do $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i done # Flush all dealloc so we get all the file extent items inserted in the fs tree. sync Please add comments about each necessary step, as I've done there. Otherwise it's very hard to understand why those steps are needed... I'm surprised no one commented on this before. I'm comfortable with btrfs' internals and yet I can't understand why there are so many steps, and what exactly each one is supposed to accomplish. > + > +# a bunch of inodes to stuff dir items in front of the extent items > +for i in $(seq 122); do > + $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i > +done > +sync > + > +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch > +_scratch_unmount The unmount is pointless, fstests framework will do that for us. Finally I would also like to see the _fixed_by_kernel_commit annotation in the test. Since the fix is not yet in Linus' tree, in IMO you can replace the commit hash with "xxxx..." and later update the test once it's merged in Linus' tree. Thanks. > + > +echo "Silence is golden" > +status=0 > +exit > diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out > new file mode 100644 > index 00000000..c5906093 > --- /dev/null > +++ b/tests/btrfs/279.out > @@ -0,0 +1,2 @@ > +QA output created by 279 > +Silence is golden > -- > 2.38.1 >
On Fri, Dec 16, 2022 at 11:10:26AM +0000, Filipe Manana wrote: > On Wed, Dec 14, 2022 at 11:15 PM Boris Burkov <boris@bur.io> wrote: > > > > If we create a file that has an inline extent followed by a prealloc > > extent, then attempt to use the logical to inode ioctl on the prealloc > > extent, but in the overwritten range, backref resolution will process > > the inline extent. Depending on the leaf eb layout, this can panic. > > Add a new test for this condition. In the long run, we can add spew when > > we read out-of-bounds fields of inline extent items and simplify this > > test to look for dmesg warnings rather than trying to force a fairly > > fragile panic (dependent on non-standardized details of leaf layout). > > > > The test causes a kernel panic unless: > > btrfs: fix logical_ino ioctl panic > > is applied to the kernel. > > > > Signed-off-by: Boris Burkov <boris@bur.io> > > So in addition to feedback already received from David and Zorro, I > have some comments inlined below. > > > --- > > tests/btrfs/279 | 95 +++++++++++++++++++++++++++++++++++++++++++++ > > tests/btrfs/279.out | 2 + > > 2 files changed, 97 insertions(+) > > create mode 100755 tests/btrfs/279 > > create mode 100644 tests/btrfs/279.out > > > > diff --git a/tests/btrfs/279 b/tests/btrfs/279 > > new file mode 100755 > > index 00000000..ef77f84b > > --- /dev/null > > +++ b/tests/btrfs/279 > > @@ -0,0 +1,95 @@ > > +#! /bin/bash > > +# SPDX-License-Identifier: GPL-2.0 > > +# Copyright (c) 2022 Meta Platforms, Inc. All Rights Reserved. > > +# > > +# FS QA Test 279 > > +# > > +# Given a file with extents: > > +# [0 : 4096) (inline) > > +# [4096 : N] (prealloc) > > +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the > > +# non-inline extent, it results in reading the offset field of the inline > > +# extent, which is meaningless (it is full of user data..). If we are > > +# particularly lucky, it can be past the end of the extent buffer, resulting in > > +# a crash. > > +# > > +. ./common/preamble > > +_begin_fstest auto quick > > + > > +# Import common functions. > > +. ./common/filter > > +. ./common/dmlogwrites > > + > > +# real QA test starts here > > + > > +# Modify as appropriate. > > +_supported_fs btrfs > > +_require_scratch > > +_require_xfs_io_command "falloc" > > Here it should be: > > _require_xfs_io_command "falloc" "-k" > > > +_require_xfs_io_command "fsync" > > +_require_xfs_io_command "pwrite" > > This is the first time I'm seeing a test requiring xfs_io to support > the pwrite and fsync commands. > Presumably there aren't any because either these commands always > existed or they have been around for a very long time. > > > + > > +dump_tree() { > > + $BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV > > +} > > + > > +get_extent_data() { > > + local ino=$1 > > + dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA " > > +} > > + > > +get_prealloc_offset() { > > + local ino=$1 > > + get_extent_data $ino | grep "disk byte" | awk '{print $5}' > > In fstests we use $AWK_PROG instead of bare 'awk'. > > > +} > > + > > +# This test needs to create conditions s.t. the special inode's inline extent > > +# is the first item in a leaf. Therefore, fix a leaf size and add > > +# items that are otherwise not necessary to reproduce the inline-prealloc > > +# condition to get to such a state. > > +# > > +# Roughly, the idea for getting the right item fill is to: > > +# 1. create an extra file with a variable sized inline extent item > > +# 2. create our evil file that will cause the panic > > +# 3. create a whole bunch of files to create a bunch of dir/index items > > +# 4. size the variable extent item s.t. the evil extent item is item 0 in the > > +# next leaf > > +# > > +# We do it in this somewhat convoluted way because the dir and index items all > > +# come before any inode, inode_ref, or extent_data items. So we can predictably > > +# create a bunch of them, then sneak in a funny sized extent to round out the > > +# difference. > > + > > +_scratch_mkfs "--nodesize 16k" >/dev/null > > So this will fail on a machine with a 64K page size (PPC for e.g.) > using a kernel or btrfs-progs without subpage sector size support. > That will make mkfs output an error to stderr and cause the test to fail. > > Please use a 64K node size and adapt the number of files below so that > we get the problematic leaf layout to trigger > the bug. > > Like this the test will be able to run, and reproduce the bug on an > unpatched kernel, on machines with any page size > and on kernels without subpage sector size support (thinking of SLE > kernels for example). > > That's generally what we do when we need to exercise a particular leaf > layout and allow the test to run on machines > with any page size. For example btrfs/239 and btrfs/154 do this. Good point, I'll re-do it for 64k to make it fully general as that was my intent in setting a fixed nodesize. > > > +_scratch_mount > > + > > +f=$SCRATCH_MNT/f > > + > > +# the variable extra "leaf padding" file > > +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad > > + > > +# the evil file with an inline extent followed by a prealloc extent > > +# created by falloc with keep-size, then two non-truncating writes to the front > > +touch $f.evil > > +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil > > +$XFS_IO_PROG -fc fsync $f.evil > > +ino=$(stat -c '%i' $f.evil) > > +logical=$(get_prealloc_offset $ino) > > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil > > +$XFS_IO_PROG -fc fsync $f.evil > > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil > > +$XFS_IO_PROG -fc fsync $f.evil > > +sync > > This is complex, and it doesn't provide any comments regarding: > > 1) Why all this frenzy of fsync and a final sync (which makes the last > fsync pointless)? > > 2) Why do we need to write twice to the range [0, 23)? Honestly, I'm not sure. A single write results in a regular extent while two (separated by an fsync) results in an inline extent. Since we don't consider the inline extent a bug (which I inferred from I think your previous discussion with Qu for a similar case), I didn't prioritize digging into that behavior while working on this fix/test. > > Wouldn't something more simple like this work too: > > $XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil > > # sync to commit the transaction and make sure dump-tree sees the fs tree with > # the prealloc extent when we try to get its bytenr. > sync > ino=$(stat -c '%i' $f.evil) > logical=$(get_prealloc_offset $ino) > > # Do a write that will result in an inline extent. > $XFS_IO_PROG -c "pwrite -q 0 23" $f.evil > > # A bunch of inodes to stuff dir items in front of the file extent items. > for i in $(seq 122); do > $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i > done > > # Flush all dealloc so we get all the file extent items inserted in the fs tree. > sync > > Please add comments about each necessary step, as I've done there. > Otherwise it's very hard to understand why those steps are needed... > > I'm surprised no one commented on this before. > I'm comfortable with btrfs' internals and yet I can't understand why > there are so many steps, and what exactly each one is supposed to > accomplish. I like fstests having a literate style, so I'm happy to do this, but I must point out that it's far from universal in the repo... > > > + > > +# a bunch of inodes to stuff dir items in front of the extent items > > +for i in $(seq 122); do > > + $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i > > +done > > +sync > > + > > +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch > > +_scratch_unmount > > The unmount is pointless, fstests framework will do that for us. > > Finally I would also like to see the _fixed_by_kernel_commit > annotation in the test. > Since the fix is not yet in Linus' tree, in IMO you can replace the > commit hash with "xxxx..." and later update the test once it's merged > in Linus' tree. > > Thanks. Thanks for the review. > > > + > > +echo "Silence is golden" > > +status=0 > > +exit > > diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out > > new file mode 100644 > > index 00000000..c5906093 > > --- /dev/null > > +++ b/tests/btrfs/279.out > > @@ -0,0 +1,2 @@ > > +QA output created by 279 > > +Silence is golden > > -- > > 2.38.1 > >
diff --git a/tests/btrfs/279 b/tests/btrfs/279 new file mode 100755 index 00000000..ef77f84b --- /dev/null +++ b/tests/btrfs/279 @@ -0,0 +1,95 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2022 Meta Platforms, Inc. All Rights Reserved. +# +# FS QA Test 279 +# +# Given a file with extents: +# [0 : 4096) (inline) +# [4096 : N] (prealloc) +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the +# non-inline extent, it results in reading the offset field of the inline +# extent, which is meaningless (it is full of user data..). If we are +# particularly lucky, it can be past the end of the extent buffer, resulting in +# a crash. +# +. ./common/preamble +_begin_fstest auto quick + +# Import common functions. +. ./common/filter +. ./common/dmlogwrites + +# real QA test starts here + +# Modify as appropriate. +_supported_fs btrfs +_require_scratch +_require_xfs_io_command "falloc" +_require_xfs_io_command "fsync" +_require_xfs_io_command "pwrite" + +dump_tree() { + $BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV +} + +get_extent_data() { + local ino=$1 + dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA " +} + +get_prealloc_offset() { + local ino=$1 + get_extent_data $ino | grep "disk byte" | awk '{print $5}' +} + +# This test needs to create conditions s.t. the special inode's inline extent +# is the first item in a leaf. Therefore, fix a leaf size and add +# items that are otherwise not necessary to reproduce the inline-prealloc +# condition to get to such a state. +# +# Roughly, the idea for getting the right item fill is to: +# 1. create an extra file with a variable sized inline extent item +# 2. create our evil file that will cause the panic +# 3. create a whole bunch of files to create a bunch of dir/index items +# 4. size the variable extent item s.t. the evil extent item is item 0 in the +# next leaf +# +# We do it in this somewhat convoluted way because the dir and index items all +# come before any inode, inode_ref, or extent_data items. So we can predictably +# create a bunch of them, then sneak in a funny sized extent to round out the +# difference. + +_scratch_mkfs "--nodesize 16k" >/dev/null +_scratch_mount + +f=$SCRATCH_MNT/f + +# the variable extra "leaf padding" file +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad + +# the evil file with an inline extent followed by a prealloc extent +# created by falloc with keep-size, then two non-truncating writes to the front +touch $f.evil +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil +$XFS_IO_PROG -fc fsync $f.evil +ino=$(stat -c '%i' $f.evil) +logical=$(get_prealloc_offset $ino) +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil +$XFS_IO_PROG -fc fsync $f.evil +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil +$XFS_IO_PROG -fc fsync $f.evil +sync + +# a bunch of inodes to stuff dir items in front of the extent items +for i in $(seq 122); do + $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i +done +sync + +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch +_scratch_unmount + +echo "Silence is golden" +status=0 +exit diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out new file mode 100644 index 00000000..c5906093 --- /dev/null +++ b/tests/btrfs/279.out @@ -0,0 +1,2 @@ +QA output created by 279 +Silence is golden
If we create a file that has an inline extent followed by a prealloc extent, then attempt to use the logical to inode ioctl on the prealloc extent, but in the overwritten range, backref resolution will process the inline extent. Depending on the leaf eb layout, this can panic. Add a new test for this condition. In the long run, we can add spew when we read out-of-bounds fields of inline extent items and simplify this test to look for dmesg warnings rather than trying to force a fairly fragile panic (dependent on non-standardized details of leaf layout). The test causes a kernel panic unless: btrfs: fix logical_ino ioctl panic is applied to the kernel. Signed-off-by: Boris Burkov <boris@bur.io> --- tests/btrfs/279 | 95 +++++++++++++++++++++++++++++++++++++++++++++ tests/btrfs/279.out | 2 + 2 files changed, 97 insertions(+) create mode 100755 tests/btrfs/279 create mode 100644 tests/btrfs/279.out