Message ID | 837d97d52fee15653d1dac216d1d75a14bb1916d.1717586749.git.fdmanana@suse.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs/280: run defrag after creating file to get expected extent layout |
On Wed, 5 Jun 2024 12:26:20 +0100, fdmanana@kernel.org wrote:

> From: Filipe Manana <fdmanana@suse.com>
>
> The test writes a 128M file and expects to end up with 1024 extents, each
> with a size of 128K, which is the maximum size for compressed extents.
> Generally this is what happens, but often it's possibly for writeback to
> kick in while creating the file (due to memory pressure, or something
> calling sync in parallel, etc) which may result in creating more and
> smaller extents, which makes the test fail since its golden output
> expects exactly 1024 extents with a size of 128K each.
>
> So to work around run defrag after creating the file, which will ensure
> we get only 128K extents in the file.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Looks fine.
Signed-off-by: David Disseldorp <ddiss@suse.de>

> ---
>  tests/btrfs/280 | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/tests/btrfs/280 b/tests/btrfs/280
> index d4f613ce..0f7f8a37 100755
> --- a/tests/btrfs/280
> +++ b/tests/btrfs/280
> @@ -13,7 +13,7 @@
>  # the backref walking code, used by fiemap to determine if an extent is shared.
>  #
>  . ./common/preamble
> -_begin_fstest auto quick compress snapshot fiemap
> +_begin_fstest auto quick compress snapshot fiemap defrag
>
>  . ./common/filter
>  . ./common/punch # for _filter_fiemap_flags

_require_defrag might be worth calling, but it doesn't really do anything
for btrfs, so I'm fine either way.

On Wed, Jun 05, 2024 at 12:26:20PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> The test writes a 128M file and expects to end up with 1024 extents, each
> with a size of 128K, which is the maximum size for compressed extents.
> Generally this is what happens, but often it's possibly for writeback to
> kick in while creating the file (due to memory pressure, or something
> calling sync in parallel, etc) which may result in creating more and
> smaller extents, which makes the test fail since its golden output
> expects exactly 1024 extents with a size of 128K each.
>
> So to work around run defrag after creating the file, which will ensure
> we get only 128K extents in the file.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: David Sterba <dsterba@suse.com>

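
The extent count in the quoted commit message follows from simple arithmetic, and the failure mode can be illustrated with a toy model. The sketch below is not btrfs code: `count_extents` is a hypothetical helper that assumes the file is cut at every 128K boundary and that each early writeback flush at a distinct, unaligned offset cuts one extent in two.

```shell
#!/bin/bash
# Simplified model (an assumption for illustration, not btrfs code): extents
# end at every 128K boundary, plus at every offset where writeback flushed early.
# Arguments: file size, maximum extent size, then zero or more distinct flush offsets.
count_extents() {
    local file_size=$1 max_extent=$2
    shift 2
    # Ceiling division: number of extents when nothing interrupts the write.
    local count=$(( (file_size + max_extent - 1) / max_extent ))
    local split
    for split in "$@"; do
        # A flush that is not on an extent boundary cuts one extent into two.
        if [ $((split % max_extent)) -ne 0 ] && [ "$split" -lt "$file_size" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

size=$((128 * 1024 * 1024))   # 128M file, as in the test
max=$((128 * 1024))           # 128K maximum compressed extent size

count_extents "$size" "$max"                                   # prints 1024
count_extents "$size" "$max" $((5 * 1024 * 1024 + 64 * 1024))  # prints 1025
```

Under this model a single unaligned flush is enough to produce 1025 extents, which no longer matches a golden output that expects exactly 1024.
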
On 2024/6/5 20:56, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> The test writes a 128M file and expects to end up with 1024 extents, each
> with a size of 128K, which is the maximum size for compressed extents.
> Generally this is what happens, but often it's possibly for writeback to
> kick in while creating the file (due to memory pressure, or something
> calling sync in parallel, etc) which may result in creating more and
> smaller extents, which makes the test fail since its golden output
> expects exactly 1024 extents with a size of 128K each.
>
> So to work around run defrag after creating the file, which will ensure
> we get only 128K extents in the file.

But defrag is not much different than reading the page and set it dirty
for writeback again.

It can be affected by the same memory pressure things to get split.

I guess you choose compressed file extents is to bump up the subvolume
tree meanwhile also compatible for all sector sizes.

In that case, what about doing DIO using sectorsize of the fs?
So that each dio write would result one file extent item, meanwhile
since it's a single sector/page, memory pressure will never be able to
writeback that sector halfway.

Thanks,
Qu

>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   tests/btrfs/280 | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/tests/btrfs/280 b/tests/btrfs/280
> index d4f613ce..0f7f8a37 100755
> --- a/tests/btrfs/280
> +++ b/tests/btrfs/280
> @@ -13,7 +13,7 @@
>   # the backref walking code, used by fiemap to determine if an extent is shared.
>   #
>   . ./common/preamble
> -_begin_fstest auto quick compress snapshot fiemap
> +_begin_fstest auto quick compress snapshot fiemap defrag
>
>   . ./common/filter
>   . ./common/punch # for _filter_fiemap_flags
> @@ -36,6 +36,14 @@ _scratch_mount -o compress
>   # extent tree (if the root was a leaf, we would have only data references).
>   $XFS_IO_PROG -f -c "pwrite -b 1M 0 128M" $SCRATCH_MNT/foo | _filter_xfs_io
>
> +# While writing the file it's possible, but rare, that writeback kicked in due
> +# to memory pressure or a concurrent sync call for example, so we may end up
> +# with extents smaller than 128K (the maximum size for compressed extents) and
> +# therefore make the test expectations fail because we get more extents than
> +# what the golden output expects. So run defrag to make sure we get exactly
> +# the expected number of 128K extents (1024 extents).
> +$BTRFS_UTIL_PROG filesystem defrag "$SCRATCH_MNT/foo" >> $seqres.full
> +
>   # Create a RW snapshot of the default subvolume.
>   _btrfs subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
>

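
The direct IO alternative suggested above could look roughly like the following. This is a dry-run sketch only: `build_dio_write_cmd` is a hypothetical helper (not part of fstests), the file path is a placeholder, and the 4096-byte sector size is an assumption. It prints an `xfs_io` command line instead of running it, so no filesystem is needed to try it.

```shell
#!/bin/bash
# Hypothetical helper: build (but do not run) an xfs_io command that fills a
# file with O_DIRECT writes, each one exactly one sector in size.
build_dio_write_cmd() {
    local file=$1 sectorsize=$2 nr_writes=$3
    local total=$((sectorsize * nr_writes))
    # -f creates the file, -d opens it with O_DIRECT; pwrite -b sets the size
    # of each individual write, so the range is covered one sector at a time.
    echo "xfs_io -f -d -c \"pwrite -b $sectorsize 0 $total\" $file"
}

# Placeholder path and an assumed 4K sector size, 1024 writes.
build_dio_write_cmd /mnt/scratch/foo 4096 1024
# prints: xfs_io -f -d -c "pwrite -b 4096 0 4194304" /mnt/scratch/foo
```

Whether each such write ends up as its own file extent item is the claim made in the message above; the sketch only shows how the writes would be issued.
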
On Wed, Jun 5, 2024 at 11:30 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> On 2024/6/5 20:56, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > The test writes a 128M file and expects to end up with 1024 extents, each
> > with a size of 128K, which is the maximum size for compressed extents.
> > Generally this is what happens, but often it's possibly for writeback to
> > kick in while creating the file (due to memory pressure, or something
> > calling sync in parallel, etc) which may result in creating more and
> > smaller extents, which makes the test fail since its golden output
> > expects exactly 1024 extents with a size of 128K each.
> >
> > So to work around run defrag after creating the file, which will ensure
> > we get only 128K extents in the file.
>
> But defrag is not much different than reading the page and set it dirty
> for writeback again.
>
> It can be affected by the same memory pressure things to get split.

Defrag locks the range, the pages, then dirties the pages and then
unlocks the pages. So any writeback attempt happening in parallel will
wait for the pages to be unlocked. So we shouldn't get extents smaller
than 128K. Did I miss anything?

>
> I guess you choose compressed file extents is to bump up the subvolume
> tree meanwhile also compatible for all sector sizes.

Yes, and to be fast and use very little space.

>
> In that case, what about doing DIO using sectorsize of the fs?
> So that each dio write would result one file extent item, meanwhile
> since it's a single sector/page, memory pressure will never be able to
> writeback that sector halfway.

I thought about DIO, but would have to leave holes between every
extent (and for that I would rather use buffered IO for simplicity and
probably faster).
Otherwise fiemap merges all adjacent extents, you get one 8M extent
reported, covering the range of the odd single profile data block group
created by mkfs, and another one for the remaining of the file - it's
just ugly and hard to reason about, plus that could break one day if we
ever get rid of that 8M data block group.

>
> Thanks,
> Qu
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >   tests/btrfs/280 | 10 +++++++++-
> >   1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/btrfs/280 b/tests/btrfs/280
> > index d4f613ce..0f7f8a37 100755
> > --- a/tests/btrfs/280
> > +++ b/tests/btrfs/280
> > @@ -13,7 +13,7 @@
> >   # the backref walking code, used by fiemap to determine if an extent is shared.
> >   #
> >   . ./common/preamble
> > -_begin_fstest auto quick compress snapshot fiemap
> > +_begin_fstest auto quick compress snapshot fiemap defrag
> >
> >   . ./common/filter
> >   . ./common/punch # for _filter_fiemap_flags
> > @@ -36,6 +36,14 @@ _scratch_mount -o compress
> >   # extent tree (if the root was a leaf, we would have only data references).
> >   $XFS_IO_PROG -f -c "pwrite -b 1M 0 128M" $SCRATCH_MNT/foo | _filter_xfs_io
> >
> > +# While writing the file it's possible, but rare, that writeback kicked in due
> > +# to memory pressure or a concurrent sync call for example, so we may end up
> > +# with extents smaller than 128K (the maximum size for compressed extents) and
> > +# therefore make the test expectations fail because we get more extents than
> > +# what the golden output expects. So run defrag to make sure we get exactly
> > +# the expected number of 128K extents (1024 extents).
> > +$BTRFS_UTIL_PROG filesystem defrag "$SCRATCH_MNT/foo" >> $seqres.full
> > +
> >   # Create a RW snapshot of the default subvolume.
> >   _btrfs subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
> >

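
The merging concern raised above can be illustrated with a small model. This is a simplified sketch, not the kernel's fiemap code: `merge_adjacent_extents` is a hypothetical helper that assumes two extents are reported as one range when they are contiguous both in file offset and on disk, and the offsets used are made up for the example.

```shell
#!/bin/bash
# Simplified model of extent merging in a fiemap-style report (not kernel code).
# Input: one extent per line as "logical physical length", sorted by logical offset.
merge_adjacent_extents() {
    local logical physical length
    local cur_logical="" cur_physical cur_length
    while read -r logical physical length; do
        if [ -n "$cur_logical" ] &&
           [ $((cur_logical + cur_length)) -eq "$logical" ] &&
           [ $((cur_physical + cur_length)) -eq "$physical" ]; then
            # Contiguous in the file and on disk: extend the current range.
            cur_length=$((cur_length + length))
        else
            # Not contiguous: emit the previous range (if any), start a new one.
            [ -n "$cur_logical" ] && echo "$cur_logical $cur_physical $cur_length"
            cur_logical=$logical cur_physical=$physical cur_length=$length
        fi
    done
    # Emit the last range still being accumulated.
    [ -n "$cur_logical" ] && echo "$cur_logical $cur_physical $cur_length"
}

# Three 4K extents written back to back, contiguous on disk (made-up offsets).
no_holes=$(printf '%s\n' "0 1048576 4096" "4096 1052672 4096" "8192 1056768 4096" |
    merge_adjacent_extents)

# The same three extents with a 4K hole after each one in the file.
with_holes=$(printf '%s\n' "0 1048576 4096" "8192 1052672 4096" "16384 1056768 4096" |
    merge_adjacent_extents)

echo "without holes:"; echo "$no_holes"
echo "with holes:";    echo "$with_holes"
```

In this model the back-to-back writes collapse into a single 12K range, while the layout with holes keeps three separate ranges, which is why the DIO variant would need holes between extents to give a predictable report.
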
On 2024/6/6 08:47, Filipe Manana wrote:
> On Wed, Jun 5, 2024 at 11:30 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>> On 2024/6/5 20:56, fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> The test writes a 128M file and expects to end up with 1024 extents, each
>>> with a size of 128K, which is the maximum size for compressed extents.
>>> Generally this is what happens, but often it's possibly for writeback to
>>> kick in while creating the file (due to memory pressure, or something
>>> calling sync in parallel, etc) which may result in creating more and
>>> smaller extents, which makes the test fail since its golden output
>>> expects exactly 1024 extents with a size of 128K each.
>>>
>>> So to work around run defrag after creating the file, which will ensure
>>> we get only 128K extents in the file.
>>
>> But defrag is not much different than reading the page and set it dirty
>> for writeback again.
>>
>> It can be affected by the same memory pressure things to get split.
>
> Defrag locks the range, the pages, then dirties the pages and then
> unlocks the pages. So any writeback attempt happening in parallel will
> wait for the pages
> to be unlocked. So we shouldn't get extents smaller than 128K. Did I
> miss anything?
>

You're right, I forgot the page is also locked, and the defrag cluster
size is 256K, exactly aligned with compression extent size.

So it's completely fine.

>>
>> I guess you choose compressed file extents is to bump up the subvolume
>> tree meanwhile also compatible for all sector sizes.
>
> Yes, and to be fast and use very little space.
>
>>
>> In that case, what about doing DIO using sectorsize of the fs?
>> So that each dio write would result one file extent item, meanwhile
>> since it's a single sector/page, memory pressure will never be able to
>> writeback that sector halfway.
>
> I thought about DIO, but would have to leave holes between every
> extent (and for that I would rather use buffered IO for simplicity and
> probably faster).
> Otherwise fiemap merges all adjacent extents, you get one 8M extent
> reported, covering the range of the odd single profile data block group created
> by mkfs, and another one for the remaining of the file - it's just
> ugly and hard to reason about, plus that could break one day if we
> ever get rid of that 8M data block group.

Yep, fiemap merging is another problem.

So this looks totally fine now.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

>
>>
>> Thanks,
>> Qu
>>>
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>> ---
>>>    tests/btrfs/280 | 10 +++++++++-
>>>    1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tests/btrfs/280 b/tests/btrfs/280
>>> index d4f613ce..0f7f8a37 100755
>>> --- a/tests/btrfs/280
>>> +++ b/tests/btrfs/280
>>> @@ -13,7 +13,7 @@
>>>    # the backref walking code, used by fiemap to determine if an extent is shared.
>>>    #
>>>    . ./common/preamble
>>> -_begin_fstest auto quick compress snapshot fiemap
>>> +_begin_fstest auto quick compress snapshot fiemap defrag
>>>
>>>    . ./common/filter
>>>    . ./common/punch # for _filter_fiemap_flags
>>> @@ -36,6 +36,14 @@ _scratch_mount -o compress
>>>    # extent tree (if the root was a leaf, we would have only data references).
>>>    $XFS_IO_PROG -f -c "pwrite -b 1M 0 128M" $SCRATCH_MNT/foo | _filter_xfs_io
>>>
>>> +# While writing the file it's possible, but rare, that writeback kicked in due
>>> +# to memory pressure or a concurrent sync call for example, so we may end up
>>> +# with extents smaller than 128K (the maximum size for compressed extents) and
>>> +# therefore make the test expectations fail because we get more extents than
>>> +# what the golden output expects. So run defrag to make sure we get exactly
>>> +# the expected number of 128K extents (1024 extents).
>>> +$BTRFS_UTIL_PROG filesystem defrag "$SCRATCH_MNT/foo" >> $seqres.full
>>> +
>>>    # Create a RW snapshot of the default subvolume.
>>>    _btrfs subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
>>>
>

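
The alignment argument in the reply above can be checked with the sizes quoted in the thread. The snippet below is plain arithmetic under those numbers (256K defrag cluster, 128K maximum compressed extent, 128M file); the variable names are illustrative and it does not touch a filesystem.

```shell
#!/bin/bash
# Sizes quoted in the thread.
file_size=$((128 * 1024 * 1024))       # 128M test file
defrag_cluster=$((256 * 1024))         # defrag cluster size mentioned in the reply
max_compressed_extent=$((128 * 1024))  # maximum compressed extent size

# A cluster holds a whole number of maximum-size compressed extents...
extents_per_cluster=$((defrag_cluster / max_compressed_extent))
cluster_remainder=$((defrag_cluster % max_compressed_extent))

# ...and the file holds a whole number of clusters.
clusters=$((file_size / defrag_cluster))
file_remainder=$((file_size % defrag_cluster))

total=$((clusters * extents_per_cluster))
echo "clusters=$clusters extents_per_cluster=$extents_per_cluster total=$total"
# prints: clusters=512 extents_per_cluster=2 total=1024
```

Both remainders are zero, so if every cluster is written out as a unit, as argued in the thread, no cluster boundary can fall inside a 128K extent and the total stays at the 1024 extents the golden output expects.
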
diff --git a/tests/btrfs/280 b/tests/btrfs/280
index d4f613ce..0f7f8a37 100755
--- a/tests/btrfs/280
+++ b/tests/btrfs/280
@@ -13,7 +13,7 @@
 # the backref walking code, used by fiemap to determine if an extent is shared.
 #
 . ./common/preamble
-_begin_fstest auto quick compress snapshot fiemap
+_begin_fstest auto quick compress snapshot fiemap defrag
 
 . ./common/filter
 . ./common/punch # for _filter_fiemap_flags
@@ -36,6 +36,14 @@ _scratch_mount -o compress
 # extent tree (if the root was a leaf, we would have only data references).
 $XFS_IO_PROG -f -c "pwrite -b 1M 0 128M" $SCRATCH_MNT/foo | _filter_xfs_io
 
+# While writing the file it's possible, but rare, that writeback kicked in due
+# to memory pressure or a concurrent sync call for example, so we may end up
+# with extents smaller than 128K (the maximum size for compressed extents) and
+# therefore make the test expectations fail because we get more extents than
+# what the golden output expects. So run defrag to make sure we get exactly
+# the expected number of 128K extents (1024 extents).
+$BTRFS_UTIL_PROG filesystem defrag "$SCRATCH_MNT/foo" >> $seqres.full
+
 # Create a RW snapshot of the default subvolume.
 _btrfs subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap