diff mbox series

fstests: btrfs/002: fix the OOM caused by too large block size

Message ID 20241012071824.124468-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series fstests: btrfs/002: fix the OOM caused by too large block size | expand

Commit Message

Qu Wenruo Oct. 12, 2024, 7:18 a.m. UTC
[BUG]
When running the test case btrfs/002, with 64K page size and 64K sector
size, and the VM doesn't have much memory (in my case 4G Vram), the test
case will trigger OOM and fail:

btrfs/002 4s ... [failed, exit status 1]- output mismatch (see /home/adam/xfstests-dev/results//btrfs/002.out.bad)
    --- tests/btrfs/002.out	2024-04-25 18:13:45.035555469 +0930
    +++ /home/adam/xfstests-dev/results//btrfs/002.out.bad	2024-10-12 17:19:48.785156223 +1030
    @@ -1,2 +1 @@
     QA output created by 002
    -Silence is golden
    ...

The OOM is triggered by the dd process, and a lot of dd processes are
using too much memory:

 dd invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=250
 CPU: 0 UID: 0 PID: 185764 Comm: dd Not tainted 6.12.0-rc2-custom+ #76
 Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
 Tasks state (memory values in pages):
 [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
 [ 185665]     0 185665     8688     3840     3840        0         0   458752     4832           250 dd
 [ 185672]     0 185672     8688     2432     2432        0         0   393216     5312           250 dd
 [ 185680]     0 185680     8688     2016     2016        0         0   458752     4960           250 dd
 [ 185686]     0 185686     8688     2080     2080        0         0   458752     3584           250 dd
 [ 185693]     0 185693     8688     2144     2144        0         0   458752     4384           250 dd
 [ 185700]     0 185700     8688     2176     2176        0         0   458752     3584           250 dd
 [ 185707]     0 185707     8688     1792     1792        0         0   524288     3616           250 dd
 [ 185714]     0 185714     8688     2304     2304        0         0   458752     3488           250 dd
 [ 185721]     0 185721     8688     1920     1920        0         0   458752     2624           250 dd
 [ 185728]     0 185728     8688     2272     2272        0         0   393216     2528           250 dd
 [ 185735]     0 185735     8688     2048     2048        0         0   393216     3552           250 dd
 [ 185742]     0 185742     8688     1984     1984        0         0   458752     2816           250 dd
 [ 185751]     0 185751     8688     1600     1600        0         0   458752     2784           250 dd
 [ 185756]     0 185756     8688     1120     1120        0         0   458752     2400           250 dd
 [ 185764]     0 185764     8688     1504     1504        0         0   393216     2240           250 dd
 [ 185772]     0 185772     8688     1504     1504        0         0   458752     1984           250 dd
 [ 185777]     0 185777     8688     1280     1280        0         0   393216     2336           250 dd
 [ 185784]     0 185784     8688     2144     2144        0         0   393216     2272           250 dd
 [ 185791]     0 185791     8688     2176     2176        0         0   458752      576           250 dd
 [ 185798]     0 185798     8688     1696     1696        0         0   458752     1536           250 dd
 [ 185806]     0 185806     8688     1728     1728        0         0   393216      544           250 dd
 [ 185815]     0 185815     8688     2240     2240        0         0   458752        0           250 dd
 [ 185819]     0 185819     8688     1504     1504        0         0   458752      384           250 dd
 [ 185826]     0 185826     8688     1536     1536        0         0   458752      160           250 dd
 [ 185833]     0 185833     8688     2944     2944        0         0   458752       64           250 dd
 [ 185838]     0 185838     8688     2400     2400        0         0   458752        0           250 dd
 [ 185847]     0 185847     8688      864      864        0         0   458752        0           250 dd
 [ 185853]     0 185853     8688     1088     1088        0         0   393216        0           250 dd
 [ 185860]     0 185860     8688      416      416        0         0   393216        0           250 dd
 [ 185867]     0 185867     8688      352      352        0         0   458752        0           250 dd

[CAUSE]
The test case workload _fill_blk() is going to fill the file to its block
boundary.

But the implementation is not taking larger blocks into consideration.

	FSIZE=`stat -t $i | cut -d" " -f2`
	BLKS=`stat -c "%B" $i`
	NBLK=`stat -c "%b" $i`
	FALLOC=$(($BLKS * $NBLK))
	WS=$(($FALLOC - $FSIZE))

$FSIZE is the file size, $BLKS is the size of each reported block,
$NBLK is the number of blocks the file takes, thus $FALLOC is the
rounded up block size.

For 64K sector size, the BLKS is 512, and BLKS is 128 (one 64K sector).
$FALLOC is the correct value of 64K (10K rounded up to 64K).

Then the problem comes to how the write is done:

	_ddt of=$i oseek=$FSIZE obs=$WS count=1 status=noxfer 2>/dev/null &

Unfrotunately the above command is using output block size of 54K, and
need to skip 10K * 54K bytes, resulting a file size of 540M.

So far although it's not the correct intention, it's not yet causing
problem.

But during _append_file(), we further enlarge the file by:

		FSIZE=`stat -t $i | cut -d" " -f2`
		dd if=$X of=$i seek=1 bs=$FSIZE obs=$FSIZE count=1 status=noxfer 2>/dev/null &

In above case, since the previous file is 540M size, the output block
size will also be 540M, taking a lot of memory.

Furthermore since the workload is run in background, we can have many dd
processes taking up at least 540M, causing huge memory usage and trigger
OOM.

[FIX]
The original code is already not doing what it should do, just get rid of
the cursed dd command usage inside _fill_blk(), and use pwrite from
xfs_io instead.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 tests/btrfs/002 | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

Comments

Anand Jain Oct. 22, 2024, 11:24 a.m. UTC | #1
looks good.
Nits:

> For 64K sector size, the BLKS is 512, and BLKS is 128 (one 64K sector).
                                             ^^ NBLKS

> $FALLOC is the correct value of 64K (10K rounded up to 64K).
> 

Not sure what you meant here. More below.


> diff --git a/tests/btrfs/002 b/tests/btrfs/002
> index f223cc60..0c231b89 100755
> --- a/tests/btrfs/002
> +++ b/tests/btrfs/002
> @@ -70,19 +70,14 @@ _read_modify_write()
>   _fill_blk()
>   {
>   	local FSIZE
> -	local BLKS
> -	local NBLK
> -	local FALLOC
> -	local WS
> +	local NEWSIZE
>   
>   	for i in `find /$1 -type f`
>   	do
>   		FSIZE=`stat -t $i | cut -d" " -f2`
> -		BLKS=`stat -c "%B" $i`
> -		NBLK=`stat -c "%b" $i`
> -		FALLOC=$(($BLKS * $NBLK))
> -		WS=$(($FALLOC - $FSIZE))
> -		_ddt of=$i oseek=$FSIZE obs=$WS count=1 status=noxfer 2>/dev/null &


> +		NEWSIZE=$(( ($FSIZE + $blksize -1 ) / $blksize * $blksize ))


Please add extra ( ) otherwise it is very confusing.

    NEWSIZE=$(( (($FSIZE + $blksize -1 ) / $blksize) * $blksize ))



> +
> +		$XFS_IO_PROG -c "pwrite -i /dev/urandom $FSIZE $(( $NEWSIZE - $FSIZE ))" $i > /dev/null &



Reviewed-by: Anand Jain <anand.jain@oracle.com>


Thanks, Anand
Qu Wenruo Oct. 22, 2024, 9:08 p.m. UTC | #2
在 2024/10/22 21:54, Anand Jain 写道:
> 
> looks good.
> Nits:
> 
>> For 64K sector size, the BLKS is 512, and BLKS is 128 (one 64K sector).
>                                              ^^ NBLKS
> 
>> $FALLOC is the correct value of 64K (10K rounded up to 64K).
>>
> 
> Not sure what you meant here. More below.

I just mean the old $FALLOC value is 64K, which is correct.

I didn't see anything confusing here, I was explaining each variable 
involved.

> 
> 
>> diff --git a/tests/btrfs/002 b/tests/btrfs/002
>> index f223cc60..0c231b89 100755
>> --- a/tests/btrfs/002
>> +++ b/tests/btrfs/002
>> @@ -70,19 +70,14 @@ _read_modify_write()
>>   _fill_blk()
>>   {
>>       local FSIZE
>> -    local BLKS
>> -    local NBLK
>> -    local FALLOC
>> -    local WS
>> +    local NEWSIZE
>>       for i in `find /$1 -type f`
>>       do
>>           FSIZE=`stat -t $i | cut -d" " -f2`
>> -        BLKS=`stat -c "%B" $i`
>> -        NBLK=`stat -c "%b" $i`
>> -        FALLOC=$(($BLKS * $NBLK))
>> -        WS=$(($FALLOC - $FSIZE))
>> -        _ddt of=$i oseek=$FSIZE obs=$WS count=1 status=noxfer 2>/dev/ 
>> null &
> 
> 
>> +        NEWSIZE=$(( ($FSIZE + $blksize -1 ) / $blksize * $blksize ))
> 
> 
> Please add extra ( ) otherwise it is very confusing.
> 
>     NEWSIZE=$(( (($FSIZE + $blksize -1 ) / $blksize) * $blksize ))

I didn't see how it's confusing.

It's the very basic round down, and I didn't see any kernel code doing 
extra () just for round down.

Thanks,
Qu
> 
> 
> 
>> +
>> +        $XFS_IO_PROG -c "pwrite -i /dev/urandom $FSIZE $(( $NEWSIZE - 
>> $FSIZE ))" $i > /dev/null &
> 
> 
> 
> Reviewed-by: Anand Jain <anand.jain@oracle.com>
> 
> 
> Thanks, Anand
diff mbox series

Patch

diff --git a/tests/btrfs/002 b/tests/btrfs/002
index f223cc60..0c231b89 100755
--- a/tests/btrfs/002
+++ b/tests/btrfs/002
@@ -70,19 +70,14 @@  _read_modify_write()
 _fill_blk()
 {
 	local FSIZE
-	local BLKS
-	local NBLK
-	local FALLOC
-	local WS
+	local NEWSIZE
 
 	for i in `find /$1 -type f`
 	do
 		FSIZE=`stat -t $i | cut -d" " -f2`
-		BLKS=`stat -c "%B" $i`
-		NBLK=`stat -c "%b" $i`
-		FALLOC=$(($BLKS * $NBLK))
-		WS=$(($FALLOC - $FSIZE))
-		_ddt of=$i oseek=$FSIZE obs=$WS count=1 status=noxfer 2>/dev/null &
+		NEWSIZE=$(( ($FSIZE + $blksize -1 ) / $blksize * $blksize ))
+
+		$XFS_IO_PROG -c "pwrite -i /dev/urandom $FSIZE $(( $NEWSIZE - $FSIZE ))" $i > /dev/null &
 	done
 	wait $!
 }
@@ -118,6 +113,7 @@  $BTRFS_UTIL_PROG subvolume create $firstvol > /dev/null || _fail "btrfs subvolum
 dirp=`mktemp -duq $firstvol/dir.XXXXXX`
 _populate_fs -n 1 -f 20 -d 10 -r $dirp -s 10 -c
 SNAPNAME=0
+blksize=$(_get_block_size $SCRATCH_MNT)
 _create_snap $firstvol
 _save_checksum $firstvol $tmp.sv1.sum
 _verify_checksum $SNAPNAME $tmp.sv1.sum