diff mbox series

[v5,5/5] nfs: Run on btrfs, ext4, xfs

Message ID 20230504131414.3826283-6-pvorel@suse.cz (mailing list archive)
State New, archived
Headers show
Series NFS: test on btrfs, ext4, xfs filesystems | expand

Commit Message

Petr Vorel May 4, 2023, 1:14 p.m. UTC
because there are some problems with at least vfat, exfat and tmpfs (on
nfs-utils < 2) instead of testing on all available filesystems, test
just on modern Linux filesystems btrfs, ext4, xfs.

Use variables:
* TST_ALL_FILESYSTEMS=1 to run on all filesystems
* TST_FORMAT_DEVICE=1 to get loop device formatted
* TST_MOUNT_DEVICE=1 to get it mounted

Filesystems (tested the usual LTP way on loop device) are used for
server side (exportfs), client side (NFS mount) is kept outside of it.

For some reason umounting needs some time before NFS server stops using
underlying loop device. Also exportfs needs time before files can be
removed. Otherwise second umounting fails:

nfs07 4 TINFO: Cleaning up testcase
umount: /var/tmp/LTP_nfs07.FNZ7yCbqZe/mntpoint: target is busy.
nfs07 4 TINFO: umount(/var/tmp/LTP_nfs07.FNZ7yCbqZe/mntpoint) failed, try 1 ...
nfs07 4 TINFO: Likely gvfsd-trash is probing newly mounted  fs, kill it to speed up tests.
umount: /var/tmp/LTP_nfs07.FNZ7yCbqZe/mntpoint: target is busy.

Solved with adding sleep for 2 sec after first umount and sleeps for 1 sec after exportfs.
Second umount of the loop device in tst_test.sh works without any extra
sleep.

Signed-off-by: Petr Vorel <pvorel@suse.cz>
---
 testcases/network/nfs/nfs_stress/nfs_lib.sh | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Cyril Hrubis May 4, 2023, 2:01 p.m. UTC | #1
Hi!
>  nfs_get_remote_path()
> @@ -210,6 +213,7 @@ nfs_cleanup()
>  		grep -q "$local_dir" /proc/mounts && umount $local_dir
>  		n=$(( n + 1 ))
>  	done
> +	sleep 2
>  
>  	n=0
>  	for i in $VERSION; do
> @@ -219,12 +223,15 @@ nfs_cleanup()
>  		if tst_net_use_netns; then
>  			if test -d $remote_dir; then
>  				exportfs -u *:$remote_dir
> +				sleep 1
>  				rm -rf $remote_dir
>  			fi
>  		else
>  			tst_rhost_run -c "test -d $remote_dir && exportfs -u *:$remote_dir"
> +			sleep 1
>  			tst_rhost_run -c "test -d $remote_dir && rm -rf $remote_dir"
>  		fi
> +
>  		n=$(( n + 1 ))
>  	done

Generally I'm not happy about the sleeps in the code, the main problem
is that the test still may fail in a case that something was slower than
usuall and we decided to proceed after we slept for a pre-defined
interval. Ideally these should be replaced with a active waiting, i.e.
loop that checks some condition 10 times in a second or so. Hoewever I'm
okay with getting this in so that we manage to have these tests in
before the release and fixing it later on.
Petr Vorel May 4, 2023, 10 p.m. UTC | #2
> Hi!
> >  nfs_get_remote_path()
> > @@ -210,6 +213,7 @@ nfs_cleanup()
> >  		grep -q "$local_dir" /proc/mounts && umount $local_dir
> >  		n=$(( n + 1 ))
> >  	done
> > +	sleep 2

> >  	n=0
> >  	for i in $VERSION; do
> > @@ -219,12 +223,15 @@ nfs_cleanup()
> >  		if tst_net_use_netns; then
> >  			if test -d $remote_dir; then
> >  				exportfs -u *:$remote_dir
> > +				sleep 1
> >  				rm -rf $remote_dir
> >  			fi
> >  		else
> >  			tst_rhost_run -c "test -d $remote_dir && exportfs -u *:$remote_dir"
> > +			sleep 1
> >  			tst_rhost_run -c "test -d $remote_dir && rm -rf $remote_dir"
> >  		fi
> > +
> >  		n=$(( n + 1 ))
> >  	done

> Generally I'm not happy about the sleeps in the code, the main problem
> is that the test still may fail in a case that something was slower than
> usuall and we decided to proceed after we slept for a pre-defined
> interval. Ideally these should be replaced with a active waiting, i.e.
> loop that checks some condition 10 times in a second or so. Hoewever I'm
> okay with getting this in so that we manage to have these tests in
> before the release and fixing it later on.

I wonder myself what is wrong and if there is a problem with my code (likely) or
with nfsd or with loop device (Christoph Hellwig has been changing
drivers/block/loop.c quite a lot).

Because if first umount is too early, it does not recover (i.e. tst_umount
contain 50 times trying to umount with 100ms, and that does not help).

I also wonder how should I detect it's ready to be umounted.
I'll have look if losetup "fuser -vm /dev/loop0" would help, but instead of
depending on the device, we'd probably want to look into /proc/mounts, right?
Tomorrow I'll test this:

check_umount()
{
	local i
	local dir="$1"
	local type="${2:-lhost}"
	local cmd="grep -q $dir /proc/mounts"

	for i in $(seq 50); do
		if [ "$type" = "lhost" ]; then
			$cmd || return
		else
			tst_rhost_run -c "$cmd" || return
		fi
		tst_sleep 100ms
		tst_res TWARN "failed to umount '$dir'"
	done
}

	for i in $VERSION; do
		local_dir="$(get_local_dir $i $n)"
		grep -q "$local_dir" /proc/mounts && umount $local_dir

		# instead of 'sleep 2' below check here:
		check_umount "$local_dir"

		n=$(( n + 1 ))
	done

	n=0
	for i in $VERSION; do
		type=$(get_socket_type $n)
		remote_dir="$(get_remote_dir $i $type)"

		if tst_net_use_netns; then
			if test -d $remote_dir; then
				exportfs -u *:$remote_dir
				check_umount "$remote_dir" # instead of sleep 1
				rm -rf $remote_dir
			fi
		else
			tst_rhost_run -c "test -d $remote_dir && exportfs -u *:$remote_dir"
			check_umount "$remote_dir" rhost # instead of sleep 1
			tst_rhost_run -c "test -d $remote_dir && rm -rf $remote_dir"
		fi

		n=$(( n + 1 ))
	done

[1] https://gitlab.com/psmisc/psmisc/-/blob/master/src/fuser.c
Petr Vorel May 4, 2023, 10:02 p.m. UTC | #3
Hi,

...
> check_umount()
> {
> 	local i
> 	local dir="$1"
> 	local type="${2:-lhost}"
> 	local cmd="grep -q $dir /proc/mounts"

> 	for i in $(seq 50); do
> 		if [ "$type" = "lhost" ]; then
> 			$cmd || return
> 		else
> 			tst_rhost_run -c "$cmd" || return
> 		fi
> 		tst_sleep 100ms
> 		tst_res TWARN "failed to umount '$dir'"
> 	done
Obviously TWARN should be here.
...

Kind regards,
Petr
Petr Vorel May 4, 2023, 11:23 p.m. UTC | #4
Hi all,

...
> Tomorrow I'll test this:

> check_umount()
> {
> 	local i
> 	local dir="$1"
> 	local type="${2:-lhost}"
> 	local cmd="grep -q $dir /proc/mounts"

> 	for i in $(seq 50); do
> 		if [ "$type" = "lhost" ]; then
> 			$cmd || return
> 		else
> 			tst_rhost_run -c "$cmd" || return
> 		fi
> 		tst_sleep 100ms
> 		tst_res TWARN "failed to umount '$dir'"
> 	done
> }

> 	for i in $VERSION; do
> 		local_dir="$(get_local_dir $i $n)"
> 		grep -q "$local_dir" /proc/mounts && umount $local_dir

> 		# instead of 'sleep 2' below check here:
> 		check_umount "$local_dir"

> 		n=$(( n + 1 ))
> 	done

> 	n=0
> 	for i in $VERSION; do
> 		type=$(get_socket_type $n)
> 		remote_dir="$(get_remote_dir $i $type)"

> 		if tst_net_use_netns; then
> 			if test -d $remote_dir; then
> 				exportfs -u *:$remote_dir
> 				check_umount "$remote_dir" # instead of sleep 1
> 				rm -rf $remote_dir
> 			fi
> 		else
> 			tst_rhost_run -c "test -d $remote_dir && exportfs -u *:$remote_dir"
> 			check_umount "$remote_dir" rhost # instead of sleep 1
> 			tst_rhost_run -c "test -d $remote_dir && rm -rf $remote_dir"
> 		fi

> 		n=$(( n + 1 ))
> 	done

I tried various code (not only the one - wrong - above), but nothing works.
With fuser -vm, I wonder if there is some bug in older nfs versions:

                     USER        PID ACCESS COMMAND
/tmp/LTP_nfs08.fncaxJum2W/mntpoint/4/tcp:
                     root     kernel mount /tmp/LTP_nfs08.fncaxJum2W/mntpoint
                     root     kernel knfsd /tmp/LTP_nfs08.fncaxJum2W/mntpoint/4

Looking at c6c7f2a84da4 ("nfsd: Ensure knfsd shuts down when the "nfsd" pseudofs
is unmounted"), maybe there are other bugs, which causes files are used.

I also tried to use /proc/fs/nfsd/unlock_filesystem and
/proc/fs/nfsd/write_unlock_ip which made things better a bit (6.2.12), but still
it's not 100% solution. Also, "Release all locks on a local file system"
(unlock_filesystem) and Release all locks used by a client (write_unlock_ip) are
unlikely what I'm searching.

		umount $local_dir
		echo "$local_dir" > /proc/fs/nfsd/unlock_filesystem
		# $(tst_ipaddr rhost) => IP of the server
		echo "$(tst_ipaddr rhost)" > /proc/fs/nfsd/write_unlock_ip
		...

		exportfs -u *:$remote_dir
		echo "$remote_dir" > /proc/fs/nfsd/unlock_filesystem

=> any hint is welcome.

Kind regards,
Petr
Cyril Hrubis May 5, 2023, 11:08 a.m. UTC | #5
Hi!
Let's get this merged with the sleeps for now and lets try to fix it
properly after the release. You can push the patchset with my Ack.
Petr Vorel May 5, 2023, 12:59 p.m. UTC | #6
Hi Cyril,

> Hi!
> Let's get this merged with the sleeps for now and lets try to fix it
> properly after the release. You can push the patchset with my Ack.

Patchset merged, I guess we can freeze LTP before the release.
Thanks!

Kind regards,
Petr
diff mbox series

Patch

diff --git a/testcases/network/nfs/nfs_stress/nfs_lib.sh b/testcases/network/nfs/nfs_stress/nfs_lib.sh
index 042fea5e4..abf7ba5a2 100644
--- a/testcases/network/nfs/nfs_stress/nfs_lib.sh
+++ b/testcases/network/nfs/nfs_stress/nfs_lib.sh
@@ -28,7 +28,10 @@  NFS_PARSE_ARGS_CALLER="$TST_PARSE_ARGS"
 TST_OPTS="v:t:$TST_OPTS"
 TST_PARSE_ARGS=nfs_parse_args
 TST_USAGE=nfs_usage
-TST_NEEDS_TMPDIR=1
+TST_ALL_FILESYSTEMS=1
+TST_SKIP_FILESYSTEMS="exfat,ext2,ext3,fuse,ntfs,vfat,tmpfs"
+TST_MOUNT_DEVICE=1
+TST_FORMAT_DEVICE=1
 TST_NEEDS_ROOT=1
 TST_NEEDS_CMDS="$TST_NEEDS_CMDS mount exportfs mount.nfs"
 TST_SETUP="${TST_SETUP:-nfs_setup}"
@@ -68,7 +71,7 @@  get_remote_dir()
 	local v="$1"
 	local n="$2"
 
-	echo "$TST_TMPDIR/$v/$n"
+	echo "$TST_MNTPOINT/$v/$n"
 }
 
 nfs_get_remote_path()
@@ -210,6 +213,7 @@  nfs_cleanup()
 		grep -q "$local_dir" /proc/mounts && umount $local_dir
 		n=$(( n + 1 ))
 	done
+	sleep 2
 
 	n=0
 	for i in $VERSION; do
@@ -219,12 +223,15 @@  nfs_cleanup()
 		if tst_net_use_netns; then
 			if test -d $remote_dir; then
 				exportfs -u *:$remote_dir
+				sleep 1
 				rm -rf $remote_dir
 			fi
 		else
 			tst_rhost_run -c "test -d $remote_dir && exportfs -u *:$remote_dir"
+			sleep 1
 			tst_rhost_run -c "test -d $remote_dir && rm -rf $remote_dir"
 		fi
+
 		n=$(( n + 1 ))
 	done
 }