Dcache oops - Patchwork

Message ID	20160604005611.GA14480@ZenIV.linux.org.uk (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-nfs-owner@kernel.org> Date: Sat, 4 Jun 2016 01:56:11 +0100 From: Al Viro <viro@ZenIV.linux.org.uk> To: Jeff Layton <jlayton@poochiereds.net> Cc: Linus Torvalds <torvalds@linux-foundation.org>, "<linux-kernel@vger.kernel.org> Mailing List" <linux-kernel@vger.kernel.org>, "<linux-fsdevel@vger.kernel.org>" <linux-fsdevel@vger.kernel.org>, Oleg Drokin <green@linuxhacker.ru>, linux-nfs@vger.kernel.org Subject: Re: Dcache oops Message-ID: <20160604005611.GA14480@ZenIV.linux.org.uk> References: <74306F63-DBDF-4DED-85D2-5C3FB21B8A1E@linuxhacker.ru> <20160603182203.GR14480@ZenIV.linux.org.uk> <4285E00F-7228-485C-AD32-97552ED746F2@linuxhacker.ru> <20160603200759.GS14480@ZenIV.linux.org.uk> <CA+55aFznPit97evuwZHdCG55=N0QbkFSST2ydKxvExQTR-YXTg@mail.gmail.com> <20160603212652.GT14480@ZenIV.linux.org.uk> <CA+55aFyYkbXqoJ3PqfrnxWDEXJM_BH6Q_OdKf+UVJYEUF4gdQA@mail.gmail.com> <20160603222355.GW14480@ZenIV.linux.org.uk> <20160603223700.GY14480@ZenIV.linux.org.uk> <DA67EEEA-D769-4D0F-9C20-6145F2B0808F@linuxhacker.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <DA67EEEA-D769-4D0F-9C20-6145F2B0808F@linuxhacker.ru> User-Agent: Mutt/1.6.0 (2016-04-01) Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk

Message ID

20160604005611.GA14480@ZenIV.linux.org.uk (mailing list archive)

State

New, archived

Headers

Date: Sat, 4 Jun 2016 01:56:11 +0100
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Jeff Layton <jlayton@poochiereds.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	"<linux-kernel@vger.kernel.org> Mailing List" 
	<linux-kernel@vger.kernel.org>,
	"<linux-fsdevel@vger.kernel.org>" <linux-fsdevel@vger.kernel.org>,
	Oleg Drokin <green@linuxhacker.ru>, linux-nfs@vger.kernel.org
Subject: Re: Dcache oops
Message-ID: <20160604005611.GA14480@ZenIV.linux.org.uk>
References: <74306F63-DBDF-4DED-85D2-5C3FB21B8A1E@linuxhacker.ru>
	<20160603182203.GR14480@ZenIV.linux.org.uk>
	<4285E00F-7228-485C-AD32-97552ED746F2@linuxhacker.ru>
	<20160603200759.GS14480@ZenIV.linux.org.uk>
	<CA+55aFznPit97evuwZHdCG55=N0QbkFSST2ydKxvExQTR-YXTg@mail.gmail.com>
	<20160603212652.GT14480@ZenIV.linux.org.uk>
	<CA+55aFyYkbXqoJ3PqfrnxWDEXJM_BH6Q_OdKf+UVJYEUF4gdQA@mail.gmail.com>
	<20160603222355.GW14480@ZenIV.linux.org.uk>
	<20160603223700.GY14480@ZenIV.linux.org.uk>
	<DA67EEEA-D769-4D0F-9C20-6145F2B0808F@linuxhacker.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <DA67EEEA-D769-4D0F-9C20-6145F2B0808F@linuxhacker.ru>
User-Agent: Mutt/1.6.0 (2016-04-01)
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk

Commit Message

Al Viro June 4, 2016, 12:56 a.m. UTC

On Fri, Jun 03, 2016 at 07:58:37PM -0400, Oleg Drokin wrote:

> > EOPENSTALE, that is...  Oleg, could you check if the following works?
> 
> Yes, this one lasted for an hour with no crashing, so it must be good.
> Thanks.
> (note, I am not equipped to verify correctness of NFS operations, though).

I suspect that Jeff Layton might have relevant regression tests.  Incidentally,
we really need a consolidated regression testsuite, including the tests you'd
been running.  Right now there's some stuff in xfstests, LTP and cthon; if
anything, this mess shows just why we need all of that and then some in
a single place.  Lustre stuff has caught a 3 years old NFS bug (missing
d_drop() in nfs_atomic_open()) and a year-old bug in handling of EOPENSTALE
retries on the last component of a trailing non-embedded symlink.  Neither
is hard to trigger; it's just that relevant tests hadn't been run on NFS,
period.

Jeff, could you verify that the following does not cause regressions in
stale fhandles treatment?  I want to rip the damn retry logics out of
do_last() and if the staleness had only been discovered inside of
nfs4_file_open() just have the upper-level logics handle it by doing
a normal LOOKUP_REVAL pass from scratch.  To hell with trying to be clever;
a few roundtrips it saves us in some cases is not worth the complexity and
potential for bugs.  I'm fairly sure that the time spent debugging this
particular turd exceeds the total amount of time it has ever saved,
and do_last() is in dire need of simplification.  All talk about "enough eyes"
isn't worth much when the readers of code in question feel like ripping their
eyes out...

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jeff Layton June 4, 2016, 12:25 p.m. UTC | #1

On Sat, 2016-06-04 at 01:56 +0100, Al Viro wrote:
> On Fri, Jun 03, 2016 at 07:58:37PM -0400, Oleg Drokin wrote:
> 
> > 
> > > 
> > > EOPENSTALE, that is...  Oleg, could you check if the following works?
> > Yes, this one lasted for an hour with no crashing, so it must be good.
> > Thanks.
> > (note, I am not equipped to verify correctness of NFS operations, though).
> I suspect that Jeff Layton might have relevant regression tests.  Incidentally,
> we really need a consolidated regression testsuite, including the tests you'd
> been running.  Right now there's some stuff in xfstests, LTP and cthon; if
> anything, this mess shows just why we need all of that and then some in
> a single place.  Lustre stuff has caught a 3 years old NFS bug (missing
> d_drop() in nfs_atomic_open()) and a year-old bug in handling of EOPENSTALE
> retries on the last component of a trailing non-embedded symlink.  Neither
> is hard to trigger; it's just that relevant tests hadn't been run on NFS,
> period.
> 
> Jeff, could you verify that the following does not cause regressions in
> stale fhandles treatment?  I want to rip the damn retry logics out of
> do_last() and if the staleness had only been discovered inside of
> nfs4_file_open() just have the upper-level logics handle it by doing
> a normal LOOKUP_REVAL pass from scratch.  To hell with trying to be clever;
> a few roundtrips it saves us in some cases is not worth the complexity and
> potential for bugs.  I'm fairly sure that the time spent debugging this
> particular turd exceeds the total amount of time it has ever saved,
> and do_last() is in dire need of simplification.  All talk about "enough eyes"
> isn't worth much when the readers of code in question feel like ripping their
> eyes out...
> 

Agreed. I see no need to optimize an error case here. Any performance
hit that we'd get here is almost certainly acceptable in this
situation. The main thing is that we prevent the ESTALE from bubbling
up into userland if we can avoid it by retrying.

No, I didn't have the test for this anymore unfortunately. RHQA might
have one though.

Either way, I cooked one up that does this on the server:

#!/bin/bash

while true; do
	rm -rf foo
	mkdir foo
	echo foo > foo/bar
	usleep 100000
done

...and then this on the client after mounting the fs with
lookupcache=none and noac.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int fd;

	while(1) {
		fd = open(argv[1], O_RDONLY);
		if (fd < 0) {
			if (errno == ESTALE) {
				printf("ESTALE");
				return 1;
			}
			continue;
		}
		close(fd);
	}
	return 0;
}

I did see some of the OPEN compounds come back with NFS4ERR_STALE on
the PUTFH op but no corresponding ESTALE error in userland. So, this
patch does seem to do the right thing.

Reviewed-and-Tested-by: Jeff Layton <jlayton@poochiereds.net>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Oleg Drokin June 4, 2016, 4:12 p.m. UTC | #2

On Jun 3, 2016, at 8:56 PM, Al Viro wrote:

> On Fri, Jun 03, 2016 at 07:58:37PM -0400, Oleg Drokin wrote:
> 
>>> EOPENSTALE, that is...  Oleg, could you check if the following works?
>> 
>> Yes, this one lasted for an hour with no crashing, so it must be good.
>> Thanks.
>> (note, I am not equipped to verify correctness of NFS operations, though).
> 
> I suspect that Jeff Layton might have relevant regression tests.  Incidentally,
> we really need a consolidated regression testsuite, including the tests you'd
> been running.  Right now there's some stuff in xfstests, LTP and cthon; if
> anything, this mess shows just why we need all of that and then some in
> a single place.  Lustre stuff has caught a 3 years old NFS bug (missing
> d_drop() in nfs_atomic_open()) and a year-old bug in handling of EOPENSTALE
> retries on the last component of a trailing non-embedded symlink.  Neither
> is hard to trigger; it's just that relevant tests hadn't been run on NFS,
> period.

BTW, the nets also have brought in another use after free in nfs4 state
tracking code (this is the one I was trying to hunt down from the start).
I'll submit a patch shortly.
And also there's a mysterious ext4 data corruption that I do not really fully
understand but only hit once so far.--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/fs/namei.c b/fs/namei.c
index 4c4f95a..3d9511e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3166,9 +3166,7 @@  static int do_last(struct nameidata *nd,
 	int acc_mode = op->acc_mode;
 	unsigned seq;
 	struct inode *inode;
-	struct path save_parent = { .dentry = NULL, .mnt = NULL };
 	struct path path;
-	bool retried = false;
 	int error;
 
 	nd->flags &= ~LOOKUP_PARENT;
@@ -3211,7 +3209,6 @@  static int do_last(struct nameidata *nd,
 			return -EISDIR;
 	}
 
-retry_lookup:
 	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
 		error = mnt_want_write(nd->path.mnt);
 		if (!error)
@@ -3292,23 +3289,14 @@  finish_lookup:
 	if (unlikely(error))
 		return error;
 
-	if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path.mnt) {
-		path_to_nameidata(&path, nd);
-	} else {
-		save_parent.dentry = nd->path.dentry;
-		save_parent.mnt = mntget(path.mnt);
-		nd->path.dentry = path.dentry;
-
-	}
+	path_to_nameidata(&path, nd);
 	nd->inode = inode;
 	nd->seq = seq;
 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
 finish_open:
 	error = complete_walk(nd);
-	if (error) {
-		path_put(&save_parent);
+	if (error)
 		return error;
-	}
 	audit_inode(nd->name, nd->path.dentry, 0);
 	error = -EISDIR;
 	if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry))
@@ -3331,13 +3319,9 @@  finish_open_created:
 		goto out;
 	BUG_ON(*opened & FILE_OPENED); /* once it's opened, it's opened */
 	error = vfs_open(&nd->path, file, current_cred());
-	if (!error) {
-		*opened |= FILE_OPENED;
-	} else {
-		if (error == -EOPENSTALE)
-			goto stale_open;
+	if (error)
 		goto out;
-	}
+	*opened |= FILE_OPENED;
 opened:
 	error = open_check_o_direct(file);
 	if (!error)
@@ -3353,26 +3337,7 @@  out:
 	}
 	if (got_write)
 		mnt_drop_write(nd->path.mnt);
-	path_put(&save_parent);
 	return error;
-
-stale_open:
-	/* If no saved parent or already retried then can't retry */
-	if (!save_parent.dentry || retried)
-		goto out;
-
-	BUG_ON(save_parent.dentry != dir);
-	path_put(&nd->path);
-	nd->path = save_parent;
-	nd->inode = dir->d_inode;
-	save_parent.mnt = NULL;
-	save_parent.dentry = NULL;
-	if (got_write) {
-		mnt_drop_write(nd->path.mnt);
-		got_write = false;
-	}
-	retried = true;
-	goto retry_lookup;
 }
 
 static int do_tmpfile(struct nameidata *nd, unsigned flags,