From patchwork Mon Feb 18 02:25:09 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: NeilBrown <neilb@suse.de>
X-Patchwork-Id: 2155001
Return-Path: <linux-nfs-owner@vger.kernel.org>
X-Original-To: patchwork-linux-nfs@patchwork.kernel.org
Delivered-To: patchwork-process-083081@patchwork2.kernel.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by patchwork2.kernel.org (Postfix) with ESMTP id 2B48CDF283
	for <patchwork-linux-nfs@patchwork.kernel.org>;
	Mon, 18 Feb 2013 02:25:29 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757462Ab3BRCZ2 (ORCPT
	<rfc822;patchwork-linux-nfs@patchwork.kernel.org>);
	Sun, 17 Feb 2013 21:25:28 -0500
Received: from cantor2.suse.de ([195.135.220.15]:33953 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755687Ab3BRCZ1 (ORCPT <rfc822;linux-nfs@vger.kernel.org>);
	Sun, 17 Feb 2013 21:25:27 -0500
Received: from relay1.suse.de (unknown [195.135.220.254])
	by mx2.suse.de (Postfix) with ESMTP id 436F3A3DDF;
	Mon, 18 Feb 2013 03:25:22 +0100 (CET)
Date: Mon, 18 Feb 2013 13:25:09 +1100
From: NeilBrown <neilb@suse.de>
To: Jeff Layton <jlayton@redhat.com>
Cc: "Myklebust, Trond" <Trond.Myklebust@netapp.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>, NFS <linux-nfs@vger.kernel.org>
Subject: Re: More fun with unmounting ESTALE directories.
Message-ID: <20130218132509.0ce779de@notabene.brown>
In-Reply-To: <20130214104230.013b7f71@tlielax.poochiereds.net>
References: <20130212113813.427b8e05@notabene.brown>
	<20130214104230.013b7f71@tlielax.poochiereds.net>
X-Mailer: Claws Mail 3.8.1 (GTK+ 2.24.10; x86_64-suse-linux-gnu)
Mime-Version: 1.0
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-nfs.vger.kernel.org>
X-Mailing-List: linux-nfs@vger.kernel.org

On Thu, 14 Feb 2013 10:42:30 -0500 Jeff Layton <jlayton@redhat.com> wrote:

> On Tue, 12 Feb 2013 11:38:13 +1100
> NeilBrown <neilb@suse.de> wrote:
> 
> > 
> > I've been exploring difficulties with unmounting stale directories and
> > discovered another bug.
> > 
> > If I:
> > 
> > SERVER:  mkdir /foo/bar  #and make sure it is exported
> > CLIENT:  mount -o vers=4 server:/foo/bar /mnt
> > SERVER:  rm -r /foo
> > CLIENT:  > /mnt/baz # gets an error of course
> > CLIENT:  ls -l /mnt # error again
> > CLIENT:  umount /mnt
> > 
> > The result of that last command is:
> > 
> > /mnt was not found in /proc/mounts
> > /mnt was not found in /proc/mounts
> > 
> > Strange?
> > 
> > cat /proc/mounts
> > 
> > .....
> > 10.0.2.2://foo/bar /mnt\040(deleted) nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.2.15,minorversion=0,local_lock=none,addr=10.0.2.2 0 0
> > ....
> > 
> > Notice the "\040(deleted)".
> > 
> > NFS has unhashed that directory because it is obviously bad, and d_path()
> > notices and adds " (deleted)".
> > 
> > Now I might be able to argue that NFS shouldn't be unhashing a directory that
> > is a mountpoint - it certainly seems strange behaviour.
> > 
> > But I think I can more strongly argue that /proc/mounts shouldn't be showing
> > the mounted directory, but instead the directory that it is mounted on.
> > Obviously these both have the same name so it shouldn't matter ... except
> > that here is a case where it does.
> > 
> > I "fixed" it with
> > 
> > --- a/fs/proc_namespace.c
> > +++ b/fs/proc_namespace.c
> > @@ -93,7 +93,7 @@ static int show_vfsmnt(struct seq_file *m, struct vfsmount *mnt)
> >  {
> >  	struct mount *r = real_mount(mnt);
> >  	int err = 0;
> > -	struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt };
> > +	struct path mnt_path = { .dentry = r->mnt_mountpoint, .mnt = &(r->mnt_parent)->mnt };
> >  	struct super_block *sb = mnt_path.dentry->d_sb;
> >  
> >  	if (sb->s_op->show_devname) {
> > 
> > though I suspect that isn't safe and needs some locking.
> > 
> > Probably both should be fixed:  NFS should not invalidate any mounted
> > directory, and show_vfsmnt() should report the mointpoint, not the mounted
> > directory.
> > 
> > I can't figure out any way to get NFS to not invalidate the mounted directory.
> > I think it happens in nfs_lookup_revalidate() when it calls d_drop(), but I
> > don't know how to tell if a given dentry is a mnt_root for any mountpoint.
> > 
> > Suggestions?  Thoughts?
> > 
> > Thanks,
> > NeilBrown
> > 
> 
> I've also been looking at some weird ESTALE problems. Here's another
> fun one that doesn't involve mountpoints. Assume here that we're
> working in the same exported directory on client and server:
> 
>     server# mkdir a
>     client# cd a
>     server# mv a a.bak
>     client# sleep 30  # (or whatever the dir attrcache timeout is)
>     client# stat .
>     stat: cannot stat ‘.’: Stale NFS file handle
> 
> Obviously, "." should not be stale. It got renamed, but the inode still
> exists on the server.
> 
> If you sniff on the wire, you'll see that the server doesn't ever send
> an ESTALE here. What happens is that due to FS_REVAL_DOT being set, we
> end up trying to revalidate the dentry that "." refers to. We find that
> the parent changed (obviously) and then try to redo the lookup of "a".
> At that point we notice that it doesn't exist and turn it into ESTALE.
> 
> I don't really understand the point of FS_REVAL_DOT. What does that
> actually buy us? I wonder if removing it would also help your testcase?
> 

I think that is a slightly different issue, but certainly related. 
I have hit your problem before and have the following patch in SLES.  I think
I tried pushing it upstream, but didn't get much in the way of a useful
response.
(patch is space-damaged - don't try to apply with 'patch').

BTW I have another problem, related to this one and which could be fixed by
removing FS_REVAL_DOT.

If you
   mount -o vers=4,noac server:/some/path /mnt
then stop nfsd on the server and
   umount /mnt

it hangs.
Partly it hangs because  'mount' tries to do a 'readlink' on the mountpoint.
I can probably get it to not do that (or use --no-canonicalize).
But then sys_umount hangs because it tries to check with the server that the
thing being unmounted really is still a directory...

I would be really nice if sys_unmount used a LOOKUP_MOUNTPOINT flag that
works a bit like LOOKUP_PARENT and LOOKUP_NOFOLLOW in that it skips the very
last step and returns the mounted-on directory, not the mountpoint that is
mounted there - or at least makes sure not revalidate happens on that final
mounted directory.


I think FS_REVAL_DOT is needed so that if you call stat("."), it will update
attributes from the server if the cache is old.  However it seems to do a
whole lot more than that, including "lookup" calls which it I'm sure is wrong.

NeilBrown


Subject: [PATCH] nfs - handle d_revalidate of 'dot' correctly.

When d_revalidate is called on a dentry because FS_REVAL_DOT is set
it isn't really appropriate to revalidate the name.

If the path was simply ".", then the current-working-directory could
have been renamed on the server and should still be accessible as "."
even if it has a new name.

If the path was "/some/long/path/.", then the final component ("path" in
this case) has already been revalidated and there is no particular
need to do it again.

If we change nd->last_type to refer to "the last component looked at"
rather than just "the last component", then these cases can be
detected by "nd->last_type != LAST_NORM".

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c   |    2 +-
 fs/nfs/dir.c |    9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

--- linux-3.0-SLE11-SP2.orig/fs/namei.c
+++ linux-3.0-SLE11-SP2/fs/namei.c
@@ -1460,6 +1460,7 @@ static int link_path_walk(const char *na
                        }
                }
 
+               nd->last_type = type;
                /* remove trailing slashes? */
                if (!c)
                        goto last_component;
@@ -1486,7 +1487,6 @@ last_component:
                /* Clear LOOKUP_CONTINUE iff it was previously unset */
                nd->flags &= lookup_flags | ~LOOKUP_CONTINUE;
                nd->last = this;
-               nd->last_type = type;
                return 0;
        }
        terminate_walk(nd);
--- linux-3.0-SLE11-SP2.orig/fs/nfs/dir.c
+++ linux-3.0-SLE11-SP2/fs/nfs/dir.c
@@ -1138,6 +1138,15 @@ static int nfs_lookup_revalidate(struct
        if (NFS_STALE(inode))
                goto out_bad;
 
+       if (nd->last_type != LAST_NORM) {
+               /* name not relevant, just inode */
+               error = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+               if (error)
+                       goto out_bad;
+               else
+                       goto out_valid;
+       }
+
        error = -ENOMEM;
        fhandle = nfs_alloc_fhandle();
        fattr = nfs_alloc_fattr();