Message ID | 553663B7.7030506@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Apr 21, 2015 at 10:50:31PM +0800, Kinglong Mee wrote: > If there are some mount points(not exported for nfs) under > a pseudo root, after client's operation of those entry under > the root, anyone can unmount those mount points until nfsd stop. > > # cat /etc/exports > /nfs/xfs *(rw,insecure,no_subtree_check,no_root_squash) > /nfs/pnfs *(rw,insecure,no_subtree_check,no_root_squash) > # ll /nfs/ > total 0 > drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs > drwxr-xr-x. 3 root root 84 Apr 21 22:27 test > drwxr-xr-x. 2 root root 6 Apr 20 22:01 xfs > # mount /dev/sde /nfs/test > # df > Filesystem 1K-blocks Used Available Use% Mounted on > ...... > /dev/sdd 1038336 32944 1005392 4% /nfs/pnfs > /dev/sdc 10475520 32928 10442592 1% /nfs/xfs > /dev/sde 999320 1284 929224 1% /nfs/test > # mount -t nfs 127.0.0.1:/nfs/ /mnt > # ll /mnt/*/ > /mnt/pnfs/: > total 0 > -rw-r--r--. 1 root root 0 Apr 21 22:23 attr > drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp > > /mnt/xfs/: > total 0 > # umount /nfs/test/ > umount: /nfs/test/: target is busy > (In some cases useful info about processes that > use the device is found by lsof(8) or fuser(1).) > > I don't think that's user expect, they want umount /nfs/test/. I agree that this is annoying. > It's caused by exports cache of nfsd holds the reference of > the path (here is /nfs/test/), so, it can't umounted until nfsd stop. A cache flush (exportfs -f) should also do the job, as long as you then manage to unmount before the cache entry is re-added. (So I suppose if you wanted to be completely safe: exportfs -f killall -STOP rpc.mountd umount /nfs/test/ killall -CONT rpc.mountd ) > There is one strange thing exist, the export cache status of > /nfs/test/ is CACHE_NEGATIVE, also hold the reference of the path. > I think nfsd should not hold the reference when CACHE_NEGATIVE. > > I can't find a better way to put the reference, just path it after > svc_export_update(). Unfortunately ex_path is part of the key--it's what we need to do lookups in this cache. I'm surprised something like ls /mnt/ still works after this. I'm not sure what to do. I wonder if we really need the struct path as part of the key, or if we could get away with just a string path? That's what we're actually passing to userspace, after all. --b. > > Any comments are welcome, thanks. > > Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> > --- > fs/nfsd/export.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c > index c3e3b6e..5595cffc7 100644 > --- a/fs/nfsd/export.c > +++ b/fs/nfsd/export.c > @@ -793,9 +793,15 @@ svc_export_update(struct svc_export *new, struct svc_export *old) > int hash = svc_export_hash(old); > > ch = sunrpc_cache_update(old->cd, &new->h, &old->h, hash); > - if (ch) > - return container_of(ch, struct svc_export, h); > - else > + if (ch) { > + struct svc_export *exp = container_of(ch, struct svc_export, h); > + if (test_bit(CACHE_NEGATIVE, &exp->h.flags)) { > + path_put(&exp->ex_path); > + exp->ex_path.mnt = NULL; > + exp->ex_path.dentry = NULL; > + } > + return exp; > + } else > return NULL; > } > > -- > 2.3.5 -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 21 Apr 2015 17:54:17 -0400 "J. Bruce Fields" <bfields@fieldses.org> > > Unfortunately ex_path is part of the key--it's what we need to do > lookups in this cache. I'm surprised something like ls /mnt/ still > works after this. > > I'm not sure what to do. I wonder if we really need the struct path as > part of the key, or if we could get away with just a string path? > That's what we're actually passing to userspace, after all. I wonder if we can use Al's new 'fs_pin' stuff, so that when the filesystem is unmounted we get a call-back and can release the filesystem... NeilBrown
On 4/22/2015 5:54 AM, J. Bruce Fields wrote: > On Tue, Apr 21, 2015 at 10:50:31PM +0800, Kinglong Mee wrote: >> If there are some mount points(not exported for nfs) under >> a pseudo root, after client's operation of those entry under >> the root, anyone can unmount those mount points until nfsd stop. >> >> # cat /etc/exports >> /nfs/xfs *(rw,insecure,no_subtree_check,no_root_squash) >> /nfs/pnfs *(rw,insecure,no_subtree_check,no_root_squash) >> # ll /nfs/ >> total 0 >> drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs >> drwxr-xr-x. 3 root root 84 Apr 21 22:27 test >> drwxr-xr-x. 2 root root 6 Apr 20 22:01 xfs >> # mount /dev/sde /nfs/test >> # df >> Filesystem 1K-blocks Used Available Use% Mounted on >> ...... >> /dev/sdd 1038336 32944 1005392 4% /nfs/pnfs >> /dev/sdc 10475520 32928 10442592 1% /nfs/xfs >> /dev/sde 999320 1284 929224 1% /nfs/test >> # mount -t nfs 127.0.0.1:/nfs/ /mnt >> # ll /mnt/*/ >> /mnt/pnfs/: >> total 0 >> -rw-r--r--. 1 root root 0 Apr 21 22:23 attr >> drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp >> >> /mnt/xfs/: >> total 0 >> # umount /nfs/test/ >> umount: /nfs/test/: target is busy >> (In some cases useful info about processes that >> use the device is found by lsof(8) or fuser(1).) >> >> I don't think that's user expect, they want umount /nfs/test/. > > I agree that this is annoying. > >> It's caused by exports cache of nfsd holds the reference of >> the path (here is /nfs/test/), so, it can't umounted until nfsd stop. > > A cache flush (exportfs -f) should also do the job, as long as you > then manage to unmount before the cache entry is re-added. (So I > suppose if you wanted to be completely safe: > > exportfs -f > killall -STOP rpc.mountd > umount /nfs/test/ > killall -CONT rpc.mountd > ) Yes, that's right. > >> There is one strange thing exist, the export cache status of >> /nfs/test/ is CACHE_NEGATIVE, also hold the reference of the path. >> I think nfsd should not hold the reference when CACHE_NEGATIVE. >> >> I can't find a better way to put the reference, just path it after >> svc_export_update(). > > Unfortunately ex_path is part of the key--it's what we need to do > lookups in this cache. I'm surprised something like ls /mnt/ still > works after this. I just do some simplify tests, and can't make sure everything is right. > > I'm not sure what to do. I wonder if we really need the struct path as > part of the key, or if we could get away with just a string path? > That's what we're actually passing to userspace, after all. I have do a simplify grep of ex_path, it is used in many places. I don't think a string path can resolve the problem. Reference of dentry/mnt is like a cache, avoids re-finding of them, it is necessary to store them in svc_export. Neil points out another way of 'fs_pin', I will check that. thanks, Kinglong Mee > > --b. > >> >> Any comments are welcome, thanks. >> >> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> >> --- >> fs/nfsd/export.c | 12 +++++++++--- >> 1 file changed, 9 insertions(+), 3 deletions(-) >> >> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c >> index c3e3b6e..5595cffc7 100644 >> --- a/fs/nfsd/export.c >> +++ b/fs/nfsd/export.c >> @@ -793,9 +793,15 @@ svc_export_update(struct svc_export *new, struct svc_export *old) >> int hash = svc_export_hash(old); >> >> ch = sunrpc_cache_update(old->cd, &new->h, &old->h, hash); >> - if (ch) >> - return container_of(ch, struct svc_export, h); >> - else >> + if (ch) { >> + struct svc_export *exp = container_of(ch, struct svc_export, h); >> + if (test_bit(CACHE_NEGATIVE, &exp->h.flags)) { >> + path_put(&exp->ex_path); >> + exp->ex_path.mnt = NULL; >> + exp->ex_path.dentry = NULL; >> + } >> + return exp; >> + } else >> return NULL; >> } >> >> -- >> 2.3.5 > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 22, 2015 at 07:11:30PM +0800, Kinglong Mee wrote: > On 4/22/2015 5:54 AM, J. Bruce Fields wrote: > > On Tue, Apr 21, 2015 at 10:50:31PM +0800, Kinglong Mee wrote: > >> If there are some mount points(not exported for nfs) under > >> a pseudo root, after client's operation of those entry under > >> the root, anyone can unmount those mount points until nfsd stop. > >> > >> # cat /etc/exports > >> /nfs/xfs *(rw,insecure,no_subtree_check,no_root_squash) > >> /nfs/pnfs *(rw,insecure,no_subtree_check,no_root_squash) > >> # ll /nfs/ > >> total 0 > >> drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs > >> drwxr-xr-x. 3 root root 84 Apr 21 22:27 test > >> drwxr-xr-x. 2 root root 6 Apr 20 22:01 xfs > >> # mount /dev/sde /nfs/test > >> # df > >> Filesystem 1K-blocks Used Available Use% Mounted on > >> ...... > >> /dev/sdd 1038336 32944 1005392 4% /nfs/pnfs > >> /dev/sdc 10475520 32928 10442592 1% /nfs/xfs > >> /dev/sde 999320 1284 929224 1% /nfs/test > >> # mount -t nfs 127.0.0.1:/nfs/ /mnt > >> # ll /mnt/*/ > >> /mnt/pnfs/: > >> total 0 > >> -rw-r--r--. 1 root root 0 Apr 21 22:23 attr > >> drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp > >> > >> /mnt/xfs/: > >> total 0 > >> # umount /nfs/test/ > >> umount: /nfs/test/: target is busy > >> (In some cases useful info about processes that > >> use the device is found by lsof(8) or fuser(1).) > >> > >> I don't think that's user expect, they want umount /nfs/test/. > > > > I agree that this is annoying. > > > >> It's caused by exports cache of nfsd holds the reference of > >> the path (here is /nfs/test/), so, it can't umounted until nfsd stop. > > > > A cache flush (exportfs -f) should also do the job, as long as you > > then manage to unmount before the cache entry is re-added. (So I > > suppose if you wanted to be completely safe: > > > > exportfs -f > > killall -STOP rpc.mountd > > umount /nfs/test/ > > killall -CONT rpc.mountd > > ) > > Yes, that's right. > > > > >> There is one strange thing exist, the export cache status of > >> /nfs/test/ is CACHE_NEGATIVE, also hold the reference of the path. > >> I think nfsd should not hold the reference when CACHE_NEGATIVE. > >> > >> I can't find a better way to put the reference, just path it after > >> svc_export_update(). > > > > Unfortunately ex_path is part of the key--it's what we need to do > > lookups in this cache. I'm surprised something like ls /mnt/ still > > works after this. > > I just do some simplify tests, and can't make sure everything is right. > > > > > I'm not sure what to do. I wonder if we really need the struct path as > > part of the key, or if we could get away with just a string path? > > That's what we're actually passing to userspace, after all. > > I have do a simplify grep of ex_path, it is used in many places. > I don't think a string path can resolve the problem. I was thinking we could still keep ex_path, but as part of the "value" rather than part of the "key"? I'm not sure if that works. > Reference of dentry/mnt is like a cache, avoids re-finding of them, > it is necessary to store them in svc_export. > > Neil points out another way of 'fs_pin', I will check that. Yes, that'd be interesting. On a very quick look--I don't understand what it's meant to do at all. But if it does provide a way to get a callback on umount, that'd certainly be interesting.... --b. > > thanks, > Kinglong Mee > > > > > --b. > > > >> > >> Any comments are welcome, thanks. > >> > >> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> > >> --- > >> fs/nfsd/export.c | 12 +++++++++--- > >> 1 file changed, 9 insertions(+), 3 deletions(-) > >> > >> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c > >> index c3e3b6e..5595cffc7 100644 > >> --- a/fs/nfsd/export.c > >> +++ b/fs/nfsd/export.c > >> @@ -793,9 +793,15 @@ svc_export_update(struct svc_export *new, struct svc_export *old) > >> int hash = svc_export_hash(old); > >> > >> ch = sunrpc_cache_update(old->cd, &new->h, &old->h, hash); > >> - if (ch) > >> - return container_of(ch, struct svc_export, h); > >> - else > >> + if (ch) { > >> + struct svc_export *exp = container_of(ch, struct svc_export, h); > >> + if (test_bit(CACHE_NEGATIVE, &exp->h.flags)) { > >> + path_put(&exp->ex_path); > >> + exp->ex_path.mnt = NULL; > >> + exp->ex_path.dentry = NULL; > >> + } > >> + return exp; > >> + } else > >> return NULL; > >> } > >> > >> -- > >> 2.3.5 > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 22 Apr 2015 11:07:03 -0400 "J. Bruce Fields" <bfields@fieldses.org> wrote: > > Reference of dentry/mnt is like a cache, avoids re-finding of them, > > it is necessary to store them in svc_export. > > > > Neil points out another way of 'fs_pin', I will check that. > > Yes, that'd be interesting. On a very quick look--I don't understand > what it's meant to do at all. But if it does provide a way to get a > callback on umount, that'd certainly be interesting.... Yeah, on a quick look it isn't really obvious at all. But I didn't read the code. I read https://lwn.net/Articles/636730/ which says: In its place is a mechanism by which an object can be added to a vfsmount structure (which represents a mounted filesystem); that object supports only one method: kill(). These "pin" objects hang around until the final reference to the vfsmount goes away, at which point each one's kill() function is called. It thus is a clean mechanism for performing cleanup when a filesystem goes away. This is used to close "BSD process accounting" files on umount, and sound like the perfect interface for flushing things from the sunrpc cache on umount. NeilBrown
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index c3e3b6e..5595cffc7 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -793,9 +793,15 @@ svc_export_update(struct svc_export *new, struct svc_export *old) int hash = svc_export_hash(old); ch = sunrpc_cache_update(old->cd, &new->h, &old->h, hash); - if (ch) - return container_of(ch, struct svc_export, h); - else + if (ch) { + struct svc_export *exp = container_of(ch, struct svc_export, h); + if (test_bit(CACHE_NEGATIVE, &exp->h.flags)) { + path_put(&exp->ex_path); + exp->ex_path.mnt = NULL; + exp->ex_path.dentry = NULL; + } + return exp; + } else return NULL; }
If there are some mount points(not exported for nfs) under a pseudo root, after client's operation of those entry under the root, anyone can unmount those mount points until nfsd stop. # cat /etc/exports /nfs/xfs *(rw,insecure,no_subtree_check,no_root_squash) /nfs/pnfs *(rw,insecure,no_subtree_check,no_root_squash) # ll /nfs/ total 0 drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs drwxr-xr-x. 3 root root 84 Apr 21 22:27 test drwxr-xr-x. 2 root root 6 Apr 20 22:01 xfs # mount /dev/sde /nfs/test # df Filesystem 1K-blocks Used Available Use% Mounted on ...... /dev/sdd 1038336 32944 1005392 4% /nfs/pnfs /dev/sdc 10475520 32928 10442592 1% /nfs/xfs /dev/sde 999320 1284 929224 1% /nfs/test # mount -t nfs 127.0.0.1:/nfs/ /mnt # ll /mnt/*/ /mnt/pnfs/: total 0 -rw-r--r--. 1 root root 0 Apr 21 22:23 attr drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp /mnt/xfs/: total 0 # umount /nfs/test/ umount: /nfs/test/: target is busy (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1).) I don't think that's user expect, they want umount /nfs/test/. It's caused by exports cache of nfsd holds the reference of the path (here is /nfs/test/), so, it can't umounted until nfsd stop. There is one strange thing exist, the export cache status of /nfs/test/ is CACHE_NEGATIVE, also hold the reference of the path. I think nfsd should not hold the reference when CACHE_NEGATIVE. I can't find a better way to put the reference, just path it after svc_export_update(). Any comments are welcome, thanks. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> --- fs/nfsd/export.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)