Message ID: 20180511211339.GG30522@ZenIV.linux.org.uk (mailing list archive)
State: New, archived
On Fri, May 11, 2018 at 10:13:39PM +0100, Al Viro wrote:
> [-stable fodder; as it is, one can e.g. add
>         /mnt/cgroup localhost(rw,no_root_squash,fsid=4242)
> to /etc/exports,
>         mount -t cgroup none /mnt/cgroup
>         mkdir /tmp/a
>         mount -t nfs localhost://mnt/cgroup /tmp/a
>         mkdir /tmp/a/foo

How is the cgroup filesystem exportable?  That sounds like a bug in
itself.  We don't want people using NFS as some kind of weird remote
configuration protocol.

> and have knfsd oops; the patch below deals with that.
>
> Questions:
>         1) is fh_update() use below legitimate, or should we
> do fh_put/fh_compose instead?

fh_update looks OK to me, but do we need it here?  There's already a

        if (!err)
                err = fh_update(resfhp);

at the end of nfsd_create_locked.

>         2) is nfserr_serverfail valid for situation when
> directory created by such vfs_mkdir() manages to disappear
> under us immediately afterwards?  Or should we return nfserr_stale
> instead?

We just got a successful result on the create and the parent's still
locked, so if somebody hits this I think we want them reporting a bug,
not wasting time trying to find a mistake in their own logic.

> It's in vfs.git#for-linus, if you prefer to use git for review...
> ]
>
> That can (and does, on some filesystems) happen - ->mkdir() (and thus
> vfs_mkdir()) can legitimately leave its argument negative and just
> unhash it, counting upon the lookup to pick the object we'd created
> next time we try to look at that name.
>
> Some vfs_mkdir() callers forget about that possibility...

I'd rather have this in a helper function with a comment or two, but I
can do that as a followup patch.

--b.

>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 2410b093a2e6..b0555d7d8200 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1201,6 +1201,28 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
>                  break;
>          case S_IFDIR:
>                  host_err = vfs_mkdir(dirp, dchild, iap->ia_mode);
> +                if (!host_err && unlikely(d_unhashed(dchild))) {
> +                        struct dentry *d;
> +                        d = lookup_one_len(dchild->d_name.name,
> +                                           dchild->d_parent,
> +                                           dchild->d_name.len);
> +                        if (IS_ERR(d)) {
> +                                host_err = PTR_ERR(d);
> +                                break;
> +                        }
> +                        if (unlikely(d_is_negative(d))) {
> +                                dput(d);
> +                                err = nfserr_serverfault;
> +                                goto out;
> +                        }
> +                        dput(resfhp->fh_dentry);
> +                        resfhp->fh_dentry = dget(d);
> +                        err = fh_update(resfhp);
> +                        dput(dchild);
> +                        dchild = d;
> +                        if (err)
> +                                goto out;
> +                }
>                  break;
>          case S_IFCHR:
>          case S_IFBLK:
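A rough sketch of the helper alluded to above, for illustration only: the name
nfsd_pickup_unhashed_mkdir() and its signature are invented here, and the body
is just the hunk from Al's patch factored out, so it is a guess at the shape of
the followup rather than the followup itself.

/*
 * Illustrative sketch only (not the posted patch): after a vfs_mkdir()
 * that left its argument unhashed, look the name up again, swap the
 * dentry actually created into *dchildp, and point the reply
 * filehandle at it.
 */
static __be32
nfsd_pickup_unhashed_mkdir(struct svc_fh *resfhp, struct dentry **dchildp,
                           int *host_err)
{
        struct dentry *dchild = *dchildp;
        struct dentry *d;

        d = lookup_one_len(dchild->d_name.name, dchild->d_parent,
                           dchild->d_name.len);
        if (IS_ERR(d)) {
                /* let the caller map host_err to an nfs error as usual */
                *host_err = PTR_ERR(d);
                return 0;
        }
        if (unlikely(d_is_negative(d))) {
                /* the directory we just created vanished under us */
                dput(d);
                return nfserr_serverfault;
        }
        /* swap the freshly looked-up dentry into the reply filehandle */
        dput(resfhp->fh_dentry);
        resfhp->fh_dentry = dget(d);
        dput(dchild);
        *dchildp = d;
        return fh_update(resfhp);
}

The S_IFDIR arm of nfsd_create_locked() would then call this only when
vfs_mkdir() succeeded but left dchild unhashed, jumping to out on an nfs error
and breaking out of the switch when host_err got set, exactly as the open-coded
version in the patch does.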
On Mon, May 14, 2018 at 11:32:16AM -0400, J. Bruce Fields wrote:
> On Fri, May 11, 2018 at 10:13:39PM +0100, Al Viro wrote:
> > [-stable fodder; as it is, one can e.g. add
> >         /mnt/cgroup localhost(rw,no_root_squash,fsid=4242)
> > to /etc/exports,
> >         mount -t cgroup none /mnt/cgroup
> >         mkdir /tmp/a
> >         mount -t nfs localhost://mnt/cgroup /tmp/a
> >         mkdir /tmp/a/foo
>
> How is the cgroup filesystem exportable?  That sounds like a bug in
> itself.  We don't want people using NFS as some kind of weird remote
> configuration protocol.

You can't have open-by-fhandle without exportability.  And it's not
the only fs like that.

> > and have knfsd oops; the patch below deals with that.
> >
> > Questions:
> >         1) is fh_update() use below legitimate, or should we
> > do fh_put/fh_compose instead?
>
> fh_update looks OK to me, but do we need it here?  There's already a
>
>         if (!err)
>                 err = fh_update(resfhp);
>
> at the end of nfsd_create_locked.

Might be too late for that, though - the trouble hits when we hit
nfsd_create_setattr().

> >         2) is nfserr_serverfail valid for situation when
> > directory created by such vfs_mkdir() manages to disappear
> > under us immediately afterwards?  Or should we return nfserr_stale
> > instead?
>
> We just got a successful result on the create and the parent's still
> locked, so if somebody hits this I think we want them reporting a bug,
> not wasting time trying to find a mistake in their own logic.

No.  Suppose it's NFS-over-NFS (and yes, I agree that it's a bad idea;
somebody *has* done that).  Lookup after successful mkdir can legitimately
fail if it's been removed server-side.

And we *will* need to allow nfs_mkdir() to return that way in some cases
(== success with dentry passed to it left unhashed negative), I'm afraid ;-/
On Mon, May 14, 2018 at 04:45:51PM +0100, Al Viro wrote:
> On Mon, May 14, 2018 at 11:32:16AM -0400, J. Bruce Fields wrote:
> > On Fri, May 11, 2018 at 10:13:39PM +0100, Al Viro wrote:
> > > [-stable fodder; as it is, one can e.g. add
> > >         /mnt/cgroup localhost(rw,no_root_squash,fsid=4242)
> > > to /etc/exports,
> > >         mount -t cgroup none /mnt/cgroup
> > >         mkdir /tmp/a
> > >         mount -t nfs localhost://mnt/cgroup /tmp/a
> > >         mkdir /tmp/a/foo
> >
> > How is the cgroup filesystem exportable?  That sounds like a bug in
> > itself.  We don't want people using NFS as some kind of weird remote
> > configuration protocol.
>
> You can't have open-by-fhandle without exportability.  And it's not
> the only fs like that.

We could separate the two--add a flag to export_operations, if
necessary.

I haven't formulated a strong argument, but exporting those filesystems
makes me *really* uncomfortable.

Poking around....  Looks like this was added by aa8188253474 "kernfs:
add exportfs operations", and they really do claim a use case for
lookup by filehandle.

> > > and have knfsd oops; the patch below deals with that.
> > >
> > > Questions:
> > >         1) is fh_update() use below legitimate, or should we
> > > do fh_put/fh_compose instead?
> >
> > fh_update looks OK to me, but do we need it here?  There's already a
> >
> >         if (!err)
> >                 err = fh_update(resfhp);
> >
> > at the end of nfsd_create_locked.
>
> Might be too late for that, though - the trouble hits when we hit
> nfsd_create_setattr().

Oh, got it.  Could move the bottom fh_update to just above the
nfsd_create_setattr(), though?

> > >         2) is nfserr_serverfail valid for situation when
> > > directory created by such vfs_mkdir() manages to disappear
> > > under us immediately afterwards?  Or should we return nfserr_stale
> > > instead?
> >
> > We just got a successful result on the create and the parent's still
> > locked, so if somebody hits this I think we want them reporting a bug,
> > not wasting time trying to find a mistake in their own logic.
>
> No.  Suppose it's NFS-over-NFS (and yes, I agree that it's a bad idea;
> somebody *has* done that).  Lookup after successful mkdir can legitimately
> fail if it's been removed server-side.
>
> And we *will* need to allow nfs_mkdir() to return that way in some cases
> (== success with dentry passed to it left unhashed negative), I'm afraid ;-/

Thanks, makes sense.

--b.
On Mon, May 14, 2018 at 04:45:51PM +0100, Al Viro wrote:

> > >         2) is nfserr_serverfail valid for situation when
> > > directory created by such vfs_mkdir() manages to disappear
> > > under us immediately afterwards?  Or should we return nfserr_stale
> > > instead?
> >
> > We just got a successful result on the create and the parent's still
> > locked, so if somebody hits this I think we want them reporting a bug,
> > not wasting time trying to find a mistake in their own logic.
>
> No.  Suppose it's NFS-over-NFS (and yes, I agree that it's a bad idea;
> somebody *has* done that).  Lookup after successful mkdir can legitimately
> fail if it's been removed server-side.
>
> And we *will* need to allow nfs_mkdir() to return that way in some cases
> (== success with dentry passed to it left unhashed negative), I'm afraid ;-/

Consider the situation when you have NFS reexported - there's server S1,
exporting a filesystem to S2, which reexports it for client C.

On S2, process A does stat foo, gets ENOENT and proceeds to mkdir foo.
Dentry of foo is passed to nfs_mkdir(), which calls e.g. nfs3_proc_mkdir(),
which calls nfs3_do_create(), which requests S1 to create the damn thing.
Then, after that succeeds, it calls nfs_instantiate().  There we would
proceed to get the inode and call d_add(dentry, inode).

In the meanwhile, C, having figured out the fhandle S2 would assign to
foo (e.g. having spied upon the traffic, etc.) sends that fhandle to
S2.  nfs_fh_to_dentry() is called by process B on S2 (either knfsd, or,
in case C==S2 and the attacker has done open-by-fhandle - the caller
of open_by_handle(2)).  It gets the inode - the damn thing has been
created on S1 already.  That inode still has no dentries attached to
it (B has just set it up), so d_obtain_alias() creates one and attaches
to it.

Now A gets around to nfs_fhget() and finds the same inode.  Voila -
d_add(dentry, inode) and we are fucked; two dentries over the same
directory inode.  Fun starts very shortly when fs/exportfs/* code
is used by B to reconnect its dentry - the things really get bogus
once that constraint is violated.

The obvious solution would be to replace that d_add() with
        res = d_splice_alias(inode, dentry);
        if (IS_ERR(res)) {
                error = PTR_ERR(res);
                goto out_error;
        }
        dput(res);
leaving the dentry passed to nfs_mkdir() unhashed and negative if
we hit that race.  That's fine - the next lookup (e.g. the one
done by exportfs to reconnect the sucker) will find the right
dentry in dcache; it's just that it won't be the one passed to
vfs_mkdir().

That's different from the local case - there mkdir gets the inumber,
inserts locked in-core inode with that inumber into icache and
only then proceeds to set on-disk data up.  Anyone guessing the
inumber (and thus the fhandle) will either get -ESTALE (if they
come first, as in this scenario) or find the in-core inode mkdir
is setting up and wait for it to be unlocked, at which point
d_obtain_alias() will already find it attached to dentry passed
to mkdir.

But that critically relies upon the icache search key being known
to mkdir *BEFORE* the on-disk metadata starts looking acceptable.
For NFS we really can't do that - there the key is S1's fhandle
and we don't get that until S1 has created the damn thing.

I don't see any variation of the trick used by local filesystems
that would not have this race; we really can end up with B getting
there first and creating directory inode with dentry attached to
it before A gets through.  Which, AFAICS, leaves only one solution -
let A put the dentry created by B in place of what had been passed
to A (and leave its argument unhashed).  Which is trivial to
implement (see above); however, it means that NFS itself is in
the same class as cgroup - its ->mkdir() may, in some cases,
end up succeeding and leaving its argument unhashed negative.
Note that dcache is perfectly fine after that - we have hashed
positive dentry with the right name and right parent, over the
inode for directory we'd just created; everything's fine, except
that it's not the struct dentry originally passed to vfs_mkdir().
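To make the shape of that change concrete, here is a minimal, illustrative
instantiation step with d_splice_alias() in place of d_add().  The function
name is invented and this is not the actual nfs_instantiate() body - only the
pattern Al describes above.

/*
 * Illustrative only: attach a dentry after a server-side create,
 * tolerating the case where an open-by-fhandle user already attached
 * a dentry to the new inode.  If d_splice_alias() hands back such an
 * alias, that alias stays hashed and the dentry passed to ->mkdir()
 * is left unhashed and negative - the case vfs_mkdir() callers must
 * be prepared for.
 */
static int example_instantiate(struct dentry *dentry, struct inode *inode)
{
        struct dentry *res;

        res = d_splice_alias(inode, dentry);    /* instead of d_add() */
        if (IS_ERR(res))
                return PTR_ERR(res);
        /*
         * res != NULL means somebody (e.g. an fh_to_dentry() caller)
         * got there first; the dcache is still consistent, just rooted
         * at a different struct dentry than the one we were handed.
         */
        dput(res);
        return 0;
}

The key property is that on the race the dentry handed to ->mkdir() stays
unhashed and negative, which is exactly the case the nfsd patch in this
thread learns to cope with.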
On Mon, May 14, 2018 at 12:27:04PM -0400, J. Bruce Fields wrote:
> > >         if (!err)
> > >                 err = fh_update(resfhp);
> > > at the end of nfsd_create_locked.
> >
> > Might be too late for that, though - the trouble hits when we hit
> > nfsd_create_setattr().
>
> Oh, got it.  Could move the bottom fh_update to just above the
> nfsd_create_setattr(), though?

Probably, but I'm not comfortable enough with the nfsd guts wrt in-core
fhandle and assumptions made by users of such, so I'd rather bounce that
question to nfsd maintainers ;-)
On Mon, May 14, 2018 at 06:00:41PM +0100, Al Viro wrote:
> On Mon, May 14, 2018 at 12:27:04PM -0400, J. Bruce Fields wrote:
> > > >         if (!err)
> > > >                 err = fh_update(resfhp);
> > > > at the end of nfsd_create_locked.
> > >
> > > Might be too late for that, though - the trouble hits when we hit
> > > nfsd_create_setattr().
> >
> > Oh, got it.  Could move the bottom fh_update to just above the
> > nfsd_create_setattr(), though?
>
> Probably, but I'm not comfortable enough with the nfsd guts wrt in-core
> fhandle and assumptions made by users of such, so I'd rather bounce that
> question to nfsd maintainers ;-)

I'm fine with doing it as in your patch and then queuing up cleanup
later.

--b.
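For reference, the cleanup being deferred here would presumably amount to
something like the following ordering inside nfsd_create_locked().  The
three-argument nfsd_create_setattr() call is assumed from the fs/nfsd/vfs.c of
that era, and this is only a sketch of the idea, not a tested patch.

        /*
         * Sketch of the deferred cleanup: refresh the reply filehandle
         * from the (possibly re-looked-up) dchild before the setattr
         * goes and uses it, instead of calling fh_update() only at the
         * very end of nfsd_create_locked().
         */
        err = fh_update(resfhp);
        if (err)
                goto out;
        err = nfsd_create_setattr(rqstp, resfhp, iap);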
On Mon, May 14, 2018 at 05:47:43PM +0100, Al Viro wrote:
> On Mon, May 14, 2018 at 04:45:51PM +0100, Al Viro wrote:
>
> > > >         2) is nfserr_serverfail valid for situation when
> > > > directory created by such vfs_mkdir() manages to disappear
> > > > under us immediately afterwards?  Or should we return nfserr_stale
> > > > instead?
> > >
> > > We just got a successful result on the create and the parent's still
> > > locked, so if somebody hits this I think we want them reporting a bug,
> > > not wasting time trying to find a mistake in their own logic.
> >
> > No.  Suppose it's NFS-over-NFS (and yes, I agree that it's a bad idea;
> > somebody *has* done that).  Lookup after successful mkdir can legitimately
> > fail if it's been removed server-side.
> >
> > And we *will* need to allow nfs_mkdir() to return that way in some cases
> > (== success with dentry passed to it left unhashed negative), I'm afraid ;-/
>
> Consider the situation when you have NFS reexported - there's server S1,
> exporting a filesystem to S2, which reexports it for client C.

Thanks for the explanation!  I'd missed the connection between this and
the mkdir/filehandle-lookup races.

Acked-by: J. Bruce Fields <bfields@redhat.com> for the patch.

--b.

> On S2, process A does stat foo, gets ENOENT and proceeds to mkdir foo.
> Dentry of foo is passed to nfs_mkdir(), which calls e.g. nfs3_proc_mkdir(),
> which calls nfs3_do_create(), which requests S1 to create the damn thing.
> Then, after that succeeds, it calls nfs_instantiate().  There we would
> proceed to get the inode and call d_add(dentry, inode).
>
> In the meanwhile, C, having figured out the fhandle S2 would assign to
> foo (e.g. having spied upon the traffic, etc.) sends that fhandle to
> S2.  nfs_fh_to_dentry() is called by process B on S2 (either knfsd, or,
> in case C==S2 and the attacker has done open-by-fhandle - the caller
> of open_by_handle(2)).  It gets the inode - the damn thing has been
> created on S1 already.  That inode still has no dentries attached to
> it (B has just set it up), so d_obtain_alias() creates one and attaches
> to it.
>
> Now A gets around to nfs_fhget() and finds the same inode.  Voila -
> d_add(dentry, inode) and we are fucked; two dentries over the same
> directory inode.  Fun starts very shortly when fs/exportfs/* code
> is used by B to reconnect its dentry - the things really get bogus
> once that constraint is violated.
>
> The obvious solution would be to replace that d_add() with
>         res = d_splice_alias(inode, dentry);
>         if (IS_ERR(res)) {
>                 error = PTR_ERR(res);
>                 goto out_error;
>         }
>         dput(res);
> leaving the dentry passed to nfs_mkdir() unhashed and negative if
> we hit that race.  That's fine - the next lookup (e.g. the one
> done by exportfs to reconnect the sucker) will find the right
> dentry in dcache; it's just that it won't be the one passed to
> vfs_mkdir().
>
> That's different from the local case - there mkdir gets the inumber,
> inserts locked in-core inode with that inumber into icache and
> only then proceeds to set on-disk data up.  Anyone guessing the
> inumber (and thus the fhandle) will either get -ESTALE (if they
> come first, as in this scenario) or find the in-core inode mkdir
> is setting up and wait for it to be unlocked, at which point
> d_obtain_alias() will already find it attached to dentry passed
> to mkdir.
>
> But that critically relies upon the icache search key being known
> to mkdir *BEFORE* the on-disk metadata starts looking acceptable.
> For NFS we really can't do that - there the key is S1's fhandle
> and we don't get that until S1 has created the damn thing.
>
> I don't see any variation of the trick used by local filesystems
> that would not have this race; we really can end up with B getting
> there first and creating directory inode with dentry attached to
> it before A gets through.  Which, AFAICS, leaves only one solution -
> let A put the dentry created by B in place of what had been passed
> to A (and leave its argument unhashed).  Which is trivial to
> implement (see above); however, it means that NFS itself is in
> the same class as cgroup - its ->mkdir() may, in some cases,
> end up succeeding and leaving its argument unhashed negative.
> Note that dcache is perfectly fine after that - we have hashed
> positive dentry with the right name and right parent, over the
> inode for directory we'd just created; everything's fine, except
> that it's not the struct dentry originally passed to vfs_mkdir().
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 2410b093a2e6..b0555d7d8200 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1201,6 +1201,28 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
                 break;
         case S_IFDIR:
                 host_err = vfs_mkdir(dirp, dchild, iap->ia_mode);
+                if (!host_err && unlikely(d_unhashed(dchild))) {
+                        struct dentry *d;
+                        d = lookup_one_len(dchild->d_name.name,
+                                           dchild->d_parent,
+                                           dchild->d_name.len);
+                        if (IS_ERR(d)) {
+                                host_err = PTR_ERR(d);
+                                break;
+                        }
+                        if (unlikely(d_is_negative(d))) {
+                                dput(d);
+                                err = nfserr_serverfault;
+                                goto out;
+                        }
+                        dput(resfhp->fh_dentry);
+                        resfhp->fh_dentry = dget(d);
+                        err = fh_update(resfhp);
+                        dput(dchild);
+                        dchild = d;
+                        if (err)
+                                goto out;
+                }
                 break;
         case S_IFCHR:
         case S_IFBLK:
[-stable fodder; as it is, one can e.g. add
        /mnt/cgroup localhost(rw,no_root_squash,fsid=4242)
to /etc/exports,
        mount -t cgroup none /mnt/cgroup
        mkdir /tmp/a
        mount -t nfs localhost://mnt/cgroup /tmp/a
        mkdir /tmp/a/foo
and have knfsd oops; the patch below deals with that.

Questions:
        1) is fh_update() use below legitimate, or should we
do fh_put/fh_compose instead?
        2) is nfserr_serverfail valid for situation when
directory created by such vfs_mkdir() manages to disappear
under us immediately afterwards?  Or should we return nfserr_stale
instead?

It's in vfs.git#for-linus, if you prefer to use git for review...
]

That can (and does, on some filesystems) happen - ->mkdir() (and thus
vfs_mkdir()) can legitimately leave its argument negative and just
unhash it, counting upon the lookup to pick the object we'd created
next time we try to look at that name.

Some vfs_mkdir() callers forget about that possibility...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---