Message ID | 87iod68aa3.fsf_-_@x220.int.ebiederm.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Apr 08, 2015 at 06:34:12PM -0500, Eric W. Biederman wrote: > +static unsigned d_depth(const struct dentry *dentry) > +{ > + unsigned depth = 0; > + > + while (!IS_ROOT(dentry)) { > + dentry = dentry->d_parent; > + depth++; > + } > + return depth; > +} This relies on a depth of 2^32 being impossible, right? Which is guaranteed somewhat because you would need something like a terabyte of RAM to have that many dentries in RAM? I can't find any explicit check. Maybe it would make sense to let the depth be 64 bits or add some kind of overflow check? Or did I just miss some kind of check on allocation? <https://access.redhat.com/articles/rhel-limits> claims that redhat has tested RHEL on a machine with 6TB of physical RAM. I think that 2^32 dentries would fit in there. > +static const struct dentry *d_common_ancestor(const struct dentry *left, > + const struct dentry *right) > +{ > + unsigned ldepth = d_depth(left); > + unsigned rdepth = d_depth(right); > + > + if (ldepth > rdepth) { > + swap(left, right); > + swap(ldepth, rdepth); > + } > + > + while (rdepth > ldepth) { > + right = right->d_parent; > + rdepth--; > + } At this point, the actual depths could differ by 2^32, right? > + while (right != left) { > + if (IS_ROOT(right)) > + return NULL; > + right = right->d_parent; > + left = left->d_parent; And then one of these could crash with a NULL pointer deref?
Jann Horn <jann@thejh.net> writes: > On Wed, Apr 08, 2015 at 06:34:12PM -0500, Eric W. Biederman wrote: >> +static unsigned d_depth(const struct dentry *dentry) >> +{ >> + unsigned depth = 0; >> + >> + while (!IS_ROOT(dentry)) { >> + dentry = dentry->d_parent; >> + depth++; >> + } >> + return depth; >> +} > > This relies on a depth of 2^32 being impossible, right? Which is guaranteed > somewhat because you would need something like a terabyte of RAM to have > that many dentries in RAM? I can't find any explicit check. Maybe it would > make sense to let the depth be 64 bits or add some kind of overflow check? > Or did I just miss some kind of check on allocation? Well there is the 4K PATH_MAX. If nothing else your performance will grind to a halt if you attempt to use a path that deeply nested. That said it doesn't cost us anything to make the variables unsigned long and it avoids having to worry about this piece of code. I will respin this patch. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 08, 2015 at 06:34:12PM -0500, Eric W. Biederman wrote: > + if (ancestor) { > + mark_violated_mounts(dentry, ancestor); > + mark_violated_mounts(target, ancestor); > + } Umm... Both sides the same way, regardless of whether it's exchange or move? Looks wrong... Look: mkdir /tmp/a mkdir /tmp/b mkdir /tmp/c mkdir /tmp/b/c touch /tmp/a/x mount --bind /tmp/b /tmp/c mv /tmp/a/x /tmp/b/c/x should that make the vfsmount on /tmp/c violated? And if so, why? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Al Viro <viro@ZenIV.linux.org.uk> writes: > On Wed, Apr 08, 2015 at 06:34:12PM -0500, Eric W. Biederman wrote: >> + if (ancestor) { >> + mark_violated_mounts(dentry, ancestor); >> + mark_violated_mounts(target, ancestor); >> + } > > Umm... Both sides the same way, regardless of whether it's exchange or > move? Looks wrong... I am pretty certain it can cause d_path to become an information leak if we do not. > Look: > > mkdir /tmp/a > mkdir /tmp/b > mkdir /tmp/c > mkdir /tmp/b/c > touch /tmp/a/x > mount --bind /tmp/b /tmp/c > mv /tmp/a/x /tmp/b/c/x > > should that make the vfsmount on /tmp/c violated? And if so, why? If /tmp is a mount point and before the move there was a: touch /tmp/b/c/x And a process opened /tmp/c/c/x. d_path on that file descriptor before __d_move would say: /tmp/c/c/x after the __d_move d_path would say: /tmp/c/a/x Which is bizareely weird in this example, and could potentially be an expolitable information leak in the hands of someone who knew what they were doing. I am not clever enough to take that deleted directory and walk up the tree, so the damage may be limited to seeing the true path on the fileystem. But it just may be that I am dense today. Furthermore all of the relevant changes to the dentry that happen when exchange is true also happen when exchange is false, so I am very reluctant to believe that the non-exchange case is not exploitable by a sufficiently clever individual. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 09, 2015 at 09:51:11PM -0500, Eric W. Biederman wrote: > And a process opened /tmp/c/c/x. > d_path on that file descriptor before __d_move would say: > > /tmp/c/c/x > > after the __d_move d_path would say: > > /tmp/c/a/x So what? > Which is bizareely weird in this example, and could potentially be > an expolitable information leak in the hands of someone who knew > what they were doing. > > I am not clever enough to take that deleted directory and walk up the > tree, so the damage may be limited to seeing the true path on the > fileystem. But it just may be that I am dense today. > > Furthermore all of the relevant changes to the dentry that happen > when exchange is true also happen when exchange is false, so I am very > reluctant to believe that the non-exchange case is not exploitable by a > sufficiently clever individual. Exploited how? The same assistant might very well have done echo "/tmp/c/a/x or whatever else I might want to pass to you" >/tmp/c/c/x and pass whatever information they wanted _that_ way. As it is, you've created one hell of a DoS - *anyone* can poison any vfsmount covering a subtree if they have access to a containing subtree somewhere and write permissions on a directory inside and directory outside of the victim one. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/dcache.c b/fs/dcache.c index 6e68312494ed..7baecba354dd 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -2535,6 +2535,56 @@ static void dentry_unlock_for_move(struct dentry *dentry, struct dentry *target) spin_unlock(&dentry->d_lock); } +static unsigned d_depth(const struct dentry *dentry) +{ + unsigned depth = 0; + + while (!IS_ROOT(dentry)) { + dentry = dentry->d_parent; + depth++; + } + return depth; +} + +static const struct dentry *d_common_ancestor(const struct dentry *left, + const struct dentry *right) +{ + unsigned ldepth = d_depth(left); + unsigned rdepth = d_depth(right); + + if (ldepth > rdepth) { + swap(left, right); + swap(ldepth, rdepth); + } + + while (rdepth > ldepth) { + right = right->d_parent; + rdepth--; + } + + while (right != left) { + if (IS_ROOT(right)) + return NULL; + right = right->d_parent; + left = left->d_parent; + } + + return right; +} + +static void mark_violated_mounts(struct dentry *dentry, + const struct dentry *ancestor) +{ + /* Mark all mountroots that are children of the common + * ancestor and ancestors of dentry. + */ + struct dentry *p; + for (p = dentry->d_parent; p != ancestor; p = p->d_parent) { + if (d_mountroot(p)) + mnt_set_violated(p); + } +} + /* * When switching names, the actual string doesn't strictly have to * be preserved in the target - because we're dropping the target @@ -2563,11 +2613,22 @@ static void dentry_unlock_for_move(struct dentry *dentry, struct dentry *target) static void __d_move(struct dentry *dentry, struct dentry *target, bool exchange) { + const struct dentry *ancestor = d_common_ancestor(dentry, target); + if (!dentry->d_inode) printk(KERN_WARNING "VFS: moving negative dcache entry\n"); - BUG_ON(d_ancestor(dentry, target)); - BUG_ON(d_ancestor(target, dentry)); + BUG_ON(dentry == ancestor); + BUG_ON(target == ancestor); + + /* If there is a common ancestor, mark mounts which may have + * paths that are no longer able to follow d_parent up to + * mnt_root after this move. + */ + if (ancestor) { + mark_violated_mounts(dentry, ancestor); + mark_violated_mounts(target, ancestor); + } dentry_lock_for_move(dentry, target); diff --git a/fs/internal.h b/fs/internal.h index 046767f0042e..d6a6cbd1e7a1 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -71,6 +71,7 @@ extern int __mnt_want_write_file(struct file *); extern void __mnt_drop_write(struct vfsmount *); extern void __mnt_drop_write_file(struct file *); +extern void mnt_set_violated(struct dentry *root); /* * fs_struct.c */ diff --git a/fs/namespace.c b/fs/namespace.c index 75abc9fcaafa..083d96bdbd60 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1644,6 +1644,23 @@ out_unlock: namespace_unlock(); } +void mnt_set_violated(struct dentry *root) +{ + struct mountroot *mr; + + /* Locking the mount guarantees that root is a mountpoint and + * pushes rcu path walkers onto the slow path. + */ + lock_mount_hash(); + mr = lookup_mountroot(root); + if (mr) { + spin_lock(&root->d_lock); + root->d_flags |= DCACHE_MOUNT_VIOLATED; + spin_unlock(&root->d_lock); + } + unlock_mount_hash(); +} + /* * Is the caller allowed to modify his namespace? */
Rename can move a file or directory outside of a bind mount. This has allowed programs with paths below the renamed directory to traverse up their directory tree to the real root of the filesystem instead of just the root of their bind mount. In the presence of such renames limit applications to what the bind mount intended to reveal by marking mounts that have had dentries renamed out of them with MNT_VIOLATED, marking mounts that can no longer walk up to their parent mounts with MNT_UMOUNT_PENDING and then lazily unmounting such mounts. All moves go through __d_move so __d_move has been modified to mark all mounts whose dentries have been moved outside of them. Once the root dentry of a violated mount has been found a new function mnt_set_violated is called to mark all mounts that have that dentry as their root as violated. The worst case performance of the changes to __d_move is two extra trip from dentry and target to the root of their respective dentry trees. This closes a hole where it was possible in some circumstances to follow .. past the root of a bind mount. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/dcache.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- fs/internal.h | 1 + fs/namespace.c | 17 +++++++++++++++ 3 files changed, 81 insertions(+), 2 deletions(-)