Message ID | 1430803373-4948-51-git-send-email-viro@ZenIV.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, 5 May 2015 06:22:25 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote: > From: Al Viro <viro@zeniv.linux.org.uk> > > The only restriction is that on the total amount of symlinks > crossed; how they are nested does not matter > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > --- > - if (unlikely(current->total_link_count >= 40)) { > + if (unlikely(current->total_link_count >= MAXSYMLINKS)) { There is still a literal '40' in follow_automount. current->total_link_count++; if (current->total_link_count >= 40) return -ELOOP; should that become MAXSYMLINKS too? Thanks, NeilBrown
On Wed, May 06, 2015 at 12:55:28PM +1000, NeilBrown wrote: > > - if (unlikely(current->total_link_count >= 40)) { > > + if (unlikely(current->total_link_count >= MAXSYMLINKS)) { > > There is still a literal '40' in follow_automount. > > current->total_link_count++; > if (current->total_link_count >= 40) > return -ELOOP; > > > should that become MAXSYMLINKS too? Yes, assuming we want to keep it at all... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 06, 2015 at 08:24:07AM +0100, Al Viro wrote: > On Wed, May 06, 2015 at 12:55:28PM +1000, NeilBrown wrote: > > > > - if (unlikely(current->total_link_count >= 40)) { > > > + if (unlikely(current->total_link_count >= MAXSYMLINKS)) { > > > > There is still a literal '40' in follow_automount. > > > > current->total_link_count++; > > if (current->total_link_count >= 40) > > return -ELOOP; > > > > > > should that become MAXSYMLINKS too? > > Yes, assuming we want to keep it at all... To elaborate a bit - the reason why we count those among the symlink crossings is historical; we used to implement automounting via ->follow_link(). Note that it's not really consistent - once a process has triggered automount, subsequent lookups crossing that point *won't* have it counted as link traversal. Except for ones that come while automount attempt is in progress, etc. Limit on link traversals makes obvious sense wrt avoiding loops and DoS - reaching a symlink in effect is adding a pile of path components to the path yet to be traversed. Automount points do nothing of that sort - the pending path remains as-is. So I'm not sure it makes sense to have those contribute to the same counter. OTOH, there _is_ an unpleasant problem around follow_automount() - it's a place where we call a method with heavy stack footprint still fairly deep in stack; in mainline it can get as bad as ~2.8K deep, with this tree the worst call chain entirely inside fs/namei.c is SyS_renameat2 -> user_path_parent -> filename_lookup -> path_lookupat -> path_init -> link_path_walk -> walk_component -> lookup_fast -> follow_managed -> ->d_automount and it costs 1.1K; slightly more shallow chains lead from linkat(2) and do_filp_open()/do_file_open_root(). Again, mainline is more than two times worse on those. ->d_manage() gets hit on the same depth, ->d_revalidate() at a bit less than that. FWIW, the sums on a fairly sane amd64 config are [->d_manage] 1104 [->d_automount] 1104 [->d_revalidate] 1024 [->lookup] 992 [->put_link] 896 [->permission] 832 [->follow_link] 768 [->d_hash] 768 [->d_weak_revalidate] 624 [->create] 544 [->tmpfile] 480 [->atomic_open] 480 [->rename] 336 [->rename2] 336 [->link] 240 [->unlink] 224 [->rmdir] 176 [->mknod] 160 [->symlink] 144 [->mkdir] 144 And ->d_automount() is easily the heaviest of that bunch in terms of stack footprint. So much that I really wonder if we ought to use something like schedule_work() and (interruptibly) wait for completion; the interesting question here is how much of the initiator's state is needed by worker. The real namespace isn't - actual mounting is in finish_automount(). Credentials probably are; what about netns? Suppose a process steps on NFS4 referral point and triggers a new NFS mount; which netns should be used - that of the triggering process or that of the NFS mount where we'd run into the referral? Ditto for AFS and CIFS... BTW, if you want to torture the whole thing wrt stack footprint, try to make /lib/firmware a symlink that would lead you through a referral point (on the mainline - do so at the maximal nesting depth). The thing is, request_firmware() will chase that one *on* *caller's* *stack*. Seeing that we do have an async variant anyway (request_firmware_nowait()), could we have request_firmware() itself go through schedule_work() (and wait for completion, of course)? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/namei.c b/fs/namei.c index 8fd8ccb..078db1d 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -492,6 +492,7 @@ void path_put(const struct path *path) } EXPORT_SYMBOL(path_put); +#define EMBEDDED_LEVELS 2 struct nameidata { struct path path; union { @@ -509,9 +510,42 @@ struct nameidata { struct path link; void *cookie; const char *name; - } stack[MAX_NESTED_LINKS + 1]; + } *stack, internal[EMBEDDED_LEVELS]; }; +static void set_nameidata(struct nameidata *nd) +{ + nd->stack = nd->internal; +} + +static void restore_nameidata(struct nameidata *nd) +{ + if (nd->stack != nd->internal) { + kfree(nd->stack); + nd->stack = nd->internal; + } +} + +static int __nd_alloc_stack(struct nameidata *nd) +{ + struct saved *p = kmalloc((MAXSYMLINKS + 1) * sizeof(struct saved), + GFP_KERNEL); + if (unlikely(!p)) + return -ENOMEM; + memcpy(p, nd->internal, sizeof(nd->internal)); + nd->stack = p; + return 0; +} + +static inline int nd_alloc_stack(struct nameidata *nd) +{ + if (likely(nd->depth != EMBEDDED_LEVELS - 1)) + return 0; + if (likely(nd->stack != nd->internal)) + return 0; + return __nd_alloc_stack(nd); +} + /* * Path walking has 2 modes, rcu-walk and ref-walk (see * Documentation/filesystems/path-lookup.txt). In situations when we can't @@ -857,7 +891,7 @@ const char *get_link(struct nameidata *nd) if (nd->link.mnt == nd->path.mnt) mntget(nd->link.mnt); - if (unlikely(current->total_link_count >= 40)) { + if (unlikely(current->total_link_count >= MAXSYMLINKS)) { path_put(&nd->path); path_put(&nd->link); return ERR_PTR(-ELOOP); @@ -1781,22 +1815,18 @@ Walked: if (err) { const char *s; - if (unlikely(current->link_count >= MAX_NESTED_LINKS)) { - path_put_conditional(&nd->link, nd); - path_put(&nd->path); - err = -ELOOP; - goto Err; + err = nd_alloc_stack(nd); + if (unlikely(err)) { + path_to_nameidata(&nd->link, nd); + break; } - BUG_ON(nd->depth >= MAX_NESTED_LINKS); nd->depth++; - current->link_count++; s = get_link(nd); if (unlikely(IS_ERR(s))) { err = PTR_ERR(s); - current->link_count--; nd->depth--; goto Err; } @@ -1804,7 +1834,6 @@ Walked: if (unlikely(!s)) { /* jumped */ put_link(nd); - current->link_count--; nd->depth--; } else { if (*s == '/') { @@ -1834,7 +1863,6 @@ Walked: Err: while (unlikely(nd->depth)) { put_link(nd); - current->link_count--; nd->depth--; } return err; @@ -1843,7 +1871,6 @@ OK: name = nd->stack[nd->depth].name; err = walk_component(nd, LOOKUP_FOLLOW); put_link(nd); - current->link_count--; nd->depth--; goto Walked; } @@ -2047,7 +2074,11 @@ static int path_lookupat(int dfd, const struct filename *name, static int filename_lookup(int dfd, struct filename *name, unsigned int flags, struct nameidata *nd) { - int retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd); + int retval; + + set_nameidata(nd); + retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd); + if (unlikely(retval == -ECHILD)) retval = path_lookupat(dfd, name, flags, nd); if (unlikely(retval == -ESTALE)) @@ -2055,6 +2086,7 @@ static int filename_lookup(int dfd, struct filename *name, if (likely(!retval)) audit_inode(name, nd->path.dentry, flags & LOOKUP_PARENT); + restore_nameidata(nd); return retval; } @@ -2385,6 +2417,7 @@ filename_mountpoint(int dfd, struct filename *name, struct path *path, int error; if (IS_ERR(name)) return PTR_ERR(name); + set_nameidata(&nd); error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_RCU); if (unlikely(error == -ECHILD)) error = path_mountpoint(dfd, name, path, &nd, flags); @@ -2392,6 +2425,7 @@ filename_mountpoint(int dfd, struct filename *name, struct path *path, error = path_mountpoint(dfd, name, path, &nd, flags | LOOKUP_REVAL); if (likely(!error)) audit_inode(name, path->dentry, 0); + restore_nameidata(&nd); putname(name); return error; } @@ -3281,11 +3315,13 @@ struct file *do_filp_open(int dfd, struct filename *pathname, int flags = op->lookup_flags; struct file *filp; + set_nameidata(&nd); filp = path_openat(dfd, pathname, &nd, op, flags | LOOKUP_RCU); if (unlikely(filp == ERR_PTR(-ECHILD))) filp = path_openat(dfd, pathname, &nd, op, flags); if (unlikely(filp == ERR_PTR(-ESTALE))) filp = path_openat(dfd, pathname, &nd, op, flags | LOOKUP_REVAL); + restore_nameidata(&nd); return filp; } @@ -3299,6 +3335,7 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt, nd.root.mnt = mnt; nd.root.dentry = dentry; + set_nameidata(&nd); if (d_is_symlink(dentry) && op->intent & LOOKUP_OPEN) return ERR_PTR(-ELOOP); @@ -3312,6 +3349,7 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt, file = path_openat(-1, filename, &nd, op, flags); if (unlikely(file == ERR_PTR(-ESTALE))) file = path_openat(-1, filename, &nd, op, flags | LOOKUP_REVAL); + restore_nameidata(&nd); putname(filename); return file; } diff --git a/include/linux/namei.h b/include/linux/namei.h index a5d5bed..3a6cc96 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -11,6 +11,8 @@ struct nameidata; enum { MAX_NESTED_LINKS = 8 }; +#define MAXSYMLINKS 40 + /* * Type of the last component on LOOKUP_PARENT */