| Message ID | 20201217161911.743222-2-axboe@kernel.dk (mailing list archive) |
|---|---|
| State | New, archived |
| Series | fs: Support for LOOKUP_CACHED / RESOLVE_CACHED |
On 12/17/20 9:19 AM, Jens Axboe wrote:
> Most callers check for a non-zero return, and assume it's -ECHILD (which
> it always will be). One caller uses the actual error return. Clean this
> up and make it fully consistent, by having unlazy_walk() return a bool
> instead. Rename it to try_to_unlazy() and return true on success, and
> false on failure. That's easier to read.

Al, were you planning on queuing this one up for 5.11 still? I'm fine
with holding it for 5.12 as well, I'd just like to know what your plans
are. The latter goes for the whole series too, FWIW.
On Fri, Dec 25, 2020 at 07:41:17PM -0700, Jens Axboe wrote:
> On 12/17/20 9:19 AM, Jens Axboe wrote:
>> Most callers check for a non-zero return, and assume it's -ECHILD (which
>> it always will be). One caller uses the actual error return. Clean this
>> up and make it fully consistent, by having unlazy_walk() return a bool
>> instead. Rename it to try_to_unlazy() and return true on success, and
>> false on failure. That's easier to read.
>
> Al, were you planning on queuing this one up for 5.11 still? I'm fine
> with holding it for 5.12 as well, I'd just like to know what your plans
> are. The latter goes for the whole series too, FWIW.

Seeing that it has not sat in -next at all, what I'm going to do is put
it into a 5.11-rc1-based branch. It's really too late for something like
that this cycle, and IME a topic branch started before the merges for
the previous cycle are over is too likely to require backmerges, if not
outright rebases. So let's branch it at -rc1 and it'll go into #for-next
from the very beginning.
On 12/25/20 9:50 PM, Al Viro wrote:
> On Fri, Dec 25, 2020 at 07:41:17PM -0700, Jens Axboe wrote:
>> On 12/17/20 9:19 AM, Jens Axboe wrote:
>>> Most callers check for a non-zero return, and assume it's -ECHILD (which
>>> it always will be). One caller uses the actual error return. Clean this
>>> up and make it fully consistent, by having unlazy_walk() return a bool
>>> instead. Rename it to try_to_unlazy() and return true on success, and
>>> false on failure. That's easier to read.
>>
>> Al, were you planning on queuing this one up for 5.11 still? I'm fine
>> with holding it for 5.12 as well, I'd just like to know what your plans
>> are. The latter goes for the whole series too, FWIW.
>
> Seeing that it has not sat in -next at all, what I'm going to do is put
> it into a 5.11-rc1-based branch. It's really too late for something like
> that this cycle, and IME a topic branch started before the merges for
> the previous cycle are over is too likely to require backmerges, if not
> outright rebases. So let's branch it at -rc1 and it'll go into #for-next
> from the very beginning.

That sounds fine, thanks Al. Will you queue up patches 1-3 in a stable
branch? Then I can just pull that into mine for the io_uring bits. I'll
also post the stat part of this separately for review.

BTW, I wrote a man page addition for openat2(2) for RESOLVE_CACHED:

commit 3b381a6bc50da79eee6a3a1da330480d0cb46302
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Dec 17 14:15:15 2020 -0700

    man2/openat2.2: add RESOLVE_CACHED

    RESOLVE_CACHED allows an application to attempt a cache-only open of a
    file. If this isn't possible, the request will fail with -1/EAGAIN and
    the caller should retry without RESOLVE_CACHED set. This will generally
    happen from a different context, where a slower open operation can be
    performed.

    Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/man2/openat2.2 b/man2/openat2.2
index 3bda20620574..f5af3eee2cda 100644
--- a/man2/openat2.2
+++ b/man2/openat2.2
@@ -385,6 +385,17 @@ This may occur if, for example, a system
 pathname that is used by an application is modified (e.g., in a new
 distribution release) so that a pathname component (now) contains a
 bind mount.
+.TP
+.B RESOLVE_CACHED
+Make the open operation fail unless all path components are already present
+in the kernel's lookup cache.
+If any kind of revalidation or IO is needed to satisfy the lookup,
+.BR openat2 ()
+fails with the error
+.B EAGAIN.
+This is useful in providing a fast path open that can be performed without
+resorting to thread offload, or other mechanism that an application might
+use to offload slower operations.
 .RE
 .IP
 If any bits other than those listed above are set in
@@ -421,6 +432,14 @@ The caller may choose to retry the
 .BR openat2 ()
 call.
 .TP
+.B EAGAIN
+.BR RESOLVE_CACHED
+was set, and the open operation cannot be performed cached.
+The caller should retry without
+.B RESOLVE_CACHED
+set in
+.I how.resolve
+.TP
 .B EINVAL
 An unknown flag or invalid value was specified in
 .IR how .
On Sat, Dec 26, 2020 at 10:33:25AM -0700, Jens Axboe wrote:
> +.TP
> +.B RESOLVE_CACHED
> +Make the open operation fail unless all path components are already present
> +in the kernel's lookup cache.
> +If any kind of revalidation or IO is needed to satisfy the lookup,

Usually spelled I/O in manpages.

> +.BR openat2 ()
> +fails with the error
> +.B EAGAIN.
> +This is useful in providing a fast path open that can be performed without
> +resorting to thread offload, or other mechanism that an application might
> +use to offload slower operations.

That almost reads backwards ... how about this?

This provides a fast path open that can be used when an application does
not wish to block. It allows the application to hand off the lookup to
a separate thread which can block.
On 12/26/20 10:58 AM, Matthew Wilcox wrote:
> On Sat, Dec 26, 2020 at 10:33:25AM -0700, Jens Axboe wrote:
>> +.TP
>> +.B RESOLVE_CACHED
>> +Make the open operation fail unless all path components are already present
>> +in the kernel's lookup cache.
>> +If any kind of revalidation or IO is needed to satisfy the lookup,
>
> Usually spelled I/O in manpages.

Changed, thanks!

>> +.BR openat2 ()
>> +fails with the error
>> +.B EAGAIN.
>> +This is useful in providing a fast path open that can be performed without
>> +resorting to thread offload, or other mechanism that an application might
>> +use to offload slower operations.
>
> That almost reads backwards ... how about this?
>
> This provides a fast path open that can be used when an application does
> not wish to block. It allows the application to hand off the lookup to
> a separate thread which can block.

I deliberately did not want to include anything about blocking; I'd
prefer to keep it about having the necessary data/state cached.
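[Editor's note: for reference, the retry pattern the man page text
describes looks roughly like the sketch below. This is an illustration,
not part of the patch: it assumes a kernel with this series applied and
headers that define SYS_openat2 and struct open_how; the RESOLVE_CACHED
fallback #define uses the value from this series, and
open_cached_or_fallback() is a hypothetical helper name. glibc has no
openat2() wrapper, so the raw syscall is used.]

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/openat2.h>	/* struct open_how */

#ifndef RESOLVE_CACHED
#define RESOLVE_CACHED	0x20	/* from this series; may not be in your headers yet */
#endif

/* Try a cached-only open first; if the lookup would need I/O or
 * revalidation, openat2() fails with EAGAIN and we retry without
 * RESOLVE_CACHED. A real application would do the retry from a context
 * that is allowed to block, e.g. a worker thread or an io_uring request. */
static int open_cached_or_fallback(const char *path)
{
	struct open_how how;
	long fd;

	memset(&how, 0, sizeof(how));
	how.flags = O_RDONLY;
	how.resolve = RESOLVE_CACHED;

	fd = syscall(SYS_openat2, AT_FDCWD, path, &how, sizeof(how));
	if (fd >= 0 || errno != EAGAIN)
		return (int)fd;

	/* Slow path: not everything needed was in the lookup cache. */
	how.resolve = 0;
	return (int)syscall(SYS_openat2, AT_FDCWD, path, &how, sizeof(how));
}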
diff --git a/fs/namei.c b/fs/namei.c
index 03d0e11e4f36..c31ddddcef3c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -669,17 +669,17 @@ static bool legitimize_root(struct nameidata *nd)
  */

 /**
- * unlazy_walk - try to switch to ref-walk mode.
+ * try_to_unlazy - try to switch to ref-walk mode.
  * @nd: nameidata pathwalk data
- * Returns: 0 on success, -ECHILD on failure
+ * Returns: true on success, false on failure
  *
- * unlazy_walk attempts to legitimize the current nd->path and nd->root
+ * try_to_unlazy attempts to legitimize the current nd->path and nd->root
  * for ref-walk mode.
  * Must be called from rcu-walk context.
- * Nothing should touch nameidata between unlazy_walk() failure and
+ * Nothing should touch nameidata between try_to_unlazy() failure and
  * terminate_walk().
  */
-static int unlazy_walk(struct nameidata *nd)
+static bool try_to_unlazy(struct nameidata *nd)
 {
 	struct dentry *parent = nd->path.dentry;

@@ -694,14 +694,14 @@ static int unlazy_walk(struct nameidata *nd)
 		goto out;
 	rcu_read_unlock();
 	BUG_ON(nd->inode != parent->d_inode);
-	return 0;
+	return true;

out1:
 	nd->path.mnt = NULL;
 	nd->path.dentry = NULL;
out:
 	rcu_read_unlock();
-	return -ECHILD;
+	return false;
 }

 /**
@@ -792,7 +792,7 @@ static int complete_walk(struct nameidata *nd)
 		 */
 		if (!(nd->flags & (LOOKUP_ROOT | LOOKUP_IS_SCOPED)))
 			nd->root.mnt = NULL;
-		if (unlikely(unlazy_walk(nd)))
+		if (!try_to_unlazy(nd))
 			return -ECHILD;
 	}

@@ -1466,7 +1466,7 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 		unsigned seq;
 		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
 		if (unlikely(!dentry)) {
-			if (unlazy_walk(nd))
+			if (!try_to_unlazy(nd))
 				return ERR_PTR(-ECHILD);
 			return NULL;
 		}
@@ -1567,10 +1567,8 @@ static inline int may_lookup(struct nameidata *nd)
 {
 	if (nd->flags & LOOKUP_RCU) {
 		int err = inode_permission(nd->inode, MAY_EXEC|MAY_NOT_BLOCK);
-		if (err != -ECHILD)
+		if (err != -ECHILD || !try_to_unlazy(nd))
 			return err;
-		if (unlazy_walk(nd))
-			return -ECHILD;
 	}
 	return inode_permission(nd->inode, MAY_EXEC);
 }
@@ -1592,7 +1590,7 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 		// unlazy even if we fail to grab the link - cleanup needs it
 		bool grabbed_link = legitimize_path(nd, link, seq);

-		if (unlazy_walk(nd) != 0 || !grabbed_link)
+		if (!try_to_unlazy(nd) || !grabbed_link)
 			return -ECHILD;

 		if (nd_alloc_stack(nd))
@@ -1634,7 +1632,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 			touch_atime(&last->link);
 			cond_resched();
 		} else if (atime_needs_update(&last->link, inode)) {
-			if (unlikely(unlazy_walk(nd)))
+			if (!try_to_unlazy(nd))
 				return ERR_PTR(-ECHILD);
 			touch_atime(&last->link);
 		}
@@ -1651,11 +1649,8 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 		get = inode->i_op->get_link;
 		if (nd->flags & LOOKUP_RCU) {
 			res = get(NULL, inode, &last->done);
-			if (res == ERR_PTR(-ECHILD)) {
-				if (unlikely(unlazy_walk(nd)))
-					return ERR_PTR(-ECHILD);
+			if (res == ERR_PTR(-ECHILD) && try_to_unlazy(nd))
 				res = get(link->dentry, inode, &last->done);
-			}
 		} else {
 			res = get(link->dentry, inode, &last->done);
 		}
@@ -2193,7 +2188,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 		}
 		if (unlikely(!d_can_lookup(nd->path.dentry))) {
 			if (nd->flags & LOOKUP_RCU) {
-				if (unlazy_walk(nd))
+				if (!try_to_unlazy(nd))
 					return -ECHILD;
 			}
 			return -ENOTDIR;
@@ -3127,7 +3122,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct inode *inode;
 	struct dentry *dentry;
 	const char *res;
-	int error;

 	nd->flags |= op->intent;

@@ -3151,9 +3145,8 @@ static const char *open_last_lookups(struct nameidata *nd,
 	} else {
 		/* create side of things */
 		if (nd->flags & LOOKUP_RCU) {
-			error = unlazy_walk(nd);
-			if (unlikely(error))
-				return ERR_PTR(error);
+			if (!try_to_unlazy(nd))
+				return ERR_PTR(-ECHILD);
 		}
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
 		/* trailing slashes? */
@@ -3162,9 +3155,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 	}

 	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
-		error = mnt_want_write(nd->path.mnt);
-		if (!error)
-			got_write = true;
+		got_write = !mnt_want_write(nd->path.mnt);
 		/*
 		 * do _not_ fail yet - we might not need that or fail with
 		 * a different error; let lookup_open() decide; we'll be
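[Editor's note: the condensed condition in the may_lookup() hunk above is
easy to misread, so here is why it is equivalent to the old two-branch
version: short-circuit evaluation means try_to_unlazy() is only called
when err is exactly -ECHILD, so when it fails, "return err" still returns
-ECHILD. A standalone userspace sketch of the same control flow follows;
try_to_unlazy_stub() and may_lookup_like() are illustrative stand-ins,
not kernel code.]

#include <stdbool.h>
#include <errno.h>

/* Stand-in for try_to_unlazy(); pretend the switch to ref-walk failed. */
static bool try_to_unlazy_stub(void)
{
	return false;
}

/* Mirrors the post-patch may_lookup() flow: the right-hand side of the
 * || runs only when err == -ECHILD, so a false return from the stub
 * makes us return exactly -ECHILD. */
static int may_lookup_like(int err)
{
	if (err != -ECHILD || !try_to_unlazy_stub())
		return err;
	return 0;	/* would redo the permission check in ref-walk mode */
}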
Most callers check for a non-zero return, and assume it's -ECHILD (which
it always will be). One caller uses the actual error return. Clean this
up and make it fully consistent, by having unlazy_walk() return a bool
instead. Rename it to try_to_unlazy() and return true on success, and
false on failure. That's easier to read.

No functional changes in this patch.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/namei.c | 43 +++++++++++++++++--------------------------
 1 file changed, 17 insertions(+), 26 deletions(-)
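[Editor's note: the calling-convention change the commit message
describes boils down to a few lines. This is a userspace sketch with
illustrative names, not kernel code.]

#include <stdbool.h>
#include <errno.h>

/* Old convention: 0 on success, -ECHILD on failure. */
static int unlazy_walk_old(void)
{
	return -ECHILD;
}

/* New convention: true on success, false on failure. */
static bool try_to_unlazy_new(void)
{
	return false;
}

/* Before: the error value is implicit in the return... */
static int caller_old(void)
{
	int err = unlazy_walk_old();
	if (err)
		return err;	/* always -ECHILD, but that isn't obvious here */
	return 0;
}

/* ...after: success/failure is explicit, and the one possible error
 * code is spelled out at the call site. */
static int caller_new(void)
{
	if (!try_to_unlazy_new())
		return -ECHILD;
	return 0;
}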