Message ID | 20241009040316.GY4017910@ZenIV (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFC] getname_maybe_null() - the third variant of pathname copy-in | expand |
On Wed, Oct 09, 2024 at 05:03:16AM +0100, Al Viro wrote: > [ > in #work.getname; if nobody objects, I'm going to make #work.xattr pull that. > IMO it's saner than hacks around vfs_empty_path() and it does very similar > logics - with simpler handling on the caller side. > ] > > Semantics used by statx(2) (and later *xattrat(2)): without AT_EMPTY_PATH > it's standard getname() (i.e. ERR_PTR(-ENOENT) on empty string, > ERR_PTR(-EFAULT) on NULL), with AT_EMPTY_PATH both empty string and > NULL are accepted. > > Calling conventions: getname_maybe_null(user_pointer, flags) returns > * pointer to struct filename when non-empty string had been > successfully read > * ERR_PTR(...) on error > * NULL if an empty string or NULL pointer had been given > with AT_EMPTY_FLAGS in the flags argument. > > It tries to avoid allocation in the last case; it's not always > able to do so, in which case the temporary struct filename instance > is freed and NULL returned anyway. > > Fast path is inlined. > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > --- Looks good. Fyi, I'm using your #base.getname as a base for some other work that I'm currently doing. So please don't rebase #base.getname anymore. ;) Since you have your #work.xattr and #work.stat using it as base it seems pretty unlikely anyway but I just thought I mention explicitly that I'm relying on that #base.getname branch.
On Tue, Oct 15, 2024 at 04:05:15PM +0200, Christian Brauner wrote: > Looks good. > > Fyi, I'm using your #base.getname as a base for some other work that I'm > currently doing. So please don't rebase #base.getname anymore. ;) > > Since you have your #work.xattr and #work.stat using it as base it seems > pretty unlikely anyway but I just thought I mention explicitly that I'm > relying on that #base.getname branch. FWIW, I see a problem with that sucker. The trouble is in the combination AT_FDCWD, "", AT_EMPTY_PATH. vfs_empty_path() returns false on that, so fstatat() in mainline falls back to vfs_statx() and does the right thing. This variant does _not_. Note that your variant of xattr series also ended up failing on e.g. getxattrat() with such combination: if (at_flags & AT_EMPTY_PATH && vfs_empty_path(dfd, pathname)) { CLASS(fd, f)(dfd); if (!f.file) return -EBADF; audit_file(f.file); return getxattr(file_mnt_idmap(f.file), file_dentry(f.file), name, value, size); } lookup_flags = (at_flags & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW; retry: error = user_path_at(dfd, pathname, lookup_flags, &path); ended up calling user_path_at() with empty pathname and nothing like LOOKUP_EMPTY in lookup_flags. Which bails out with -ENOENT, since getname() in there does so. My variant bails out with -EBADF and I'd argue that neither is correct. Not sure what's the sane solution here, need to think for a while...
On Wed, Oct 16, 2024 at 06:09:08AM +0100, Al Viro wrote: > On Tue, Oct 15, 2024 at 04:05:15PM +0200, Christian Brauner wrote: > > Looks good. > > > > Fyi, I'm using your #base.getname as a base for some other work that I'm > > currently doing. So please don't rebase #base.getname anymore. ;) > > > > Since you have your #work.xattr and #work.stat using it as base it seems > > pretty unlikely anyway but I just thought I mention explicitly that I'm > > relying on that #base.getname branch. > > FWIW, I see a problem with that sucker. The trouble is in the > combination AT_FDCWD, "", AT_EMPTY_PATH. vfs_empty_path() returns Yeah, we're aware. > false on that, so fstatat() in mainline falls back to vfs_statx() and > does the right thing. This variant does _not_. > > Note that your variant of xattr series also ended up failing on e.g. > getxattrat() with such combination: > if (at_flags & AT_EMPTY_PATH && vfs_empty_path(dfd, pathname)) { > CLASS(fd, f)(dfd); > if (!f.file) > return -EBADF; > audit_file(f.file); > return getxattr(file_mnt_idmap(f.file), file_dentry(f.file), > name, value, size); > } > > lookup_flags = (at_flags & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW; > > retry: > error = user_path_at(dfd, pathname, lookup_flags, &path); > > ended up calling user_path_at() with empty pathname and nothing like LOOKUP_EMPTY > in lookup_flags. Which bails out with -ENOENT, since getname() in there does > so. My variant bails out with -EBADF and I'd argue that neither is correct. > > Not sure what's the sane solution here, need to think for a while... Fwiw, in the other thread we concluded to just not care about AT_FDCWD with "". And so far I agree with that.
On Wed, Oct 16, 2024 at 10:32:16AM +0200, Christian Brauner wrote: > > ended up calling user_path_at() with empty pathname and nothing like LOOKUP_EMPTY > > in lookup_flags. Which bails out with -ENOENT, since getname() in there does > > so. My variant bails out with -EBADF and I'd argue that neither is correct. > > > > Not sure what's the sane solution here, need to think for a while... > > Fwiw, in the other thread we concluded to just not care about AT_FDCWD with "". > And so far I agree with that. Subject:?
On Wed, Oct 16, 2024 at 03:00:50PM +0100, Al Viro wrote: > On Wed, Oct 16, 2024 at 10:32:16AM +0200, Christian Brauner wrote: > > > > ended up calling user_path_at() with empty pathname and nothing like LOOKUP_EMPTY > > > in lookup_flags. Which bails out with -ENOENT, since getname() in there does > > > so. My variant bails out with -EBADF and I'd argue that neither is correct. > > > > > > Not sure what's the sane solution here, need to think for a while... > > > > Fwiw, in the other thread we concluded to just not care about AT_FDCWD with "". > > And so far I agree with that. > > Subject:? https://lore.kernel.org/r/CAGudoHHdccL5Lh8zAO-0swqqRCW4GXMSXhq4jQGoVj=UdBK-Lg@mail.gmail.com Hm, this only speaks about the NULL case. I just looked through codesearch on github and on debian and the only example I found was https://sources.debian.org/src/snapd/2.65.3-1/tests/main/seccomp-statx/test-snapd-statx/bin/statx.py/?hl=71#L71 So really, just special-case it for statx() imho instead of spreading that ugliness everywhere?
On Wed, Oct 16, 2024 at 04:49:48PM +0200, Christian Brauner wrote: > On Wed, Oct 16, 2024 at 03:00:50PM +0100, Al Viro wrote: > > On Wed, Oct 16, 2024 at 10:32:16AM +0200, Christian Brauner wrote: > > > > > > ended up calling user_path_at() with empty pathname and nothing like LOOKUP_EMPTY > > > > in lookup_flags. Which bails out with -ENOENT, since getname() in there does > > > > so. My variant bails out with -EBADF and I'd argue that neither is correct. > > > > > > > > Not sure what's the sane solution here, need to think for a while... > > > > > > Fwiw, in the other thread we concluded to just not care about AT_FDCWD with "". > > > And so far I agree with that. > > > > Subject:? > > https://lore.kernel.org/r/CAGudoHHdccL5Lh8zAO-0swqqRCW4GXMSXhq4jQGoVj=UdBK-Lg@mail.gmail.com > > Hm, this only speaks about the NULL case. > > > I just looked through codesearch on github and on debian and the only > example I found was > https://sources.debian.org/src/snapd/2.65.3-1/tests/main/seccomp-statx/test-snapd-statx/bin/statx.py/?hl=71#L71 > > So really, just special-case it for statx() imho instead of spreading > that ugliness everywhere? Not sure, TBH. I wonder if it would be simpler to have filename_lookup() accept NULL for pathname and have it treat that as "". Look: all it takes is the following trick * add const char *pathname to struct nameidata * in __set_nameidata() add p->pathname = likely(name) ? name->name : ""; * in path_init() replace const char *s = nd->name->name; with const char *s = nd->pathname; and we are done. Oh, and teach putname() to treat NULL as no-op. With such changes in fs/namei.c we could do struct filename *name = getname_maybe_null(filename, flags); if (!name && dfd != AT_FDCWD) return do_statx_fd(dfd, flags & ~AT_NO_AUTOMOUNT, mask, buffer); ret = vfs_statx(dfd, name, statx_flags, stat, STATX_BASIC_STATS); putname(name); return ret; in statx(2) and similar in vfs_fstatat(). With that I'm not even sure we want to bother with separate vfs_statx_fd() - the overhead might very well turn out to be tolerable. It is non-zero, but that's a fairly straight trip through filename_lookup() machinery. Would require some profiling, though...
On Fri, Oct 18, 2024 at 12:54:59AM +0100, Al Viro wrote: > On Wed, Oct 16, 2024 at 04:49:48PM +0200, Christian Brauner wrote: > > On Wed, Oct 16, 2024 at 03:00:50PM +0100, Al Viro wrote: > > > On Wed, Oct 16, 2024 at 10:32:16AM +0200, Christian Brauner wrote: > > > > > > > > ended up calling user_path_at() with empty pathname and nothing like LOOKUP_EMPTY > > > > > in lookup_flags. Which bails out with -ENOENT, since getname() in there does > > > > > so. My variant bails out with -EBADF and I'd argue that neither is correct. > > > > > > > > > > Not sure what's the sane solution here, need to think for a while... > > > > > > > > Fwiw, in the other thread we concluded to just not care about AT_FDCWD with "". > > > > And so far I agree with that. > > > > > > Subject:? > > > > https://lore.kernel.org/r/CAGudoHHdccL5Lh8zAO-0swqqRCW4GXMSXhq4jQGoVj=UdBK-Lg@mail.gmail.com > > > > Hm, this only speaks about the NULL case. > > > > > > I just looked through codesearch on github and on debian and the only > > example I found was > > https://sources.debian.org/src/snapd/2.65.3-1/tests/main/seccomp-statx/test-snapd-statx/bin/statx.py/?hl=71#L71 > > > > So really, just special-case it for statx() imho instead of spreading > > that ugliness everywhere? > > Not sure, TBH. I wonder if it would be simpler to have filename_lookup() > accept NULL for pathname and have it treat that as "". > > Look: all it takes is the following trick > * add const char *pathname to struct nameidata > * in __set_nameidata() add > p->pathname = likely(name) ? name->name : ""; > * in path_init() replace > const char *s = nd->name->name; > with > const char *s = nd->pathname; > and we are done. Oh, and teach putname() to treat NULL as no-op. I know, that's what I suggested to Linus initially but he NAKed it because he didn't want the extra cycles. > > With such changes in fs/namei.c we could do > struct filename *name = getname_maybe_null(filename, flags); > > if (!name && dfd != AT_FDCWD) > return do_statx_fd(dfd, flags & ~AT_NO_AUTOMOUNT, mask, buffer); > ret = vfs_statx(dfd, name, statx_flags, stat, STATX_BASIC_STATS); > putname(name); > return ret; > in statx(2) and similar in vfs_fstatat(). > > With that I'm not even sure we want to bother with separate > vfs_statx_fd() - the overhead might very well turn out to > be tolerable. It is non-zero, but that's a fairly straight > trip through filename_lookup() machinery. Would require some > profiling, though...
On Fri, Oct 18, 2024 at 01:06:12PM +0200, Christian Brauner wrote: > > Look: all it takes is the following trick > > * add const char *pathname to struct nameidata > > * in __set_nameidata() add > > p->pathname = likely(name) ? name->name : ""; > > * in path_init() replace > > const char *s = nd->name->name; > > with > > const char *s = nd->pathname; > > and we are done. Oh, and teach putname() to treat NULL as no-op. > > I know, that's what I suggested to Linus initially but he NAKed it > because he didn't want the extra cycles. Extra cycles where? If anything, I'd expect a too-small-to-measure speedup due to dereference shifted from path_init() to __set_nameidata(). Below is literally all it takes to make filename_lookup() treat NULL as empty-string name. NOTE: I'm not talking about forcing the pure by-descriptor case through the dfd+pathname codepath; not without serious profiling. But treating AT_FDCWD + NULL by the delta below and passing NULL struct filename to filename_lookup()? Where do you expect to have the lost cycles on that? diff --git a/fs/namei.c b/fs/namei.c index 4a4a22a08ac2..fc2053877e5c 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -264,7 +264,7 @@ EXPORT_SYMBOL(getname_kernel); void putname(struct filename *name) { - if (IS_ERR(name)) + if (IS_ERR_OR_NULL(name)) return; if (WARN_ON_ONCE(!atomic_read(&name->refcnt))) @@ -588,6 +588,7 @@ struct nameidata { unsigned seq; } *stack, internal[EMBEDDED_LEVELS]; struct filename *name; + const char *pathname; struct nameidata *saved; unsigned root_seq; int dfd; @@ -606,6 +607,7 @@ static void __set_nameidata(struct nameidata *p, int dfd, struct filename *name) p->depth = 0; p->dfd = dfd; p->name = name; + p->pathname = likely(name) ? name->name : ""; p->path.mnt = NULL; p->path.dentry = NULL; p->total_link_count = old ? old->total_link_count : 0; @@ -2439,7 +2441,7 @@ static int link_path_walk(const char *name, struct nameidata *nd) static const char *path_init(struct nameidata *nd, unsigned flags) { int error; - const char *s = nd->name->name; + const char *s = nd->pathname; /* LOOKUP_CACHED requires RCU, ask caller to retry */ if ((flags & (LOOKUP_RCU | LOOKUP_CACHED)) == LOOKUP_CACHED)
On Fri, Oct 18, 2024 at 05:51:58PM +0100, Al Viro wrote: > Extra cycles where? If anything, I'd expect a too-small-to-measure > speedup due to dereference shifted from path_init() to __set_nameidata(). > Below is literally all it takes to make filename_lookup() treat NULL > as empty-string name. > > NOTE: I'm not talking about forcing the pure by-descriptor case through > the dfd+pathname codepath; not without serious profiling. But treating > AT_FDCWD + NULL by the delta below and passing NULL struct filename to > filename_lookup()? Where do you expect to have the lost cycles on that? [snip] BTW, could you give me a reference to the mail with those objections? I don't see anything in my mailbox, but... Or was that in one of those AT_EMPTY_PATH_NOCHECK (IIRC?) threads? Anyway, what I'm suggesting is 1) teach filename_lookup() to handle NULL struct filename * argument, treating it as "". Trivial and does not impose any overhead on the normal cases. 2) have statx try to recognize AT_EMPTY_PATH, "" and AT_EMPTY_PATH, NULL. If we have that and dfd is *NOT* AT_FDCWD, we have a nice descriptor-based case and can deal with it. If the name is not empty, we have to go for dfd+filename path. Also obvious. Where we get trouble is AT_FDCWD, NULL case. But with (1) we can simply route that to the same dfd+filename path, just passing it NULL for filename. That handles the currently broken case, with very little disruption to anything else. What's more, the logics for "is it NULL or empty with AT_EMPTY_PATH" no longer needs to care about dfd. We can encapsulate it nicely into a function that takes userland pointer + flags and does the following: if no AT_EMPTY_PATH in flags return getname(pointer) if pointer == NULL (same as vfs_empty_path()) return NULL peek at the first byte, if it's '\0' (same as vfs_empty_path()) return NULL name = getname_flags(pointer, LOOKUP_EMPTY) if IS_ERR(name) return name if unlikely(name is empty) putname(name) return NULL return name Then statx() (or anyone who wants similar AT_EMPTY_PATH + NULL semantics) can do this: name = getname_maybe_null(user_pointer, flags) if (!name && dfd >= 0) // or dfd != AT_FDCWD, perhaps do_something_by_fd(dfd, ...) else do_something_by_name(dfd, name, ...) without the bitrot-prone boilerplate.
On Fri, Oct 18, 2024 at 08:38:22PM +0100, Al Viro wrote: > On Fri, Oct 18, 2024 at 05:51:58PM +0100, Al Viro wrote: > > > Extra cycles where? If anything, I'd expect a too-small-to-measure > > speedup due to dereference shifted from path_init() to __set_nameidata(). > > Below is literally all it takes to make filename_lookup() treat NULL > > as empty-string name. > > > > NOTE: I'm not talking about forcing the pure by-descriptor case through > > the dfd+pathname codepath; not without serious profiling. But treating > > AT_FDCWD + NULL by the delta below and passing NULL struct filename to > > filename_lookup()? Where do you expect to have the lost cycles on that? > > [snip] > > BTW, could you give me a reference to the mail with those objections? > I don't see anything in my mailbox, but... > > Or was that in one of those AT_EMPTY_PATH_NOCHECK (IIRC?) threads? > > Anyway, what I'm suggesting is > > 1) teach filename_lookup() to handle NULL struct filename * argument, treating > it as "". Trivial and does not impose any overhead on the normal cases. > > 2) have statx try to recognize AT_EMPTY_PATH, "" and AT_EMPTY_PATH, NULL. > If we have that and dfd is *NOT* AT_FDCWD, we have a nice descriptor-based > case and can deal with it. > If the name is not empty, we have to go for dfd+filename path. Also obvious. > Where we get trouble is AT_FDCWD, NULL case. But with (1) we can simply > route that to the same dfd+filename path, just passing it NULL for filename. > > That handles the currently broken case, with very little disruption to > anything else. See #getname.fixup; on top of #base.getname and IMO worth folding into it. I don't believe that it's going to give any measurable slowdown compared to mainline. Again, if the concerns about wasted cycles had been about routing the dfd,"" and dfd,NULL cases through the filename_lookup(), this does *NOT* happen with that patch. Christian, Linus? commit 9fb26eb324d9aa9e6889f181f1683e710946258f Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sat Oct 19 00:57:14 2024 -0400 fix statx(2) regression on AT_FDCWD,"" and AT_FDCWD,NULL teach filename_lookup() et.al. to handle NULL struct filename reference as ""; make AT_FDCWD,"" (but _NOT_ the normal dfd,"") case in statx(2) fall back to path-based variant. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff --git a/fs/namei.c b/fs/namei.c index 27eb0a81d9b8..2bfe476c3bd0 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -280,7 +280,7 @@ EXPORT_SYMBOL(getname_kernel); void putname(struct filename *name) { - if (IS_ERR(name)) + if (IS_ERR_OR_NULL(name)) return; if (WARN_ON_ONCE(!atomic_read(&name->refcnt))) @@ -604,6 +604,7 @@ struct nameidata { unsigned seq; } *stack, internal[EMBEDDED_LEVELS]; struct filename *name; + const char *pathname; struct nameidata *saved; unsigned root_seq; int dfd; @@ -622,6 +623,7 @@ static void __set_nameidata(struct nameidata *p, int dfd, struct filename *name) p->depth = 0; p->dfd = dfd; p->name = name; + p->pathname = likely(name) ? name->name : ""; p->path.mnt = NULL; p->path.dentry = NULL; p->total_link_count = old ? old->total_link_count : 0; @@ -2455,7 +2457,7 @@ static int link_path_walk(const char *name, struct nameidata *nd) static const char *path_init(struct nameidata *nd, unsigned flags) { int error; - const char *s = nd->name->name; + const char *s = nd->pathname; /* LOOKUP_CACHED requires RCU, ask caller to retry */ if ((flags & (LOOKUP_RCU | LOOKUP_CACHED)) == LOOKUP_CACHED) diff --git a/fs/stat.c b/fs/stat.c index d0d82efd44d6..b74831dc7ae6 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -328,7 +328,7 @@ int vfs_fstatat(int dfd, const char __user *filename, int statx_flags = flags | AT_NO_AUTOMOUNT; struct filename *name = getname_maybe_null(filename, flags); - if (!name) + if (!name && dfd >= 0) return vfs_fstat(dfd, stat); ret = vfs_statx(dfd, name, statx_flags, stat, STATX_BASIC_STATS); @@ -769,7 +769,7 @@ SYSCALL_DEFINE5(statx, int ret; struct filename *name = getname_maybe_null(filename, flags); - if (!name) + if (!name && dfd >= 0) return do_statx_fd(dfd, flags & ~AT_NO_AUTOMOUNT, mask, buffer); ret = do_statx(dfd, name, flags, mask, buffer);
On Fri, 18 Oct 2024 at 22:03, Al Viro <viro@zeniv.linux.org.uk> wrote: > > See #getname.fixup; on top of #base.getname and IMO worth folding into it. > I don't believe that it's going to give any measurable slowdown compared to > mainline. Again, if the concerns about wasted cycles had been about routing > the dfd,"" and dfd,NULL cases through the filename_lookup(), this does *NOT* > happen with that patch. Christian, Linus? I'm certainly ok with this, although I did react to the - if (!name) + if (!name && dfd >= 0) changes. I see why you do them, but it makes me wonder if maybe getname_maybe_null() should just get the dfd too. And then in getname_maybe_null(), the if (!name) return NULL; would become if (!name) return dfd >= 0 ? NULL : ERR_PTR(-EBADF); because while I'm not entirely against it, I'm not convinced we really want to support fstatat(AT_FDCWD, NULL, &st, AT_EMPTY_PATH); becoming the same as fstat(".", &st); IOW, I think the (NULL, AT_EMPTY_PATH) tuple makes sense as a way to pass just an 'fd', but I'm _not_ convinced it makes sense as a way to pass in AT_FDCWD. Put another way: imagine you have a silly library implementation that does #define fstat(fd,buf) fstatat(fd, NULL, buf, AT_EMPTY_PATH) and I think *that* is what we want to support. But if 'fd' is not a valid fd, you should get -EBADF, not "fstat of current working directlry". Hmm? This is not a hugely important thing. If people really think that doing "fstat()" on the current working directory is worth doing this for, then whatever. And yes, there's some argument for returning -EFAULT, but I do think that "treat NULL+AT_EMPTY_PATH as the non-at version" argument is stronger. But the whole "some argument" does mean that I can certainly be convinced that I'm just wrong. Linus
On Sat, Oct 19, 2024 at 09:15:32AM -0700, Linus Torvalds wrote: > IOW, I think the (NULL, AT_EMPTY_PATH) tuple makes sense as a way to > pass just an 'fd', but I'm _not_ convinced it makes sense as a way to > pass in AT_FDCWD. > > Put another way: imagine you have a silly library implementation that does > > #define fstat(fd,buf) fstatat(fd, NULL, buf, AT_EMPTY_PATH) > > and I think *that* is what we want to support. But if 'fd' is not a > valid fd, you should get -EBADF, not "fstat of current working > directlry". > > Hmm? There'd been an example of live use of that posted upthread: https://sources.debian.org/src/snapd/2.65.3-1/tests/main/seccomp-statx/test-snapd-statx/bin/statx.py/?hl=71#L71 "" rather than NULL, but... > This is not a hugely important thing. If people really think that > doing "fstat()" on the current working directory is worth doing this > for, then whatever.
On Sat, 19 Oct 2024 at 10:11, Al Viro <viro@zeniv.linux.org.uk> wrote: > > There'd been an example of live use of that posted upthread: > > https://sources.debian.org/src/snapd/2.65.3-1/tests/main/seccomp-statx/test-snapd-statx/bin/statx.py/?hl=71#L71 > > "" rather than NULL, but... Note that outside of a library, I'd argue that using NULL is an active bug, and should not be supported. It's purely a Linux extension, after all. That said, I do think my "what if a library implements fstat() using fstatat()" argument could equally well be "what if a library does a Linux-specific optimization and uses NULL instead of the empty string". So I guess it would be entirely valid for a Linux libc to do something like int fstatat(int dirfd, const char *restrict pathname, struct stat *restrict statbuf, int flags) { if ((flags & AT_EMPTY_PATH) && !*pathname) pathname = NULL; return SYSCALL(__NR_fstatat, dirfd, pathname, statbuf, flags); } so yeah, I guess it's however we just decide to document it. Linus
On Fri, Oct 18, 2024 at 05:51:58PM +0100, Al Viro wrote: > On Fri, Oct 18, 2024 at 01:06:12PM +0200, Christian Brauner wrote: > > > Look: all it takes is the following trick > > > * add const char *pathname to struct nameidata > > > * in __set_nameidata() add > > > p->pathname = likely(name) ? name->name : ""; > > > * in path_init() replace > > > const char *s = nd->name->name; > > > with > > > const char *s = nd->pathname; > > > and we are done. Oh, and teach putname() to treat NULL as no-op. > > > > I know, that's what I suggested to Linus initially but he NAKed it > > because he didn't want the extra cycles. > > Extra cycles where? If anything, I'd expect a too-small-to-measure > speedup due to dereference shifted from path_init() to __set_nameidata(). > Below is literally all it takes to make filename_lookup() treat NULL > as empty-string name. > > NOTE: I'm not talking about forcing the pure by-descriptor case through > the dfd+pathname codepath; not without serious profiling. But treating Ah ok, that makes sense. I thought you were talking about that... > AT_FDCWD + NULL by the delta below and passing NULL struct filename to > filename_lookup()? Yes, that's good!
On Sat, Oct 19, 2024 at 06:11:18PM +0100, Al Viro wrote: > On Sat, Oct 19, 2024 at 09:15:32AM -0700, Linus Torvalds wrote: > > > IOW, I think the (NULL, AT_EMPTY_PATH) tuple makes sense as a way to > > pass just an 'fd', but I'm _not_ convinced it makes sense as a way to > > pass in AT_FDCWD. > > > > Put another way: imagine you have a silly library implementation that does > > > > #define fstat(fd,buf) fstatat(fd, NULL, buf, AT_EMPTY_PATH) > > > > and I think *that* is what we want to support. But if 'fd' is not a > > valid fd, you should get -EBADF, not "fstat of current working > > directlry". > > > > Hmm? > > There'd been an example of live use of that posted upthread: > > https://sources.debian.org/src/snapd/2.65.3-1/tests/main/seccomp-statx/test-snapd-statx/bin/statx.py/?hl=71#L71 > > "" rather than NULL, but... Which I think is really broken testing nonsense anyway so really we could easily get them to change this and not even have to worry about this. We've done stuff like that before. But I'm fine either way.
On Sat, Oct 19, 2024 at 06:03:22AM +0100, Al Viro wrote: > On Fri, Oct 18, 2024 at 08:38:22PM +0100, Al Viro wrote: > > On Fri, Oct 18, 2024 at 05:51:58PM +0100, Al Viro wrote: > > > > > Extra cycles where? If anything, I'd expect a too-small-to-measure > > > speedup due to dereference shifted from path_init() to __set_nameidata(). > > > Below is literally all it takes to make filename_lookup() treat NULL > > > as empty-string name. > > > > > > NOTE: I'm not talking about forcing the pure by-descriptor case through > > > the dfd+pathname codepath; not without serious profiling. But treating > > > AT_FDCWD + NULL by the delta below and passing NULL struct filename to > > > filename_lookup()? Where do you expect to have the lost cycles on that? > > > > [snip] > > > > BTW, could you give me a reference to the mail with those objections? > > I don't see anything in my mailbox, but... > > > > Or was that in one of those AT_EMPTY_PATH_NOCHECK (IIRC?) threads? > > > > Anyway, what I'm suggesting is > > > > 1) teach filename_lookup() to handle NULL struct filename * argument, treating > > it as "". Trivial and does not impose any overhead on the normal cases. > > > > 2) have statx try to recognize AT_EMPTY_PATH, "" and AT_EMPTY_PATH, NULL. > > If we have that and dfd is *NOT* AT_FDCWD, we have a nice descriptor-based > > case and can deal with it. > > If the name is not empty, we have to go for dfd+filename path. Also obvious. > > Where we get trouble is AT_FDCWD, NULL case. But with (1) we can simply > > route that to the same dfd+filename path, just passing it NULL for filename. > > > > That handles the currently broken case, with very little disruption to > > anything else. > > See #getname.fixup; on top of #base.getname and IMO worth folding into it. Yes, please fold so I can rebase my series on top of it. > I don't believe that it's going to give any measurable slowdown compared to > mainline. Again, if the concerns about wasted cycles had been about routing > the dfd,"" and dfd,NULL cases through the filename_lookup(), this does *NOT* > happen with that patch. Christian, Linus? This looks good! Reviewed-by: Christian Brauner <brauner@kernel.org>
On Fri, Oct 18, 2024 at 08:38:22PM +0100, Al Viro wrote: > On Fri, Oct 18, 2024 at 05:51:58PM +0100, Al Viro wrote: > > > Extra cycles where? If anything, I'd expect a too-small-to-measure > > speedup due to dereference shifted from path_init() to __set_nameidata(). > > Below is literally all it takes to make filename_lookup() treat NULL > > as empty-string name. > > > > NOTE: I'm not talking about forcing the pure by-descriptor case through > > the dfd+pathname codepath; not without serious profiling. But treating > > AT_FDCWD + NULL by the delta below and passing NULL struct filename to > > filename_lookup()? Where do you expect to have the lost cycles on that? > > [snip] > > BTW, could you give me a reference to the mail with those objections? > I don't see anything in my mailbox, but... I had to search for quite a bit myself: https://lore.kernel.org/r/CAHk-=wifPKRG2w4mw+YchNtAuk4mMJBde7bG-Z7wt0+ZeQMJ_A@mail.gmail.com
On Mon, Oct 21, 2024 at 02:47:59PM +0200, Christian Brauner wrote: > On Fri, Oct 18, 2024 at 08:38:22PM +0100, Al Viro wrote: > > On Fri, Oct 18, 2024 at 05:51:58PM +0100, Al Viro wrote: > > > > > Extra cycles where? If anything, I'd expect a too-small-to-measure > > > speedup due to dereference shifted from path_init() to __set_nameidata(). > > > Below is literally all it takes to make filename_lookup() treat NULL > > > as empty-string name. > > > > > > NOTE: I'm not talking about forcing the pure by-descriptor case through > > > the dfd+pathname codepath; not without serious profiling. But treating > > > AT_FDCWD + NULL by the delta below and passing NULL struct filename to > > > filename_lookup()? Where do you expect to have the lost cycles on that? > > > > [snip] > > > > BTW, could you give me a reference to the mail with those objections? > > I don't see anything in my mailbox, but... > > I had to search for quite a bit myself: > > https://lore.kernel.org/r/CAHk-=wifPKRG2w4mw+YchNtAuk4mMJBde7bG-Z7wt0+ZeQMJ_A@mail.gmail.com Re get_user() - there's one architecture where this fetch is a clear loss. Take a look at what um is doing; it's a full page table walk, then (single-byte) memcpy(). With no caching of page table walk results, so strncpy_from_user() in case the sucker is _not_ empty will have to start from scratch (and it's _not_ generic strncpy_from_user() there, for the same reasons). BTW, I wonder if we could speed the things up on um by caching the last page table walk result - and treating that as a TLB. Might make back-to-back get_user()/put_user() seriously cheaper there - unsafe_get_user() could grow a fastpath, possibly making generic strncpy_from_user() cheap enough to be used. If that is feasible, the only non-generic variant would remain on mips, and that's a lot less convincing case than um. Possibly strnlen_user(), as well - that one has a variant on xtensa, but that's also not an obvious win compared to generic... That's a separate story, anyway.
On Mon, Oct 21, 2024 at 02:39:58PM +0200, Christian Brauner wrote: > > See #getname.fixup; on top of #base.getname and IMO worth folding into it. > > Yes, please fold so I can rebase my series on top of it. OK... What I have is #base.getname-fixed, with two commits - trivial "teach filename_lookup() to accept NULL" and introducing getname_maybe_null(), with fix folded in. #work.xattr2 and #work.statx2 are on top of that.
On Mon, Oct 21, 2024 at 06:09:10PM +0100, Al Viro wrote: > On Mon, Oct 21, 2024 at 02:39:58PM +0200, Christian Brauner wrote: > > > > See #getname.fixup; on top of #base.getname and IMO worth folding into it. > > > > Yes, please fold so I can rebase my series on top of it. > > OK... What I have is #base.getname-fixed, with two commits - trivial > "teach filename_lookup() to accept NULL" and introducing getname_maybe_null(), > with fix folded in. > > #work.xattr2 and #work.statx2 are on top of that. BTW, speaking of statx() - I would rather lift the call of cp_statx() out of do_statx() and do_statx_fd() into the callers. Yes, that needs making it non-static, due to io_uring; not a problem, IMO - it fits into the "how do we copy internal objects to userland ones" family of helpers. Another fun issue: for by-pathname case vfs_fstatat() ends up hitting the same vfs_statx_path() as statx(2); however, for by-descriptor case they do vfs_getattr() and vfs_statx_path() resp. The difference is, vfs_statx_path() has if (request_mask & STATX_MNT_ID_UNIQUE) { stat->mnt_id = real_mount(path->mnt)->mnt_id_unique; stat->result_mask |= STATX_MNT_ID_UNIQUE; } else { stat->mnt_id = real_mount(path->mnt)->mnt_id; stat->result_mask |= STATX_MNT_ID; } if (path_mounted(path)) stat->attributes |= STATX_ATTR_MOUNT_ROOT; stat->attributes_mask |= STATX_ATTR_MOUNT_ROOT; /* * If this is a block device inode, override the filesystem * attributes with the block device specific parameters that need to be * obtained from the bdev backing inode. */ if (S_ISBLK(stat->mode)) bdev_statx(path, stat, request_mask); done after vfs_getattr(). Questions: 1) why is STATX_MNT_ID set without checking if it's in the mask passed to the damn thing? 2) why, in the name of everything unholy, does statx() on /dev/weird_shite trigger modprobe on that thing? Without even a permission check on it...
On Mon, Oct 21, 2024 at 11:43:13PM +0100, Al Viro wrote: > On Mon, Oct 21, 2024 at 06:09:10PM +0100, Al Viro wrote: > > On Mon, Oct 21, 2024 at 02:39:58PM +0200, Christian Brauner wrote: > > > > > > See #getname.fixup; on top of #base.getname and IMO worth folding into it. > > > > > > Yes, please fold so I can rebase my series on top of it. > > > > OK... What I have is #base.getname-fixed, with two commits - trivial > > "teach filename_lookup() to accept NULL" and introducing getname_maybe_null(), > > with fix folded in. > > > > #work.xattr2 and #work.statx2 are on top of that. > > BTW, speaking of statx() - I would rather lift the call of cp_statx() out > of do_statx() and do_statx_fd() into the callers. Yes, that needs making > it non-static, due to io_uring; not a problem, IMO - it fits into the > "how do we copy internal objects to userland ones" family of helpers. > > Another fun issue: for by-pathname case vfs_fstatat() ends up hitting > the same vfs_statx_path() as statx(2); however, for by-descriptor case > they do vfs_getattr() and vfs_statx_path() resp. > > The difference is, vfs_statx_path() has > if (request_mask & STATX_MNT_ID_UNIQUE) { > stat->mnt_id = real_mount(path->mnt)->mnt_id_unique; > stat->result_mask |= STATX_MNT_ID_UNIQUE; > } else { > stat->mnt_id = real_mount(path->mnt)->mnt_id; > stat->result_mask |= STATX_MNT_ID; > } > > if (path_mounted(path)) > stat->attributes |= STATX_ATTR_MOUNT_ROOT; > stat->attributes_mask |= STATX_ATTR_MOUNT_ROOT; > > /* > * If this is a block device inode, override the filesystem > * attributes with the block device specific parameters that need to be > * obtained from the bdev backing inode. > */ > if (S_ISBLK(stat->mode)) > bdev_statx(path, stat, request_mask); > done after vfs_getattr(). Questions: > > 1) why is STATX_MNT_ID set without checking if it's in the mask passed to > the damn thing? If you look at the history you'll find that STATX_MNT_ID has always been set unconditionally, i.e., userspace didn't have to request it explicitly. And that's fine. It's not like it's expensive. > > 2) why, in the name of everything unholy, does statx() on /dev/weird_shite > trigger modprobe on that thing? Without even a permission check on it... Block people should know. But afaict, that's a block device thing which happens with CONFIG_BLOCK_LEGACY_AUTOLOAD enabled which is supposedly deprecated.
On Tue, Oct 22, 2024 at 10:49:59AM +0200, Christian Brauner wrote: > > 1) why is STATX_MNT_ID set without checking if it's in the mask passed to > > the damn thing? > > If you look at the history you'll find that STATX_MNT_ID has always been > set unconditionally, i.e., userspace didn't have to request it > explicitly. And that's fine. It's not like it's expensive. Not the issue, really - the difference between the by-fd (when we call vfs_fstat()) and by-path (calling vfs_statx()) cases of vfs_fstatat() is what worries me. In by-fd case you hit vfs_getattr() and that's it. In by-path case you hit exact same helper you do for statx(2), with STATX_BASIC_FLAGS for the mask argument. Which ends with vfs_getattr() + a bunch of followups in vfs_statx_path(). _IF_ all those followups had been no-ops when mask is just STATX_BASIC_FLAGS, everything would've been fine. They are not, though. And while all current users of vfs_fstatat() ignore stat->attributes, stat->attributes_mask and stat->mnt_id (as well as stat->result_mask), that's a property of code we feed that struct kstat to after vfs_fstatat() call has filled it. Currently that's 9 functions, spread over a lot of places. Sure, each of those is fine, but any additional instance that does care and we are getting an unpleasant inconsistency between by-fd and by-path users. Variants: 1) We can clone vfs_statx(), have the clone call vfs_getattr() instead of vfs_statx_path() and use it in vfs_fstatat(). That would take care of any future inconsistencies (as well as a minor side benefit of losing request_mask argument in that thing). OTOH, it's extra code duplication. 2) go for your "it's cheap anyway" and have vfs_fstatat() use vfs_statx_fd(fd, flags, stat, STATX_BASIC_FLAGS) on the by-fd path. Again, that restores consistency. Cost: that extra work (tail of vfs_statx_path()) done for fstat(2) just as it's currently done for stat(2). 3) add an explicit "none of that fancy crap" flag argument to vfs_statx_path() and pass it through vfs_statx(). Cost: extra argument passed through the call chain. And yes, it's all cheap, but {f,}stat(2) is often _very_ hot, so it's worth getting right. > > 2) why, in the name of everything unholy, does statx() on /dev/weird_shite > > trigger modprobe on that thing? Without even a permission check on it... > > Block people should know. But afaict, that's a block device thing which > happens with CONFIG_BLOCK_LEGACY_AUTOLOAD enabled which is supposedly > deprecated. Umm... Deprecated or not, one generally expects that if /dev/foo is owned by root:root with 0600 for permissions, then non-root won't be able to trigger anything in the vicinity of the hardware in question, including "load the driver"... Note that open(2) won't get anywhere near blkdev_open() - it will get EPERM before that. stat(2) also won't go there. statx(2) will. And frankly, getting the STATX_DIOALIGN and STATX_WRITE_ATOMIC information out of such device is also somewhat dubious, but that's less obviously wrong.
--- a/fs/namei.c +++ b/fs/namei.c @@ -211,22 +211,38 @@ getname_flags(const char __user *filename, int flags) return result; } -struct filename * -getname_uflags(const char __user *filename, int uflags) +struct filename *getname_uflags(const char __user *filename, int uflags) { int flags = (uflags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0; return getname_flags(filename, flags); } -struct filename * -getname(const char __user * filename) +struct filename *getname(const char __user * filename) { return getname_flags(filename, 0); } -struct filename * -getname_kernel(const char * filename) +struct filename *__getname_maybe_null(const char __user *pathname) +{ + struct filename *name; + char c; + + /* try to save on allocations; loss on um, though */ + if (get_user(c, pathname)) + return ERR_PTR(-EFAULT); + if (!c) + return NULL; + + name = getname_flags(pathname, LOOKUP_EMPTY); + if (!IS_ERR(name) && !(name->name[0])) { + putname(name); + name = NULL; + } + return name; +} + +struct filename *getname_kernel(const char * filename) { struct filename *result; int len = strlen(filename) + 1; diff --git a/fs/stat.c b/fs/stat.c index 41e598376d7e..aa5bfc41a669 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -326,18 +326,11 @@ int vfs_fstatat(int dfd, const char __user *filename, { int ret; int statx_flags = flags | AT_NO_AUTOMOUNT; - struct filename *name; + struct filename *name = getname_maybe_null(filename, flags); - /* - * Work around glibc turning fstat() into fstatat(AT_EMPTY_PATH) - * - * If AT_EMPTY_PATH is set, we expect the common case to be that - * empty path, and avoid doing all the extra pathname work. - */ - if (flags == AT_EMPTY_PATH && vfs_empty_path(dfd, filename)) + if (!name) return vfs_fstat(dfd, stat); - name = getname_flags(filename, getname_statx_lookup_flags(statx_flags)); ret = vfs_statx(dfd, name, statx_flags, stat, STATX_BASIC_STATS); putname(name); @@ -775,7 +768,7 @@ SYSCALL_DEFINE5(statx, { int ret; unsigned lflags; - struct filename *name; + struct filename *name = getname_maybe_null(filename, flags); /* * Short-circuit handling of NULL and "" paths. @@ -788,10 +781,9 @@ SYSCALL_DEFINE5(statx, * Supporting this results in the uglification below. */ lflags = flags & ~(AT_NO_AUTOMOUNT | AT_STATX_SYNC_TYPE); - if (lflags == AT_EMPTY_PATH && vfs_empty_path(dfd, filename)) + if (!name) return do_statx_fd(dfd, flags & ~AT_NO_AUTOMOUNT, mask, buffer); - name = getname_flags(filename, getname_statx_lookup_flags(flags)); ret = do_statx(dfd, name, flags, mask, buffer); putname(name); diff --git a/include/linux/fs.h b/include/linux/fs.h index e3c603d01337..403258ac2ea2 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2766,6 +2766,16 @@ extern struct filename *getname_flags(const char __user *, int); extern struct filename *getname_uflags(const char __user *, int); extern struct filename *getname(const char __user *); extern struct filename *getname_kernel(const char *); +extern struct filename *__getname_maybe_null(const char __user *); +static inline struct filename *getname_maybe_null(const char __user *name, int flags) +{ + if (!(flags & AT_EMPTY_PATH)) + return getname(name); + + if (!name) + return NULL; + return __getname_maybe_null(name); +} extern void putname(struct filename *name); extern int finish_open(struct file *file, struct dentry *dentry,
[ in #work.getname; if nobody objects, I'm going to make #work.xattr pull that. IMO it's saner than hacks around vfs_empty_path() and it does very similar logics - with simpler handling on the caller side. ] Semantics used by statx(2) (and later *xattrat(2)): without AT_EMPTY_PATH it's standard getname() (i.e. ERR_PTR(-ENOENT) on empty string, ERR_PTR(-EFAULT) on NULL), with AT_EMPTY_PATH both empty string and NULL are accepted. Calling conventions: getname_maybe_null(user_pointer, flags) returns * pointer to struct filename when non-empty string had been successfully read * ERR_PTR(...) on error * NULL if an empty string or NULL pointer had been given with AT_EMPTY_FLAGS in the flags argument. It tries to avoid allocation in the last case; it's not always able to do so, in which case the temporary struct filename instance is freed and NULL returned anyway. Fast path is inlined. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- diff --git a/fs/namei.c b/fs/namei.c index 4a4a22a08ac2..27eb0a81d9b8 100644