diff mbox series

[RFC] getname_maybe_null() - the third variant of pathname copy-in

Message ID 20241009040316.GY4017910@ZenIV (mailing list archive)
State New
Headers show
Series [RFC] getname_maybe_null() - the third variant of pathname copy-in | expand

Commit Message

Al Viro Oct. 9, 2024, 4:03 a.m. UTC
[
in #work.getname; if nobody objects, I'm going to make #work.xattr pull that.
IMO it's saner than hacks around vfs_empty_path() and it does very similar
logics - with simpler handling on the caller side.
]

Semantics used by statx(2) (and later *xattrat(2)): without AT_EMPTY_PATH
it's standard getname() (i.e. ERR_PTR(-ENOENT) on empty string,
ERR_PTR(-EFAULT) on NULL), with AT_EMPTY_PATH both empty string and
NULL are accepted.
    
Calling conventions: getname_maybe_null(user_pointer, flags) returns
	* pointer to struct filename when non-empty string had been
successfully read
	* ERR_PTR(...) on error
	* NULL if an empty string or NULL pointer had been given
with AT_EMPTY_FLAGS in the flags argument.

It tries to avoid allocation in the last case; it's not always
able to do so, in which case the temporary struct filename instance
is freed and NULL returned anyway.

Fast path is inlined.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
--- 
diff --git a/fs/namei.c b/fs/namei.c
index 4a4a22a08ac2..27eb0a81d9b8 100644

Comments

Christian Brauner Oct. 15, 2024, 2:05 p.m. UTC | #1
On Wed, Oct 09, 2024 at 05:03:16AM +0100, Al Viro wrote:
> [
> in #work.getname; if nobody objects, I'm going to make #work.xattr pull that.
> IMO it's saner than hacks around vfs_empty_path() and it does very similar
> logics - with simpler handling on the caller side.
> ]
> 
> Semantics used by statx(2) (and later *xattrat(2)): without AT_EMPTY_PATH
> it's standard getname() (i.e. ERR_PTR(-ENOENT) on empty string,
> ERR_PTR(-EFAULT) on NULL), with AT_EMPTY_PATH both empty string and
> NULL are accepted.
>     
> Calling conventions: getname_maybe_null(user_pointer, flags) returns
> 	* pointer to struct filename when non-empty string had been
> successfully read
> 	* ERR_PTR(...) on error
> 	* NULL if an empty string or NULL pointer had been given
> with AT_EMPTY_FLAGS in the flags argument.
> 
> It tries to avoid allocation in the last case; it's not always
> able to do so, in which case the temporary struct filename instance
> is freed and NULL returned anyway.
> 
> Fast path is inlined.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> --- 

Looks good.

Fyi, I'm using your #base.getname as a base for some other work that I'm
currently doing. So please don't rebase #base.getname anymore. ;)

Since you have your #work.xattr and #work.stat using it as base it seems
pretty unlikely anyway but I just thought I mention explicitly that I'm
relying on that #base.getname branch.
Al Viro Oct. 16, 2024, 5:09 a.m. UTC | #2
On Tue, Oct 15, 2024 at 04:05:15PM +0200, Christian Brauner wrote:
> Looks good.
> 
> Fyi, I'm using your #base.getname as a base for some other work that I'm
> currently doing. So please don't rebase #base.getname anymore. ;)
> 
> Since you have your #work.xattr and #work.stat using it as base it seems
> pretty unlikely anyway but I just thought I mention explicitly that I'm
> relying on that #base.getname branch.

FWIW, I see a problem with that sucker.  The trouble is in the
combination AT_FDCWD, "", AT_EMPTY_PATH.  vfs_empty_path() returns
false on that, so fstatat() in mainline falls back to vfs_statx() and
does the right thing.  This variant does _not_.

Note that your variant of xattr series also ended up failing on e.g.
getxattrat() with such combination:
        if (at_flags & AT_EMPTY_PATH && vfs_empty_path(dfd, pathname)) {
                CLASS(fd, f)(dfd);
                if (!f.file)
                        return -EBADF;
                audit_file(f.file);
                return getxattr(file_mnt_idmap(f.file), file_dentry(f.file),
                                name, value, size);
        }

        lookup_flags = (at_flags & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW;

retry:
        error = user_path_at(dfd, pathname, lookup_flags, &path);

ended up calling user_path_at() with empty pathname and nothing like LOOKUP_EMPTY
in lookup_flags.  Which bails out with -ENOENT, since getname() in there does
so.  My variant bails out with -EBADF and I'd argue that neither is correct.

Not sure what's the sane solution here, need to think for a while...
Christian Brauner Oct. 16, 2024, 8:32 a.m. UTC | #3
On Wed, Oct 16, 2024 at 06:09:08AM +0100, Al Viro wrote:
> On Tue, Oct 15, 2024 at 04:05:15PM +0200, Christian Brauner wrote:
> > Looks good.
> > 
> > Fyi, I'm using your #base.getname as a base for some other work that I'm
> > currently doing. So please don't rebase #base.getname anymore. ;)
> > 
> > Since you have your #work.xattr and #work.stat using it as base it seems
> > pretty unlikely anyway but I just thought I mention explicitly that I'm
> > relying on that #base.getname branch.
> 
> FWIW, I see a problem with that sucker.  The trouble is in the
> combination AT_FDCWD, "", AT_EMPTY_PATH.  vfs_empty_path() returns

Yeah, we're aware.

> false on that, so fstatat() in mainline falls back to vfs_statx() and
> does the right thing.  This variant does _not_.
> 
> Note that your variant of xattr series also ended up failing on e.g.
> getxattrat() with such combination:
>         if (at_flags & AT_EMPTY_PATH && vfs_empty_path(dfd, pathname)) {
>                 CLASS(fd, f)(dfd);
>                 if (!f.file)
>                         return -EBADF;
>                 audit_file(f.file);
>                 return getxattr(file_mnt_idmap(f.file), file_dentry(f.file),
>                                 name, value, size);
>         }
> 
>         lookup_flags = (at_flags & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW;
> 
> retry:
>         error = user_path_at(dfd, pathname, lookup_flags, &path);
> 
> ended up calling user_path_at() with empty pathname and nothing like LOOKUP_EMPTY
> in lookup_flags.  Which bails out with -ENOENT, since getname() in there does
> so.  My variant bails out with -EBADF and I'd argue that neither is correct.
> 
> Not sure what's the sane solution here, need to think for a while...

Fwiw, in the other thread we concluded to just not care about AT_FDCWD with "".
And so far I agree with that.
Al Viro Oct. 16, 2024, 2 p.m. UTC | #4
On Wed, Oct 16, 2024 at 10:32:16AM +0200, Christian Brauner wrote:

> > ended up calling user_path_at() with empty pathname and nothing like LOOKUP_EMPTY
> > in lookup_flags.  Which bails out with -ENOENT, since getname() in there does
> > so.  My variant bails out with -EBADF and I'd argue that neither is correct.
> > 
> > Not sure what's the sane solution here, need to think for a while...
> 
> Fwiw, in the other thread we concluded to just not care about AT_FDCWD with "".
> And so far I agree with that.

Subject:?
Christian Brauner Oct. 16, 2024, 2:49 p.m. UTC | #5
On Wed, Oct 16, 2024 at 03:00:50PM +0100, Al Viro wrote:
> On Wed, Oct 16, 2024 at 10:32:16AM +0200, Christian Brauner wrote:
> 
> > > ended up calling user_path_at() with empty pathname and nothing like LOOKUP_EMPTY
> > > in lookup_flags.  Which bails out with -ENOENT, since getname() in there does
> > > so.  My variant bails out with -EBADF and I'd argue that neither is correct.
> > > 
> > > Not sure what's the sane solution here, need to think for a while...
> > 
> > Fwiw, in the other thread we concluded to just not care about AT_FDCWD with "".
> > And so far I agree with that.
> 
> Subject:?

https://lore.kernel.org/r/CAGudoHHdccL5Lh8zAO-0swqqRCW4GXMSXhq4jQGoVj=UdBK-Lg@mail.gmail.com

Hm, this only speaks about the NULL case.


I just looked through codesearch on github and on debian and the only
example I found was
https://sources.debian.org/src/snapd/2.65.3-1/tests/main/seccomp-statx/test-snapd-statx/bin/statx.py/?hl=71#L71

So really, just special-case it for statx() imho instead of spreading
that ugliness everywhere?
Al Viro Oct. 17, 2024, 11:54 p.m. UTC | #6
On Wed, Oct 16, 2024 at 04:49:48PM +0200, Christian Brauner wrote:
> On Wed, Oct 16, 2024 at 03:00:50PM +0100, Al Viro wrote:
> > On Wed, Oct 16, 2024 at 10:32:16AM +0200, Christian Brauner wrote:
> > 
> > > > ended up calling user_path_at() with empty pathname and nothing like LOOKUP_EMPTY
> > > > in lookup_flags.  Which bails out with -ENOENT, since getname() in there does
> > > > so.  My variant bails out with -EBADF and I'd argue that neither is correct.
> > > > 
> > > > Not sure what's the sane solution here, need to think for a while...
> > > 
> > > Fwiw, in the other thread we concluded to just not care about AT_FDCWD with "".
> > > And so far I agree with that.
> > 
> > Subject:?
> 
> https://lore.kernel.org/r/CAGudoHHdccL5Lh8zAO-0swqqRCW4GXMSXhq4jQGoVj=UdBK-Lg@mail.gmail.com
> 
> Hm, this only speaks about the NULL case.
> 
> 
> I just looked through codesearch on github and on debian and the only
> example I found was
> https://sources.debian.org/src/snapd/2.65.3-1/tests/main/seccomp-statx/test-snapd-statx/bin/statx.py/?hl=71#L71
> 
> So really, just special-case it for statx() imho instead of spreading
> that ugliness everywhere?

Not sure, TBH.  I wonder if it would be simpler to have filename_lookup()
accept NULL for pathname and have it treat that as "".

Look: all it takes is the following trick
	* add const char *pathname to struct nameidata
	* in __set_nameidata() add
        p->pathname = likely(name) ? name->name : "";
	* in path_init() replace
	const char *s = nd->name->name;
with
	const char *s = nd->pathname;
and we are done.  Oh, and teach putname() to treat NULL as no-op.

With such changes in fs/namei.c we could do
        struct filename *name = getname_maybe_null(filename, flags);

	if (!name && dfd != AT_FDCWD)
		return do_statx_fd(dfd, flags & ~AT_NO_AUTOMOUNT, mask, buffer);
	ret = vfs_statx(dfd, name, statx_flags, stat, STATX_BASIC_STATS);
	putname(name);
	return ret;
in statx(2) and similar in vfs_fstatat().

With that I'm not even sure we want to bother with separate
vfs_statx_fd() - the overhead might very well turn out to
be tolerable.  It is non-zero, but that's a fairly straight
trip through filename_lookup() machinery.  Would require some
profiling, though...
Christian Brauner Oct. 18, 2024, 11:06 a.m. UTC | #7
On Fri, Oct 18, 2024 at 12:54:59AM +0100, Al Viro wrote:
> On Wed, Oct 16, 2024 at 04:49:48PM +0200, Christian Brauner wrote:
> > On Wed, Oct 16, 2024 at 03:00:50PM +0100, Al Viro wrote:
> > > On Wed, Oct 16, 2024 at 10:32:16AM +0200, Christian Brauner wrote:
> > > 
> > > > > ended up calling user_path_at() with empty pathname and nothing like LOOKUP_EMPTY
> > > > > in lookup_flags.  Which bails out with -ENOENT, since getname() in there does
> > > > > so.  My variant bails out with -EBADF and I'd argue that neither is correct.
> > > > > 
> > > > > Not sure what's the sane solution here, need to think for a while...
> > > > 
> > > > Fwiw, in the other thread we concluded to just not care about AT_FDCWD with "".
> > > > And so far I agree with that.
> > > 
> > > Subject:?
> > 
> > https://lore.kernel.org/r/CAGudoHHdccL5Lh8zAO-0swqqRCW4GXMSXhq4jQGoVj=UdBK-Lg@mail.gmail.com
> > 
> > Hm, this only speaks about the NULL case.
> > 
> > 
> > I just looked through codesearch on github and on debian and the only
> > example I found was
> > https://sources.debian.org/src/snapd/2.65.3-1/tests/main/seccomp-statx/test-snapd-statx/bin/statx.py/?hl=71#L71
> > 
> > So really, just special-case it for statx() imho instead of spreading
> > that ugliness everywhere?
> 
> Not sure, TBH.  I wonder if it would be simpler to have filename_lookup()
> accept NULL for pathname and have it treat that as "".
> 
> Look: all it takes is the following trick
> 	* add const char *pathname to struct nameidata
> 	* in __set_nameidata() add
>         p->pathname = likely(name) ? name->name : "";
> 	* in path_init() replace
> 	const char *s = nd->name->name;
> with
> 	const char *s = nd->pathname;
> and we are done.  Oh, and teach putname() to treat NULL as no-op.

I know, that's what I suggested to Linus initially but he NAKed it
because he didn't want the extra cycles.

> 
> With such changes in fs/namei.c we could do
>         struct filename *name = getname_maybe_null(filename, flags);
> 
> 	if (!name && dfd != AT_FDCWD)
> 		return do_statx_fd(dfd, flags & ~AT_NO_AUTOMOUNT, mask, buffer);
> 	ret = vfs_statx(dfd, name, statx_flags, stat, STATX_BASIC_STATS);
> 	putname(name);
> 	return ret;
> in statx(2) and similar in vfs_fstatat().
> 
> With that I'm not even sure we want to bother with separate
> vfs_statx_fd() - the overhead might very well turn out to
> be tolerable.  It is non-zero, but that's a fairly straight
> trip through filename_lookup() machinery.  Would require some
> profiling, though...
Al Viro Oct. 18, 2024, 4:51 p.m. UTC | #8
On Fri, Oct 18, 2024 at 01:06:12PM +0200, Christian Brauner wrote:
> > Look: all it takes is the following trick
> > 	* add const char *pathname to struct nameidata
> > 	* in __set_nameidata() add
> >         p->pathname = likely(name) ? name->name : "";
> > 	* in path_init() replace
> > 	const char *s = nd->name->name;
> > with
> > 	const char *s = nd->pathname;
> > and we are done.  Oh, and teach putname() to treat NULL as no-op.
> 
> I know, that's what I suggested to Linus initially but he NAKed it
> because he didn't want the extra cycles.

Extra cycles where?  If anything, I'd expect a too-small-to-measure
speedup due to dereference shifted from path_init() to __set_nameidata().
Below is literally all it takes to make filename_lookup() treat NULL
as empty-string name.

NOTE: I'm not talking about forcing the pure by-descriptor case through
the dfd+pathname codepath; not without serious profiling.  But treating
AT_FDCWD + NULL by the delta below and passing NULL struct filename to
filename_lookup()?  Where do you expect to have the lost cycles on that?

diff --git a/fs/namei.c b/fs/namei.c
index 4a4a22a08ac2..fc2053877e5c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -264,7 +264,7 @@ EXPORT_SYMBOL(getname_kernel);
 
 void putname(struct filename *name)
 {
-	if (IS_ERR(name))
+	if (IS_ERR_OR_NULL(name))
 		return;
 
 	if (WARN_ON_ONCE(!atomic_read(&name->refcnt)))
@@ -588,6 +588,7 @@ struct nameidata {
 		unsigned seq;
 	} *stack, internal[EMBEDDED_LEVELS];
 	struct filename	*name;
+	const char *pathname;
 	struct nameidata *saved;
 	unsigned	root_seq;
 	int		dfd;
@@ -606,6 +607,7 @@ static void __set_nameidata(struct nameidata *p, int dfd, struct filename *name)
 	p->depth = 0;
 	p->dfd = dfd;
 	p->name = name;
+	p->pathname = likely(name) ? name->name : "";
 	p->path.mnt = NULL;
 	p->path.dentry = NULL;
 	p->total_link_count = old ? old->total_link_count : 0;
@@ -2439,7 +2441,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 static const char *path_init(struct nameidata *nd, unsigned flags)
 {
 	int error;
-	const char *s = nd->name->name;
+	const char *s = nd->pathname;
 
 	/* LOOKUP_CACHED requires RCU, ask caller to retry */
 	if ((flags & (LOOKUP_RCU | LOOKUP_CACHED)) == LOOKUP_CACHED)
Al Viro Oct. 18, 2024, 7:38 p.m. UTC | #9
On Fri, Oct 18, 2024 at 05:51:58PM +0100, Al Viro wrote:

> Extra cycles where?  If anything, I'd expect a too-small-to-measure
> speedup due to dereference shifted from path_init() to __set_nameidata().
> Below is literally all it takes to make filename_lookup() treat NULL
> as empty-string name.
> 
> NOTE: I'm not talking about forcing the pure by-descriptor case through
> the dfd+pathname codepath; not without serious profiling.  But treating
> AT_FDCWD + NULL by the delta below and passing NULL struct filename to
> filename_lookup()?  Where do you expect to have the lost cycles on that?

[snip]

BTW, could you give me a reference to the mail with those objections?
I don't see anything in my mailbox, but...

Or was that in one of those AT_EMPTY_PATH_NOCHECK (IIRC?) threads?

Anyway, what I'm suggesting is

1) teach filename_lookup() to handle NULL struct filename * argument, treating
it as "".  Trivial and does not impose any overhead on the normal cases.

2) have statx try to recognize AT_EMPTY_PATH, "" and AT_EMPTY_PATH, NULL.
If we have that and dfd is *NOT* AT_FDCWD, we have a nice descriptor-based
case and can deal with it.
If the name is not empty, we have to go for dfd+filename path.  Also obvious.
Where we get trouble is AT_FDCWD, NULL case.  But with (1) we can simply
route that to the same dfd+filename path, just passing it NULL for filename.

That handles the currently broken case, with very little disruption to
anything else.

What's more, the logics for "is it NULL or empty with AT_EMPTY_PATH" no
longer needs to care about dfd.  We can encapsulate it nicely into
a function that takes userland pointer + flags and does the following:

	if no AT_EMPTY_PATH in flags
		return getname(pointer)
	if pointer == NULL		     (same as vfs_empty_path())
		return NULL
	peek at the first byte, if it's '\0' (same as vfs_empty_path())
		return NULL
	name = getname_flags(pointer, LOOKUP_EMPTY)
	if IS_ERR(name)
		return name
	if unlikely(name is empty)
		putname(name)
		return NULL
	return name

Then statx() (or anyone who wants similar AT_EMPTY_PATH + NULL semantics)
can do this:
	name = getname_maybe_null(user_pointer, flags)
	if (!name && dfd >= 0) // or dfd != AT_FDCWD, perhaps
		do_something_by_fd(dfd, ...)
	else
		do_something_by_name(dfd, name, ...)
without the bitrot-prone boilerplate.
Al Viro Oct. 19, 2024, 5:03 a.m. UTC | #10
On Fri, Oct 18, 2024 at 08:38:22PM +0100, Al Viro wrote:
> On Fri, Oct 18, 2024 at 05:51:58PM +0100, Al Viro wrote:
> 
> > Extra cycles where?  If anything, I'd expect a too-small-to-measure
> > speedup due to dereference shifted from path_init() to __set_nameidata().
> > Below is literally all it takes to make filename_lookup() treat NULL
> > as empty-string name.
> > 
> > NOTE: I'm not talking about forcing the pure by-descriptor case through
> > the dfd+pathname codepath; not without serious profiling.  But treating
> > AT_FDCWD + NULL by the delta below and passing NULL struct filename to
> > filename_lookup()?  Where do you expect to have the lost cycles on that?
> 
> [snip]
> 
> BTW, could you give me a reference to the mail with those objections?
> I don't see anything in my mailbox, but...
> 
> Or was that in one of those AT_EMPTY_PATH_NOCHECK (IIRC?) threads?
> 
> Anyway, what I'm suggesting is
> 
> 1) teach filename_lookup() to handle NULL struct filename * argument, treating
> it as "".  Trivial and does not impose any overhead on the normal cases.
> 
> 2) have statx try to recognize AT_EMPTY_PATH, "" and AT_EMPTY_PATH, NULL.
> If we have that and dfd is *NOT* AT_FDCWD, we have a nice descriptor-based
> case and can deal with it.
> If the name is not empty, we have to go for dfd+filename path.  Also obvious.
> Where we get trouble is AT_FDCWD, NULL case.  But with (1) we can simply
> route that to the same dfd+filename path, just passing it NULL for filename.
> 
> That handles the currently broken case, with very little disruption to
> anything else.

See #getname.fixup; on top of #base.getname and IMO worth folding into it.
I don't believe that it's going to give any measurable slowdown compared to
mainline.  Again, if the concerns about wasted cycles had been about routing
the dfd,"" and dfd,NULL cases through the filename_lookup(), this does *NOT*
happen with that patch.  Christian, Linus?

commit 9fb26eb324d9aa9e6889f181f1683e710946258f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat Oct 19 00:57:14 2024 -0400

    fix statx(2) regression on AT_FDCWD,"" and AT_FDCWD,NULL
    
    teach filename_lookup() et.al. to handle NULL struct filename reference
    as ""; make AT_FDCWD,"" (but _NOT_ the normal dfd,"") case in statx(2)
    fall back to path-based variant.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/fs/namei.c b/fs/namei.c
index 27eb0a81d9b8..2bfe476c3bd0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -280,7 +280,7 @@ EXPORT_SYMBOL(getname_kernel);
 
 void putname(struct filename *name)
 {
-	if (IS_ERR(name))
+	if (IS_ERR_OR_NULL(name))
 		return;
 
 	if (WARN_ON_ONCE(!atomic_read(&name->refcnt)))
@@ -604,6 +604,7 @@ struct nameidata {
 		unsigned seq;
 	} *stack, internal[EMBEDDED_LEVELS];
 	struct filename	*name;
+	const char *pathname;
 	struct nameidata *saved;
 	unsigned	root_seq;
 	int		dfd;
@@ -622,6 +623,7 @@ static void __set_nameidata(struct nameidata *p, int dfd, struct filename *name)
 	p->depth = 0;
 	p->dfd = dfd;
 	p->name = name;
+	p->pathname = likely(name) ? name->name : "";
 	p->path.mnt = NULL;
 	p->path.dentry = NULL;
 	p->total_link_count = old ? old->total_link_count : 0;
@@ -2455,7 +2457,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 static const char *path_init(struct nameidata *nd, unsigned flags)
 {
 	int error;
-	const char *s = nd->name->name;
+	const char *s = nd->pathname;
 
 	/* LOOKUP_CACHED requires RCU, ask caller to retry */
 	if ((flags & (LOOKUP_RCU | LOOKUP_CACHED)) == LOOKUP_CACHED)
diff --git a/fs/stat.c b/fs/stat.c
index d0d82efd44d6..b74831dc7ae6 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -328,7 +328,7 @@ int vfs_fstatat(int dfd, const char __user *filename,
 	int statx_flags = flags | AT_NO_AUTOMOUNT;
 	struct filename *name = getname_maybe_null(filename, flags);
 
-	if (!name)
+	if (!name && dfd >= 0)
 		return vfs_fstat(dfd, stat);
 
 	ret = vfs_statx(dfd, name, statx_flags, stat, STATX_BASIC_STATS);
@@ -769,7 +769,7 @@ SYSCALL_DEFINE5(statx,
 	int ret;
 	struct filename *name = getname_maybe_null(filename, flags);
 
-	if (!name)
+	if (!name && dfd >= 0)
 		return do_statx_fd(dfd, flags & ~AT_NO_AUTOMOUNT, mask, buffer);
 
 	ret = do_statx(dfd, name, flags, mask, buffer);
Linus Torvalds Oct. 19, 2024, 4:15 p.m. UTC | #11
On Fri, 18 Oct 2024 at 22:03, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> See #getname.fixup; on top of #base.getname and IMO worth folding into it.
> I don't believe that it's going to give any measurable slowdown compared to
> mainline.  Again, if the concerns about wasted cycles had been about routing
> the dfd,"" and dfd,NULL cases through the filename_lookup(), this does *NOT*
> happen with that patch.  Christian, Linus?

I'm certainly ok with this, although I did react to the

-       if (!name)
+       if (!name && dfd >= 0)

changes.

I see why you do them, but it makes me wonder if maybe
getname_maybe_null() should just get the dfd too.

And then in getname_maybe_null(), the

        if (!name)
                return NULL;

would become

       if (!name)
                return dfd >= 0 ? NULL : ERR_PTR(-EBADF);

because while I'm not entirely against it, I'm not convinced we really
want to support

        fstatat(AT_FDCWD, NULL, &st, AT_EMPTY_PATH);

becoming the same as

        fstat(".", &st);

IOW, I think the (NULL, AT_EMPTY_PATH) tuple makes sense as a way to
pass just an 'fd', but I'm _not_ convinced it makes sense as a way to
pass in AT_FDCWD.

Put another way: imagine you have a silly library implementation that does

    #define fstat(fd,buf) fstatat(fd, NULL, buf, AT_EMPTY_PATH)

and I think *that* is what we want to support. But if 'fd' is not a
valid fd, you should get -EBADF, not "fstat of current working
directlry".

Hmm?

This is not a hugely important thing. If people really think that
doing "fstat()" on the current working directory is worth doing this
for, then whatever.

And yes, there's some argument for returning -EFAULT, but I do think
that "treat NULL+AT_EMPTY_PATH as the non-at version" argument is
stronger.

But the whole "some argument" does mean that I can certainly be
convinced that I'm just wrong.

                    Linus
Al Viro Oct. 19, 2024, 5:11 p.m. UTC | #12
On Sat, Oct 19, 2024 at 09:15:32AM -0700, Linus Torvalds wrote:

> IOW, I think the (NULL, AT_EMPTY_PATH) tuple makes sense as a way to
> pass just an 'fd', but I'm _not_ convinced it makes sense as a way to
> pass in AT_FDCWD.
> 
> Put another way: imagine you have a silly library implementation that does
> 
>     #define fstat(fd,buf) fstatat(fd, NULL, buf, AT_EMPTY_PATH)
> 
> and I think *that* is what we want to support. But if 'fd' is not a
> valid fd, you should get -EBADF, not "fstat of current working
> directlry".
> 
> Hmm?

There'd been an example of live use of that posted upthread:

https://sources.debian.org/src/snapd/2.65.3-1/tests/main/seccomp-statx/test-snapd-statx/bin/statx.py/?hl=71#L71

"" rather than NULL, but...
 
> This is not a hugely important thing. If people really think that
> doing "fstat()" on the current working directory is worth doing this
> for, then whatever.
Linus Torvalds Oct. 19, 2024, 5:27 p.m. UTC | #13
On Sat, 19 Oct 2024 at 10:11, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> There'd been an example of live use of that posted upthread:
>
> https://sources.debian.org/src/snapd/2.65.3-1/tests/main/seccomp-statx/test-snapd-statx/bin/statx.py/?hl=71#L71
>
> "" rather than NULL, but...

Note that outside of a library, I'd argue that using NULL is an active
bug, and should not be supported.

It's purely a Linux extension, after all.

That said, I do think my "what if a library implements fstat() using
fstatat()" argument could equally well be "what if a library does a
Linux-specific optimization and uses NULL instead of the empty
string".

So I guess it would be entirely valid for a Linux libc to do something like

       int fstatat(int dirfd, const char *restrict pathname,
                struct stat *restrict statbuf, int flags)
        {
                if ((flags & AT_EMPTY_PATH) && !*pathname)
                        pathname = NULL;
                return SYSCALL(__NR_fstatat, dirfd, pathname, statbuf, flags);
        }

so yeah, I guess it's however we just decide to document it.

               Linus
Christian Brauner Oct. 21, 2024, 12:36 p.m. UTC | #14
On Fri, Oct 18, 2024 at 05:51:58PM +0100, Al Viro wrote:
> On Fri, Oct 18, 2024 at 01:06:12PM +0200, Christian Brauner wrote:
> > > Look: all it takes is the following trick
> > > 	* add const char *pathname to struct nameidata
> > > 	* in __set_nameidata() add
> > >         p->pathname = likely(name) ? name->name : "";
> > > 	* in path_init() replace
> > > 	const char *s = nd->name->name;
> > > with
> > > 	const char *s = nd->pathname;
> > > and we are done.  Oh, and teach putname() to treat NULL as no-op.
> > 
> > I know, that's what I suggested to Linus initially but he NAKed it
> > because he didn't want the extra cycles.
> 
> Extra cycles where?  If anything, I'd expect a too-small-to-measure
> speedup due to dereference shifted from path_init() to __set_nameidata().
> Below is literally all it takes to make filename_lookup() treat NULL
> as empty-string name.
> 
> NOTE: I'm not talking about forcing the pure by-descriptor case through
> the dfd+pathname codepath; not without serious profiling.  But treating

Ah ok, that makes sense. I thought you were talking about that...

> AT_FDCWD + NULL by the delta below and passing NULL struct filename to
> filename_lookup()?

Yes, that's good!
Christian Brauner Oct. 21, 2024, 12:38 p.m. UTC | #15
On Sat, Oct 19, 2024 at 06:11:18PM +0100, Al Viro wrote:
> On Sat, Oct 19, 2024 at 09:15:32AM -0700, Linus Torvalds wrote:
> 
> > IOW, I think the (NULL, AT_EMPTY_PATH) tuple makes sense as a way to
> > pass just an 'fd', but I'm _not_ convinced it makes sense as a way to
> > pass in AT_FDCWD.
> > 
> > Put another way: imagine you have a silly library implementation that does
> > 
> >     #define fstat(fd,buf) fstatat(fd, NULL, buf, AT_EMPTY_PATH)
> > 
> > and I think *that* is what we want to support. But if 'fd' is not a
> > valid fd, you should get -EBADF, not "fstat of current working
> > directlry".
> > 
> > Hmm?
> 
> There'd been an example of live use of that posted upthread:
> 
> https://sources.debian.org/src/snapd/2.65.3-1/tests/main/seccomp-statx/test-snapd-statx/bin/statx.py/?hl=71#L71
> 
> "" rather than NULL, but...

Which I think is really broken testing nonsense anyway so really we
could easily get them to change this and not even have to worry about
this. We've done stuff like that before. But I'm fine either way.
Christian Brauner Oct. 21, 2024, 12:39 p.m. UTC | #16
On Sat, Oct 19, 2024 at 06:03:22AM +0100, Al Viro wrote:
> On Fri, Oct 18, 2024 at 08:38:22PM +0100, Al Viro wrote:
> > On Fri, Oct 18, 2024 at 05:51:58PM +0100, Al Viro wrote:
> > 
> > > Extra cycles where?  If anything, I'd expect a too-small-to-measure
> > > speedup due to dereference shifted from path_init() to __set_nameidata().
> > > Below is literally all it takes to make filename_lookup() treat NULL
> > > as empty-string name.
> > > 
> > > NOTE: I'm not talking about forcing the pure by-descriptor case through
> > > the dfd+pathname codepath; not without serious profiling.  But treating
> > > AT_FDCWD + NULL by the delta below and passing NULL struct filename to
> > > filename_lookup()?  Where do you expect to have the lost cycles on that?
> > 
> > [snip]
> > 
> > BTW, could you give me a reference to the mail with those objections?
> > I don't see anything in my mailbox, but...
> > 
> > Or was that in one of those AT_EMPTY_PATH_NOCHECK (IIRC?) threads?
> > 
> > Anyway, what I'm suggesting is
> > 
> > 1) teach filename_lookup() to handle NULL struct filename * argument, treating
> > it as "".  Trivial and does not impose any overhead on the normal cases.
> > 
> > 2) have statx try to recognize AT_EMPTY_PATH, "" and AT_EMPTY_PATH, NULL.
> > If we have that and dfd is *NOT* AT_FDCWD, we have a nice descriptor-based
> > case and can deal with it.
> > If the name is not empty, we have to go for dfd+filename path.  Also obvious.
> > Where we get trouble is AT_FDCWD, NULL case.  But with (1) we can simply
> > route that to the same dfd+filename path, just passing it NULL for filename.
> > 
> > That handles the currently broken case, with very little disruption to
> > anything else.
> 
> See #getname.fixup; on top of #base.getname and IMO worth folding into it.

Yes, please fold so I can rebase my series on top of it.

> I don't believe that it's going to give any measurable slowdown compared to
> mainline.  Again, if the concerns about wasted cycles had been about routing
> the dfd,"" and dfd,NULL cases through the filename_lookup(), this does *NOT*
> happen with that patch.  Christian, Linus?

This looks good!
Reviewed-by: Christian Brauner <brauner@kernel.org>
Christian Brauner Oct. 21, 2024, 12:47 p.m. UTC | #17
On Fri, Oct 18, 2024 at 08:38:22PM +0100, Al Viro wrote:
> On Fri, Oct 18, 2024 at 05:51:58PM +0100, Al Viro wrote:
> 
> > Extra cycles where?  If anything, I'd expect a too-small-to-measure
> > speedup due to dereference shifted from path_init() to __set_nameidata().
> > Below is literally all it takes to make filename_lookup() treat NULL
> > as empty-string name.
> > 
> > NOTE: I'm not talking about forcing the pure by-descriptor case through
> > the dfd+pathname codepath; not without serious profiling.  But treating
> > AT_FDCWD + NULL by the delta below and passing NULL struct filename to
> > filename_lookup()?  Where do you expect to have the lost cycles on that?
> 
> [snip]
> 
> BTW, could you give me a reference to the mail with those objections?
> I don't see anything in my mailbox, but...

I had to search for quite a bit myself:

https://lore.kernel.org/r/CAHk-=wifPKRG2w4mw+YchNtAuk4mMJBde7bG-Z7wt0+ZeQMJ_A@mail.gmail.com
Al Viro Oct. 21, 2024, 5:05 p.m. UTC | #18
On Mon, Oct 21, 2024 at 02:47:59PM +0200, Christian Brauner wrote:
> On Fri, Oct 18, 2024 at 08:38:22PM +0100, Al Viro wrote:
> > On Fri, Oct 18, 2024 at 05:51:58PM +0100, Al Viro wrote:
> > 
> > > Extra cycles where?  If anything, I'd expect a too-small-to-measure
> > > speedup due to dereference shifted from path_init() to __set_nameidata().
> > > Below is literally all it takes to make filename_lookup() treat NULL
> > > as empty-string name.
> > > 
> > > NOTE: I'm not talking about forcing the pure by-descriptor case through
> > > the dfd+pathname codepath; not without serious profiling.  But treating
> > > AT_FDCWD + NULL by the delta below and passing NULL struct filename to
> > > filename_lookup()?  Where do you expect to have the lost cycles on that?
> > 
> > [snip]
> > 
> > BTW, could you give me a reference to the mail with those objections?
> > I don't see anything in my mailbox, but...
> 
> I had to search for quite a bit myself:
> 
> https://lore.kernel.org/r/CAHk-=wifPKRG2w4mw+YchNtAuk4mMJBde7bG-Z7wt0+ZeQMJ_A@mail.gmail.com

Re get_user() - there's one architecture where this fetch is a clear loss.
Take a look at what um is doing; it's a full page table walk, then
(single-byte) memcpy().  With no caching of page table walk results,
so strncpy_from_user() in case the sucker is _not_ empty will have to
start from scratch (and it's _not_ generic strncpy_from_user() there,
for the same reasons).

BTW, I wonder if we could speed the things up on um by caching the last
page table walk result - and treating that as a TLB.  Might make
back-to-back get_user()/put_user() seriously cheaper there - unsafe_get_user()
could grow a fastpath, possibly making generic strncpy_from_user() cheap
enough to be used.  If that is feasible, the only non-generic variant
would remain on mips, and that's a lot less convincing case than um.
Possibly strnlen_user(), as well - that one has a variant on xtensa,
but that's also not an obvious win compared to generic...

That's a separate story, anyway.
Al Viro Oct. 21, 2024, 5:09 p.m. UTC | #19
On Mon, Oct 21, 2024 at 02:39:58PM +0200, Christian Brauner wrote:

> > See #getname.fixup; on top of #base.getname and IMO worth folding into it.
> 
> Yes, please fold so I can rebase my series on top of it.

OK...  What I have is #base.getname-fixed, with two commits - trivial
"teach filename_lookup() to accept NULL" and introducing getname_maybe_null(),
with fix folded in.

#work.xattr2 and #work.statx2 are on top of that.
Al Viro Oct. 21, 2024, 10:43 p.m. UTC | #20
On Mon, Oct 21, 2024 at 06:09:10PM +0100, Al Viro wrote:
> On Mon, Oct 21, 2024 at 02:39:58PM +0200, Christian Brauner wrote:
> 
> > > See #getname.fixup; on top of #base.getname and IMO worth folding into it.
> > 
> > Yes, please fold so I can rebase my series on top of it.
> 
> OK...  What I have is #base.getname-fixed, with two commits - trivial
> "teach filename_lookup() to accept NULL" and introducing getname_maybe_null(),
> with fix folded in.
> 
> #work.xattr2 and #work.statx2 are on top of that.

BTW, speaking of statx() - I would rather lift the call of cp_statx() out
of do_statx() and do_statx_fd() into the callers.  Yes, that needs making
it non-static, due to io_uring; not a problem, IMO - it fits into the
"how do we copy internal objects to userland ones" family of helpers.

Another fun issue: for by-pathname case vfs_fstatat() ends up hitting
the same vfs_statx_path() as statx(2); however, for by-descriptor case
they do vfs_getattr() and vfs_statx_path() resp.

The difference is, vfs_statx_path() has
        if (request_mask & STATX_MNT_ID_UNIQUE) {
                stat->mnt_id = real_mount(path->mnt)->mnt_id_unique;
                stat->result_mask |= STATX_MNT_ID_UNIQUE;
        } else {
                stat->mnt_id = real_mount(path->mnt)->mnt_id;
                stat->result_mask |= STATX_MNT_ID;
        }

        if (path_mounted(path))
                stat->attributes |= STATX_ATTR_MOUNT_ROOT;
        stat->attributes_mask |= STATX_ATTR_MOUNT_ROOT;

        /*
         * If this is a block device inode, override the filesystem
         * attributes with the block device specific parameters that need to be
         * obtained from the bdev backing inode.
         */
        if (S_ISBLK(stat->mode))
                bdev_statx(path, stat, request_mask);
done after vfs_getattr().  Questions:

1) why is STATX_MNT_ID set without checking if it's in the mask passed to
the damn thing?

2) why, in the name of everything unholy, does statx() on /dev/weird_shite
trigger modprobe on that thing?  Without even a permission check on it...
Christian Brauner Oct. 22, 2024, 8:49 a.m. UTC | #21
On Mon, Oct 21, 2024 at 11:43:13PM +0100, Al Viro wrote:
> On Mon, Oct 21, 2024 at 06:09:10PM +0100, Al Viro wrote:
> > On Mon, Oct 21, 2024 at 02:39:58PM +0200, Christian Brauner wrote:
> > 
> > > > See #getname.fixup; on top of #base.getname and IMO worth folding into it.
> > > 
> > > Yes, please fold so I can rebase my series on top of it.
> > 
> > OK...  What I have is #base.getname-fixed, with two commits - trivial
> > "teach filename_lookup() to accept NULL" and introducing getname_maybe_null(),
> > with fix folded in.
> > 
> > #work.xattr2 and #work.statx2 are on top of that.
> 
> BTW, speaking of statx() - I would rather lift the call of cp_statx() out
> of do_statx() and do_statx_fd() into the callers.  Yes, that needs making
> it non-static, due to io_uring; not a problem, IMO - it fits into the
> "how do we copy internal objects to userland ones" family of helpers.
> 
> Another fun issue: for by-pathname case vfs_fstatat() ends up hitting
> the same vfs_statx_path() as statx(2); however, for by-descriptor case
> they do vfs_getattr() and vfs_statx_path() resp.
> 
> The difference is, vfs_statx_path() has
>         if (request_mask & STATX_MNT_ID_UNIQUE) {
>                 stat->mnt_id = real_mount(path->mnt)->mnt_id_unique;
>                 stat->result_mask |= STATX_MNT_ID_UNIQUE;
>         } else {
>                 stat->mnt_id = real_mount(path->mnt)->mnt_id;
>                 stat->result_mask |= STATX_MNT_ID;
>         }
> 
>         if (path_mounted(path))
>                 stat->attributes |= STATX_ATTR_MOUNT_ROOT;
>         stat->attributes_mask |= STATX_ATTR_MOUNT_ROOT;
> 
>         /*
>          * If this is a block device inode, override the filesystem
>          * attributes with the block device specific parameters that need to be
>          * obtained from the bdev backing inode.
>          */
>         if (S_ISBLK(stat->mode))
>                 bdev_statx(path, stat, request_mask);
> done after vfs_getattr().  Questions:
> 
> 1) why is STATX_MNT_ID set without checking if it's in the mask passed to
> the damn thing?

If you look at the history you'll find that STATX_MNT_ID has always been
set unconditionally, i.e., userspace didn't have to request it
explicitly. And that's fine. It's not like it's expensive.

> 
> 2) why, in the name of everything unholy, does statx() on /dev/weird_shite
> trigger modprobe on that thing?  Without even a permission check on it...

Block people should know. But afaict, that's a block device thing which
happens with CONFIG_BLOCK_LEGACY_AUTOLOAD enabled which is supposedly
deprecated.
Al Viro Oct. 30, 2024, 6:37 a.m. UTC | #22
On Tue, Oct 22, 2024 at 10:49:59AM +0200, Christian Brauner wrote:

> > 1) why is STATX_MNT_ID set without checking if it's in the mask passed to
> > the damn thing?
> 
> If you look at the history you'll find that STATX_MNT_ID has always been
> set unconditionally, i.e., userspace didn't have to request it
> explicitly. And that's fine. It's not like it's expensive.

Not the issue, really - the difference between the by-fd (when we call vfs_fstat())
and by-path (calling vfs_statx()) cases of vfs_fstatat() is what worries me.

In by-fd case you hit vfs_getattr() and that's it.  In by-path case you
hit exact same helper you do for statx(2), with STATX_BASIC_FLAGS for the
mask argument.  Which ends with vfs_getattr() + a bunch of followups in
vfs_statx_path().  _IF_ all those followups had been no-ops when mask
is just STATX_BASIC_FLAGS, everything would've been fine.  They are not,
though.

And while all current users of vfs_fstatat() ignore stat->attributes,
stat->attributes_mask and stat->mnt_id (as well as stat->result_mask),
that's a property of code we feed that struct kstat to after vfs_fstatat()
call has filled it.  Currently that's 9 functions, spread over a lot of
places.  Sure, each of those is fine, but any additional instance that does
care and we are getting an unpleasant inconsistency between by-fd and by-path
users.

Variants:

1) We can clone vfs_statx(), have the clone call vfs_getattr() instead
of vfs_statx_path() and use it in vfs_fstatat().  That would take care
of any future inconsistencies (as well as a minor side benefit of losing
request_mask argument in that thing).  OTOH, it's extra code duplication.

2) go for your "it's cheap anyway" and have vfs_fstatat() use
vfs_statx_fd(fd, flags, stat, STATX_BASIC_FLAGS) on the by-fd path.
Again, that restores consistency.  Cost: that extra work (tail of
vfs_statx_path()) done for fstat(2) just as it's currently done for
stat(2).

3) add an explicit "none of that fancy crap" flag argument to vfs_statx_path()
and pass it through vfs_statx().  Cost: extra argument passed through
the call chain.

And yes, it's all cheap, but {f,}stat(2) is often _very_ hot, so it's worth
getting right.


> > 2) why, in the name of everything unholy, does statx() on /dev/weird_shite
> > trigger modprobe on that thing?  Without even a permission check on it...
> 
> Block people should know. But afaict, that's a block device thing which
> happens with CONFIG_BLOCK_LEGACY_AUTOLOAD enabled which is supposedly
> deprecated.

Umm...  Deprecated or not, one generally expects that if /dev/foo is
owned by root:root with 0600 for permissions, then non-root won't be
able to trigger anything in the vicinity of the hardware in question,
including "load the driver"...  Note that open(2) won't get anywhere
near blkdev_open() - it will get EPERM before that.  stat(2) also
won't go there.  statx(2) will.  And frankly, getting the STATX_DIOALIGN
and STATX_WRITE_ATOMIC information out of such device is also somewhat
dubious, but that's less obviously wrong.
diff mbox series

Patch

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -211,22 +211,38 @@  getname_flags(const char __user *filename, int flags)
 	return result;
 }
 
-struct filename *
-getname_uflags(const char __user *filename, int uflags)
+struct filename *getname_uflags(const char __user *filename, int uflags)
 {
 	int flags = (uflags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
 
 	return getname_flags(filename, flags);
 }
 
-struct filename *
-getname(const char __user * filename)
+struct filename *getname(const char __user * filename)
 {
 	return getname_flags(filename, 0);
 }
 
-struct filename *
-getname_kernel(const char * filename)
+struct filename *__getname_maybe_null(const char __user *pathname)
+{
+	struct filename *name;
+	char c;
+
+	/* try to save on allocations; loss on um, though */
+	if (get_user(c, pathname))
+		return ERR_PTR(-EFAULT);
+	if (!c)
+		return NULL;
+
+	name = getname_flags(pathname, LOOKUP_EMPTY);
+	if (!IS_ERR(name) && !(name->name[0])) {
+		putname(name);
+		name = NULL;
+	}
+	return name;
+}
+
+struct filename *getname_kernel(const char * filename)
 {
 	struct filename *result;
 	int len = strlen(filename) + 1;
diff --git a/fs/stat.c b/fs/stat.c
index 41e598376d7e..aa5bfc41a669 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -326,18 +326,11 @@  int vfs_fstatat(int dfd, const char __user *filename,
 {
 	int ret;
 	int statx_flags = flags | AT_NO_AUTOMOUNT;
-	struct filename *name;
+	struct filename *name = getname_maybe_null(filename, flags);
 
-	/*
-	 * Work around glibc turning fstat() into fstatat(AT_EMPTY_PATH)
-	 *
-	 * If AT_EMPTY_PATH is set, we expect the common case to be that
-	 * empty path, and avoid doing all the extra pathname work.
-	 */
-	if (flags == AT_EMPTY_PATH && vfs_empty_path(dfd, filename))
+	if (!name)
 		return vfs_fstat(dfd, stat);
 
-	name = getname_flags(filename, getname_statx_lookup_flags(statx_flags));
 	ret = vfs_statx(dfd, name, statx_flags, stat, STATX_BASIC_STATS);
 	putname(name);
 
@@ -775,7 +768,7 @@  SYSCALL_DEFINE5(statx,
 {
 	int ret;
 	unsigned lflags;
-	struct filename *name;
+	struct filename *name = getname_maybe_null(filename, flags);
 
 	/*
 	 * Short-circuit handling of NULL and "" paths.
@@ -788,10 +781,9 @@  SYSCALL_DEFINE5(statx,
 	 * Supporting this results in the uglification below.
 	 */
 	lflags = flags & ~(AT_NO_AUTOMOUNT | AT_STATX_SYNC_TYPE);
-	if (lflags == AT_EMPTY_PATH && vfs_empty_path(dfd, filename))
+	if (!name)
 		return do_statx_fd(dfd, flags & ~AT_NO_AUTOMOUNT, mask, buffer);
 
-	name = getname_flags(filename, getname_statx_lookup_flags(flags));
 	ret = do_statx(dfd, name, flags, mask, buffer);
 	putname(name);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e3c603d01337..403258ac2ea2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2766,6 +2766,16 @@  extern struct filename *getname_flags(const char __user *, int);
 extern struct filename *getname_uflags(const char __user *, int);
 extern struct filename *getname(const char __user *);
 extern struct filename *getname_kernel(const char *);
+extern struct filename *__getname_maybe_null(const char __user *);
+static inline struct filename *getname_maybe_null(const char __user *name, int flags)
+{
+	if (!(flags & AT_EMPTY_PATH))
+		return getname(name);
+
+	if (!name)
+		return NULL;
+	return __getname_maybe_null(name);
+}
 extern void putname(struct filename *name);
 
 extern int finish_open(struct file *file, struct dentry *dentry,