[v3,0/2] openat2: minor uapi cleanups

Message ID	20200118120800.16358-1-cyphar@cyphar.com (mailing list archive)
Headers	show Return-Path: <SRS0=vkf+=3H=vger.kernel.org=linux-kselftest-owner@kernel.org> From: Aleksa Sarai <cyphar@cyphar.com> To: Alexander Viro <viro@zeniv.linux.org.uk>, Jeff Layton <jlayton@kernel.org>, "J. Bruce Fields" <bfields@fieldses.org>, Shuah Khan <shuah@kernel.org> Cc: Aleksa Sarai <cyphar@cyphar.com>, Florian Weimer <fweimer@redhat.com>, David Laight <david.laight@aculab.com>, Christian Brauner <christian.brauner@ubuntu.com>, quae@daurnimator.com, dev@opencontainers.org, containers@lists.linux-foundation.org, libc-alpha@sourceware.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH v3 0/2] openat2: minor uapi cleanups Date: Sat, 18 Jan 2020 23:07:58 +1100 Message-Id: <20200118120800.16358-1-cyphar@cyphar.com> In-Reply-To: <20200115144831.GJ8904@ZenIV.linux.org.uk> References: <20200115144831.GJ8904@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk
Series	openat2: minor uapi cleanups \| expand [v3,0/2] openat2: minor uapi cleanups [v3,1/2] open: introduce openat2(2) syscall [v3,2/2] selftests: add openat2(2) selftests

Message ID

20200118120800.16358-1-cyphar@cyphar.com (mailing list archive)

Headers

From: Aleksa Sarai <cyphar@cyphar.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
        Jeff Layton <jlayton@kernel.org>,
        "J. Bruce Fields" <bfields@fieldses.org>,
        Shuah Khan <shuah@kernel.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>,
        Florian Weimer <fweimer@redhat.com>,
        David Laight <david.laight@aculab.com>,
        Christian Brauner <christian.brauner@ubuntu.com>,
        quae@daurnimator.com, dev@opencontainers.org,
        containers@lists.linux-foundation.org, libc-alpha@sourceware.org,
        linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
        linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: [PATCH v3 0/2] openat2: minor uapi cleanups
Date: Sat, 18 Jan 2020 23:07:58 +1100
Message-Id: <20200118120800.16358-1-cyphar@cyphar.com>
In-Reply-To: <20200115144831.GJ8904@ZenIV.linux.org.uk>
References: <20200115144831.GJ8904@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kselftest-owner@vger.kernel.org
Precedence: bulk

Series

openat2: minor uapi cleanups | expand

Message

Aleksa Sarai Jan. 18, 2020, 12:07 p.m. UTC

Patch changelog:
  v3:
   * Merge changes into the original patches to make Al's life easier.
     [Al Viro]
  v2:
   * Add include <linux/types.h> to openat2.h. [Florian Weimer]
   * Move OPEN_HOW_SIZE_* constants out of UAPI. [Florian Weimer]
   * Switch from __aligned_u64 to __u64 since it isn't necessary.
     [David Laight]
  v1: <https://lore.kernel.org/lkml/20191219105533.12508-1-cyphar@cyphar.com/>

While openat2(2) is still not yet in Linus's tree, we can take this
opportunity to iron out some small warts that weren't noticed earlier:

  * A fix was suggested by Florian Weimer, to separate the openat2
    definitions so glibc can use the header directly. I've put the
    maintainership under VFS but let me know if you'd prefer it belong
    ot the fcntl folks.

  * Having heterogenous field sizes in an extensible struct results in
    "padding hole" problems when adding new fields (in addition the
    correct error to use for non-zero padding isn't entirely clear ).
    The simplest solution is to just copy clone(3)'s model -- always use
    u64s. It will waste a little more space in the struct, but it
    removes a possible future headache.

This patch is intended to replace the corresponding patches in Al's
#work.openat2 tree (and *will not* apply on Linus' tree).

@Al: I will send some additional patches later, but they will require
     proper design review since they're ABI-related features (namely,
	 adding a way to check what features a syscall supports as I
	 outlined in my talk here[1]).

[1]: https://youtu.be/ggD-eb3yPVs

Aleksa Sarai (2):
  open: introduce openat2(2) syscall
  selftests: add openat2(2) selftests

 CREDITS                                       |   4 +-
 MAINTAINERS                                   |   1 +
 arch/alpha/kernel/syscalls/syscall.tbl        |   1 +
 arch/arm/tools/syscall.tbl                    |   1 +
 arch/arm64/include/asm/unistd.h               |   2 +-
 arch/arm64/include/asm/unistd32.h             |   2 +
 arch/ia64/kernel/syscalls/syscall.tbl         |   1 +
 arch/m68k/kernel/syscalls/syscall.tbl         |   1 +
 arch/microblaze/kernel/syscalls/syscall.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl     |   1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl     |   1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl     |   1 +
 arch/parisc/kernel/syscalls/syscall.tbl       |   1 +
 arch/powerpc/kernel/syscalls/syscall.tbl      |   1 +
 arch/s390/kernel/syscalls/syscall.tbl         |   1 +
 arch/sh/kernel/syscalls/syscall.tbl           |   1 +
 arch/sparc/kernel/syscalls/syscall.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 arch/xtensa/kernel/syscalls/syscall.tbl       |   1 +
 fs/open.c                                     | 147 +++--
 include/linux/fcntl.h                         |  16 +-
 include/linux/syscalls.h                      |   3 +
 include/uapi/asm-generic/unistd.h             |   5 +-
 include/uapi/linux/fcntl.h                    |   2 +-
 include/uapi/linux/openat2.h                  |  39 ++
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/openat2/.gitignore    |   1 +
 tools/testing/selftests/openat2/Makefile      |   8 +
 tools/testing/selftests/openat2/helpers.c     | 109 ++++
 tools/testing/selftests/openat2/helpers.h     | 106 ++++
 .../testing/selftests/openat2/openat2_test.c  | 312 +++++++++++
 .../selftests/openat2/rename_attack_test.c    | 160 ++++++
 .../testing/selftests/openat2/resolve_test.c  | 523 ++++++++++++++++++
 34 files changed, 1418 insertions(+), 39 deletions(-)
 create mode 100644 include/uapi/linux/openat2.h
 create mode 100644 tools/testing/selftests/openat2/.gitignore
 create mode 100644 tools/testing/selftests/openat2/Makefile
 create mode 100644 tools/testing/selftests/openat2/helpers.c
 create mode 100644 tools/testing/selftests/openat2/helpers.h
 create mode 100644 tools/testing/selftests/openat2/openat2_test.c
 create mode 100644 tools/testing/selftests/openat2/rename_attack_test.c
 create mode 100644 tools/testing/selftests/openat2/resolve_test.c

Comments

Al Viro Jan. 18, 2020, 3:28 p.m. UTC | #1

On Sat, Jan 18, 2020 at 11:07:58PM +1100, Aleksa Sarai wrote:
> Patch changelog:
>   v3:
>    * Merge changes into the original patches to make Al's life easier.
>      [Al Viro]
>   v2:
>    * Add include <linux/types.h> to openat2.h. [Florian Weimer]
>    * Move OPEN_HOW_SIZE_* constants out of UAPI. [Florian Weimer]
>    * Switch from __aligned_u64 to __u64 since it isn't necessary.
>      [David Laight]
>   v1: <https://lore.kernel.org/lkml/20191219105533.12508-1-cyphar@cyphar.com/>
> 
> While openat2(2) is still not yet in Linus's tree, we can take this
> opportunity to iron out some small warts that weren't noticed earlier:
> 
>   * A fix was suggested by Florian Weimer, to separate the openat2
>     definitions so glibc can use the header directly. I've put the
>     maintainership under VFS but let me know if you'd prefer it belong
>     ot the fcntl folks.
> 
>   * Having heterogenous field sizes in an extensible struct results in
>     "padding hole" problems when adding new fields (in addition the
>     correct error to use for non-zero padding isn't entirely clear ).
>     The simplest solution is to just copy clone(3)'s model -- always use
>     u64s. It will waste a little more space in the struct, but it
>     removes a possible future headache.
> 
> This patch is intended to replace the corresponding patches in Al's
> #work.openat2 tree (and *will not* apply on Linus' tree).
> 
> @Al: I will send some additional patches later, but they will require
>      proper design review since they're ABI-related features (namely,
> 	 adding a way to check what features a syscall supports as I
> 	 outlined in my talk here[1]).

#work.openat2 updated, #for-next rebuilt and force-pushed.  There's
a massive update of #work.namei as well, also pushed out; not in
#for-next yet, will post the patch series for review later today.

Al Viro Jan. 18, 2020, 6:09 p.m. UTC | #2

On Sat, Jan 18, 2020 at 03:28:33PM +0000, Al Viro wrote:

> #work.openat2 updated, #for-next rebuilt and force-pushed.  There's
> a massive update of #work.namei as well, also pushed out; not in
> #for-next yet, will post the patch series for review later today.

BTW, looking through that code again, how could this
static bool legitimize_root(struct nameidata *nd)
{
        /*
         * For scoped-lookups (where nd->root has been zeroed), we need to
         * restart the whole lookup from scratch -- because set_root() is wrong
         * for these lookups (nd->dfd is the root, not the filesystem root).
         */
        if (!nd->root.mnt && (nd->flags & LOOKUP_IS_SCOPED))
                return false;

possibly trigger?  The only things that ever clean ->root.mnt are

	1) failing legitimize_path(nd, &nd->root, nd->root_seq) in
legitimize_root() itself.  If *ANY* legitimize_path() has failed,
we are through - RCU pathwalk is given up.  In particular, if you
look at the call chains leading to legitimize_root(), you'll see
that it's called by unlazy_walk() or unlazy_child() and failure
has either of those buggger off immediately.  The same goes for
their callers; fail any of those and we are done; the very next
thing that will be done with that nameidata is going to be
terminate_walk().  We don't look at its fields, etc. - just return
to the top level ASAP and call terminate_walk() on it.  Which is where
we run into
                if (nd->flags & LOOKUP_ROOT_GRABBED) {
                        path_put(&nd->root);
                        nd->flags &= ~LOOKUP_ROOT_GRABBED;
                }
paired with setting LOOKUP_ROOT_GRABBED just before the attempt
to legitimize in legitimize_root().  The next thing *after*
terminate_walk() is either path_init() or the end of life for
that struct nameidata instance.
	This is really, really fundamental for understanding the whole
thing - a failure of unlazy_walk/unlazy_child means that we are through
with that attempt.

	2) complete_walk() doing
                if (!(nd->flags & (LOOKUP_ROOT | LOOKUP_IS_SCOPED)))
                        nd->root.mnt = NULL;   
Can't happen with LOOKUP_IS_SCOPED in flags, obviously.

	3) path_init().  Where it's followed either by leaving through
        if (*s == '/' && !(flags & LOOKUP_IN_ROOT)) {
		....
        }
(and LOOKUP_IS_SCOPED includes LOOKUP_IN_ROOT) or with a failure exit
(no calls of *anything* but terminate_walk() after that or with
        if (flags & LOOKUP_IS_SCOPED) {
                nd->root = nd->path;
... and that makes damn sure nd->root.mnt is not NULL.

And neither of the LOOKUP_IS_SCOPED bits ever gets changed in nd->flags -
they remain as path_init() has set them.

The same, BTW, goes for the check you've added in the beginning of
set_root() - set_root() is called only with NULL nd->root.mnt (trivial to
prove) and that is incompatible with LOOKUP_IS_SCOPED.  I'm kinda-sorta
OK with having WARN_ON() there for a while, but IMO the check in the
beginning of legitimize_root() should go away - this kind of defensive
programming only makes harder to reason about the behaviour of the
entire thing.  And fs/namei.c is too convoluted as it is...

Aleksa Sarai Jan. 18, 2020, 11:03 p.m. UTC | #3

On 2020-01-18, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sat, Jan 18, 2020 at 03:28:33PM +0000, Al Viro wrote:
> 
> > #work.openat2 updated, #for-next rebuilt and force-pushed.  There's
> > a massive update of #work.namei as well, also pushed out; not in
> > #for-next yet, will post the patch series for review later today.
> 
> BTW, looking through that code again, how could this
> static bool legitimize_root(struct nameidata *nd)
> {
>         /*
>          * For scoped-lookups (where nd->root has been zeroed), we need to
>          * restart the whole lookup from scratch -- because set_root() is wrong
>          * for these lookups (nd->dfd is the root, not the filesystem root).
>          */
>         if (!nd->root.mnt && (nd->flags & LOOKUP_IS_SCOPED))
>                 return false;
> 
> possibly trigger?  The only things that ever clean ->root.mnt are

You're quite right -- the codepath I was worried about was pick_link()
failing (which *does* clear nd->path.mnt, and I must've misread it at
the time as nd->root.mnt).

We can drop this check, though now complete_walk()'s main defence
against a NULL nd->root.mnt is that path_is_under() will fail and
trigger -EXDEV (or set_root() will fail at some point in the future).
However, as you pointed out, a NULL nd->root.mnt won't happen with
things as they stand today -- I might be a little too paranoid. :P

> 	This is really, really fundamental for understanding the whole
> thing - a failure of unlazy_walk/unlazy_child means that we are through
> with that attempt.

Yup -- see above, the worry was about pick_link() not about how the
RCU-walk and REF-walk dances operate.

> The same, BTW, goes for the check you've added in the beginning of
> set_root() - set_root() is called only with NULL nd->root.mnt (trivial to
> prove) and that is incompatible with LOOKUP_IS_SCOPED.  I'm kinda-sorta
> OK with having WARN_ON() there for a while, but IMO the check in the
> beginning of legitimize_root() should go away -

You're quite right about dropping the legitimize_root() check, but I'd
like to keep the WARN_ON() in set_root(). The main reason being that it
makes us very damn sure that a future change won't accidentally break
the nd->root contract which all of the LOOKUP_IS_SCOPED changes rely on.
Then again, this might be my paranoia popping up again.

> this kind of defensive programming only makes harder to reason about
> the behaviour of the entire thing.  And fs/namei.c is too convoluted
> as it is...

If you feel that dropping some of these more defensive checks is better
for the codebase as a whole, then I defer to your judgement. I
completely agree that namei is a pretty complicated chunk of code.

Al Viro Jan. 19, 2020, 1:12 a.m. UTC | #4

On Sun, Jan 19, 2020 at 10:03:13AM +1100, Aleksa Sarai wrote:

> > possibly trigger?  The only things that ever clean ->root.mnt are
> 
> You're quite right -- the codepath I was worried about was pick_link()
> failing (which *does* clear nd->path.mnt, and I must've misread it at
> the time as nd->root.mnt).

pick_link() (allocation failure of external stack in RCU case, followed
by failure to legitimize the link) is, unfortunately, subtle and nasty.
We *must* path_put() the link; if we'd managed to legitimize the mount
and failed on dentry, the mount needs to be dropped.  No way around it.
And while everything else there can be left for soon-to-be-reached
terminate_walk(), this cannot.  We have no good way to pass what
we need to drop to the place where that eventual terminate_walk()
drops rcu_read_lock().  So we end up having to do what terminate_walk()
would've done and do it right there, so we could do that path_put(link)
before we bugger off.

I'm not happy about that, but I don't see cleaner solutions, more's the
pity.  However, it doesn't mess with ->root - nor should it, since
we don't have LOOKUP_ROOT_GRABBED (not in RCU mode), so it can and
should be left alone.

> We can drop this check, though now complete_walk()'s main defence
> against a NULL nd->root.mnt is that path_is_under() will fail and
> trigger -EXDEV (or set_root() will fail at some point in the future).
> However, as you pointed out, a NULL nd->root.mnt won't happen with
> things as they stand today -- I might be a little too paranoid. :P

The only reason why complete_walk() zeroes nd->root in some cases is
microoptimization - we *know* we won't be using it later, so we don't
care whether it's stale or not and can spare unlazy_walk() a bit of
work.  All there is to that one.

I don't see any reason for adding code that would clear nd->root in later
work; if such thing does get added (again, I don't see what purpose
could that possibly serve), we'll need to watch out for a lot of things.
Starting with LOOKUP_ROOT case...  It's not something likely to slip
in unnoticed.