[v4,0/2] implement OA2_INHERIT_CRED flag for openat2()

Message ID	20240424105248.189032-1-stsp2@yandex.ru (mailing list archive)
Headers	show Received: from forward102b.mail.yandex.net (forward102b.mail.yandex.net [178.154.239.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3EDE1152E0B; Wed, 24 Apr 2024 10:52:56 +0000 (UTC) From: Stas Sergeev <stsp2@yandex.ru> To: linux-kernel@vger.kernel.org Cc: Stas Sergeev <stsp2@yandex.ru>, Stefan Metzmacher <metze@samba.org>, Eric Biederman <ebiederm@xmission.com>, Alexander Viro <viro@zeniv.linux.org.uk>, Andy Lutomirski <luto@kernel.org>, Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>, Jeff Layton <jlayton@kernel.org>, Chuck Lever <chuck.lever@oracle.com>, Alexander Aring <alex.aring@gmail.com>, David Laight <David.Laight@ACULAB.COM>, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>, =?utf-8?q?C?= =?utf-8?q?hristian_G=C3=B6ttsche?= <cgzones@googlemail.com> Subject: [PATCH v4 0/2] implement OA2_INHERIT_CRED flag for openat2() Date: Wed, 24 Apr 2024 13:52:46 +0300 Message-ID: <20240424105248.189032-1-stsp2@yandex.ru> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
Series	implement OA2_INHERIT_CRED flag for openat2() \| expand [v4,0/2] implement OA2_INHERIT_CRED flag for openat2() [1/2] fs: reorganize path_openat() [2/2] openat2: add OA2_INHERIT_CRED flag

stsp April 24, 2024, 10:52 a.m. UTC

This patch-set implements the OA2_INHERIT_CRED flag for openat2() syscall.
It is needed to perform an open operation with the creds that were in
effect when the dir_fd was opened. This allows the process to pre-open
some dirs and switch eUID (and other UIDs/GIDs) to the less-privileged
user, while still retaining the possibility to open/create files within
the pre-opened directory set.

The sand-boxing is security-oriented: symlinks leading outside of a
sand-box are rejected. /proc magic links are rejected.
The more detailed description (including security considerations)
is available in the log messages of individual patches.

Changes in v4:
- add optimizations suggested by David Laight <David.Laight@ACULAB.COM>
- move security checks to build_open_flags()
- force RESOLVE_NO_MAGICLINKS as suggested by Andy Lutomirski <luto@kernel.org>

Changes in v3:
- partially revert v2 changes to avoid overriding capabilities.
  Only the bare minimum is overridden: fsuid, fsgid and group_info.
  Document the fact the full cred override is unwanted, as it may
  represent an unneeded security risk.

Changes in v2:
- capture full struct cred instead of just fsuid/fsgid.
  Suggested by Stefan Metzmacher <metze@samba.org>

CC: Stefan Metzmacher <metze@samba.org>
CC: Eric Biederman <ebiederm@xmission.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Andy Lutomirski <luto@kernel.org>
CC: Christian Brauner <brauner@kernel.org>
CC: Jan Kara <jack@suse.cz>
CC: Jeff Layton <jlayton@kernel.org>
CC: Chuck Lever <chuck.lever@oracle.com>
CC: Alexander Aring <alex.aring@gmail.com>
CC: David Laight <David.Laight@ACULAB.COM>
CC: linux-fsdevel@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-api@vger.kernel.org
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Christian Göttsche <cgzones@googlemail.com>

Christian Brauner April 24, 2024, 4:09 p.m. UTC | #1

On Wed, Apr 24, 2024 at 01:52:46PM +0300, Stas Sergeev wrote:
> This patch-set implements the OA2_INHERIT_CRED flag for openat2() syscall.
> It is needed to perform an open operation with the creds that were in
> effect when the dir_fd was opened. This allows the process to pre-open
> some dirs and switch eUID (and other UIDs/GIDs) to the less-privileged
> user, while still retaining the possibility to open/create files within
> the pre-opened directory set.
> 
> The sand-boxing is security-oriented: symlinks leading outside of a
> sand-box are rejected. /proc magic links are rejected.
> The more detailed description (including security considerations)
> is available in the log messages of individual patches.
> 
> Changes in v4:
> - add optimizations suggested by David Laight <David.Laight@ACULAB.COM>
> - move security checks to build_open_flags()
> - force RESOLVE_NO_MAGICLINKS as suggested by Andy Lutomirski <luto@kernel.org>
> 
> Changes in v3:
> - partially revert v2 changes to avoid overriding capabilities.
>   Only the bare minimum is overridden: fsuid, fsgid and group_info.
>   Document the fact the full cred override is unwanted, as it may
>   represent an unneeded security risk.
> 
> Changes in v2:
> - capture full struct cred instead of just fsuid/fsgid.
>   Suggested by Stefan Metzmacher <metze@samba.org>

This smells ripe enough to serve as an attack vector in non-obvious
ways. And in general this has the potential to confuse the hell out
unsuspecting userspace. They can now suddenly get sent such
special-sauce files such as this that they have no way of recognizing as
there's neither an FMODE_* flag nor is the OA2_* flag recorded so it's
not available in F_GETFL.

There's not even a way to restrict that new flag because no LSM ever
sees it. So that behavior might break LSM assumptions as well.

And it is effectively usable to steal credentials. If process A opens a
directory with uid/gid 0 then sends that directory fd via AF_UNIX or
something to process B then process B can inherit the uid/gid of process
A by specifying OA2_* with no way for process A to prevent this - not
even through an LSM.

The permission checking model that we have right now is already baroque.
I see zero reason to add more complexity for the sake of "lightweight
sandboxing". We have LSMs and namespaces for stuff like this.

NAK.

stsp April 24, 2024, 5:50 p.m. UTC | #2

24.04.2024 19:09, Christian Brauner пишет:
> This smells ripe enough to serve as an attack vector in non-obvious
> ways. And in general this has the potential to confuse the hell out
> unsuspecting userspace.

Unsuspecting user-space will simply
not use this flag. What do you mean?


>   They can now suddenly get sent such
> special-sauce files

There are no any special files.
This flag helps you to open a file on
which you currently have no perms
to open, but had those in the past.


>   such as this that they have no way of recognizing as
> there's neither an FMODE_* flag nor is the OA2_* flag recorded so it's
> not available in F_GETFL.
>
> There's not even a way to restrict that new flag because no LSM ever
> sees it. So that behavior might break LSM assumptions as well.
>
> And it is effectively usable to steal credentials. If process A opens a
> directory with uid/gid 0 then sends that directory fd via AF_UNIX or
> something to process B then process B can inherit the uid/gid of process

No, it doesn't inherit anything.
The inheritance happens only for
a duration of an open() call, helping
open() to succeed. The creds are
reverted when open() completed.

The only theoretically possible attack
would be to open some file you'd never
intended to open. Also note that a
very minimal sed of creds is overridden:
fsuid, fsgid, groupinfo.

> A by specifying OA2_* with no way for process A to prevent this - not
> even through an LSM.

If process B doesn't use that flag, it
inherits nothing, no matter what process
A did or passed via a socket.
So an unaware process that doesn't
use that flag, is completely unaffected.

> The permission checking model that we have right now is already baroque.
> I see zero reason to add more complexity for the sake of "lightweight
> sandboxing". We have LSMs and namespaces for stuff like this.
>
> NAK.

I don't think it is fair to say NAK
without actually reading the patch
or asking its author for clarifications.
Even though you didn't ask, I provided
my clarifications above, as I find that
a polite action.

Christian Brauner April 25, 2024, 9:54 a.m. UTC | #3

On Wed, Apr 24, 2024 at 08:50:30PM +0300, stsp wrote:
> 24.04.2024 19:09, Christian Brauner пишет:
> > This smells ripe enough to serve as an attack vector in non-obvious
> > ways. And in general this has the potential to confuse the hell out
> > unsuspecting userspace.
> 
> Unsuspecting user-space will simply
> not use this flag. What do you mean?
> 
> 
> >   They can now suddenly get sent such
> > special-sauce files
> 
> There are no any special files.
> This flag helps you to open a file on
> which you currently have no perms
> to open, but had those in the past.
> 
> 
> >   such as this that they have no way of recognizing as
> > there's neither an FMODE_* flag nor is the OA2_* flag recorded so it's
> > not available in F_GETFL.
> > 
> > There's not even a way to restrict that new flag because no LSM ever
> > sees it. So that behavior might break LSM assumptions as well.
> > 
> > And it is effectively usable to steal credentials. If process A opens a
> > directory with uid/gid 0 then sends that directory fd via AF_UNIX or
> > something to process B then process B can inherit the uid/gid of process
> 
> No, it doesn't inherit anything.
> The inheritance happens only for
> a duration of an open() call, helping
> open() to succeed. The creds are
> reverted when open() completed.
> 
> The only theoretically possible attack
> would be to open some file you'd never
> intended to open. Also note that a
> very minimal sed of creds is overridden:
> fsuid, fsgid, groupinfo.
> 
> > A by specifying OA2_* with no way for process A to prevent this - not
> > even through an LSM.
> 
> If process B doesn't use that flag, it
> inherits nothing, no matter what process
> A did or passed via a socket.
> So an unaware process that doesn't
> use that flag, is completely unaffected.

The point is that the original opener has no way to prevent his creds
being abused by a completely unrelated process later on. Something I've
clearly explained in my mail.

> 
> > The permission checking model that we have right now is already baroque.
> > I see zero reason to add more complexity for the sake of "lightweight
> > sandboxing". We have LSMs and namespaces for stuff like this.
> > 
> > NAK.
> 
> I don't think it is fair to say NAK
> without actually reading the patch
> or asking its author for clarifications.
> Even though you didn't ask, I provided
> my clarifications above, as I find that
> a polite action.

I'm not sure what you don't understand or why you need further
clarification. Your patch allows any opener using your new flag to steal
the uid/gid/whatever from the original opener. It was even worse in the
first version where the whole struct cred of the original opener was
used. It's obviously a glaring security hole that's opened up by this.

Let alone that the justification "It's useful for some lightweight
sandboxing" is absolutely not sufficient to justify substantial shifts
in the permission model.

The NAK stands.

stsp April 25, 2024, 10:12 a.m. UTC | #4

25.04.2024 12:54, Christian Brauner пишет:
> I'm not sure what you don't understand or why you need further
> clarification. Your patch allows any opener using your new flag to steal
> the uid/gid/whatever from the original opener.

No, absolutely impossible (see below).


>   It was even worse in the
> first version where the whole struct cred of the original opener was
> used. It's obviously a glaring security hole that's opened up by this.

Well, it was the second version actually
(first one only had fsuid/fsgid), but no,
its the same thing either way.
The creds are overridden for a diration of
an openat2() syscall. It doesn't matter
what uid/gid are there, because they are
not used during openat2(), and are reverted
back at the end. The only reason I decided
to get back to fsuid/fsgid, were the capabilities,
which I don't want to be raised during openat2().

> Let alone that the justification "It's useful for some lightweight
> sandboxing" is absolutely not sufficient to justify substantial shifts
> in the permission model.
>
> The NAK stands.

But I am sure you still don't understand
what exactly the patch does, so why not
to ask instead of asserting?
You say uid/gid can be stolen, but no,
it can't: the creds are reverted. Only
fsuid/fsgid (and caps in v2 of the patch)
actually affect openat2(), but nothing is
"leaked" after openat2() finished.

That said, Viro already pointed to the actual
problem, and the patch-testing bot did the
same. So I have a valid complains already,
and you don't have to add the invalid ones
to them. :)

Christian Brauner April 25, 2024, 12:08 p.m. UTC | #5

> But I am sure you still don't understand
> what exactly the patch does, so why not
> to ask instead of asserting?
> You say uid/gid can be stolen, but no,
> it can't: the creds are reverted. Only
> fsuid/fsgid (and caps in v2 of the patch)
> actually affect openat2(), but nothing is
> "leaked" after openat2() finished.

I say "stolen" because the original opener has no say in this. You're
taking their fsuid/fsgid and groups and overriding creds for the
duration of the lookup and open. Something the original opener never
consented to. But let's call it "borrowed" if you're hung up on
terminology here.

But ultimately it's the same complaint: the original opener has no way
of controlling this behavior. Once ignored in my first reply, and now
again conveniently cut off again. Let alone all the other objections.

And fwiw, the same way you somehow feel like I haven't read your patch
it seems you to consistently underestimate the implications of this
change. Which is really strange because it's pretty obvious. It's
effectively temporary setuid/setgid with some light limitations.

Leaking directory descriptors across security boundaries becomes a lot
more interesting with this patch applied. Something which has happened
multiple times already and heavily figures in container escapes. And the
RESOLVE_BENEATH/IN_ROOT restrictions only stave off symlink (and magic
link) attacks. If a directory fd is leaked a container can take the
fsuid/fsgid/groups from the original opener of that directory and write
to disk with whatever it resolves to in that namespace's idmapping. It's
basically a nice way to puzzle together arbitrary fsuid/fsgid and
idmapping pairs in whatever namespace the opener happens to be in.

And again it also messes with LSMs because you're changing credentials
behind their back.

And to the "unsuspecting userspace" point you dismissed earlier.
Providing a dirfd to a lesser privileged process isn't horrendously
dangerous right now. But with this change it sure is. For stuff like
libpathrs or systemd's fdstore this change has immediate security
implications.

stsp April 25, 2024, 12:39 p.m. UTC | #6

25.04.2024 15:08, Christian Brauner пишет:
>> But I am sure you still don't understand
>> what exactly the patch does, so why not
>> to ask instead of asserting?
>> You say uid/gid can be stolen, but no,
>> it can't: the creds are reverted. Only
>> fsuid/fsgid (and caps in v2 of the patch)
>> actually affect openat2(), but nothing is
>> "leaked" after openat2() finished.
> I say "stolen" because the original opener has no say in this.

The initial idea was to keep this all
within a single-process boundary. It
wasn't coded that way though. :(


>   You're
> taking their fsuid/fsgid and groups and overriding creds for the
> duration of the lookup and open. Something the original opener never
> consented to. But let's call it "borrowed" if you're hung up on
> terminology here.

Not a terminology: you were explicitly
talking about uid/gid, blaming a v2 of
my patch. But v2 was not any more
harmful than others and uid/gid cannot
be leaked even there.
But I don't mind: if now you realize v2
is not a leak for uid/gid, then we are on
the same track.

> But ultimately it's the same complaint: the original opener has no way
> of controlling this behavior.

It wasn't clear if the opener should
control that behaviour, or maybe instead
that all should be limited within a single
process?
Andy Lutomirski's explanation made it
clear that even if the openers are the
same, the control is still needed. So now
this argument is undeniable.


>   Once ignored in my first reply, and now
> again conveniently cut off again. Let alone all the other objections.

Sorry but your complains were about
stealing uid/gid in v2, which were clearly
wrong. But this is a matter of the past.

> And fwiw, the same way you somehow feel like I haven't read your patch
> it seems you to consistently underestimate the implications of this
> change. Which is really strange because it's pretty obvious. It's
> effectively temporary setuid/setgid with some light limitations.

Temporary cred override is quite common
within the current code AFAICS when I grep
it for override_creds() call.

> Leaking directory descriptors across security boundaries becomes a lot
> more interesting with this patch applied. Something which has happened
> multiple times already and heavily figures in container escapes. And the
> RESOLVE_BENEATH/IN_ROOT restrictions only stave off symlink (and magic
> link) attacks. If a directory fd is leaked a container can take the
> fsuid/fsgid/groups from the original opener of that directory and write
> to disk with whatever it resolves to in that namespace's idmapping. It's
> basically a nice way to puzzle together arbitrary fsuid/fsgid and
> idmapping pairs in whatever namespace the opener happens to be in.

Yes, so the opt-in flag is undeniably needed.

> And to the "unsuspecting userspace" point you dismissed earlier.
> Providing a dirfd to a lesser privileged process isn't horrendously
> dangerous right now. But with this change it sure is. For stuff like
> libpathrs or systemd's fdstore this change has immediate security
> implications.

So am I getting your point correctly:
- process A uses the opt-in flag (eg O_CAPTURE_CREDS)
   and passes the fd to "unsuspecting userspace" B.
- process B is not going to use O_INHERIT_CREDS,
   but it now still can't drop his privs fully.

Is this what you mean?

[v4,0/2] implement OA2_INHERIT_CRED flag for openat2()

Message

Comments