Message ID | 1436989569-69582-4-git-send-email-seth.forshee@canonical.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jul 15, 2015 at 02:46:04PM -0500, Seth Forshee wrote: > Capability sets attached to files must be ignored except in the > user namespaces where the mounter is privileged, i.e. s_user_ns > and its descendants. Otherwise a vector exists for gaining > privileges in namespaces where a user is not already privileged. > > Add a new helper function, in_user_ns(), to test whether a user > namespace is the same as or a descendant of another namespace. > Use this helper to determine whether a file's capability set > should be applied to the caps constructed during exec. > > Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> I think it's an ok behavior, though let's just go over the alternatives. It might actually be ok to simply require that the user_ns be equal. If I unshare a new userns in which a different uid is mapped to root, I may not want file capabilities to be granted to tasks in that ns. (On the other hand, I might be creating a new user_ns specifically to not have a uid 0 mapped into it at all, and only have file capabilities grant privilege) Conversely, if I unshare one user_ns with a MS_SHARED mnt_ns, mount an ext4fs, and then (from the parent shell) unshare another user_ns with the same mapping, intending it to be a "peer" to the first one I'd unshared and be able to use the ext4fs it mounted. This won't work here. That's probably best - the appropriate thing to do was to attach to the existing user_ns. But it could end up being limiting in some special cases, so I'm bringing it up here. Again I think what you have here is the simplest and most sensible choice, so ack. > --- > include/linux/user_namespace.h | 8 ++++++++ > kernel/user_namespace.c | 14 ++++++++++++++ > security/commoncap.c | 2 ++ > 3 files changed, 24 insertions(+) > > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h > index 8297e5b341d8..a43faa727124 100644 > --- a/include/linux/user_namespace.h > +++ b/include/linux/user_namespace.h > @@ -72,6 +72,8 @@ extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, > extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *); > extern int proc_setgroups_show(struct seq_file *m, void *v); > extern bool userns_may_setgroups(const struct user_namespace *ns); > +extern bool in_userns(const struct user_namespace *ns, > + const struct user_namespace *target_ns); > #else > > static inline struct user_namespace *get_user_ns(struct user_namespace *ns) > @@ -100,6 +102,12 @@ static inline bool userns_may_setgroups(const struct user_namespace *ns) > { > return true; > } > + > +static inline bool in_userns(const struct user_namespace *ns, > + const struct user_namespace *target_ns) > +{ > + return true; > +} > #endif > > #endif /* _LINUX_USER_H */ > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index 4109f8320684..2b043876d5f0 100644 > --- a/kernel/user_namespace.c > +++ b/kernel/user_namespace.c > @@ -944,6 +944,20 @@ bool userns_may_setgroups(const struct user_namespace *ns) > return allowed; > } > > +/* > + * Returns true if @ns is the same namespace as or a descendant of > + * @target_ns. > + */ > +bool in_userns(const struct user_namespace *ns, > + const struct user_namespace *target_ns) > +{ > + for (; ns; ns = ns->parent) { > + if (ns == target_ns) > + return true; > + } > + return false; > +} > + > static inline struct user_namespace *to_user_ns(struct ns_common *ns) > { > return container_of(ns, struct user_namespace, ns); > diff --git a/security/commoncap.c b/security/commoncap.c > index d103f5a4043d..175ab497e810 100644 > --- a/security/commoncap.c > +++ b/security/commoncap.c > @@ -439,6 +439,8 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c > > if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) > return 0; > + if (!in_userns(current_user_ns(), bprm->file->f_path.mnt->mnt_sb->s_user_ns)) > + return 0; > > rc = get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps); > if (rc < 0) { > -- > 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 15, 2015 at 2:48 PM, Serge E. Hallyn <serge@hallyn.com> wrote: > On Wed, Jul 15, 2015 at 02:46:04PM -0500, Seth Forshee wrote: >> Capability sets attached to files must be ignored except in the >> user namespaces where the mounter is privileged, i.e. s_user_ns >> and its descendants. Otherwise a vector exists for gaining >> privileges in namespaces where a user is not already privileged. >> >> Add a new helper function, in_user_ns(), to test whether a user >> namespace is the same as or a descendant of another namespace. >> Use this helper to determine whether a file's capability set >> should be applied to the caps constructed during exec. >> >> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> > > Acked-by: Serge Hallyn <serge.hallyn@canonical.com> > > I think it's an ok behavior, though let's just go over the > alternatives. > > It might actually be ok to simply require that the user_ns be > equal. If I unshare a new userns in which a different uid is > mapped to root, I may not want file capabilities to be granted > to tasks in that ns. (On the other hand, I might be creating > a new user_ns specifically to not have a uid 0 mapped into it > at all, and only have file capabilities grant privilege) > > Conversely, if I unshare one user_ns with a MS_SHARED mnt_ns, mount > an ext4fs, and then (from the parent shell) unshare another user_ns > with the same mapping, intending it to be a "peer" to the first one > I'd unshared and be able to use the ext4fs it mounted. This won't > work here. That's probably best - the appropriate thing to do was > to attach to the existing user_ns. But it could end up being > limiting in some special cases, so I'm bringing it up here. > > Again I think what you have here is the simplest and most sensible > choice, so ack. > I think I'm missing something. Why is this separate from mount_may_suid? I can see why it would make sense to check s_user_ns (or maybe s_user_ns *and* the vfsmount namespace) in mount_may_suid, but I don't see why we need separate checks. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Andy Lutomirski <luto@amacapital.net> writes: > On Wed, Jul 15, 2015 at 2:48 PM, Serge E. Hallyn <serge@hallyn.com> wrote: >> On Wed, Jul 15, 2015 at 02:46:04PM -0500, Seth Forshee wrote: >>> Capability sets attached to files must be ignored except in the >>> user namespaces where the mounter is privileged, i.e. s_user_ns >>> and its descendants. Otherwise a vector exists for gaining >>> privileges in namespaces where a user is not already privileged. >>> >>> Add a new helper function, in_user_ns(), to test whether a user >>> namespace is the same as or a descendant of another namespace. >>> Use this helper to determine whether a file's capability set >>> should be applied to the caps constructed during exec. >>> >>> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> >> >> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> >> >> I think it's an ok behavior, though let's just go over the >> alternatives. >> >> It might actually be ok to simply require that the user_ns be >> equal. If I unshare a new userns in which a different uid is >> mapped to root, I may not want file capabilities to be granted >> to tasks in that ns. (On the other hand, I might be creating >> a new user_ns specifically to not have a uid 0 mapped into it >> at all, and only have file capabilities grant privilege) >> >> Conversely, if I unshare one user_ns with a MS_SHARED mnt_ns, mount >> an ext4fs, and then (from the parent shell) unshare another user_ns >> with the same mapping, intending it to be a "peer" to the first one >> I'd unshared and be able to use the ext4fs it mounted. This won't >> work here. That's probably best - the appropriate thing to do was >> to attach to the existing user_ns. But it could end up being >> limiting in some special cases, so I'm bringing it up here. >> >> Again I think what you have here is the simplest and most sensible >> choice, so ack. >> > > I think I'm missing something. Why is this separate from mount_may_suid? > > I can see why it would make sense to check s_user_ns (or maybe > s_user_ns *and* the vfsmount namespace) in mount_may_suid, but I don't > see why we need separate checks. So I don't quite understand your concerns that lead to the mnt_may_suid patch. But in my limited understanding there are two distinct issues. 1) What do file capabilities mean on a filesystem mounted with user namespace privileges. Where the unprivileged user can control what resides on disk. That is what this patch should be about. Meaning and restricting those permissions to unprivileged users. 2) The second issue that I think your mnt_may_suid patch is about seems to be what to do if a mount winds up in a place we never intended. Aka leaks. I don't think any changes to mnt_may_suid are necessary in that sense. However they may be useful. So I think your mnt_may_suid change may be worth having but so far it seems unnecessary. Which is a long way of saying this patch is fundamentally necessary, and I am not certain about the mnt_may_suid patch. Am I right in understanding it's purpose? Or does this patch actually succeed in obsoleting it? Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 15, 2015 at 05:35:24PM -0500, Eric W. Biederman wrote: > Andy Lutomirski <luto@amacapital.net> writes: > > > On Wed, Jul 15, 2015 at 2:48 PM, Serge E. Hallyn <serge@hallyn.com> wrote: > >> On Wed, Jul 15, 2015 at 02:46:04PM -0500, Seth Forshee wrote: > >>> Capability sets attached to files must be ignored except in the > >>> user namespaces where the mounter is privileged, i.e. s_user_ns > >>> and its descendants. Otherwise a vector exists for gaining > >>> privileges in namespaces where a user is not already privileged. > >>> > >>> Add a new helper function, in_user_ns(), to test whether a user > >>> namespace is the same as or a descendant of another namespace. > >>> Use this helper to determine whether a file's capability set > >>> should be applied to the caps constructed during exec. > >>> > >>> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> > >> > >> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> > >> > >> I think it's an ok behavior, though let's just go over the > >> alternatives. > >> > >> It might actually be ok to simply require that the user_ns be > >> equal. If I unshare a new userns in which a different uid is > >> mapped to root, I may not want file capabilities to be granted > >> to tasks in that ns. (On the other hand, I might be creating > >> a new user_ns specifically to not have a uid 0 mapped into it > >> at all, and only have file capabilities grant privilege) > >> > >> Conversely, if I unshare one user_ns with a MS_SHARED mnt_ns, mount > >> an ext4fs, and then (from the parent shell) unshare another user_ns > >> with the same mapping, intending it to be a "peer" to the first one > >> I'd unshared and be able to use the ext4fs it mounted. This won't > >> work here. That's probably best - the appropriate thing to do was > >> to attach to the existing user_ns. But it could end up being > >> limiting in some special cases, so I'm bringing it up here. > >> > >> Again I think what you have here is the simplest and most sensible > >> choice, so ack. > >> > > > > I think I'm missing something. Why is this separate from mount_may_suid? > > > > I can see why it would make sense to check s_user_ns (or maybe > > s_user_ns *and* the vfsmount namespace) in mount_may_suid, but I don't > > see why we need separate checks. > > So I don't quite understand your concerns that lead to the mnt_may_suid > patch. But in my limited understanding there are two distinct issues. > > 1) What do file capabilities mean on a filesystem mounted with user > namespace privileges. Where the unprivileged user can control what > resides on disk. > > That is what this patch should be about. > > Meaning and restricting those permissions to unprivileged users. > > 2) The second issue that I think your mnt_may_suid patch is about seems > to be what to do if a mount winds up in a place we never intended. > > Aka leaks. I don't think any changes to mnt_may_suid are necessary > in that sense. However they may be useful. > > So I think your mnt_may_suid change may be worth having but so far it > seems unnecessary. > > Which is a long way of saying this patch is fundamentally necessary, > and I am not certain about the mnt_may_suid patch. > > Am I right in understanding it's purpose? Or does this patch actually > succeed in obsoleting it? The only part that's absolutely needed is the restriction on file caps, otherwise it will be trivial to get root through a user namespace mount. I've become convinced that the safest and most logical thing to do is to restrict file capabilites to the user namespaces where the mounter already has privileges, which is what the patch does. mnt_may_suid would also restrict the namespaces where the capabilities would be honored, but not to only namespaces where the mounter is already privileged. Of course it does require a user privileged in another namespace to perform a mount, but that still leaves me feeling a bit uncomfortable. suid doesn't require quite so strict a check because (jumping ahead to the patches I haven't sent yet) ids in a user namespace mount of a normal filesystem are constrained to ids in that namespace. So users could only exploit this to suid to ids they already control, or if they managed to somehow bypass other kernel protections they could possibly gain access to user ns mounts belonging to another user. So if we have the s_user_ns check in get_file_caps the mnt_may_suid pass isn't strictly necessary, but I still think it is useful as a mitigation to the "leaks" Eric mentions. It _should_ be impossible for a user to gain access to another user's mount namespace, and it _should_ be impossible for a user to clear MNT_NOSUID in a bind mount from init_user_ns. But if someone does find a way to do either then the patch stops them from being able to gain privileges via suid, and I think that's worth adding the check. Andy alludes to the possibility of checking s_user_ns or both s_user_ns and the mount namespace in mnt_may_suid, and those are certainly possibilities that would work equally well (though checking both is probably unnecessary). One thing I came away with from conversing with Eric though is that he wants to see a clear and explicit check in get_file_caps, not something implicit from may_mnt_suid. And I can see his point - there is a concern with file capabilities independent of the question of whether suid is allowed, and having a separate check does make that clearer. Seth -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 15, 2015 at 3:35 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Andy Lutomirski <luto@amacapital.net> writes: > >> On Wed, Jul 15, 2015 at 2:48 PM, Serge E. Hallyn <serge@hallyn.com> wrote: >>> On Wed, Jul 15, 2015 at 02:46:04PM -0500, Seth Forshee wrote: >>>> Capability sets attached to files must be ignored except in the >>>> user namespaces where the mounter is privileged, i.e. s_user_ns >>>> and its descendants. Otherwise a vector exists for gaining >>>> privileges in namespaces where a user is not already privileged. >>>> >>>> Add a new helper function, in_user_ns(), to test whether a user >>>> namespace is the same as or a descendant of another namespace. >>>> Use this helper to determine whether a file's capability set >>>> should be applied to the caps constructed during exec. >>>> >>>> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> >>> >>> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> >>> >>> I think it's an ok behavior, though let's just go over the >>> alternatives. >>> >>> It might actually be ok to simply require that the user_ns be >>> equal. If I unshare a new userns in which a different uid is >>> mapped to root, I may not want file capabilities to be granted >>> to tasks in that ns. (On the other hand, I might be creating >>> a new user_ns specifically to not have a uid 0 mapped into it >>> at all, and only have file capabilities grant privilege) >>> >>> Conversely, if I unshare one user_ns with a MS_SHARED mnt_ns, mount >>> an ext4fs, and then (from the parent shell) unshare another user_ns >>> with the same mapping, intending it to be a "peer" to the first one >>> I'd unshared and be able to use the ext4fs it mounted. This won't >>> work here. That's probably best - the appropriate thing to do was >>> to attach to the existing user_ns. But it could end up being >>> limiting in some special cases, so I'm bringing it up here. >>> >>> Again I think what you have here is the simplest and most sensible >>> choice, so ack. >>> >> >> I think I'm missing something. Why is this separate from mount_may_suid? >> >> I can see why it would make sense to check s_user_ns (or maybe >> s_user_ns *and* the vfsmount namespace) in mount_may_suid, but I don't >> see why we need separate checks. > > So I don't quite understand your concerns that lead to the mnt_may_suid > patch. But in my limited understanding there are two distinct issues. The issue is that we need some kind of control for whether a given operation should trust a given mounted filesystem. There are two kinds of trust: trusting the fs for execve security context (nosuid controls this) and trusting it for LSM access restrictions. I think that, in an unprivileged namespace context, the latter is a bit silly -- the creator of the fs owns it, full stop. I'm talking about the former. In particular, If I unshare everything, mount a fresh FUSE, shove a setuid, fcapped, LSM-labeled thing in it, pass a file descriptor out, and have someone in the root ns execve it, and *pwned*. My suggestion is to use a single function to control this, and I called it mnt_may_suid. We can certainly debate when that function should return true, but I'm unconvinced that the conditions for LSM and for regular setuid should be different. > > 1) What do file capabilities mean on a filesystem mounted with user > namespace privileges. Where the unprivileged user can control what > resides on disk. > > That is what this patch should be about. > > Meaning and restricting those permissions to unprivileged users. I think that file caps should mean what they usually do if the execve caller's userns should trust the file. Otherwise file caps should do nothing. My original idea was that a namespace trusts a vfsmount if the namespace or one of its ancestors created the mount. Doing the same thing but with s_user_ns might also make sense. > > 2) The second issue that I think your mnt_may_suid patch is about seems > to be what to do if a mount winds up in a place we never intended. > > Aka leaks. I don't think any changes to mnt_may_suid are necessary > in that sense. However they may be useful. > > So I think your mnt_may_suid change may be worth having but so far it > seems unnecessary. There's that, too. For one thing, with my mnt_may_suid patch (or a variant that checks the vfsmount and s_user_ns), we could drop the bind-mount nosuid restrictions. If you want to bind-mount an MS_NOSUID mount without MS_NOSUID, then that's fine -- you can't do any harm. > > Which is a long way of saying this patch is fundamentally necessary, > and I am not certain about the mnt_may_suid patch. > > Am I right in understanding it's purpose? Or does this patch actually > succeed in obsoleting it? Other way around. I think that an improved mnt_may_suid patch might render this patch unnecessary. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 15, 2015 at 6:14 PM, Seth Forshee <seth.forshee@canonical.com> wrote: > mnt_may_suid would also restrict the namespaces where the capabilities > would be honored, but not to only namespaces where the mounter is > already privileged. Of course it does require a user privileged in > another namespace to perform a mount, but that still leaves me feeling a > bit uncomfortable. Right. I think mnt_may_suid should check s_user_ns in addition. > > suid doesn't require quite so strict a check because (jumping ahead to > the patches I haven't sent yet) ids in a user namespace mount of a > normal filesystem are constrained to ids in that namespace. So users > could only exploit this to suid to ids they already control, or if they > managed to somehow bypass other kernel protections they could possibly > gain access to user ns mounts belonging to another user. True. But LSMs labels probably want the same protection as file caps, and the mnt_no_suid approach handles that, too. (Your patches also do this, but maybe we'd want to relax that some day for LSMs that are scoped sensibly.) > > So if we have the s_user_ns check in get_file_caps the mnt_may_suid pass > isn't strictly necessary, but I still think it is useful as a mitigation > to the "leaks" Eric mentions. It _should_ be impossible for a user to > gain access to another user's mount namespace, No, it's very easy with SCM_RIGHTS. We should make sure it's safe. > Andy alludes to the possibility of checking s_user_ns or both s_user_ns > and the mount namespace in mnt_may_suid, and those are certainly > possibilities that would work equally well (though checking both is > probably unnecessary). One thing I came away with from conversing with > Eric though is that he wants to see a clear and explicit check in > get_file_caps, not something implicit from may_mnt_suid. And I can see > his point - there is a concern with file capabilities independent of the > question of whether suid is allowed, and having a separate check does > make that clearer. But we absolutely need MS_NOSUID to block file caps, and it does. Why not just use the existing mechanism with an expanded sense of "nosuid"? --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Ok. Andy I have stopped and really looked at your patch that is 4/7 in this series. Something I had not done before since it sounded totally wrong. That combined with your earlier comments I think I can say something meaningful. Andy as I read your patch the thread you are primarily worried about is chdir(/some/directory/in/another/mnt/ns). I think enhancing nosuid to deal with that case is reasonable, and is unlikely to break userspace. It is one of those hairy security things so we need to be careful not to introduce a regression. I think a top down enhancement of nosuid to just block funny cases that no one cares about is completely sensible. Removing goofy corner that no one cares about and that are only good for security exploits seems reasonable. I am a little concerned that smack does not seem to respect nosuid on filesystems. But that is an issue with nosuid not with your enhanced nosuid. Now this patch 3/7 really should be entitled: "Limit file caps to the userns of the super block". It really really is doing something different. This change is about a bottom up understanding of what file caps means on a filesystem mounted by a user namespace root. That is file caps should only apply to the user namespace root of the root user who mounted the filesystem, because that is all the privileges the mounter of the filesystem had. This guarantees that even if the filesystem somehow propagates with mount propagation that there will be no issues. I think I know how to make that happen... But deeply and fundamentally limiting a filesystem to only the privilieges of it's user namespace root, and enhancing nosuid protections are rather different things. The approaches show up differently for dealing with uids and gids, as mappings are required. The approaches will likely to continue to show up differently for file caps when Serge implements a version of file caps with a user namespace root in them. The approaches fundamentally will need to do different things with security xattrs. As mnt_may_suid can just treat as a filesystem without labels, while ultimately the lsms will have to do something meaningful. So while in the very narrow case of todays file caps the two approaches are the same. Enhancing nosuid is something very different from limiting a filesystem to it's mounters user namespace. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 15, 2015 at 9:23 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > Ok. Andy I have stopped and really looked at your patch that is 4/7 in > this series. Something I had not done before since it sounded totally > wrong. > > That combined with your earlier comments I think I can say something > meaningful. > > Andy as I read your patch the thread you are primarily worried about is > chdir(/some/directory/in/another/mnt/ns). I think enhancing nosuid to > deal with that case is reasonable, and is unlikely to break userspace. > It is one of those hairy security things so we need to be careful not to > introduce a regression. > Indeed. It's plausible this could regress something, but it would be really weird. > I think a top down enhancement of nosuid to just block funny cases that > no one cares about is completely sensible. Removing goofy corner > that no one cares about and that are only good for security exploits > seems reasonable. > Agreed. > I am a little concerned that smack does not seem to respect nosuid > on filesystems. But that is an issue with nosuid not with your enhanced > nosuid. > > > > > Now this patch 3/7 really should be entitled: > "Limit file caps to the userns of the super block". > > It really really is doing something different. This change is about a > bottom up understanding of what file caps means on a filesystem mounted > by a user namespace root. > > That is file caps should only apply to the user namespace root of the > root user who mounted the filesystem, because that is all the privileges > the mounter of the filesystem had. > > This guarantees that even if the filesystem somehow propagates with > mount propagation that there will be no issues. I think I know how to > make that happen... > > > > > But deeply and fundamentally limiting a filesystem to only the > privilieges of it's user namespace root, and enhancing nosuid > protections are rather different things. > So here's the semantic question: Suppose an unprivileged user (uid 1000) creates a user namespace and a mount namespace. They stick a file (owned by uid 1000 as seen by init_user_ns) in there and mark it setuid root and give it fcaps. Then global root gets an fd to this filesystem. If they execve the file directly, then, with my patch 4, it won't act as setuid 1000 and the fcaps will be ignored. Even with my patch 4, though, if they bind mount the fs and execve the file from their bind mount, it will act as setuid 1000. Maybe this is odd. However, with Seth's patch 3, the fcaps will (correctly) not be honored. I tend to thing that, if we're not honoring the fcaps, we shouldn't be honoring the setuid bit either. After all, it's really not a trusted file, even though the only user who could have messed with it really is the apparent owner. And, if we're going to say we don't trust the file and shouldn't honor setuid or fcaps, then merging all the functionality into mnt_may_suid could make sense. Yes, these two things do different things, but they could hook in to the same place. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Andy Lutomirski <luto@amacapital.net> writes: > On Wed, Jul 15, 2015 at 9:23 PM, Eric W. Biederman > <ebiederm@xmission.com> wrote: >> >> Ok. Andy I have stopped and really looked at your patch that is 4/7 in >> this series. Something I had not done before since it sounded totally >> wrong. >> >> That combined with your earlier comments I think I can say something >> meaningful. >> >> Andy as I read your patch the thread you are primarily worried about is >> chdir(/some/directory/in/another/mnt/ns). I think enhancing nosuid to >> deal with that case is reasonable, and is unlikely to break userspace. >> It is one of those hairy security things so we need to be careful not to >> introduce a regression. >> > > Indeed. It's plausible this could regress something, but it would be > really weird. > >> I think a top down enhancement of nosuid to just block funny cases that >> no one cares about is completely sensible. Removing goofy corner >> that no one cares about and that are only good for security exploits >> seems reasonable. >> > > Agreed. > >> I am a little concerned that smack does not seem to respect nosuid >> on filesystems. But that is an issue with nosuid not with your enhanced >> nosuid. >> >> >> >> >> Now this patch 3/7 really should be entitled: >> "Limit file caps to the userns of the super block". >> >> It really really is doing something different. This change is about a >> bottom up understanding of what file caps means on a filesystem mounted >> by a user namespace root. >> >> That is file caps should only apply to the user namespace root of the >> root user who mounted the filesystem, because that is all the privileges >> the mounter of the filesystem had. >> >> This guarantees that even if the filesystem somehow propagates with >> mount propagation that there will be no issues. I think I know how to >> make that happen... >> >> >> >> >> But deeply and fundamentally limiting a filesystem to only the >> privilieges of it's user namespace root, and enhancing nosuid >> protections are rather different things. >> > > So here's the semantic question: > > Suppose an unprivileged user (uid 1000) creates a user namespace and a > mount namespace. They stick a file (owned by uid 1000 as seen by > init_user_ns) in there and mark it setuid root and give it fcaps. To make this make sense I have to ask, is this file on a filesystem where uid 1000 as seen by the init_user_ns stored as uid 1000 on the filesystem? Or is this uid 0 as seen by the filesystem? I assume this is uid 0 on the filesystem in question or else your unprivileged user would not have sufficient privileges over the filesystem to setup fcaps. > Then global root gets an fd to this filesystem. If they execve the > file directly, then, with my patch 4, it won't act as setuid 1000 and > the fcaps will be ignored. Even with my patch 4, though, if they bind > mount the fs and execve the file from their bind mount, it will act as > setuid 1000. Maybe this is odd. However, with Seth's patch 3, the > fcaps will (correctly) not be honored. With patch 3 you can also think of it as fcaps being honored and you get all the caps in the appropriate user namespace, but since you are not in that user namespace and so don't have a place to store them in struct cred you don't get the file caps. From the philosophy of interpreting the file as defined by the filesystem in principle we could extend struct cred so you actually get the creds just in uid 1000s user namespace, but that is very unlikely to be worth it. > I tend to thing that, if we're not honoring the fcaps, we shouldn't be > honoring the setuid bit either. After all, it's really not a trusted > file, even though the only user who could have messed with it really > is the apparent owner. For the file caps we can't honor them because you don't have the bits in struct cred. For setuid we can honor it, and setuid is something that the user namespace allows. > And, if we're going to say we don't trust the file and shouldn't honor > setuid or fcaps, then merging all the functionality into mnt_may_suid > could make sense. Yes, these two things do different things, but they > could hook in to the same place. There are really two separate questions: - Do we trust this filesystem? - Do you have the bits to implement this concept? Even if in this specific context the two questions wind up looking exactly the same. I think it makes a lot of sense to ask the two questions separately. As future maintenance changes may cause the implementation of the questions to diverge. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 15, 2015 at 10:04 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Andy Lutomirski <luto@amacapital.net> writes: > >> On Wed, Jul 15, 2015 at 9:23 PM, Eric W. Biederman >> <ebiederm@xmission.com> wrote: >>> >>> Ok. Andy I have stopped and really looked at your patch that is 4/7 in >>> this series. Something I had not done before since it sounded totally >>> wrong. >>> >>> That combined with your earlier comments I think I can say something >>> meaningful. >>> >>> Andy as I read your patch the thread you are primarily worried about is >>> chdir(/some/directory/in/another/mnt/ns). I think enhancing nosuid to >>> deal with that case is reasonable, and is unlikely to break userspace. >>> It is one of those hairy security things so we need to be careful not to >>> introduce a regression. >>> >> >> Indeed. It's plausible this could regress something, but it would be >> really weird. >> >>> I think a top down enhancement of nosuid to just block funny cases that >>> no one cares about is completely sensible. Removing goofy corner >>> that no one cares about and that are only good for security exploits >>> seems reasonable. >>> >> >> Agreed. >> >>> I am a little concerned that smack does not seem to respect nosuid >>> on filesystems. But that is an issue with nosuid not with your enhanced >>> nosuid. >>> >>> >>> >>> >>> Now this patch 3/7 really should be entitled: >>> "Limit file caps to the userns of the super block". >>> >>> It really really is doing something different. This change is about a >>> bottom up understanding of what file caps means on a filesystem mounted >>> by a user namespace root. >>> >>> That is file caps should only apply to the user namespace root of the >>> root user who mounted the filesystem, because that is all the privileges >>> the mounter of the filesystem had. >>> >>> This guarantees that even if the filesystem somehow propagates with >>> mount propagation that there will be no issues. I think I know how to >>> make that happen... >>> >>> >>> >>> >>> But deeply and fundamentally limiting a filesystem to only the >>> privilieges of it's user namespace root, and enhancing nosuid >>> protections are rather different things. >>> >> >> So here's the semantic question: >> >> Suppose an unprivileged user (uid 1000) creates a user namespace and a >> mount namespace. They stick a file (owned by uid 1000 as seen by >> init_user_ns) in there and mark it setuid root and give it fcaps. > > To make this make sense I have to ask, is this file on a filesystem > where uid 1000 as seen by the init_user_ns stored as uid 1000 on > the filesystem? Or is this uid 0 as seen by the filesystem? > > I assume this is uid 0 on the filesystem in question or else your > unprivileged user would not have sufficient privileges over the > filesystem to setup fcaps. I was thinking uid 0 as seen by the filesystem. But even if it were uid 1000, the unprivileged user can still set whatever mode and xattrs they want -- they control the backing store. > >> Then global root gets an fd to this filesystem. If they execve the >> file directly, then, with my patch 4, it won't act as setuid 1000 and >> the fcaps will be ignored. Even with my patch 4, though, if they bind >> mount the fs and execve the file from their bind mount, it will act as >> setuid 1000. Maybe this is odd. However, with Seth's patch 3, the >> fcaps will (correctly) not be honored. > > With patch 3 you can also think of it as fcaps being honored and you > get all the caps in the appropriate user namespace, but since you are > not in that user namespace and so don't have a place to store them > in struct cred you don't get the file caps. > > From the philosophy of interpreting the file as defined by the > filesystem in principle we could extend struct cred so you actually > get the creds just in uid 1000s user namespace, but that is very > unlikely to be worth it. I agree. > >> I tend to thing that, if we're not honoring the fcaps, we shouldn't be >> honoring the setuid bit either. After all, it's really not a trusted >> file, even though the only user who could have messed with it really >> is the apparent owner. > > For the file caps we can't honor them because you don't have the bits > in struct cred. > > For setuid we can honor it, and setuid is something that the user > namespace allows. > We certainly *can* honor it. But why should we? I'd be more comfortable with this if the contents of an untrusted filesystem were really treated as just data. >> And, if we're going to say we don't trust the file and shouldn't honor >> setuid or fcaps, then merging all the functionality into mnt_may_suid >> could make sense. Yes, these two things do different things, but they >> could hook in to the same place. > > There are really two separate questions: > - Do we trust this filesystem? > - Do you have the bits to implement this concept? > > Even if in this specific context the two questions wind up looking > exactly the same. I think it makes a lot of sense to ask the two > questions separately. As future maintenance changes may cause the > implementation of the questions to diverge. > Agreed. Unless someone thinks of an argument to the contrary, I'd say "no, we don't trust this filesystem". I could be convinced otherwise. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Andy Lutomirski <luto@amacapital.net> writes: > On Wed, Jul 15, 2015 at 10:04 PM, Eric W. Biederman > <ebiederm@xmission.com> wrote: >> Andy Lutomirski <luto@amacapital.net> writes: >> >>> >>> So here's the semantic question: >>> >>> Suppose an unprivileged user (uid 1000) creates a user namespace and a >>> mount namespace. They stick a file (owned by uid 1000 as seen by >>> init_user_ns) in there and mark it setuid root and give it fcaps. >> >> To make this make sense I have to ask, is this file on a filesystem >> where uid 1000 as seen by the init_user_ns stored as uid 1000 on >> the filesystem? Or is this uid 0 as seen by the filesystem? >> >> I assume this is uid 0 on the filesystem in question or else your >> unprivileged user would not have sufficient privileges over the >> filesystem to setup fcaps. > > I was thinking uid 0 as seen by the filesystem. But even if it were > uid 1000, the unprivileged user can still set whatever mode and xattrs > they want -- they control the backing store. Yes. And that is what I was really asking. Are we taking about a filesystem where the user controls the backing store? >>> Then global root gets an fd to this filesystem. If they execve the >>> file directly, then, with my patch 4, it won't act as setuid 1000 and >>> the fcaps will be ignored. Even with my patch 4, though, if they bind >>> mount the fs and execve the file from their bind mount, it will act as >>> setuid 1000. Maybe this is odd. However, with Seth's patch 3, the >>> fcaps will (correctly) not be honored. >> >> With patch 3 you can also think of it as fcaps being honored and you >> get all the caps in the appropriate user namespace, but since you are >> not in that user namespace and so don't have a place to store them >> in struct cred you don't get the file caps. >> >> From the philosophy of interpreting the file as defined by the >> filesystem in principle we could extend struct cred so you actually >> get the creds just in uid 1000s user namespace, but that is very >> unlikely to be worth it. > > I agree. > >> >>> I tend to thing that, if we're not honoring the fcaps, we shouldn't be >>> honoring the setuid bit either. After all, it's really not a trusted >>> file, even though the only user who could have messed with it really >>> is the apparent owner. >> >> For the file caps we can't honor them because you don't have the bits >> in struct cred. >> >> For setuid we can honor it, and setuid is something that the user >> namespace allows. >> > > We certainly *can* honor it. But why should we? I'd be more > comfortable with this if the contents of an untrusted filesystem were > really treated as just data. In these weird bleed through situtations I don't know that we should. But extending nosuid protections in this way is a bit like yama a bit gratuitious stomping don't care cases in the semantics to make bugs harder to exploit. >>> And, if we're going to say we don't trust the file and shouldn't honor >>> setuid or fcaps, then merging all the functionality into mnt_may_suid >>> could make sense. Yes, these two things do different things, but they >>> could hook in to the same place. >> >> There are really two separate questions: >> - Do we trust this filesystem? >> - Do you have the bits to implement this concept? >> >> Even if in this specific context the two questions wind up looking >> exactly the same. I think it makes a lot of sense to ask the two >> questions separately. As future maintenance changes may cause the >> implementation of the questions to diverge. >> > > Agreed. > > Unless someone thinks of an argument to the contrary, I'd say "no, we > don't trust this filesystem". I could be convinced otherwise. But this is context dependent. From the perspective of the container we really do want to trust the filesystem. As the container root set it up, and if he isn't being hostile likely has a use for setfcaps files and setuid files and all of the rest. Perhaps I should phrase it as: - In this context do we trust the code? AKA mnt_may_suid? - What do these bits mean in this context? (Usually something more complicated). Which says to me we want both patches 3 and 4 (even if 4 uses s_user_ns) because 3 is different than 4. And now I better context switch back to fixing bind mounts. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 15, 2015 at 06:23:01PM -0700, Andy Lutomirski wrote: > > So if we have the s_user_ns check in get_file_caps the mnt_may_suid pass > > isn't strictly necessary, but I still think it is useful as a mitigation > > to the "leaks" Eric mentions. It _should_ be impossible for a user to > > gain access to another user's mount namespace, > > No, it's very easy with SCM_RIGHTS. We should make sure it's safe. Sure, what I really meant was that an attacker shouldn't be able to do so without cooperation from the other user's processes. But I think we're all in agreement that making it safe is a good idea. Seth -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 16, 2015 at 12:44:49AM -0500, Eric W. Biederman wrote: > Andy Lutomirski <luto@amacapital.net> writes: > > > On Wed, Jul 15, 2015 at 10:04 PM, Eric W. Biederman > > <ebiederm@xmission.com> wrote: > >> Andy Lutomirski <luto@amacapital.net> writes: > >> > >>> > >>> So here's the semantic question: > >>> > >>> Suppose an unprivileged user (uid 1000) creates a user namespace and a > >>> mount namespace. They stick a file (owned by uid 1000 as seen by > >>> init_user_ns) in there and mark it setuid root and give it fcaps. > >> > >> To make this make sense I have to ask, is this file on a filesystem > >> where uid 1000 as seen by the init_user_ns stored as uid 1000 on > >> the filesystem? Or is this uid 0 as seen by the filesystem? > >> > >> I assume this is uid 0 on the filesystem in question or else your > >> unprivileged user would not have sufficient privileges over the > >> filesystem to setup fcaps. > > > > I was thinking uid 0 as seen by the filesystem. But even if it were > > uid 1000, the unprivileged user can still set whatever mode and xattrs > > they want -- they control the backing store. > > Yes. And that is what I was really asking. Are we taking about a > filesystem where the user controls the backing store? > > >>> Then global root gets an fd to this filesystem. If they execve the > >>> file directly, then, with my patch 4, it won't act as setuid 1000 and > >>> the fcaps will be ignored. Even with my patch 4, though, if they bind > >>> mount the fs and execve the file from their bind mount, it will act as > >>> setuid 1000. Maybe this is odd. However, with Seth's patch 3, the > >>> fcaps will (correctly) not be honored. > >> > >> With patch 3 you can also think of it as fcaps being honored and you > >> get all the caps in the appropriate user namespace, but since you are > >> not in that user namespace and so don't have a place to store them > >> in struct cred you don't get the file caps. > >> > >> From the philosophy of interpreting the file as defined by the > >> filesystem in principle we could extend struct cred so you actually > >> get the creds just in uid 1000s user namespace, but that is very > >> unlikely to be worth it. > > > > I agree. > > > >> > >>> I tend to thing that, if we're not honoring the fcaps, we shouldn't be > >>> honoring the setuid bit either. After all, it's really not a trusted > >>> file, even though the only user who could have messed with it really > >>> is the apparent owner. > >> > >> For the file caps we can't honor them because you don't have the bits > >> in struct cred. > >> > >> For setuid we can honor it, and setuid is something that the user > >> namespace allows. > >> > > > > We certainly *can* honor it. But why should we? I'd be more > > comfortable with this if the contents of an untrusted filesystem were > > really treated as just data. > > In these weird bleed through situtations I don't know that we should. > But extending nosuid protections in this way is a bit like yama > a bit gratuitious stomping don't care cases in the semantics to > make bugs harder to exploit. > > >>> And, if we're going to say we don't trust the file and shouldn't honor > >>> setuid or fcaps, then merging all the functionality into mnt_may_suid > >>> could make sense. Yes, these two things do different things, but they > >>> could hook in to the same place. > >> > >> There are really two separate questions: > >> - Do we trust this filesystem? > >> - Do you have the bits to implement this concept? > >> > >> Even if in this specific context the two questions wind up looking > >> exactly the same. I think it makes a lot of sense to ask the two > >> questions separately. As future maintenance changes may cause the > >> implementation of the questions to diverge. > >> > > > > Agreed. > > > > Unless someone thinks of an argument to the contrary, I'd say "no, we > > don't trust this filesystem". I could be convinced otherwise. > > But this is context dependent. From the perspective of the container > we really do want to trust the filesystem. As the container root set it > up, and if he isn't being hostile likely has a use for setfcaps files > and setuid files and all of the rest. > > Perhaps I should phrase it as: > - In this context do we trust the code? AKA mnt_may_suid? > - What do these bits mean in this context? (Usually something more complicated). > > Which says to me we want both patches 3 and 4 (even if 4 uses s_user_ns) > because 3 is different than 4. So what I'll do is: - Add a s_user_ns check to mnt_may_suid - Keep the (now redundant) s_user_ns check in get_file_caps I'm on the fence about having both the mnt and user ns checks in mnt_may_suid - it might be overkill, but it still adds the protection against clearing MNT_NOSUID in a bind mount. So I guess I'll keep the mnt ns check. Seth -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Seth Forshee <seth.forshee@canonical.com> writes: > On Thu, Jul 16, 2015 at 12:44:49AM -0500, Eric W. Biederman wrote: >> Andy Lutomirski <luto@amacapital.net> writes: >> >> > On Wed, Jul 15, 2015 at 10:04 PM, Eric W. Biederman >> > <ebiederm@xmission.com> wrote: >> >> Andy Lutomirski <luto@amacapital.net> writes: >> >> >> >>> >> >>> So here's the semantic question: >> >>> >> >>> Suppose an unprivileged user (uid 1000) creates a user namespace and a >> >>> mount namespace. They stick a file (owned by uid 1000 as seen by >> >>> init_user_ns) in there and mark it setuid root and give it fcaps. >> >> >> >> To make this make sense I have to ask, is this file on a filesystem >> >> where uid 1000 as seen by the init_user_ns stored as uid 1000 on >> >> the filesystem? Or is this uid 0 as seen by the filesystem? >> >> >> >> I assume this is uid 0 on the filesystem in question or else your >> >> unprivileged user would not have sufficient privileges over the >> >> filesystem to setup fcaps. >> > >> > I was thinking uid 0 as seen by the filesystem. But even if it were >> > uid 1000, the unprivileged user can still set whatever mode and xattrs >> > they want -- they control the backing store. >> >> Yes. And that is what I was really asking. Are we taking about a >> filesystem where the user controls the backing store? >> >> >>> Then global root gets an fd to this filesystem. If they execve the >> >>> file directly, then, with my patch 4, it won't act as setuid 1000 and >> >>> the fcaps will be ignored. Even with my patch 4, though, if they bind >> >>> mount the fs and execve the file from their bind mount, it will act as >> >>> setuid 1000. Maybe this is odd. However, with Seth's patch 3, the >> >>> fcaps will (correctly) not be honored. >> >> >> >> With patch 3 you can also think of it as fcaps being honored and you >> >> get all the caps in the appropriate user namespace, but since you are >> >> not in that user namespace and so don't have a place to store them >> >> in struct cred you don't get the file caps. >> >> >> >> From the philosophy of interpreting the file as defined by the >> >> filesystem in principle we could extend struct cred so you actually >> >> get the creds just in uid 1000s user namespace, but that is very >> >> unlikely to be worth it. >> > >> > I agree. >> > >> >> >> >>> I tend to thing that, if we're not honoring the fcaps, we shouldn't be >> >>> honoring the setuid bit either. After all, it's really not a trusted >> >>> file, even though the only user who could have messed with it really >> >>> is the apparent owner. >> >> >> >> For the file caps we can't honor them because you don't have the bits >> >> in struct cred. >> >> >> >> For setuid we can honor it, and setuid is something that the user >> >> namespace allows. >> >> >> > >> > We certainly *can* honor it. But why should we? I'd be more >> > comfortable with this if the contents of an untrusted filesystem were >> > really treated as just data. >> >> In these weird bleed through situtations I don't know that we should. >> But extending nosuid protections in this way is a bit like yama >> a bit gratuitious stomping don't care cases in the semantics to >> make bugs harder to exploit. >> >> >>> And, if we're going to say we don't trust the file and shouldn't honor >> >>> setuid or fcaps, then merging all the functionality into mnt_may_suid >> >>> could make sense. Yes, these two things do different things, but they >> >>> could hook in to the same place. >> >> >> >> There are really two separate questions: >> >> - Do we trust this filesystem? >> >> - Do you have the bits to implement this concept? >> >> >> >> Even if in this specific context the two questions wind up looking >> >> exactly the same. I think it makes a lot of sense to ask the two >> >> questions separately. As future maintenance changes may cause the >> >> implementation of the questions to diverge. >> >> >> > >> > Agreed. >> > >> > Unless someone thinks of an argument to the contrary, I'd say "no, we >> > don't trust this filesystem". I could be convinced otherwise. >> >> But this is context dependent. From the perspective of the container >> we really do want to trust the filesystem. As the container root set it >> up, and if he isn't being hostile likely has a use for setfcaps files >> and setuid files and all of the rest. >> >> Perhaps I should phrase it as: >> - In this context do we trust the code? AKA mnt_may_suid? >> - What do these bits mean in this context? (Usually something more complicated). >> >> Which says to me we want both patches 3 and 4 (even if 4 uses s_user_ns) >> because 3 is different than 4. > > So what I'll do is: > > - Add a s_user_ns check to mnt_may_suid > - Keep the (now redundant) s_user_ns check in get_file_caps > > I'm on the fence about having both the mnt and user ns checks in > mnt_may_suid - it might be overkill, but it still adds the protection > against clearing MNT_NOSUID in a bind mount. So I guess I'll keep the > mnt ns check. That sounds like a plan. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 16, 2015 at 12:04:43AM -0500, Eric W. Biederman wrote: > > I tend to thing that, if we're not honoring the fcaps, we shouldn't be > > honoring the setuid bit either. After all, it's really not a trusted > > file, even though the only user who could have messed with it really > > is the apparent owner. > > For the file caps we can't honor them because you don't have the bits > in struct cred. > > For setuid we can honor it, and setuid is something that the user > namespace allows. Setuid is something explicitly tied to the user id. File capabilities are MAC, that is, explicitly orthogonal to user id. So 100% agreed with honoring setuid in user_ns and, for now, ignoring file caps. As I've mentioned a few times privately, I'm intending to implement user-namespaced file capabilities as a new xattr. Design is not 100% nailed down, but probably it would support a set of userns_fcaps, each of which lists the k_uid of the root user in the namespace assigning the filecaps, followed by three sets. Then when exec()ing the file, if the current->userns->root user has a userns_fcap entry, or there is a -1 entry, then use that, else use nothing. I think this is a very importing thing to support, to remove a barrier to shipping packages with software using filecaps. Without this, any package, say ping, which wants to support being installed in a (unprivileged) cotainer would need to also support use without filecaps, meaning that will likely be the only supported mode. -serge -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 29, 2015 at 11:04:50AM -0500, Serge E. Hallyn wrote: > On Thu, Jul 16, 2015 at 12:04:43AM -0500, Eric W. Biederman wrote: > > > I tend to thing that, if we're not honoring the fcaps, we shouldn't be > > > honoring the setuid bit either. After all, it's really not a trusted > > > file, even though the only user who could have messed with it really > > > is the apparent owner. > > > > For the file caps we can't honor them because you don't have the bits > > in struct cred. > > > > For setuid we can honor it, and setuid is something that the user > > namespace allows. > > Setuid is something explicitly tied to the user id. File capabilities > are MAC, that is, explicitly orthogonal to user id. So 100% agreed with > honoring setuid in user_ns and, for now, ignoring file caps. Hm. No. Seems like both should be fine when current is in the mounter's user_ns, and ignored otherwise. (The below is still needed :) > As I've mentioned a few times privately, I'm intending to implement > user-namespaced file capabilities as a new xattr. Design is not 100% > nailed down, but probably it would support a set of userns_fcaps, each > of which lists the k_uid of the root user in the namespace assigning the > filecaps, followed by three sets. Then when exec()ing the file, if > the current->userns->root user has a userns_fcap entry, or there is a -1 > entry, then use that, else use nothing. I think this is a very importing > thing to support, to remove a barrier to shipping packages with software > using filecaps. Without this, any package, say ping, which wants to > support being installed in a (unprivileged) cotainer would need to also > support use without filecaps, meaning that will likely be the only > supported mode. > > -serge -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 8297e5b341d8..a43faa727124 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -72,6 +72,8 @@ extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *); extern int proc_setgroups_show(struct seq_file *m, void *v); extern bool userns_may_setgroups(const struct user_namespace *ns); +extern bool in_userns(const struct user_namespace *ns, + const struct user_namespace *target_ns); #else static inline struct user_namespace *get_user_ns(struct user_namespace *ns) @@ -100,6 +102,12 @@ static inline bool userns_may_setgroups(const struct user_namespace *ns) { return true; } + +static inline bool in_userns(const struct user_namespace *ns, + const struct user_namespace *target_ns) +{ + return true; +} #endif #endif /* _LINUX_USER_H */ diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 4109f8320684..2b043876d5f0 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -944,6 +944,20 @@ bool userns_may_setgroups(const struct user_namespace *ns) return allowed; } +/* + * Returns true if @ns is the same namespace as or a descendant of + * @target_ns. + */ +bool in_userns(const struct user_namespace *ns, + const struct user_namespace *target_ns) +{ + for (; ns; ns = ns->parent) { + if (ns == target_ns) + return true; + } + return false; +} + static inline struct user_namespace *to_user_ns(struct ns_common *ns) { return container_of(ns, struct user_namespace, ns); diff --git a/security/commoncap.c b/security/commoncap.c index d103f5a4043d..175ab497e810 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -439,6 +439,8 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) return 0; + if (!in_userns(current_user_ns(), bprm->file->f_path.mnt->mnt_sb->s_user_ns)) + return 0; rc = get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps); if (rc < 0) {
Capability sets attached to files must be ignored except in the user namespaces where the mounter is privileged, i.e. s_user_ns and its descendants. Otherwise a vector exists for gaining privileges in namespaces where a user is not already privileged. Add a new helper function, in_user_ns(), to test whether a user namespace is the same as or a descendant of another namespace. Use this helper to determine whether a file's capability set should be applied to the caps constructed during exec. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> --- include/linux/user_namespace.h | 8 ++++++++ kernel/user_namespace.c | 14 ++++++++++++++ security/commoncap.c | 2 ++ 3 files changed, 24 insertions(+)