Message ID | 20211221021744.864115-1-longman@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | exec: Make suid_dumpable apply to SUID/SGID binaries irrespective of invoking users | expand |
Waiman Long <longman@redhat.com> writes: > The begin_new_exec() function checks for SUID or SGID binaries by > comparing effective uid and gid against real uid and gid and using > the suid_dumpable sysctl parameter setting only if either one of them > differs. > > In the special case that the uid and/or gid of the SUID/SGID binaries > matches the id's of the user invoking it, the suid_dumpable is not > used and SUID_DUMP_USER will be used instead. The documentation for the > suid_dumpable sysctl parameter does not include that exception and so > this will be an undocumented behavior. > > Eliminate this undocumented behavior by adding a flag in the linux_binprm > structure to designate a SUID/SGID binary and use it for determining > if the suid_dumpable setting should be applied or not. I see that you are making the code match the documentation. What harm/problems does this mismatch cause in practice? What is the motivation for this change? I am trying to see the motivation but all I can see is that in the case where suid and sgid do nothing in practice the code does not change dumpable. The point of dumpable is to refuse to core dump when it is not safe. In this case since nothing happened in practice it is safe. So how does this matter in practice. If there isn't a good motivation my feel is that it is the documentation that needs to be updated rather than the code. There are a lot of warts to the suid/sgid handling during exec. This just doesn't look like one of them. Eric > Signed-off-by: Waiman Long <longman@redhat.com> > --- > fs/exec.c | 6 +++--- > include/linux/binfmts.h | 5 ++++- > 2 files changed, 7 insertions(+), 4 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 537d92c41105..60e02e678fb6 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1344,9 +1344,7 @@ int begin_new_exec(struct linux_binprm * bprm) > * is wrong, but userspace depends on it. This should be testing > * bprm->secureexec instead. > */ > - if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP || > - !(uid_eq(current_euid(), current_uid()) && > - gid_eq(current_egid(), current_gid()))) > + if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP || bprm->is_sugid) > set_dumpable(current->mm, suid_dumpable); > else > set_dumpable(current->mm, SUID_DUMP_USER); > @@ -1619,11 +1617,13 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) > if (mode & S_ISUID) { > bprm->per_clear |= PER_CLEAR_ON_SETID; > bprm->cred->euid = uid; > + bprm->is_sugid = 1; > } > > if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) { > bprm->per_clear |= PER_CLEAR_ON_SETID; > bprm->cred->egid = gid; > + bprm->is_sugid = 1; > } > } > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > index 049cf9421d83..6d9893c59085 100644 > --- a/include/linux/binfmts.h > +++ b/include/linux/binfmts.h > @@ -41,7 +41,10 @@ struct linux_binprm { > * Set when errors can no longer be returned to the > * original userspace. > */ > - point_of_no_return:1; > + point_of_no_return:1, > + > + /* Is a SUID or SGID binary? */ > + is_sugid:1; > #ifdef __alpha__ > unsigned int taso:1; > #endif
On 12/21/21 10:55, Eric W. Biederman wrote: > Waiman Long <longman@redhat.com> writes: > >> The begin_new_exec() function checks for SUID or SGID binaries by >> comparing effective uid and gid against real uid and gid and using >> the suid_dumpable sysctl parameter setting only if either one of them >> differs. >> >> In the special case that the uid and/or gid of the SUID/SGID binaries >> matches the id's of the user invoking it, the suid_dumpable is not >> used and SUID_DUMP_USER will be used instead. The documentation for the >> suid_dumpable sysctl parameter does not include that exception and so >> this will be an undocumented behavior. >> >> Eliminate this undocumented behavior by adding a flag in the linux_binprm >> structure to designate a SUID/SGID binary and use it for determining >> if the suid_dumpable setting should be applied or not. > I see that you are making the code match the documentation. > What harm/problems does this mismatch cause in practice? > What is the motivation for this change? > > I am trying to see the motivation but all I can see is that > in the case where suid and sgid do nothing in practice the code > does not change dumpable. The point of dumpable is to refuse to > core dump when it is not safe. In this case since nothing happened > in practice it is safe. > > So how does this matter in practice. If there isn't a good > motivation my feel is that it is the documentation that needs to be > updated rather than the code. > > There are a lot of warts to the suid/sgid handling during exec. This > just doesn't look like one of them This patch is a minor mitigation in response to the security vulnerability as posted in https://www.openwall.com/lists/oss-security/2021/10/20/2 (aka CVE-2021-3864). In particular, the Su PoC (tested on CentOS 7) showing that the su invokes /usr/sbin/unix_chkpwd which is also a SUID binary. The initial su invocation won't generate a core dump because the real uid and euid differs, but the second unix_chkpwd invocation will. This patch eliminates this hole by making sure that all SUID binaries follow suid_dumpable setting. Cheers, Longman
Adding a couple of other people who have expressed opinions on how to mitigate this issue in the kernel. Waiman Long <longman@redhat.com> writes: > On 12/21/21 10:55, Eric W. Biederman wrote: >> Waiman Long <longman@redhat.com> writes: >> >>> The begin_new_exec() function checks for SUID or SGID binaries by >>> comparing effective uid and gid against real uid and gid and using >>> the suid_dumpable sysctl parameter setting only if either one of them >>> differs. >>> >>> In the special case that the uid and/or gid of the SUID/SGID binaries >>> matches the id's of the user invoking it, the suid_dumpable is not >>> used and SUID_DUMP_USER will be used instead. The documentation for the >>> suid_dumpable sysctl parameter does not include that exception and so >>> this will be an undocumented behavior. >>> >>> Eliminate this undocumented behavior by adding a flag in the linux_binprm >>> structure to designate a SUID/SGID binary and use it for determining >>> if the suid_dumpable setting should be applied or not. >> I see that you are making the code match the documentation. >> What harm/problems does this mismatch cause in practice? >> What is the motivation for this change? >> >> I am trying to see the motivation but all I can see is that >> in the case where suid and sgid do nothing in practice the code >> does not change dumpable. The point of dumpable is to refuse to >> core dump when it is not safe. In this case since nothing happened >> in practice it is safe. >> >> So how does this matter in practice. If there isn't a good >> motivation my feel is that it is the documentation that needs to be >> updated rather than the code. >> >> There are a lot of warts to the suid/sgid handling during exec. This >> just doesn't look like one of them > > This patch is a minor mitigation in response to the security > vulnerability as posted in > https://www.openwall.com/lists/oss-security/2021/10/20/2 (aka > CVE-2021-3864). In particular, the Su PoC (tested on CentOS 7) showing > that the su invokes /usr/sbin/unix_chkpwd which is also a SUID > binary. The initial su invocation won't generate a core dump because > the real uid and euid differs, but the second unix_chkpwd invocation > will. This patch eliminates this hole by making sure that all SUID > binaries follow suid_dumpable setting. All that is required to take advantage of this vulnerability is for an suid program to exec something that will coredump. That exec resets the dumpability. While the example exploit is execing a suid program it is not required that the exec'd program be suid. This makes your proposed change is not a particularly effective mitigation. The best idea I have seen to mitigate this from the kernel side is: 1) set RLIMIT_CORE to 0 during an suid exec 2) update do_coredump to honor an rlimit of 0 for pipes Anecdotally this should not effect the common systems that pipe coredumps into programs as those programs are reported to honor RLIMIT_CORE of 0. This needs to be verified. If those programs do honor RLIMIT_CORE of 0 we won't have any user visible changes if they never see coredumps from a program with a RLIMIT_CORE of 0. I have been meaning to audit userspace and see if the common coredump catchers truly honor an RLIMIT_CORE of 0. Unfortunately I have not found time to do that yet. Eric
On 12/21/21 12:35, Eric W. Biederman wrote: > Adding a couple of other people who have expressed opinions on how > to mitigate this issue in the kernel. > > Waiman Long <longman@redhat.com> writes: > >> On 12/21/21 10:55, Eric W. Biederman wrote: >>> Waiman Long <longman@redhat.com> writes: >>> >>>> The begin_new_exec() function checks for SUID or SGID binaries by >>>> comparing effective uid and gid against real uid and gid and using >>>> the suid_dumpable sysctl parameter setting only if either one of them >>>> differs. >>>> >>>> In the special case that the uid and/or gid of the SUID/SGID binaries >>>> matches the id's of the user invoking it, the suid_dumpable is not >>>> used and SUID_DUMP_USER will be used instead. The documentation for the >>>> suid_dumpable sysctl parameter does not include that exception and so >>>> this will be an undocumented behavior. >>>> >>>> Eliminate this undocumented behavior by adding a flag in the linux_binprm >>>> structure to designate a SUID/SGID binary and use it for determining >>>> if the suid_dumpable setting should be applied or not. >>> I see that you are making the code match the documentation. >>> What harm/problems does this mismatch cause in practice? >>> What is the motivation for this change? >>> >>> I am trying to see the motivation but all I can see is that >>> in the case where suid and sgid do nothing in practice the code >>> does not change dumpable. The point of dumpable is to refuse to >>> core dump when it is not safe. In this case since nothing happened >>> in practice it is safe. >>> >>> So how does this matter in practice. If there isn't a good >>> motivation my feel is that it is the documentation that needs to be >>> updated rather than the code. >>> >>> There are a lot of warts to the suid/sgid handling during exec. This >>> just doesn't look like one of them >> This patch is a minor mitigation in response to the security >> vulnerability as posted in >> https://www.openwall.com/lists/oss-security/2021/10/20/2 (aka >> CVE-2021-3864). In particular, the Su PoC (tested on CentOS 7) showing >> that the su invokes /usr/sbin/unix_chkpwd which is also a SUID >> binary. The initial su invocation won't generate a core dump because >> the real uid and euid differs, but the second unix_chkpwd invocation >> will. This patch eliminates this hole by making sure that all SUID >> binaries follow suid_dumpable setting. > All that is required to take advantage of this vulnerability is > for an suid program to exec something that will coredump. That > exec resets the dumpability. > > While the example exploit is execing a suid program it is not required > that the exec'd program be suid. > > This makes your proposed change is not a particularly effective mitigation. Yes, I am aware of that. That is why I said it is just a minor mitigation. This patch was inspired after investigating this problem, but I do think it is good to make the code consistent with the documentation. Of course, we can go either way. I prefer my approach to use a flag to indicate a suid binary instead of just comparing ruid and euid. > > The best idea I have seen to mitigate this from the kernel side is: > > 1) set RLIMIT_CORE to 0 during an suid exec > 2) update do_coredump to honor an rlimit of 0 for pipes > > Anecdotally this should not effect the common systems that pipe > coredumps into programs as those programs are reported to honor > RLIMIT_CORE of 0. This needs to be verified. > > If those programs do honor RLIMIT_CORE of 0 we won't have any user > visible changes if they never see coredumps from a program with a > RLIMIT_CORE of 0. > > > I have been meaning to audit userspace and see if the common coredump > catchers truly honor an RLIMIT_CORE of 0. Unfortunately I have not > found time to do that yet. Default RLIMIT_CORE to 0 will likely mitigate this vulnerability. However, there are still some userspace impacts as existing behavior will be modified. For instance, we may need to modify su to restore a proper value for RLIMIT_CORE after successful authentication. Cheers, Longman
On Tue, Dec 21, 2021 at 10:01 AM Waiman Long <longman@redhat.com> wrote: > > Default RLIMIT_CORE to 0 will likely mitigate this vulnerability. > However, there are still some userspace impacts as existing behavior > will be modified. For instance, we may need to modify su to restore a > proper value for RLIMIT_CORE after successful authentication. We had a "clever" idea for this that I thought people were ok with. It's been some time since this came up, but iirc the notion was to instead of setting the rlimit to zero (which makes it really hard to restore afterwards, because you don't know what the restored value would be, so you are dependent on user space doing it), we just never reset set_dumpable() when we execve. So any suid exec will do set_dumpable() to suid_dumpable, and exec'ing something else does nothing at all - it stays non-dumpable (obviously "non-dumpable" here depends on the actual value for "suid_dumpable" - you can enable suid dump debugging manually). And instead, we say that operations like "setsid()" that start a new session - *those* are the ones that enable core dumping again. Or doing things like a "ulimit(RLIMIT_CORE)" (which clearly implies "I want core-dumps"). Those will all very naturally make "login" and friends work correctly, while keeping core-dumps disabled for some suid situation that doesn't explicitly set up a new context. I think the basic problem with the traditional UNIX model of "suid exec doesn't core dump" is that the "enter non-core-dump" is a nice clear "your privileges changed". But then the "exit non-core-dump" thing is an exec that *doesn't* change privileges. That's the odd and crazy part: you just disabled core-dumps because there was a privilege level change, and then you enable core-dumps again because there *wasn't* a privilege change - even if you're still at those elevated privileges. Now, this is clearly not a Linux issue - we're just doing what others have been doing too. But I think we should just admit that "what others have been doing" is simply broken. And yes, some odd situation migth be broken by this kind of change, but I think this kind of "the old model was broken" may simply require that. I suspect we can find a solution that fixes all the regular cases. Hmm? Linus
On 12/21/21 13:19, Linus Torvalds wrote: > On Tue, Dec 21, 2021 at 10:01 AM Waiman Long <longman@redhat.com> wrote: >> Default RLIMIT_CORE to 0 will likely mitigate this vulnerability. >> However, there are still some userspace impacts as existing behavior >> will be modified. For instance, we may need to modify su to restore a >> proper value for RLIMIT_CORE after successful authentication. > We had a "clever" idea for this that I thought people were ok with. > > It's been some time since this came up, but iirc the notion was to > instead of setting the rlimit to zero (which makes it really hard to > restore afterwards, because you don't know what the restored value > would be, so you are dependent on user space doing it), we just never > reset set_dumpable() when we execve. > > So any suid exec will do set_dumpable() to suid_dumpable, and exec'ing > something else does nothing at all - it stays non-dumpable (obviously > "non-dumpable" here depends on the actual value for "suid_dumpable" - > you can enable suid dump debugging manually). > > And instead, we say that operations like "setsid()" that start a new > session - *those* are the ones that enable core dumping again. Or > doing things like a "ulimit(RLIMIT_CORE)" (which clearly implies "I > want core-dumps"). > > Those will all very naturally make "login" and friends work correctly, > while keeping core-dumps disabled for some suid situation that doesn't > explicitly set up a new context. > > I think the basic problem with the traditional UNIX model of "suid > exec doesn't core dump" is that the "enter non-core-dump" is a nice > clear "your privileges changed". > > But then the "exit non-core-dump" thing is an exec that *doesn't* > change privileges. That's the odd and crazy part: you just disabled > core-dumps because there was a privilege level change, and then you > enable core-dumps again because there *wasn't* a privilege change - > even if you're still at those elevated privileges. > > Now, this is clearly not a Linux issue - we're just doing what others > have been doing too. But I think we should just admit that "what > others have been doing" is simply broken. > > And yes, some odd situation migth be broken by this kind of change, > but I think this kind of "the old model was broken" may simply require > that. I suspect we can find a solution that fixes all the regular > cases. > > Hmm? I think this is a pretty clever idea. At least it is better than resetting RLIMIT_CORE to 0. As it is all done within the kernel, there is no need to change any userspace code. We may need to add a flag bit in the task structure to indicate using the suid_dumpable setting so that it can be inherited across fork/exec. Thanks for the suggestion. Cheers, Longman
On Tue, Dec 21, 2021 at 02:27:47PM -0500, Waiman Long wrote: > On 12/21/21 13:19, Linus Torvalds wrote: > > On Tue, Dec 21, 2021 at 10:01 AM Waiman Long <longman@redhat.com> wrote: > > > Default RLIMIT_CORE to 0 will likely mitigate this vulnerability. > > > However, there are still some userspace impacts as existing behavior > > > will be modified. For instance, we may need to modify su to restore a > > > proper value for RLIMIT_CORE after successful authentication. > > We had a "clever" idea for this that I thought people were ok with. > > > > It's been some time since this came up, but iirc the notion was to > > instead of setting the rlimit to zero (which makes it really hard to > > restore afterwards, because you don't know what the restored value > > would be, so you are dependent on user space doing it), we just never > > reset set_dumpable() when we execve. > > > > So any suid exec will do set_dumpable() to suid_dumpable, and exec'ing > > something else does nothing at all - it stays non-dumpable (obviously > > "non-dumpable" here depends on the actual value for "suid_dumpable" - > > you can enable suid dump debugging manually). > > > > And instead, we say that operations like "setsid()" that start a new > > session - *those* are the ones that enable core dumping again. Or > > doing things like a "ulimit(RLIMIT_CORE)" (which clearly implies "I > > want core-dumps"). > > > > Those will all very naturally make "login" and friends work correctly, > > while keeping core-dumps disabled for some suid situation that doesn't > > explicitly set up a new context. > > > > I think the basic problem with the traditional UNIX model of "suid > > exec doesn't core dump" is that the "enter non-core-dump" is a nice > > clear "your privileges changed". > > > > But then the "exit non-core-dump" thing is an exec that *doesn't* > > change privileges. That's the odd and crazy part: you just disabled > > core-dumps because there was a privilege level change, and then you > > enable core-dumps again because there *wasn't* a privilege change - > > even if you're still at those elevated privileges. > > > > Now, this is clearly not a Linux issue - we're just doing what others > > have been doing too. But I think we should just admit that "what > > others have been doing" is simply broken. > > > > And yes, some odd situation migth be broken by this kind of change, > > but I think this kind of "the old model was broken" may simply require > > that. I suspect we can find a solution that fixes all the regular > > cases. > > > > Hmm? > > I think this is a pretty clever idea. At least it is better than resetting > RLIMIT_CORE to 0. Another problem that was raised when discussing RLIMIT_CORE to zero was the loss of the old value. > As it is all done within the kernel, there is no need to > change any userspace code. We may need to add a flag bit in the task > structure to indicate using the suid_dumpable setting so that it can be > inherited across fork/exec. Depending on what we change there can be some subtly visible changes. In one of my servers I explicitly re-enable dumpable before setsid() when a core dump is desired for debugging. But other deamons could do the exact opposite. If setsid() systematically restores suid_dumpable, a process that explicitly disables it before calling setsid() would see it come back. But if we have a special "suid_in_progress" flag to mask suid_dumpable and that's reset by setsid() and possibly prctl(PR_SET_DUMPABLE) then I think it could even cover that unlikely case. Just my two cents, Willy
On Tue, Dec 21, 2021 at 09:56:35PM +0100, Willy Tarreau wrote: > > As it is all done within the kernel, there is no need to > > change any userspace code. We may need to add a flag bit in the task > > structure to indicate using the suid_dumpable setting so that it can be > > inherited across fork/exec. > > Depending on what we change there can be some subtly visible changes. > In one of my servers I explicitly re-enable dumpable before setsid() > when a core dump is desired for debugging. But other deamons could do > the exact opposite. If setsid() systematically restores suid_dumpable, > a process that explicitly disables it before calling setsid() would > see it come back. But if we have a special "suid_in_progress" flag > to mask suid_dumpable and that's reset by setsid() and possibly > prctl(PR_SET_DUMPABLE) then I think it could even cover that unlikely > case. Would there be any interest in pursuing attempts like the untested patch below ? The intent is to set a new MMF_NOT_DUMPABLE on exec on setuid or setgid bit, but clear it on setrlimit(RLIMIT_CORE), prctl(SET_DUMPABLE), and setsid(). This flag makes get_dumpable() return SUID_DUMP_DISABLED when set. I think that in the spirit it could maintain the info that a suidexec happened and was not reset, without losing any tuning made by the application. I never feel at ease touching all this and I certainly did some mistakes but for now it's mostly to have a base to discuss around, so do not hesitate to suggest or criticize. Willy diff --git a/fs/exec.c b/fs/exec.c index a098c133d8d7..a80bfd835235 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1348,8 +1348,11 @@ int begin_new_exec(struct linux_binprm * bprm) */ if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP || !(uid_eq(current_euid(), current_uid()) && - gid_eq(current_egid(), current_gid()))) + gid_eq(current_egid(), current_gid()))) { + set_bit(MMF_NOT_DUMPABLE, &mm->flags); + set_dumpable(current->mm, suid_dumpable); + } else set_dumpable(current->mm, SUID_DUMP_USER); diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h index 4d9e3a656875..fd2109d036bc 100644 --- a/include/linux/sched/coredump.h +++ b/include/linux/sched/coredump.h @@ -14,23 +14,6 @@ #define MMF_DUMPABLE_BITS 2 #define MMF_DUMPABLE_MASK ((1 << MMF_DUMPABLE_BITS) - 1) -extern void set_dumpable(struct mm_struct *mm, int value); -/* - * This returns the actual value of the suid_dumpable flag. For things - * that are using this for checking for privilege transitions, it must - * test against SUID_DUMP_USER rather than treating it as a boolean - * value. - */ -static inline int __get_dumpable(unsigned long mm_flags) -{ - return mm_flags & MMF_DUMPABLE_MASK; -} - -static inline int get_dumpable(struct mm_struct *mm) -{ - return __get_dumpable(mm->flags); -} - /* coredump filter bits */ #define MMF_DUMP_ANON_PRIVATE 2 #define MMF_DUMP_ANON_SHARED 3 @@ -81,9 +64,29 @@ static inline int get_dumpable(struct mm_struct *mm) * lifecycle of this mm, just for simplicity. */ #define MMF_HAS_PINNED 28 /* FOLL_PIN has run, never cleared */ +#define MMF_NOT_DUMPABLE 29 /* dump disable by suidexec */ #define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP) #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\ MMF_DISABLE_THP_MASK) +extern void set_dumpable(struct mm_struct *mm, int value); +/* + * This returns the actual value of the suid_dumpable flag. For things + * that are using this for checking for privilege transitions, it must + * test against SUID_DUMP_USER rather than treating it as a boolean + * value. + */ +static inline int __get_dumpable(unsigned long mm_flags) +{ + if (mm_flag & MMF_NOT_DUMPABLE) + return SUID_DUMP_DISABLE; + return mm_flags & MMF_DUMPABLE_MASK; +} + +static inline int get_dumpable(struct mm_struct *mm) +{ + return __get_dumpable(mm->flags); +} + #endif /* _LINUX_SCHED_COREDUMP_H */ diff --git a/kernel/sys.c b/kernel/sys.c index 8fdac0d90504..a20002075496 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1215,6 +1215,13 @@ int ksys_setsid(void) out: write_unlock_irq(&tasklist_lock); if (err > 0) { + struct mm_struct *mm = get_task_mm(current); + if (mm) { + /* session leaders reset the not-dumpable protection */ + clear_bit(MMF_NOT_DUMPABLE, &mm->flags); + mmput(mm); + } + proc_sid_connector(group_leader); sched_autogroup_create_attach(group_leader); } @@ -1610,6 +1617,18 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource, new_rlim->rlim_cur != RLIM_INFINITY && IS_ENABLED(CONFIG_POSIX_TIMERS)) update_rlimit_cpu(tsk, new_rlim->rlim_cur); + + /* + * If an application wants to change core dump settings, it means + * it wants to decide on its dumpability so we reset MMF_NOT_DUMPABLE. + */ + if (resource == RLIMIT_CORE) { + struct mm_struct *mm = get_task_mm(tsk); + if (mm) { + clear_bit(MMF_NOT_DUMPABLE, &mm->flags); + mmput(mm); + } + } out: read_unlock(&tasklist_lock); return retval; @@ -2292,6 +2311,7 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, error = -EINVAL; break; } + clear_bit(MMF_NOT_DUMPABLE, &me->mm->flags); set_dumpable(me->mm, arg2); break;
Willy Tarreau <w@1wt.eu> writes: > On Tue, Dec 21, 2021 at 09:56:35PM +0100, Willy Tarreau wrote: >> > As it is all done within the kernel, there is no need to >> > change any userspace code. We may need to add a flag bit in the task >> > structure to indicate using the suid_dumpable setting so that it can be >> > inherited across fork/exec. >> >> Depending on what we change there can be some subtly visible changes. >> In one of my servers I explicitly re-enable dumpable before setsid() >> when a core dump is desired for debugging. But other deamons could do >> the exact opposite. If setsid() systematically restores suid_dumpable, >> a process that explicitly disables it before calling setsid() would >> see it come back. But if we have a special "suid_in_progress" flag >> to mask suid_dumpable and that's reset by setsid() and possibly >> prctl(PR_SET_DUMPABLE) then I think it could even cover that unlikely >> case. > > Would there be any interest in pursuing attempts like the untested patch > below ? The intent is to set a new MMF_NOT_DUMPABLE on exec on setuid or > setgid bit, but clear it on setrlimit(RLIMIT_CORE), prctl(SET_DUMPABLE), > and setsid(). This flag makes get_dumpable() return SUID_DUMP_DISABLED > when set. I think that in the spirit it could maintain the info that a > suidexec happened and was not reset, without losing any tuning made by > the application. I never feel at ease touching all this and I certainly > did some mistakes but for now it's mostly to have a base to discuss > around, so do not hesitate to suggest or criticize. Yes. This looks like a good place to start the conversation. We need to do something like you are doing to separate dumpability changes due to privilege gains during exec and dumpability changes due to privilege shuffling with setresuid. As long as we only impact processes descending from a binary that has gained privileges during exec (like this patch) I think we have a lot of latitude in how we make this happen. Basically we only need to test su and sudo and verify that whatever we do works reasonably well for them. On the one hand I believe of gaining privileges during exec while letting the caller control some aspect of our environment is a dangerous design flaw and I would love to remove gaining privileges during exec entirely. On the other hand we need to introduces as few regressions as possible and make gaining privileges during exec as safe as possible. I do agree that RLIMIT_CORE and prctl(SET_DUMPABLE) are good places to clear the flag. I don't know if setsid is the proper key to re-enabling dumpability. I ran a quick test and simply doing "su" and then running a shell as root does not change the session, nor does "su -" (which creates a login shell). Also "sudo -s" does not create a new session. So session creation does not happen naturally. Still setsid is part of the standard formula for starting a daemon, so I don't think system services that run as daemons will be affected. I don't think anything we do matters for systemd. As I understand it "systemctl start ..." causes pid 1 to fork and exec services, which will ensure the started processes are not descendants of the binary the gained privileges during exec. Eric
On Tue, Dec 21, 2021 at 05:35:29PM -0600, Eric W. Biederman wrote: > > Would there be any interest in pursuing attempts like the untested patch > > below ? The intent is to set a new MMF_NOT_DUMPABLE on exec on setuid or > > setgid bit, but clear it on setrlimit(RLIMIT_CORE), prctl(SET_DUMPABLE), > > and setsid(). This flag makes get_dumpable() return SUID_DUMP_DISABLED > > when set. I think that in the spirit it could maintain the info that a > > suidexec happened and was not reset, without losing any tuning made by > > the application. I never feel at ease touching all this and I certainly > > did some mistakes but for now it's mostly to have a base to discuss > > around, so do not hesitate to suggest or criticize. > > > Yes. This looks like a good place to start the conversation. OK thanks. > We need to do something like you are doing to separate dumpability > changes due to privilege gains during exec and dumpability changes due > to privilege shuffling with setresuid. > > As long as we only impact processes descending from a binary that has > gained privileges during exec (like this patch) I think we have a lot > of latitude in how we make this happen. Yes that's the idea. I think that fundamentally we ought to mark that a chain of processes are potentially unsafe for dumps until the application has shown that it could regain control of the code paths, and hence is expected to deal properly with errors that might appear so as not to dump a core anywhere with random permissions. > Basically we only need to > test su and sudo and verify that whatever we do works reasonably > well for them. > > On the one hand I believe of gaining privileges during exec while > letting the caller control some aspect of our environment is a dangerous > design flaw and I would love to remove gaining privileges during exec > entirely. You would like to postpone this ? It's not very clear to me how to do that nor if it could reliably address this shortcoming. > On the other hand we need to introduces as few regressions as possible > and make gaining privileges during exec as safe as possible. Yep I think so. Also code that is designed to run under setuid (like sudo) is usally well tested, and quite portable thanks to various OS-specific tweaks that we definitely don't want to break. > I do agree that RLIMIT_CORE and prctl(SET_DUMPABLE) are good places > to clear the flag. There are probably other ones, but ideally we ought to avoid stuff that could happen early in the dynamic linker. > I don't know if setsid is the proper key to re-enabling dumpability. It was a supposition emitted by Linus, which deserved being checked at least given that it's part of the usual sequence when starting a deamon. > I ran a quick test and simply doing "su" and then running a shell > as root does not change the session, nor does "su -" (which creates > a login shell). Also "sudo -s" does not create a new session. > > So session creation does not happen naturally. OK, so it will not help them. For them we could use setuid()/setreuid() and setresuid() as good indicators that the application has taken control of its fate. Sadly we cannot do that in set_user() because this one is not called when the uid doesn't change. > Still setsid is part of the standard formula for starting a daemon, > so I don't think system services that run as daemons will be affected. > > > I don't think anything we do matters for systemd. As I understand > it "systemctl start ..." causes pid 1 to fork and exec services, > which will ensure the started processes are not descendants of > the binary the gained privileges during exec. Good point, I hadn't thought about that, but I agree with you. Willy
Hi Eric,
I've experimented a bit with this and am not completely convinced we're
on the right track.
First, resetting the not-dumpable status on setuid()/setsid() and friends
means that sudo itself resets it, and as such that the executable that it
launches may very well crash on CPU limits for example, thus allowing a
sudoer to produce a root core in a random directory. That's the current
state of affairs after the first attached patch.
This made me think that since we want to protect the called program and
not sudo itself, we ought to verify that the called program performs the
action itself. I.e. only programs that are marked as dumpable may reset
their not-dumpable status on various actions. That's what the second patch
does.
Doing this worked a lot better and initially made me think it solved the
issue: a user becoming root via sudo could regularly dump cores, but
crashes during startup would not work. That was until I tested with
"sudo ping" and saw root cores again, because ping, like many setuid
enabled programs, takes care of its permissions via setuid(0).
So this makes me think that trying to infer a program's intents via
snooping a few syscalls isn't going very far. Earlier in this
conversation there were a few other proposals around just playing with
RLIMIT_CORE and PR_SET_DUMPABLE.
I tend to think that if we combine the principle above (only monitor
dumpable programs) with the only two actions that *really* act on the
ability to produce a core (rlimit and prctl) then it might actually
reasonably work, because then a program could explicitly want to enable
core dumps and be allowed to do that. That's what the last patch does
and I couldn't find a case where it doesn't work. I.e. switching from
a user to root via sudo allows me to dump a core from a shell as the
shell sets RLIMIT_CORE, but will not let a program such as "ping" dump
a core by default for example. I've put that in the last patch which
replaces the first two ones.
I'm still wondering if adding a 4th value to suid_dumpable wouldn't
solve all of this: the value would automatically change when setting
RLIMIT_CORE. It could just be a bit more confusing to configure.
Other (orthogonal) approaches could consist in forcefully resetting a
few limits on suidexec. At least memory limits and CPU limits could be
reset to default (unlimited) values since there doesn't seem to be any
valid reason for an unprivileged process to change them for a privileged
one. And the core dump limits could be set to zero for the same reason.
It could be decided that we'd never dump a core on user-addressable
signals (SIGQUIT, maybe others?) and that could be even cleaner than
the currently discussed solutions.
Please let me know what you think.
Thanks,
Willy
From 8ca54b3a0bba56dbfce8ccf96ba36a1d6c189e85 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w@1wt.eu>
Date: Sun, 26 Dec 2021 15:17:08 +0100
Subject: WIP: only remove non-dumpable in the non-suid program
this way sudo doesn't drop its own not-dumpable flag before executing.
---
fs/coredump.c | 2 ++
include/linux/sched/coredump.h | 4 ++--
kernel/sys.c | 15 ++++++++++-----
3 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c
index 3224dee44d30..5f0bfe2c00a6 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -609,6 +609,8 @@ void do_coredump(const kernel_siginfo_t *siginfo)
goto fail;
if (!__get_dumpable(cprm.mm_flags))
goto fail;
+ if (cprm.mm_flags & MMF_NOT_DUMPABLE_MASK)
+ goto fail;
cred = prepare_creds();
if (!cred)
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 302f31247c90..662508d139e1 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -80,8 +80,8 @@ extern void set_dumpable(struct mm_struct *mm, int value);
*/
static inline int __get_dumpable(unsigned long mm_flags)
{
- if (mm_flags & MMF_NOT_DUMPABLE_MASK)
- return SUID_DUMP_DISABLE;
+ //if (mm_flags & MMF_NOT_DUMPABLE_MASK)
+ // return SUID_DUMP_DISABLE;
return mm_flags & MMF_DUMPABLE_MASK;
}
diff --git a/kernel/sys.c b/kernel/sys.c
index f4405e004145..0ecdb4cc64e7 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -564,7 +564,8 @@ long __sys_setreuid(uid_t ruid, uid_t euid)
goto error;
/* attempt to change ID drops the not-dumpable protection */
- clear_bit(MMF_NOT_DUMPABLE, ¤t->mm->flags);
+ if (get_dumpable(current->mm))
+ clear_bit(MMF_NOT_DUMPABLE, ¤t->mm->flags);
return commit_creds(new);
@@ -629,7 +630,8 @@ long __sys_setuid(uid_t uid)
goto error;
/* attempt to change ID drops the not-dumpable protection */
- clear_bit(MMF_NOT_DUMPABLE, ¤t->mm->flags);
+ if (get_dumpable(current->mm))
+ clear_bit(MMF_NOT_DUMPABLE, ¤t->mm->flags);
return commit_creds(new);
@@ -711,7 +713,8 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
goto error;
/* attempt to change ID drops the not-dumpable protection */
- clear_bit(MMF_NOT_DUMPABLE, ¤t->mm->flags);
+ if (get_dumpable(current->mm))
+ clear_bit(MMF_NOT_DUMPABLE, ¤t->mm->flags);
return commit_creds(new);
@@ -1225,7 +1228,8 @@ int ksys_setsid(void)
write_unlock_irq(&tasklist_lock);
if (err > 0) {
/* session leaders reset the not-dumpable protection */
- clear_bit(MMF_NOT_DUMPABLE, ¤t->mm->flags);
+ if (get_dumpable(current->mm))
+ clear_bit(MMF_NOT_DUMPABLE, ¤t->mm->flags);
proc_sid_connector(group_leader);
sched_autogroup_create_attach(group_leader);
@@ -1627,7 +1631,8 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
* If an application wants to change its own core dump settings, it
* wants to decide on its dumpability so we reset MMF_NOT_DUMPABLE.
*/
- if (!retval && new_rlim && resource == RLIMIT_CORE && tsk == current)
+ if (!retval && new_rlim && resource == RLIMIT_CORE && tsk == current &&
+ get_dumpable(tsk->mm))
clear_bit(MMF_NOT_DUMPABLE, &tsk->mm->flags);
out:
read_unlock(&tasklist_lock);
diff --git a/fs/exec.c b/fs/exec.c index 537d92c41105..60e02e678fb6 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1344,9 +1344,7 @@ int begin_new_exec(struct linux_binprm * bprm) * is wrong, but userspace depends on it. This should be testing * bprm->secureexec instead. */ - if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP || - !(uid_eq(current_euid(), current_uid()) && - gid_eq(current_egid(), current_gid()))) + if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP || bprm->is_sugid) set_dumpable(current->mm, suid_dumpable); else set_dumpable(current->mm, SUID_DUMP_USER); @@ -1619,11 +1617,13 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) if (mode & S_ISUID) { bprm->per_clear |= PER_CLEAR_ON_SETID; bprm->cred->euid = uid; + bprm->is_sugid = 1; } if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) { bprm->per_clear |= PER_CLEAR_ON_SETID; bprm->cred->egid = gid; + bprm->is_sugid = 1; } } diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 049cf9421d83..6d9893c59085 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -41,7 +41,10 @@ struct linux_binprm { * Set when errors can no longer be returned to the * original userspace. */ - point_of_no_return:1; + point_of_no_return:1, + + /* Is a SUID or SGID binary? */ + is_sugid:1; #ifdef __alpha__ unsigned int taso:1; #endif
The begin_new_exec() function checks for SUID or SGID binaries by comparing effective uid and gid against real uid and gid and using the suid_dumpable sysctl parameter setting only if either one of them differs. In the special case that the uid and/or gid of the SUID/SGID binaries matches the id's of the user invoking it, the suid_dumpable is not used and SUID_DUMP_USER will be used instead. The documentation for the suid_dumpable sysctl parameter does not include that exception and so this will be an undocumented behavior. Eliminate this undocumented behavior by adding a flag in the linux_binprm structure to designate a SUID/SGID binary and use it for determining if the suid_dumpable setting should be applied or not. Signed-off-by: Waiman Long <longman@redhat.com> --- fs/exec.c | 6 +++--- include/linux/binfmts.h | 5 ++++- 2 files changed, 7 insertions(+), 4 deletions(-)