Message ID | 20190920083007.11475-2-christian.brauner@ubuntu.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | seccomp: continue syscall from notifier | expand |
On Fri, Sep 20, 2019 at 10:30:05AM +0200, Christian Brauner wrote: > + * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF. > + * For SECCOMP_RET_USER_NOTIF filters acting on the same syscall the uppermost > + * filter takes precedence. This means that the uppermost > + * SECCOMP_RET_USER_NOTIF filter can override any SECCOMP_IOCTL_NOTIF_SEND from > + * lower filters essentially allowing all syscalls to pass by using > + * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_USER_NOTIF can ^^^^^^^^^^^^^^ This is meant to read RET_TRACE, yes? > + * equally be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE. I rewrote this paragraph with that corrected and swapping some "upper/lower" to "most recently added" etc: + * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF + * or SECCOMP_RET_TRACE. For SECCOMP_RET_USER_NOTIF filters acting on the + * same syscall, the most recently added filter takes precedence. This means + * that the new SECCOMP_RET_USER_NOTIF filter can override any + * SECCOMP_IOCTL_NOTIF_SEND from earlier filters, essentially allowing all + * such filtered syscalls to be executed by sending the response + * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_TRACE can equally + * be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE. Ultimately, I think this caveat is fine. RET_USER_NOTIF and RET_TRACE are both used from the "process manager" use-case. The benefits of "continue" semantics here outweighs the RET_USER_NOTIF and RET_TRACE "bypass". If we end up in a situation where we need to deal with some kind of nesting where this is a problem in practice, we can revisit this. Applied to my for-next/seccomp. Thanks!
On Thu, Oct 10, 2019 at 02:45:38PM -0700, Kees Cook wrote: > On Fri, Sep 20, 2019 at 10:30:05AM +0200, Christian Brauner wrote: > > + * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF. > > + * For SECCOMP_RET_USER_NOTIF filters acting on the same syscall the uppermost > > + * filter takes precedence. This means that the uppermost > > + * SECCOMP_RET_USER_NOTIF filter can override any SECCOMP_IOCTL_NOTIF_SEND from > > + * lower filters essentially allowing all syscalls to pass by using > > + * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_USER_NOTIF can > ^^^^^^^^^^^^^^ > This is meant to read RET_TRACE, yes? Yes. :) > > > + * equally be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE. > > I rewrote this paragraph with that corrected and swapping some > "upper/lower" to "most recently added" etc: > > + * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF > + * or SECCOMP_RET_TRACE. For SECCOMP_RET_USER_NOTIF filters acting on the > + * same syscall, the most recently added filter takes precedence. This means > + * that the new SECCOMP_RET_USER_NOTIF filter can override any > + * SECCOMP_IOCTL_NOTIF_SEND from earlier filters, essentially allowing all > + * such filtered syscalls to be executed by sending the response > + * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_TRACE can equally > + * be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE. > > > Ultimately, I think this caveat is fine. RET_USER_NOTIF and RET_TRACE are > both used from the "process manager" use-case. The benefits of "continue" > semantics here outweighs the RET_USER_NOTIF and RET_TRACE "bypass". If > we end up in a situation where we need to deal with some kind of > nesting where this is a problem in practice, we can revisit this. > > Applied to my for-next/seccomp. Thanks! Thanks! Christian
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index 90734aa5aa36..61fbbb7c1ee9 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -76,6 +76,34 @@ struct seccomp_notif { struct seccomp_data data; }; +/* + * Valid flags for struct seccomp_notif_resp + * + * Note, the SECCOMP_USER_NOTIF_FLAG_CONTINUE flag must be used with caution! + * If set by the process supervising the syscalls of another process the + * syscall will continue. This is problematic because of an inherent TOCTOU. + * An attacker can exploit the time while the supervised process is waiting on + * a response from the supervising process to rewrite syscall arguments which + * are passed as pointers of the intercepted syscall. + * It should be absolutely clear that this means that the seccomp notifier + * _cannot_ be used to implement a security policy! It should only ever be used + * in scenarios where a more privileged process supervises the syscalls of a + * lesser privileged process to get around kernel-enforced security + * restrictions when the privileged process deems this safe. In other words, + * in order to continue a syscall the supervising process should be sure that + * another security mechanism or the kernel itself will sufficiently block + * syscalls if arguments are rewritten to something unsafe. + * + * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF. + * For SECCOMP_RET_USER_NOTIF filters acting on the same syscall the uppermost + * filter takes precedence. This means that the uppermost + * SECCOMP_RET_USER_NOTIF filter can override any SECCOMP_IOCTL_NOTIF_SEND from + * lower filters essentially allowing all syscalls to pass by using + * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_USER_NOTIF can + * equally be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE. + */ +#define SECCOMP_USER_NOTIF_FLAG_CONTINUE BIT(0) + struct seccomp_notif_resp { __u64 id; __s64 val; diff --git a/kernel/seccomp.c b/kernel/seccomp.c index dba52a7db5e8..12d2227e5786 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -75,6 +75,7 @@ struct seccomp_knotif { /* The return values, only valid when in SECCOMP_NOTIFY_REPLIED */ int error; long val; + u32 flags; /* Signals when this has entered SECCOMP_NOTIFY_REPLIED */ struct completion ready; @@ -732,11 +733,12 @@ static u64 seccomp_next_notify_id(struct seccomp_filter *filter) return filter->notif->next_id++; } -static void seccomp_do_user_notification(int this_syscall, - struct seccomp_filter *match, - const struct seccomp_data *sd) +static int seccomp_do_user_notification(int this_syscall, + struct seccomp_filter *match, + const struct seccomp_data *sd) { int err; + u32 flags = 0; long ret = 0; struct seccomp_knotif n = {}; @@ -764,6 +766,7 @@ static void seccomp_do_user_notification(int this_syscall, if (err == 0) { ret = n.val; err = n.error; + flags = n.flags; } /* @@ -780,8 +783,14 @@ static void seccomp_do_user_notification(int this_syscall, list_del(&n.list); out: mutex_unlock(&match->notify_lock); + + /* Userspace requests to continue the syscall. */ + if (flags & SECCOMP_USER_NOTIF_FLAG_CONTINUE) + return 0; + syscall_set_return_value(current, task_pt_regs(current), err, ret); + return -1; } static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, @@ -867,8 +876,10 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, return 0; case SECCOMP_RET_USER_NOTIF: - seccomp_do_user_notification(this_syscall, match, sd); - goto skip; + if (seccomp_do_user_notification(this_syscall, match, sd)) + goto skip; + + return 0; case SECCOMP_RET_LOG: seccomp_log(this_syscall, 0, action, true); @@ -1087,7 +1098,11 @@ static long seccomp_notify_send(struct seccomp_filter *filter, if (copy_from_user(&resp, buf, sizeof(resp))) return -EFAULT; - if (resp.flags) + if (resp.flags & ~SECCOMP_USER_NOTIF_FLAG_CONTINUE) + return -EINVAL; + + if ((resp.flags & SECCOMP_USER_NOTIF_FLAG_CONTINUE) && + (resp.error || resp.val)) return -EINVAL; ret = mutex_lock_interruptible(&filter->notify_lock); @@ -1116,6 +1131,7 @@ static long seccomp_notify_send(struct seccomp_filter *filter, knotif->state = SECCOMP_NOTIFY_REPLIED; knotif->error = resp.error; knotif->val = resp.val; + knotif->flags = resp.flags; complete(&knotif->ready); out: mutex_unlock(&filter->notify_lock);