Message ID | 20230109153348.5625-2-gregory.price@memverge.com (mailing list archive) |
---|---|
State | New |
Series | Checkpoint Support for Syscall User Dispatch |
On Mon, Jan 09, 2023 at 10:33:46AM -0500, Gregory Price wrote:
> @@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs)
>  	struct syscall_user_dispatch *sd = &current->syscall_dispatch;
>  	char state;
>
> +	if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
> +			unlikely(current->ptrace & PT_SUSPEND_SYSCALL_USER_DISPATCH))
> +		return false;
> +
>  	if (likely(instruction_pointer(regs) - sd->offset < sd->len))
>  		return false;
>

So by making syscall_user_dispatch() return false, we'll make
syscall_trace_enter() continue to handle things, and supposedly you want
to land in ptrace_report_syscall_entry(), right?

> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 54482193e1ed..a6ad815bd4be 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -370,6 +370,11 @@ static int check_ptrace_options(unsigned long data)
>  	if (data & ~(unsigned long)PTRACE_O_MASK)
>  		return -EINVAL;
>
> +	if (unlikely(data & PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH)) {
> +		if (!IS_ENABLED(CONFIG_CHECKPOINT_RESTART))
> +			return -EINVAL;
> +	}

Should setting this then not also depend on having
SYSCALL_WORK_SYSCALL_TRACE set? Because without that, you get 'funny'
things.

On Wed, Jan 18, 2023 at 02:41:00PM -0500, Gregory Price wrote:
> ---------- Forwarded message ---------
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed, Jan 18, 2023 at 12:16 PM
> Subject: Re: [PATCH 1/3] ptrace,syscall_user_dispatch: Implement Syscall
> User Dispatch Suspension
> To: Gregory Price <gourry.memverge@gmail.com>
>
>
> On Mon, Jan 09, 2023 at 10:33:46AM -0500, Gregory Price wrote:
> > @@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs)
> >       struct syscall_user_dispatch *sd = &current->syscall_dispatch;
> >       char state;
> >
> > +     if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
> > +                     unlikely(current->ptrace &
> PT_SUSPEND_SYSCALL_USER_DISPATCH))
> > +             return false;
> > +
> >       if (likely(instruction_pointer(regs) - sd->offset < sd->len))
> >               return false;
> >
>
> So by making syscall_user_dispatch() return false, we'll make
> syscall_trace_enter() continue to handle things, and supposedly you want
> to land in ptrace_report_syscall_entry(), right?
>
> ... snip ...
>
> Should setting this then not also depend on having
> SYSCALL_WORK_SYSCALL_TRACE set? Because without that, you get 'funny'
> things.

Hm, this is an interesting question. My thoughts are that I want the
process to handle the syscall as-if syscall user dispatch was not
present at all, regardless of SYSCALL_TRACE.

This is because some software, like CRIU, actually injects syscalls to
run in the context of the software in an effort to collect resources.

So I actually *want* those 'funny' things to occur, because they're most
likely intentional. I don't necessarily want to intercept system calls
that subsequently occur (although i might).

So if this feature required SYSCALL_TRACE, you would no longer be able
to inject system calls ala CRIU.

That's also my understanding of the SECCOMP_SUSPEND feature as well,
it's intended specifically to allow *otherwise disallowed* syscalls to
be injected into the process and SECCOMP bypassed.

(in this case, SECCOMP_SUSPEND requires root for exactly this reason).

On Wed, Jan 18, 2023 at 02:49:31PM -0500, Gregory Price wrote:
> On Wed, Jan 18, 2023 at 02:41:00PM -0500, Gregory Price wrote:
> > ---------- Forwarded message ---------
> > From: Peter Zijlstra <peterz@infradead.org>
> > Date: Wed, Jan 18, 2023 at 12:16 PM
> > Subject: Re: [PATCH 1/3] ptrace,syscall_user_dispatch: Implement Syscall
> > User Dispatch Suspension
> > To: Gregory Price <gourry.memverge@gmail.com>
> >
> >
> > On Mon, Jan 09, 2023 at 10:33:46AM -0500, Gregory Price wrote:
> > > @@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs)
> > >       struct syscall_user_dispatch *sd = &current->syscall_dispatch;
> > >       char state;
> > >
> > > +     if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
> > > +                     unlikely(current->ptrace &
> > PT_SUSPEND_SYSCALL_USER_DISPATCH))
> > > +             return false;
> > > +
> > >       if (likely(instruction_pointer(regs) - sd->offset < sd->len))
> > >               return false;
> > >
> >
> > So by making syscall_user_dispatch() return false, we'll make
> > syscall_trace_enter() continue to handle things, and supposedly you want
> > to land in ptrace_report_syscall_entry(), right?
> >
> > ... snip ...
> >
> > Should setting this then not also depend on having
> > SYSCALL_WORK_SYSCALL_TRACE set? Because without that, you get 'funny'
> > things.
>
> Hm, this is an interesting question. My thoughts are that I want the
> process to handle the syscall as-if syscall user dispatch was not
> present at all, regardless of SYSCALL_TRACE.
>
> This is because some software, like CRIU, actually injects syscalls to
> run in the context of the software in an effort to collect resources.

Oh, right. I used to know that.

> So I actually *want* those 'funny' things to occur, because they're most
> likely intentional. I don't necessarily want to intercept system calls
> that subsequently occur (although i might).
>
> So if this feature required SYSCALL_TRACE, you would no longer be able
> to inject system calls ala CRIU.

Yeah, I suppose you're right.

It makes it a very sharp instrument, but I suppose you get what you asked for.

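For readers following the thread, the early-exit behaviour being discussed can be sketched in userspace. This is only a model under stated assumptions, not the kernel function: `struct sud_model`, its `suspended` field, and `sud_model_may_intercept()` are hypothetical names invented for illustration, and the selector-byte check that the real syscall_user_dispatch() performs afterwards is left out. It shows two things from the quoted hunk: the proposed suspend flag short-circuits before anything else, and the existing range test relies on unsigned wraparound so a single comparison covers both bounds of the exempt region.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the per-task dispatch state. */
struct sud_model {
	uintptr_t offset;   /* start of the region exempt from dispatch */
	uintptr_t len;      /* length of that region in bytes */
	bool suspended;     /* models PT_SUSPEND_SYSCALL_USER_DISPATCH being set */
};

/*
 * Returns false when the syscall would continue down the normal entry
 * path, true when the model would go on to consider intercepting it.
 * The selector check done by the real code is deliberately omitted.
 */
static bool sud_model_may_intercept(const struct sud_model *sd, uintptr_t ip)
{
	/* Proposed check: a tracer has suspended dispatch for this task. */
	if (sd->suspended)
		return false;

	/*
	 * Unsigned subtraction wraps to a very large value when ip < offset,
	 * so this one comparison is true only for offset <= ip < offset + len.
	 */
	if (ip - sd->offset < sd->len)
		return false;

	return true;
}
```

With a region of offset 0x1000 and length 0x100, an instruction pointer inside the region is never a candidate for interception, addresses on either side of it are, and setting `suspended` turns every case into a plain pass-through. That last case is the one an injected syscall depends on.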
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index eaaef3ffec22..461ae5c99d57 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -45,6 +45,8 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 
 #define PT_EXITKILL		(PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
 #define PT_SUSPEND_SECCOMP	(PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
+#define PT_SUSPEND_SYSCALL_USER_DISPATCH \
+	(PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH << PT_OPT_FLAG_SHIFT)
 
 extern long arch_ptrace(struct task_struct *child, long request,
 			unsigned long addr, unsigned long data);
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 195ae64a8c87..ba9e3f19a22c 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -146,9 +146,13 @@ struct ptrace_rseq_configuration {
 
 /* eventless options */
 #define PTRACE_O_EXITKILL		(1 << 20)
 #define PTRACE_O_SUSPEND_SECCOMP	(1 << 21)
+#define PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH	(1 << 22)
 
 #define PTRACE_O_MASK		(\
-	0x000000ff | PTRACE_O_EXITKILL | PTRACE_O_SUSPEND_SECCOMP)
+	0x000000ff | \
+	PTRACE_O_EXITKILL | \
+	PTRACE_O_SUSPEND_SECCOMP | \
+	PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH)
 
 #include <asm/ptrace.h>
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379adff6b..f097c06224c9 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -8,6 +8,7 @@
 #include <linux/uaccess.h>
 #include <linux/signal.h>
 #include <linux/elf.h>
+#include <linux/ptrace.h>
 
 #include <linux/sched/signal.h>
 #include <linux/sched/task_stack.h>
@@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs)
 	struct syscall_user_dispatch *sd = &current->syscall_dispatch;
 	char state;
 
+	if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
+			unlikely(current->ptrace & PT_SUSPEND_SYSCALL_USER_DISPATCH))
+		return false;
+
 	if (likely(instruction_pointer(regs) - sd->offset < sd->len))
 		return false;
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 54482193e1ed..a6ad815bd4be 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -370,6 +370,11 @@ static int check_ptrace_options(unsigned long data)
 	if (data & ~(unsigned long)PTRACE_O_MASK)
 		return -EINVAL;
 
+	if (unlikely(data & PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH)) {
+		if (!IS_ENABLED(CONFIG_CHECKPOINT_RESTART))
+			return -EINVAL;
+	}
+
 	if (unlikely(data & PTRACE_O_SUSPEND_SECCOMP)) {
 		if (!IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) ||
 		    !IS_ENABLED(CONFIG_SECCOMP))

Add PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH to the ptrace options, and
modify Syscall User Dispatch to suspend interception when it is enabled.

This is modeled after the SUSPEND_SECCOMP feature, which suspends
SECCOMP interposition. Without this, system calls that software like
CRIU injects into a process are intercepted by Syscall User Dispatch,
either causing a crash (due to blocked signals) or delivering those
signals to a ptracer (not the intended behavior).

Since Syscall User Dispatch is not a privileged feature, a check for
permissions is not required. However, attempting to set this option when
CONFIG_CHECKPOINT_RESTORE is not supported should be disallowed, as its
intended use is checkpoint/resume.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 include/linux/ptrace.h               | 2 ++
 include/uapi/linux/ptrace.h          | 6 +++++-
 kernel/entry/syscall_user_dispatch.c | 5 +++++
 kernel/ptrace.c                      | 5 +++++
 4 files changed, 17 insertions(+), 1 deletion(-)
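
The option validation described in the commit message can be modelled in userspace as a rough sketch. The `MODEL_O_*` macros copy the bit values from the uapi hunk, but the names are hypothetical and bit 22 is only what this patch proposes, not an established ptrace option. `model_check_options()` and its `checkpoint_restore` parameter are likewise invented stand-ins for check_ptrace_options() and the IS_ENABLED() test, and the extra requirements attached to the seccomp option are omitted. The sketch follows the commit message's stated intent of gating on CONFIG_CHECKPOINT_RESTORE; the kernel/ptrace.c hunk as posted spells the symbol CONFIG_CHECKPOINT_RESTART.

```c
#include <errno.h>
#include <stdbool.h>

/* Bit values copied from the uapi hunk; names are hypothetical stand-ins. */
#define MODEL_O_EXITKILL         (1UL << 20)
#define MODEL_O_SUSPEND_SECCOMP  (1UL << 21)
#define MODEL_O_SUSPEND_SUD      (1UL << 22)  /* proposed by this patch */

/* Low eight bits are the event options, followed by the eventless ones. */
#define MODEL_O_MASK (0x000000ffUL | MODEL_O_EXITKILL | \
		      MODEL_O_SUSPEND_SECCOMP | MODEL_O_SUSPEND_SUD)

/*
 * Userspace model of the option check. checkpoint_restore stands in for
 * IS_ENABLED(CONFIG_CHECKPOINT_RESTORE). Returns 0 or -EINVAL.
 */
static int model_check_options(unsigned long data, bool checkpoint_restore)
{
	/* Any bit outside the known mask is rejected outright. */
	if (data & ~MODEL_O_MASK)
		return -EINVAL;

	/* The suspend option is only accepted on checkpoint/restore builds. */
	if ((data & MODEL_O_SUSPEND_SUD) && !checkpoint_restore)
		return -EINVAL;

	return 0;
}
```

Under this model, requesting the suspend bit succeeds only when `checkpoint_restore` is true, an unknown bit such as `1UL << 23` is always rejected, and options that do not involve the new bit are unaffected by the build setting. No privilege check appears, matching the commit message's reasoning that Syscall User Dispatch is not a privileged feature.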