[v2,1/8] exec: introduce cred_guard_light

Message ID 1474663238-22134-2-git-send-email-jann@thejh.net (mailing list archive)
State New, archived

Commit Message

Jann Horn Sept. 23, 2016, 8:40 p.m. UTC
This is a new per-threadgroup lock that can often be taken instead of
cred_guard_mutex and has less deadlock potential. I'm doing this because
Oleg Nesterov mentioned the potential for deadlocks, in particular if a
debugged task is stuck in execve, trying to get rid of a ptrace-stopped
thread, and the debugger attempts to inspect procfs files of the debugged
task.

The binfmt handlers (in particular for elf_fdpic and flat) might still
call VFS read and mmap operations on the binary with the lock held, but
not open operations (as is the case with cred_guard_mutex).

An rwlock would be more appropriate here, but apparently those don't
have _killable variants of the locking functions?

This is a preparation patch for using proper locking in more places.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jann Horn <jann@thejh.net>
---
 fs/exec.c                 | 15 ++++++++++++++-
 include/linux/init_task.h |  1 +
 include/linux/sched.h     | 10 ++++++++++
 kernel/fork.c             |  1 +
 kernel/ptrace.c           | 10 ++++++++++
 5 files changed, 36 insertions(+), 1 deletion(-)
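The fs/exec.c hunks of this patch take cred_guard_light in flush_old_exec() and release it on one of three paths: the out_unlock label, the error path of do_execveat_common(), or install_exec_creds(). The following is a userspace toy model of that pairing, not kernel code. Every toy_* name is hypothetical, the call chain is flattened (in the kernel the binfmt handler sits between these functions), and it assumes that install_exec_creds() clears bprm->cred, which this patch does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct toy_lock { int held; };                    /* models cred_guard_light */
struct toy_bprm { bool has_mm; bool has_cred; };  /* models bprm->mm, bprm->cred */

enum toy_fail {
	TOY_OK,               /* exec succeeds */
	TOY_FAIL_BEFORE_LOCK, /* error before the lock is taken */
	TOY_FAIL_EXEC_MMAP,   /* exec_mmap() fails with the lock held */
	TOY_FAIL_AFTER_FLUSH  /* error after flush, before creds are installed */
};

/* Models flush_old_exec(): take the lock, drop it again if exec_mmap() fails. */
static int toy_flush_old_exec(struct toy_lock *l, struct toy_bprm *b,
			      enum toy_fail f)
{
	if (f == TOY_FAIL_BEFORE_LOCK)
		return -1;              /* "goto out": lock never taken */
	l->held++;
	if (f == TOY_FAIL_EXEC_MMAP) {
		l->held--;              /* "goto out_unlock" */
		return -1;
	}
	b->has_mm = false;              /* bprm->mm = NULL */
	return 0;
}

/* Models install_exec_creds(): assumed to clear bprm->cred, then unlock. */
static void toy_install_exec_creds(struct toy_lock *l, struct toy_bprm *b)
{
	b->has_cred = false;
	l->held--;
}

/* Models the error handling added to do_execveat_common(). */
static int toy_execve(struct toy_lock *l, enum toy_fail f)
{
	struct toy_bprm b = { .has_mm = true, .has_cred = true };

	if (toy_flush_old_exec(l, &b, f))
		goto out;
	if (f == TOY_FAIL_AFTER_FLUSH)
		goto out;
	toy_install_exec_creds(l, &b);
	return 0;
out:
	/* failure after flush, before creds were installed: still locked */
	if (!b.has_mm && b.has_cred)
		l->held--;
	return -1;
}
```

Calling toy_execve() with each toy_fail value and checking that held returns to zero is a quick way to see that the three unlock sites cover all modelled paths exactly once.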

Comments

Oleg Nesterov Sept. 30, 2016, 3:35 p.m. UTC | #1
On 09/23, Jann Horn wrote:
>
> This is a new per-threadgroup lock that can often be taken instead of
> cred_guard_mutex and has less deadlock potential.

Oh, please don't.

> I'm doing this because
> Oleg Nesterov mentioned the potential for deadlocks, in particular if a
> debugged task is stuck in execve, trying to get rid of a ptrace-stopped
> thread, and the debugger attempts to inspect procfs files of the debugged
> task.

Yes, but we need to fix this anyway. And I am not sure the new mutex can
actually help.

And I think that cred_guard_mutex is already over-used in fs/proc. Say,
I think lock_trace() must die, I simply can't understand why it is useful.

Suppose we modify, say, proc_pid_stack() to do

	save_stack_trace_tsk(task, &trace);
	if (!ptrace_may_access(task, ...))
		return -EPERM;

	for (i = 0; i < trace.nr_entries; i++)
		seq_printf(...);

	return 0;

is there any problem if it shows some trace before setuid exec does
install_exec_creds() ?

Oleg.
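
One way to read the ordering proposed above is "capture first, check afterwards, print only if the check passes". The following userspace sketch models that reading under loud assumptions: the toy_* types and functions are hypothetical, the permission rule is a simple uid comparison rather than the real ptrace_may_access() logic, and a callback stands in for a setuid exec that races between the two steps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FRAMES 4

/* Hypothetical stand-ins for task_struct and struct stack_trace. */
struct toy_task { int uid; unsigned long frames[TOY_MAX_FRAMES]; size_t nr; };
struct toy_trace { unsigned long entries[TOY_MAX_FRAMES]; size_t nr_entries; };

/* Copy the task's current frames, clamped to the buffer size. */
static void toy_save_stack(const struct toy_task *t, struct toy_trace *tr)
{
	size_t n = t->nr < TOY_MAX_FRAMES ? t->nr : TOY_MAX_FRAMES;

	tr->nr_entries = n;
	memcpy(tr->entries, t->frames, n * sizeof(t->frames[0]));
}

/* Toy rule (an assumption): root or the same uid may inspect the task. */
static int toy_may_access(int viewer_uid, const struct toy_task *t)
{
	return viewer_uid == 0 || viewer_uid == t->uid;
}

/* Stand-in for a setuid exec: new credentials and a new stack. */
static void toy_setuid_exec(struct toy_task *t)
{
	t->uid = 0;
	t->frames[0] = 0xdeadUL;
	t->nr = 1;
}

/*
 * Capture, then check, then copy out. The optional "between" hook runs
 * after the capture and before the check to simulate a racing exec.
 * Returns the number of frames written, or -1 if access is denied.
 */
static long toy_show_stack(int viewer_uid, struct toy_task *t,
			   void (*between)(struct toy_task *),
			   unsigned long *out, size_t out_len)
{
	struct toy_trace tr;
	size_t i;

	toy_save_stack(t, &tr);
	if (between)
		between(t);
	if (!toy_may_access(viewer_uid, t))
		return -1;
	for (i = 0; i < tr.nr_entries && i < out_len; i++)
		out[i] = tr.entries[i];
	return (long)i;
}
```

In this model a credential change that lands between the capture and the check makes the check fail, so the captured frames are dropped; a change that lands after the check cannot alter the frames that were already copied. The model says nothing about a trace captured while the exec is already in progress but before the credentials change, which is the case the question above asks about.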

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Sept. 30, 2016, 6:27 p.m. UTC | #2
Oleg Nesterov <oleg@redhat.com> writes:

> On 09/23, Jann Horn wrote:
>>
>> This is a new per-threadgroup lock that can often be taken instead of
>> cred_guard_mutex and has less deadlock potential.
>
> Oh, please don't.
>
>> I'm doing this because
>> Oleg Nesterov mentioned the potential for deadlocks, in particular if a
>> debugged task is stuck in execve, trying to get rid of a ptrace-stopped
>> thread, and the debugger attempts to inspect procfs files of the debugged
>> task.
>
> Yes, but we need to fix this anyway. And I am not sure the new mutex can
> actually help.
>
> And I think that cred_guard_mutex is already over-used in fs/proc. Say,
> I think lock_trace() must die, I simply can't understand why it is useful.
>
> Suppose we modify, say, proc_pid_stack() to do
>
> 	save_stack_trace_tsk(task, &trace);
> 	if (!ptrace_may_access(task, ...))
> 		return -EPERM;
>
> 	for (i = 0; i < trace.nr_entries; i++)
> 		seq_printf(...);
>
> 	return 0;
>
> is there any problem if it shows some trace before setuid exec does
> install_exec_creds() ?

You should make certain that the mm doesn't change in that picture,
perhaps like /proc/<pid>/mem does.  At which point exec (which changes
the mm) should not be an issue.

But we definitely should be able to check permissions on open (possibly
including grabbing resources) and then perform the work.

Eric
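
The "check permissions on open, grab resources, then perform the work" idea can be sketched in userspace as follows. This is a toy model only: the toy_* names are hypothetical, the permission rule is a plain uid comparison, and the reference count is a bare integer rather than anything resembling the kernel's mm lifetime rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct, task_struct and an open file. */
struct toy_mm { int refcount; int data; };
struct toy_task { int uid; struct toy_mm *mm; };
struct toy_handle { struct toy_mm *mm; };

/* Open: check permission once and pin the mm the task has right now. */
static int toy_open(int viewer_uid, struct toy_task *t, struct toy_handle *h)
{
	if (viewer_uid != 0 && viewer_uid != t->uid)
		return -1;
	h->mm = t->mm;
	h->mm->refcount++;
	return 0;
}

/* Read: no further check, works only on the mm pinned at open time. */
static int toy_read(const struct toy_handle *h)
{
	return h->mm->data;
}

/* Release: drop the pinned reference. */
static void toy_release(struct toy_handle *h)
{
	h->mm->refcount--;
	h->mm = NULL;
}

/* Stand-in for a setuid exec: the task switches mm and credentials. */
static void toy_setuid_exec(struct toy_task *t, struct toy_mm *new_mm)
{
	t->mm->refcount--;
	t->mm = new_mm;
	new_mm->refcount++;
	t->uid = 0;
}
```

Because the handle keeps pointing at the mm that was checked at open time, a later exec in this model changes what the task uses but not what the handle can reach, and a fresh toy_open() after the exec is checked against the new credentials.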

Oleg Nesterov Oct. 3, 2016, 4:02 p.m. UTC | #3
On 09/30, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > And I think that cred_guard_mutex is already over-used in fs/proc. Say,
> > I think lock_trace() must die, I simply can't understand why it is useful.
> >
> > Suppose we modify, say, proc_pid_stack() to do
> >
> > 	save_stack_trace_tsk(task, &trace);
> > 	if (!ptrace_may_access(task, ...))
> > 		return -EPERM;
> >
> > 	for (i = 0; i < trace.nr_entries; i++)
> > 		seq_printf(...);
> >
> > 	return 0;
> >
> > is there any problem if it shows some trace before setuid exec does
> > install_exec_creds() ?
>
> You should make certain that the mm doesn't change in that picture,
> perhaps like /proc/<pid>/mem does.

Why? We do not care I think, /proc/pid/stack has nothing to do with
task->mm.

OK, we can use __ptrace_may_access() and call save_stack_trace_tsk()
under task_lock(), but I don't understand why we should worry.

And again, do you see any security problem with the code above? Yes,
it is "racy" but I think it is fine to occasionally succeed when it
races with credentials change. I fail to understand why the current
code abuses cred_guard_mutex to close the race with setuid exec.

Oleg.

Jann Horn Oct. 30, 2016, 9:12 p.m. UTC | #4
On Fri, Sep 30, 2016 at 05:35:05PM +0200, Oleg Nesterov wrote:
> On 09/23, Jann Horn wrote:
> >
> > This is a new per-threadgroup lock that can often be taken instead of
> > cred_guard_mutex and has less deadlock potential.
> 
> Oh, please don't.
> 
> > I'm doing this because
> > Oleg Nesterov mentioned the potential for deadlocks, in particular if a
> > debugged task is stuck in execve, trying to get rid of a ptrace-stopped
> > thread, and the debugger attempts to inspect procfs files of the debugged
> > task.
> 
> Yes, but we need to fix this anyway. And I am not sure the new mutex can
> actually help.
> 
> And I think that cred_guard_mutex is already over-used in fs/proc. Say,
> I think lock_trace() must die, I simply can't understand why it is useful.

IMO it's just about reducing potential (really small) information leaks.
These leaks probably don't matter much in practice (they're racy and only
disclose minimal amounts of data), but in principle they do expose data.

But since you're opposed to it and I don't see a significant benefit in
having these checks, I'll just remove the lock_trace() stuff from my patch
for now.

> Suppose we modify, say, proc_pid_stack() to do
> 
> 	save_stack_trace_tsk(task, &trace);
> 	if (!ptrace_may_access(task, ...))
> 		return -EPERM;
> 
> 	for (i = 0; i < trace.nr_entries; i++)
> 		seq_printf(...);
> 
> 	return 0;
> 
> is there any problem if it shows some trace before setuid exec does
> install_exec_creds() ?
> 
> Oleg.
>
Patch

diff --git a/fs/exec.c b/fs/exec.c
index 6fcfb3f..84430ee 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1238,6 +1238,10 @@  int flush_old_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
+	retval = mutex_lock_killable(&current->signal->cred_guard_light);
+	if (retval)
+		goto out;
+
 	/*
 	 * Must be called _before_ exec_mmap() as bprm->mm is
 	 * not visibile until then. This also enables the update
@@ -1251,7 +1255,7 @@  int flush_old_exec(struct linux_binprm * bprm)
 	acct_arg_size(bprm, 0);
 	retval = exec_mmap(bprm->mm);
 	if (retval)
-		goto out;
+		goto out_unlock;
 
 	bprm->mm = NULL;		/* We're using it now */
 
@@ -1263,6 +1267,8 @@  int flush_old_exec(struct linux_binprm * bprm)
 
 	return 0;
 
+out_unlock:
+	mutex_unlock(&current->signal->cred_guard_light);
 out:
 	return retval;
 }
@@ -1386,6 +1392,7 @@  void install_exec_creds(struct linux_binprm *bprm)
 	 * credentials; any time after this it may be unlocked.
 	 */
 	security_bprm_committed_creds(bprm);
+	mutex_unlock(&current->signal->cred_guard_light);
 	mutex_unlock(&current->signal->cred_guard_mutex);
 }
 EXPORT_SYMBOL(install_exec_creds);
@@ -1753,6 +1760,12 @@  static int do_execveat_common(int fd, struct filename *filename,
 	return retval;
 
 out:
+	if (!bprm->mm && bprm->cred) {
+		/* failure after flush_old_exec(), but before
+		 * install_exec_creds()
+		 */
+		mutex_unlock(&current->signal->cred_guard_light);
+	}
 	if (bprm->mm) {
 		acct_arg_size(bprm, 0);
 		mmput(bprm->mm);
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index f8834f8..cd9faa0 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -58,6 +58,7 @@  extern struct fs_struct init_fs;
 	INIT_PREV_CPUTIME(sig)						\
 	.cred_guard_mutex =						\
 		 __MUTEX_INITIALIZER(sig.cred_guard_mutex),		\
+	.cred_guard_light = __MUTEX_INITIALIZER(sig.cred_guard_light)	\
 }
 
 extern struct nsproxy init_nsproxy;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 62c68e5..2a1df2f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -808,6 +808,16 @@  struct signal_struct {
 	struct mutex cred_guard_mutex;	/* guard against foreign influences on
 					 * credential calculations
 					 * (notably. ptrace) */
+	/*
+	 * Lightweight version of cred_guard_mutex; used to prevent race
+	 * conditions where a user can gain information about the post-execve
+	 * state of a task to which access should only be granted pre-execve.
+	 * Hold this mutex while performing remote task inspection associated
+	 * with a security check.
+	 * This mutex MUST NOT be used in cases where anything changes about
+	 * the security properties of a running execve().
+	 */
+	struct mutex cred_guard_light;
 };
 
 /*
diff --git a/kernel/fork.c b/kernel/fork.c
index beb3172..2d46f3a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1215,6 +1215,7 @@  static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 				   current->signal->is_child_subreaper;
 
 	mutex_init(&sig->cred_guard_mutex);
+	mutex_init(&sig->cred_guard_light);
 
 	return 0;
 }
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 1d3b766..b5120ec 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -283,6 +283,16 @@  ok:
 	return security_ptrace_access_check(task, mode);
 }
 
+/*
+ * NOTE: When you call this function, you need to ensure that the target task
+ * can't acquire (via setuid execve) credentials between the ptrace access
+ * check and the privileged access. The recommended way to do this is to hold
+ * one of task->signal->{cred_guard_mutex,cred_guard_light} while calling this
+ * function and performing the requested access.
+ *
+ * This function may only be used if access is requested in the name of
+ * current_cred().
+ */
 bool ptrace_may_access(struct task_struct *task, unsigned int mode)
 {
 	int err;