diff mbox series

[PATCHv5] exec: Fix a deadlock in ptrace

Message ID AM6PR03MB5170285B336790D3450E2644E4E40@AM6PR03MB5170.eurprd03.prod.outlook.com (mailing list archive)
State New, archived
Headers show
Series [PATCHv5] exec: Fix a deadlock in ptrace | expand

Commit Message

Bernd Edlinger March 3, 2020, 1:02 p.m. UTC
This fixes a deadlock in the tracer when tracing a multi-threaded
application that calls execve while more than one thread are running.

I observed that when running strace on the gcc test suite, it always
blocks after a while, when expect calls execve, because other threads
have to be terminated.  They send ptrace events, but the strace is no
longer able to respond, since it is blocked in vm_access.

The deadlock is always happening when strace needs to access the
tracees process mmap, while another thread in the tracee starts to
execve a child process, but that cannot continue until the
PTRACE_EVENT_EXIT is handled and the WIFEXITED event is received:

strace          D    0 30614  30584 0x00000000
Call Trace:
__schedule+0x3ce/0x6e0
schedule+0x5c/0xd0
schedule_preempt_disabled+0x15/0x20
__mutex_lock.isra.13+0x1ec/0x520
__mutex_lock_killable_slowpath+0x13/0x20
mutex_lock_killable+0x28/0x30
mm_access+0x27/0xa0
process_vm_rw_core.isra.3+0xff/0x550
process_vm_rw+0xdd/0xf0
__x64_sys_process_vm_readv+0x31/0x40
do_syscall_64+0x64/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9

expect          D    0 31933  30876 0x80004003
Call Trace:
__schedule+0x3ce/0x6e0
schedule+0x5c/0xd0
flush_old_exec+0xc4/0x770
load_elf_binary+0x35a/0x16c0
search_binary_handler+0x97/0x1d0
__do_execve_file.isra.40+0x5d4/0x8a0
__x64_sys_execve+0x49/0x60
do_syscall_64+0x64/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9

The proposed solution is to take the cred_guard_mutex only
in a critical section at the beginning, and at the end of the
execve function, and let PTRACE_ATTACH fail with EAGAIN while
execve is not complete, but other functions like vm_access are
allowed to complete normally.

This changes the lifetime of the cred_guard_mutex lock to be:
	- during prepare_bprm_creds()
	- from flush_old_exec() through install_exec_creds()
Before, cred_guard_mutex was held from prepare_bprm_creds() through
install_exec_creds().

I also took the opportunity to improve the documentation
of prepare_creds, which is obviously out of sync.

Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
---
 Documentation/security/credentials.rst    | 19 +++++----
 fs/exec.c                                 | 41 ++++++++++++++++---
 include/linux/binfmts.h                   |  6 ++-
 include/linux/sched/signal.h              |  2 +
 kernel/cred.c                             |  2 +-
 kernel/ptrace.c                           |  4 ++
 kernel/seccomp.c                          |  3 ++
 mm/process_vm_access.c                    |  2 +-
 tools/testing/selftests/ptrace/Makefile   |  4 +-
 tools/testing/selftests/ptrace/vmaccess.c | 66 +++++++++++++++++++++++++++++++
 10 files changed, 130 insertions(+), 19 deletions(-)
 create mode 100644 tools/testing/selftests/ptrace/vmaccess.c

v2: adds a test case which passes when this patch is applied.
v3: fixes the issue without introducing a new mutex.
v4: fixes one comment and a formatting issue found by checkpatch.pl in the test case. 
v5: addresses review comments.

Comments

Eric W. Biederman March 3, 2020, 3:18 p.m. UTC | #1
Bernd Edlinger <bernd.edlinger@hotmail.de> writes:

> This fixes a deadlock in the tracer when tracing a multi-threaded
> application that calls execve while more than one thread are running.
>
> I observed that when running strace on the gcc test suite, it always
> blocks after a while, when expect calls execve, because other threads
> have to be terminated.  They send ptrace events, but the strace is no
> longer able to respond, since it is blocked in vm_access.
>
> The deadlock is always happening when strace needs to access the
> tracees process mmap, while another thread in the tracee starts to
> execve a child process, but that cannot continue until the
> PTRACE_EVENT_EXIT is handled and the WIFEXITED event is received:

A couple of things.

Why do we think it is safe to change the behavior exposed to userspace?
Not the deadlock but all of the times the current code would not
deadlock?

Especially given that this is a small window it might be hard for people
to track down and report so we need a strong argument that this won't
break existing userspace before we just change things.

Usually surveying all of the users of a system call that we can find
and checking to see if they might be affected by the change in behavior
is difficult enough that we usually opt for not being lazy and
preserving the behavior.

This patch is up to two changes in behavior now, that could potentially
affect a whole array of programs.  Adding linux-api so that this change
in behavior can be documented if/when this change goes through.

If you can split the documentation and test fixes out into separate
patches that would help reviewing this code, or please make it explicit
that the your are changing documentation about behavior that is changing
with this patch.

Eric

> diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c
> new file mode 100644
> index 0000000..6d8a048
> --- /dev/null
> +++ b/tools/testing/selftests/ptrace/vmaccess.c
> @@ -0,0 +1,66 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Copyright (c) 2020 Bernd Edlinger <bernd.edlinger@hotmail.de>
> + * All rights reserved.
> + *
> + * Check whether /proc/$pid/mem can be accessed without causing deadlocks
> + * when de_thread is blocked with ->cred_guard_mutex held.
> + */
> +
> +#include "../kselftest_harness.h"
> +#include <stdio.h>
> +#include <fcntl.h>
> +#include <pthread.h>
> +#include <signal.h>
> +#include <unistd.h>
> +#include <sys/ptrace.h>
> +
> +static void *thread(void *arg)
> +{
> +	ptrace(PTRACE_TRACEME, 0, 0L, 0L);
> +	return NULL;
> +}
> +
> +TEST(vmaccess)
> +{
> +	int f, pid = fork();
> +	char mm[64];
> +
> +	if (!pid) {
> +		pthread_t pt;
> +
> +		pthread_create(&pt, NULL, thread, NULL);
> +		pthread_join(pt, NULL);
> +		execlp("true", "true", NULL);
> +	}
> +
> +	sleep(1);
> +	sprintf(mm, "/proc/%d/mem", pid);
> +	f = open(mm, O_RDONLY);
> +	ASSERT_LE(0, f);
> +	close(f);
> +	f = kill(pid, SIGCONT);
> +	ASSERT_EQ(0, f);
> +}
> +
> +TEST(attach)
> +{
> +	int f, pid = fork();
> +
> +	if (!pid) {
> +		pthread_t pt;
> +
> +		pthread_create(&pt, NULL, thread, NULL);
> +		pthread_join(pt, NULL);
> +		execlp("true", "true", NULL);
> +	}
> +
> +	sleep(1);
> +	f = ptrace(PTRACE_ATTACH, pid, 0L, 0L);

To be meaningful this code needs to learn to loop when
ptrace returns -EAGAIN.

Because that is pretty much what any self respecting user space
process will do.

At which point I am not certain we can say that the behavior has
sufficiently improved not to be a deadlock.

> +	ASSERT_EQ(EAGAIN, errno);
> +	ASSERT_EQ(f, -1);
> +	f = kill(pid, SIGCONT);
> +	ASSERT_EQ(0, f);
> +}
> +
> +TEST_HARNESS_MAIN

Eric
Bernd Edlinger March 3, 2020, 4:48 p.m. UTC | #2
On 3/3/20 4:18 PM, Eric W. Biederman wrote:
> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
> 
>> This fixes a deadlock in the tracer when tracing a multi-threaded
>> application that calls execve while more than one thread are running.
>>
>> I observed that when running strace on the gcc test suite, it always
>> blocks after a while, when expect calls execve, because other threads
>> have to be terminated.  They send ptrace events, but the strace is no
>> longer able to respond, since it is blocked in vm_access.
>>
>> The deadlock is always happening when strace needs to access the
>> tracees process mmap, while another thread in the tracee starts to
>> execve a child process, but that cannot continue until the
>> PTRACE_EVENT_EXIT is handled and the WIFEXITED event is received:
> 
> A couple of things.
> 
> Why do we think it is safe to change the behavior exposed to userspace?
> Not the deadlock but all of the times the current code would not
> deadlock?
> 
> Especially given that this is a small window it might be hard for people
> to track down and report so we need a strong argument that this won't
> break existing userspace before we just change things.
> 

Hmm, I tend to agree.

> Usually surveying all of the users of a system call that we can find
> and checking to see if they might be affected by the change in behavior
> is difficult enough that we usually opt for not being lazy and
> preserving the behavior.
> 
> This patch is up to two changes in behavior now, that could potentially
> affect a whole array of programs.  Adding linux-api so that this change
> in behavior can be documented if/when this change goes through.
> 

One is PTRACE_ACCESS possibly returning EAGAIN, yes.

We could try to restrict that behavior change to when any
thread is ptraced when execve starts, can't be too complicated.


But the other is only SYS_seccomp returning EAGAIN, when a different
thread of the current process is calling execve at the same time.

I would consider it completely impossible to have any user-visual effect,
since de_thread is just terminating all threads, including the thread
where the -EAGAIN was returned, so we will never know what happened.


> If you can split the documentation and test fixes out into separate
> patches that would help reviewing this code, or please make it explicit
> that the your are changing documentation about behavior that is changing
> with this patch.
> 

I am not sure if I have touched the right user documentation.

I only saw a document referring to a non-existent "current->cred_replace_mutex"
I haven't digged the git history, but that must be pre-historic IMHO.
It appears to me that is some developer documentation, but it's nevertheless
worth to keep up to date when the code changes.

So where would I add the possibility for PTRACE_ATTACH to return -EAGAIN ?


Bernd.

> Eric
> 
>> diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c
>> new file mode 100644
>> index 0000000..6d8a048
>> --- /dev/null
>> +++ b/tools/testing/selftests/ptrace/vmaccess.c
>> @@ -0,0 +1,66 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +/*
>> + * Copyright (c) 2020 Bernd Edlinger <bernd.edlinger@hotmail.de>
>> + * All rights reserved.
>> + *
>> + * Check whether /proc/$pid/mem can be accessed without causing deadlocks
>> + * when de_thread is blocked with ->cred_guard_mutex held.
>> + */
>> +
>> +#include "../kselftest_harness.h"
>> +#include <stdio.h>
>> +#include <fcntl.h>
>> +#include <pthread.h>
>> +#include <signal.h>
>> +#include <unistd.h>
>> +#include <sys/ptrace.h>
>> +
>> +static void *thread(void *arg)
>> +{
>> +	ptrace(PTRACE_TRACEME, 0, 0L, 0L);
>> +	return NULL;
>> +}
>> +
>> +TEST(vmaccess)
>> +{
>> +	int f, pid = fork();
>> +	char mm[64];
>> +
>> +	if (!pid) {
>> +		pthread_t pt;
>> +
>> +		pthread_create(&pt, NULL, thread, NULL);
>> +		pthread_join(pt, NULL);
>> +		execlp("true", "true", NULL);
>> +	}
>> +
>> +	sleep(1);
>> +	sprintf(mm, "/proc/%d/mem", pid);
>> +	f = open(mm, O_RDONLY);
>> +	ASSERT_LE(0, f);
>> +	close(f);
>> +	f = kill(pid, SIGCONT);
>> +	ASSERT_EQ(0, f);
>> +}
>> +
>> +TEST(attach)
>> +{
>> +	int f, pid = fork();
>> +
>> +	if (!pid) {
>> +		pthread_t pt;
>> +
>> +		pthread_create(&pt, NULL, thread, NULL);
>> +		pthread_join(pt, NULL);
>> +		execlp("true", "true", NULL);
>> +	}
>> +
>> +	sleep(1);
>> +	f = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
> 
> To be meaningful this code needs to learn to loop when
> ptrace returns -EAGAIN.
> 
> Because that is pretty much what any self respecting user space
> process will do.
> 
> At which point I am not certain we can say that the behavior has
> sufficiently improved not to be a deadlock.
> 

In this special dead-duck test it won't work, but it would
still be lots more transparent what is going on, since previously
you had two zombie process, and no way to even output debug
messages, which also all self respecting user space processes
should do.

So yes, I can at least give a good example and re-try it several
times together with wait4 which a tracer is expected to do.

Bernd.

>> +	ASSERT_EQ(EAGAIN, errno);
>> +	ASSERT_EQ(f, -1);
>> +	f = kill(pid, SIGCONT);
>> +	ASSERT_EQ(0, f);
>> +}
>> +
>> +TEST_HARNESS_MAIN
> 
> Eric
>
Christian Brauner March 3, 2020, 4:50 p.m. UTC | #3
On Tue, Mar 03, 2020 at 09:18:44AM -0600, Eric W. Biederman wrote:
> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
> 
> > This fixes a deadlock in the tracer when tracing a multi-threaded
> > application that calls execve while more than one thread are running.
> >
> > I observed that when running strace on the gcc test suite, it always
> > blocks after a while, when expect calls execve, because other threads
> > have to be terminated.  They send ptrace events, but the strace is no
> > longer able to respond, since it is blocked in vm_access.
> >
> > The deadlock is always happening when strace needs to access the
> > tracees process mmap, while another thread in the tracee starts to
> > execve a child process, but that cannot continue until the
> > PTRACE_EVENT_EXIT is handled and the WIFEXITED event is received:
> 
> A couple of things.
> 
> Why do we think it is safe to change the behavior exposed to userspace?
> Not the deadlock but all of the times the current code would not
> deadlock?
> 
> Especially given that this is a small window it might be hard for people
> to track down and report so we need a strong argument that this won't
> break existing userspace before we just change things.
> 
> Usually surveying all of the users of a system call that we can find
> and checking to see if they might be affected by the change in behavior
> is difficult enough that we usually opt for not being lazy and
> preserving the behavior.
> 
> This patch is up to two changes in behavior now, that could potentially
> affect a whole array of programs.  Adding linux-api so that this change
> in behavior can be documented if/when this change goes through.
> 
> If you can split the documentation and test fixes out into separate
> patches that would help reviewing this code, or please make it explicit
> that the your are changing documentation about behavior that is changing
> with this patch.

Agreed. I think it'd be good to do it in three patches:
1. unrelated documentation update
2. fix + documentation changes specific to the fix
3. test(s)

Christian
Christian Brauner March 3, 2020, 5:01 p.m. UTC | #4
On Tue, Mar 03, 2020 at 04:48:01PM +0000, Bernd Edlinger wrote:
> On 3/3/20 4:18 PM, Eric W. Biederman wrote:
> > Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
> > 
> >> This fixes a deadlock in the tracer when tracing a multi-threaded
> >> application that calls execve while more than one thread are running.
> >>
> >> I observed that when running strace on the gcc test suite, it always
> >> blocks after a while, when expect calls execve, because other threads
> >> have to be terminated.  They send ptrace events, but the strace is no
> >> longer able to respond, since it is blocked in vm_access.
> >>
> >> The deadlock is always happening when strace needs to access the
> >> tracees process mmap, while another thread in the tracee starts to
> >> execve a child process, but that cannot continue until the
> >> PTRACE_EVENT_EXIT is handled and the WIFEXITED event is received:
> > 
> > A couple of things.
> > 
> > Why do we think it is safe to change the behavior exposed to userspace?
> > Not the deadlock but all of the times the current code would not
> > deadlock?
> > 
> > Especially given that this is a small window it might be hard for people
> > to track down and report so we need a strong argument that this won't
> > break existing userspace before we just change things.
> > 
> 
> Hmm, I tend to agree.
> 
> > Usually surveying all of the users of a system call that we can find
> > and checking to see if they might be affected by the change in behavior
> > is difficult enough that we usually opt for not being lazy and
> > preserving the behavior.
> > 
> > This patch is up to two changes in behavior now, that could potentially
> > affect a whole array of programs.  Adding linux-api so that this change
> > in behavior can be documented if/when this change goes through.
> > 
> 
> One is PTRACE_ACCESS possibly returning EAGAIN, yes.
> 
> We could try to restrict that behavior change to when any
> thread is ptraced when execve starts, can't be too complicated.
> 
> 
> But the other is only SYS_seccomp returning EAGAIN, when a different
> thread of the current process is calling execve at the same time.
> 
> I would consider it completely impossible to have any user-visual effect,
> since de_thread is just terminating all threads, including the thread
> where the -EAGAIN was returned, so we will never know what happened.

I think if we risk a user-space facing change we should try the simple
thing first before making the fix more convoluted? But it's a tough
call...

> 
> 
> > If you can split the documentation and test fixes out into separate
> > patches that would help reviewing this code, or please make it explicit
> > that the your are changing documentation about behavior that is changing
> > with this patch.
> > 
> 
> I am not sure if I have touched the right user documentation.
> 
> I only saw a document referring to a non-existent "current->cred_replace_mutex"
> I haven't digged the git history, but that must be pre-historic IMHO.
> It appears to me that is some developer documentation, but it's nevertheless
> worth to keep up to date when the code changes.
> 
> So where would I add the possibility for PTRACE_ATTACH to return -EAGAIN ?

Since that would be a potentially user-visible change it would make the
most sense to add it to man ptrace(2) if/when we land this change.

For developers, placing a comment in kernel/ptrace.c:ptrace_attach()
would make the most sense? We already have something about exec
protection in there.

Christian
Christian Brauner March 3, 2020, 5:20 p.m. UTC | #5
On Tue, Mar 03, 2020 at 06:01:11PM +0100, Christian Brauner wrote:
> On Tue, Mar 03, 2020 at 04:48:01PM +0000, Bernd Edlinger wrote:
> > On 3/3/20 4:18 PM, Eric W. Biederman wrote:
> > > Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
> > > 
> > >> This fixes a deadlock in the tracer when tracing a multi-threaded
> > >> application that calls execve while more than one thread are running.
> > >>
> > >> I observed that when running strace on the gcc test suite, it always
> > >> blocks after a while, when expect calls execve, because other threads
> > >> have to be terminated.  They send ptrace events, but the strace is no
> > >> longer able to respond, since it is blocked in vm_access.
> > >>
> > >> The deadlock is always happening when strace needs to access the
> > >> tracees process mmap, while another thread in the tracee starts to
> > >> execve a child process, but that cannot continue until the
> > >> PTRACE_EVENT_EXIT is handled and the WIFEXITED event is received:
> > > 
> > > A couple of things.
> > > 
> > > Why do we think it is safe to change the behavior exposed to userspace?
> > > Not the deadlock but all of the times the current code would not
> > > deadlock?
> > > 
> > > Especially given that this is a small window it might be hard for people
> > > to track down and report so we need a strong argument that this won't
> > > break existing userspace before we just change things.
> > > 
> > 
> > Hmm, I tend to agree.
> > 
> > > Usually surveying all of the users of a system call that we can find
> > > and checking to see if they might be affected by the change in behavior
> > > is difficult enough that we usually opt for not being lazy and
> > > preserving the behavior.
> > > 
> > > This patch is up to two changes in behavior now, that could potentially
> > > affect a whole array of programs.  Adding linux-api so that this change
> > > in behavior can be documented if/when this change goes through.
> > > 
> > 
> > One is PTRACE_ACCESS possibly returning EAGAIN, yes.
> > 
> > We could try to restrict that behavior change to when any
> > thread is ptraced when execve starts, can't be too complicated.
> > 
> > 
> > But the other is only SYS_seccomp returning EAGAIN, when a different
> > thread of the current process is calling execve at the same time.
> > 
> > I would consider it completely impossible to have any user-visual effect,
> > since de_thread is just terminating all threads, including the thread
> > where the -EAGAIN was returned, so we will never know what happened.
> 
> I think if we risk a user-space facing change we should try the simple
> thing first before making the fix more convoluted? But it's a tough
> call...

Actually, to get a _rough_ estimate of the possible impact I would
recommend you run the criu test suite (and possible the strace
test-suite) on a kernel with and without your fix. That's what I tend to
do when I touch code I fear will have impact on APIs that very deeply
touch core kernel. Criu's test-suite makes heavy use of ptrace and
usually runs into a bunch of interesting (exec) races too, and does have
tests for handling zombies processes etc. pp.

Should be relatively simple: create a vm and then criu build-dependencies,
git clone criu; cd criu; make; cd test; ./zdtm.py run -a --keep-going
If your system doesn't support Selinux properly, you need to disable it
when running the tests and you also need to make sure that you're using
python3 or change the shebang in zdtm.py to python3.

Just a recommendation.

Christian
Eric W. Biederman March 3, 2020, 8:08 p.m. UTC | #6
Bernd Edlinger <bernd.edlinger@hotmail.de> writes:

> On 3/3/20 4:18 PM, Eric W. Biederman wrote:
>> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>>> diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c
>>> new file mode 100644
>>> index 0000000..6d8a048
>>> --- /dev/null
>>> +++ b/tools/testing/selftests/ptrace/vmaccess.c
>>> @@ -0,0 +1,66 @@
>>> +// SPDX-License-Identifier: GPL-2.0+
>>> +/*
>>> + * Copyright (c) 2020 Bernd Edlinger <bernd.edlinger@hotmail.de>
>>> + * All rights reserved.
>>> + *
>>> + * Check whether /proc/$pid/mem can be accessed without causing deadlocks
>>> + * when de_thread is blocked with ->cred_guard_mutex held.
>>> + */
>>> +
>>> +#include "../kselftest_harness.h"
>>> +#include <stdio.h>
>>> +#include <fcntl.h>
>>> +#include <pthread.h>
>>> +#include <signal.h>
>>> +#include <unistd.h>
>>> +#include <sys/ptrace.h>
>>> +
>>> +static void *thread(void *arg)
>>> +{
>>> +	ptrace(PTRACE_TRACEME, 0, 0L, 0L);
>>> +	return NULL;
>>> +}
>>> +
>>> +TEST(vmaccess)
>>> +{
>>> +	int f, pid = fork();
>>> +	char mm[64];
>>> +
>>> +	if (!pid) {
>>> +		pthread_t pt;
>>> +
>>> +		pthread_create(&pt, NULL, thread, NULL);
>>> +		pthread_join(pt, NULL);
>>> +		execlp("true", "true", NULL);
>>> +	}
>>> +
>>> +	sleep(1);
>>> +	sprintf(mm, "/proc/%d/mem", pid);
>>> +	f = open(mm, O_RDONLY);
>>> +	ASSERT_LE(0, f);
>>> +	close(f);
>>> +	f = kill(pid, SIGCONT);
>>> +	ASSERT_EQ(0, f);
>>> +}
>>> +
>>> +TEST(attach)
>>> +{
>>> +	int f, pid = fork();
>>> +
>>> +	if (!pid) {
>>> +		pthread_t pt;
>>> +
>>> +		pthread_create(&pt, NULL, thread, NULL);
>>> +		pthread_join(pt, NULL);
>>> +		execlp("true", "true", NULL);
>>> +	}
>>> +
>>> +	sleep(1);
>>> +	f = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
>> 
>> To be meaningful this code needs to learn to loop when
>> ptrace returns -EAGAIN.
>> 
>> Because that is pretty much what any self respecting user space
>> process will do.
>> 
>> At which point I am not certain we can say that the behavior has
>> sufficiently improved not to be a deadlock.
>> 
>
> In this special dead-duck test it won't work, but it would
> still be lots more transparent what is going on, since previously
> you had two zombie process, and no way to even output debug
> messages, which also all self respecting user space processes
> should do.

Agreed it is more transparent.  So if you are going to deadlock
it is better.

My previous proposal (which I admit is more work to implement) would
actually allow succeeding in this case and so it would not be subject to
a dead lock (even via -EGAIN) at this point.

> So yes, I can at least give a good example and re-try it several
> times together with wait4 which a tracer is expected to do.

Thank you,

Eric
Bernd Edlinger March 4, 2020, 2:37 p.m. UTC | #7
On 3/3/20 9:08 PM, Eric W. Biederman wrote:
> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
> 
>> On 3/3/20 4:18 PM, Eric W. Biederman wrote:
>>> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>>>> diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c
>>>> new file mode 100644
>>>> index 0000000..6d8a048
>>>> --- /dev/null
>>>> +++ b/tools/testing/selftests/ptrace/vmaccess.c
>>>> @@ -0,0 +1,66 @@
>>>> +// SPDX-License-Identifier: GPL-2.0+
>>>> +/*
>>>> + * Copyright (c) 2020 Bernd Edlinger <bernd.edlinger@hotmail.de>
>>>> + * All rights reserved.
>>>> + *
>>>> + * Check whether /proc/$pid/mem can be accessed without causing deadlocks
>>>> + * when de_thread is blocked with ->cred_guard_mutex held.
>>>> + */
>>>> +
>>>> +#include "../kselftest_harness.h"
>>>> +#include <stdio.h>
>>>> +#include <fcntl.h>
>>>> +#include <pthread.h>
>>>> +#include <signal.h>
>>>> +#include <unistd.h>
>>>> +#include <sys/ptrace.h>
>>>> +
>>>> +static void *thread(void *arg)
>>>> +{
>>>> +	ptrace(PTRACE_TRACEME, 0, 0L, 0L);
>>>> +	return NULL;
>>>> +}
>>>> +
>>>> +TEST(vmaccess)
>>>> +{
>>>> +	int f, pid = fork();
>>>> +	char mm[64];
>>>> +
>>>> +	if (!pid) {
>>>> +		pthread_t pt;
>>>> +
>>>> +		pthread_create(&pt, NULL, thread, NULL);
>>>> +		pthread_join(pt, NULL);
>>>> +		execlp("true", "true", NULL);
>>>> +	}
>>>> +
>>>> +	sleep(1);
>>>> +	sprintf(mm, "/proc/%d/mem", pid);
>>>> +	f = open(mm, O_RDONLY);
>>>> +	ASSERT_LE(0, f);
>>>> +	close(f);
>>>> +	f = kill(pid, SIGCONT);
>>>> +	ASSERT_EQ(0, f);
>>>> +}
>>>> +
>>>> +TEST(attach)
>>>> +{
>>>> +	int f, pid = fork();
>>>> +
>>>> +	if (!pid) {
>>>> +		pthread_t pt;
>>>> +
>>>> +		pthread_create(&pt, NULL, thread, NULL);
>>>> +		pthread_join(pt, NULL);
>>>> +		execlp("true", "true", NULL);
>>>> +	}
>>>> +
>>>> +	sleep(1);
>>>> +	f = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
>>>
>>> To be meaningful this code needs to learn to loop when
>>> ptrace returns -EAGAIN.
>>>
>>> Because that is pretty much what any self respecting user space
>>> process will do.
>>>
>>> At which point I am not certain we can say that the behavior has
>>> sufficiently improved not to be a deadlock.
>>>
>>
>> In this special dead-duck test it won't work, but it would
>> still be lots more transparent what is going on, since previously
>> you had two zombie process, and no way to even output debug
>> messages, which also all self respecting user space processes
>> should do.
> 
> Agreed it is more transparent.  So if you are going to deadlock
> it is better.
> 
> My previous proposal (which I admit is more work to implement) would
> actually allow succeeding in this case and so it would not be subject to
> a dead lock (even via -EGAIN) at this point.
> 
>> So yes, I can at least give a good example and re-try it several
>> times together with wait4 which a tracer is expected to do.
> 
> Thank you,
> 
> Eric
> 

Okay, I think it can be done with minimal API changes,
but it needs two mutexes, one that guards the execve,
and one that guards only the credentials.

If no traced sibling thread exists, the mutexes are used this way:
lock(exec_guard_mutex)
cred_locked_in_execve = true;
de_thread()
lock(cred_guard_mutex)
unlock(cred_guard_mutex)
cred_locked_in_execve = false;
unlock(exec_guard_mutex)

so effectively no API change at all.

If a traced sibling thread exists, the mutexes are used differently:
lock(exec_guard_mutex)
cred_locked_in_execve = true;
unlock(exec_guard_mutex)
de_thread()
lock(cred_guard_mutex)
unlock(cred_guard_mutex)
lock(exec_guard_mutex)
cred_locked_in_execve = false;
unlock(exec_guard_mutex)

Only the case changes that would deadlock anyway.


Bernd.
Eric W. Biederman March 4, 2020, 4:33 p.m. UTC | #8
Bernd Edlinger <bernd.edlinger@hotmail.de> writes:

> On 3/3/20 9:08 PM, Eric W. Biederman wrote:
>> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>> 
>>> On 3/3/20 4:18 PM, Eric W. Biederman wrote:
>>>> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>>>>> diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c
>>>>> new file mode 100644
>>>>> index 0000000..6d8a048
>>>>> --- /dev/null
>>>>> +++ b/tools/testing/selftests/ptrace/vmaccess.c
>>>>> @@ -0,0 +1,66 @@
>>>>> +// SPDX-License-Identifier: GPL-2.0+
>>>>> +/*
>>>>> + * Copyright (c) 2020 Bernd Edlinger <bernd.edlinger@hotmail.de>
>>>>> + * All rights reserved.
>>>>> + *
>>>>> + * Check whether /proc/$pid/mem can be accessed without causing deadlocks
>>>>> + * when de_thread is blocked with ->cred_guard_mutex held.
>>>>> + */
>>>>> +
>>>>> +#include "../kselftest_harness.h"
>>>>> +#include <stdio.h>
>>>>> +#include <fcntl.h>
>>>>> +#include <pthread.h>
>>>>> +#include <signal.h>
>>>>> +#include <unistd.h>
>>>>> +#include <sys/ptrace.h>
>>>>> +
>>>>> +static void *thread(void *arg)
>>>>> +{
>>>>> +	ptrace(PTRACE_TRACEME, 0, 0L, 0L);
>>>>> +	return NULL;
>>>>> +}
>>>>> +
>>>>> +TEST(vmaccess)
>>>>> +{
>>>>> +	int f, pid = fork();
>>>>> +	char mm[64];
>>>>> +
>>>>> +	if (!pid) {
>>>>> +		pthread_t pt;
>>>>> +
>>>>> +		pthread_create(&pt, NULL, thread, NULL);
>>>>> +		pthread_join(pt, NULL);
>>>>> +		execlp("true", "true", NULL);
>>>>> +	}
>>>>> +
>>>>> +	sleep(1);
>>>>> +	sprintf(mm, "/proc/%d/mem", pid);
>>>>> +	f = open(mm, O_RDONLY);
>>>>> +	ASSERT_LE(0, f);
>>>>> +	close(f);
>>>>> +	f = kill(pid, SIGCONT);
>>>>> +	ASSERT_EQ(0, f);
>>>>> +}
>>>>> +
>>>>> +TEST(attach)
>>>>> +{
>>>>> +	int f, pid = fork();
>>>>> +
>>>>> +	if (!pid) {
>>>>> +		pthread_t pt;
>>>>> +
>>>>> +		pthread_create(&pt, NULL, thread, NULL);
>>>>> +		pthread_join(pt, NULL);
>>>>> +		execlp("true", "true", NULL);
>>>>> +	}
>>>>> +
>>>>> +	sleep(1);
>>>>> +	f = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
>>>>
>>>> To be meaningful this code needs to learn to loop when
>>>> ptrace returns -EAGAIN.
>>>>
>>>> Because that is pretty much what any self respecting user space
>>>> process will do.
>>>>
>>>> At which point I am not certain we can say that the behavior has
>>>> sufficiently improved not to be a deadlock.
>>>>
>>>
>>> In this special dead-duck test it won't work, but it would
>>> still be lots more transparent what is going on, since previously
>>> you had two zombie process, and no way to even output debug
>>> messages, which also all self respecting user space processes
>>> should do.
>> 
>> Agreed it is more transparent.  So if you are going to deadlock
>> it is better.
>> 
>> My previous proposal (which I admit is more work to implement) would
>> actually allow succeeding in this case and so it would not be subject to
>> a dead lock (even via -EGAIN) at this point.
>> 
>>> So yes, I can at least give a good example and re-try it several
>>> times together with wait4 which a tracer is expected to do.
>> 
>> Thank you,
>> 
>> Eric
>> 
>
> Okay, I think it can be done with minimal API changes,
> but it needs two mutexes, one that guards the execve,
> and one that guards only the credentials.
>
> If no traced sibling thread exists, the mutexes are used this way:
> lock(exec_guard_mutex)
> cred_locked_in_execve = true;
> de_thread()
> lock(cred_guard_mutex)
> unlock(cred_guard_mutex)
> cred_locked_in_execve = false;
> unlock(exec_guard_mutex)
>
> so effectively no API change at all.
>
> If a traced sibling thread exists, the mutexes are used differently:
> lock(exec_guard_mutex)
> cred_locked_in_execve = true;
> unlock(exec_guard_mutex)
> de_thread()
> lock(cred_guard_mutex)
> unlock(cred_guard_mutex)
> lock(exec_guard_mutex)
> cred_locked_in_execve = false;
> unlock(exec_guard_mutex)
>
> Only the case changes that would deadlock anyway.


Let me propose a slight alternative that I think sets us up for long
term success.

Leave cred_guard_mutex as is, but declare it undesirable.  The
cred_guard_mutex as designed really is something we should get rid of.
As it it can sleep over several different userspace accesses.  The
copying of the exec arguments is technically as prone to deadlock as the
ptrace case.

Add a new mutex with a better name perhaps "exec_change_mutex" that is
used to guard the changes that exec makes to a process.

Then we gradually shift all the cred_guard_mutex users over to the new
mutex.  AKA one patch per user of cred_guard_mutex.  At each patch that
shifts things over we will have the opportunity to review the code to
see that there no funny dependencies that were missed.

I will sign up for working on the no_new_privs and ptrace_attach cases
as I think I can make those happen.  Especially no_new_privs.

Getting the easier cases will resolve your issues and put things on a
better footing.

Eric
Bernd Edlinger March 4, 2020, 9:49 p.m. UTC | #9
On 3/4/20 5:33 PM, Eric W. Biederman wrote:
> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
> 
>> On 3/3/20 9:08 PM, Eric W. Biederman wrote:
>>> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>>>
>>>> On 3/3/20 4:18 PM, Eric W. Biederman wrote:
>>>>> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>>>>>> diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c
>>>>>> new file mode 100644
>>>>>> index 0000000..6d8a048
>>>>>> --- /dev/null
>>>>>> +++ b/tools/testing/selftests/ptrace/vmaccess.c
>>>>>> @@ -0,0 +1,66 @@
>>>>>> +// SPDX-License-Identifier: GPL-2.0+
>>>>>> +/*
>>>>>> + * Copyright (c) 2020 Bernd Edlinger <bernd.edlinger@hotmail.de>
>>>>>> + * All rights reserved.
>>>>>> + *
>>>>>> + * Check whether /proc/$pid/mem can be accessed without causing deadlocks
>>>>>> + * when de_thread is blocked with ->cred_guard_mutex held.
>>>>>> + */
>>>>>> +
>>>>>> +#include "../kselftest_harness.h"
>>>>>> +#include <stdio.h>
>>>>>> +#include <fcntl.h>
>>>>>> +#include <pthread.h>
>>>>>> +#include <signal.h>
>>>>>> +#include <unistd.h>
>>>>>> +#include <sys/ptrace.h>
>>>>>> +
>>>>>> +static void *thread(void *arg)
>>>>>> +{
>>>>>> +	ptrace(PTRACE_TRACEME, 0, 0L, 0L);
>>>>>> +	return NULL;
>>>>>> +}
>>>>>> +
>>>>>> +TEST(vmaccess)
>>>>>> +{
>>>>>> +	int f, pid = fork();
>>>>>> +	char mm[64];
>>>>>> +
>>>>>> +	if (!pid) {
>>>>>> +		pthread_t pt;
>>>>>> +
>>>>>> +		pthread_create(&pt, NULL, thread, NULL);
>>>>>> +		pthread_join(pt, NULL);
>>>>>> +		execlp("true", "true", NULL);
>>>>>> +	}
>>>>>> +
>>>>>> +	sleep(1);
>>>>>> +	sprintf(mm, "/proc/%d/mem", pid);
>>>>>> +	f = open(mm, O_RDONLY);
>>>>>> +	ASSERT_LE(0, f);
>>>>>> +	close(f);
>>>>>> +	f = kill(pid, SIGCONT);
>>>>>> +	ASSERT_EQ(0, f);
>>>>>> +}
>>>>>> +
>>>>>> +TEST(attach)
>>>>>> +{
>>>>>> +	int f, pid = fork();
>>>>>> +
>>>>>> +	if (!pid) {
>>>>>> +		pthread_t pt;
>>>>>> +
>>>>>> +		pthread_create(&pt, NULL, thread, NULL);
>>>>>> +		pthread_join(pt, NULL);
>>>>>> +		execlp("true", "true", NULL);
>>>>>> +	}
>>>>>> +
>>>>>> +	sleep(1);
>>>>>> +	f = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
>>>>>
>>>>> To be meaningful this code needs to learn to loop when
>>>>> ptrace returns -EAGAIN.
>>>>>
>>>>> Because that is pretty much what any self respecting user space
>>>>> process will do.
>>>>>
>>>>> At which point I am not certain we can say that the behavior has
>>>>> sufficiently improved not to be a deadlock.
>>>>>
>>>>
>>>> In this special dead-duck test it won't work, but it would
>>>> still be lots more transparent what is going on, since previously
>>>> you had two zombie process, and no way to even output debug
>>>> messages, which also all self respecting user space processes
>>>> should do.
>>>
>>> Agreed it is more transparent.  So if you are going to deadlock
>>> it is better.
>>>
>>> My previous proposal (which I admit is more work to implement) would
>>> actually allow succeeding in this case and so it would not be subject to
>>> a dead lock (even via -EGAIN) at this point.
>>>
>>>> So yes, I can at least give a good example and re-try it several
>>>> times together with wait4 which a tracer is expected to do.
>>>
>>> Thank you,
>>>
>>> Eric
>>>
>>
>> Okay, I think it can be done with minimal API changes,
>> but it needs two mutexes, one that guards the execve,
>> and one that guards only the credentials.
>>
>> If no traced sibling thread exists, the mutexes are used this way:
>> lock(exec_guard_mutex)
>> cred_locked_in_execve = true;
>> de_thread()
>> lock(cred_guard_mutex)
>> unlock(cred_guard_mutex)
>> cred_locked_in_execve = false;
>> unlock(exec_guard_mutex)
>>
>> so effectively no API change at all.
>>
>> If a traced sibling thread exists, the mutexes are used differently:
>> lock(exec_guard_mutex)
>> cred_locked_in_execve = true;
>> unlock(exec_guard_mutex)
>> de_thread()
>> lock(cred_guard_mutex)
>> unlock(cred_guard_mutex)
>> lock(exec_guard_mutex)
>> cred_locked_in_execve = false;
>> unlock(exec_guard_mutex)
>>
>> Only the case changes that would deadlock anyway.
> 
> 
> Let me propose a slight alternative that I think sets us up for long
> term success.
> 
> Leave cred_guard_mutex as is, but declare it undesirable.  The
> cred_guard_mutex as designed really is something we should get rid of.
> As it it can sleep over several different userspace accesses.  The
> copying of the exec arguments is technically as prone to deadlock as the
> ptrace case.
> 
> Add a new mutex with a better name perhaps "exec_change_mutex" that is
> used to guard the changes that exec makes to a process.
> 
> Then we gradually shift all the cred_guard_mutex users over to the new
> mutex.  AKA one patch per user of cred_guard_mutex.  At each patch that
> shifts things over we will have the opportunity to review the code to
> see that there no funny dependencies that were missed.
> 
> I will sign up for working on the no_new_privs and ptrace_attach cases
> as I think I can make those happen.  Especially no_new_privs.
> 
> Getting the easier cases will resolve your issues and put things on a
> better footing.
> 
> Eric
> 

Okay, however I think we will need two mutexes in the long term.

So currently I have reduced the cred_guard_mutex to protect just
the loading of the executable code in the process vm, since that
is what works for vm_access, (one of the test cases).
And another mutex that protects the whole execve function, that
is need for ptrace, (and seccomp).
But I have only a test case for ptrace.


If I understand that right, I should not recycle cred_guard_mutex
but leave it as is, and create two additional mutexes which will
take over step by step.

Sounds reasonable, indeed.

I will send an update (v6) what I have right now,
but just for information, so you can see how my minimal API-Change
approach works.


Bernd.
diff mbox series

Patch

diff --git a/Documentation/security/credentials.rst b/Documentation/security/credentials.rst
index 282e79f..0988798 100644
--- a/Documentation/security/credentials.rst
+++ b/Documentation/security/credentials.rst
@@ -437,9 +437,14 @@  new set of credentials by calling::
 
 	struct cred *prepare_creds(void);
 
-this locks current->cred_replace_mutex and then allocates and constructs a
-duplicate of the current process's credentials, returning with the mutex still
-held if successful.  It returns NULL if not successful (out of memory).
+this allocates and constructs a duplicate of the current process's credentials.
+It returns NULL if not successful (out of memory).
+
+If called from __do_execve_file, the mutex current->signal->cred_guard_mutex
+is acquired before this function gets called, and released after setting
+current->signal->cred_locked_in_execve.  The same mutex is acquired later,
+while the credentials and the process mmap are actually changed, and
+current->signal->cred_locked_in_execve is reset again.
 
 The mutex prevents ``ptrace()`` from altering the ptrace state of a process
 while security checks on credentials construction and changing is taking place
@@ -466,9 +471,8 @@  by calling::
 
 This will alter various aspects of the credentials and the process, giving the
 LSM a chance to do likewise, then it will use ``rcu_assign_pointer()`` to
-actually commit the new credentials to ``current->cred``, it will release
-``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and it
-will notify the scheduler and others of the changes.
+actually commit the new credentials to ``current->cred``, and it will notify
+the scheduler and others of the changes.
 
 This function is guaranteed to return 0, so that it can be tail-called at the
 end of such functions as ``sys_setresuid()``.
@@ -486,8 +490,7 @@  invoked::
 
 	void abort_creds(struct cred *new);
 
-This releases the lock on ``current->cred_replace_mutex`` that
-``prepare_creds()`` got and then releases the new credentials.
+This releases the new credentials.
 
 
 A typical credentials alteration function would look something like this::
diff --git a/fs/exec.c b/fs/exec.c
index 74d88da..5fc744e 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1266,6 +1266,12 @@  int flush_old_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
+	retval = mutex_lock_killable(&current->signal->cred_guard_mutex);
+	if (retval)
+		goto out;
+
+	bprm->called_flush_old_exec = 1;
+
 	/*
 	 * Must be called _before_ exec_mmap() as bprm->mm is
 	 * not visibile until then. This also enables the update
@@ -1398,29 +1404,51 @@  void finalize_exec(struct linux_binprm *bprm)
 EXPORT_SYMBOL(finalize_exec);
 
 /*
- * Prepare credentials and lock ->cred_guard_mutex.
+ * Prepare credentials and set ->cred_locked_in_execve.
  * install_exec_creds() commits the new creds and drops the lock.
  * Or, if exec fails before, free_bprm() should release ->cred and
  * and unlock.
  */
 static int prepare_bprm_creds(struct linux_binprm *bprm)
 {
+	int ret;
+
 	if (mutex_lock_interruptible(&current->signal->cred_guard_mutex))
 		return -ERESTARTNOINTR;
 
+	ret = -EAGAIN;
+	if (unlikely(current->signal->cred_locked_in_execve))
+		goto out;
+
+	ret = -ENOMEM;
 	bprm->cred = prepare_exec_creds();
-	if (likely(bprm->cred))
-		return 0;
+	if (likely(bprm->cred)) {
+		current->signal->cred_locked_in_execve = true;
+		ret = 0;
+	}
 
+out:
 	mutex_unlock(&current->signal->cred_guard_mutex);
-	return -ENOMEM;
+	return ret;
 }
 
 static void free_bprm(struct linux_binprm *bprm)
 {
 	free_arg_pages(bprm);
 	if (bprm->cred) {
-		mutex_unlock(&current->signal->cred_guard_mutex);
+		/*
+		 * If flush_old_exec did not acquire the cred_guard_mutex,
+		 * try again here, but if that fails, just leave
+		 * cred_locked_in_execve alone, since this means there
+		 * must be a fatal signal pending.
+		 * We don't want to prevent this task to be killed, just
+		 * because it is stuck in the middle of execve.
+		 */
+		if (bprm->called_flush_old_exec ||
+		    !mutex_lock_killable(&current->signal->cred_guard_mutex)) {
+			current->signal->cred_locked_in_execve = false;
+			mutex_unlock(&current->signal->cred_guard_mutex);
+		}
 		abort_creds(bprm->cred);
 	}
 	if (bprm->file) {
@@ -1469,13 +1497,14 @@  void install_exec_creds(struct linux_binprm *bprm)
 	 * credentials; any time after this it may be unlocked.
 	 */
 	security_bprm_committed_creds(bprm);
+	current->signal->cred_locked_in_execve = false;
 	mutex_unlock(&current->signal->cred_guard_mutex);
 }
 EXPORT_SYMBOL(install_exec_creds);
 
 /*
  * determine how safe it is to execute the proposed program
- * - the caller must hold ->cred_guard_mutex to protect against
+ * - the caller must have set ->cred_locked_in_execve to protect against
  *   PTRACE_ATTACH or seccomp thread-sync
  */
 static void check_unsafe_exec(struct linux_binprm *bprm)
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index b40fc63..2930253 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -44,7 +44,11 @@  struct linux_binprm {
 		 * exec has happened. Used to sanitize execution environment
 		 * and to set AT_SECURE auxv for glibc.
 		 */
-		secureexec:1;
+		secureexec:1,
+		/*
+		 * Set by flush_old_exec, when the cred_guard_mutex is taken.
+		 */
+		called_flush_old_exec:1;
 #ifdef __alpha__
 	unsigned int taso:1;
 #endif
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 8805025..8f8e358 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -225,6 +225,8 @@  struct signal_struct {
 	struct mutex cred_guard_mutex;	/* guard against foreign influences on
 					 * credential calculations
 					 * (notably. ptrace) */
+	bool cred_locked_in_execve;	/* set while in execve, only valid when
+					 * cred_guard_mutex is held */
 } __randomize_layout;
 
 /*
diff --git a/kernel/cred.c b/kernel/cred.c
index 809a985..e4c78de 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -676,7 +676,7 @@  void __init cred_init(void)
  *
  * Returns the new credentials or NULL if out of memory.
  *
- * Does not take, and does not return holding current->cred_replace_mutex.
+ * Does not take, and does not return holding ->cred_guard_mutex.
  */
 struct cred *prepare_kernel_cred(struct task_struct *daemon)
 {
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43d6179..0f82bab 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -395,6 +395,10 @@  static int ptrace_attach(struct task_struct *task, long request,
 	if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
 		goto out;
 
+	retval = -EAGAIN;
+	if (task->signal->cred_locked_in_execve)
+		goto unlock_creds;
+
 	task_lock(task);
 	retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
 	task_unlock(task);
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index b6ea3dc..3efa3e5 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -342,6 +342,9 @@  static inline pid_t seccomp_can_sync_threads(void)
 	BUG_ON(!mutex_is_locked(&current->signal->cred_guard_mutex));
 	assert_spin_locked(&current->sighand->siglock);
 
+	if (current->signal->cred_locked_in_execve)
+		return -EAGAIN;
+
 	/* Validate all threads being eligible for synchronization. */
 	caller = current;
 	for_each_thread(caller, thread) {
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 357aa7b..b3e6eb5 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -204,7 +204,7 @@  static ssize_t process_vm_rw_core(pid_t pid, struct iov_iter *iter,
 	if (!mm || IS_ERR(mm)) {
 		rc = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
 		/*
-		 * Explicitly map EACCES to EPERM as EPERM is a more a
+		 * Explicitly map EACCES to EPERM as EPERM is a more
 		 * appropriate error code for process_vw_readv/writev
 		 */
 		if (rc == -EACCES)
diff --git a/tools/testing/selftests/ptrace/Makefile b/tools/testing/selftests/ptrace/Makefile
index c0b7f89..2f1f532 100644
--- a/tools/testing/selftests/ptrace/Makefile
+++ b/tools/testing/selftests/ptrace/Makefile
@@ -1,6 +1,6 @@ 
 # SPDX-License-Identifier: GPL-2.0-only
-CFLAGS += -iquote../../../../include/uapi -Wall
+CFLAGS += -std=c99 -pthread -iquote../../../../include/uapi -Wall
 
-TEST_GEN_PROGS := get_syscall_info peeksiginfo
+TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess
 
 include ../lib.mk
diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c
new file mode 100644
index 0000000..6d8a048
--- /dev/null
+++ b/tools/testing/selftests/ptrace/vmaccess.c
@@ -0,0 +1,66 @@ 
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (c) 2020 Bernd Edlinger <bernd.edlinger@hotmail.de>
+ * All rights reserved.
+ *
+ * Check whether /proc/$pid/mem can be accessed without causing deadlocks
+ * when de_thread is blocked with ->cred_guard_mutex held.
+ */
+
+#include "../kselftest_harness.h"
+#include <stdio.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <signal.h>
+#include <unistd.h>
+#include <sys/ptrace.h>
+
+static void *thread(void *arg)
+{
+	ptrace(PTRACE_TRACEME, 0, 0L, 0L);
+	return NULL;
+}
+
+TEST(vmaccess)
+{
+	int f, pid = fork();
+	char mm[64];
+
+	if (!pid) {
+		pthread_t pt;
+
+		pthread_create(&pt, NULL, thread, NULL);
+		pthread_join(pt, NULL);
+		execlp("true", "true", NULL);
+	}
+
+	sleep(1);
+	sprintf(mm, "/proc/%d/mem", pid);
+	f = open(mm, O_RDONLY);
+	ASSERT_LE(0, f);
+	close(f);
+	f = kill(pid, SIGCONT);
+	ASSERT_EQ(0, f);
+}
+
+TEST(attach)
+{
+	int f, pid = fork();
+
+	if (!pid) {
+		pthread_t pt;
+
+		pthread_create(&pt, NULL, thread, NULL);
+		pthread_join(pt, NULL);
+		execlp("true", "true", NULL);
+	}
+
+	sleep(1);
+	f = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
+	ASSERT_EQ(EAGAIN, errno);
+	ASSERT_EQ(f, -1);
+	f = kill(pid, SIGCONT);
+	ASSERT_EQ(0, f);
+}
+
+TEST_HARNESS_MAIN