[0/2] Infrastructure to allow fixing exec deadlocks


Message ID

87tv32cxmf.fsf_-_@x220.int.ebiederm.org

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 009C422522
From: ebiederm@xmission.com (Eric W. Biederman)
To: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>,  Kees Cook
 <keescook@chromium.org>,  Jann Horn <jannh@google.com>,  Jonathan Corbet
 <corbet@lwn.net>,  Alexander Viro <viro@zeniv.linux.org.uk>,  Andrew
 Morton <akpm@linux-foundation.org>,  Alexey Dobriyan
 <adobriyan@gmail.com>,  Thomas Gleixner <tglx@linutronix.de>,  Oleg
 Nesterov <oleg@redhat.com>,  Frederic Weisbecker <frederic@kernel.org>,
  Andrei Vagin <avagin@gmail.com>,  Ingo Molnar <mingo@kernel.org>,  "Peter
 Zijlstra \(Intel\)" <peterz@infradead.org>,  Yuyang Du <duyuyang@gmail.com>,
  David Hildenbrand <david@redhat.com>,  Sebastian Andrzej Siewior
 <bigeasy@linutronix.de>,  Anshuman Khandual <anshuman.khandual@arm.com>,
  David Howells <dhowells@redhat.com>,  James Morris
 <jamorris@linux.microsoft.com>,  Greg Kroah-Hartman
 <gregkh@linuxfoundation.org>,  Shakeel Butt <shakeelb@google.com>,  Jason
 Gunthorpe <jgg@ziepe.ca>,  Christian Kellner <christian@kellner.me>,
  Andrea Arcangeli <aarcange@redhat.com>,  Aleksa Sarai
 <cyphar@cyphar.com>,  "Dmitry V. Levin" <ldv@altlinux.org>,
  "linux-doc\@vger.kernel.org" <linux-doc@vger.kernel.org>,
  "linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>,
  "linux-fsdevel\@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
  "linux-mm\@kvack.org" <linux-mm@kvack.org>,  "stable\@vger.kernel.org"
 <stable@vger.kernel.org>,  "linux-api\@vger.kernel.org"
 <linux-api@vger.kernel.org>
References: 
 <AM6PR03MB5170EB4427BF5C67EE98FF09E4E60@AM6PR03MB5170.eurprd03.prod.outlook.com>
	<87k142lpfz.fsf@x220.int.ebiederm.org>
	<AM6PR03MB51704206634C009500A8080DE4E70@AM6PR03MB5170.eurprd03.prod.outlook.com>
	<875zfmloir.fsf@x220.int.ebiederm.org>
	<AM6PR03MB51707ABF20B6CBBECC34865FE4E70@AM6PR03MB5170.eurprd03.prod.outlook.com>
	<87v9nmjulm.fsf@x220.int.ebiederm.org>
	<AM6PR03MB5170B976E6387FDDAD59A118E4E70@AM6PR03MB5170.eurprd03.prod.outlook.com>
	<202003021531.C77EF10@keescook>
	<20200303085802.eqn6jbhwxtmz4j2x@wittgenstein>
	<AM6PR03MB5170285B336790D3450E2644E4E40@AM6PR03MB5170.eurprd03.prod.outlook.com>
	<87v9nlii0b.fsf@x220.int.ebiederm.org>
	<AM6PR03MB5170609D44967E044FD1BE40E4E40@AM6PR03MB5170.eurprd03.prod.outlook.com>
	<87a74xi4kz.fsf@x220.int.ebiederm.org>
	<AM6PR03MB51705AA3009B4986BB6EF92FE4E50@AM6PR03MB5170.eurprd03.prod.outlook.com>
	<87r1y8dqqz.fsf@x220.int.ebiederm.org>
	<AM6PR03MB517053AED7DC89F7C0704B7DE4E50@AM6PR03MB5170.eurprd03.prod.outlook.com>
	<AM6PR03MB51703B44170EAB4626C9B2CAE4E20@AM6PR03MB5170.eurprd03.prod.outlook.com>
Date: Thu, 05 Mar 2020 15:14:48 -0600
In-Reply-To: 
 <AM6PR03MB51703B44170EAB4626C9B2CAE4E20@AM6PR03MB5170.eurprd03.prod.outlook.com>
	(Bernd Edlinger's message of "Thu, 5 Mar 2020 18:36:53 +0000")
Message-ID: <87tv32cxmf.fsf_-_@x220.int.ebiederm.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Subject: [PATCH 0/2] Infrastructure to allow fixing exec deadlocks
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

Infrastructure to allow fixing exec deadlocks

Message

Eric W. Biederman March 5, 2020, 9:14 p.m. UTC

Bernd, everyone

This is how I think the infrastructure change should look that makes way
for fixing this issue.

- Correct the point of no return.
- Add a new mutex to replace cred_guard_mutex

Then I think it is just going through the existing
users of cred_guard_mutex and fixing them to use the new one.

There really aren't that many users of cred_guard_mutex so we should be
able to get through the easy ones fairly quickly.  Anything that isn't
easy can wait until we have a good fix.

The users of cred_guard_mutex that I saw were:
    fs/proc/base.c:
       proc_pid_attr_write
       do_io_accounting
       proc_pid_stack
       proc_pid_syscall
       proc_pid_personality
    
    perf_event_open
    mm_access
    kcmp
    pidfd_fget
    seccomp_set_mode_filter
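
As a rough user-space illustration of converting one of the users above
(a toy model built on pthreads, not kernel code; struct task_model,
mm_generation and every function name below are made up for this sketch),
assume exec holds cred_guard_mutex for its whole duration but takes
exec_update_mutex only once it is past the point of no return, while a
converted reader in the style of mm_access takes only exec_update_mutex.
The reader uses trylock so that a single-threaded run reports -EBUSY
where real code would sleep:

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the two locks discussed in this thread. */
struct task_model {
	pthread_mutex_t cred_guard_mutex;  /* held for the whole exec */
	pthread_mutex_t exec_update_mutex; /* held only past the point of no return */
	int mm_generation;                 /* stands in for the task's mm */
};

static void task_model_init(struct task_model *t)
{
	pthread_mutex_init(&t->cred_guard_mutex, NULL);
	pthread_mutex_init(&t->exec_update_mutex, NULL);
	t->mm_generation = 0;
}

/* Early exec: may still wait on user space, so only the outer lock is held. */
static void exec_begin(struct task_model *t)
{
	pthread_mutex_lock(&t->cred_guard_mutex);
}

/* Past the point of no return: take the inner lock and swap the "mm". */
static void exec_point_of_no_return(struct task_model *t)
{
	pthread_mutex_lock(&t->exec_update_mutex);
	t->mm_generation++;
}

/* Exec is done: drop the locks in reverse order. */
static void exec_end(struct task_model *t)
{
	pthread_mutex_unlock(&t->exec_update_mutex);
	pthread_mutex_unlock(&t->cred_guard_mutex);
}

/*
 * Converted reader: needs only exec_update_mutex.
 * Returns 0 and stores the generation, or -EBUSY while exec is updating.
 */
static int read_mm_generation(struct task_model *t, int *out)
{
	int err = pthread_mutex_trylock(&t->exec_update_mutex);

	if (err)
		return -err;
	*out = t->mm_generation;
	pthread_mutex_unlock(&t->exec_update_mutex);
	return 0;
}
```

In this model a reader called between exec_begin() and
exec_point_of_no_return() succeeds even though cred_guard_mutex is held;
it is only turned away between exec_point_of_no_return() and exec_end().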

Bernd, does this make sense to you?

I think we can fix the seccomp/no_new_privs issue with some careful
refactoring.  We can probably do the same for ptrace but that appears
to need a little lsm bug fixing.

My goal here is to allow us to fix the uncontroversial easy bits, while
still allowing the difficult tricky bits to be fixed.

Eric W. Biederman (2):
      exec: Properly mark the point of no return
      exec: Add a exec_update_mutex to replace cred_guard_mutex

 fs/exec.c                    | 11 ++++++++---
 include/linux/binfmts.h      |  7 ++++++-
 include/linux/sched/signal.h |  9 ++++++++-
 kernel/fork.c                |  1 +
 4 files changed, 23 insertions(+), 5 deletions(-)

Eric

Comments

Bernd Edlinger March 5, 2020, 10:31 p.m. UTC | #1

On 3/5/20 10:14 PM, Eric W. Biederman wrote:
> 
> Bernd, everyone
> 
> This is how I think the infrastructure change should look that makes way
> for fixing this issue.
> 
> - Correct the point of no return.
> - Add a new mutex to replace cred_guard_mutex
> 
> Then I think it is just going through the existing
> users of cred_guard_mutex and fixing them to use the new one.
> 
> There really aren't that many users of cred_guard_mutex so we should be
> able to get through the easy ones fairly quickly.  And anything that
> isn't easy we can wait until we have a good fix.
> 
> The users of cred_guard_mutex that I saw were:
>     fs/proc/base.c:
>        proc_pid_attr_write
>        do_io_accounting
>        proc_pid_stack
>        proc_pid_syscall
>        proc_pid_personality
>     
>     perf_event_open
>     mm_access
>     kcmp
>     pidfd_fget
>     seccomp_set_mode_filter
> 
> Bernd does this make sense to you?  
> 
> I think we can fix the seccomp/no_new_privs issue with some careful
> refactoring.  We can probably do the same for ptrace but that appears
> to need a little lsm bug fixing.
> 

Yes, for most functions the proposed "exec_update_mutex" is fine,
but ptrace_attach, seccomp_set_mode_filter and proc_pid_attr_write
need to be blocked for the whole exec duration, so they need a second
"mutex", with deadlock detection as in my previous patch, if I see
that right.

Unfortunately only one of the two test cases can be fixed without the
second mutex; of course, mm_access is what causes the practical problem.

Currently, for the unlimited user space delay, I have only the case of
a ptraced sibling thread on my radar: de_thread waits for the parent
to call wait in this case, and that can literally take forever.
But I know that a PTRACE_CONT may also be needed after a PTRACE_EVENT_EXIT.
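
The circular wait in the ptraced-sibling case can be sketched as a toy
user-space model (pthreads only, not kernel code; struct exec_model and
all function names are hypothetical, and the scenario is simplified to a
tracer that attaches before it calls wait).  Exec holds cred_guard_mutex
while de_thread waits for the sibling to be reaped; the tracer needs
cred_guard_mutex to attach before it gets around to reaping.  trylock
stands in for the place where the real tracer would sleep:

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the state both sides depend on. */
struct exec_model {
	pthread_mutex_t cred_guard_mutex; /* held by exec for its whole duration */
	bool sibling_reaped;              /* set once the tracer has waited on the sibling */
};

static void exec_model_init(struct exec_model *m)
{
	pthread_mutex_init(&m->cred_guard_mutex, NULL);
	m->sibling_reaped = false;
}

/* Exec side: de_thread can finish only after the traced sibling is reaped. */
static bool exec_can_finish_de_thread(const struct exec_model *m)
{
	return m->sibling_reaped;
}

/* Tracer side: attaching needs cred_guard_mutex; -EBUSY means "would block". */
static int tracer_attach(struct exec_model *m)
{
	int err = pthread_mutex_trylock(&m->cred_guard_mutex);

	if (err)
		return -err;
	pthread_mutex_unlock(&m->cred_guard_mutex);
	return 0;
}

/* The tracer attaches first and reaps the sibling only afterwards. */
static bool tracer_step(struct exec_model *m)
{
	if (tracer_attach(m) != 0)
		return false;
	m->sibling_reaped = true;
	return true;
}

/* True when neither the tracer nor exec can make progress. */
static bool is_stuck(struct exec_model *m)
{
	bool tracer_ok = tracer_step(m);
	bool exec_ok = exec_can_finish_de_thread(m);

	return !tracer_ok && !exec_ok;
}
```

With cred_guard_mutex locked on behalf of exec, is_stuck() keeps
returning true no matter how often it is called; once the lock is
released the tracer attaches, reaps the sibling, and exec can finish.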

Can you explain what else in user space can go wrong to cause an
unlimited delay in the execve?

Bernd.

Eric W. Biederman March 6, 2020, 5:06 a.m. UTC | #2

Bernd Edlinger <bernd.edlinger@hotmail.de> writes:

> On 3/5/20 10:14 PM, Eric W. Biederman wrote:
>> 
>> Bernd, everyone
>> 
>> This is how I think the infrastructure change should look that makes way
>> for fixing this issue.
>> 
>> - Correct the point of no return.
>> - Add a new mutex to replace cred_guard_mutex
>> 
>> Then I think it is just going through the existing
>> users of cred_guard_mutex and fixing them to use the new one.
>> 
>> There really aren't that many users of cred_guard_mutex so we should be
>> able to get through the easy ones fairly quickly.  And anything that
>> isn't easy we can wait until we have a good fix.
>> 
>> The users of cred_guard_mutex that I saw were:
>>     fs/proc/base.c:
>>        proc_pid_attr_write
>>        do_io_accounting
>>        proc_pid_stack
>>        proc_pid_syscall
>>        proc_pid_personality
>>     
>>     perf_event_open
>>     mm_access
>>     kcmp
>>     pidfd_fget
>>     seccomp_set_mode_filter
>> 
>> Bernd does this make sense to you?  
>> 
>> I think we can fix the seccomp/no_new_privs issue with some careful
>> refactoring.  We can probably do the same for ptrace but that appears
>> to need a little lsm bug fixing.
>> 
>
> Yes, for most functions the proposed "exec_update_mutex" is fine,
> but we will need a longer-time block for ptrace_attach, seccomp_set_mode_filter
> and proc_pid_attr_write need to be blocked for the whole exec duration so
> they need a second "mutex", with deadlock-detection as in my previous patch,
> if I see that right.

So far I am leaving "cred_guard_mutex" as that second "mutex".  My sense
is that when all we have left are the hard cases we can take those
cases out in detail, examine them and see what really can be done.

> Unfortunately only one of the two test cases can be fixed without the
> second mutex, of course the mm_access is what cause the practical problem.

Fixing the practical problems is foremost on my agenda.
That and clearing away enough of the noise that we can really focus on
the hard problems when we begin to address them.

That way I am hoping we can really solve some of these issues and make
them go away.

> Currently for the unlimited user space delay, I have only the case of
> a ptraced sibling thread on my radar, de_thread waits for the parent
> to call wait in this case, that can literally take forever.
> But I know that also PTRACE_CONT may be needed after a PTRACE_EVENT_EXIT.
>
> Can you explain what else in the user space can go wrong to make an
> unlimited delay in the execve?

Triggering a page fault.  Depending on the backing store or possibly
with the use of userfaultfd that page fault can be delayed indefinitely
and pretty much be as bad as the ptrace case.

Eric