Message ID | 87zhcq4jdj.fsf_-_@x220.int.ebiederm.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Infrastructure to allow fixing exec deadlocks | expand |
On 3/8/20 10:38 PM, Eric W. Biederman wrote: > > The cred_guard_mutex is problematic. The cred_guard_mutex is held > over the userspace accesses as the arguments from userspace are read. > The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other ... is held while waiting for the trace parent to handle PTRACE_EVENT_EXIT or something? I wonder if we also should mention that it is held while waiting for the trace parent to receive the exit code with "wait"? > threads are killed. The cred_guard_mutex is held over > "put_user(0, tsk->clear_child_tid)" in exit_mm(). > > Any of those can result in deadlock, as the cred_guard_mutex is held > over a possible indefinite userspace waits for userspace. > > Add exec_update_mutex that is only held over exec updating process Add ? > with the new contents of exec, so that code that needs not to be > confused by exec changing the mm and the cred in ways that can not > happen during ordinary execution of a process. > > The plan is to switch the users of cred_guard_mutex to > exec_udpate_mutex one by one. This lets us move forward while still s/udpate/update/ Bernd.
Bernd Edlinger <bernd.edlinger@hotmail.de> writes: > On 3/8/20 10:38 PM, Eric W. Biederman wrote: >> >> The cred_guard_mutex is problematic. The cred_guard_mutex is held >> over the userspace accesses as the arguments from userspace are read. >> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other ^ over > > ... is held while waiting for the trace parent to handle PTRACE_EVENT_EXIT > or something? Yes. Let me see if I can phrase that better. > I wonder if we also should mention that > it is held while waiting for the trace parent to > receive the exit code with "wait"? I don't think we have to spell out the details of how it all works, unless that makes things clearer. Kernel developers can be expected to figure out how the kernel works. The critical thing is that it is an indefinite wait for userspace to take action. But I will look. >> threads are killed. The cred_guard_mutex is held over >> "put_user(0, tsk->clear_child_tid)" in exit_mm(). >> >> Any of those can result in deadlock, as the cred_guard_mutex is held >> over a possible indefinite userspace waits for userspace. >> >> Add exec_update_mutex that is only held over exec updating process > > Add ? Yes. That is what the change does: add exec_update_mutex. >> with the new contents of exec, so that code that needs not to be >> confused by exec changing the mm and the cred in ways that can not >> happen during ordinary execution of a process. >> >> The plan is to switch the users of cred_guard_mutex to >> exec_udpate_mutex one by one. This lets us move forward while still > > s/udpate/update/ Yes. Very much so. Eric
On 3/9/20 6:40 PM, Eric W. Biederman wrote: > Bernd Edlinger <bernd.edlinger@hotmail.de> writes: > >> On 3/8/20 10:38 PM, Eric W. Biederman wrote: >>> >>> The cred_guard_mutex is problematic. The cred_guard_mutex is held >>> over the userspace accesses as the arguments from userspace are read. >>> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other > ^ over >> >> ... is held while waiting for the trace parent to handle PTRACE_EVENT_EXIT >> or something? > > Yes. Let me see if I can phrase that better. > >> I wonder if we also should mention that >> it is held while waiting for the trace parent to >> receive the exit code with "wait"? > > I don't think we have to spell out the details of how it all works, > unless that makes things clearer. Kernel developers can be expected > to figure out how the kernel works. The critical thing is that it is > an indefinite wait for userspace to take action. > > But I will look. > >>> threads are killed. The cred_guard_mutex is held over >>> "put_user(0, tsk->clear_child_tid)" in exit_mm(). >>> >>> Any of those can result in deadlock, as the cred_guard_mutex is held >>> over a possible indefinite userspace waits for userspace. >>> >>> Add exec_update_mutex that is only held over exec updating process >> >> Add ? > > Yes. That is what the change does: add exec_update_mutex. > I just kind of missed the "subject" in this sentence, like "This patch adds an exec_update_mutex that is ..." but english is a foreign language for me, so may be okay as is. Bernd. >>> with the new contents of exec, so that code that needs not to be >>> confused by exec changing the mm and the cred in ways that can not >>> happen during ordinary execution of a process. >>> >>> The plan is to switch the users of cred_guard_mutex to >>> exec_udpate_mutex one by one. This lets us move forward while still >> >> s/udpate/update/ > > Yes. Very much so. > > Eric >
Bernd Edlinger <bernd.edlinger@hotmail.de> writes: > On 3/9/20 6:40 PM, Eric W. Biederman wrote: >> Bernd Edlinger <bernd.edlinger@hotmail.de> writes: >> >>> On 3/8/20 10:38 PM, Eric W. Biederman wrote: >>>> >>>> The cred_guard_mutex is problematic. The cred_guard_mutex is held >>>> over the userspace accesses as the arguments from userspace are read. >>>> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other >> ^ over >>> >>> ... is held while waiting for the trace parent to handle PTRACE_EVENT_EXIT >>> or something? >> >> Yes. Let me see if I can phrase that better. >> >>> I wonder if we also should mention that >>> it is held while waiting for the trace parent to >>> receive the exit code with "wait"? >> >> I don't think we have to spell out the details of how it all works, >> unless that makes things clearer. Kernel developers can be expected >> to figure out how the kernel works. The critical thing is that it is >> an indefinite wait for userspace to take action. >> >> But I will look. >> >>>> threads are killed. The cred_guard_mutex is held over >>>> "put_user(0, tsk->clear_child_tid)" in exit_mm(). >>>> >>>> Any of those can result in deadlock, as the cred_guard_mutex is held >>>> over a possible indefinite userspace waits for userspace. >>>> >>>> Add exec_update_mutex that is only held over exec updating process >>> >>> Add ? >> >> Yes. That is what the change does: add exec_update_mutex. >> > > I just kind of missed the "subject" in this sentence, > like "This patch adds an exec_update_mutex that is ..." > but english is a foreign language for me, so may be okay as is. English has a lot of options. I think this is a stylistic difference. Instead of being an observer and describing what the change does: "This patch adds exec_update_mutex ..." I was being there in the moment and saying/commading what is happening: "Add exec_update_mutex ..." Using the more immdediate form ends up with more concise and clearer sentences. Every one of my writing teachers in school emphasized that point and I see the who it works when I write things. But writing is hard and I still tend toward long rambling sentences with many qualifiers that confuse and detract from the point rather than make it clear what is happening. Eric
ebiederm@xmission.com (Eric W. Biederman) writes: > Bernd Edlinger <bernd.edlinger@hotmail.de> writes: > >> On 3/9/20 6:40 PM, Eric W. Biederman wrote: >>> Bernd Edlinger <bernd.edlinger@hotmail.de> writes: >>> >>>> On 3/8/20 10:38 PM, Eric W. Biederman wrote: >>>>> >>>>> The cred_guard_mutex is problematic. The cred_guard_mutex is held >>>>> over the userspace accesses as the arguments from userspace are read. >>>>> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other >>> ^ over >>>> >>>> ... is held while waiting for the trace parent to handle PTRACE_EVENT_EXIT >>>> or something? >>> >>> Yes. Let me see if I can phrase that better. >>> >>>> I wonder if we also should mention that >>>> it is held while waiting for the trace parent to >>>> receive the exit code with "wait"? >>> >>> I don't think we have to spell out the details of how it all works, >>> unless that makes things clearer. Kernel developers can be expected >>> to figure out how the kernel works. The critical thing is that it is >>> an indefinite wait for userspace to take action. >>> >>> But I will look. >>> >>>>> threads are killed. The cred_guard_mutex is held over >>>>> "put_user(0, tsk->clear_child_tid)" in exit_mm(). >>>>> >>>>> Any of those can result in deadlock, as the cred_guard_mutex is held >>>>> over a possible indefinite userspace waits for userspace. >>>>> >>>>> Add exec_update_mutex that is only held over exec updating process >>>> >>>> Add ? >>> >>> Yes. That is what the change does: add exec_update_mutex. >>> >> >> I just kind of missed the "subject" in this sentence, >> like "This patch adds an exec_update_mutex that is ..." >> but english is a foreign language for me, so may be okay as is. > > English has a lot of options. I think this is a stylistic difference. > > Instead of being an observer and describing what the change does: > "This patch adds exec_update_mutex ..." > > I was being there in the moment and saying/commading what is happening: > "Add exec_update_mutex ..." > > Using the more immdediate form ends up with more concise and clearer > sentences. > > Every one of my writing teachers in school emphasized that point > and I see the who it works when I write things. But writing is hard and > I still tend toward long rambling sentences with many qualifiers that > confuse and detract from the point rather than make it clear what is > happening. And reading through it all now I can see your confusion. That description of my changes was not well done. Reworking it now. Eric
My rewritten change description reads as follows:
exec: Add a exec_update_mutex to replace cred_guard_mutex
The cred_guard_mutex is problematic as it is held over possibly
indefinite waits for userspace. The possilbe indefinite waits for
userspace that I have identified are: The cred_guard_mutex is held in
PTRACE_EVENT_EXIT waiting for the tracer. The cred_guard_mutex is
held over "put_user(0, tsk->clear_child_tid)" in exit_mm(). The
cred_guard_mutex is held over "get_user(futex_offset, ...") in
exit_robust_list. The cred_guard_mutex held over copy_strings.
The functions get_user and put_user can trigger a page fault which can
potentially wait indefinitely in the case of userfaultfd or if
userspace implements part of the page fault path.
In any of those cases the userspace process that the kernel is waiting
for might userspace might make a different system call that winds up
taking the cred_guard_mutex and result in deadlock.
Holding a mutex over any of those possibly indefinite waits for
userspace does not appear necessary. Add exec_update_mutex that will
just cover updating the process during exec where the permissions and
the objects pointed to by the task struct may be out of sync.
The plan is to switch the users of cred_guard_mutex to
exec_udpate_mutex one by one. This lets us move forward while still
being careful and not introducing any regressions.
Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/
Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/
Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/
Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/
Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/
Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.")
Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Does that sound better?
Eric
On 3/9/20 7:36 PM, Eric W. Biederman wrote: > > My rewritten change description reads as follows: > > exec: Add a exec_update_mutex to replace cred_guard_mutex is this "an" exec_update_mutex? > > The cred_guard_mutex is problematic as it is held over possibly > indefinite waits for userspace. The possilbe indefinite waits for > userspace that I have identified are: The cred_guard_mutex is held in > PTRACE_EVENT_EXIT waiting for the tracer. The cred_guard_mutex is > held over "put_user(0, tsk->clear_child_tid)" in exit_mm(). The > cred_guard_mutex is held over "get_user(futex_offset, ...") in > exit_robust_list. The cred_guard_mutex held over copy_strings. > > The functions get_user and put_user can trigger a page fault which can > potentially wait indefinitely in the case of userfaultfd or if > userspace implements part of the page fault path. > > In any of those cases the userspace process that the kernel is waiting > for might userspace might make a different system call that winds up ^-------------^ ^- remove this > taking the cred_guard_mutex and result in deadlock. > > Holding a mutex over any of those possibly indefinite waits for > userspace does not appear necessary. Add exec_update_mutex that will > just cover updating the process during exec where the permissions and > the objects pointed to by the task struct may be out of sync. > > The plan is to switch the users of cred_guard_mutex to > exec_udpate_mutex one by one. This lets us move forward while still ^-- typo: update > being careful and not introducing any regressions. > > Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ > Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ > Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ > Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ > Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ > Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") > Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > > Does that sound better? > almost done. > Eric >
Bernd Edlinger <bernd.edlinger@hotmail.de> writes: > On 3/9/20 7:36 PM, Eric W. Biederman wrote: >> >> >> Does that sound better? >> > > almost done. I think this text is finally clean. exec: Add exec_update_mutex to replace cred_guard_mutex The cred_guard_mutex is problematic as it is held over possibly indefinite waits for userspace. The possilbe indefinite waits for userspace that I have identified are: The cred_guard_mutex is held in PTRACE_EVENT_EXIT waiting for the tracer. The cred_guard_mutex is held over "put_user(0, tsk->clear_child_tid)" in exit_mm(). The cred_guard_mutex is held over "get_user(futex_offset, ...") in exit_robust_list. The cred_guard_mutex held over copy_strings. The functions get_user and put_user can trigger a page fault which can potentially wait indefinitely in the case of userfaultfd or if userspace implements part of the page fault path. In any of those cases the userspace process that the kernel is waiting for might make a different system call that winds up taking the cred_guard_mutex and result in deadlock. Holding a mutex over any of those possibly indefinite waits for userspace does not appear necessary. Add exec_update_mutex that will just cover updating the process during exec where the permissions and the objects pointed to by the task struct may be out of sync. The plan is to switch the users of cred_guard_mutex to exec_update_mutex one by one. This lets us move forward while still being careful and not introducing any regressions. Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Bernd do you want to give me your Reviewed-by for this part of the series? After that do you think you can write the obvious patch for mm_access? I will apply these changes to my tree and push them into linux-next. Eric
On 3/9/20 8:02 PM, Eric W. Biederman wrote: > Bernd Edlinger <bernd.edlinger@hotmail.de> writes: > >> On 3/9/20 7:36 PM, Eric W. Biederman wrote: >>> >>> >>> Does that sound better? >>> >> >> almost done. > > I think this text is finally clean. > > exec: Add exec_update_mutex to replace cred_guard_mutex > > The cred_guard_mutex is problematic as it is held over possibly > indefinite waits for userspace. The possilbe indefinite waits for > userspace that I have identified are: The cred_guard_mutex is held in > PTRACE_EVENT_EXIT waiting for the tracer. The cred_guard_mutex is > held over "put_user(0, tsk->clear_child_tid)" in exit_mm(). The > cred_guard_mutex is held over "get_user(futex_offset, ...") in > exit_robust_list. The cred_guard_mutex held over copy_strings. > > The functions get_user and put_user can trigger a page fault which can > potentially wait indefinitely in the case of userfaultfd or if > userspace implements part of the page fault path. > > In any of those cases the userspace process that the kernel is waiting > for might make a different system call that winds up taking the > cred_guard_mutex and result in deadlock. > > Holding a mutex over any of those possibly indefinite waits for > userspace does not appear necessary. Add exec_update_mutex that will > just cover updating the process during exec where the permissions and > the objects pointed to by the task struct may be out of sync. > > The plan is to switch the users of cred_guard_mutex to > exec_update_mutex one by one. This lets us move forward while still > being careful and not introducing any regressions. > > Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ > Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ > Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ > Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ > Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ > Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") > Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") I checked the urls they all work. Just one last question, are these git references? I can't find them in my linux git tree (cloned from linus' git)? Sorry for being pedantically. > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > > > Bernd do you want to give me your Reviewed-by for this part of the > series? > Sure also the other parts of course. Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> > After that do you think you can write the obvious patch for mm_access? > Yes, I can do that. I also have some typos in comments, will make them extra patches as well. I wonder if the test case is okay to include the ptrace_attach altough that is not yet passing? Thanks Bernd.
On Mon, Mar 09, 2020 at 02:02:37PM -0500, Eric W. Biederman wrote: > Bernd Edlinger <bernd.edlinger@hotmail.de> writes: > > > On 3/9/20 7:36 PM, Eric W. Biederman wrote: > >> > >> > >> Does that sound better? > >> > > > > almost done. > > I think this text is finally clean. > > exec: Add exec_update_mutex to replace cred_guard_mutex > > The cred_guard_mutex is problematic as it is held over possibly > indefinite waits for userspace. The possilbe indefinite waits for -------------------------------------------^^^^^^^^ possible?
Bernd Edlinger <bernd.edlinger@hotmail.de> writes: > On 3/9/20 8:02 PM, Eric W. Biederman wrote: >> Bernd Edlinger <bernd.edlinger@hotmail.de> writes: >> >>> On 3/9/20 7:36 PM, Eric W. Biederman wrote: >>>> >>>> >>>> Does that sound better? >>>> >>> >>> almost done. >> >> I think this text is finally clean. >> >> exec: Add exec_update_mutex to replace cred_guard_mutex >> >> The cred_guard_mutex is problematic as it is held over possibly >> indefinite waits for userspace. The possilbe indefinite waits for >> userspace that I have identified are: The cred_guard_mutex is held in >> PTRACE_EVENT_EXIT waiting for the tracer. The cred_guard_mutex is >> held over "put_user(0, tsk->clear_child_tid)" in exit_mm(). The >> cred_guard_mutex is held over "get_user(futex_offset, ...") in >> exit_robust_list. The cred_guard_mutex held over copy_strings. >> >> The functions get_user and put_user can trigger a page fault which can >> potentially wait indefinitely in the case of userfaultfd or if >> userspace implements part of the page fault path. >> >> In any of those cases the userspace process that the kernel is waiting >> for might make a different system call that winds up taking the >> cred_guard_mutex and result in deadlock. >> >> Holding a mutex over any of those possibly indefinite waits for >> userspace does not appear necessary. Add exec_update_mutex that will >> just cover updating the process during exec where the permissions and >> the objects pointed to by the task struct may be out of sync. >> >> The plan is to switch the users of cred_guard_mutex to >> exec_update_mutex one by one. This lets us move forward while still >> being careful and not introducing any regressions. >> >> Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ >> Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ >> Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ >> Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ >> Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ >> Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") >> Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") > > I checked the urls they all work. > Just one last question, are these git references? > I can't find them in my linux git tree (cloned from linus' git)? > > Sorry for being pedantically. You have to track down tglx's historicaly git tree from when everything was in bitkeeper. But yes they are git references and yes they work. Just that part of the history is not in linux.git. >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >> >> >> Bernd do you want to give me your Reviewed-by for this part of the >> series? >> > > Sure also the other parts of course. > > Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> > >> After that do you think you can write the obvious patch for mm_access? >> > > Yes, I can do that. > I also have some typos in comments, will make them extra patches as well. > > I wonder if the test case is okay to include the ptrace_attach altough > that is not yet passing? It is an existing kernel but that it doesn't pass. My sense is that if you include it as a separate patch if it is a problem for someone we can identify it easily via bisect and we do whatever is appropriate. Eric
Bernd Edlinger <bernd.edlinger@hotmail.de> writes: > On 3/9/20 8:02 PM, Eric W. Biederman wrote: >> >> >> Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ >> Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ >> Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ >> Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ >> Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ >> Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") >> Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") > > I checked the urls they all work. > Just one last question, are these git references? > I can't find them in my linux git tree (cloned from linus' git)? I will add this tag to help people figure out what is going on. History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Eric
"Dmitry V. Levin" <ldv@altlinux.org> writes: > On Mon, Mar 09, 2020 at 02:02:37PM -0500, Eric W. Biederman wrote: >> Bernd Edlinger <bernd.edlinger@hotmail.de> writes: >> >> > On 3/9/20 7:36 PM, Eric W. Biederman wrote: >> >> >> >> >> >> Does that sound better? >> >> >> > >> > almost done. >> >> I think this text is finally clean. >> >> exec: Add exec_update_mutex to replace cred_guard_mutex >> >> The cred_guard_mutex is problematic as it is held over possibly >> indefinite waits for userspace. The possilbe indefinite waits for > > -------------------------------------------^^^^^^^^ possible? Yes. Thank you. Fixed. Eric
On Mon, Mar 09, 2020 at 02:02:37PM -0500, Eric W. Biederman wrote: > exec: Add exec_update_mutex to replace cred_guard_mutex > > The cred_guard_mutex is problematic as it is held over possibly > indefinite waits for userspace. The possilbe indefinite waits for > userspace that I have identified are: The cred_guard_mutex is held in > PTRACE_EVENT_EXIT waiting for the tracer. The cred_guard_mutex is > held over "put_user(0, tsk->clear_child_tid)" in exit_mm(). The > cred_guard_mutex is held over "get_user(futex_offset, ...") in > exit_robust_list. The cred_guard_mutex held over copy_strings. I suspect you're not trying to make a comprehensive list here, but do you want to mention seccomp too (since it's yet another weird case). > [...] > Holding a mutex over any of those possibly indefinite waits for > userspace does not appear necessary. Add exec_update_mutex that will > just cover updating the process during exec where the permissions and > the objects pointed to by the task struct may be out of sync. Should the specific resources be pointed out here? creds, mm, ... ? But otherwise, yup, looks sane: Reviewed-by: Kees Cook <keescook@chromium.org>
Kees Cook <keescook@chromium.org> writes: > On Mon, Mar 09, 2020 at 02:02:37PM -0500, Eric W. Biederman wrote: >> exec: Add exec_update_mutex to replace cred_guard_mutex >> >> The cred_guard_mutex is problematic as it is held over possibly >> indefinite waits for userspace. The possilbe indefinite waits for >> userspace that I have identified are: The cred_guard_mutex is held in >> PTRACE_EVENT_EXIT waiting for the tracer. The cred_guard_mutex is >> held over "put_user(0, tsk->clear_child_tid)" in exit_mm(). The >> cred_guard_mutex is held over "get_user(futex_offset, ...") in >> exit_robust_list. The cred_guard_mutex held over copy_strings. > > I suspect you're not trying to make a comprehensive list here, but do > you want to mention seccomp too (since it's yet another weird case). I was calling out all of the places I have found so far where cred_guard_mutex is held over waiting for userspace to maybe do something. Those places are what cause our deadlocks. >> [...] >> Holding a mutex over any of those possibly indefinite waits for >> userspace does not appear necessary. Add exec_update_mutex that will >> just cover updating the process during exec where the permissions and >> the objects pointed to by the task struct may be out of sync. > > Should the specific resources be pointed out here? creds, mm, ... ? > > But otherwise, yup, looks sane: Probably not. The design is if exec changes it we will hold the cred_guard_mutex over it, so things are semi-atomic. > Reviewed-by: Kees Cook <keescook@chromium.org> Eric
On Sun, Mar 8, 2020 at 10:41 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > The cred_guard_mutex is problematic. The cred_guard_mutex is held > over the userspace accesses as the arguments from userspace are read. > The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other > threads are killed. The cred_guard_mutex is held over > "put_user(0, tsk->clear_child_tid)" in exit_mm(). > > Any of those can result in deadlock, as the cred_guard_mutex is held > over a possible indefinite userspace waits for userspace. > > Add exec_update_mutex that is only held over exec updating process > with the new contents of exec, so that code that needs not to be > confused by exec changing the mm and the cred in ways that can not > happen during ordinary execution of a process. > > The plan is to switch the users of cred_guard_mutex to > exec_udpate_mutex one by one. This lets us move forward while still > being careful and not introducing any regressions. [...] > @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) > return -EINTR; > } > } > + > + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); > + if (ret) > + return ret; We're already holding the old mmap_sem, and now nest the exec_update_mutex inside it; but then while still holding the exec_update_mutex, we do mmput(), which can e.g. end up in ksm_exit(), which can do down_write(&mm->mmap_sem) from __ksm_exit(). So I think at least lockdep will be unhappy, and I'm not sure whether it's an actual problem or not.
Jann Horn <jannh@google.com> writes: > On Sun, Mar 8, 2020 at 10:41 PM Eric W. Biederman <ebiederm@xmission.com> wrote: >> The cred_guard_mutex is problematic. The cred_guard_mutex is held >> over the userspace accesses as the arguments from userspace are read. >> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other >> threads are killed. The cred_guard_mutex is held over >> "put_user(0, tsk->clear_child_tid)" in exit_mm(). >> >> Any of those can result in deadlock, as the cred_guard_mutex is held >> over a possible indefinite userspace waits for userspace. >> >> Add exec_update_mutex that is only held over exec updating process >> with the new contents of exec, so that code that needs not to be >> confused by exec changing the mm and the cred in ways that can not >> happen during ordinary execution of a process. >> >> The plan is to switch the users of cred_guard_mutex to >> exec_udpate_mutex one by one. This lets us move forward while still >> being careful and not introducing any regressions. > [...] >> @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) >> return -EINTR; >> } >> } >> + >> + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); >> + if (ret) >> + return ret; > > We're already holding the old mmap_sem, and now nest the > exec_update_mutex inside it; but then while still holding the > exec_update_mutex, we do mmput(), which can e.g. end up in ksm_exit(), > which can do down_write(&mm->mmap_sem) from __ksm_exit(). So I think > at least lockdep will be unhappy, and I'm not sure whether it's an > actual problem or not. Good point. I should double check the lock ordering here with mmap_sem. It doesn't look like mmput takes mmap_sem, but still there might be a lock inversion of some kind here. At least as far as lockdep is concerned and we don't want anything like that. Eric
On Tue, Mar 10, 2020 at 10:33 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > Jann Horn <jannh@google.com> writes: > > On Sun, Mar 8, 2020 at 10:41 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > >> The cred_guard_mutex is problematic. The cred_guard_mutex is held > >> over the userspace accesses as the arguments from userspace are read. > >> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other > >> threads are killed. The cred_guard_mutex is held over > >> "put_user(0, tsk->clear_child_tid)" in exit_mm(). > >> > >> Any of those can result in deadlock, as the cred_guard_mutex is held > >> over a possible indefinite userspace waits for userspace. > >> > >> Add exec_update_mutex that is only held over exec updating process > >> with the new contents of exec, so that code that needs not to be > >> confused by exec changing the mm and the cred in ways that can not > >> happen during ordinary execution of a process. > >> > >> The plan is to switch the users of cred_guard_mutex to > >> exec_udpate_mutex one by one. This lets us move forward while still > >> being careful and not introducing any regressions. > > [...] > >> @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) > >> return -EINTR; > >> } > >> } > >> + > >> + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); > >> + if (ret) > >> + return ret; > > > > We're already holding the old mmap_sem, and now nest the > > exec_update_mutex inside it; but then while still holding the > > exec_update_mutex, we do mmput(), which can e.g. end up in ksm_exit(), > > which can do down_write(&mm->mmap_sem) from __ksm_exit(). So I think > > at least lockdep will be unhappy, and I'm not sure whether it's an > > actual problem or not. > > Good point. I should double check the lock ordering here with mmap_sem. > It doesn't look like mmput takes mmap_sem You sure about that? mmput() -> __mmput() -> ksm_exit() -> __ksm_exit() -> down_write(&mm->mmap_sem) Or also: mmput() -> __mmput() -> khugepaged_exit() -> __khugepaged_exit() -> down_write(&mm->mmap_sem) Or is there a reason why those paths can't happen?
Jann Horn <jannh@google.com> writes: > On Tue, Mar 10, 2020 at 10:33 PM Eric W. Biederman > <ebiederm@xmission.com> wrote: >> Jann Horn <jannh@google.com> writes: >> > On Sun, Mar 8, 2020 at 10:41 PM Eric W. Biederman <ebiederm@xmission.com> wrote: >> >> The cred_guard_mutex is problematic. The cred_guard_mutex is held >> >> over the userspace accesses as the arguments from userspace are read. >> >> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other >> >> threads are killed. The cred_guard_mutex is held over >> >> "put_user(0, tsk->clear_child_tid)" in exit_mm(). >> >> >> >> Any of those can result in deadlock, as the cred_guard_mutex is held >> >> over a possible indefinite userspace waits for userspace. >> >> >> >> Add exec_update_mutex that is only held over exec updating process >> >> with the new contents of exec, so that code that needs not to be >> >> confused by exec changing the mm and the cred in ways that can not >> >> happen during ordinary execution of a process. >> >> >> >> The plan is to switch the users of cred_guard_mutex to >> >> exec_udpate_mutex one by one. This lets us move forward while still >> >> being careful and not introducing any regressions. >> > [...] >> >> @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) >> >> return -EINTR; >> >> } >> >> } >> >> + >> >> + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); >> >> + if (ret) >> >> + return ret; >> > >> > We're already holding the old mmap_sem, and now nest the >> > exec_update_mutex inside it; but then while still holding the >> > exec_update_mutex, we do mmput(), which can e.g. end up in ksm_exit(), >> > which can do down_write(&mm->mmap_sem) from __ksm_exit(). So I think >> > at least lockdep will be unhappy, and I'm not sure whether it's an >> > actual problem or not. >> >> Good point. I should double check the lock ordering here with mmap_sem. >> It doesn't look like mmput takes mmap_sem > > You sure about that? mmput() -> __mmput() -> ksm_exit() -> > __ksm_exit() -> down_write(&mm->mmap_sem) > > Or also: mmput() -> __mmput() -> khugepaged_exit() -> > __khugepaged_exit() -> down_write(&mm->mmap_sem) > > Or is there a reason why those paths can't happen? Clearly I didn't look far enough. I will adjust this so that exec_update_mutex is taken before mmap_sem. Anything else is just asking for trouble. Eric
On 3/11/20 1:15 AM, Eric W. Biederman wrote: > Jann Horn <jannh@google.com> writes: > >> On Tue, Mar 10, 2020 at 10:33 PM Eric W. Biederman >> <ebiederm@xmission.com> wrote: >>> Jann Horn <jannh@google.com> writes: >>>> On Sun, Mar 8, 2020 at 10:41 PM Eric W. Biederman <ebiederm@xmission.com> wrote: >>>>> The cred_guard_mutex is problematic. The cred_guard_mutex is held >>>>> over the userspace accesses as the arguments from userspace are read. >>>>> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other >>>>> threads are killed. The cred_guard_mutex is held over >>>>> "put_user(0, tsk->clear_child_tid)" in exit_mm(). >>>>> >>>>> Any of those can result in deadlock, as the cred_guard_mutex is held >>>>> over a possible indefinite userspace waits for userspace. >>>>> >>>>> Add exec_update_mutex that is only held over exec updating process >>>>> with the new contents of exec, so that code that needs not to be >>>>> confused by exec changing the mm and the cred in ways that can not >>>>> happen during ordinary execution of a process. >>>>> >>>>> The plan is to switch the users of cred_guard_mutex to >>>>> exec_udpate_mutex one by one. This lets us move forward while still >>>>> being careful and not introducing any regressions. >>>> [...] >>>>> @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) >>>>> return -EINTR; >>>>> } >>>>> } >>>>> + >>>>> + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); >>>>> + if (ret) >>>>> + return ret; >>>> >>>> We're already holding the old mmap_sem, and now nest the >>>> exec_update_mutex inside it; but then while still holding the >>>> exec_update_mutex, we do mmput(), which can e.g. end up in ksm_exit(), >>>> which can do down_write(&mm->mmap_sem) from __ksm_exit(). So I think >>>> at least lockdep will be unhappy, and I'm not sure whether it's an >>>> actual problem or not. >>> >>> Good point. I should double check the lock ordering here with mmap_sem. >>> It doesn't look like mmput takes mmap_sem >> >> You sure about that? mmput() -> __mmput() -> ksm_exit() -> >> __ksm_exit() -> down_write(&mm->mmap_sem) >> >> Or also: mmput() -> __mmput() -> khugepaged_exit() -> >> __khugepaged_exit() -> down_write(&mm->mmap_sem) >> >> Or is there a reason why those paths can't happen? > > Clearly I didn't look far enough. > > I will adjust this so that exec_update_mutex is taken before mmap_sem. > Anything else is just asking for trouble. > Note that vm_access does also mmput under the exec_update_mutex. So I don't see a huge problem here. But maybe I missed something. Bernd.
On Sun, 2020-03-08 at 16:38 -0500, Eric W. Biederman wrote: > The cred_guard_mutex is problematic. The cred_guard_mutex is held > over the userspace accesses as the arguments from userspace are read. > The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other > threads are killed. The cred_guard_mutex is held over > "put_user(0, tsk->clear_child_tid)" in exit_mm(). > > Any of those can result in deadlock, as the cred_guard_mutex is held > over a possible indefinite userspace waits for userspace. > > Add exec_update_mutex that is only held over exec updating process > with the new contents of exec, so that code that needs not to be > confused by exec changing the mm and the cred in ways that can not > happen during ordinary execution of a process. > > The plan is to switch the users of cred_guard_mutex to > exec_udpate_mutex one by one. This lets us move forward while still > being careful and not introducing any regressions. > > Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ > Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ > Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ > Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ > Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ > Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") > Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> This patch will trigger a warning during boot, [Â Â Â 19.707214][Â Â Â Â T1] pci 0035:01:00.0: enabling device (0545 -> 0547) [Â Â Â 19.707287][Â Â Â Â T1] EEH: Capable adapter found: recovery enabled. [Â Â Â 19.732541][Â Â Â Â T1] cpuidle-powernv: Default stop: psscr = 0x0000000000000330,mask=0x00000000003003ff [Â Â Â 19.732567][Â Â Â Â T1] cpuidle-powernv: Deepest stop: psscr = 0x0000000000300375,mask=0x00000000003003ff [Â Â Â 19.732598][Â Â Â Â T1] cpuidle-powernv: First stop level that may lose SPRs = 0x4 [Â Â Â 19.732617][Â Â Â Â T1] cpuidle-powernv: First stop level that may lose timebase = 0x10 [Â Â Â 19.769784][Â Â Â Â T1] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages [Â Â Â 19.769810][Â Â Â Â T1] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages [Â Â Â 19.789344][Â Â T718]Â [Â Â Â 19.789367][Â Â T718] ===================================== [Â Â Â 19.789379][Â Â T718] WARNING: bad unlock balance detected! [Â Â Â 19.789393][Â Â T718] 5.6.0-rc5-next-20200311+ #4 Not tainted [Â Â Â 19.789414][Â Â T718] ------------------------------------- [Â Â Â 19.789426][Â Â T718] kworker/u257:0/718 is trying to release lock (&sig- >exec_update_mutex) at: [Â Â Â 19.789459][Â Â T718] [<c0000000004c6770>] free_bprm+0xe0/0xf0 [Â Â Â 19.789481][Â Â T718] but there are no more locks to release! [Â Â Â 19.789502][Â Â T718]Â [Â Â Â 19.789502][Â Â T718] other info that might help us debug this: [Â Â Â 19.789537][Â Â T718] 1 lock held by kworker/u257:0/718: [Â Â Â 19.789558][Â Â T718]Â Â #0: c000001fa8842808 (&sig->cred_guard_mutex){+.+.}, at: __do_execve_file.isra.33+0x1b0/0xda0 [Â Â Â 19.789611][Â Â T718]Â [Â Â Â 19.789611][Â Â T718] stack backtrace: [Â Â Â 19.789645][Â Â T718] CPU: 8 PID: 718 Comm: kworker/u257:0 Not tainted 5.6.0- rc5-next-20200311+ #4 [Â Â Â 19.789681][Â Â T718] Call Trace: [Â Â Â 19.789703][Â Â T718] [c000000dad8cfa70] [c000000000979b40] dump_stack+0xf4/0x164 (unreliable) [Â Â Â 19.789742][Â Â T718] [c000000dad8cfac0] [c0000000001c1d78] print_unlock_imbalance_bug+0x118/0x140 [Â Â Â 19.789780][Â Â T718] [c000000dad8cfb40] [c0000000001ceaa0] lock_release+0x270/0x520 [Â Â Â 19.789817][Â Â T718] [c000000dad8cfbf0] [c0000000009a2898] __mutex_unlock_slowpath+0x68/0x400 [Â Â Â 19.789854][Â Â T718] [c000000dad8cfcc0] [c0000000004c6770] free_bprm+0xe0/0xf0 [Â Â Â 19.789900][Â Â T718] [c000000dad8cfcf0] [c0000000004c845c] __do_execve_file.isra.33+0x44c/0xda0 __do_execve_file at fs/exec.c:1904 [Â Â Â 19.789938][Â Â T718] [c000000dad8cfde0] [c0000000001391d8] call_usermodehelper_exec_async+0x218/0x250 [Â Â Â 19.789977][Â Â T718] [c000000dad8cfe20] [c00000000000b748] ret_from_kernel_thread+0x5c/0x74 > --- > fs/exec.c | 9 +++++++++ > include/linux/sched/signal.h | 9 ++++++++- > init/init_task.c | 1 + > kernel/fork.c | 1 + > 4 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/fs/exec.c b/fs/exec.c > index d820a7272a76..ffeebb1f167b 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm) > { > struct task_struct *tsk; > struct mm_struct *old_mm, *active_mm; > + int ret; > > /* Notify parent that we're no longer interested in the old VM */ > tsk = current; > @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) > return -EINTR; > } > } > + > + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); > + if (ret) > + return ret; > + > task_lock(tsk); > active_mm = tsk->active_mm; > membarrier_exec_mmap(mm); > @@ -1438,6 +1444,8 @@ static void free_bprm(struct linux_binprm *bprm) > { > free_arg_pages(bprm); > if (bprm->cred) { > + if (!bprm->mm) > + mutex_unlock(¤t->signal->exec_update_mutex); > mutex_unlock(¤t->signal->cred_guard_mutex); > abort_creds(bprm->cred); > } > @@ -1487,6 +1495,7 @@ void install_exec_creds(struct linux_binprm *bprm) > * credentials; any time after this it may be unlocked. > */ > security_bprm_committed_creds(bprm); > + mutex_unlock(¤t->signal->exec_update_mutex); > mutex_unlock(¤t->signal->cred_guard_mutex); > } > EXPORT_SYMBOL(install_exec_creds); > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h > index 88050259c466..a29df79540ce 100644 > --- a/include/linux/sched/signal.h > +++ b/include/linux/sched/signal.h > @@ -224,7 +224,14 @@ struct signal_struct { > > struct mutex cred_guard_mutex; /* guard against foreign influences on > * credential calculations > - * (notably. ptrace) */ > + * (notably. ptrace) > + * Deprecated do not use in new code. > + * Use exec_update_mutex instead. > + */ > + struct mutex exec_update_mutex; /* Held while task_struct is being > + * updated during exec, and may have > + * inconsistent permissions. > + */ > } __randomize_layout; > > /* > diff --git a/init/init_task.c b/init/init_task.c > index 9e5cbe5eab7b..bd403ed3e418 100644 > --- a/init/init_task.c > +++ b/init/init_task.c > @@ -26,6 +26,7 @@ static struct signal_struct init_signals = { > .multiprocess = HLIST_HEAD_INIT, > .rlim = INIT_RLIMITS, > .cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex), > + .exec_update_mutex = __MUTEX_INITIALIZER(init_signals.exec_update_mutex), > #ifdef CONFIG_POSIX_TIMERS > .posix_timers = LIST_HEAD_INIT(init_signals.posix_timers), > .cputimer = { > diff --git a/kernel/fork.c b/kernel/fork.c > index 60a1295f4384..12896a6ecee6 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1594,6 +1594,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk) > sig->oom_score_adj_min = current->signal->oom_score_adj_min; > > mutex_init(&sig->cred_guard_mutex); > + mutex_init(&sig->exec_update_mutex); > > return 0; > }
Bernd Edlinger <bernd.edlinger@hotmail.de> writes: > On 3/11/20 1:15 AM, Eric W. Biederman wrote: >> Jann Horn <jannh@google.com> writes: >> >>> On Tue, Mar 10, 2020 at 10:33 PM Eric W. Biederman >>> <ebiederm@xmission.com> wrote: >>>> Jann Horn <jannh@google.com> writes: >>>>> On Sun, Mar 8, 2020 at 10:41 PM Eric W. Biederman <ebiederm@xmission.com> wrote: >>>>>> The cred_guard_mutex is problematic. The cred_guard_mutex is held >>>>>> over the userspace accesses as the arguments from userspace are read. >>>>>> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other >>>>>> threads are killed. The cred_guard_mutex is held over >>>>>> "put_user(0, tsk->clear_child_tid)" in exit_mm(). >>>>>> >>>>>> Any of those can result in deadlock, as the cred_guard_mutex is held >>>>>> over a possible indefinite userspace waits for userspace. >>>>>> >>>>>> Add exec_update_mutex that is only held over exec updating process >>>>>> with the new contents of exec, so that code that needs not to be >>>>>> confused by exec changing the mm and the cred in ways that can not >>>>>> happen during ordinary execution of a process. >>>>>> >>>>>> The plan is to switch the users of cred_guard_mutex to >>>>>> exec_udpate_mutex one by one. This lets us move forward while still >>>>>> being careful and not introducing any regressions. >>>>> [...] >>>>>> @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) >>>>>> return -EINTR; >>>>>> } >>>>>> } >>>>>> + >>>>>> + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); >>>>>> + if (ret) >>>>>> + return ret; >>>>> >>>>> We're already holding the old mmap_sem, and now nest the >>>>> exec_update_mutex inside it; but then while still holding the >>>>> exec_update_mutex, we do mmput(), which can e.g. end up in ksm_exit(), >>>>> which can do down_write(&mm->mmap_sem) from __ksm_exit(). So I think >>>>> at least lockdep will be unhappy, and I'm not sure whether it's an >>>>> actual problem or not. >>>> >>>> Good point. I should double check the lock ordering here with mmap_sem. >>>> It doesn't look like mmput takes mmap_sem >>> >>> You sure about that? mmput() -> __mmput() -> ksm_exit() -> >>> __ksm_exit() -> down_write(&mm->mmap_sem) >>> >>> Or also: mmput() -> __mmput() -> khugepaged_exit() -> >>> __khugepaged_exit() -> down_write(&mm->mmap_sem) >>> >>> Or is there a reason why those paths can't happen? >> >> Clearly I didn't look far enough. >> >> I will adjust this so that exec_update_mutex is taken before mmap_sem. >> Anything else is just asking for trouble. >> > > Note that vm_access does also mmput under the exec_update_mutex. > So I don't see a huge problem here. > But maybe I missed something. The issue is that to prevent deadlock locks must always be taken in the same order. Taking mmap_sem then exec_update_mutex at the start of the function, then taking exec_update_mutex then mmap_sem in mmput, takes the two locks in two different orders. Which means that in the right set or circumstances: thread1: thread2: obtain mmap_sem optain exec_update_mutex wait for exec_update_mutex wait for mmap_sem Which guarantees that neither thread will make progress. The fix is easy I just need to take exec_update_mutex a few lines earlier. Eric
On 09.03.2020 00:38, Eric W. Biederman wrote: > > The cred_guard_mutex is problematic. The cred_guard_mutex is held > over the userspace accesses as the arguments from userspace are read. > The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other > threads are killed. The cred_guard_mutex is held over > "put_user(0, tsk->clear_child_tid)" in exit_mm(). > > Any of those can result in deadlock, as the cred_guard_mutex is held > over a possible indefinite userspace waits for userspace. > > Add exec_update_mutex that is only held over exec updating process > with the new contents of exec, so that code that needs not to be > confused by exec changing the mm and the cred in ways that can not > happen during ordinary execution of a process. > > The plan is to switch the users of cred_guard_mutex to > exec_udpate_mutex one by one. This lets us move forward while still > being careful and not introducing any regressions. > > Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ > Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ > Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ > Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ > Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ > Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") > Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > fs/exec.c | 9 +++++++++ > include/linux/sched/signal.h | 9 ++++++++- > init/init_task.c | 1 + > kernel/fork.c | 1 + > 4 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/fs/exec.c b/fs/exec.c > index d820a7272a76..ffeebb1f167b 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm) > { > struct task_struct *tsk; > struct mm_struct *old_mm, *active_mm; > + int ret; > > /* Notify parent that we're no longer interested in the old VM */ > tsk = current; > @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) > return -EINTR; > } > } > + > + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); > + if (ret) > + return ret; You missed old_mm->mmap_sem unlock. See here: diff --git a/fs/exec.c b/fs/exec.c index 47582cd97f86..d557bac3e862 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1063,8 +1063,11 @@ static int exec_mmap(struct mm_struct *mm) } ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); - if (ret) + if (ret) { + if (old_mm) + up_read(&old_mm->mmap_sem); return ret; + } task_lock(tsk); active_mm = tsk->active_mm;
Kirill Tkhai <ktkhai@virtuozzo.com> writes: > On 09.03.2020 00:38, Eric W. Biederman wrote: >> >> The cred_guard_mutex is problematic. The cred_guard_mutex is held >> over the userspace accesses as the arguments from userspace are read. >> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other >> threads are killed. The cred_guard_mutex is held over >> "put_user(0, tsk->clear_child_tid)" in exit_mm(). >> >> Any of those can result in deadlock, as the cred_guard_mutex is held >> over a possible indefinite userspace waits for userspace. >> >> Add exec_update_mutex that is only held over exec updating process >> with the new contents of exec, so that code that needs not to be >> confused by exec changing the mm and the cred in ways that can not >> happen during ordinary execution of a process. >> >> The plan is to switch the users of cred_guard_mutex to >> exec_udpate_mutex one by one. This lets us move forward while still >> being careful and not introducing any regressions. >> >> Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ >> Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ >> Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ >> Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ >> Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ >> Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") >> Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >> --- >> fs/exec.c | 9 +++++++++ >> include/linux/sched/signal.h | 9 ++++++++- >> init/init_task.c | 1 + >> kernel/fork.c | 1 + >> 4 files changed, 19 insertions(+), 1 deletion(-) >> >> diff --git a/fs/exec.c b/fs/exec.c >> index d820a7272a76..ffeebb1f167b 100644 >> --- a/fs/exec.c >> +++ b/fs/exec.c >> @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm) >> { >> struct task_struct *tsk; >> struct mm_struct *old_mm, *active_mm; >> + int ret; >> >> /* Notify parent that we're no longer interested in the old VM */ >> tsk = current; >> @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) >> return -EINTR; >> } >> } >> + >> + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); >> + if (ret) >> + return ret; > > You missed old_mm->mmap_sem unlock. See here: Duh. Thank you. I actually need to switch the lock ordering here, and I haven't yet because my son was sick yesterday. Something like this. diff --git a/fs/exec.c b/fs/exec.c index 96f89401b4d1..03d50c27ec01 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1020,9 +1020,14 @@ static int exec_mmap(struct mm_struct *mm) tsk = current; old_mm = current->mm; exec_mm_release(tsk, old_mm); + if (old_mm) + sync_mm_rss(old_mm); + + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); + if (ret) + return ret; if (old_mm) { - sync_mm_rss(old_mm); /* * Make sure that if there is a core dump in progress * for the old mm, we get out and die instead of going @@ -1032,14 +1037,11 @@ static int exec_mmap(struct mm_struct *mm) down_read(&old_mm->mmap_sem); if (unlikely(old_mm->core_state)) { up_read(&old_mm->mmap_sem); + mutex_unlock(&tsk->signal->exec_update_mutex); return -EINTR; } } - ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); - if (ret) - return ret; - task_lock(tsk); active_mm = tsk->active_mm; membarrier_exec_mmap(mm); > diff --git a/fs/exec.c b/fs/exec.c > index 47582cd97f86..d557bac3e862 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1063,8 +1063,11 @@ static int exec_mmap(struct mm_struct *mm) > } > > ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); > - if (ret) > + if (ret) { > + if (old_mm) > + up_read(&old_mm->mmap_sem); > return ret; > + } > > task_lock(tsk); > active_mm = tsk->active_mm; Eric
On 12.03.2020 15:24, Eric W. Biederman wrote: > Kirill Tkhai <ktkhai@virtuozzo.com> writes: > >> On 09.03.2020 00:38, Eric W. Biederman wrote: >>> >>> The cred_guard_mutex is problematic. The cred_guard_mutex is held >>> over the userspace accesses as the arguments from userspace are read. >>> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other >>> threads are killed. The cred_guard_mutex is held over >>> "put_user(0, tsk->clear_child_tid)" in exit_mm(). >>> >>> Any of those can result in deadlock, as the cred_guard_mutex is held >>> over a possible indefinite userspace waits for userspace. >>> >>> Add exec_update_mutex that is only held over exec updating process >>> with the new contents of exec, so that code that needs not to be >>> confused by exec changing the mm and the cred in ways that can not >>> happen during ordinary execution of a process. >>> >>> The plan is to switch the users of cred_guard_mutex to >>> exec_udpate_mutex one by one. This lets us move forward while still >>> being careful and not introducing any regressions. >>> >>> Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ >>> Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ >>> Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ >>> Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ >>> Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ >>> Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") >>> Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") >>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >>> --- >>> fs/exec.c | 9 +++++++++ >>> include/linux/sched/signal.h | 9 ++++++++- >>> init/init_task.c | 1 + >>> kernel/fork.c | 1 + >>> 4 files changed, 19 insertions(+), 1 deletion(-) >>> >>> diff --git a/fs/exec.c b/fs/exec.c >>> index d820a7272a76..ffeebb1f167b 100644 >>> --- a/fs/exec.c >>> +++ b/fs/exec.c >>> @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm) >>> { >>> struct task_struct *tsk; >>> struct mm_struct *old_mm, *active_mm; >>> + int ret; >>> >>> /* Notify parent that we're no longer interested in the old VM */ >>> tsk = current; >>> @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) >>> return -EINTR; >>> } >>> } >>> + >>> + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); >>> + if (ret) >>> + return ret; >> >> You missed old_mm->mmap_sem unlock. See here: > > Duh. Thank you. > > I actually need to switch the lock ordering here, and I haven't yet > because my son was sick yesterday. There is some fundamental problem with your patch, since the below fires in 100% cases on current linux-next: [ 22.838717] kernel BUG at fs/exec.c:1474! diff --git a/fs/exec.c b/fs/exec.c index 47582cd97f86..0f77f8c94905 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1470,8 +1470,10 @@ static void free_bprm(struct linux_binprm *bprm) { free_arg_pages(bprm); if (bprm->cred) { - if (!bprm->mm) + if (!bprm->mm) { + BUG_ON(!mutex_is_locked(¤t->signal->exec_update_mutex)); mutex_unlock(¤t->signal->exec_update_mutex); + } mutex_unlock(¤t->signal->cred_guard_mutex); abort_creds(bprm->cred); } @@ -1521,6 +1523,7 @@ void install_exec_creds(struct linux_binprm *bprm) * credentials; any time after this it may be unlocked. */ security_bprm_committed_creds(bprm); + BUG_ON(!mutex_is_locked(¤t->signal->exec_update_mutex)); mutex_unlock(¤t->signal->exec_update_mutex); mutex_unlock(¤t->signal->cred_guard_mutex); } --------------------------------------------------------------------------------------------- First time the mutex is unlocked in: exec_binprm()->search_binary_handler()->.load_binary->install_exec_creds() Then exec_binprm()->search_binary_handler()->.load_binary->flush_old_exec() clears mm: bprm->mm = NULL; Second time the mutex is unlocked in free_bprm(): if (bprm->cred) { if (!bprm->mm) mutex_unlock(¤t->signal->exec_update_mutex); My opinion is we should not relay on side indicators like bprm->mm. Better you may introduce struct linux_binprm::exec_update_mutex_is_locked. So the next person dealing with this after you won't waste much time on diving into this. Also, if someone decides to change the place, where bprm->mm is set into NULL, this person will bump into hell of dependences between unrelated components like your newly introduced mutex. So, I'm strongly for *struct linux_binprm::exec_update_mutex_is_locked*, since this improves modularity.
Kirill Tkhai <ktkhai@virtuozzo.com> writes: > On 12.03.2020 15:24, Eric W. Biederman wrote: >> Kirill Tkhai <ktkhai@virtuozzo.com> writes: >> >>> On 09.03.2020 00:38, Eric W. Biederman wrote: >>>> >>>> The cred_guard_mutex is problematic. The cred_guard_mutex is held >>>> over the userspace accesses as the arguments from userspace are read. >>>> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other >>>> threads are killed. The cred_guard_mutex is held over >>>> "put_user(0, tsk->clear_child_tid)" in exit_mm(). >>>> >>>> Any of those can result in deadlock, as the cred_guard_mutex is held >>>> over a possible indefinite userspace waits for userspace. >>>> >>>> Add exec_update_mutex that is only held over exec updating process >>>> with the new contents of exec, so that code that needs not to be >>>> confused by exec changing the mm and the cred in ways that can not >>>> happen during ordinary execution of a process. >>>> >>>> The plan is to switch the users of cred_guard_mutex to >>>> exec_udpate_mutex one by one. This lets us move forward while still >>>> being careful and not introducing any regressions. >>>> >>>> Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ >>>> Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ >>>> Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ >>>> Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ >>>> Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ >>>> Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") >>>> Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") >>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >>>> --- >>>> fs/exec.c | 9 +++++++++ >>>> include/linux/sched/signal.h | 9 ++++++++- >>>> init/init_task.c | 1 + >>>> kernel/fork.c | 1 + >>>> 4 files changed, 19 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/fs/exec.c b/fs/exec.c >>>> index d820a7272a76..ffeebb1f167b 100644 >>>> --- a/fs/exec.c >>>> +++ b/fs/exec.c >>>> @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm) >>>> { >>>> struct task_struct *tsk; >>>> struct mm_struct *old_mm, *active_mm; >>>> + int ret; >>>> >>>> /* Notify parent that we're no longer interested in the old VM */ >>>> tsk = current; >>>> @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) >>>> return -EINTR; >>>> } >>>> } >>>> + >>>> + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); >>>> + if (ret) >>>> + return ret; >>> >>> You missed old_mm->mmap_sem unlock. See here: >> >> Duh. Thank you. >> >> I actually need to switch the lock ordering here, and I haven't yet >> because my son was sick yesterday. > > There is some fundamental problem with your patch, since the below fires in 100% cases > on current linux-next: Thank you. I have just backed this out of linux-next for now because it is clearly flawed. You make some good points about the recursion. I will go back to the drawing board and see what I can work out. > [ 22.838717] kernel BUG at fs/exec.c:1474! > > diff --git a/fs/exec.c b/fs/exec.c > index 47582cd97f86..0f77f8c94905 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1470,8 +1470,10 @@ static void free_bprm(struct linux_binprm *bprm) > { > free_arg_pages(bprm); > if (bprm->cred) { > - if (!bprm->mm) > + if (!bprm->mm) { > + BUG_ON(!mutex_is_locked(¤t->signal->exec_update_mutex)); > mutex_unlock(¤t->signal->exec_update_mutex); > + } > mutex_unlock(¤t->signal->cred_guard_mutex); > abort_creds(bprm->cred); > } > @@ -1521,6 +1523,7 @@ void install_exec_creds(struct linux_binprm *bprm) > * credentials; any time after this it may be unlocked. > */ > security_bprm_committed_creds(bprm); > + BUG_ON(!mutex_is_locked(¤t->signal->exec_update_mutex)); > mutex_unlock(¤t->signal->exec_update_mutex); > mutex_unlock(¤t->signal->cred_guard_mutex); > } > > --------------------------------------------------------------------------------------------- > > First time the mutex is unlocked in: > > exec_binprm()->search_binary_handler()->.load_binary->install_exec_creds() > > Then exec_binprm()->search_binary_handler()->.load_binary->flush_old_exec() clears mm: > > bprm->mm = NULL; > > Second time the mutex is unlocked in free_bprm(): > > if (bprm->cred) { > if (!bprm->mm) > mutex_unlock(¤t->signal->exec_update_mutex); > > My opinion is we should not relay on side indicators like bprm->mm. Better you may > introduce struct linux_binprm::exec_update_mutex_is_locked. So the next person dealing > with this after you won't waste much time on diving into this. Also, if someone decides > to change the place, where bprm->mm is set into NULL, this person will bump into hell > of dependences between unrelated components like your newly introduced mutex. > > So, I'm strongly for *struct linux_binprm::exec_update_mutex_is_locked*, since this improves > modularity. Am I wrong or is that also a problem with cred_guard_mutex? Eric
On 12.03.2020 17:38, Eric W. Biederman wrote: > Kirill Tkhai <ktkhai@virtuozzo.com> writes: > >> On 12.03.2020 15:24, Eric W. Biederman wrote: >>> Kirill Tkhai <ktkhai@virtuozzo.com> writes: >>> >>>> On 09.03.2020 00:38, Eric W. Biederman wrote: >>>>> >>>>> The cred_guard_mutex is problematic. The cred_guard_mutex is held >>>>> over the userspace accesses as the arguments from userspace are read. >>>>> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other >>>>> threads are killed. The cred_guard_mutex is held over >>>>> "put_user(0, tsk->clear_child_tid)" in exit_mm(). >>>>> >>>>> Any of those can result in deadlock, as the cred_guard_mutex is held >>>>> over a possible indefinite userspace waits for userspace. >>>>> >>>>> Add exec_update_mutex that is only held over exec updating process >>>>> with the new contents of exec, so that code that needs not to be >>>>> confused by exec changing the mm and the cred in ways that can not >>>>> happen during ordinary execution of a process. >>>>> >>>>> The plan is to switch the users of cred_guard_mutex to >>>>> exec_udpate_mutex one by one. This lets us move forward while still >>>>> being careful and not introducing any regressions. >>>>> >>>>> Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ >>>>> Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ >>>>> Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ >>>>> Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ >>>>> Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ >>>>> Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") >>>>> Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") >>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >>>>> --- >>>>> fs/exec.c | 9 +++++++++ >>>>> include/linux/sched/signal.h | 9 ++++++++- >>>>> init/init_task.c | 1 + >>>>> kernel/fork.c | 1 + >>>>> 4 files changed, 19 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/fs/exec.c b/fs/exec.c >>>>> index d820a7272a76..ffeebb1f167b 100644 >>>>> --- a/fs/exec.c >>>>> +++ b/fs/exec.c >>>>> @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm) >>>>> { >>>>> struct task_struct *tsk; >>>>> struct mm_struct *old_mm, *active_mm; >>>>> + int ret; >>>>> >>>>> /* Notify parent that we're no longer interested in the old VM */ >>>>> tsk = current; >>>>> @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) >>>>> return -EINTR; >>>>> } >>>>> } >>>>> + >>>>> + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); >>>>> + if (ret) >>>>> + return ret; >>>> >>>> You missed old_mm->mmap_sem unlock. See here: >>> >>> Duh. Thank you. >>> >>> I actually need to switch the lock ordering here, and I haven't yet >>> because my son was sick yesterday. >> >> There is some fundamental problem with your patch, since the below fires in 100% cases >> on current linux-next: > > Thank you. > > I have just backed this out of linux-next for now because it is clearly > flawed. > > You make some good points about the recursion. I will go back to the > drawing board and see what I can work out. > > >> [ 22.838717] kernel BUG at fs/exec.c:1474! >> >> diff --git a/fs/exec.c b/fs/exec.c >> index 47582cd97f86..0f77f8c94905 100644 >> --- a/fs/exec.c >> +++ b/fs/exec.c >> @@ -1470,8 +1470,10 @@ static void free_bprm(struct linux_binprm *bprm) >> { >> free_arg_pages(bprm); >> if (bprm->cred) { >> - if (!bprm->mm) >> + if (!bprm->mm) { >> + BUG_ON(!mutex_is_locked(¤t->signal->exec_update_mutex)); >> mutex_unlock(¤t->signal->exec_update_mutex); >> + } >> mutex_unlock(¤t->signal->cred_guard_mutex); >> abort_creds(bprm->cred); >> } >> @@ -1521,6 +1523,7 @@ void install_exec_creds(struct linux_binprm *bprm) >> * credentials; any time after this it may be unlocked. >> */ >> security_bprm_committed_creds(bprm); >> + BUG_ON(!mutex_is_locked(¤t->signal->exec_update_mutex)); >> mutex_unlock(¤t->signal->exec_update_mutex); >> mutex_unlock(¤t->signal->cred_guard_mutex); >> } >> >> --------------------------------------------------------------------------------------------- >> >> First time the mutex is unlocked in: >> >> exec_binprm()->search_binary_handler()->.load_binary->install_exec_creds() >> >> Then exec_binprm()->search_binary_handler()->.load_binary->flush_old_exec() clears mm: >> >> bprm->mm = NULL; >> >> Second time the mutex is unlocked in free_bprm(): >> >> if (bprm->cred) { >> if (!bprm->mm) >> mutex_unlock(¤t->signal->exec_update_mutex); >> >> My opinion is we should not relay on side indicators like bprm->mm. Better you may >> introduce struct linux_binprm::exec_update_mutex_is_locked. So the next person dealing >> with this after you won't waste much time on diving into this. Also, if someone decides >> to change the place, where bprm->mm is set into NULL, this person will bump into hell >> of dependences between unrelated components like your newly introduced mutex. >> >> So, I'm strongly for *struct linux_binprm::exec_update_mutex_is_locked*, since this improves >> modularity. > > Am I wrong or is that also a problem with cred_guard_mutex? No, there is no a problem. cred_guard_mutex is locked in a pair with bprm->cred = prepare_exec_creds() assignment. cred_guard_mutex is unlocked in a pair with bprm->cred = NULL clearing (see install_exec_creds()). Further free_bprm() skip unlock in case of bprm->cred is NULL.
On 3/12/20 3:38 PM, Eric W. Biederman wrote: > Kirill Tkhai <ktkhai@virtuozzo.com> writes: > >> On 12.03.2020 15:24, Eric W. Biederman wrote: >>> >>> I actually need to switch the lock ordering here, and I haven't yet >>> because my son was sick yesterday. All the best wishes to you and your son. I hope he will get well soon. And sorry for not missing the issue in the review. The reason turns out that bprm_mm_init is called after prepare_bprm_creds, but there are error pathes between those where free_bprm is called up with cred != NULL and mm == NULL, but the mutex not locked. I figured out a possible fix for the problem that was pointed out: From ceb6f65b52b3a7f0280f4f20509a1564a439edf6 Mon Sep 17 00:00:00 2001 From: Bernd Edlinger <bernd.edlinger@hotmail.de> Date: Wed, 11 Mar 2020 15:31:07 +0100 Subject: [PATCH] Fix issues with exec_update_mutex Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> --- fs/exec.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index ffeebb1..cde4937 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1021,8 +1021,14 @@ static int exec_mmap(struct mm_struct *mm) old_mm = current->mm; exec_mm_release(tsk, old_mm); - if (old_mm) { + if (old_mm) sync_mm_rss(old_mm); + + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); + if (ret) + return ret; + + if (old_mm) { /* * Make sure that if there is a core dump in progress * for the old mm, we get out and die instead of going @@ -1032,14 +1038,11 @@ static int exec_mmap(struct mm_struct *mm) down_read(&old_mm->mmap_sem); if (unlikely(old_mm->core_state)) { up_read(&old_mm->mmap_sem); + mutex_unlock(&tsk->signal->exec_update_mutex); return -EINTR; } } - ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); - if (ret) - return ret; - task_lock(tsk); active_mm = tsk->active_mm; membarrier_exec_mmap(mm); @@ -1444,8 +1447,6 @@ static void free_bprm(struct linux_binprm *bprm) { free_arg_pages(bprm); if (bprm->cred) { - if (!bprm->mm) - mutex_unlock(¤t->signal->exec_update_mutex); mutex_unlock(¤t->signal->cred_guard_mutex); abort_creds(bprm->cred); } @@ -1846,6 +1847,8 @@ static int __do_execve_file(int fd, struct filename *filename, would_dump(bprm, bprm->file); retval = exec_binprm(bprm); + if (bprm->cred && !bprm->mm) + mutex_unlock(¤t->signal->exec_update_mutex); if (retval < 0) goto out;
On 13.03.2020 04:05, Bernd Edlinger wrote: > On 3/12/20 3:38 PM, Eric W. Biederman wrote: >> Kirill Tkhai <ktkhai@virtuozzo.com> writes: >> >>> On 12.03.2020 15:24, Eric W. Biederman wrote: >>>> >>>> I actually need to switch the lock ordering here, and I haven't yet >>>> because my son was sick yesterday. > > All the best wishes to you and your son. I hope he will get well soon. > > And sorry for not missing the issue in the review. The reason turns > out that bprm_mm_init is called after prepare_bprm_creds, but there > are error pathes between those where free_bprm is called up with > cred != NULL and mm == NULL, but the mutex not locked. > > I figured out a possible fix for the problem that was pointed out: > > > From ceb6f65b52b3a7f0280f4f20509a1564a439edf6 Mon Sep 17 00:00:00 2001 > From: Bernd Edlinger <bernd.edlinger@hotmail.de> > Date: Wed, 11 Mar 2020 15:31:07 +0100 > Subject: [PATCH] Fix issues with exec_update_mutex > > Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> > --- > fs/exec.c | 17 ++++++++++------- > 1 file changed, 10 insertions(+), 7 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index ffeebb1..cde4937 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1021,8 +1021,14 @@ static int exec_mmap(struct mm_struct *mm) > old_mm = current->mm; > exec_mm_release(tsk, old_mm); > > - if (old_mm) { > + if (old_mm) > sync_mm_rss(old_mm); > + > + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); > + if (ret) > + return ret; > + > + if (old_mm) { > /* > * Make sure that if there is a core dump in progress > * for the old mm, we get out and die instead of going > @@ -1032,14 +1038,11 @@ static int exec_mmap(struct mm_struct *mm) > down_read(&old_mm->mmap_sem); > if (unlikely(old_mm->core_state)) { > up_read(&old_mm->mmap_sem); > + mutex_unlock(&tsk->signal->exec_update_mutex); > return -EINTR; > } > } > > - ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); > - if (ret) > - return ret; > - > task_lock(tsk); > active_mm = tsk->active_mm; > membarrier_exec_mmap(mm); > @@ -1444,8 +1447,6 @@ static void free_bprm(struct linux_binprm *bprm) > { > free_arg_pages(bprm); > if (bprm->cred) { > - if (!bprm->mm) > - mutex_unlock(¤t->signal->exec_update_mutex); > mutex_unlock(¤t->signal->cred_guard_mutex); > abort_creds(bprm->cred); > } > @@ -1846,6 +1847,8 @@ static int __do_execve_file(int fd, struct filename *filename, > would_dump(bprm, bprm->file); > > retval = exec_binprm(bprm); > + if (bprm->cred && !bprm->mm) > + mutex_unlock(¤t->signal->exec_update_mutex); Despite this should fix the problem, this looks like a broken puzzle. We can't use bprm->cred as an identifier whether the mutex was locked or not. We can check for bprm->cred in regard to cred_guard_mutex, because of there is strong rule: "cred_guard_mutex is becomes locked together with bprm->cred assignment (see prepare_bprm_creds()), and it becomes unlocked together with bprm->cred zeroing". Take attention on modularity of all this: there is no dependencies between anything else. In regard to newly introduced exec_update_mutex, your fix and source patch way look like an obfuscation. The mutex becomes deadly glued to unrelated bprm->cred and bprm->mm, and this introduces the problems in the future modifications and support of all involved entities. If someone wants to move some functions in relation to each other, there will be a pain, and this person will have to go again the same dependencies and bug way, Eric stepped on in the original patch.
This completes the new infrastructure patch, and replaces the cred_guard_mutex with an exec_guard_mutex, and a boolean, that is set, when a dead-lock situation is detected. I also change ptrace_traceme to use the new mutex, but I consider it a bug, that it didn't take any mutex previously since it calls security_ptrace_traceme, and all the security modules operate under the assumption that execve is not operating in parallel. This patch fixes the test case tools/testing/selftests/ptrace/vmaccess: [==========] Running 2 tests from 1 test cases. [ RUN ] global.vmaccess [ OK ] global.vmaccess [ RUN ] global.attach [ OK ] global.attach <= this was still failing [==========] 2 / 2 tests passed. [ PASSED ] Yes, it is an API change, but only in some very special case, so I would exepect this to be un-noticeable to user space applications. Bernd Edlinger (2): exec: Fix dead-lock in de_thread with ptrace_attach doc: Update documentation of ->exec_*_mutex Documentation/security/credentials.rst | 29 +++++++++++++++------- fs/exec.c | 44 +++++++++++++++++++++++++++------- fs/proc/base.c | 13 ++++++---- include/linux/sched/signal.h | 14 +++++++---- init/init_task.c | 2 +- kernel/cred.c | 2 +- kernel/fork.c | 2 +- kernel/ptrace.c | 20 +++++++++++++--- kernel/seccomp.c | 15 +++++++----- 9 files changed, 102 insertions(+), 39 deletions(-)
On 3/13/20 10:13 AM, Kirill Tkhai wrote: > > Despite this should fix the problem, this looks like a broken puzzle. > > We can't use bprm->cred as an identifier whether the mutex was locked or not. > We can check for bprm->cred in regard to cred_guard_mutex, because of there is > strong rule: "cred_guard_mutex is becomes locked together with bprm->cred assignment > (see prepare_bprm_creds()), and it becomes unlocked together with bprm->cred zeroing". > Take attention on modularity of all this: there is no dependencies between anything else. > > In regard to newly introduced exec_update_mutex, your fix and source patch way look like > an obfuscation. The mutex becomes deadly glued to unrelated bprm->cred and bprm->mm, > and this introduces the problems in the future modifications and support of all involved > entities. If someone wants to move some functions in relation to each other, there will > be a pain, and this person will have to go again the same dependencies and bug way, > Eric stepped on in the original patch. > Okay, yes, valid points you make, thanks. I just wanted to understand what was exactly wrong with this patch, since the failure mode looked a lot like it was failing because of something clobbering the data unexpectedly. So I have posted a few updated patch for the failed one here: [PATCH v3 5/5] exec: Add a exec_update_mutex to replace cred_guard_mutex [PATCH] pidfd: Use new infrastructure to fix deadlocks in execve which replaces these: [PATCH v2 5/5] exec: Add a exec_update_mutex to replace cred_guard_mutex https://lore.kernel.org/lkml/87zhcq4jdj.fsf_-_@x220.int.ebiederm.org/ [PATCH] pidfd: Stop taking cred_guard_mutex https://lore.kernel.org/lkml/87wo7svy96.fsf_-_@x220.int.ebiederm.org/ and a new patch series to fix deadlock in ptrace_attach and update doc: [PATCH 0/2] exec: Fix dead-lock in de_thread with ptrace_attach [PATCH 1/2] exec: Fix dead-lock in de_thread with ptrace_attach [PATCH 2/2] doc: Update documentation of ->exec_*_mutex Other patches needed, still valid: [PATCH v2 1/5] exec: Only compute current once in flush_old_exec https://lore.kernel.org/lkml/87pndm5y3l.fsf_-_@x220.int.ebiederm.org/ [PATCH v2 2/5] exec: Factor unshare_sighand out of de_thread and call it separately https://lore.kernel.org/lkml/87k13u5y26.fsf_-_@x220.int.ebiederm.org/ [PATCH v2 4/5] exec: Move exec_mmap right after de_thread in flush_old_exec https://lore.kernel.org/lkml/875zfe5xzb.fsf_-_@x220.int.ebiederm.org/ [PATCH 1/4] exec: Fix a deadlock in ptrace https://lore.kernel.org/lkml/AM6PR03MB517033EAD25BED15CC84E17DE4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ [PATCH 2/4] selftests/ptrace: add test cases for dead-locks https://lore.kernel.org/lkml/AM6PR03MB51703199741A2C27A78980FFE4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ [PATCH 3/4] mm: docs: Fix a comment in process_vm_rw_core https://lore.kernel.org/lkml/AM6PR03MB5170ED6D4D216EEEEF400136E4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ [PATCH 4/4] kernel: doc: remove outdated comment cred.c https://lore.kernel.org/lkml/AM6PR03MB517039DB07AB641C194FEA57E4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ [PATCH 1/4] kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve https://lore.kernel.org/lkml/AM6PR03MB517057A2269C3A4FB287B76EE4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ [PATCH 2/4] proc: Use new infrastructure to fix deadlocks in execve https://lore.kernel.org/lkml/AM6PR03MB51705D211EC8E7EA270627B1E4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ [PATCH 3/4] proc: io_accounting: Use new infrastructure to fix deadlocks in execve https://lore.kernel.org/lkml/AM6PR03MB5170BD2476E35068E182EFA4E4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ [PATCH 4/4] perf: Use new infrastructure to fix deadlocks in execve https://lore.kernel.org/lkml/AM6PR03MB517035DEEDB9C8699CB6B34EE4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ I think most of the existing patches are already approved, but if there are still change requests, please let me know. Thanks Bernd.
On 3/14/20 10:57 AM, Bernd Edlinger wrote: > On 3/13/20 10:13 AM, Kirill Tkhai wrote: >> >> Despite this should fix the problem, this looks like a broken puzzle. >> >> We can't use bprm->cred as an identifier whether the mutex was locked or not. >> We can check for bprm->cred in regard to cred_guard_mutex, because of there is >> strong rule: "cred_guard_mutex is becomes locked together with bprm->cred assignment >> (see prepare_bprm_creds()), and it becomes unlocked together with bprm->cred zeroing". >> Take attention on modularity of all this: there is no dependencies between anything else. >> >> In regard to newly introduced exec_update_mutex, your fix and source patch way look like >> an obfuscation. The mutex becomes deadly glued to unrelated bprm->cred and bprm->mm, >> and this introduces the problems in the future modifications and support of all involved >> entities. If someone wants to move some functions in relation to each other, there will >> be a pain, and this person will have to go again the same dependencies and bug way, >> Eric stepped on in the original patch. >> > > Okay, yes, valid points you make, thanks. > I just wanted to understand what was exactly wrong with this patch, > since the failure mode looked a lot like it was failing because of > something clobbering the data unexpectedly. > > > So I have posted a few updated patch for the failed one here: > > [PATCH v3 5/5] exec: Add a exec_update_mutex to replace cred_guard_mutex > [PATCH] pidfd: Use new infrastructure to fix deadlocks in execve > > which replaces these: > [PATCH v2 5/5] exec: Add a exec_update_mutex to replace cred_guard_mutex > https://lore.kernel.org/lkml/87zhcq4jdj.fsf_-_@x220.int.ebiederm.org/ > > [PATCH] pidfd: Stop taking cred_guard_mutex > https://lore.kernel.org/lkml/87wo7svy96.fsf_-_@x220.int.ebiederm.org/ > > > and a new patch series to fix deadlock in ptrace_attach and update doc: > [PATCH 0/2] exec: Fix dead-lock in de_thread with ptrace_attach > [PATCH 1/2] exec: Fix dead-lock in de_thread with ptrace_attach > [PATCH 2/2] doc: Update documentation of ->exec_*_mutex > > > Other patches needed, still valid: > > [PATCH v2 1/5] exec: Only compute current once in flush_old_exec > https://lore.kernel.org/lkml/87pndm5y3l.fsf_-_@x220.int.ebiederm.org/ > > [PATCH v2 2/5] exec: Factor unshare_sighand out of de_thread and call it separately > https://lore.kernel.org/lkml/87k13u5y26.fsf_-_@x220.int.ebiederm.org/ > Ah, sorry, forgot this one: [PATCH v2 3/5] exec: Move cleanup of posix timers on exec out of de_thread https://lore.kernel.org/lkml/87eeu25y14.fsf_-_@x220.int.ebiederm.org/ > [PATCH v2 4/5] exec: Move exec_mmap right after de_thread in flush_old_exec > https://lore.kernel.org/lkml/875zfe5xzb.fsf_-_@x220.int.ebiederm.org/ > > [PATCH 1/4] exec: Fix a deadlock in ptrace > https://lore.kernel.org/lkml/AM6PR03MB517033EAD25BED15CC84E17DE4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ > > [PATCH 2/4] selftests/ptrace: add test cases for dead-locks > https://lore.kernel.org/lkml/AM6PR03MB51703199741A2C27A78980FFE4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ > > [PATCH 3/4] mm: docs: Fix a comment in process_vm_rw_core > https://lore.kernel.org/lkml/AM6PR03MB5170ED6D4D216EEEEF400136E4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ > > [PATCH 4/4] kernel: doc: remove outdated comment cred.c > https://lore.kernel.org/lkml/AM6PR03MB517039DB07AB641C194FEA57E4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ > > [PATCH 1/4] kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve > https://lore.kernel.org/lkml/AM6PR03MB517057A2269C3A4FB287B76EE4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ > > [PATCH 2/4] proc: Use new infrastructure to fix deadlocks in execve > https://lore.kernel.org/lkml/AM6PR03MB51705D211EC8E7EA270627B1E4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ > > [PATCH 3/4] proc: io_accounting: Use new infrastructure to fix deadlocks in execve > https://lore.kernel.org/lkml/AM6PR03MB5170BD2476E35068E182EFA4E4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ > > [PATCH 4/4] perf: Use new infrastructure to fix deadlocks in execve > https://lore.kernel.org/lkml/AM6PR03MB517035DEEDB9C8699CB6B34EE4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ > > > I think most of the existing patches are already approved, but if > there are still change requests, please let me know. > > > Thanks > Bernd. > Hope it is correct now. I haven't seen the new patches on the kernel archives yet, so I cannot add URLs for them. Bernd.
On 14.03.2020 13:02, Bernd Edlinger wrote: > On 3/14/20 10:57 AM, Bernd Edlinger wrote: >> On 3/13/20 10:13 AM, Kirill Tkhai wrote: >>> >>> Despite this should fix the problem, this looks like a broken puzzle. >>> >>> We can't use bprm->cred as an identifier whether the mutex was locked or not. >>> We can check for bprm->cred in regard to cred_guard_mutex, because of there is >>> strong rule: "cred_guard_mutex is becomes locked together with bprm->cred assignment >>> (see prepare_bprm_creds()), and it becomes unlocked together with bprm->cred zeroing". >>> Take attention on modularity of all this: there is no dependencies between anything else. >>> >>> In regard to newly introduced exec_update_mutex, your fix and source patch way look like >>> an obfuscation. The mutex becomes deadly glued to unrelated bprm->cred and bprm->mm, >>> and this introduces the problems in the future modifications and support of all involved >>> entities. If someone wants to move some functions in relation to each other, there will >>> be a pain, and this person will have to go again the same dependencies and bug way, >>> Eric stepped on in the original patch. >>> >> >> Okay, yes, valid points you make, thanks. >> I just wanted to understand what was exactly wrong with this patch, >> since the failure mode looked a lot like it was failing because of >> something clobbering the data unexpectedly. >> >> >> So I have posted a few updated patch for the failed one here: >> >> [PATCH v3 5/5] exec: Add a exec_update_mutex to replace cred_guard_mutex >> [PATCH] pidfd: Use new infrastructure to fix deadlocks in execve >> >> which replaces these: >> [PATCH v2 5/5] exec: Add a exec_update_mutex to replace cred_guard_mutex >> https://lore.kernel.org/lkml/87zhcq4jdj.fsf_-_@x220.int.ebiederm.org/ >> >> [PATCH] pidfd: Stop taking cred_guard_mutex >> https://lore.kernel.org/lkml/87wo7svy96.fsf_-_@x220.int.ebiederm.org/ >> >> >> and a new patch series to fix deadlock in ptrace_attach and update doc: >> [PATCH 0/2] exec: Fix dead-lock in de_thread with ptrace_attach >> [PATCH 1/2] exec: Fix dead-lock in de_thread with ptrace_attach >> [PATCH 2/2] doc: Update documentation of ->exec_*_mutex >> >> >> Other patches needed, still valid: >> >> [PATCH v2 1/5] exec: Only compute current once in flush_old_exec >> https://lore.kernel.org/lkml/87pndm5y3l.fsf_-_@x220.int.ebiederm.org/ >> >> [PATCH v2 2/5] exec: Factor unshare_sighand out of de_thread and call it separately >> https://lore.kernel.org/lkml/87k13u5y26.fsf_-_@x220.int.ebiederm.org/ >> > > Ah, sorry, forgot this one: > [PATCH v2 3/5] exec: Move cleanup of posix timers on exec out of de_thread > https://lore.kernel.org/lkml/87eeu25y14.fsf_-_@x220.int.ebiederm.org/ > >> [PATCH v2 4/5] exec: Move exec_mmap right after de_thread in flush_old_exec >> https://lore.kernel.org/lkml/875zfe5xzb.fsf_-_@x220.int.ebiederm.org/ 1-4/5 look OK for me. You may add my Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> >> [PATCH 1/4] exec: Fix a deadlock in ptrace >> https://lore.kernel.org/lkml/AM6PR03MB517033EAD25BED15CC84E17DE4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ >> >> [PATCH 2/4] selftests/ptrace: add test cases for dead-locks >> https://lore.kernel.org/lkml/AM6PR03MB51703199741A2C27A78980FFE4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ >> >> [PATCH 3/4] mm: docs: Fix a comment in process_vm_rw_core >> https://lore.kernel.org/lkml/AM6PR03MB5170ED6D4D216EEEEF400136E4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ >> >> [PATCH 4/4] kernel: doc: remove outdated comment cred.c >> https://lore.kernel.org/lkml/AM6PR03MB517039DB07AB641C194FEA57E4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ >> >> [PATCH 1/4] kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve >> https://lore.kernel.org/lkml/AM6PR03MB517057A2269C3A4FB287B76EE4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ >> >> [PATCH 2/4] proc: Use new infrastructure to fix deadlocks in execve >> https://lore.kernel.org/lkml/AM6PR03MB51705D211EC8E7EA270627B1E4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ >> >> [PATCH 3/4] proc: io_accounting: Use new infrastructure to fix deadlocks in execve >> https://lore.kernel.org/lkml/AM6PR03MB5170BD2476E35068E182EFA4E4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ >> >> [PATCH 4/4] perf: Use new infrastructure to fix deadlocks in execve >> https://lore.kernel.org/lkml/AM6PR03MB517035DEEDB9C8699CB6B34EE4FF0@AM6PR03MB5170.eurprd03.prod.outlook.com/ >> >> >> I think most of the existing patches are already approved, but if >> there are still change requests, please let me know. >> >> >> Thanks >> Bernd. >> > > Hope it is correct now. > I haven't seen the new patches on the kernel archives yet, > so I cannot add URLs for them. > > Bernd. >
diff --git a/fs/exec.c b/fs/exec.c index d820a7272a76..ffeebb1f167b 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm) { struct task_struct *tsk; struct mm_struct *old_mm, *active_mm; + int ret; /* Notify parent that we're no longer interested in the old VM */ tsk = current; @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) return -EINTR; } } + + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); + if (ret) + return ret; + task_lock(tsk); active_mm = tsk->active_mm; membarrier_exec_mmap(mm); @@ -1438,6 +1444,8 @@ static void free_bprm(struct linux_binprm *bprm) { free_arg_pages(bprm); if (bprm->cred) { + if (!bprm->mm) + mutex_unlock(¤t->signal->exec_update_mutex); mutex_unlock(¤t->signal->cred_guard_mutex); abort_creds(bprm->cred); } @@ -1487,6 +1495,7 @@ void install_exec_creds(struct linux_binprm *bprm) * credentials; any time after this it may be unlocked. */ security_bprm_committed_creds(bprm); + mutex_unlock(¤t->signal->exec_update_mutex); mutex_unlock(¤t->signal->cred_guard_mutex); } EXPORT_SYMBOL(install_exec_creds); diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 88050259c466..a29df79540ce 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -224,7 +224,14 @@ struct signal_struct { struct mutex cred_guard_mutex; /* guard against foreign influences on * credential calculations - * (notably. ptrace) */ + * (notably. ptrace) + * Deprecated do not use in new code. + * Use exec_update_mutex instead. + */ + struct mutex exec_update_mutex; /* Held while task_struct is being + * updated during exec, and may have + * inconsistent permissions. + */ } __randomize_layout; /* diff --git a/init/init_task.c b/init/init_task.c index 9e5cbe5eab7b..bd403ed3e418 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -26,6 +26,7 @@ static struct signal_struct init_signals = { .multiprocess = HLIST_HEAD_INIT, .rlim = INIT_RLIMITS, .cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex), + .exec_update_mutex = __MUTEX_INITIALIZER(init_signals.exec_update_mutex), #ifdef CONFIG_POSIX_TIMERS .posix_timers = LIST_HEAD_INIT(init_signals.posix_timers), .cputimer = { diff --git a/kernel/fork.c b/kernel/fork.c index 60a1295f4384..12896a6ecee6 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1594,6 +1594,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk) sig->oom_score_adj_min = current->signal->oom_score_adj_min; mutex_init(&sig->cred_guard_mutex); + mutex_init(&sig->exec_update_mutex); return 0; }
The cred_guard_mutex is problematic. The cred_guard_mutex is held over the userspace accesses as the arguments from userspace are read. The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other threads are killed. The cred_guard_mutex is held over "put_user(0, tsk->clear_child_tid)" in exit_mm(). Any of those can result in deadlock, as the cred_guard_mutex is held over a possible indefinite userspace waits for userspace. Add exec_update_mutex that is only held over exec updating process with the new contents of exec, so that code that needs not to be confused by exec changing the mm and the cred in ways that can not happen during ordinary execution of a process. The plan is to switch the users of cred_guard_mutex to exec_udpate_mutex one by one. This lets us move forward while still being careful and not introducing any regressions. Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/exec.c | 9 +++++++++ include/linux/sched/signal.h | 9 ++++++++- init/init_task.c | 1 + kernel/fork.c | 1 + 4 files changed, 19 insertions(+), 1 deletion(-)