| Message ID | 20190205025308.GA24455@visor (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [v2] exec: don't force_sigsegv processes with a pending fatal signal |
On Mon, 4 Feb 2019 18:53:08 -0800 Ivan Delalande <colona@arista.com> wrote:

> We were seeing unexplained segfaults in coreutils processes and other
> basic utilities on systems with print-fatal-signals enabled:
>
> [ 311.001986] potentially unexpected fatal signal 11.
> [ 311.001993] CPU: 3 PID: 4565 Comm: tail Tainted: P O 4.9.100.Ar-8497547.eostrunkkernel49 #1
> [ 311.001995] task: ffff88021431b400 task.stack: ffffc90004cec000
> [ 311.001997] RIP: 0023:[<00000000f7722c09>] [<00000000f7722c09>] 0xf7722c09
> [ 311.002003] RSP: 002b:00000000ffcc8aa4 EFLAGS: 00000296
> [ 311.002004] RAX: fffffffffffffff2 RBX: 0000000057efc530 RCX: 0000000057efdb68
> [ 311.002006] RDX: 0000000057effb60 RSI: 0000000057efdb68 RDI: 00000000f768f000
> [ 311.002007] RBP: 0000000057efc530 R08: 0000000000000000 R09: 0000000000000000
> [ 311.002008] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 311.002009] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 311.002011] FS: 0000000000000000(0000) GS:ffff88021e980000(0000) knlGS:0000000000000000
> [ 311.002013] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
> [ 311.002014] CR2: 00000000f77bf097 CR3: 0000000150f6f000 CR4: 00000000000406f0
>
> We tracked these crashes down to binfmt_elf failing to load segments
> for ld.so inside the kernel. Digging further, the actual problem
> seems to occur when a process gets sigkilled while it is still being
> loaded by the kernel. In our case when _do_page_fault goes for a retry
> it will return early as it first checks for fatal_signal_pending(), so
> load_elf_interp also returns with error and as a result
> search_binary_handler will force_sigsegv() which is pretty confusing as
> nothing actually failed here.
>
> v2: add a message when load_binary fails, add a check for fatal signals
> in signal_delivered (avoiding a single check in force_sigsegv as other
> architectures use it directly and may have different expectations).
>
> Thanks to Dmitry Safonov and Oleg Nesterov for their comments and
> suggestions.
>
> ...
>
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1660,7 +1660,12 @@ int search_binary_handler(struct linux_binprm *bprm)
>  	if (retval < 0 && !bprm->mm) {
>  		/* we got to flush_old_exec() and failed after it */
>  		read_unlock(&binfmt_lock);
> -		force_sigsegv(SIGSEGV, current);
> +		if (!fatal_signal_pending(current)) {
> +			if (print_fatal_signals)
> +				pr_info("load_binary() failed: %d\n",
> +					retval);

Should we be using print_fatal_signal() here?

> +			force_sigsegv(SIGSEGV, current);
> +		}
>  		return retval;
>  	}
>  	if (retval != -ENOEXEC || !bprm->file) {
> diff --git a/kernel/signal.c b/kernel/signal.c
> index e1d7ad8e6ab1..674076e63624 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2552,10 +2552,10 @@ static void signal_delivered(struct ksignal *ksig, int stepping)
>
>  void signal_setup_done(int failed, struct ksignal *ksig, int stepping)
>  {
> -	if (failed)
> -		force_sigsegv(ksig->sig, current);
> -	else
> +	if (!failed)
>  		signal_delivered(ksig, stepping);
> +	else if (!fatal_signal_pending(current))
> +		force_sigsegv(ksig->sig, current);
>  }
On Tue, Feb 05, 2019 at 01:11:19PM -0800, Andrew Morton wrote:
> On Mon, 4 Feb 2019 18:53:08 -0800 Ivan Delalande <colona@arista.com> wrote:
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1660,7 +1660,12 @@ int search_binary_handler(struct linux_binprm *bprm)
> >  	if (retval < 0 && !bprm->mm) {
> >  		/* we got to flush_old_exec() and failed after it */
> >  		read_unlock(&binfmt_lock);
> > -		force_sigsegv(SIGSEGV, current);
> > +		if (!fatal_signal_pending(current)) {
> > +			if (print_fatal_signals)
> > +				pr_info("load_binary() failed: %d\n",
> > +					retval);
>
> Should we be using print_fatal_signal() here?

I don't think so: the force_sigsegv() already ensures it will be called
from get_signal() when the signal is handled, and so the process
information will be printed then.

> > +			force_sigsegv(SIGSEGV, current);
> > +		}
> >  		return retval;
> >  	}
> >  	if (retval != -ENOEXEC || !bprm->file) {

Thanks,
Ivan Delalande <colona@arista.com> writes:

> On Tue, Feb 05, 2019 at 01:11:19PM -0800, Andrew Morton wrote:
>> On Mon, 4 Feb 2019 18:53:08 -0800 Ivan Delalande <colona@arista.com> wrote:
>> > --- a/fs/exec.c
>> > +++ b/fs/exec.c
>> > @@ -1660,7 +1660,12 @@ int search_binary_handler(struct linux_binprm *bprm)
>> >  	if (retval < 0 && !bprm->mm) {
>> >  		/* we got to flush_old_exec() and failed after it */
>> >  		read_unlock(&binfmt_lock);
>> > -		force_sigsegv(SIGSEGV, current);
>> > +		if (!fatal_signal_pending(current)) {
>> > +			if (print_fatal_signals)
>> > +				pr_info("load_binary() failed: %d\n",
>> > +					retval);
>>
>> Should we be using print_fatal_signal() here?
>
> I don't think so, the force_sigsegv() already ensures it will be called
> from get_signal() when the signal is handled, and so the process
> information will be printed then.
>
>> > +			force_sigsegv(SIGSEGV, current);
>> > +		}
>> >  		return retval;
>> >  	}
>> >  	if (retval != -ENOEXEC || !bprm->file) {

I just noticed this.  From my patch queue that I intend to send to
Linus tomorrow.  I think this change fixes your issue of getting the
SIGSEGV instead of the already pending fatal signal.

So I think this fixes your issue without any other code changes.
Ivan can you verify that the patch below is enough?

diff --git a/kernel/signal.c b/kernel/signal.c
index 9ca8e5278c8e..5424cb0006bc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2393,6 +2393,11 @@ bool get_signal(struct ksignal *ksig)
 		goto relock;
 	}
 
+	/* Has this task already been marked for death? */
+	ksig->info.si_signo = signr = SIGKILL;
+	if (signal_group_exit(signal))
+		goto fatal;
+
 	for (;;) {
 		struct k_sigaction *ka;
 
@@ -2488,6 +2493,7 @@ bool get_signal(struct ksignal *ksig)
 		continue;
 	}
 
+	fatal:
 	spin_unlock_irq(&sighand->siglock);
Hi Eric,

On Thu, Feb 07, 2019 at 11:13:59PM -0600, Eric W. Biederman wrote:
> I just noticed this.  From my patch queue that I intend to send to
> Linus tomorrow.  I think this change fixes your issue of getting
> the SIGSEGV instead of the already pending fatal signal.
>
> So I think this fixes your issue without any other code changes.
> Ivan can you verify that the patch below is enough?

I was having issues with just this patch applied on top of v5.0-rc5 or
the latest master: defunct processes accumulating, exiting processes
that would hang forever, and some kernel functions eating all the CPU
(setup_sigcontext, common_interrupt, __clear_user, do_signal…).

But using your user-namespace.git/for-linus worked great and I've been
running my reproducer for a few hours now without issue. I'll probably
keep it running over the week-end as it has been unreliable at times,
but it looks promising so far.

A difference I've noticed with your tree (unrelated to my issue here but
that you may want to look at) is when I run my reproducer under
strace -f, I'm now getting quite a lot of "Exit of unknown pid 12345
ignored" warnings from strace, which I've never seen with mainline.
My reproducer simply fork-execs tail processes in a loop, and tries to
sigkill them in the parent with a variable delay.

Thank you,

> diff --git a/kernel/signal.c b/kernel/signal.c
> index 9ca8e5278c8e..5424cb0006bc 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2393,6 +2393,11 @@ bool get_signal(struct ksignal *ksig)
>  		goto relock;
>  	}
>  
> +	/* Has this task already been marked for death? */
> +	ksig->info.si_signo = signr = SIGKILL;
> +	if (signal_group_exit(signal))
> +		goto fatal;
> +
>  	for (;;) {
>  		struct k_sigaction *ka;
>  
> @@ -2488,6 +2493,7 @@ bool get_signal(struct ksignal *ksig)
>  		continue;
>  	}
>  
> +	fatal:
>  	spin_unlock_irq(&sighand->siglock);
Ivan Delalande <colona@arista.com> writes:

> Hi Eric,
>
> On Thu, Feb 07, 2019 at 11:13:59PM -0600, Eric W. Biederman wrote:
>> I just noticed this.  From my patch queue that I intend to send to
>> Linus tomorrow.  I think this change fixes your issue of getting
>> the SIGSEGV instead of the already pending fatal signal.
>>
>> So I think this fixes your issue without any other code changes.
>> Ivan can you verify that the patch below is enough?
>
> I was having issues with just this patch applied on top of v5.0-rc5 or
> the latest master: defunct processes accumulating, exiting processes
> that would hang forever, and some kernel functions eating all the CPU
> (setup_sigcontext, common_interrupt, __clear_user, do_signal…).
>
> But using your user-namespace.git/for-linus worked great and I've been
> running my reproducer for a few hours now without issue. I'll probably
> keep it running over the week-end as it has been unreliable at times,
> but it looks promising so far.

Sounds good.  Thank you for finding my tree, and thank you for testing.

> A difference I've noticed with your tree (unrelated to my issue here but
> that you may want to look at) is when I run my reproducer under
> strace -f, I'm now getting quite a lot of "Exit of unknown pid 12345
> ignored" warnings from strace, which I've never seen with mainline.
> My reproducer simply fork-execs tail processes in a loop, and tries to
> sigkill them in the parent with a variable delay.

What was your base tree?

My best guess is that your SIGKILL is getting there before strace
realizes the process has been forked.  If we can understand the race
it is probably worth fixing.

Any chance you can post your reproducer?  It is possible it is my most
recent fixes, or it is possible something changed from the tree you were
testing and the tree you are working on.

Eric
On 02/08, Ivan Delalande wrote:
>
> A difference I've noticed with your tree (unrelated to my issue here but
> that you may want to look at) is when I run my reproducer under
> strace -f, I'm now getting quite a lot of "Exit of unknown pid 12345
> ignored" warnings from strace, which I've never seen with mainline.
> My reproducer simply fork-execs tail processes in a loop, and tries to
> sigkill them in the parent with a variable delay.

Hmm... maybe because of the PTRACE_EVENT_EXIT problem I mentioned in
reply to this change...

> Thank you,
>
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index 9ca8e5278c8e..5424cb0006bc 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -2393,6 +2393,11 @@ bool get_signal(struct ksignal *ksig)
> >  		goto relock;
> >  	}
> >
> > +	/* Has this task already been marked for death? */
> > +	ksig->info.si_signo = signr = SIGKILL;
> > +	if (signal_group_exit(signal))
> > +		goto fatal;
> > +
> >  	for (;;) {
> >  		struct k_sigaction *ka;
> >
> > @@ -2488,6 +2493,7 @@ bool get_signal(struct ksignal *ksig)
> >  		continue;
> >  	}
> >
> > +	fatal:
> >  	spin_unlock_irq(&sighand->siglock);
>
> --
> Ivan Delalande
> Arista Networks
sorry, I couldn't look at this patch before.

On 02/04, Ivan Delalande wrote:
>
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1660,7 +1660,12 @@ int search_binary_handler(struct linux_binprm *bprm)
>  	if (retval < 0 && !bprm->mm) {
>  		/* we got to flush_old_exec() and failed after it */
>  		read_unlock(&binfmt_lock);
> -		force_sigsegv(SIGSEGV, current);
> +		if (!fatal_signal_pending(current)) {
> +			if (print_fatal_signals)
> +				pr_info("load_binary() failed: %d\n",
> +					retval);

I won't argue, but do we really want this spam?

> +			force_sigsegv(SIGSEGV, current);
> +		}
>  		return retval;
>  	}
>  	if (retval != -ENOEXEC || !bprm->file) {
> diff --git a/kernel/signal.c b/kernel/signal.c
> index e1d7ad8e6ab1..674076e63624 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2552,10 +2552,10 @@ static void signal_delivered(struct ksignal *ksig, int stepping)
>
>  void signal_setup_done(int failed, struct ksignal *ksig, int stepping)
>  {
> -	if (failed)
> -		force_sigsegv(ksig->sig, current);
> -	else
> +	if (!failed)
>  		signal_delivered(ksig, stepping);
> +	else if (!fatal_signal_pending(current))
> +		force_sigsegv(ksig->sig, current);

The changelog doesn't explain this change.

OK, I guess it comes from the previous discussion: setup_rt_frame() can
equally fail if fatal_signal_pending(). But this should be documented at
least in the changelog, and I still think we could simply change
force_sigsegv() instead.

In any case, Eric has already mentioned that we are going to give SIGKILL
more priority, so I think we can drop this patch?

Oleg.
On Mon, Feb 11, 2019 at 06:12:53PM +0100, Oleg Nesterov wrote:
> sorry, I couldn't look at this patch before.
>
> On 02/04, Ivan Delalande wrote:
> >
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1660,7 +1660,12 @@ int search_binary_handler(struct linux_binprm *bprm)
> >  	if (retval < 0 && !bprm->mm) {
> >  		/* we got to flush_old_exec() and failed after it */
> >  		read_unlock(&binfmt_lock);
> > -		force_sigsegv(SIGSEGV, current);
> > +		if (!fatal_signal_pending(current)) {
> > +			if (print_fatal_signals)
> > +				pr_info("load_binary() failed: %d\n",
> > +					retval);
>
> I won't argue, but do we really want this spam?
>
> > +			force_sigsegv(SIGSEGV, current);
> > +		}
> >  		return retval;
> >  	}
> >  	if (retval != -ENOEXEC || !bprm->file) {
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index e1d7ad8e6ab1..674076e63624 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -2552,10 +2552,10 @@ static void signal_delivered(struct ksignal *ksig, int stepping)
> >
> >  void signal_setup_done(int failed, struct ksignal *ksig, int stepping)
> >  {
> > -	if (failed)
> > -		force_sigsegv(ksig->sig, current);
> > -	else
> > +	if (!failed)
> >  		signal_delivered(ksig, stepping);
> > +	else if (!fatal_signal_pending(current))
> > +		force_sigsegv(ksig->sig, current);
>
> The changelog doesn't explain this change.
>
> OK, I guess it comes from the previous discussion, setup_rt_frame() can
> equally fail if fatal_signal_pending(). But this should be documented at
> least in the changelog, and I still think we could simply change
> force_sigsegv() instead.
>
> In any case, Eric has already mentioned that we going to give SIGKILL
> more priority, so I think we can drop this patch?

Yes, I've been running our tests on top of Eric's tree over the week-end
and haven't seen any new hit. I also see that Andrew has dropped the
patch from -mm, so no further action should be required here.

Thank you for taking a look at the patch anyway.
On Sun, Feb 10, 2019 at 11:05:52AM -0600, Eric W. Biederman wrote:
> Ivan Delalande <colona@arista.com> writes:
> > A difference I've noticed with your tree (unrelated to my issue here but
> > that you may want to look at) is when I run my reproducer under
> > strace -f, I'm now getting quite a lot of "Exit of unknown pid 12345
> > ignored" warnings from strace, which I've never seen with mainline.
> > My reproducer simply fork-execs tail processes in a loop, and tries to
> > sigkill them in the parent with a variable delay.
>
> What was your base tree?

It was just off v5.0-rc5, and I didn't see these warnings on the last
few RCs either. Now I'm seeing them on vanilla v5.0-rc6 as well.

> My best guess is that your SIGKILL is getting there before strace
> realizes the process has been forked.  If we can understand the race
> it is probably worth fixing.
>
> Any chance you can post your reproducer.

Sure, see the attachment. I think this is the simplest version where
these warnings show up. This one just forks/execs `tail -a` to make it
fail and exit 1 as soon as possible, and progressively increases the
delay between the fork and sigkill to try to hit our original issue,
stopping and restarting only after 10 completions of the child as the
timing varies a fair bit. Running this program under
`strace -f -o /dev/null` prints the warnings almost instantly on my
system.

> It is possible it is my most recent fixes, or it is possible something
> changed from the tree you were testing and the tree you are working
> on.

Thanks,
diff --git a/fs/exec.c b/fs/exec.c
index fb72d36f7823..caf584064f89 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1660,7 +1660,12 @@ int search_binary_handler(struct linux_binprm *bprm)
 	if (retval < 0 && !bprm->mm) {
 		/* we got to flush_old_exec() and failed after it */
 		read_unlock(&binfmt_lock);
-		force_sigsegv(SIGSEGV, current);
+		if (!fatal_signal_pending(current)) {
+			if (print_fatal_signals)
+				pr_info("load_binary() failed: %d\n",
+					retval);
+			force_sigsegv(SIGSEGV, current);
+		}
 		return retval;
 	}
 	if (retval != -ENOEXEC || !bprm->file) {
diff --git a/kernel/signal.c b/kernel/signal.c
index e1d7ad8e6ab1..674076e63624 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2552,10 +2552,10 @@ static void signal_delivered(struct ksignal *ksig, int stepping)
 
 void signal_setup_done(int failed, struct ksignal *ksig, int stepping)
 {
-	if (failed)
-		force_sigsegv(ksig->sig, current);
-	else
+	if (!failed)
 		signal_delivered(ksig, stepping);
+	else if (!fatal_signal_pending(current))
+		force_sigsegv(ksig->sig, current);
 }
 
 /*
We were seeing unexplained segfaults in coreutils processes and other
basic utilities on systems with print-fatal-signals enabled:

[ 311.001986] potentially unexpected fatal signal 11.
[ 311.001993] CPU: 3 PID: 4565 Comm: tail Tainted: P O 4.9.100.Ar-8497547.eostrunkkernel49 #1
[ 311.001995] task: ffff88021431b400 task.stack: ffffc90004cec000
[ 311.001997] RIP: 0023:[<00000000f7722c09>] [<00000000f7722c09>] 0xf7722c09
[ 311.002003] RSP: 002b:00000000ffcc8aa4 EFLAGS: 00000296
[ 311.002004] RAX: fffffffffffffff2 RBX: 0000000057efc530 RCX: 0000000057efdb68
[ 311.002006] RDX: 0000000057effb60 RSI: 0000000057efdb68 RDI: 00000000f768f000
[ 311.002007] RBP: 0000000057efc530 R08: 0000000000000000 R09: 0000000000000000
[ 311.002008] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 311.002009] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 311.002011] FS: 0000000000000000(0000) GS:ffff88021e980000(0000) knlGS:0000000000000000
[ 311.002013] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 311.002014] CR2: 00000000f77bf097 CR3: 0000000150f6f000 CR4: 00000000000406f0

We tracked these crashes down to binfmt_elf failing to load segments
for ld.so inside the kernel. Digging further, the actual problem
seems to occur when a process gets sigkilled while it is still being
loaded by the kernel. In our case when _do_page_fault goes for a retry
it will return early as it first checks for fatal_signal_pending(), so
load_elf_interp also returns with error and as a result
search_binary_handler will force_sigsegv() which is pretty confusing as
nothing actually failed here.

v2: add a message when load_binary fails, add a check for fatal signals
in signal_delivered (avoiding a single check in force_sigsegv as other
architectures use it directly and may have different expectations).

Thanks to Dmitry Safonov and Oleg Nesterov for their comments and
suggestions.
Fixes: 19d860a140be ("handle suicide on late failure exits in execve() in search_binary_handler()")
Reference: https://lkml.org/lkml/2013/2/14/5
Signed-off-by: Ivan Delalande <colona@arista.com>
---
 fs/exec.c       | 7 ++++++-
 kernel/signal.c | 6 +++---
 2 files changed, 9 insertions(+), 4 deletions(-)
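[Editor's note: for readers following the changelog, the failure path it describes can be summarized as a call chain. This is a simplified sketch based on the discussion in this thread, not actual kernel code, and function boundaries are approximate:]

```
execve()
  search_binary_handler()
    load_elf_binary()                /* binfmt_elf */
      load_elf_interp()              /* maps ld.so's segments */
        page fault while mapping
          _do_page_fault()           /* retry path checks
                                        fatal_signal_pending() first and
                                        bails out: a SIGKILL is already
                                        pending against the task */
      <- segment load fails with an error
    <- retval < 0, and bprm->mm was already consumed by flush_old_exec()
  force_sigsegv(SIGSEGV, current)    /* spurious "fatal signal 11" even
                                        though nothing really segfaulted */
```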