diff mbox

[v2,5/7] clone4: Add a CLONE_AUTOREAP flag to automatically reap the child process

Message ID 6d002995485d446e659105f6931307f3e532ce89.1426376419.git.josh@joshtriplett.org (mailing list archive)
State New, archived
Headers show

Commit Message

Josh Triplett March 15, 2015, 8 a.m. UTC
If a process launches a child process with the notification signal set
to SIGCHLD (e.g. with fork()), and then the parent process either
ignores SIGCHLD or sets a handler with SA_NOCLDWAIT, the child process
will get automatically reaped without waiting for the parent to wait on
it.

However, there's currently no way to get the same autoreaping behavior
if the signal is not set to SIGCHLD, including in particular if the
signal is set to 0 to disable notification.  Furthermore, the code
launching the child process may not own process-wide signal handling for
the parent process.

Add a CLONE_AUTOREAP flag to request this behavior unconditionally,
regardless of the notification signal or the state of the parent
process's signal handling when the process exits.

This is particularly useful for libraries that want to launch unattended
child processes without interfering with the calling process's signal
handling or wait loop.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
---
 include/linux/sched.h      | 2 ++
 include/uapi/linux/sched.h | 7 ++++++-
 kernel/fork.c              | 2 ++
 kernel/signal.c            | 2 ++
 4 files changed, 12 insertions(+), 1 deletion(-)

Comments

Oleg Nesterov March 15, 2015, 2:52 p.m. UTC | #1
On 03/15, Josh Triplett wrote:
>
> Add a CLONE_AUTOREAP flag to request this behavior unconditionally,

Yes, CLONE_AUTOREAP is much better. And I agree (mostly) with that
we should rely on do_notify_parent().

Howver the patch still doesn't look right. First of all, ->autoreap
should be per-process, not per-thread. And there are ptrace/mt issues,
it seems. Just for example, we should avoid EXIT_TRACE if autoreap in
wait_task_zombie() even if we are going to re-notify parent.

Yes... and other problems with ptrace. So let me nack this patch for
the moment ;) But let me repeat that personally I agree with this
change "in general".

EXCEPT: do we really want SIGCHLD from the exiting child? I think we
do not. I won't really argue though, but this should be discussed and
documented. IIUC, with your patch it is still sent.

Josh, please give me some time to think and re-check, I'll write another
email next week. I am not sure this is really needed, but it seems to
me that we need the preparation patch to make this change clear/simple.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josh Triplett March 15, 2015, 5:18 p.m. UTC | #2
On Sun, Mar 15, 2015 at 03:52:23PM +0100, Oleg Nesterov wrote:
> On 03/15, Josh Triplett wrote:
> > Add a CLONE_AUTOREAP flag to request this behavior unconditionally,
> 
> Yes, CLONE_AUTOREAP is much better. And I agree (mostly) with that
> we should rely on do_notify_parent().
> 
> Howver the patch still doesn't look right. First of all, ->autoreap
> should be per-process, not per-thread.

Ah, you're thinking of the case where the parent process launches a
child with CLONE_AUTOREAP, that child process launches siblings with
CLONE_THREAD and without CLONE_AUTOREAP, and one of those siblings is
the last to exit?  That seems easy enough to handle: instead of setting
->autoreap unconditionally in copy_process, I can set it only in the
non-CLONE_THREAD case, and otherwise let it inherit.  Then every task in
the group will have the same value for autoreap.

(As an aside, what *is* the use case for CLONE_PARENT without
CLONE_THREAD?)

> And there are ptrace/mt issues,
> it seems. Just for example, we should avoid EXIT_TRACE if autoreap in
> wait_task_zombie() even if we are going to re-notify parent.

I don't see how EXIT_TRACE can happen in wait_task_zombie if autoreap is
set.  wait_task_zombie does a cmpxchg with exit_state and doesn't
proceed unless exit_state was EXIT_ZOMBIE, and I don't see how we can
ever reach the EXIT_ZOMBIE state if autoreap.

> EXCEPT: do we really want SIGCHLD from the exiting child? I think we
> do not. I won't really argue though, but this should be discussed and
> documented. IIUC, with your patch it is still sent.

I think we do, yes.  The caller of clone can already specify what signal
they want, including no signal at all.  If they specify a signal
(SIGCHLD or otherwise) along with CLONE_AUTOREAP, we can send that
signal.  I don't think that causes any particular problem.

That's the same semantic you'd get if you have a SIGCHLD handler with
SA_NOCLDWAIT: you'd still get the signal, even though you don't need to
(and can't) wait on the child process.

> Josh, please give me some time to think and re-check, I'll write another
> email next week. I am not sure this is really needed, but it seems to
> me that we need the preparation patch to make this change clear/simple.

I'd appreciate any feedback you can offer on this series, including any
potential subtle interactions with ptrace.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Nesterov March 15, 2015, 7:55 p.m. UTC | #3
On 03/15, Josh Triplett wrote:
>
> On Sun, Mar 15, 2015 at 03:52:23PM +0100, Oleg Nesterov wrote:
> > On 03/15, Josh Triplett wrote:
> > > Add a CLONE_AUTOREAP flag to request this behavior unconditionally,
> >
> > Yes, CLONE_AUTOREAP is much better. And I agree (mostly) with that
> > we should rely on do_notify_parent().
> >
> > Howver the patch still doesn't look right. First of all, ->autoreap
> > should be per-process, not per-thread.
>
> Ah, you're thinking of the case where the parent process launches a
> ...

Not really, although we probably need more sanity checks.

It should be per-process simply because this "autoreap" affects the whole
process. And the sub-threads are already "autoreap". And these 2 autoreap's
semantics differ, we should not confuse them.

> (As an aside, what *is* the use case for CLONE_PARENT without
> CLONE_THREAD?)

To me CLONE_PARENT is another historical mistake and the source of misc
problems ;)

> > And there are ptrace/mt issues,
> > it seems. Just for example, we should avoid EXIT_TRACE if autoreap in
> > wait_task_zombie() even if we are going to re-notify parent.
>
> I don't see how EXIT_TRACE can happen in wait_task_zombie if autoreap is
> set.  wait_task_zombie does a cmpxchg with exit_state and doesn't
> proceed unless exit_state was EXIT_ZOMBIE, and I don't see how we can
> ever reach the EXIT_ZOMBIE state if autoreap.

Because you again forgot about ptrace ;)

Josh. Let me try to summarise this later when I have time. Again, I am
not sure, perhaps this is even simpler than I currently think. And let
me apologize in advance, most probably I will be busy tomorrow.

> > EXCEPT: do we really want SIGCHLD from the exiting child? I think we
> > do not. I won't really argue though, but this should be discussed and
> > documented. IIUC, with your patch it is still sent.
>
> I think we do, yes.  The caller of clone can already specify what signal
> they want, including no signal at all.  If they specify a signal
> (SIGCHLD or otherwise) along with CLONE_AUTOREAP, we can send that
> signal.

OK. Agreed.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josh Triplett March 15, 2015, 11:34 p.m. UTC | #4
On Sun, Mar 15, 2015 at 08:55:06PM +0100, Oleg Nesterov wrote:
> On 03/15, Josh Triplett wrote:
> > On Sun, Mar 15, 2015 at 03:52:23PM +0100, Oleg Nesterov wrote:
> > > On 03/15, Josh Triplett wrote:
> > > > Add a CLONE_AUTOREAP flag to request this behavior unconditionally,
> > >
> > > Yes, CLONE_AUTOREAP is much better. And I agree (mostly) with that
> > > we should rely on do_notify_parent().
> > >
> > > Howver the patch still doesn't look right. First of all, ->autoreap
> > > should be per-process, not per-thread.
> >
> > Ah, you're thinking of the case where the parent process launches a
> > ...
> 
> Not really, although we probably need more sanity checks.
> 
> It should be per-process simply because this "autoreap" affects the whole
> process. And the sub-threads are already "autoreap". And these 2 autoreap's
> semantics differ, we should not confuse them.

Will the approach I suggested, of having clones with CLONE_THREAD
inherit the autoreap value rather than setting it from CLONE_AUTOREAP,
implement the semantics you're looking for?

Also, are you suggesting that CLONE_AUTOREAP with CLONE_THREAD should
produce -EINVAL, or just that it should be ignored?

> > (As an aside, what *is* the use case for CLONE_PARENT without
> > CLONE_THREAD?)
> 
> To me CLONE_PARENT is another historical mistake and the source of misc
> problems ;)

I kinda figured. :)

> > > And there are ptrace/mt issues,
> > > it seems. Just for example, we should avoid EXIT_TRACE if autoreap in
> > > wait_task_zombie() even if we are going to re-notify parent.
> >
> > I don't see how EXIT_TRACE can happen in wait_task_zombie if autoreap is
> > set.  wait_task_zombie does a cmpxchg with exit_state and doesn't
> > proceed unless exit_state was EXIT_ZOMBIE, and I don't see how we can
> > ever reach the EXIT_ZOMBIE state if autoreap.
> 
> Because you again forgot about ptrace ;)
> 
> Josh. Let me try to summarise this later when I have time. Again, I am
> not sure, perhaps this is even simpler than I currently think. And let
> me apologize in advance, most probably I will be busy tomorrow.

I look forward to your later review and feedback.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Nesterov March 20, 2015, 6:14 p.m. UTC | #5
Josh,

I am really sorry for delay.

On 03/15, Josh Triplett wrote:
>
> On Sun, Mar 15, 2015 at 08:55:06PM +0100, Oleg Nesterov wrote:
 >
> > It should be per-process simply because this "autoreap" affects the whole
> > process. And the sub-threads are already "autoreap". And these 2 autoreap's
> > semantics differ, we should not confuse them.
>
> Will the approach I suggested, of having clones with CLONE_THREAD
> inherit the autoreap value rather than setting it from CLONE_AUTOREAP,
> implement the semantics you're looking for?

Not sure I understand... CLONE_THREAD should not inherit the autoreap.
A sub-thread is always autoreapable.

> Also, are you suggesting that CLONE_AUTOREAP with CLONE_THREAD should
> produce -EINVAL, or just that it should be ignored?

Yes, I think CLONE_AUTOREAP | CLONE_THREAD should return -EINVAL. But
this all is minor...

The main problem is how/when we should check this "autoreap" without
making this code even more ugly.

I still think we need a preparation patch. I tried to make it today but
failed. Will try again on weekend...


Note that we can't solely rely on do_notify_parent() which (with your patch)
correctly checks !ptrace && autoreap.

Just for example. Please look at __ptrace_detach(). Note that if we add
CLONE_AUTOREAP this needs a fix in any case. The tracee can be "autoreap"
but zombie, because "autoreap" should be ignored until the tracer detaches.
But the "same_thread_group" should not call do_notify_parent() again. So
this needs another check.

And let me quote our discussion from the previous email:

	> > EXCEPT: do we really want SIGCHLD from the exiting child? I think we
	> > do not. I won't really argue though, but this should be discussed and
	> > documented. IIUC, with your patch it is still sent.
	>
	> I think we do, yes.  The caller of clone can already specify what signal
	> they want, including no signal at all.  If they specify a signal
	> (SIGCHLD or otherwise) along with CLONE_AUTOREAP, we can send that
	> signal.

	OK. Agreed.

Yes, I agree...

But the changes in __ptrace_detach() depend on whether we need to send a signal
or not. Either way the changle is simple, but looks ugly. It would be nice to
cleanup this somehow.

Also. I forgot that the kernel always resets ->exit_signal to SIGCHLD on exec
or reparenting. Reparenting is probably fine. But what about exec? Should it
keep ->exit_signal == 0 if "autoreap" ? I think it should not, to avoid the
strange special case.

> > > > And there are ptrace/mt issues,
> > > > it seems. Just for example, we should avoid EXIT_TRACE if autoreap in
> > > > wait_task_zombie() even if we are going to re-notify parent.
> > >
> > > I don't see how EXIT_TRACE can happen in wait_task_zombie if autoreap is
> > > set.  wait_task_zombie does a cmpxchg with exit_state and doesn't
> > > proceed unless exit_state was EXIT_ZOMBIE, and I don't see how we can
> > > ever reach the EXIT_ZOMBIE state if autoreap.
> >
> > Because you again forgot about ptrace ;)

And this too asks for preparation before CLONE_AUTOREAP...

So I'll try to think about this all again on weekend. I'll try very much
to not disappear again ;)

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thiago Macieira March 20, 2015, 6:46 p.m. UTC | #6
On Friday 20 March 2015 19:14:04 Oleg Nesterov wrote:
> Also. I forgot that the kernel always resets ->exit_signal to SIGCHLD on
> exec or reparenting. Reparenting is probably fine. But what about exec?
> Should it keep ->exit_signal == 0 if "autoreap" ? I think it should not, to
> avoid the strange special case.

Not delivering any signal was the objective of this patch series, so yes 
exit_signal == 0 should survive an exec and even re-exec.
Oleg Nesterov March 20, 2015, 7:09 p.m. UTC | #7
On 03/20, Thiago Macieira wrote:
>
> On Friday 20 March 2015 19:14:04 Oleg Nesterov wrote:
> > Also. I forgot that the kernel always resets ->exit_signal to SIGCHLD on
> > exec or reparenting. Reparenting is probably fine. But what about exec?
> > Should it keep ->exit_signal == 0 if "autoreap" ? I think it should not, to
> > avoid the strange special case.
>
> Not delivering any signal was the objective of this patch series, so yes
> exit_signal == 0 should survive an exec and even re-exec.

OK, but then perhaps we should never send SIGCHLD (on exit) if "autoreap",
to make the logic simple.

And copy_process() should probably do

	if ((clone_flags & CSIGNAL) && (clone_flags && CLONE_AUTOREAP))
		return -EINVAL;

so that we still can change this behaviour later.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josh Triplett March 20, 2015, 9:10 p.m. UTC | #8
On Fri, Mar 20, 2015 at 08:09:14PM +0100, Oleg Nesterov wrote:
> On 03/20, Thiago Macieira wrote:
> >
> > On Friday 20 March 2015 19:14:04 Oleg Nesterov wrote:
> > > Also. I forgot that the kernel always resets ->exit_signal to SIGCHLD on
> > > exec or reparenting. Reparenting is probably fine. But what about exec?
> > > Should it keep ->exit_signal == 0 if "autoreap" ? I think it should not, to
> > > avoid the strange special case.
> >
> > Not delivering any signal was the objective of this patch series, so yes
> > exit_signal == 0 should survive an exec and even re-exec.
> 
> OK, but then perhaps we should never send SIGCHLD (on exit) if "autoreap",
> to make the logic simple.
> 
> And copy_process() should probably do
> 
> 	if ((clone_flags & CSIGNAL) && (clone_flags && CLONE_AUTOREAP))
> 		return -EINVAL;
> 
> so that we still can change this behaviour later.

I'm fine with that, as it would handle the particular use case we care
about.

However, the reset-signal-on-reparent thing might still make sense,
particularly for the ptrace-reparent case (less so for the
reparent-to-child-reaper case).

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9ec36fd..66feeb7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1372,6 +1372,8 @@  struct task_struct {
 	unsigned memcg_kmem_skip_account:1;
 #endif
 
+	unsigned autoreap:1; /* Do not become a zombie on exit */
+
 	unsigned long atomic_flags; /* Flags needing atomic access. */
 
 	struct restart_block restart_block;
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 7656152..f606c0a 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -37,13 +37,18 @@ 
 #define CLONE_DETACHED	0x00400000
 #define CLONE_STOPPED	0x02000000
 
+/*
+ * Flags that only work with clone4.
+ */
+#define CLONE_AUTOREAP	0x00001000	/* Automatically reap the process */
+
 #ifdef __KERNEL__
 /*
  * Valid flags for clone and for clone4. Kept in this file next to the flag
  * list above, but not exposed to userspace.
  */
 #define CLONE_VALID_FLAGS	(0xffffffffULL & ~(CLONE_PID | CLONE_DETACHED | CLONE_STOPPED))
-#define CLONE4_VALID_FLAGS	CLONE_VALID_FLAGS
+#define CLONE4_VALID_FLAGS	(CLONE_VALID_FLAGS | CLONE_AUTOREAP)
 #endif /* __KERNEL__ */
 
 /*
diff --git a/kernel/fork.c b/kernel/fork.c
index db9012a..c297e5e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1461,6 +1461,8 @@  static struct task_struct *copy_process(u64 clone_flags,
 		p->tgid = p->pid;
 	}
 
+	p->autoreap = !!(clone_flags & CLONE_AUTOREAP);
+
 	p->nr_dirtied = 0;
 	p->nr_dirtied_pause = 128 >> (PAGE_SHIFT - 10);
 	p->dirty_paused_when = 0;
diff --git a/kernel/signal.c b/kernel/signal.c
index a390499..c0011c0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1702,6 +1702,8 @@  bool do_notify_parent(struct task_struct *tsk, int sig)
 		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
 			sig = 0;
 	}
+	if (!tsk->ptrace && tsk->autoreap)
+		autoreap = true;
 	if (valid_signal(sig) && sig)
 		__group_send_sig_info(sig, &info, tsk->parent);
 	__wake_up_parent(tsk, tsk->parent);