Message ID | 6d002995485d446e659105f6931307f3e532ce89.1426376419.git.josh@joshtriplett.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 03/15, Josh Triplett wrote: > > Add a CLONE_AUTOREAP flag to request this behavior unconditionally, Yes, CLONE_AUTOREAP is much better. And I agree (mostly) with that we should rely on do_notify_parent(). Howver the patch still doesn't look right. First of all, ->autoreap should be per-process, not per-thread. And there are ptrace/mt issues, it seems. Just for example, we should avoid EXIT_TRACE if autoreap in wait_task_zombie() even if we are going to re-notify parent. Yes... and other problems with ptrace. So let me nack this patch for the moment ;) But let me repeat that personally I agree with this change "in general". EXCEPT: do we really want SIGCHLD from the exiting child? I think we do not. I won't really argue though, but this should be discussed and documented. IIUC, with your patch it is still sent. Josh, please give me some time to think and re-check, I'll write another email next week. I am not sure this is really needed, but it seems to me that we need the preparation patch to make this change clear/simple. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Mar 15, 2015 at 03:52:23PM +0100, Oleg Nesterov wrote: > On 03/15, Josh Triplett wrote: > > Add a CLONE_AUTOREAP flag to request this behavior unconditionally, > > Yes, CLONE_AUTOREAP is much better. And I agree (mostly) with that > we should rely on do_notify_parent(). > > Howver the patch still doesn't look right. First of all, ->autoreap > should be per-process, not per-thread. Ah, you're thinking of the case where the parent process launches a child with CLONE_AUTOREAP, that child process launches siblings with CLONE_THREAD and without CLONE_AUTOREAP, and one of those siblings is the last to exit? That seems easy enough to handle: instead of setting ->autoreap unconditionally in copy_process, I can set it only in the non-CLONE_THREAD case, and otherwise let it inherit. Then every task in the group will have the same value for autoreap. (As an aside, what *is* the use case for CLONE_PARENT without CLONE_THREAD?) > And there are ptrace/mt issues, > it seems. Just for example, we should avoid EXIT_TRACE if autoreap in > wait_task_zombie() even if we are going to re-notify parent. I don't see how EXIT_TRACE can happen in wait_task_zombie if autoreap is set. wait_task_zombie does a cmpxchg with exit_state and doesn't proceed unless exit_state was EXIT_ZOMBIE, and I don't see how we can ever reach the EXIT_ZOMBIE state if autoreap. > EXCEPT: do we really want SIGCHLD from the exiting child? I think we > do not. I won't really argue though, but this should be discussed and > documented. IIUC, with your patch it is still sent. I think we do, yes. The caller of clone can already specify what signal they want, including no signal at all. If they specify a signal (SIGCHLD or otherwise) along with CLONE_AUTOREAP, we can send that signal. I don't think that causes any particular problem. That's the same semantic you'd get if you have a SIGCHLD handler with SA_NOCLDWAIT: you'd still get the signal, even though you don't need to (and can't) wait on the child process. > Josh, please give me some time to think and re-check, I'll write another > email next week. I am not sure this is really needed, but it seems to > me that we need the preparation patch to make this change clear/simple. I'd appreciate any feedback you can offer on this series, including any potential subtle interactions with ptrace. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/15, Josh Triplett wrote: > > On Sun, Mar 15, 2015 at 03:52:23PM +0100, Oleg Nesterov wrote: > > On 03/15, Josh Triplett wrote: > > > Add a CLONE_AUTOREAP flag to request this behavior unconditionally, > > > > Yes, CLONE_AUTOREAP is much better. And I agree (mostly) with that > > we should rely on do_notify_parent(). > > > > Howver the patch still doesn't look right. First of all, ->autoreap > > should be per-process, not per-thread. > > Ah, you're thinking of the case where the parent process launches a > ... Not really, although we probably need more sanity checks. It should be per-process simply because this "autoreap" affects the whole process. And the sub-threads are already "autoreap". And these 2 autoreap's semantics differ, we should not confuse them. > (As an aside, what *is* the use case for CLONE_PARENT without > CLONE_THREAD?) To me CLONE_PARENT is another historical mistake and the source of misc problems ;) > > And there are ptrace/mt issues, > > it seems. Just for example, we should avoid EXIT_TRACE if autoreap in > > wait_task_zombie() even if we are going to re-notify parent. > > I don't see how EXIT_TRACE can happen in wait_task_zombie if autoreap is > set. wait_task_zombie does a cmpxchg with exit_state and doesn't > proceed unless exit_state was EXIT_ZOMBIE, and I don't see how we can > ever reach the EXIT_ZOMBIE state if autoreap. Because you again forgot about ptrace ;) Josh. Let me try to summarise this later when I have time. Again, I am not sure, perhaps this is even simpler than I currently think. And let me apologize in advance, most probably I will be busy tomorrow. > > EXCEPT: do we really want SIGCHLD from the exiting child? I think we > > do not. I won't really argue though, but this should be discussed and > > documented. IIUC, with your patch it is still sent. > > I think we do, yes. The caller of clone can already specify what signal > they want, including no signal at all. If they specify a signal > (SIGCHLD or otherwise) along with CLONE_AUTOREAP, we can send that > signal. OK. Agreed. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Mar 15, 2015 at 08:55:06PM +0100, Oleg Nesterov wrote: > On 03/15, Josh Triplett wrote: > > On Sun, Mar 15, 2015 at 03:52:23PM +0100, Oleg Nesterov wrote: > > > On 03/15, Josh Triplett wrote: > > > > Add a CLONE_AUTOREAP flag to request this behavior unconditionally, > > > > > > Yes, CLONE_AUTOREAP is much better. And I agree (mostly) with that > > > we should rely on do_notify_parent(). > > > > > > Howver the patch still doesn't look right. First of all, ->autoreap > > > should be per-process, not per-thread. > > > > Ah, you're thinking of the case where the parent process launches a > > ... > > Not really, although we probably need more sanity checks. > > It should be per-process simply because this "autoreap" affects the whole > process. And the sub-threads are already "autoreap". And these 2 autoreap's > semantics differ, we should not confuse them. Will the approach I suggested, of having clones with CLONE_THREAD inherit the autoreap value rather than setting it from CLONE_AUTOREAP, implement the semantics you're looking for? Also, are you suggesting that CLONE_AUTOREAP with CLONE_THREAD should produce -EINVAL, or just that it should be ignored? > > (As an aside, what *is* the use case for CLONE_PARENT without > > CLONE_THREAD?) > > To me CLONE_PARENT is another historical mistake and the source of misc > problems ;) I kinda figured. :) > > > And there are ptrace/mt issues, > > > it seems. Just for example, we should avoid EXIT_TRACE if autoreap in > > > wait_task_zombie() even if we are going to re-notify parent. > > > > I don't see how EXIT_TRACE can happen in wait_task_zombie if autoreap is > > set. wait_task_zombie does a cmpxchg with exit_state and doesn't > > proceed unless exit_state was EXIT_ZOMBIE, and I don't see how we can > > ever reach the EXIT_ZOMBIE state if autoreap. > > Because you again forgot about ptrace ;) > > Josh. Let me try to summarise this later when I have time. Again, I am > not sure, perhaps this is even simpler than I currently think. And let > me apologize in advance, most probably I will be busy tomorrow. I look forward to your later review and feedback. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Josh, I am really sorry for delay. On 03/15, Josh Triplett wrote: > > On Sun, Mar 15, 2015 at 08:55:06PM +0100, Oleg Nesterov wrote: > > > It should be per-process simply because this "autoreap" affects the whole > > process. And the sub-threads are already "autoreap". And these 2 autoreap's > > semantics differ, we should not confuse them. > > Will the approach I suggested, of having clones with CLONE_THREAD > inherit the autoreap value rather than setting it from CLONE_AUTOREAP, > implement the semantics you're looking for? Not sure I understand... CLONE_THREAD should not inherit the autoreap. A sub-thread is always autoreapable. > Also, are you suggesting that CLONE_AUTOREAP with CLONE_THREAD should > produce -EINVAL, or just that it should be ignored? Yes, I think CLONE_AUTOREAP | CLONE_THREAD should return -EINVAL. But this all is minor... The main problem is how/when we should check this "autoreap" without making this code even more ugly. I still think we need a preparation patch. I tried to make it today but failed. Will try again on weekend... Note that we can't solely rely on do_notify_parent() which (with your patch) correctly checks !ptrace && autoreap. Just for example. Please look at __ptrace_detach(). Note that if we add CLONE_AUTOREAP this needs a fix in any case. The tracee can be "autoreap" but zombie, because "autoreap" should be ignored until the tracer detaches. But the "same_thread_group" should not call do_notify_parent() again. So this needs another check. And let me quote our discussion from the previous email: > > EXCEPT: do we really want SIGCHLD from the exiting child? I think we > > do not. I won't really argue though, but this should be discussed and > > documented. IIUC, with your patch it is still sent. > > I think we do, yes. The caller of clone can already specify what signal > they want, including no signal at all. If they specify a signal > (SIGCHLD or otherwise) along with CLONE_AUTOREAP, we can send that > signal. OK. Agreed. Yes, I agree... But the changes in __ptrace_detach() depend on whether we need to send a signal or not. Either way the changle is simple, but looks ugly. It would be nice to cleanup this somehow. Also. I forgot that the kernel always resets ->exit_signal to SIGCHLD on exec or reparenting. Reparenting is probably fine. But what about exec? Should it keep ->exit_signal == 0 if "autoreap" ? I think it should not, to avoid the strange special case. > > > > And there are ptrace/mt issues, > > > > it seems. Just for example, we should avoid EXIT_TRACE if autoreap in > > > > wait_task_zombie() even if we are going to re-notify parent. > > > > > > I don't see how EXIT_TRACE can happen in wait_task_zombie if autoreap is > > > set. wait_task_zombie does a cmpxchg with exit_state and doesn't > > > proceed unless exit_state was EXIT_ZOMBIE, and I don't see how we can > > > ever reach the EXIT_ZOMBIE state if autoreap. > > > > Because you again forgot about ptrace ;) And this too asks for preparation before CLONE_AUTOREAP... So I'll try to think about this all again on weekend. I'll try very much to not disappear again ;) Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday 20 March 2015 19:14:04 Oleg Nesterov wrote: > Also. I forgot that the kernel always resets ->exit_signal to SIGCHLD on > exec or reparenting. Reparenting is probably fine. But what about exec? > Should it keep ->exit_signal == 0 if "autoreap" ? I think it should not, to > avoid the strange special case. Not delivering any signal was the objective of this patch series, so yes exit_signal == 0 should survive an exec and even re-exec.
On 03/20, Thiago Macieira wrote: > > On Friday 20 March 2015 19:14:04 Oleg Nesterov wrote: > > Also. I forgot that the kernel always resets ->exit_signal to SIGCHLD on > > exec or reparenting. Reparenting is probably fine. But what about exec? > > Should it keep ->exit_signal == 0 if "autoreap" ? I think it should not, to > > avoid the strange special case. > > Not delivering any signal was the objective of this patch series, so yes > exit_signal == 0 should survive an exec and even re-exec. OK, but then perhaps we should never send SIGCHLD (on exit) if "autoreap", to make the logic simple. And copy_process() should probably do if ((clone_flags & CSIGNAL) && (clone_flags && CLONE_AUTOREAP)) return -EINVAL; so that we still can change this behaviour later. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 20, 2015 at 08:09:14PM +0100, Oleg Nesterov wrote: > On 03/20, Thiago Macieira wrote: > > > > On Friday 20 March 2015 19:14:04 Oleg Nesterov wrote: > > > Also. I forgot that the kernel always resets ->exit_signal to SIGCHLD on > > > exec or reparenting. Reparenting is probably fine. But what about exec? > > > Should it keep ->exit_signal == 0 if "autoreap" ? I think it should not, to > > > avoid the strange special case. > > > > Not delivering any signal was the objective of this patch series, so yes > > exit_signal == 0 should survive an exec and even re-exec. > > OK, but then perhaps we should never send SIGCHLD (on exit) if "autoreap", > to make the logic simple. > > And copy_process() should probably do > > if ((clone_flags & CSIGNAL) && (clone_flags && CLONE_AUTOREAP)) > return -EINVAL; > > so that we still can change this behaviour later. I'm fine with that, as it would handle the particular use case we care about. However, the reset-signal-on-reparent thing might still make sense, particularly for the ptrace-reparent case (less so for the reparent-to-child-reaper case). - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/sched.h b/include/linux/sched.h index 9ec36fd..66feeb7 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1372,6 +1372,8 @@ struct task_struct { unsigned memcg_kmem_skip_account:1; #endif + unsigned autoreap:1; /* Do not become a zombie on exit */ + unsigned long atomic_flags; /* Flags needing atomic access. */ struct restart_block restart_block; diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 7656152..f606c0a 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -37,13 +37,18 @@ #define CLONE_DETACHED 0x00400000 #define CLONE_STOPPED 0x02000000 +/* + * Flags that only work with clone4. + */ +#define CLONE_AUTOREAP 0x00001000 /* Automatically reap the process */ + #ifdef __KERNEL__ /* * Valid flags for clone and for clone4. Kept in this file next to the flag * list above, but not exposed to userspace. */ #define CLONE_VALID_FLAGS (0xffffffffULL & ~(CLONE_PID | CLONE_DETACHED | CLONE_STOPPED)) -#define CLONE4_VALID_FLAGS CLONE_VALID_FLAGS +#define CLONE4_VALID_FLAGS (CLONE_VALID_FLAGS | CLONE_AUTOREAP) #endif /* __KERNEL__ */ /* diff --git a/kernel/fork.c b/kernel/fork.c index db9012a..c297e5e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1461,6 +1461,8 @@ static struct task_struct *copy_process(u64 clone_flags, p->tgid = p->pid; } + p->autoreap = !!(clone_flags & CLONE_AUTOREAP); + p->nr_dirtied = 0; p->nr_dirtied_pause = 128 >> (PAGE_SHIFT - 10); p->dirty_paused_when = 0; diff --git a/kernel/signal.c b/kernel/signal.c index a390499..c0011c0 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1702,6 +1702,8 @@ bool do_notify_parent(struct task_struct *tsk, int sig) if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) sig = 0; } + if (!tsk->ptrace && tsk->autoreap) + autoreap = true; if (valid_signal(sig) && sig) __group_send_sig_info(sig, &info, tsk->parent); __wake_up_parent(tsk, tsk->parent);