Message ID | 20250320-work-pidfs-thread_group-v4-1-da678ce805bf@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | pidfs: handle multi-threaded exec and premature thread-group leader exit | expand |
On 03/20, Christian Brauner wrote: > > Co-Developed-by: Oleg Nesterov <oleg@redhat.com> > Signed-off-by: Oleg Nesterov <oleg@redhat.com> > Signed-off-by: Christian Brauner <brauner@kernel.org> > --- > fs/pidfs.c | 9 +++++---- > kernel/exit.c | 6 +++--- > kernel/signal.c | 3 +-- > 3 files changed, 9 insertions(+), 9 deletions(-) Thanks, LGTM. Todo: - As we already discussed, do_notify_pidfd() can be called twice from exit_notify() paths, we can avoid this. But this connects to another minor issue: - With this change, debuggers can no longer use PIDFD_THREAD. I guess we don't really care, I don't think PIDFD_THREAD was ever used for this purpose. but perhaps we can change this or at least document somewhere... I'll try to do this but not today and (most probably) not tomorrow. Oleg.
On Thu, Mar 20, 2025 at 03:28:17PM +0100, Oleg Nesterov wrote: > On 03/20, Christian Brauner wrote: > > > > Co-Developed-by: Oleg Nesterov <oleg@redhat.com> > > Signed-off-by: Oleg Nesterov <oleg@redhat.com> > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > --- > > fs/pidfs.c | 9 +++++---- > > kernel/exit.c | 6 +++--- > > kernel/signal.c | 3 +-- > > 3 files changed, 9 insertions(+), 9 deletions(-) > > Thanks, LGTM. > > Todo: > > - As we already discussed, do_notify_pidfd() can be called > twice from exit_notify() paths, we can avoid this. > > But this connects to another minor issue: > > - With this change, debuggers can no longer use PIDFD_THREAD. > I guess we don't really care, I don't think PIDFD_THREAD was > ever used for this purpose. but perhaps we can change this If you have ideas how to let them use it I'm all for it! > or at least document somewhere... > > I'll try to do this but not today and (most probably) not tomorrow. Cool, sounds great!
diff --git a/fs/pidfs.c b/fs/pidfs.c index a48cc44ced6f..1b3d23e0ffdd 100644 --- a/fs/pidfs.c +++ b/fs/pidfs.c @@ -210,20 +210,21 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f) static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts) { struct pid *pid = pidfd_pid(file); - bool thread = file->f_flags & PIDFD_THREAD; struct task_struct *task; __poll_t poll_flags = 0; poll_wait(file, &pid->wait_pidfd, pts); /* - * Depending on PIDFD_THREAD, inform pollers when the thread - * or the whole thread-group exits. + * Don't wake waiters if the thread-group leader exited + * prematurely. They either get notified when the last subthread + * exits or not at all if one of the remaining subthreads execs + * and assumes the struct pid of the old thread-group leader. */ guard(rcu)(); task = pid_task(pid, PIDTYPE_PID); if (!task) poll_flags = EPOLLIN | EPOLLRDNORM | EPOLLHUP; - else if (task->exit_state && (thread || thread_group_empty(task))) + else if (task->exit_state && !delay_group_leader(task)) poll_flags = EPOLLIN | EPOLLRDNORM; return poll_flags; diff --git a/kernel/exit.c b/kernel/exit.c index 9916305e34d3..683766316a3d 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -743,10 +743,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead) tsk->exit_state = EXIT_ZOMBIE; /* - * sub-thread or delay_group_leader(), wake up the - * PIDFD_THREAD waiters. + * Ignore thread-group leaders that exited before all + * subthreads did. */ - if (!thread_group_empty(tsk)) + if (!delay_group_leader(tsk)) do_notify_pidfd(tsk); if (unlikely(tsk->ptrace)) { diff --git a/kernel/signal.c b/kernel/signal.c index 081f19a24506..027ad9e97417 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2180,8 +2180,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig) WARN_ON_ONCE(!tsk->ptrace && (tsk->group_leader != tsk || !thread_group_empty(tsk))); /* - * tsk is a group leader and has no threads, wake up the - * non-PIDFD_THREAD waiters. + * Notify for thread-group leaders without subthreads. */ if (thread_group_empty(tsk)) do_notify_pidfd(tsk);