From patchwork Tue Mar 18 09:52:16 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 14020684 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B579420296A for ; Tue, 18 Mar 2025 09:52:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742291551; cv=none; b=jmHHTmTnxjVhcg7nZvk5JNBbKZTLsqfrHhELOiZvnJHw3Dahfdhhe+oR8CeX+zMdeb5IRaX+r9ycmb3jEI8Hc8eOCfc2Sp7gP5Ykp8wO8+mDz9Dq+u3xJ/KJzWHoLUt/WQYBxBvHD6PjxDMVuxe8N2Ihtgv+8kIL1UOvIyNcjPA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742291551; c=relaxed/simple; bh=RuGsSRN+c5sXqZdSafH71M4K+BbmCU/itXWFIrLwWZ4=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=SNOowo+/ZwdxQk/8wlSjE30wr937sVWX4aJXWNXv4qYDt1LsmLndZ+H4yWgujYFkZW3newo5ROPyJgWyMOp19T6/aZEUeLxOOT3f787FBtEIueDUL9V7FCV1wsQ3iFAYV/U+XicxnhyxJ4mnNYEipskhaaMBulfmlUnFET9Ypkw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=hBGS9HzQ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="hBGS9HzQ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6A4DAC4CEEF; Tue, 18 Mar 2025 09:52:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1742291551; bh=RuGsSRN+c5sXqZdSafH71M4K+BbmCU/itXWFIrLwWZ4=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=hBGS9HzQHDp0dSRQ9WWXl3nepMKfkiJb3haUqJKgxweb1OSYRl3VrVVgxXf0vq1uo lb4F+I2p05cxf+8NAHSn5rWb6YfCRpC036yjGcLo3NoB7Pf9RDb3xpfca6kIOFOaFh /hCw+4d83W8JEf6iuPoK/mHF8AmcImquqbpwwY1ckn3fd1ptv8YDNQgo9MZ9RzPEMH YJ/eXPjqFOc9URzVGVmRDDyxbCOmz6h/diXrPvhvxg1LyGlypwkyClz/iSMcAwZjX+ 05xCNc/wIvu8MVsutAMvkJjNYYdovMBZXdyf5msTMtau3G99Q8h8CBo7HyveKtWFvG 3swAAf6gJS2rw== From: Christian Brauner Date: Tue, 18 Mar 2025 10:52:16 +0100 Subject: [PATCH RFC v2 1/3] pidfs: improve multi-threaded exec and premature thread-group leader exit polling Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250318-work-pidfs-thread_group-v2-1-2677898ffa2e@kernel.org> References: <20250318-work-pidfs-thread_group-v2-0-2677898ffa2e@kernel.org> In-Reply-To: <20250318-work-pidfs-thread_group-v2-0-2677898ffa2e@kernel.org> To: Oleg Nesterov Cc: linux-fsdevel@vger.kernel.org, Jeff Layton , Lennart Poettering , Daan De Meyer , Mike Yuan , Christian Brauner X-Mailer: b4 0.15-dev-42535 X-Developer-Signature: v=1; a=openpgp-sha256; l=7105; i=brauner@kernel.org; h=from:subject:message-id; bh=RuGsSRN+c5sXqZdSafH71M4K+BbmCU/itXWFIrLwWZ4=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMaTfdIpSFrJUumIj4Ox68EAE58l172d++zvF+O0iacblb oeLrhfFdZSyMIhxMciKKbI4tJuEyy3nqdhslKkBM4eVCWQIAxenAEzEsYKR4aSQn2bifx1h1VUf fcW/deSfnGX76eSW/35/ZHtFJ063EGNkWH9W1VBm9oHJlRe82kzfVHWe7AjZMT1C3+ds5eeCyq0 lPAA= X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 This is another attempt trying to make pidfd polling for multi-threaded exec and premature thread-group leader exit consistent. A quick recap of these two cases: (1) During a multi-threaded exec by a subthread, i.e., non-thread-group leader thread, all other threads in the thread-group including the thread-group leader are killed and the struct pid of the thread-group leader will be taken over by the subthread that called exec. IOW, two tasks change their TIDs. (2) A premature thread-group leader exit means that the thread-group leader exited before all of the other subthreads in the thread-group have exited. Both cases lead to inconsistencies for pidfd polling with PIDFD_THREAD. Any caller that holds a PIDFD_THREAD pidfd to the current thread-group leader may or may not see an exit notification on the file descriptor depending on when poll is performed. If the poll is performed before the exec of the subthread has concluded an exit notification is generated for the old thread-group leader. If the poll is performed after the exec of the subthread has concluded no exit notification is generated for the old thread-group leader. The correct behavior would be to simply not generate an exit notification on the struct pid of a subhthread exec because the struct pid is taken over by the subthread and thus remains alive. But this is difficult to handle because a thread-group may exit prematurely as mentioned in (2). In that case an exit notification is reliably generated but the subthreads may continue to run for an indeterminate amount of time and thus also may exec at some point. So far there was no way to distinguish between (1) and (2) internally. This tiny series tries to address this problem by remembering a premature leader exit in struct pid and forgetting it when a subthread execs and takes over the old thread-group leaders struct pid. This can be done without growing struct pid. The 32-bit pid namespace level indicator can be split into two 16-bit integers as only 32 levels of pid namespace nesting are allowed. Even with 16-bit the nesting level can in the future be bumped up to 65,535 (which isn't feasible/sensible for a lot of reasons). The second 16-bit are used as an indicator for a premature thread-group leader exec which is cleared when the last subthread gets autoreaped and the prematurely exited thread-group leader is notified. If that works correctly then no exit notifications are generated for a PIDFD_THREAD pidfd for a thread-group leader until all subthreads have been reaped. If a subthread should exec aftewards no exit notification will be generated until that task exits or it creates subthreads and repeates the cycle. Signed-off-by: Christian Brauner --- fs/pidfs.c | 27 +++++++++++++++++++++++++-- include/linux/pid.h | 3 ++- kernel/exit.c | 24 ++++++++++++++++++++++-- kernel/pid.c | 10 ++++++++++ 4 files changed, 59 insertions(+), 5 deletions(-) diff --git a/fs/pidfs.c b/fs/pidfs.c index d980f779c213..3874ccc0f9d7 100644 --- a/fs/pidfs.c +++ b/fs/pidfs.c @@ -223,8 +223,31 @@ static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts) task = pid_task(pid, PIDTYPE_PID); if (!task) poll_flags = EPOLLIN | EPOLLRDNORM | EPOLLHUP; - else if (task->exit_state && (thread || thread_group_empty(task))) - poll_flags = EPOLLIN | EPOLLRDNORM; + else if (task->exit_state) { + if (thread) { + /* + * If this is a regular thread exit then notify + * the PIDFD_THREAD waiters. + * + * Don't notify in the following circumstances: + * + * (1) If this is a premature thread-group + * leader exit then delay the exit nofication + * until the last thread in the thread-group + * gets autoreaped as there might still be a + * thread that execs and revives the struct + * pid of the old thread-group leader. + * (2) There's a multi-threaded exec commencing + * and @pid is the current and therefore new + * thread-group leader's pid. + */ + if (likely(!READ_ONCE(pid->delayed_leader))) + poll_flags = EPOLLIN | EPOLLRDNORM; + } + + if (thread_group_empty(task)) + poll_flags = EPOLLIN | EPOLLRDNORM; + } return poll_flags; } diff --git a/include/linux/pid.h b/include/linux/pid.h index 98837a1ff0f3..e6dade16caad 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -55,7 +55,8 @@ struct upid { struct pid { refcount_t count; - unsigned int level; + u16 level; + u16 delayed_leader; spinlock_t lock; struct dentry *stashed; u64 ino; diff --git a/kernel/exit.c b/kernel/exit.c index 9916305e34d3..f01ee0a08707 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -267,6 +267,11 @@ void release_task(struct task_struct *p) leader = p->group_leader; if (leader != p && thread_group_empty(leader) && leader->exit_state == EXIT_ZOMBIE) { + struct pid *pid; + + pid = task_pid(leader); + WRITE_ONCE(pid->delayed_leader, 0); + /* * If we were the last child thread and the leader has * exited already, and the leader's parent ignores SIGCHLD, @@ -746,8 +751,23 @@ static void exit_notify(struct task_struct *tsk, int group_dead) * sub-thread or delay_group_leader(), wake up the * PIDFD_THREAD waiters. */ - if (!thread_group_empty(tsk)) - do_notify_pidfd(tsk); + if (!thread_group_empty(tsk)) { + if (delay_group_leader(tsk)) { + struct pid *pid; + + /* + * This is a thread-group leader exiting before + * all of its subthreads have exited allow pidfd + * polling to detect this case and delay exit + * notification until the last thread has + * exited. + */ + pid = task_pid(tsk); + WRITE_ONCE(pid->delayed_leader, 1); + } else { + do_notify_pidfd(tsk); + } + } if (unlikely(tsk->ptrace)) { int sig = thread_group_leader(tsk) && diff --git a/kernel/pid.c b/kernel/pid.c index 22f5d2b2e290..7b8ad2a8e74f 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -264,6 +264,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid, get_pid_ns(ns); refcount_set(&pid->count, 1); spin_lock_init(&pid->lock); + pid->delayed_leader = 0; for (type = 0; type < PIDTYPE_MAX; ++type) INIT_HLIST_HEAD(&pid->tasks[type]); @@ -386,6 +387,15 @@ void exchange_tids(struct task_struct *left, struct task_struct *right) struct hlist_head *head1 = &pid1->tasks[PIDTYPE_PID]; struct hlist_head *head2 = &pid2->tasks[PIDTYPE_PID]; + /* + * If delayed leader marker is set then this was a malformed + * thread-group exec. The thread-group leader had exited before + * all of its subthreads and then one of the subthreads execed. + * The struct pid continues it's existence so remove the + * premature thread-group leader exit indicator. + */ + WRITE_ONCE(pid2->delayed_leader, 0); + /* Swap the single entry tid lists */ hlists_swap_heads_rcu(head1, head2); From patchwork Tue Mar 18 09:52:17 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 14020685 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E708E207E0E for ; Tue, 18 Mar 2025 09:52:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742291555; cv=none; b=rs+gaCnv7bq4sLVCEM49yU4U2iyL4pOERnCA5TyWFyOU19mNTO/yqDPs2UFTD2xtkYkmpDWt/wUV5mwcSFqiecIN2wRsC85T2DrXuej+XLPVwoK136j4ezAxF9LrGBCQidjUEJfvv/3sknIPIufGBIt9z2vk36ro7dylkq6iGhY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742291555; c=relaxed/simple; bh=BjKAgkh2rS0T3SwX180fJnUH9jZJlezJNFeTeIdoUpM=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=THr5JSjD/NdEpAtX2kdU7/LX5kFlHyc3rcddd7zeAWzbYNkeu4X35i59abjAQrk7T+HxrR4V99p1LqbTG08qjq4p3rEId1636UbGo2ZeqHObhu4EO1fypkjSc7n0Msrimx7cHPX9rDGu3QeQYogkenpQ8/DjYDnR8N2pwXa9w24= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=J6xMtYb2; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="J6xMtYb2" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3C2D1C4CEE9; Tue, 18 Mar 2025 09:52:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1742291554; bh=BjKAgkh2rS0T3SwX180fJnUH9jZJlezJNFeTeIdoUpM=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=J6xMtYb2772TbyU2G3dBbfaNXbS/2m8c2++kKrEDSdi5o48MchLtqWCX27dIsBwA2 4v3JqlP67TO8eKo9TCU6iy3EuV9cAUgDSUBhKmdmODmCOytUSJRLvlhgovILWHCN3m DMvgXfxJDQVa5YRkNlda3dVYsOFD/+NQJ3zNNZ6sVUw5EJL5QNtdVtAGNm5/Z1C/kW yDIVhTu9HWMwsFGY5+p2eZ9dLkByEdKo9qCjXHG0162oGqnCLYucelmcecMdXT5Xpa q199fKi32x61xw0MyYHjsIO/+aVpVQa3MZ2/XQ0PB8kv0ghPup1SnSJc/q+1/pzQkB mLak3ORzNzWlQ== From: Christian Brauner Date: Tue, 18 Mar 2025 10:52:17 +0100 Subject: [PATCH RFC v2 2/3] selftests/pidfd: first test for multi-threaded exec polling Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250318-work-pidfs-thread_group-v2-2-2677898ffa2e@kernel.org> References: <20250318-work-pidfs-thread_group-v2-0-2677898ffa2e@kernel.org> In-Reply-To: <20250318-work-pidfs-thread_group-v2-0-2677898ffa2e@kernel.org> To: Oleg Nesterov Cc: linux-fsdevel@vger.kernel.org, Jeff Layton , Lennart Poettering , Daan De Meyer , Mike Yuan , Christian Brauner X-Mailer: b4 0.15-dev-42535 X-Developer-Signature: v=1; a=openpgp-sha256; l=3444; i=brauner@kernel.org; h=from:subject:message-id; bh=BjKAgkh2rS0T3SwX180fJnUH9jZJlezJNFeTeIdoUpM=; b=kA0DAAoWkcYbwGV43KIByyZiAGfZQlqhc0ncoaqwjhsCrqZm8fw3kzwxMgMYWw/avuw2wvvfX 4h1BAAWCgAdFiEEQIc0Vx6nDHizMmkokcYbwGV43KIFAmfZQloACgkQkcYbwGV43KJJpwD/caZt uHLd+7FZirbtMFkGsWMJNNdPY3mZTjbmFRJwNuUA/AnLFtli84DqhlmhBFXNzp1fmtdCRQ2olDh 9OKcSvlgN X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 Ensure that during a multi-threaded exec and premature thread-group leader exit no exit notification is generated. Signed-off-by: Christian Brauner --- tools/testing/selftests/pidfd/pidfd_info_test.c | 47 +++++++++++++++++++------ 1 file changed, 36 insertions(+), 11 deletions(-) diff --git a/tools/testing/selftests/pidfd/pidfd_info_test.c b/tools/testing/selftests/pidfd/pidfd_info_test.c index 09bc4ae7aed5..f06b8e2f969a 100644 --- a/tools/testing/selftests/pidfd/pidfd_info_test.c +++ b/tools/testing/selftests/pidfd/pidfd_info_test.c @@ -428,26 +428,37 @@ TEST_F(pidfd_info, thread_group_exec) ASSERT_GE(pidfd_leader_thread, 0); /* - * We can poll and wait for the old thread-group leader to exit - * using a thread-specific pidfd. + * We can't poll and wait for the old thread-group leader to exit + * using a thread-specific pidfd. The thread-group leader exited + * prematurely and notification is delayed until all subthreads + * have exited. * - * This only works until the thread has execed. When the thread - * has execed it will have taken over the old thread-group - * leaders struct pid. Calling poll after the thread execed will - * thus block again because a new thread-group has started (Yes, - * it's fscked.). + * When the thread has execed it will taken over the old + * thread-group leaders struct pid. Calling poll after the + * thread execed will thus block again because a new + * thread-group has started. */ fds.events = POLLIN; fds.fd = pidfd_leader_thread; - nevents = poll(&fds, 1, -1); - ASSERT_EQ(nevents, 1); - /* The thread-group leader has exited. */ - ASSERT_TRUE(!!(fds.revents & POLLIN)); + nevents = poll(&fds, 1, 2000 /* wait 2 seconds */); + ASSERT_EQ(nevents, 0); + /* The thread-group leader has exited but there's still a live subthread. */ + ASSERT_FALSE(!!(fds.revents & POLLIN)); /* The thread-group leader hasn't been reaped. */ ASSERT_FALSE(!!(fds.revents & POLLHUP)); /* Now that we've opened a thread-specific pidfd the thread can exec. */ ASSERT_EQ(write_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread)); + + fds.events = POLLIN; + fds.fd = pidfd_leader_thread; + nevents = poll(&fds, 1, 2000 /* wait 2 seconds */); + ASSERT_EQ(nevents, 0); + /* The thread-group leader has exited but there's still a live subthread. */ + ASSERT_FALSE(!!(fds.revents & POLLIN)); + /* The thread-group leader hasn't been reaped. */ + ASSERT_FALSE(!!(fds.revents & POLLHUP)); + EXPECT_EQ(close(ipc_sockets[0]), 0); /* Wait until the kernel has SIGKILLed the thread. */ @@ -482,6 +493,20 @@ TEST_F(pidfd_info, thread_group_exec) /* Take down the thread-group leader. */ EXPECT_EQ(sys_pidfd_send_signal(pidfd_leader, SIGKILL, NULL, 0), 0); + + /* + * Afte the exec we're dealing with an empty thread-group so now + * we must see an exit notification on the thread-specific pidfd + * for the thread-group leader as there's no subthread that can + * revive the struct pid. + */ + fds.events = POLLIN; + fds.fd = pidfd_leader_thread; + nevents = poll(&fds, 1, -1); + ASSERT_EQ(nevents, 1); + ASSERT_TRUE(!!(fds.revents & POLLIN)); + ASSERT_FALSE(!!(fds.revents & POLLHUP)); + EXPECT_EQ(sys_waitid(P_PIDFD, pidfd_leader, NULL, WEXITED), 0); /* Retrieve exit information for the thread-group leader. */ From patchwork Tue Mar 18 09:52:18 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 14020686 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6ECCE207DEE for ; Tue, 18 Mar 2025 09:52:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742291557; cv=none; b=Xi9y9h2od951bJRiMZ6e6jBQkS1rDIxEOAfOhaCk1GURCmCpwafJPg5cq/IbU0AUkM94i/lJ4KwEaMOhCMuJCxPNh2MYooE2RttSmtyYVFJnAvnJAu4dUU77fH/+kSbrR+Q5l56AryBSqZ6HYvNjZJdJ3kZ8UCfiVnxplKIylyU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742291557; c=relaxed/simple; bh=48L0rXuRmbKbnU8ZJxRjmReXIG9hmCA13naUO/++Rn4=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=cTsadN+5ELBZErYnJh8Q9hKev0u+ZrKt4f0rBTAkq8m2y0efOAkCoLZ/fOA3+52mSi3zafhzq+3ETW/9HbSrG8WhC8WCffC2DbEF4pXS8d/R/7nsjUbe/zRlAbUjyoQ0WTmDYUV4jgAkWFcpX15nnY3L1yZ0idlc6yjnm21k8+Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Y+DCXs1v; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Y+DCXs1v" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EE799C4CEDD; Tue, 18 Mar 2025 09:52:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1742291556; bh=48L0rXuRmbKbnU8ZJxRjmReXIG9hmCA13naUO/++Rn4=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=Y+DCXs1vM8VljXeek6SpXtsBnxdVaqwdY3wYbT2EY6N6wp6VSUKwFPw9VmFKkLXyM jqOy55MSfkxwAqnOuQ/HEX+REsQrPHAs6o/V+n00YHOek9M0355+2lu/WVyH8LVV8O 61P3UsuwCI3tt+iCnuPX9bXGleLJfWetnD/Mb8WnhskYHcrnZ64InYKDqDay1ZMtLH C4BE2rDgm/Kl7yL3eR4WcmPf2vpAGIJxTDJ1E9YMMDjYgAw/+Yz9TtCqtd/Pm8IvAY Mnd7a2ZbOITxmcBRt2G6E7VKnalbmZ8zYWxKlr97E46TVMqChiHHpUZSZeNbBK0np4 bOxKysU4PMu4A== From: Christian Brauner Date: Tue, 18 Mar 2025 10:52:18 +0100 Subject: [PATCH RFC v2 3/3] selftests/pidfd: second test for multi-threaded exec polling Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250318-work-pidfs-thread_group-v2-3-2677898ffa2e@kernel.org> References: <20250318-work-pidfs-thread_group-v2-0-2677898ffa2e@kernel.org> In-Reply-To: <20250318-work-pidfs-thread_group-v2-0-2677898ffa2e@kernel.org> To: Oleg Nesterov Cc: linux-fsdevel@vger.kernel.org, Jeff Layton , Lennart Poettering , Daan De Meyer , Mike Yuan , Christian Brauner X-Mailer: b4 0.15-dev-42535 X-Developer-Signature: v=1; a=openpgp-sha256; l=5489; i=brauner@kernel.org; h=from:subject:message-id; bh=48L0rXuRmbKbnU8ZJxRjmReXIG9hmCA13naUO/++Rn4=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMaTfdIrqOxyhFntpWmF1GO8MYXe3z8+W+F3/+vtl6rrFc 4pfuze4dJSyMIhxMciKKbI4tJuEyy3nqdhslKkBM4eVCWQIAxenAEwkIZGR4fGyss1mwhrRE7m+ 3Z3i6cI0c+M54ZnrHBN2Re6qDWLe/orhF1P/deG2UDXGxV0fFLhKczq3lxfu/vvX6mYnw4aQrAu qTAA= X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 Ensure that during a multi-threaded exec and premature thread-group leader exit no exit notification is generated. Signed-off-by: Christian Brauner --- tools/testing/selftests/pidfd/pidfd_info_test.c | 138 ++++++++++++++++++++++++ 1 file changed, 138 insertions(+) diff --git a/tools/testing/selftests/pidfd/pidfd_info_test.c b/tools/testing/selftests/pidfd/pidfd_info_test.c index f06b8e2f969a..f6ead7993f7e 100644 --- a/tools/testing/selftests/pidfd/pidfd_info_test.c +++ b/tools/testing/selftests/pidfd/pidfd_info_test.c @@ -519,4 +519,142 @@ TEST_F(pidfd_info, thread_group_exec) EXPECT_EQ(close(pidfd_thread), 0); } +static void *pidfd_info_thread_exec_sane(void *arg) +{ + pid_t pid_thread = gettid(); + int ipc_socket = *(int *)arg; + + /* Inform the grand-parent what the tid of this thread is. */ + if (write_nointr(ipc_socket, &pid_thread, sizeof(pid_thread)) != sizeof(pid_thread)) + return NULL; + + if (read_nointr(ipc_socket, &pid_thread, sizeof(pid_thread)) != sizeof(pid_thread)) + return NULL; + + close(ipc_socket); + + sys_execveat(AT_FDCWD, "pidfd_exec_helper", NULL, NULL, 0); + return NULL; +} + +TEST_F(pidfd_info, thread_group_exec_thread) +{ + pid_t pid_leader, pid_thread; + pthread_t thread; + int nevents, pidfd_leader, pidfd_leader_thread, pidfd_thread, ret; + int ipc_sockets[2]; + struct pollfd fds = {}; + struct pidfd_info info = { + .mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT, + }; + + ret = socketpair(AF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, ipc_sockets); + EXPECT_EQ(ret, 0); + + pid_leader = create_child(&pidfd_leader, 0); + EXPECT_GE(pid_leader, 0); + + if (pid_leader == 0) { + close(ipc_sockets[0]); + + /* The thread will outlive the thread-group leader. */ + if (pthread_create(&thread, NULL, pidfd_info_thread_exec_sane, &ipc_sockets[1])) + syscall(__NR_exit, EXIT_FAILURE); + + /* + * Pause the thread-group leader. It will be killed once + * the subthread execs. + */ + pause(); + syscall(__NR_exit, EXIT_SUCCESS); + } + + /* Retrieve the tid of the thread. */ + EXPECT_EQ(close(ipc_sockets[1]), 0); + ASSERT_EQ(read_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread)); + + /* Opening a thread as a PIDFD_THREAD must succeed. */ + pidfd_thread = sys_pidfd_open(pid_thread, PIDFD_THREAD); + ASSERT_GE(pidfd_thread, 0); + + /* Open a thread-specific pidfd for the thread-group leader. */ + pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD); + ASSERT_GE(pidfd_leader_thread, 0); + + /* Now that we've opened a thread-specific pidfd the thread can exec. */ + ASSERT_EQ(write_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread)); + EXPECT_EQ(close(ipc_sockets[0]), 0); + + /* + * The subthread will now exec. The struct pid of the old + * thread-group leader will be assumed by the subthread which + * becomes the new thread-group leader. So no exit notification + * must be generated. Wait for 5 seconds and call it a success + * if no notification has been received. + */ + fds.events = POLLIN; + fds.fd = pidfd_leader_thread; + nevents = poll(&fds, 1, 5000 /* wait 5 seconds */); + ASSERT_EQ(nevents, 0); + ASSERT_FALSE(!!(fds.revents & POLLIN)); + ASSERT_FALSE(!!(fds.revents & POLLHUP)); + + /* Wait until the kernel has SIGKILLed the thread. */ + fds.events = POLLHUP; + fds.fd = pidfd_thread; + nevents = poll(&fds, 1, -1); + ASSERT_EQ(nevents, 1); + /* The thread has been reaped. */ + ASSERT_TRUE(!!(fds.revents & POLLHUP)); + + /* Retrieve thread-specific exit info from pidfd. */ + ASSERT_EQ(ioctl(pidfd_thread, PIDFD_GET_INFO, &info), 0); + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS)); + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT)); + /* + * While the kernel will have SIGKILLed the whole thread-group + * during exec it will cause the individual threads to exit + * cleanly. + */ + ASSERT_TRUE(WIFEXITED(info.exit_code)); + ASSERT_EQ(WEXITSTATUS(info.exit_code), 0); + + /* + * The thread-group leader is still alive, the thread has taken + * over its struct pid and thus its pid number. + */ + info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; + ASSERT_EQ(ioctl(pidfd_leader, PIDFD_GET_INFO, &info), 0); + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_CREDS)); + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_EXIT)); + ASSERT_EQ(info.pid, pid_leader); + + /* Take down the thread-group leader. */ + EXPECT_EQ(sys_pidfd_send_signal(pidfd_leader, SIGKILL, NULL, 0), 0); + + /* + * Afte the exec we're dealing with an empty thread-group so now + * we must see an exit notification on the thread-specific pidfd + * for the thread-group leader as there's no subthread that can + * revive the struct pid. + */ + fds.events = POLLIN; + fds.fd = pidfd_leader_thread; + nevents = poll(&fds, 1, -1); + ASSERT_EQ(nevents, 1); + ASSERT_TRUE(!!(fds.revents & POLLIN)); + ASSERT_FALSE(!!(fds.revents & POLLHUP)); + + EXPECT_EQ(sys_waitid(P_PIDFD, pidfd_leader, NULL, WEXITED), 0); + + /* Retrieve exit information for the thread-group leader. */ + info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; + ASSERT_EQ(ioctl(pidfd_leader, PIDFD_GET_INFO, &info), 0); + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS)); + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT)); + + EXPECT_EQ(close(pidfd_leader), 0); + EXPECT_EQ(close(pidfd_thread), 0); +} + TEST_HARNESS_MAIN