From patchwork Thu Mar 20 13:24:08 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 14023980
From: Christian Brauner
Date: Thu, 20 Mar 2025 14:24:08 +0100
Subject: [PATCH v4 1/4] pidfs: improve multi-threaded exec and premature
 thread-group leader exit polling
X-Mailing-List: linux-fsdevel@vger.kernel.org
Message-Id: <20250320-work-pidfs-thread_group-v4-1-da678ce805bf@kernel.org>
References: <20250320-work-pidfs-thread_group-v4-0-da678ce805bf@kernel.org>
In-Reply-To: <20250320-work-pidfs-thread_group-v4-0-da678ce805bf@kernel.org>
To: Oleg Nesterov
Cc: linux-fsdevel@vger.kernel.org, Jeff Layton, Lennart Poettering,
 Daan De Meyer, Mike Yuan, Christian Brauner

This is another attempt at making pidfd polling consistent for
multi-threaded exec and premature thread-group leader exit.

A quick recap of these two cases:

(1) During a multi-threaded exec by a subthread, i.e., a
    non-thread-group-leader thread, all other threads in the
    thread-group including the thread-group leader are killed and the
    struct pid of the thread-group leader will be taken over by the
    subthread that called exec. IOW, two tasks change their TIDs.

(2) A premature thread-group leader exit means that the thread-group
    leader exited before all of the other subthreads in the
    thread-group have exited.

Both cases lead to inconsistencies for pidfd polling with PIDFD_THREAD.
Any caller that holds a PIDFD_THREAD pidfd to the current thread-group
leader may or may not see an exit notification on the file descriptor
depending on when poll is performed. If the poll is performed before the
exec of the subthread has concluded, an exit notification is generated
for the old thread-group leader. If the poll is performed after the exec
of the subthread has concluded, no exit notification is generated for the
old thread-group leader.

The correct behavior would be to simply not generate an exit
notification on the struct pid of a subthread exec because the struct
pid is taken over by the subthread and thus remains alive.

But this is difficult to handle because a thread-group leader may exit
prematurely as mentioned in (2). In that case an exit notification is
reliably generated but the subthreads may continue to run for an
indeterminate amount of time and thus also may exec at some point.

So far there was no way to distinguish between (1) and (2) internally.
This tiny series tries to address this problem by discarding
PIDFD_THREAD notifications on premature thread-group leader exit.

If that works correctly then no exit notifications are generated for a
PIDFD_THREAD pidfd for a thread-group leader until all subthreads have
been reaped. If a subthread should exec afterwards, no exit notification
will be generated until that task exits or it creates subthreads and
repeats the cycle.

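
As a rough illustration of the check this patch changes, the sketch
below models the old and new pidfd_poll() exit conditions in userspace
C. struct task_model and the exit_visible_*() helpers are hypothetical
names invented for this example; only the boolean logic is taken from
the diff. delay_group_leader() is modeled here as "thread-group leader
whose thread-group is not empty", which is an assumption about the
kernel helper rather than something this patch shows.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the task state the check reads. */
struct task_model {
	bool exited;       /* models task->exit_state != 0 */
	bool group_leader; /* models thread_group_leader(task) */
	bool group_empty;  /* models thread_group_empty(task) */
};

/* Assumed semantics: a leader that still has subthreads in its group. */
static bool delay_group_leader(const struct task_model *t)
{
	return t->group_leader && !t->group_empty;
}

/* Old condition: a PIDFD_THREAD poller always saw the task's exit. */
static bool exit_visible_old(const struct task_model *t, bool pidfd_thread)
{
	return t->exited && (pidfd_thread || t->group_empty);
}

/* New condition: a prematurely exited leader is not reported. */
static bool exit_visible_new(const struct task_model *t)
{
	return t->exited && !delay_group_leader(t);
}
```

In this model a leader that exited while subthreads are still running
(exited and group_leader set, group_empty clear) is visible to a
PIDFD_THREAD poller under the old condition but not under the new one,
which corresponds to case (2) above.
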
Co-Developed-by: Oleg Nesterov
Signed-off-by: Oleg Nesterov
Signed-off-by: Christian Brauner
---
 fs/pidfs.c      | 9 +++++----
 kernel/exit.c   | 6 +++---
 kernel/signal.c | 3 +--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index a48cc44ced6f..1b3d23e0ffdd 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -210,20 +210,21 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
 static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
 {
 	struct pid *pid = pidfd_pid(file);
-	bool thread = file->f_flags & PIDFD_THREAD;
 	struct task_struct *task;
 	__poll_t poll_flags = 0;
 
 	poll_wait(file, &pid->wait_pidfd, pts);
 	/*
-	 * Depending on PIDFD_THREAD, inform pollers when the thread
-	 * or the whole thread-group exits.
+	 * Don't wake waiters if the thread-group leader exited
+	 * prematurely. They either get notified when the last subthread
+	 * exits or not at all if one of the remaining subthreads execs
+	 * and assumes the struct pid of the old thread-group leader.
 	 */
 	guard(rcu)();
 	task = pid_task(pid, PIDTYPE_PID);
 	if (!task)
 		poll_flags = EPOLLIN | EPOLLRDNORM | EPOLLHUP;
-	else if (task->exit_state && (thread || thread_group_empty(task)))
+	else if (task->exit_state && !delay_group_leader(task))
 		poll_flags = EPOLLIN | EPOLLRDNORM;
 
 	return poll_flags;
diff --git a/kernel/exit.c b/kernel/exit.c
index 9916305e34d3..683766316a3d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -743,10 +743,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 
 	tsk->exit_state = EXIT_ZOMBIE;
 	/*
-	 * sub-thread or delay_group_leader(), wake up the
-	 * PIDFD_THREAD waiters.
+	 * Ignore thread-group leaders that exited before all
+	 * subthreads did.
 	 */
-	if (!thread_group_empty(tsk))
+	if (!delay_group_leader(tsk))
 		do_notify_pidfd(tsk);
 
 	if (unlikely(tsk->ptrace)) {
diff --git a/kernel/signal.c b/kernel/signal.c
index 081f19a24506..027ad9e97417 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2180,8 +2180,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	WARN_ON_ONCE(!tsk->ptrace &&
 	       (tsk->group_leader != tsk || !thread_group_empty(tsk)));
 	/*
-	 * tsk is a group leader and has no threads, wake up the
-	 * non-PIDFD_THREAD waiters.
+	 * Notify for thread-group leaders without subthreads.
 	 */
 	if (thread_group_empty(tsk))
 		do_notify_pidfd(tsk);

From patchwork Thu Mar 20 13:24:09 2025
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 14023981
From: Christian Brauner
Date: Thu, 20 Mar 2025 14:24:09 +0100
Subject: [PATCH v4 2/4] selftests/pidfd: first test for multi-threaded exec
 polling
Message-Id: <20250320-work-pidfs-thread_group-v4-2-da678ce805bf@kernel.org>
In-Reply-To: <20250320-work-pidfs-thread_group-v4-0-da678ce805bf@kernel.org>

Add first test for premature thread-group leader exit.

Signed-off-by: Christian Brauner
---
 tools/testing/selftests/pidfd/pidfd_info_test.c | 38 ++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/pidfd/pidfd_info_test.c b/tools/testing/selftests/pidfd/pidfd_info_test.c
index 09bc4ae7aed5..28a28ae4686a 100644
--- a/tools/testing/selftests/pidfd/pidfd_info_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_info_test.c
@@ -236,7 +236,7 @@ static void *pidfd_info_pause_thread(void *arg)
 
 TEST_F(pidfd_info, thread_group)
 {
-	pid_t pid_leader, pid_thread;
+	pid_t pid_leader, pid_poller, pid_thread;
 	pthread_t thread;
 	int nevents, pidfd_leader, pidfd_thread, pidfd_leader_thread, ret;
 	int ipc_sockets[2];
@@ -262,6 +262,35 @@ TEST_F(pidfd_info, thread_group)
 		syscall(__NR_exit, EXIT_SUCCESS);
 	}
 
+	/*
+	 * Opening a PIDFD_THREAD aka thread-specific pidfd based on a
+	 * thread-group leader must succeed.
+	 */
+	pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD);
+	ASSERT_GE(pidfd_leader_thread, 0);
+
+	pid_poller = fork();
+	ASSERT_GE(pid_poller, 0);
+	if (pid_poller == 0) {
+		/*
+		 * We can't poll and wait for the old thread-group
+		 * leader to exit using a thread-specific pidfd. The
+		 * thread-group leader exited prematurely and
+		 * notification is delayed until all subthreads have
+		 * exited.
+		 */
+		fds.events = POLLIN;
+		fds.fd = pidfd_leader_thread;
+		nevents = poll(&fds, 1, 10000 /* wait 10 seconds */);
+		if (nevents != 0)
+			_exit(EXIT_FAILURE);
+		if (fds.revents & POLLIN)
+			_exit(EXIT_FAILURE);
+		if (fds.revents & POLLHUP)
+			_exit(EXIT_FAILURE);
+		_exit(EXIT_SUCCESS);
+	}
+
 	/* Retrieve the tid of the thread. */
 	EXPECT_EQ(close(ipc_sockets[1]), 0);
 	ASSERT_EQ(read_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread));
@@ -275,12 +304,7 @@ TEST_F(pidfd_info, thread_group)
 	pidfd_thread = sys_pidfd_open(pid_thread, PIDFD_THREAD);
 	ASSERT_GE(pidfd_thread, 0);
 
-	/*
-	 * Opening a PIDFD_THREAD aka thread-specific pidfd based on a
-	 * thread-group leader must succeed.
-	 */
-	pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD);
-	ASSERT_GE(pidfd_leader_thread, 0);
+	ASSERT_EQ(wait_for_pid(pid_poller), 0);
 
 	/*
 	 * Note that pidfd_leader is a thread-group pidfd, so polling on it

From patchwork Thu Mar 20 13:24:10 2025
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 14023982
From: Christian Brauner
Date: Thu, 20 Mar 2025 14:24:10 +0100
Subject: [PATCH v4 3/4] selftests/pidfd: second test for multi-threaded exec
 polling
Message-Id: <20250320-work-pidfs-thread_group-v4-3-da678ce805bf@kernel.org>
In-Reply-To: <20250320-work-pidfs-thread_group-v4-0-da678ce805bf@kernel.org>

Ensure that during a multi-threaded exec and premature thread-group
leader exit no exit notification is generated.

Signed-off-by: Christian Brauner
---
 tools/testing/selftests/pidfd/pidfd_info_test.c | 72 ++++++++++++++++---------
 1 file changed, 48 insertions(+), 24 deletions(-)

diff --git a/tools/testing/selftests/pidfd/pidfd_info_test.c b/tools/testing/selftests/pidfd/pidfd_info_test.c
index 28a28ae4686a..4169780c9e55 100644
--- a/tools/testing/selftests/pidfd/pidfd_info_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_info_test.c
@@ -413,7 +413,7 @@ static void *pidfd_info_thread_exec(void *arg)
 
 TEST_F(pidfd_info, thread_group_exec)
 {
-	pid_t pid_leader, pid_thread;
+	pid_t pid_leader, pid_poller, pid_thread;
 	pthread_t thread;
 	int nevents, pidfd_leader, pidfd_leader_thread, pidfd_thread, ret;
 	int ipc_sockets[2];
@@ -439,6 +439,37 @@ TEST_F(pidfd_info, thread_group_exec)
 		syscall(__NR_exit, EXIT_SUCCESS);
 	}
 
+	/* Open a thread-specific pidfd for the thread-group leader. */
+	pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD);
+	ASSERT_GE(pidfd_leader_thread, 0);
+
+	pid_poller = fork();
+	ASSERT_GE(pid_poller, 0);
+	if (pid_poller == 0) {
+		/*
+		 * We can't poll and wait for the old thread-group
+		 * leader to exit using a thread-specific pidfd. The
+		 * thread-group leader exited prematurely and
+		 * notification is delayed until all subthreads have
+		 * exited.
+		 *
+		 * When the thread has execed it will have taken over the old
+		 * thread-group leader's struct pid. Calling poll after
+		 * the thread execed will thus block again because a new
+		 * thread-group has started.
+		 */
+		fds.events = POLLIN;
+		fds.fd = pidfd_leader_thread;
+		nevents = poll(&fds, 1, 10000 /* wait 10 seconds */);
+		if (nevents != 0)
+			_exit(EXIT_FAILURE);
+		if (fds.revents & POLLIN)
+			_exit(EXIT_FAILURE);
+		if (fds.revents & POLLHUP)
+			_exit(EXIT_FAILURE);
+		_exit(EXIT_SUCCESS);
+	}
+
 	/* Retrieve the tid of the thread. */
 	EXPECT_EQ(close(ipc_sockets[1]), 0);
 	ASSERT_EQ(read_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread));
@@ -447,33 +478,12 @@ TEST_F(pidfd_info, thread_group_exec)
 	pidfd_thread = sys_pidfd_open(pid_thread, PIDFD_THREAD);
 	ASSERT_GE(pidfd_thread, 0);
 
-	/* Open a thread-specific pidfd for the thread-group leader. */
-	pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD);
-	ASSERT_GE(pidfd_leader_thread, 0);
-
-	/*
-	 * We can poll and wait for the old thread-group leader to exit
-	 * using a thread-specific pidfd.
-	 *
-	 * This only works until the thread has execed. When the thread
-	 * has execed it will have taken over the old thread-group
-	 * leaders struct pid. Calling poll after the thread execed will
-	 * thus block again because a new thread-group has started (Yes,
-	 * it's fscked.).
-	 */
-	fds.events = POLLIN;
-	fds.fd = pidfd_leader_thread;
-	nevents = poll(&fds, 1, -1);
-	ASSERT_EQ(nevents, 1);
-	/* The thread-group leader has exited. */
-	ASSERT_TRUE(!!(fds.revents & POLLIN));
-	/* The thread-group leader hasn't been reaped. */
-	ASSERT_FALSE(!!(fds.revents & POLLHUP));
-
 	/* Now that we've opened a thread-specific pidfd the thread can exec. */
 	ASSERT_EQ(write_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread));
 	EXPECT_EQ(close(ipc_sockets[0]), 0);
 
+	ASSERT_EQ(wait_for_pid(pid_poller), 0);
+
 	/* Wait until the kernel has SIGKILLed the thread. */
 	fds.events = POLLHUP;
 	fds.fd = pidfd_thread;
@@ -506,6 +516,20 @@ TEST_F(pidfd_info, thread_group_exec)
 
 	/* Take down the thread-group leader. */
 	EXPECT_EQ(sys_pidfd_send_signal(pidfd_leader, SIGKILL, NULL, 0), 0);
+
+	/*
+	 * After the exec we're dealing with an empty thread-group so now
+	 * we must see an exit notification on the thread-specific pidfd
+	 * for the thread-group leader as there's no subthread that can
+	 * revive the struct pid.
+	 */
+	fds.events = POLLIN;
+	fds.fd = pidfd_leader_thread;
+	nevents = poll(&fds, 1, -1);
+	ASSERT_EQ(nevents, 1);
+	ASSERT_TRUE(!!(fds.revents & POLLIN));
+	ASSERT_FALSE(!!(fds.revents & POLLHUP));
+
 	EXPECT_EQ(sys_waitid(P_PIDFD, pidfd_leader, NULL, WEXITED), 0);
 
 	/* Retrieve exit information for the thread-group leader. */

From patchwork Thu Mar 20 13:24:11 2025
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 14023983
From: Christian Brauner
Date: Thu, 20 Mar 2025 14:24:11 +0100
Subject: [PATCH v4 4/4] selftests/pidfd: third test for multi-threaded exec
 polling
Message-Id: <20250320-work-pidfs-thread_group-v4-4-da678ce805bf@kernel.org>
In-Reply-To: <20250320-work-pidfs-thread_group-v4-0-da678ce805bf@kernel.org>

Ensure that during a multi-threaded exec and premature thread-group
leader exit no exit notification is generated.

Signed-off-by: Christian Brauner
---
 tools/testing/selftests/pidfd/pidfd_info_test.c | 147 ++++++++++++++++++++++++
 1 file changed, 147 insertions(+)

diff --git a/tools/testing/selftests/pidfd/pidfd_info_test.c b/tools/testing/selftests/pidfd/pidfd_info_test.c
index 4169780c9e55..1758a1b0457b 100644
--- a/tools/testing/selftests/pidfd/pidfd_info_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_info_test.c
@@ -542,4 +542,151 @@ TEST_F(pidfd_info, thread_group_exec)
 	EXPECT_EQ(close(pidfd_thread), 0);
 }
 
+static void *pidfd_info_thread_exec_sane(void *arg)
+{
+	pid_t pid_thread = gettid();
+	int ipc_socket = *(int *)arg;
+
+	/* Inform the grand-parent what the tid of this thread is. */
+	if (write_nointr(ipc_socket, &pid_thread, sizeof(pid_thread)) != sizeof(pid_thread))
+		return NULL;
+
+	if (read_nointr(ipc_socket, &pid_thread, sizeof(pid_thread)) != sizeof(pid_thread))
+		return NULL;
+
+	close(ipc_socket);
+
+	sys_execveat(AT_FDCWD, "pidfd_exec_helper", NULL, NULL, 0);
+	return NULL;
+}
+
+TEST_F(pidfd_info, thread_group_exec_thread)
+{
+	pid_t pid_leader, pid_poller, pid_thread;
+	pthread_t thread;
+	int nevents, pidfd_leader, pidfd_leader_thread, pidfd_thread, ret;
+	int ipc_sockets[2];
+	struct pollfd fds = {};
+	struct pidfd_info info = {
+		.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT,
+	};
+
+	ret = socketpair(AF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, ipc_sockets);
+	EXPECT_EQ(ret, 0);
+
+	pid_leader = create_child(&pidfd_leader, 0);
+	EXPECT_GE(pid_leader, 0);
+
+	if (pid_leader == 0) {
+		close(ipc_sockets[0]);
+
+		/* The thread will outlive the thread-group leader. */
+		if (pthread_create(&thread, NULL, pidfd_info_thread_exec_sane, &ipc_sockets[1]))
+			syscall(__NR_exit, EXIT_FAILURE);
+
+		/*
+		 * Pause the thread-group leader. It will be killed once
+		 * the subthread execs.
+		 */
+		pause();
+		syscall(__NR_exit, EXIT_SUCCESS);
+	}
+
+	/* Retrieve the tid of the thread. */
+	EXPECT_EQ(close(ipc_sockets[1]), 0);
+	ASSERT_EQ(read_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread));
+
+	/* Opening a thread as a PIDFD_THREAD must succeed. */
+	pidfd_thread = sys_pidfd_open(pid_thread, PIDFD_THREAD);
+	ASSERT_GE(pidfd_thread, 0);
+
+	/* Open a thread-specific pidfd for the thread-group leader. */
+	pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD);
+	ASSERT_GE(pidfd_leader_thread, 0);
+
+	pid_poller = fork();
+	ASSERT_GE(pid_poller, 0);
+	if (pid_poller == 0) {
+		/*
+		 * The subthread will now exec. The struct pid of the old
+		 * thread-group leader will be assumed by the subthread which
+		 * becomes the new thread-group leader. So no exit notification
+		 * must be generated. Wait for 10 seconds and call it a success
+		 * if no notification has been received.
+		 */
+		fds.events = POLLIN;
+		fds.fd = pidfd_leader_thread;
+		nevents = poll(&fds, 1, 10000 /* wait 10 seconds */);
+		if (nevents != 0)
+			_exit(EXIT_FAILURE);
+		if (fds.revents & POLLIN)
+			_exit(EXIT_FAILURE);
+		if (fds.revents & POLLHUP)
+			_exit(EXIT_FAILURE);
+		_exit(EXIT_SUCCESS);
+	}
+
+	/* Now that we've opened a thread-specific pidfd the thread can exec. */
+	ASSERT_EQ(write_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread));
+	EXPECT_EQ(close(ipc_sockets[0]), 0);
+	ASSERT_EQ(wait_for_pid(pid_poller), 0);
+
+	/* Wait until the kernel has SIGKILLed the thread. */
+	fds.events = POLLHUP;
+	fds.fd = pidfd_thread;
+	nevents = poll(&fds, 1, -1);
+	ASSERT_EQ(nevents, 1);
+	/* The thread has been reaped. */
+	ASSERT_TRUE(!!(fds.revents & POLLHUP));
+
+	/* Retrieve thread-specific exit info from pidfd. */
+	ASSERT_EQ(ioctl(pidfd_thread, PIDFD_GET_INFO, &info), 0);
+	ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS));
+	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT));
+	/*
+	 * While the kernel will have SIGKILLed the whole thread-group
+	 * during exec it will cause the individual threads to exit
+	 * cleanly.
+	 */
+	ASSERT_TRUE(WIFEXITED(info.exit_code));
+	ASSERT_EQ(WEXITSTATUS(info.exit_code), 0);
+
+	/*
+	 * The thread-group leader is still alive, the thread has taken
+	 * over its struct pid and thus its pid number.
+	 */
+	info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT;
+	ASSERT_EQ(ioctl(pidfd_leader, PIDFD_GET_INFO, &info), 0);
+	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_CREDS));
+	ASSERT_FALSE(!!(info.mask & PIDFD_INFO_EXIT));
+	ASSERT_EQ(info.pid, pid_leader);
+
+	/* Take down the thread-group leader. */
+	EXPECT_EQ(sys_pidfd_send_signal(pidfd_leader, SIGKILL, NULL, 0), 0);
+
+	/*
+	 * After the exec we're dealing with an empty thread-group so now
+	 * we must see an exit notification on the thread-specific pidfd
+	 * for the thread-group leader as there's no subthread that can
+	 * revive the struct pid.
+	 */
+	fds.events = POLLIN;
+	fds.fd = pidfd_leader_thread;
+	nevents = poll(&fds, 1, -1);
+	ASSERT_EQ(nevents, 1);
+	ASSERT_TRUE(!!(fds.revents & POLLIN));
+	ASSERT_FALSE(!!(fds.revents & POLLHUP));
+
+	EXPECT_EQ(sys_waitid(P_PIDFD, pidfd_leader, NULL, WEXITED), 0);
+
+	/* Retrieve exit information for the thread-group leader. */
+	info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT;
+	ASSERT_EQ(ioctl(pidfd_leader, PIDFD_GET_INFO, &info), 0);
+	ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS));
+	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT));
+
+	EXPECT_EQ(close(pidfd_leader), 0);
+	EXPECT_EQ(close(pidfd_thread), 0);
+}
+
 TEST_HARNESS_MAIN