[RFC,3/4] pidfd: improve uapi when task isn't found

Message ID 20250403-work-pidfd-fixes-v1-3-a123b6ed6716@kernel.org (mailing list archive)
State New
Series pidfd: improve uapi when task isn't found

Commit Message

Christian Brauner April 3, 2025, 2:09 p.m. UTC
We currently report EINVAL whenever a struct pid has no task attached
anymore, thereby conflating two concepts:

(1) The task has already been reaped.
(2) The caller requested a pidfd for a thread-group leader but the pid
    actually references a struct pid that isn't used as a thread-group
    leader.

This is causing issues for non-threaded workloads as in [1].

This patch tries to allow userspace to distinguish between (1) and (2).
This is racy of course but that shouldn't matter.
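
For illustration, a userspace caller could act on the two error codes
roughly as sketched below. This is only a sketch under the semantics
proposed in this RFC: the enum and the helper classify_pidfd_error() are
hypothetical names, and a kernel without this change keeps reporting
EINVAL for both cases.

```c
#include <errno.h>

/* Hypothetical classification of a failed pidfd request (not a kernel API). */
enum pidfd_failure {
	PIDFD_FAIL_REAPED,	/* (1) the task has already been reaped */
	PIDFD_FAIL_NOT_LEADER,	/* (2) the pid is not a thread-group leader */
	PIDFD_FAIL_AMBIGUOUS,	/* EINVAL: older kernel or a truly invalid argument */
	PIDFD_FAIL_OTHER,	/* anything else, e.g. EMFILE or ENOMEM */
};

/* Map a positive errno value to one of the cases above. */
static enum pidfd_failure classify_pidfd_error(int err)
{
	switch (err) {
	case ESRCH:
		return PIDFD_FAIL_REAPED;
	case ENOENT:
		return PIDFD_FAIL_NOT_LEADER;
	case EINVAL:
		return PIDFD_FAIL_AMBIGUOUS;
	default:
		return PIDFD_FAIL_OTHER;
	}
}
```

Because the result is racy, a caller should treat PIDFD_FAIL_NOT_LEADER
as a hint only; the task may be reaped right after the error is
returned.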

Link: https://github.com/systemd/systemd/pull/36982 [1]
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 kernel/fork.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

Patch

diff --git a/kernel/fork.c b/kernel/fork.c
index 182ec2e9087d..0fe54fcd11b3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2108,10 +2108,35 @@  static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
  */
 int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
 {
-	bool thread = flags & PIDFD_THREAD;
+	int err = 0;
 
-	if (!pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
-		return -EINVAL;
+	if (!(flags & PIDFD_THREAD)) {
+		/*
+		 * If this struct pid isn't used as a thread-group
+		 * leader pid but the caller requested to create a
+		 * thread-group leader pidfd then report ENOENT to the
+		 * caller as a hint.
+		 */
+		if (!pid_has_task(pid, PIDTYPE_TGID))
+			err = -ENOENT;
+	}
+
+	/*
+	 * If this wasn't a thread-group leader struct pid or the task
+	 * got reaped in the meantime report -ESRCH to userspace.
+	 *
+	 * This is racy of course. This could've not been a thread-group
+	 * leader struct pid and we set ENOENT above but in the meantime
+	 * the task got reaped. Or there was a multi-threaded-exec by a
+	 * subthread and we were a thread-group leader but now got
+	 * killed. None of that matters: once the task has been
+	 * reaped the distinction is meaningless to userspace, so
+	 * just report ESRCH.
+	 */
+	if (!pid_has_task(pid, PIDTYPE_PID))
+		err = -ESRCH;
+	if (err)
+		return err;
 
 	return __pidfd_prepare(pid, flags, ret);
 }
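
The ordering of the two checks in the hunk above means ESRCH always
overrides ENOENT. A small userspace model may make that precedence
easier to see. It is a sketch, not kernel code: model_pidfd_prepare_err()
is a hypothetical name, the two booleans stand in for
pid_has_task(pid, PIDTYPE_TGID) and pid_has_task(pid, PIDTYPE_PID), and
want_thread stands in for (flags & PIDFD_THREAD).

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace model of the error selection in the patched pidfd_prepare().
 * Returns 0 when the real function would go on to __pidfd_prepare(),
 * otherwise the negative errno it would return.
 */
static int model_pidfd_prepare_err(bool want_thread, bool has_tgid_task,
				   bool has_pid_task)
{
	int err = 0;

	/* Thread-group leader pidfd requested, but pid is not a leader. */
	if (!want_thread && !has_tgid_task)
		err = -ENOENT;

	/* No task attached at all: overrides the hint set above. */
	if (!has_pid_task)
		err = -ESRCH;

	return err;
}
```

With these inputs, (false, false, true) yields -ENOENT, while any call
with has_pid_task == false yields -ESRCH regardless of the other two
arguments.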