[RFC,05/10] pidfs: record exit code and cgroupid at exit

Message ID: 20250228-work-pidfs-kill_on_last_close-v1-5-5bd7e6bb428e@kernel.org
State: New
Headers:
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2092026E14F
 for <linux-fsdevel@vger.kernel.org>; Fri, 28 Feb 2025 12:44:30 +0000 (UTC)
From: Christian Brauner <brauner@kernel.org>
Date: Fri, 28 Feb 2025 13:44:05 +0100
Subject: [PATCH RFC 05/10] pidfs: record exit code and cgroupid at exit
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <20250228-work-pidfs-kill_on_last_close-v1-5-5bd7e6bb428e@kernel.org>
References: <20250228-work-pidfs-kill_on_last_close-v1-0-5bd7e6bb428e@kernel.org>
In-Reply-To: <20250228-work-pidfs-kill_on_last_close-v1-0-5bd7e6bb428e@kernel.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, Jeff Layton <jlayton@kernel.org>,
 Lennart Poettering <lennart@poettering.net>,
 Daan De Meyer <daan.j.demeyer@gmail.com>, Mike Yuan <me@yhndnzj.com>,
 Christian Brauner <brauner@kernel.org>

Series: pidfs: provide information after task has been reaped
 [RFC,00/10] pidfs: provide information after task has been reaped
 [RFC,01/10] pidfs: switch to copy_struct_to_user()
 [RFC,02/10] pidfd: rely on automatic cleanup in __pidfd_prepare()
 [RFC,03/10] pidfs: move setting flags into pidfs_alloc_file()
 [RFC,04/10] pidfs: add inode allocation
 [RFC,05/10] pidfs: record exit code and cgroupid at exit
 [RFC,06/10] pidfs: allow to retrieve exit information
 [RFC,07/10] selftests/pidfd: fix header inclusion
 [RFC,08/10] pidfs/selftests: ensure correct headers for ioctl handling
 [RFC,09/10] selftests/pidfd: move more defines to common header
 [RFC,10/10] selftests/pidfd: add PIDFD_INFO_EXIT tests

Commit Message

Christian Brauner Feb. 28, 2025, 12:44 p.m. UTC

Record the exit code and cgroupid in do_exit() and stash in struct
pidfs_exit_info so it can be retrieved even after the task has been
reaped.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/internal.h         |  1 +
 fs/libfs.c            |  4 ++--
 fs/pidfs.c            | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pidfs.h |  1 +
 kernel/exit.c         |  2 ++
 5 files changed, 53 insertions(+), 2 deletions(-)

Comments

Oleg Nesterov March 2, 2025, 3:19 p.m. UTC | #1

On 02/28, Christian Brauner wrote:
>
> +void pidfs_exit(struct task_struct *tsk)
> +{
> +	struct dentry *dentry;
> +
> +	dentry = stashed_dentry_get(&task_pid(tsk)->stashed);
> +	if (dentry) {
> +		struct inode *inode;
> +		struct pidfs_exit_info *exit_info;
> +#ifdef CONFIG_CGROUPS
> +		struct cgroup *cgrp;
> +#endif
> +		inode = d_inode(dentry);
> +		exit_info = &pidfs_i(inode)->exit_info;
> +
> +		/* TODO: Annoy Oleg to tell me how to do this correctly. */
> +		if (tsk->signal->flags & SIGNAL_GROUP_EXIT)
> +			exit_info->exit_code = tsk->signal->group_exit_code;
> +		else
> +			exit_info->exit_code = tsk->exit_code;

I think you don't need to check SIGNAL_GROUP_EXIT,

		exit_info->exit_code = tsk->exit_code;

should be fine.

Yes, if SIGNAL_GROUP_EXIT is already set then signal->group_exit_code
can differ.

But this can only happen if the "current" thread exits on its own using
sys_exit() and it races with another thread which does sys_exit_group()
and sets SIGNAL_GROUP_EXIT.

In this case pidfs_exit() can miss SIGNAL_GROUP_EXIT anyway, but we don't
care.  This doesn't differ from the case when current exits, and then
another thread does sys_exit_group() or exec().

Just in case... If current exits because it was killed by sys_exit_group()
from another thread, current->exit_code will be correct, it will be equal
to signal->group_exit_code.

But I am not sure I understand the next patch. Let me check...

Oleg.
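
The two cases discussed in the reply above can be illustrated with a small
userspace model. This is only a sketch and not kernel code: the struct and
function names (model_task, model_signal, record_rfc, record_simple and the
two scenario helpers) are hypothetical, and MODEL_SIGNAL_GROUP_EXIT is a
stand-in flag for the model. It assumes, as the reply states, that a thread
torn down by another thread's sys_exit_group() ends up with an exit_code
equal to signal->group_exit_code, and it uses the usual wait-status encoding
for a normal exit, (status & 0xff) << 8.

```c
/* Hypothetical userspace model of the review discussion; not kernel code. */
#define MODEL_SIGNAL_GROUP_EXIT 0x4u	/* stand-in flag bit for this model */

struct model_signal {
	unsigned int flags;
	int group_exit_code;
};

struct model_task {
	int exit_code;
	struct model_signal *signal;
};

/* Mirrors the branch in the RFC patch. */
int record_rfc(const struct model_task *tsk)
{
	if (tsk->signal->flags & MODEL_SIGNAL_GROUP_EXIT)
		return tsk->signal->group_exit_code;
	return tsk->exit_code;
}

/* Mirrors the simplification suggested in the review. */
int record_simple(const struct model_task *tsk)
{
	return tsk->exit_code;
}

/* Wait-status encoding of a normal exit with the given status. */
int exit_status(int status)
{
	return (status & 0xff) << 8;
}

/*
 * Case 1: the thread is killed because another thread called
 * exit_group(status). Per the review, its exit_code then equals
 * the group exit code (assumption taken from the reply).
 */
struct model_task killed_by_group_exit(struct model_signal *sig, int status)
{
	sig->flags |= MODEL_SIGNAL_GROUP_EXIT;
	sig->group_exit_code = exit_status(status);
	return (struct model_task){ .exit_code = sig->group_exit_code,
				    .signal = sig };
}

/*
 * Case 2: the thread exits on its own with exit(own) and races with
 * another thread's exit_group(group), so the flag is already set.
 */
struct model_task races_with_group_exit(struct model_signal *sig,
					int own, int group)
{
	sig->flags |= MODEL_SIGNAL_GROUP_EXIT;
	sig->group_exit_code = exit_status(group);
	return (struct model_task){ .exit_code = exit_status(own),
				    .signal = sig };
}
```

In this model both variants record the same value in case 1. They differ
only in case 2, which is the race the reply describes as not worth caring
about, since the thread could just as well have exited before the
sys_exit_group() call.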

diff --git a/fs/internal.h b/fs/internal.h
index e7f02ae1e098..c1e6d8b294cb 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -325,6 +325,7 @@  struct stashed_operations {
 int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
 		      struct path *path);
 void stashed_dentry_prune(struct dentry *dentry);
+struct dentry *stashed_dentry_get(struct dentry **stashed);
 /**
  * path_mounted - check whether path is mounted
  * @path: path to check
diff --git a/fs/libfs.c b/fs/libfs.c
index 8444f5cc4064..cf5a267aafe4 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -2113,7 +2113,7 @@  struct timespec64 simple_inode_init_ts(struct inode *inode)
 }
 EXPORT_SYMBOL(simple_inode_init_ts);
 
-static inline struct dentry *get_stashed_dentry(struct dentry **stashed)
+struct dentry *stashed_dentry_get(struct dentry **stashed)
 {
 	struct dentry *dentry;
 
@@ -2215,7 +2215,7 @@  int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
 	const struct stashed_operations *sops = mnt->mnt_sb->s_fs_info;
 
 	/* See if dentry can be reused. */
-	path->dentry = get_stashed_dentry(stashed);
+	path->dentry = stashed_dentry_get(stashed);
 	if (path->dentry) {
 		sops->put_data(data);
 		goto out_path;
diff --git a/fs/pidfs.c b/fs/pidfs.c
index 64428697996f..433f676c066c 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -458,6 +458,53 @@  struct pid *pidfd_pid(const struct file *file)
 	return file_inode(file)->i_private;
 }
 
+/*
+ * We're called from do_exit(). We know there's at least one reference
+ * to struct pid being held that won't be released until the task has
+ * been reaped which cannot happen until we're out of do_exit().
+ *
+ * If this struct pid is referred to by a pidfd then stashed_dentry_get()
+ * will return the dentry and inode for that struct pid. Since we've
+ * taken a reference on it there's now an additional reference from the
+ * exit path on it. Which is fine. We're going to put it again in a
+ * second and we know that the pid is kept alive anyway.
+ *
+ * Worst case is that we've filled in the info and immediately free the
+ * dentry and inode afterwards since the pidfd has been closed. Since
+ * pidfs_exit() currently is placed after exit_task_work() we know that
+ * it cannot be us aka the exiting task holding a pidfd to ourselves.
+ */
+void pidfs_exit(struct task_struct *tsk)
+{
+	struct dentry *dentry;
+
+	dentry = stashed_dentry_get(&task_pid(tsk)->stashed);
+	if (dentry) {
+		struct inode *inode;
+		struct pidfs_exit_info *exit_info;
+#ifdef CONFIG_CGROUPS
+		struct cgroup *cgrp;
+#endif
+		inode = d_inode(dentry);
+		exit_info = &pidfs_i(inode)->exit_info;
+
+		/* TODO: Annoy Oleg to tell me how to do this correctly. */
+		if (tsk->signal->flags & SIGNAL_GROUP_EXIT)
+			exit_info->exit_code = tsk->signal->group_exit_code;
+		else
+			exit_info->exit_code = tsk->exit_code;
+
+#ifdef CONFIG_CGROUPS
+		rcu_read_lock();
+		cgrp = task_dfl_cgroup(tsk);
+		exit_info->cgroupid = cgroup_id(cgrp);
+		rcu_read_unlock();
+#endif
+
+		dput(dentry);
+	}
+}
+
 static struct vfsmount *pidfs_mnt __ro_after_init;
 
 /*
diff --git a/include/linux/pidfs.h b/include/linux/pidfs.h
index 7c830d0dec9a..05e6f8f4a026 100644
--- a/include/linux/pidfs.h
+++ b/include/linux/pidfs.h
@@ -6,6 +6,7 @@  struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags);
 void __init pidfs_init(void);
 void pidfs_add_pid(struct pid *pid);
 void pidfs_remove_pid(struct pid *pid);
+void pidfs_exit(struct task_struct *tsk);
 extern const struct dentry_operations pidfs_dentry_operations;
 
 #endif /* _LINUX_PID_FS_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index 3485e5fc499e..cae475e7858c 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -69,6 +69,7 @@ 
 #include <linux/sysfs.h>
 #include <linux/user_events.h>
 #include <linux/uaccess.h>
+#include <linux/pidfs.h>
 
 #include <uapi/linux/wait.h>
 
@@ -948,6 +949,7 @@  void __noreturn do_exit(long code)
 
 	sched_autogroup_exit_task(tsk);
 	cgroup_exit(tsk);
+	pidfs_exit(tsk);
 
 	/*
 	 * FIXME: do that only when needed, using sched_exit tracepoint
