diff mbox series

[1/2] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case

Message ID 20241030203732.248767-1-tycho@tycho.pizza (mailing list archive)
State New
Series [1/2] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case

Commit Message

Tycho Andersen Oct. 30, 2024, 8:37 p.m. UTC
From: Tycho Andersen <tandersen@netflix.com>

Zbigniew mentioned at Linux Plumbers that systemd is interested in
switching to execveat() for service execution, but can't, because the
contents of /proc/pid/comm are the file descriptor which was used,
instead of the path to the binary. This makes the output of tools like
top and ps useless, especially in a world where most fds are opened
CLOEXEC so the number is truly meaningless.

Change exec path to fix up /proc/pid/comm in the case where we have
allocated one of these synthetic paths in alloc_bprm(). This way the actual
exec machinery is unchanged, but cosmetically the comm looks reasonable to
admins investigating things.

Signed-off-by: Tycho Andersen <tandersen@netflix.com>
Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
CC: Aleksa Sarai <cyphar@cyphar.com>
Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
---
v2: * drop the flag, everyone :)
    * change the rendered value to f_path.dentry->d_name.name instead of
      argv[0], Eric
v3: * fix up subject line, Eric
v4: * switch to no flag, always rewrite approach, with some cleanup
      suggested by Kees
---
 fs/exec.c               | 36 +++++++++++++++++++++++++++++++++++-
 include/linux/binfmts.h |  1 +
 2 files changed, 36 insertions(+), 1 deletion(-)


base-commit: c1e939a21eb111a6d6067b38e8e04b8809b64c4e
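
To make the problem in the commit message concrete, here is a minimal userspace sketch of how the comm ends up as a bare number, and how the fixup changes it. It assumes the synthetic path has the form /dev/fd/N; basename_of() and pick_comm() are hypothetical stand-ins for illustration, not kernel functions, and TASK_COMM_LEN mirrors the kernel's 16-byte limit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16 /* kernel limit: 15 visible characters plus NUL */

/* Userspace stand-in for kbasename(): the text after the last '/'. */
static const char *basename_of(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}

/*
 * Hypothetical helper mirroring the selection made in the patch:
 * prefer argv0 when it was captured, otherwise fall back to the
 * (possibly synthetic) filename. The result is truncated to fit
 * TASK_COMM_LEN, as the kernel does for comm.
 */
static void pick_comm(char *comm, const char *filename, const char *argv0)
{
	const char *src = argv0 ? argv0 : filename;

	assert(comm && filename);
	snprintf(comm, TASK_COMM_LEN, "%s", basename_of(src));
}
```

With filename "/dev/fd/3" and no argv0, pick_comm() yields "3", which is what ps and top would display. With argv0 "/usr/bin/sleep" it yields "sleep".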

Comments

Kees Cook Oct. 31, 2024, 10:10 p.m. UTC | #1
On Wed, 30 Oct 2024 14:37:31 -0600, Tycho Andersen wrote:
> Zbigniew mentioned at Linux Plumbers that systemd is interested in
> switching to execveat() for service execution, but can't, because the
> contents of /proc/pid/comm are the file descriptor which was used,
> instead of the path to the binary. This makes the output of tools like
> top and ps useless, especially in a world where most fds are opened
> CLOEXEC so the number is truly meaningless.
> 
> [...]

Applied to for-next/execve, thanks!

[1/2] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
      https://git.kernel.org/kees/c/7bdc6fc85c9a
[2/2] selftests/exec: add a test for execveat()'s comm
      https://git.kernel.org/kees/c/bd104872311a

Take care,

Zbigniew Jędrzejewski-Szmek Nov. 2, 2024, 11:29 a.m. UTC | #2
On Thu, Oct 31, 2024 at 03:10:37PM -0700, Kees Cook wrote:
> On Wed, 30 Oct 2024 14:37:31 -0600, Tycho Andersen wrote:
> > Zbigniew mentioned at Linux Plumbers that systemd is interested in
> > switching to execveat() for service execution, but can't, because the
> > contents of /proc/pid/comm are the file descriptor which was used,
> > instead of the path to the binary. This makes the output of tools like
> > top and ps useless, especially in a world where most fds are opened
> > CLOEXEC so the number is truly meaningless.
> > 
> > [...]
> 
> Applied to for-next/execve, thanks!
> 
> [1/2] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
>       https://git.kernel.org/kees/c/7bdc6fc85c9a
> [2/2] selftests/exec: add a test for execveat()'s comm
>       https://git.kernel.org/kees/c/bd104872311a

I tested this with systemd compiled with -Dfexecve=true and it all
seems to work fine. Thanks!

Zbyszek

Kees Cook Nov. 2, 2024, 7:58 p.m. UTC | #3
On Sat, Nov 02, 2024 at 11:29:55AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
> On Thu, Oct 31, 2024 at 03:10:37PM -0700, Kees Cook wrote:
> > On Wed, 30 Oct 2024 14:37:31 -0600, Tycho Andersen wrote:
> > > Zbigniew mentioned at Linux Plumbers that systemd is interested in
> > > switching to execveat() for service execution, but can't, because the
> > > contents of /proc/pid/comm are the file descriptor which was used,
> > > instead of the path to the binary. This makes the output of tools like
> > > top and ps useless, especially in a world where most fds are opened
> > > CLOEXEC so the number is truly meaningless.
> > > 
> > > [...]
> > 
> > Applied to for-next/execve, thanks!
> > 
> > [1/2] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
> >       https://git.kernel.org/kees/c/7bdc6fc85c9a
> > [2/2] selftests/exec: add a test for execveat()'s comm
> >       https://git.kernel.org/kees/c/bd104872311a
> 
> I tested this with systemd compiled with -Dfexecve=true and it all
> seems to work fine. Thanks!

Great; thank you!

Christian Brauner Nov. 6, 2024, 10:06 a.m. UTC | #4
On Wed, Oct 30, 2024 at 02:37:31PM -0600, Tycho Andersen wrote:
> From: Tycho Andersen <tandersen@netflix.com>
> 
> Zbigniew mentioned at Linux Plumbers that systemd is interested in
> switching to execveat() for service execution, but can't, because the
> contents of /proc/pid/comm are the file descriptor which was used,
> instead of the path to the binary. This makes the output of tools like
> top and ps useless, especially in a world where most fds are opened
> CLOEXEC so the number is truly meaningless.
> 
> Change exec path to fix up /proc/pid/comm in the case where we have
> allocated one of these synthetic paths in alloc_bprm(). This way the actual
> exec machinery is unchanged, but cosmetically the comm looks reasonable to
> admins investigating things.
> 
> Signed-off-by: Tycho Andersen <tandersen@netflix.com>
> Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
> CC: Aleksa Sarai <cyphar@cyphar.com>
> Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
> ---

We finally went full circle back to what was originally proposed :)

Reviewed-by: Christian Brauner <brauner@kernel.org>

Patch

diff --git a/fs/exec.c b/fs/exec.c
index 6c53920795c2..3b559f598c74 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1347,7 +1347,16 @@  int begin_new_exec(struct linux_binprm * bprm)
 		set_dumpable(current->mm, SUID_DUMP_USER);
 
 	perf_event_exec();
-	__set_task_comm(me, kbasename(bprm->filename), true);
+
+	/*
+	 * If argv0 was set, alloc_bprm() made up a path that will
+	 * probably not be useful to admins running ps or similar.
+	 * Let's fix it up to be something reasonable.
+	 */
+	if (bprm->argv0)
+		__set_task_comm(me, kbasename(bprm->argv0), true);
+	else
+		__set_task_comm(me, kbasename(bprm->filename), true);
 
 	/* An exec changes our domain. We are no longer part of the thread
 	   group */
@@ -1497,9 +1506,28 @@  static void free_bprm(struct linux_binprm *bprm)
 	if (bprm->interp != bprm->filename)
 		kfree(bprm->interp);
 	kfree(bprm->fdpath);
+	kfree(bprm->argv0);
 	kfree(bprm);
 }
 
+static int bprm_add_fixup_comm(struct linux_binprm *bprm,
+			       struct user_arg_ptr argv)
+{
+	const char __user *p = get_user_arg_ptr(argv, 0);
+
+	/*
+	 * If p == NULL, let's just fall back to fdpath.
+	 */
+	if (!p)
+		return 0;
+
+	bprm->argv0 = strndup_user(p, MAX_ARG_STRLEN);
+	if (!IS_ERR(bprm->argv0))
+		return 0;
+	bprm->argv0 = NULL;	/* strndup_user() returns ERR_PTR(), never NULL */
+	return -EFAULT;
+}
+
 static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int flags)
 {
 	struct linux_binprm *bprm;
@@ -1906,6 +1934,12 @@  static int do_execveat_common(int fd, struct filename *filename,
 		goto out_ret;
 	}
 
+	if (unlikely(bprm->fdpath)) {
+		retval = bprm_add_fixup_comm(bprm, argv);
+		if (retval != 0)
+			goto out_free;
+	}
+
 	retval = count(argv, MAX_ARG_STRINGS);
 	if (retval == 0)
 		pr_warn_once("process '%s' launched '%s' with NULL argv: empty string added\n",
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index e6c00e860951..bab5121a746b 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -55,6 +55,7 @@  struct linux_binprm {
 				   of the time same as filename, but could be
 				   different for binfmt_{misc,script} */
 	const char *fdpath;	/* generated filename for execveat */
+	const char *argv0;	/* argv0 from execveat */
 	unsigned interp_flags;
 	int execfd;		/* File descriptor of the executable */
 	unsigned long loader, exec;
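
For anyone who wants to observe the behaviour from userspace, the following is a rough sketch of the kind of caller this patch targets: it opens a binary, executes it through the fd with execveat(AT_EMPTY_PATH), and provides a helper to read back /proc/self/comm. The helper names read_self_comm() and exec_by_fd() are made up for this example, and the raw syscall() is used on the assumption that the libc may not ship an execveat() wrapper. This is not the selftest from patch 2/2.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000 /* value from the Linux UAPI headers */
#endif

/* Read /proc/self/comm into buf without the trailing newline; 0 on success. */
static int read_self_comm(char *buf, size_t len)
{
	FILE *f = fopen("/proc/self/comm", "r");

	assert(buf && len > 0);
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Execute the file at `path` through a file descriptor rather than by
 * name. Returns -1 if the open or the exec fails; on success it does
 * not return. The fd is opened O_CLOEXEC, matching the commit message's
 * point that the fd number means nothing after the exec.
 */
static int exec_by_fd(const char *path, char *const argv[], char *const envp[])
{
	int fd = open(path, O_RDONLY | O_CLOEXEC);

	if (fd < 0)
		return -1;
	syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
	close(fd);
	return -1;
}
```

A caller would fork, run exec_by_fd() on something long-lived such as a sleep binary in the child, and then inspect /proc/<pid>/comm from the parent. Without a fix like this one the comm is expected to be the fd number. With the patch as posted here it should be the basename of argv[0].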