[v2,1/2] exec: add a flag for "reasonable" execveat() comm

Message ID 20240927151746.391931-1-tycho@tycho.pizza (mailing list archive)
State New

Commit Message

Tycho Andersen Sept. 27, 2024, 3:17 p.m. UTC
From: Tycho Andersen <tandersen@netflix.com>

Zbigniew mentioned at Linux Plumbers that systemd is interested in
switching to execveat() for service execution, but can't, because the
contents of /proc/pid/comm are the file descriptor which was used,
instead of the path to the binary. This makes the output of tools like
top and ps useless, especially in a world where most fds are opened
CLOEXEC so the number is truly meaningless.

Change exec path to fix up /proc/pid/comm in the case where we have
allocated one of these synthetic paths in bprm_init(). This way the actual
exec machinery is unchanged, but cosmetically the comm looks reasonable to
admins investigating things.

Signed-off-by: Tycho Andersen <tandersen@netflix.com>
Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
CC: Aleksa Sarai <cyphar@cyphar.com>
Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
---
v2: * drop the flag, everyone :)
    * change the rendered value to f_path.dentry->d_name.name instead of
      argv[0], Eric
---
 fs/exec.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)


base-commit: baeb9a7d8b60b021d907127509c44507539c15e5
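
The problem described in the commit message can be modeled in userspace. The sketch below is not kernel code. It assumes that the synthetic path built for an fd-based exec with an empty pathname has the form "/dev/fd/<fd>", and that comm is the last path component truncated to TASK_COMM_LEN - 1 characters. The helper names (basename_of, synthetic_fd_path, model_comm) are hypothetical stand-ins chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16 /* kernel value: 15 characters plus the NUL */

/* Userspace stand-in for the kernel's kbasename(): last path component. */
static const char *basename_of(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}

/*
 * Assumption: for an fd-based exec with an empty pathname the kernel
 * synthesizes a path of the form "/dev/fd/<fd>".
 */
static void synthetic_fd_path(int fd, char *out, size_t len)
{
	snprintf(out, len, "/dev/fd/%d", fd);
}

/* Copy a name into a comm-sized buffer, truncating like task->comm does. */
static void model_comm(const char *name, char comm[TASK_COMM_LEN])
{
	snprintf(comm, TASK_COMM_LEN, "%s", name);
}
```

Feeding the result of synthetic_fd_path(3, ...) through basename_of() and model_comm() yields "3", which is the uninformative value that ps and top would show. A long name such as "systemd-journald" is cut to 15 characters.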

Comments

Eric W. Biederman Sept. 27, 2024, 3:45 p.m. UTC | #1
Tycho Andersen <tycho@tycho.pizza> writes:

> From: Tycho Andersen <tandersen@netflix.com>
>
> Zbigniew mentioned at Linux Plumbers that systemd is interested in
> switching to execveat() for service execution, but can't, because the
> contents of /proc/pid/comm are the file descriptor which was used,
> instead of the path to the binary. This makes the output of tools like
> top and ps useless, especially in a world where most fds are opened
> CLOEXEC so the number is truly meaningless.
>
> Change exec path to fix up /proc/pid/comm in the case where we have
> allocated one of these synthetic paths in bprm_init(). This way the actual
> exec machinery is unchanged, but cosmetically the comm looks reasonable to
> admins investigating things.

Perhaps change the subject to match the code.

> Signed-off-by: Tycho Andersen <tandersen@netflix.com>
> Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
> CC: Aleksa Sarai <cyphar@cyphar.com>
> Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
> ---
> v2: * drop the flag, everyone :)
>     * change the rendered value to f_path.dentry->d_name.name instead of
>       argv[0], Eric
> ---
>  fs/exec.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index dad402d55681..9520359a8dcc 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1416,7 +1416,18 @@ int begin_new_exec(struct linux_binprm * bprm)
>  		set_dumpable(current->mm, SUID_DUMP_USER);
>  
>  	perf_event_exec();
> -	__set_task_comm(me, kbasename(bprm->filename), true);
> +
> +	/*
> +	 * If fdpath was set, execveat() made up a path that will
> +	 * probably not be useful to admins running ps or similar.
> +	 * Let's fix it up to be something reasonable.
> +	 */
> +	if (bprm->fdpath) {
> +		BUILD_BUG_ON(TASK_COMM_LEN > DNAME_INLINE_LEN);
> +		__set_task_comm(me, bprm->file->f_path.dentry->d_name.name, true);

We can just do this regardless of bprm->fdpath.

It will be a change of behavior when executing symlinks and possibly
mount points, but I don't think we care.  If we do, then we can make
it conditional with "if (bprm->fdpath)".

At the very least using the above version unconditionally ought to flush
out any bugs.

It should be 99% application invisible as all an application can see
is argv0.  So it is only ps and friends where the comm value is visible.

Eric

Kees Cook Sept. 28, 2024, 9:56 p.m. UTC | #2
On Fri, Sep 27, 2024 at 10:45:58AM -0500, Eric W. Biederman wrote:
> Tycho Andersen <tycho@tycho.pizza> writes:
> 
> > From: Tycho Andersen <tandersen@netflix.com>
> >
> > Zbigniew mentioned at Linux Plumbers that systemd is interested in
> > switching to execveat() for service execution, but can't, because the
> > contents of /proc/pid/comm are the file descriptor which was used,
> > instead of the path to the binary. This makes the output of tools like
> > top and ps useless, especially in a world where most fds are opened
> > CLOEXEC so the number is truly meaningless.
> >
> > Change exec path to fix up /proc/pid/comm in the case where we have
> > allocated one of these synthetic paths in bprm_init(). This way the actual
> > exec machinery is unchanged, but cosmetically the comm looks reasonable to
> > admins investigating things.
> 
> Perhaps change the subject to match the code.
> 
> > Signed-off-by: Tycho Andersen <tandersen@netflix.com>
> > Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
> > CC: Aleksa Sarai <cyphar@cyphar.com>
> > Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
> > ---
> > v2: * drop the flag, everyone :)
> >     * change the rendered value to f_path.dentry->d_name.name instead of
> >       argv[0], Eric
> > ---
> >  fs/exec.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index dad402d55681..9520359a8dcc 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1416,7 +1416,18 @@ int begin_new_exec(struct linux_binprm * bprm)
> >  		set_dumpable(current->mm, SUID_DUMP_USER);
> >  
> >  	perf_event_exec();
> > -	__set_task_comm(me, kbasename(bprm->filename), true);
> > +
> > +	/*
> > +	 * If fdpath was set, execveat() made up a path that will
> > +	 * probably not be useful to admins running ps or similar.
> > +	 * Let's fix it up to be something reasonable.
> > +	 */
> > +	if (bprm->fdpath) {
> > +		BUILD_BUG_ON(TASK_COMM_LEN > DNAME_INLINE_LEN);
> > +		__set_task_comm(me, bprm->file->f_path.dentry->d_name.name, true);
> 
> We can just do this regardless of bprm->fdpath.
> 
> It will be a change of behavior when executing symlinks and possibly
> mount points, but I don't think we care.  If we do, then we can make
> it conditional with "if (bprm->fdpath)".
> 
> At the very least using the above version unconditionally ought to flush
> out any bugs.

I'm not super comfortable doing this regardless of bprm->fdpath; that
seems like too many cases getting changed. Can we just leave it as
depending on bprm->fdpath?

Also, is d_name.name always going to be set? e.g. what about memfd, etc?

Eric W. Biederman Sept. 30, 2024, 2:59 a.m. UTC | #3
Kees Cook <kees@kernel.org> writes:

> On Fri, Sep 27, 2024 at 10:45:58AM -0500, Eric W. Biederman wrote:
>> Tycho Andersen <tycho@tycho.pizza> writes:
>> 
>> > From: Tycho Andersen <tandersen@netflix.com>
>> >
>> > Zbigniew mentioned at Linux Plumbers that systemd is interested in
>> > switching to execveat() for service execution, but can't, because the
>> > contents of /proc/pid/comm are the file descriptor which was used,
>> > instead of the path to the binary. This makes the output of tools like
>> > top and ps useless, especially in a world where most fds are opened
>> > CLOEXEC so the number is truly meaningless.
>> >
>> > Change exec path to fix up /proc/pid/comm in the case where we have
>> > allocated one of these synthetic paths in bprm_init(). This way the actual
>> > exec machinery is unchanged, but cosmetically the comm looks reasonable to
>> > admins investigating things.
>> 
>> Perhaps change the subject to match the code.
>> 
>> > Signed-off-by: Tycho Andersen <tandersen@netflix.com>
>> > Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
>> > CC: Aleksa Sarai <cyphar@cyphar.com>
>> > Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
>> > ---
>> > v2: * drop the flag, everyone :)
>> >     * change the rendered value to f_path.dentry->d_name.name instead of
>> >       argv[0], Eric
>> > ---
>> >  fs/exec.c | 13 ++++++++++++-
>> >  1 file changed, 12 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/fs/exec.c b/fs/exec.c
>> > index dad402d55681..9520359a8dcc 100644
>> > --- a/fs/exec.c
>> > +++ b/fs/exec.c
>> > @@ -1416,7 +1416,18 @@ int begin_new_exec(struct linux_binprm * bprm)
>> >  		set_dumpable(current->mm, SUID_DUMP_USER);
>> >  
>> >  	perf_event_exec();
>> > -	__set_task_comm(me, kbasename(bprm->filename), true);
>> > +
>> > +	/*
>> > +	 * If fdpath was set, execveat() made up a path that will
>> > +	 * probably not be useful to admins running ps or similar.
>> > +	 * Let's fix it up to be something reasonable.
>> > +	 */
>> > +	if (bprm->fdpath) {
>> > +		BUILD_BUG_ON(TASK_COMM_LEN > DNAME_INLINE_LEN);
>> > +		__set_task_comm(me, bprm->file->f_path.dentry->d_name.name, true);
>> 
>> We can just do this regardless of bprm->fdpath.
>> 
>> It will be a change of behavior when executing symlinks and possibly
>> mount points, but I don't think we care.  If we do, then we can make
>> it conditional with "if (bprm->fdpath)".
>> 
>> At the very least using the above version unconditionally ought to flush
>> out any bugs.
>
> I'm not super comfortable doing this regardless of bprm->fdpath; that
> seems like too many cases getting changed. Can we just leave it as
> depending on bprm->fdpath?
>
> Also, is d_name.name always going to be set? e.g. what about memfd,
> etc?

Reading __d_alloc I don't see how a dentry can ever be allocated with a
NULL pointer in d_name.name.

There are filesystems that implement .d_dname and have a special case
in d_path().  I don't imagine when dealing with executables we care.

I can see an argument for having a helper function that wraps
the pointer load and uses READ_ONCE() or smp_load_acquire(), something like:

const char *d_dname(const struct path *path)
{
	/* see prepend_name for the rationale */
	return smp_load_acquire(&path->dentry->d_name.name);
}

I kind of see the optimization being nice enough in the common case, and
the weird corner cases where the differences are apparent in task->comm
(symlinks, bind mounts) being rare enough, that it is probably worth
the optimization.

Like a number of things that change userspace behavior we just document
that in rare cases task->comm will have a different value and if it is
a problem for someone we just add back in the fdpath check.

Given the ease of code maintenance we get if one piece of straight line
code can work for everyone it seems worth trying.

Kees, at the end of the day it is your call what code to merge.  I just
expect it is unnecessary complexity to be tentative about the change.

Eric

Eric W. Biederman Sept. 30, 2024, 8:10 p.m. UTC | #4
"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Kees Cook <kees@kernel.org> writes:

>> I'm not super comfortable doing this regardless of bprm->fdpath; that
>> seems like too many cases getting changed. Can we just leave it as
>> depending on bprm->fdpath?

I was recommending that because I did not expect that there was any
widespread usage of aliasing of binary names using symlinks.

I realized today that on debian there are many aliases
of binaries created with the /etc/alternatives mechanism.
So there is much wider exposure to problems than I would have
supposed.

So I withdraw any objections to making the new code conditional on bprm->fdpath.

Eric

Tycho Andersen Oct. 1, 2024, 1:43 p.m. UTC | #5
On Mon, Sep 30, 2024 at 03:10:29PM -0500, Eric W. Biederman wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> writes:
> 
> > Kees Cook <kees@kernel.org> writes:
> 
> >> I'm not super comfortable doing this regardless of bprm->fdpath; that
> >> seems like too many cases getting changed. Can we just leave it as
> >> depending on bprm->fdpath?
> 
> I was recommending that because I did not expect that there was any
> widespread usage of aliasing of binary names using symlinks.
> 
> I realized today that on debian there are many aliases
> of binaries created with the /etc/alternatives mechanism.
> So there is much wider exposure to problems than I would have
> supposed.
> 
> So I withdraw any objections to making the new code conditional on bprm->fdpath.

Yep, and it looks like Alpine distributes busybox with symlinks
instead of hard links. I will respin with a fixed subject line shortly.

Thanks,

Tycho
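
The symlink concern raised in this thread (/etc/alternatives on Debian, busybox on Alpine) can be illustrated with a small userspace sketch. It compares the name taken from the path as invoked with the name of the file the path resolves to. realpath() is used here only as an approximation of the dentry name and does not model bind mounts. All function names and the file names "editor" and "vim.basic" are hypothetical and chosen for the demo.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define COMM_LEN 16 /* same size as the kernel's TASK_COMM_LEN */

/* Last path component, like the kernel's kbasename(). */
static const char *last_component(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}

/* Name taken from the path as invoked (the existing non-fd behavior). */
static void name_as_invoked(const char *path, char out[COMM_LEN])
{
	snprintf(out, COMM_LEN, "%s", last_component(path));
}

/*
 * Name taken from the file the path resolves to. realpath() is only a
 * userspace approximation of the dentry name.
 * Returns 0 on success, -1 if the path cannot be resolved.
 */
static int name_of_target(const char *path, char out[COMM_LEN])
{
	char resolved[PATH_MAX];

	if (!realpath(path, resolved))
		return -1;
	snprintf(out, COMM_LEN, "%s", last_component(resolved));
	return 0;
}

/*
 * Hypothetical demo: create <tmpdir>/vim.basic plus a symlink
 * <tmpdir>/editor pointing at it, then compute both names for the
 * symlink. Returns 0 on success, -1 on any failure.
 */
static int demo_alias(char invoked[COMM_LEN], char target[COMM_LEN])
{
	char dir[] = "/tmp/comm-demo-XXXXXX";
	char real[PATH_MAX], alias[PATH_MAX];
	FILE *f;
	int ret = -1;

	if (!mkdtemp(dir))
		return -1;
	snprintf(real, sizeof(real), "%s/vim.basic", dir);
	snprintf(alias, sizeof(alias), "%s/editor", dir);

	f = fopen(real, "w");
	if (f) {
		fclose(f);
		if (symlink(real, alias) == 0) {
			name_as_invoked(alias, invoked);
			ret = name_of_target(alias, target);
			unlink(alias);
		}
		unlink(real);
	}
	rmdir(dir);
	return ret;
}
```

For the symlink, the invoked name is "editor" while the resolved name is "vim.basic". Applying the dentry-based name unconditionally would change what ps reports for every such alias, which is why the thread settles on keeping the bprm->fdpath check.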
Patch

diff --git a/fs/exec.c b/fs/exec.c
index dad402d55681..9520359a8dcc 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1416,7 +1416,18 @@  int begin_new_exec(struct linux_binprm * bprm)
 		set_dumpable(current->mm, SUID_DUMP_USER);
 
 	perf_event_exec();
-	__set_task_comm(me, kbasename(bprm->filename), true);
+
+	/*
+	 * If fdpath was set, execveat() made up a path that will
+	 * probably not be useful to admins running ps or similar.
+	 * Let's fix it up to be something reasonable.
+	 */
+	if (bprm->fdpath) {
+		BUILD_BUG_ON(TASK_COMM_LEN > DNAME_INLINE_LEN);
+		__set_task_comm(me, bprm->file->f_path.dentry->d_name.name, true);
+	} else {
+		__set_task_comm(me, kbasename(bprm->filename), true);
+	}
 
 	/* An exec changes our domain. We are no longer part of the thread
 	   group */
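
One way to observe the value this patch changes is to read /proc/<pid>/comm from userspace. The sketch below is a Linux-only illustration and not part of the patch. read_comm() and set_own_comm() are hypothetical helper names. set_own_comm() uses prctl(PR_SET_NAME) only as a stand-in for the comm update that exec performs, so the reader can be exercised without running a patched kernel.

```c
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read /proc/<pid>/comm into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_comm(pid_t pid, char *buf, size_t len)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%ld/comm", (long)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Set the calling thread's comm. This stands in for the update done at
 * exec time; it does not exercise execveat() itself.
 */
static int set_own_comm(const char *name)
{
	return prctl(PR_SET_NAME, (unsigned long)name, 0UL, 0UL, 0UL);
}
```

Calling set_own_comm("comm-demo") from the main thread and then read_comm(getpid(), ...) returns "comm-demo". Pointing read_comm() at a process started through execveat() with an fd shows whether that kernel reports the fd number or the binary's name.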