Message ID | 20240927151746.391931-1-tycho@tycho.pizza (mailing list archive) |
---|---|
State | New |
Series | [v2,1/2] exec: add a flag for "reasonable" execveat() comm |

Tycho Andersen <tycho@tycho.pizza> writes:

> From: Tycho Andersen <tandersen@netflix.com>
>
> Zbigniew mentioned at Linux Plumber's that systemd is interested in
> switching to execveat() for service execution, but can't, because the
> contents of /proc/pid/comm are the file descriptor which was used,
> instead of the path to the binary. This makes the output of tools like
> top and ps useless, especially in a world where most fds are opened
> CLOEXEC so the number is truly meaningless.
>
> Change exec path to fix up /proc/pid/comm in the case where we have
> allocated one of these synthetic paths in bprm_init(). This way the actual
> exec machinery is unchanged, but cosmetically the comm looks reasonable to
> admins investigating things.

Perhaps change the subject to match the code.

> Signed-off-by: Tycho Andersen <tandersen@netflix.com>
> Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
> CC: Aleksa Sarai <cyphar@cyphar.com>
> Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
> ---
> v2: * drop the flag, everyone :)
>     * change the rendered value to f_path.dentry->d_name.name instead of
>       argv[0], Eric
> ---
>  fs/exec.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index dad402d55681..9520359a8dcc 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1416,7 +1416,18 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	set_dumpable(current->mm, SUID_DUMP_USER);
>
>  	perf_event_exec();
> -	__set_task_comm(me, kbasename(bprm->filename), true);
> +
> +	/*
> +	 * If fdpath was set, execveat() made up a path that will
> +	 * probably not be useful to admins running ps or similar.
> +	 * Let's fix it up to be something reasonable.
> +	 */
> +	if (bprm->fdpath) {
> +		BUILD_BUG_ON(TASK_COMM_LEN > DNAME_INLINE_LEN);
> +		__set_task_comm(me, bprm->file->f_path.dentry->d_name.name, true);

We can just do this regardless of bprm->fdpath.

It will be a change of behavior when executing symlinks and possibly
mount points, but I don't think we care. If we do, then we can make
it conditional with "if (bprm->fdpath)".

At the very least using the above version unconditionally ought to flush
out any bugs.

It should be 99% application invisible as all an application can see is
argv0. So it is only ps and friends where the comm value is visible.

Eric

On Fri, Sep 27, 2024 at 10:45:58AM -0500, Eric W. Biederman wrote:
> Tycho Andersen <tycho@tycho.pizza> writes:
>
> > From: Tycho Andersen <tandersen@netflix.com>
> >
> > Zbigniew mentioned at Linux Plumber's that systemd is interested in
> > switching to execveat() for service execution, but can't, because the
> > contents of /proc/pid/comm are the file descriptor which was used,
> > instead of the path to the binary. This makes the output of tools like
> > top and ps useless, especially in a world where most fds are opened
> > CLOEXEC so the number is truly meaningless.
> >
> > Change exec path to fix up /proc/pid/comm in the case where we have
> > allocated one of these synthetic paths in bprm_init(). This way the actual
> > exec machinery is unchanged, but cosmetically the comm looks reasonable to
> > admins investigating things.
>
> Perhaps change the subject to match the code.
>
> > Signed-off-by: Tycho Andersen <tandersen@netflix.com>
> > Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
> > CC: Aleksa Sarai <cyphar@cyphar.com>
> > Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
> > ---
> > v2: * drop the flag, everyone :)
> >     * change the rendered value to f_path.dentry->d_name.name instead of
> >       argv[0], Eric
> > ---
> >  fs/exec.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index dad402d55681..9520359a8dcc 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1416,7 +1416,18 @@ int begin_new_exec(struct linux_binprm * bprm)
> >  	set_dumpable(current->mm, SUID_DUMP_USER);
> >
> >  	perf_event_exec();
> > -	__set_task_comm(me, kbasename(bprm->filename), true);
> > +
> > +	/*
> > +	 * If fdpath was set, execveat() made up a path that will
> > +	 * probably not be useful to admins running ps or similar.
> > +	 * Let's fix it up to be something reasonable.
> > +	 */
> > +	if (bprm->fdpath) {
> > +		BUILD_BUG_ON(TASK_COMM_LEN > DNAME_INLINE_LEN);
> > +		__set_task_comm(me, bprm->file->f_path.dentry->d_name.name, true);
>
> We can just do this regardless of bprm->fdpath.
>
> It will be a change of behavior when executing symlinks and possibly
> mount points, but I don't think we care. If we do, then we can make
> it conditional with "if (bprm->fdpath)".
>
> At the very least using the above version unconditionally ought to flush
> out any bugs.

I'm not super comfortable doing this regardless of bprm->fdpath; that
seems like too many cases getting changed. Can we just leave it as
depending on bprm->fdpath?

Also, is d_name.name always going to be set? e.g. what about memfd,
etc?

Kees Cook <kees@kernel.org> writes:

> On Fri, Sep 27, 2024 at 10:45:58AM -0500, Eric W. Biederman wrote:
>> Tycho Andersen <tycho@tycho.pizza> writes:
>>
>> > From: Tycho Andersen <tandersen@netflix.com>
>> >
>> > Zbigniew mentioned at Linux Plumber's that systemd is interested in
>> > switching to execveat() for service execution, but can't, because the
>> > contents of /proc/pid/comm are the file descriptor which was used,
>> > instead of the path to the binary. This makes the output of tools like
>> > top and ps useless, especially in a world where most fds are opened
>> > CLOEXEC so the number is truly meaningless.
>> >
>> > Change exec path to fix up /proc/pid/comm in the case where we have
>> > allocated one of these synthetic paths in bprm_init(). This way the actual
>> > exec machinery is unchanged, but cosmetically the comm looks reasonable to
>> > admins investigating things.
>>
>> Perhaps change the subject to match the code.
>>
>> > Signed-off-by: Tycho Andersen <tandersen@netflix.com>
>> > Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
>> > CC: Aleksa Sarai <cyphar@cyphar.com>
>> > Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
>> > ---
>> > v2: * drop the flag, everyone :)
>> >     * change the rendered value to f_path.dentry->d_name.name instead of
>> >       argv[0], Eric
>> > ---
>> >  fs/exec.c | 13 ++++++++++++-
>> >  1 file changed, 12 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/fs/exec.c b/fs/exec.c
>> > index dad402d55681..9520359a8dcc 100644
>> > --- a/fs/exec.c
>> > +++ b/fs/exec.c
>> > @@ -1416,7 +1416,18 @@ int begin_new_exec(struct linux_binprm * bprm)
>> >  	set_dumpable(current->mm, SUID_DUMP_USER);
>> >
>> >  	perf_event_exec();
>> > -	__set_task_comm(me, kbasename(bprm->filename), true);
>> > +
>> > +	/*
>> > +	 * If fdpath was set, execveat() made up a path that will
>> > +	 * probably not be useful to admins running ps or similar.
>> > +	 * Let's fix it up to be something reasonable.
>> > +	 */
>> > +	if (bprm->fdpath) {
>> > +		BUILD_BUG_ON(TASK_COMM_LEN > DNAME_INLINE_LEN);
>> > +		__set_task_comm(me, bprm->file->f_path.dentry->d_name.name, true);
>>
>> We can just do this regardless of bprm->fdpath.
>>
>> It will be a change of behavior when executing symlinks and possibly
>> mount points, but I don't think we care. If we do, then we can make
>> it conditional with "if (bprm->fdpath)".
>>
>> At the very least using the above version unconditionally ought to flush
>> out any bugs.
>
> I'm not super comfortable doing this regardless of bprm->fdpath; that
> seems like too many cases getting changed. Can we just leave it as
> depending on bprm->fdpath?
>
> Also, is d_name.name always going to be set? e.g. what about memfd,
> etc?

Reading __d_alloc I don't see how a dentry can ever be allocated with
a NULL pointer in d_name.name.

There are filesystems that implement .d_dname and have a special case
in d_path(). I don't imagine when dealing with executables we care.

I can see an argument for having a helper function that wraps the
pointer load and uses READ_ONCE() or smp_load_acquire(), something like:

	const char *d_dname(const struct path *path)
	{
		/* see prepend_name for the rationale */
		return smp_load_acquire(&path->dentry->d_name.name);
	}

I kind of see the optimization being nice enough in the common case,
and the weird corner cases where the differences are apparent in
task->comm (symlinks, bind mounts) being rare enough, that it is
probably worth the optimization.

Like a number of things that change userspace behavior, we just document
that in rare cases task->comm will have a different value, and if it is
a problem for someone we just add back in the fdpath check.

Given the ease of code maintenance we get if one piece of straight-line
code can work for everyone, it seems worth trying.

Kees, at the end of the day it is your call what code to merge. I just
expect it is unnecessary complexity to be tentative about the change.

Eric

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Kees Cook <kees@kernel.org> writes:

>> I'm not super comfortable doing this regardless of bprm->fdpath; that
>> seems like too many cases getting changed. Can we just leave it as
>> depending on bprm->fdpath?

I was recommending that because I did not expect that there was any
widespread usage of aliasing of binary names using symlinks.

I realized today that on Debian there are many aliases
of binaries created with the /etc/alternatives mechanism.
So there is much wider exposure to problems than I would have
supposed.

So I remove any objections to making the new code conditional on bprm->fdpath.

Eric
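
For readers less familiar with the kernel primitives, the d_dname() helper
suggested above can be approximated in userspace with C11 atomics. This is
only a sketch under stated assumptions: struct demo_dentry, demo_set_name()
and demo_get_name() are hypothetical names that do not exist in the kernel,
and atomic_load_explicit() with memory_order_acquire stands in for
smp_load_acquire(). It shows the pairing the helper relies on: the writer
publishes the name pointer with release ordering, and the reader picks it up
with an acquire load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a dentry whose name pointer can change. */
struct demo_dentry {
	_Atomic(const char *) name;
};

/* Writer side: publish a fully initialised name with release ordering. */
static void demo_set_name(struct demo_dentry *d, const char *name)
{
	atomic_store_explicit(&d->name, name, memory_order_release);
}

/* Reader side: acquire load, the C11 analogue of smp_load_acquire(). */
static const char *demo_get_name(struct demo_dentry *d)
{
	const char *name = atomic_load_explicit(&d->name, memory_order_acquire);

	/* Mirrors the observation that a dentry name is never NULL. */
	assert(name != NULL);
	return name;
}
```

The acquire load ensures that a reader who sees the new pointer also sees the
bytes it points to, which appears to be the concern the prepend_name comment
refers to.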

On Mon, Sep 30, 2024 at 03:10:29PM -0500, Eric W. Biederman wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
> > Kees Cook <kees@kernel.org> writes:
>
> >> I'm not super comfortable doing this regardless of bprm->fdpath; that
> >> seems like too many cases getting changed. Can we just leave it as
> >> depending on bprm->fdpath?
>
> I was recommending that because I did not expect that there was any
> widespread usage of aliasing of binary names using symlinks.
>
> I realized today that on Debian there are many aliases
> of binaries created with the /etc/alternatives mechanism.
> So there is much wider exposure to problems than I would have
> supposed.
>
> So I remove any objections to making the new code conditional on bprm->fdpath.

Yep, and it looks like Alpine distributes busybox with symlinks instead of
hard links.

I will respin with a fixed subject line shortly.

Thanks,

Tycho
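
The symlink concern can be tried out from userspace. The sketch below is an
illustration, not kernel code: make_alias() and comm_candidates() are
hypothetical helpers, and realpath() is used as a rough stand-in for the name
of the dentry the kernel ends up opening. It compares the basename of the
path as typed (what kbasename(bprm->filename) would see) with the basename of
the resolved target (roughly what d_name.name would hold).

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for realpath(), symlink(), mkdtemp() */
#endif
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Userspace stand-in for kbasename(): the text after the last '/'. */
static const char *basename_of(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}

/* Create an empty file at target and a symlink to it at alias. */
static int make_alias(const char *target, const char *alias)
{
	FILE *f = fopen(target, "w");

	if (!f)
		return -1;
	fclose(f);
	return symlink(target, alias);
}

/*
 * Fill in both candidate comm values for path:
 *   by_path:   basename of the path as given
 *   by_target: basename after all symlinks are resolved
 * Returns 0 on success, -1 if the path cannot be resolved.
 */
static int comm_candidates(const char *path, char *by_path,
			   char *by_target, size_t len)
{
	char resolved[PATH_MAX];

	assert(path && by_path && by_target && len > 0);
	if (!realpath(path, resolved))
		return -1;
	snprintf(by_path, len, "%s", basename_of(path));
	snprintf(by_target, len, "%s", basename_of(resolved));
	return 0;
}
```

For a busybox-style layout where "ls" is a symlink to "busybox", by_path is
"ls" and by_target is "busybox", which is the behavior change an
unconditional switch to d_name.name would make visible in ps and top.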

diff --git a/fs/exec.c b/fs/exec.c
index dad402d55681..9520359a8dcc 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1416,7 +1416,18 @@ int begin_new_exec(struct linux_binprm * bprm)
 	set_dumpable(current->mm, SUID_DUMP_USER);
 
 	perf_event_exec();
-	__set_task_comm(me, kbasename(bprm->filename), true);
+
+	/*
+	 * If fdpath was set, execveat() made up a path that will
+	 * probably not be useful to admins running ps or similar.
+	 * Let's fix it up to be something reasonable.
+	 */
+	if (bprm->fdpath) {
+		BUILD_BUG_ON(TASK_COMM_LEN > DNAME_INLINE_LEN);
+		__set_task_comm(me, bprm->file->f_path.dentry->d_name.name, true);
+	} else {
+		__set_task_comm(me, kbasename(bprm->filename), true);
+	}
 
 	/* An exec changes our domain. We are no longer part of the thread
 	   group */
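
The selection logic in the hunk above can be modelled in plain userspace C to
see which string ends up in comm. This is a sketch under assumptions, not the
kernel implementation: model_comm() and basename_of() are hypothetical names,
the synthetic path is assumed to look like "/dev/fd/N", and snprintf() stands
in for the truncation that __set_task_comm() performs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16	/* same value the kernel uses */

/* Userspace stand-in for kbasename(): the text after the last '/'. */
static const char *basename_of(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}

/*
 * Hypothetical model of the patched branch in begin_new_exec().
 *   fdpath:      synthetic path made up for execveat(), or NULL
 *   filename:    path used for the exec (equals fdpath when fdpath is set)
 *   dentry_name: final name component of the opened file
 */
static void model_comm(char comm[TASK_COMM_LEN], const char *fdpath,
		       const char *filename, const char *dentry_name)
{
	const char *src = fdpath ? dentry_name : basename_of(filename);

	assert(comm && src);
	/* Keep at most TASK_COMM_LEN - 1 characters plus the NUL. */
	snprintf(comm, TASK_COMM_LEN, "%s", src);
}
```

Passing NULL for fdpath with a filename of "/dev/fd/3" reproduces the old
behavior, where comm is just "3"; passing the same string as fdpath selects
the dentry name instead.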