diff mbox series

tracing: Add new_exec tracepoint

Message ID 20240408090205.3714934-1-elver@google.com (mailing list archive)
State New, archived
Headers show
Series tracing: Add new_exec tracepoint | expand

Commit Message

Marco Elver April 8, 2024, 9:01 a.m. UTC
Add "new_exec" tracepoint, which is run right after the point of no
return but before the current task assumes its new exec identity.

Unlike the tracepoint "sched_process_exec", the "new_exec" tracepoint
runs before flushing the old exec, i.e. while the task still has the
original state (such as original MM), but when the new exec either
succeeds or crashes (but never returns to the original exec).

Being able to trace this event can be helpful in a number of use cases:

  * allowing tracing eBPF programs access to the original MM on exec,
    before current->mm is replaced;
  * counting exec in the original task (via perf event);
  * profiling flush time ("new_exec" to "sched_process_exec").

Example of tracing output ("new_exec" and "sched_process_exec"):

  $ cat /sys/kernel/debug/tracing/trace_pipe
      <...>-379     [003] .....   179.626921: new_exec: filename=/usr/bin/sshd pid=379 comm=sshd
      <...>-379     [003] .....   179.629131: sched_process_exec: filename=/usr/bin/sshd pid=379 old_pid=379
      <...>-381     [002] .....   180.048580: new_exec: filename=/bin/bash pid=381 comm=sshd
      <...>-381     [002] .....   180.053122: sched_process_exec: filename=/bin/bash pid=381 old_pid=381
      <...>-385     [001] .....   180.068277: new_exec: filename=/usr/bin/tty pid=385 comm=bash
      <...>-385     [001] .....   180.069485: sched_process_exec: filename=/usr/bin/tty pid=385 old_pid=385
      <...>-389     [006] .....   192.020147: new_exec: filename=/usr/bin/dmesg pid=389 comm=bash
       bash-389     [006] .....   192.021377: sched_process_exec: filename=/usr/bin/dmesg pid=389 old_pid=389

Signed-off-by: Marco Elver <elver@google.com>
---
 fs/exec.c                   |  2 ++
 include/trace/events/task.h | 30 ++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

Comments

Steven Rostedt April 9, 2024, 2:33 p.m. UTC | #1
On Mon,  8 Apr 2024 11:01:54 +0200
Marco Elver <elver@google.com> wrote:

> Add "new_exec" tracepoint, which is run right after the point of no
> return but before the current task assumes its new exec identity.
> 
> Unlike the tracepoint "sched_process_exec", the "new_exec" tracepoint
> runs before flushing the old exec, i.e. while the task still has the
> original state (such as original MM), but when the new exec either
> succeeds or crashes (but never returns to the original exec).
> 
> Being able to trace this event can be helpful in a number of use cases:
> 
>   * allowing tracing eBPF programs access to the original MM on exec,
>     before current->mm is replaced;
>   * counting exec in the original task (via perf event);
>   * profiling flush time ("new_exec" to "sched_process_exec").
> 
> Example of tracing output ("new_exec" and "sched_process_exec"):

How common is this? And can't you just do the same with adding a kprobe?

-- Steve
Marco Elver April 9, 2024, 2:45 p.m. UTC | #2
On Tue, 9 Apr 2024 at 16:31, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon,  8 Apr 2024 11:01:54 +0200
> Marco Elver <elver@google.com> wrote:
>
> > Add "new_exec" tracepoint, which is run right after the point of no
> > return but before the current task assumes its new exec identity.
> >
> > Unlike the tracepoint "sched_process_exec", the "new_exec" tracepoint
> > runs before flushing the old exec, i.e. while the task still has the
> > original state (such as original MM), but when the new exec either
> > succeeds or crashes (but never returns to the original exec).
> >
> > Being able to trace this event can be helpful in a number of use cases:
> >
> >   * allowing tracing eBPF programs access to the original MM on exec,
> >     before current->mm is replaced;
> >   * counting exec in the original task (via perf event);
> >   * profiling flush time ("new_exec" to "sched_process_exec").
> >
> > Example of tracing output ("new_exec" and "sched_process_exec"):
>
> How common is this? And can't you just do the same with adding a kprobe?

Our main use case would be to use this in BPF programs to become
exec-aware, where using the sched_process_exec hook is too late. This
is particularly important where the BPF program must stop inspecting
the user space's VM when the task does exec to become a new process.

kprobe (or BPF's fentry) is brittle here, because begin_new_exec()'s
permission check can still return an error which returns to the
original task without crashing. Only at the point of no return are we
guaranteed that the exec either succeeds, or the task is terminated on
failure.

I don't know if "common" is the right question here, because it's a
chicken-egg problem: no tracepoint, we give up; we have the
tracepoint, it unlocks a range of new use cases (that require robust
solution to make BPF programs exec-aware, and a tracepoint is the only
option IMHO).

Thanks,
-- Marco
Kees Cook April 9, 2024, 3:46 p.m. UTC | #3
On Mon, Apr 08, 2024 at 11:01:54AM +0200, Marco Elver wrote:
> Add "new_exec" tracepoint, which is run right after the point of no
> return but before the current task assumes its new exec identity.
> 
> Unlike the tracepoint "sched_process_exec", the "new_exec" tracepoint
> runs before flushing the old exec, i.e. while the task still has the
> original state (such as original MM), but when the new exec either
> succeeds or crashes (but never returns to the original exec).
> 
> Being able to trace this event can be helpful in a number of use cases:
> 
>   * allowing tracing eBPF programs access to the original MM on exec,
>     before current->mm is replaced;
>   * counting exec in the original task (via perf event);
>   * profiling flush time ("new_exec" to "sched_process_exec").
> 
> Example of tracing output ("new_exec" and "sched_process_exec"):
> 
>   $ cat /sys/kernel/debug/tracing/trace_pipe
>       <...>-379     [003] .....   179.626921: new_exec: filename=/usr/bin/sshd pid=379 comm=sshd
>       <...>-379     [003] .....   179.629131: sched_process_exec: filename=/usr/bin/sshd pid=379 old_pid=379
>       <...>-381     [002] .....   180.048580: new_exec: filename=/bin/bash pid=381 comm=sshd
>       <...>-381     [002] .....   180.053122: sched_process_exec: filename=/bin/bash pid=381 old_pid=381
>       <...>-385     [001] .....   180.068277: new_exec: filename=/usr/bin/tty pid=385 comm=bash
>       <...>-385     [001] .....   180.069485: sched_process_exec: filename=/usr/bin/tty pid=385 old_pid=385
>       <...>-389     [006] .....   192.020147: new_exec: filename=/usr/bin/dmesg pid=389 comm=bash
>        bash-389     [006] .....   192.021377: sched_process_exec: filename=/usr/bin/dmesg pid=389 old_pid=389
> 
> Signed-off-by: Marco Elver <elver@google.com>
> ---
>  fs/exec.c                   |  2 ++
>  include/trace/events/task.h | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 38bf71cbdf5e..ab778ae1fc06 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1268,6 +1268,8 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	if (retval)
>  		return retval;
>  
> +	trace_new_exec(current, bprm);
> +

All other steps in this function have explicit comments about
what/why/etc. Please add some kind of comment describing why the
tracepoint is where it is, etc.

For example, maybe something like:

/*
 * Before any changes to 'current', report that the exec is about to
 * happen (since we made it to the point of no return). On a successful
 * exec, the 'sched_process_exec' tracepoint will also fire. On failure,
 * ... [something else]
 */

> +TRACE_EVENT(new_exec,
> +
> +	TP_PROTO(struct task_struct *task, struct linux_binprm *bprm),
> +
> +	TP_ARGS(task, bprm),
> +
> +	TP_STRUCT__entry(
> +		__string(	filename,	bprm->filename	)
> +		__field(	pid_t,		pid		)
> +		__string(	comm,		task->comm	)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(filename, bprm->filename);

What about binfmt_misc, and binfmt_script? You may want bprm->interp
too?

-Kees

> +		__entry->pid = task->pid;
> +		__assign_str(comm, task->comm);
> +	),
> +
> +	TP_printk("filename=%s pid=%d comm=%s",
> +		  __get_str(filename), __entry->pid, __get_str(comm))
> +);
> +
>  #endif
>  
>  /* This part must be outside protection */
> -- 
> 2.44.0.478.gd926399ef9-goog
>
Marco Elver April 9, 2024, 6:25 p.m. UTC | #4
On Tue, Apr 09, 2024 at 08:46AM -0700, Kees Cook wrote:
[...]
> > +	trace_new_exec(current, bprm);
> > +
> 
> All other steps in this function have explicit comments about
> what/why/etc. Please add some kind of comment describing why the
> tracepoint is where it is, etc.

I beefed up the tracepoint documentation, and wrote a little paragraph
above where it's called to reinforce what we want.

[...]
> What about binfmt_misc, and binfmt_script? You may want bprm->interp
> too?

Good points. I'll make the below changes for v2:

diff --git a/fs/exec.c b/fs/exec.c
index ab778ae1fc06..472b9f7b40e8 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1268,6 +1268,12 @@ int begin_new_exec(struct linux_binprm * bprm)
 	if (retval)
 		return retval;
 
+	/*
+	 * This tracepoint marks the point before flushing the old exec where
+	 * the current task is still unchanged, but errors are fatal (point of
+	 * no return). The later "sched_process_exec" tracepoint is called after
+	 * the current task has successfully switched to the new exec.
+	 */
 	trace_new_exec(current, bprm);
 
 	/*
diff --git a/include/trace/events/task.h b/include/trace/events/task.h
index 8853dc44783d..623d9af777c1 100644
--- a/include/trace/events/task.h
+++ b/include/trace/events/task.h
@@ -61,8 +61,11 @@ TRACE_EVENT(task_rename,
  * @task:	pointer to the current task
  * @bprm:	pointer to linux_binprm used for new exec
  *
- * Called before flushing the old exec, but at the point of no return during
- * switching to the new exec.
+ * Called before flushing the old exec, where @task is still unchanged, but at
+ * the point of no return during switching to the new exec. At the point it is
+ * called the exec will either succeed, or on failure terminate the task. Also
+ * see the "sched_process_exec" tracepoint, which is called right after @task
+ * has successfully switched to the new exec.
  */
 TRACE_EVENT(new_exec,
 
@@ -71,19 +74,22 @@ TRACE_EVENT(new_exec,
 	TP_ARGS(task, bprm),
 
 	TP_STRUCT__entry(
+		__string(	interp,		bprm->interp	)
 		__string(	filename,	bprm->filename	)
 		__field(	pid_t,		pid		)
 		__string(	comm,		task->comm	)
 	),
 
 	TP_fast_assign(
+		__assign_str(interp, bprm->interp);
 		__assign_str(filename, bprm->filename);
 		__entry->pid = task->pid;
 		__assign_str(comm, task->comm);
 	),
 
-	TP_printk("filename=%s pid=%d comm=%s",
-		  __get_str(filename), __entry->pid, __get_str(comm))
+	TP_printk("interp=%s filename=%s pid=%d comm=%s",
+		  __get_str(interp), __get_str(filename),
+		  __entry->pid, __get_str(comm))
 );
 
 #endif
Kees Cook April 9, 2024, 9:28 p.m. UTC | #5
On Tue, Apr 09, 2024 at 08:25:45PM +0200, Marco Elver wrote:
> On Tue, Apr 09, 2024 at 08:46AM -0700, Kees Cook wrote:
> [...]
> > > +	trace_new_exec(current, bprm);
> > > +
> > 
> > All other steps in this function have explicit comments about
> > what/why/etc. Please add some kind of comment describing why the
> > tracepoint is where it is, etc.
> 
> I beefed up the tracepoint documentation, and wrote a little paragraph
> above where it's called to reinforce what we want.
> 
> [...]
> > What about binfmt_misc, and binfmt_script? You may want bprm->interp
> > too?
> 
> Good points. I'll make the below changes for v2:
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index ab778ae1fc06..472b9f7b40e8 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1268,6 +1268,12 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	if (retval)
>  		return retval;
>  
> +	/*
> +	 * This tracepoint marks the point before flushing the old exec where
> +	 * the current task is still unchanged, but errors are fatal (point of
> +	 * no return). The later "sched_process_exec" tracepoint is called after
> +	 * the current task has successfully switched to the new exec.
> +	 */
>  	trace_new_exec(current, bprm);
>  
>  	/*
> diff --git a/include/trace/events/task.h b/include/trace/events/task.h
> index 8853dc44783d..623d9af777c1 100644
> --- a/include/trace/events/task.h
> +++ b/include/trace/events/task.h
> @@ -61,8 +61,11 @@ TRACE_EVENT(task_rename,
>   * @task:	pointer to the current task
>   * @bprm:	pointer to linux_binprm used for new exec
>   *
> - * Called before flushing the old exec, but at the point of no return during
> - * switching to the new exec.
> + * Called before flushing the old exec, where @task is still unchanged, but at
> + * the point of no return during switching to the new exec. At the point it is
> + * called the exec will either succeed, or on failure terminate the task. Also
> + * see the "sched_process_exec" tracepoint, which is called right after @task
> + * has successfully switched to the new exec.
>   */
>  TRACE_EVENT(new_exec,
>  
> @@ -71,19 +74,22 @@ TRACE_EVENT(new_exec,
>  	TP_ARGS(task, bprm),
>  
>  	TP_STRUCT__entry(
> +		__string(	interp,		bprm->interp	)
>  		__string(	filename,	bprm->filename	)
>  		__field(	pid_t,		pid		)
>  		__string(	comm,		task->comm	)
>  	),
>  
>  	TP_fast_assign(
> +		__assign_str(interp, bprm->interp);
>  		__assign_str(filename, bprm->filename);
>  		__entry->pid = task->pid;
>  		__assign_str(comm, task->comm);
>  	),
>  
> -	TP_printk("filename=%s pid=%d comm=%s",
> -		  __get_str(filename), __entry->pid, __get_str(comm))
> +	TP_printk("interp=%s filename=%s pid=%d comm=%s",
> +		  __get_str(interp), __get_str(filename),
> +		  __entry->pid, __get_str(comm))
>  );
>  
>  #endif

Looks good; I await v2, and Steven's Ack. :)
Masami Hiramatsu (Google) April 9, 2024, 11:54 p.m. UTC | #6
On Tue, 9 Apr 2024 16:45:47 +0200
Marco Elver <elver@google.com> wrote:

> On Tue, 9 Apr 2024 at 16:31, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Mon,  8 Apr 2024 11:01:54 +0200
> > Marco Elver <elver@google.com> wrote:
> >
> > > Add "new_exec" tracepoint, which is run right after the point of no
> > > return but before the current task assumes its new exec identity.
> > >
> > > Unlike the tracepoint "sched_process_exec", the "new_exec" tracepoint
> > > runs before flushing the old exec, i.e. while the task still has the
> > > original state (such as original MM), but when the new exec either
> > > succeeds or crashes (but never returns to the original exec).
> > >
> > > Being able to trace this event can be helpful in a number of use cases:
> > >
> > >   * allowing tracing eBPF programs access to the original MM on exec,
> > >     before current->mm is replaced;
> > >   * counting exec in the original task (via perf event);
> > >   * profiling flush time ("new_exec" to "sched_process_exec").
> > >
> > > Example of tracing output ("new_exec" and "sched_process_exec"):
> >
> > How common is this? And can't you just do the same with adding a kprobe?
> 
> Our main use case would be to use this in BPF programs to become
> exec-aware, where using the sched_process_exec hook is too late. This
> is particularly important where the BPF program must stop inspecting
> the user space's VM when the task does exec to become a new process.

Just out of curiousity, would you like to audit that the user-program
is not malformed? (security tracepoint?) I think that is an interesting
idea. What kind of information you need?

> 
> kprobe (or BPF's fentry) is brittle here, because begin_new_exec()'s
> permission check can still return an error which returns to the
> original task without crashing. Only at the point of no return are we
> guaranteed that the exec either succeeds, or the task is terminated on
> failure.

Just a note: That is BPF limitation, kprobe and kprobe events can put
a probe in the function body, but that is not supported on BPF (I guess
because it depends on kernel debuginfo.) You can add kprobe-event using
"perf probe" tool.

Thank you,

> 
> I don't know if "common" is the right question here, because it's a
> chicken-egg problem: no tracepoint, we give up; we have the
> tracepoint, it unlocks a range of new use cases (that require robust
> solution to make BPF programs exec-aware, and a tracepoint is the only
> option IMHO).
> 
> Thanks,
> -- Marco
Marco Elver April 10, 2024, 7:54 a.m. UTC | #7
On Wed, 10 Apr 2024 at 01:54, Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Tue, 9 Apr 2024 16:45:47 +0200
> Marco Elver <elver@google.com> wrote:
>
> > On Tue, 9 Apr 2024 at 16:31, Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Mon,  8 Apr 2024 11:01:54 +0200
> > > Marco Elver <elver@google.com> wrote:
> > >
> > > > Add "new_exec" tracepoint, which is run right after the point of no
> > > > return but before the current task assumes its new exec identity.
> > > >
> > > > Unlike the tracepoint "sched_process_exec", the "new_exec" tracepoint
> > > > runs before flushing the old exec, i.e. while the task still has the
> > > > original state (such as original MM), but when the new exec either
> > > > succeeds or crashes (but never returns to the original exec).
> > > >
> > > > Being able to trace this event can be helpful in a number of use cases:
> > > >
> > > >   * allowing tracing eBPF programs access to the original MM on exec,
> > > >     before current->mm is replaced;
> > > >   * counting exec in the original task (via perf event);
> > > >   * profiling flush time ("new_exec" to "sched_process_exec").
> > > >
> > > > Example of tracing output ("new_exec" and "sched_process_exec"):
> > >
> > > How common is this? And can't you just do the same with adding a kprobe?
> >
> > Our main use case would be to use this in BPF programs to become
> > exec-aware, where using the sched_process_exec hook is too late. This
> > is particularly important where the BPF program must stop inspecting
> > the user space's VM when the task does exec to become a new process.
>
> Just out of curiousity, would you like to audit that the user-program
> is not malformed? (security tracepoint?) I think that is an interesting
> idea. What kind of information you need?

I didn't have that in mind. If the BPF program reads (or even writes)
to user space memory, it must stop doing so before current->mm is
switched, otherwise it will lead to random results or memory
corruption. The new process may reallocate the memory that we want to
inspect, but the user space process must explicitly opt in to being
inspected or being manipulated. Just like the kernel "flushes" various
old state on exec since it's becoming a new process, a BPF program
that has per-process state needs to do the same.
Masami Hiramatsu (Google) April 10, 2024, 1:56 p.m. UTC | #8
On Mon,  8 Apr 2024 11:01:54 +0200
Marco Elver <elver@google.com> wrote:

> Add "new_exec" tracepoint, which is run right after the point of no
> return but before the current task assumes its new exec identity.
> 
> Unlike the tracepoint "sched_process_exec", the "new_exec" tracepoint
> runs before flushing the old exec, i.e. while the task still has the
> original state (such as original MM), but when the new exec either
> succeeds or crashes (but never returns to the original exec).
> 
> Being able to trace this event can be helpful in a number of use cases:
> 
>   * allowing tracing eBPF programs access to the original MM on exec,
>     before current->mm is replaced;
>   * counting exec in the original task (via perf event);
>   * profiling flush time ("new_exec" to "sched_process_exec").
> 
> Example of tracing output ("new_exec" and "sched_process_exec"):

nit: "new_exec" name a bit stands out compared to other events, and hard to
expect it comes before or after "sched_process_exec". Since "begin_new_exec"
is internal implementation name, IMHO, it should not exposed to user.
What do you think about calling this "sched_prepare_exec" ?

Thank you,

> 
>   $ cat /sys/kernel/debug/tracing/trace_pipe
>       <...>-379     [003] .....   179.626921: new_exec: filename=/usr/bin/sshd pid=379 comm=sshd
>       <...>-379     [003] .....   179.629131: sched_process_exec: filename=/usr/bin/sshd pid=379 old_pid=379
>       <...>-381     [002] .....   180.048580: new_exec: filename=/bin/bash pid=381 comm=sshd
>       <...>-381     [002] .....   180.053122: sched_process_exec: filename=/bin/bash pid=381 old_pid=381
>       <...>-385     [001] .....   180.068277: new_exec: filename=/usr/bin/tty pid=385 comm=bash
>       <...>-385     [001] .....   180.069485: sched_process_exec: filename=/usr/bin/tty pid=385 old_pid=385
>       <...>-389     [006] .....   192.020147: new_exec: filename=/usr/bin/dmesg pid=389 comm=bash
>        bash-389     [006] .....   192.021377: sched_process_exec: filename=/usr/bin/dmesg pid=389 old_pid=389
> 
> Signed-off-by: Marco Elver <elver@google.com>
> ---
>  fs/exec.c                   |  2 ++
>  include/trace/events/task.h | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 38bf71cbdf5e..ab778ae1fc06 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1268,6 +1268,8 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	if (retval)
>  		return retval;
>  
> +	trace_new_exec(current, bprm);
> +
>  	/*
>  	 * Ensure all future errors are fatal.
>  	 */
> diff --git a/include/trace/events/task.h b/include/trace/events/task.h
> index 47b527464d1a..8853dc44783d 100644
> --- a/include/trace/events/task.h
> +++ b/include/trace/events/task.h
> @@ -56,6 +56,36 @@ TRACE_EVENT(task_rename,
>  		__entry->newcomm, __entry->oom_score_adj)
>  );
>  
> +/**
> + * new_exec - called before setting up new exec
> + * @task:	pointer to the current task
> + * @bprm:	pointer to linux_binprm used for new exec
> + *
> + * Called before flushing the old exec, but at the point of no return during
> + * switching to the new exec.
> + */
> +TRACE_EVENT(new_exec,
> +
> +	TP_PROTO(struct task_struct *task, struct linux_binprm *bprm),
> +
> +	TP_ARGS(task, bprm),
> +
> +	TP_STRUCT__entry(
> +		__string(	filename,	bprm->filename	)
> +		__field(	pid_t,		pid		)
> +		__string(	comm,		task->comm	)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(filename, bprm->filename);
> +		__entry->pid = task->pid;
> +		__assign_str(comm, task->comm);
> +	),
> +
> +	TP_printk("filename=%s pid=%d comm=%s",
> +		  __get_str(filename), __entry->pid, __get_str(comm))
> +);
> +
>  #endif
>  
>  /* This part must be outside protection */
> -- 
> 2.44.0.478.gd926399ef9-goog
>
Marco Elver April 10, 2024, 1:59 p.m. UTC | #9
On Wed, 10 Apr 2024 at 15:56, Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Mon,  8 Apr 2024 11:01:54 +0200
> Marco Elver <elver@google.com> wrote:
>
> > Add "new_exec" tracepoint, which is run right after the point of no
> > return but before the current task assumes its new exec identity.
> >
> > Unlike the tracepoint "sched_process_exec", the "new_exec" tracepoint
> > runs before flushing the old exec, i.e. while the task still has the
> > original state (such as original MM), but when the new exec either
> > succeeds or crashes (but never returns to the original exec).
> >
> > Being able to trace this event can be helpful in a number of use cases:
> >
> >   * allowing tracing eBPF programs access to the original MM on exec,
> >     before current->mm is replaced;
> >   * counting exec in the original task (via perf event);
> >   * profiling flush time ("new_exec" to "sched_process_exec").
> >
> > Example of tracing output ("new_exec" and "sched_process_exec"):
>
> nit: "new_exec" name a bit stands out compared to other events, and hard to
> expect it comes before or after "sched_process_exec". Since "begin_new_exec"
> is internal implementation name, IMHO, it should not exposed to user.
> What do you think about calling this "sched_prepare_exec" ?

I like it, I'll rename it to sched_prepare_exec.

Thanks!
diff mbox series

Patch

diff --git a/fs/exec.c b/fs/exec.c
index 38bf71cbdf5e..ab778ae1fc06 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1268,6 +1268,8 @@  int begin_new_exec(struct linux_binprm * bprm)
 	if (retval)
 		return retval;
 
+	trace_new_exec(current, bprm);
+
 	/*
 	 * Ensure all future errors are fatal.
 	 */
diff --git a/include/trace/events/task.h b/include/trace/events/task.h
index 47b527464d1a..8853dc44783d 100644
--- a/include/trace/events/task.h
+++ b/include/trace/events/task.h
@@ -56,6 +56,36 @@  TRACE_EVENT(task_rename,
 		__entry->newcomm, __entry->oom_score_adj)
 );
 
+/**
+ * new_exec - called before setting up new exec
+ * @task:	pointer to the current task
+ * @bprm:	pointer to linux_binprm used for new exec
+ *
+ * Called before flushing the old exec, but at the point of no return during
+ * switching to the new exec.
+ */
+TRACE_EVENT(new_exec,
+
+	TP_PROTO(struct task_struct *task, struct linux_binprm *bprm),
+
+	TP_ARGS(task, bprm),
+
+	TP_STRUCT__entry(
+		__string(	filename,	bprm->filename	)
+		__field(	pid_t,		pid		)
+		__string(	comm,		task->comm	)
+	),
+
+	TP_fast_assign(
+		__assign_str(filename, bprm->filename);
+		__entry->pid = task->pid;
+		__assign_str(comm, task->comm);
+	),
+
+	TP_printk("filename=%s pid=%d comm=%s",
+		  __get_str(filename), __entry->pid, __get_str(comm))
+);
+
 #endif
 
 /* This part must be outside protection */