diff mbox series

[tip/perf] uprobes: avoid false lockdep splat in uprobe timer callback

Message ID 20250403171831.3803479-1-andrii@kernel.org (mailing list archive)
State New

Commit Message

Andrii Nakryiko April 3, 2025, 5:18 p.m. UTC
Avoid a false-positive lockdep warning in PREEMPT_RT configuration when
using write_seqcount_begin() in uprobe timer callback by using
raw_write_* APIs. Uprobe's use of timer callback is guaranteed to not
race with itself, and as such seqcount's insistence on having hardirqs
disabled on the writer side is irrelevant. So switch to raw_ variants of
seqcount API instead of disabling hardirqs unnecessarily.

Also, point out in the comments more explicitly why we use seqcount
despite our reader side being rather simple and never retrying. We favor
a well-maintained kernel primitive over open-coding our own memory
barriers.

Link: https://lore.kernel.org/bpf/CAADnVQLLOHZmPO4X_dQ+cTaSDvzdWHzA0qUqQDhLFYL3D6xPxg@mail.gmail.com/
Reported-by: Alexei Starovoitov <ast@kernel.org>
Suggested-by: Sebastian Sewior <bigeasy@linutronix.de>
Fixes: 8622e45b5da1 ("uprobes: Reuse return_instances between multiple uretprobes within task")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/events/uprobes.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Comments

Sebastian Andrzej Siewior April 3, 2025, 5:49 p.m. UTC | #1
On 2025-04-03 10:18:31 [-0700], Andrii Nakryiko wrote:
> Avoid a false-positive lockdep warning in PREEMPT_RT configuration when
> using write_seqcount_begin() in uprobe timer callback by using
> raw_write_* APIs. Uprobe's use of timer callback is guaranteed to not
> race with itself, and as such seqcount's insistence on having hardirqs

preemption, not hardirqs

> disabled on the writer side is irrelevant. So switch to raw_ variants of
> seqcount API instead of disabling hardirqs unnecessarily.
> 
> Also, point out in the comments more explicitly why we use seqcount
> despite our reader side being rather simple and never retrying. We favor
> well-maintained kernel primitive in favor of open-coding our own memory
> barriers.

Thank you.

> Link: https://lore.kernel.org/bpf/CAADnVQLLOHZmPO4X_dQ+cTaSDvzdWHzA0qUqQDhLFYL3D6xPxg@mail.gmail.com/
> Reported-by: Alexei Starovoitov <ast@kernel.org>
> Suggested-by: Sebastian Sewior <bigeasy@linutronix.de>
> Fixes: 8622e45b5da1 ("uprobes: Reuse return_instances between multiple uretprobes within task")
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  kernel/events/uprobes.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 70c84b9d7be3..6d7e7da0fbbc 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1944,6 +1944,9 @@ static void free_ret_instance(struct uprobe_task *utask,
>  	 * to-be-reused return instances for future uretprobes. If ri_timer()
>  	 * happens to be running right now, though, we fallback to safety and
>  	 * just perform RCU-delated freeing of ri.
> +	 * Admittedly, this is a rather simple use of seqcount, but it nicely
> +	 * abstracts away all the necessary memory barriers, so we use
> +	 * a well-supported kernel primitive here.
>  	 */
>  	if (raw_seqcount_try_begin(&utask->ri_seqcount, seq)) {
>  		/* immediate reuse of ri without RCU GP is OK */
> @@ -2004,12 +2007,18 @@ static void ri_timer(struct timer_list *timer)
>  	/* RCU protects return_instance from freeing. */
>  	guard(rcu)();
>  
> -	write_seqcount_begin(&utask->ri_seqcount);

> +	/* See free_ret_instance() for notes on seqcount use.

This is not a proper multi line comment.

> +	 * We also employ raw API variants to avoid lockdep false-positive
> +	 * warning complaining about hardirqs not being disabled. We have

s/hardirqs/preemption. The warning is about missing disabled preemption.

> +	 * a guarantee that this timer callback won't race with itself, so no
> +	 * need to disable hardirqs.

The timer can only be invoked once for a uprobe_task. Therefore there
can only be one writer. The reader does not require an even sequence
count so it is okay to remain preemptible on PREEMPT_RT. 

> +	 */
> +	raw_write_seqcount_begin(&utask->ri_seqcount);
>  
>  	for_each_ret_instance_rcu(ri, utask->return_instances)
>  		hprobe_expire(&ri->hprobe, false);
>  
> -	write_seqcount_end(&utask->ri_seqcount);
> +	raw_write_seqcount_end(&utask->ri_seqcount);
>  }
>  
>  static struct uprobe_task *alloc_utask(void)

Sebastian

Steven Rostedt April 3, 2025, 5:53 p.m. UTC | #2
On Thu, 3 Apr 2025 19:49:17 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> > +	/* See free_ret_instance() for notes on seqcount use.  
> 
> This is not a proper multi line comment.

It's only proper in the networking code, but not the rest of the kernel.

-- Steve

Sebastian Andrzej Siewior April 3, 2025, 5:56 p.m. UTC | #3
On 2025-04-03 13:53:31 [-0400], Steven Rostedt wrote:
> On Thu, 3 Apr 2025 19:49:17 +0200
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> > > +	/* See free_ret_instance() for notes on seqcount use.  
> > 
> > This is not a proper multi line comment.
> 
> It's only proper in the networking code, but not the rest of the kernel.

I wasn't aware that uprobe is following networking standards here.

> -- Steve

Sebastian

Andrii Nakryiko April 3, 2025, 6:30 p.m. UTC | #4
On Thu, Apr 3, 2025 at 10:49 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2025-04-03 10:18:31 [-0700], Andrii Nakryiko wrote:
> > Avoid a false-positive lockdep warning in PREEMPT_RT configuration when
> > using write_seqcount_begin() in uprobe timer callback by using
> > raw_write_* APIs. Uprobe's use of timer callback is guaranteed to not
> > race with itself, and as such seqcount's insistence on having hardirqs
> preemption, not hardirqs
>
> > disabled on the writer side is irrelevant. So switch to raw_ variants of
> > seqcount API instead of disabling hardirqs unnecessarily.
> >
> > Also, point out in the comments more explicitly why we use seqcount
> > despite our reader side being rather simple and never retrying. We favor
> > well-maintained kernel primitive in favor of open-coding our own memory
> > barriers.
>
> Thank you.
>
> > Link: https://lore.kernel.org/bpf/CAADnVQLLOHZmPO4X_dQ+cTaSDvzdWHzA0qUqQDhLFYL3D6xPxg@mail.gmail.com/
> > Reported-by: Alexei Starovoitov <ast@kernel.org>
> > Suggested-by: Sebastian Sewior <bigeasy@linutronix.de>
> > Fixes: 8622e45b5da1 ("uprobes: Reuse return_instances between multiple uretprobes within task")
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  kernel/events/uprobes.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > index 70c84b9d7be3..6d7e7da0fbbc 100644
> > --- a/kernel/events/uprobes.c
> > +++ b/kernel/events/uprobes.c
> > @@ -1944,6 +1944,9 @@ static void free_ret_instance(struct uprobe_task *utask,
> >        * to-be-reused return instances for future uretprobes. If ri_timer()
> >        * happens to be running right now, though, we fallback to safety and
> >        * just perform RCU-delated freeing of ri.
> > +      * Admittedly, this is a rather simple use of seqcount, but it nicely
> > +      * abstracts away all the necessary memory barriers, so we use
> > +      * a well-supported kernel primitive here.
> >        */
> >       if (raw_seqcount_try_begin(&utask->ri_seqcount, seq)) {
> >               /* immediate reuse of ri without RCU GP is OK */
> > @@ -2004,12 +2007,18 @@ static void ri_timer(struct timer_list *timer)
> >       /* RCU protects return_instance from freeing. */
> >       guard(rcu)();
> >
> > -     write_seqcount_begin(&utask->ri_seqcount);
>
> > +     /* See free_ret_instance() for notes on seqcount use.
>
> This is not a proper multi line comment.

yep, will fix; no, uprobe is not networking, this style is just
ingrained in my brain from working in BPF code base for a while

>
> > +      * We also employ raw API variants to avoid lockdep false-positive
> > +      * warning complaining about hardirqs not being disabled. We have
>
> s/hardirqs/preemption. The warning is about missing disabled preemption.

Right, sorry, the `this_cpu_read(hardirqs_enabled)` part of the check
in lockdep_assert_preemption_disabled() made too strong an impression
on me :) Will fix.
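
For illustration, the condition behind that assertion can be modelled in plain userspace C. This is only a sketch under assumptions, not the kernel macro: `struct cpu_state`, `preemption_assert_would_warn()` and `demo_preemption_assert()` are hypothetical names, and the `preempt_count == 0` half of the condition is assumed from the name of the assertion (only the `hardirqs_enabled` half is mentioned above). It shows why disabling hardirqs would silence the warning even though the warning is about preemption.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the per-CPU state the assertion looks at.
 * Hypothetical stand-in for illustration; not kernel code.
 */
struct cpu_state {
	int preempt_count;	/* non-zero means preemption is disabled */
	bool hardirqs_enabled;	/* models lockdep's hardirq tracking */
};

/* Returns true when the modelled assertion would emit a warning. */
static bool preemption_assert_would_warn(const struct cpu_state *s)
{
	/* Only a fully preemptible context (irqs on, count 0) warns. */
	return s->preempt_count == 0 && s->hardirqs_enabled;
}

/* Walks the three relevant states; returns 0 if all behave as described. */
static int demo_preemption_assert(void)
{
	struct cpu_state preemptible = { .preempt_count = 0, .hardirqs_enabled = true };
	struct cpu_state preempt_off = { .preempt_count = 1, .hardirqs_enabled = true };
	struct cpu_state irqs_off    = { .preempt_count = 0, .hardirqs_enabled = false };

	assert(preemption_assert_would_warn(&preemptible));
	assert(!preemption_assert_would_warn(&preempt_off));
	assert(!preemption_assert_would_warn(&irqs_off));
	return 0;
}
```

In this model, turning hardirqs off suppresses the warning only as a side effect of making the context non-preemptible, which is consistent with the request to word the comment in terms of preemption.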

>
> > +      * a guarantee that this timer callback won't race with itself, so no
> > +      * need to disable hardirqs.
>
> The timer can only be invoked once for a uprobe_task. Therefore there
> can only be one writer. The reader does not require an even sequence
> count so it is okay to remain preemptible on PREEMPT_RT.
>
> > +      */
> > +     raw_write_seqcount_begin(&utask->ri_seqcount);
> >
> >       for_each_ret_instance_rcu(ri, utask->return_instances)
> >               hprobe_expire(&ri->hprobe, false);
> >
> > -     write_seqcount_end(&utask->ri_seqcount);
> > +     raw_write_seqcount_end(&utask->ri_seqcount);
> >  }
> >
> >  static struct uprobe_task *alloc_utask(void)
>
> Sebastian

Steven Rostedt April 3, 2025, 6:53 p.m. UTC | #5
On Thu, 3 Apr 2025 19:56:19 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 2025-04-03 13:53:31 [-0400], Steven Rostedt wrote:
> > On Thu, 3 Apr 2025 19:49:17 +0200
> > Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> >   
> > > > +	/* See free_ret_instance() for notes on seqcount use.    
> > > 
> > > This is not a proper multi line comment.  
> > 
> > It's only proper in the networking code, but not the rest of the kernel.  
> 
> I wasn't aware that uprobe is following networking standards here.

It's not, but I know that Andrii works a bit with the networking code.

-- Steve

Ingo Molnar April 4, 2025, 8:36 a.m. UTC | #6
* Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Thu, Apr 3, 2025 at 10:49 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > On 2025-04-03 10:18:31 [-0700], Andrii Nakryiko wrote:
> > > Avoid a false-positive lockdep warning in PREEMPT_RT configuration when
> > > using write_seqcount_begin() in uprobe timer callback by using
> > > raw_write_* APIs. Uprobe's use of timer callback is guaranteed to not
> > > race with itself, and as such seqcount's insistence on having hardirqs
> > preemption, not hardirqs
> >
> > > disabled on the writer side is irrelevant. So switch to raw_ variants of
> > > seqcount API instead of disabling hardirqs unnecessarily.
> > >
> > > Also, point out in the comments more explicitly why we use seqcount
> > > despite our reader side being rather simple and never retrying. We favor
> > > well-maintained kernel primitive in favor of open-coding our own memory
> > > barriers.
> >
> > Thank you.
> >
> > > Link: https://lore.kernel.org/bpf/CAADnVQLLOHZmPO4X_dQ+cTaSDvzdWHzA0qUqQDhLFYL3D6xPxg@mail.gmail.com/
> > > Reported-by: Alexei Starovoitov <ast@kernel.org>
> > > Suggested-by: Sebastian Sewior <bigeasy@linutronix.de>
> > > Fixes: 8622e45b5da1 ("uprobes: Reuse return_instances between multiple uretprobes within task")
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  kernel/events/uprobes.c | 13 +++++++++++--
> > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > > index 70c84b9d7be3..6d7e7da0fbbc 100644
> > > --- a/kernel/events/uprobes.c
> > > +++ b/kernel/events/uprobes.c
> > > @@ -1944,6 +1944,9 @@ static void free_ret_instance(struct uprobe_task *utask,
> > >        * to-be-reused return instances for future uretprobes. If ri_timer()
> > >        * happens to be running right now, though, we fallback to safety and
> > >        * just perform RCU-delated freeing of ri.
> > > +      * Admittedly, this is a rather simple use of seqcount, but it nicely
> > > +      * abstracts away all the necessary memory barriers, so we use
> > > +      * a well-supported kernel primitive here.
> > >        */
> > >       if (raw_seqcount_try_begin(&utask->ri_seqcount, seq)) {
> > >               /* immediate reuse of ri without RCU GP is OK */
> > > @@ -2004,12 +2007,18 @@ static void ri_timer(struct timer_list *timer)
> > >       /* RCU protects return_instance from freeing. */
> > >       guard(rcu)();
> > >
> > > -     write_seqcount_begin(&utask->ri_seqcount);
> >
> > > +     /* See free_ret_instance() for notes on seqcount use.
> >
> > This is not a proper multi line comment.
> 
> yep, will fix; no, uprobe is not networking, this style is just
> ingrained in my brain from working in BPF code base for a while

... and this example underlines why we've been asking the networking 
folks for years to use the standard Linux kernel coding style for 
comments, instead of creating this pointless noise & inconsistency.

Thanks,

	Ingo

Patch

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 70c84b9d7be3..6d7e7da0fbbc 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1944,6 +1944,9 @@  static void free_ret_instance(struct uprobe_task *utask,
 	 * to-be-reused return instances for future uretprobes. If ri_timer()
 	 * happens to be running right now, though, we fallback to safety and
 	 * just perform RCU-delated freeing of ri.
+	 * Admittedly, this is a rather simple use of seqcount, but it nicely
+	 * abstracts away all the necessary memory barriers, so we use
+	 * a well-supported kernel primitive here.
 	 */
 	if (raw_seqcount_try_begin(&utask->ri_seqcount, seq)) {
 		/* immediate reuse of ri without RCU GP is OK */
@@ -2004,12 +2007,18 @@  static void ri_timer(struct timer_list *timer)
 	/* RCU protects return_instance from freeing. */
 	guard(rcu)();
 
-	write_seqcount_begin(&utask->ri_seqcount);
+	/* See free_ret_instance() for notes on seqcount use.
+	 * We also employ raw API variants to avoid lockdep false-positive
+	 * warning complaining about hardirqs not being disabled. We have
+	 * a guarantee that this timer callback won't race with itself, so no
+	 * need to disable hardirqs.
+	 */
+	raw_write_seqcount_begin(&utask->ri_seqcount);
 
 	for_each_ret_instance_rcu(ri, utask->return_instances)
 		hprobe_expire(&ri->hprobe, false);
 
-	write_seqcount_end(&utask->ri_seqcount);
+	raw_write_seqcount_end(&utask->ri_seqcount);
 }
 
 static struct uprobe_task *alloc_utask(void)
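
The reader/writer protocol that the patch relies on can be sketched in userspace C. This is a minimal single-threaded model under stated assumptions, not the kernel implementation: `struct model_seqcount`, the `model_*` helpers, `enum ri_action` and `demo_seqcount()` are hypothetical names, and sequentially consistent C11 atomics stand in for the kernel's memory barriers. The writer makes the counter odd for the duration of its section, and the reader only checks for an even value once and never retries, falling back to the slower path otherwise.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for seqcount_t; not the kernel type. */
struct model_seqcount {
	atomic_uint sequence;
};

/* Writer side: an odd value marks a write section in progress. */
static void model_write_begin(struct model_seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

/* Writer side: back to an even value, the write section is over. */
static void model_write_end(struct model_seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

/* Reader side: succeeds only if no writer is active; never spins. */
static bool model_try_begin(struct model_seqcount *s, unsigned int *start)
{
	*start = atomic_load(&s->sequence);
	return !(*start & 1);
}

enum ri_action { RI_REUSE, RI_DEFERRED_FREE };

/* Models the decision taken in free_ret_instance(). */
static enum ri_action model_free_ret_instance(struct model_seqcount *s)
{
	unsigned int seq;

	if (model_try_begin(s, &seq))
		return RI_REUSE;	/* no timer callback running */
	return RI_DEFERRED_FREE;	/* writer active, play it safe */
}

/* Single writer, one reader check per phase; returns 0 on success. */
static int demo_seqcount(void)
{
	struct model_seqcount sc;

	atomic_init(&sc.sequence, 0);

	assert(model_free_ret_instance(&sc) == RI_REUSE);
	model_write_begin(&sc);		/* models ri_timer() entering */
	assert(model_free_ret_instance(&sc) == RI_DEFERRED_FREE);
	model_write_end(&sc);		/* models ri_timer() leaving */
	assert(model_free_ret_instance(&sc) == RI_REUSE);
	return 0;
}
```

Because the reader never waits for an even count, a writer that gets preempted in the middle of its section cannot stall the reader; it only pushes the reader onto the deferred-free path. The model does not cover RCU or real concurrency.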