Message ID | 20230927160231.XRCDDSK4@linutronix.de (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [RFC] srcu: Use try-lock lockdep annotation for NMI-safe access. | expand |
On Wed, Sep 27, 2023 at 06:02:31PM +0200, Sebastian Andrzej Siewior wrote: > It is claimed that srcu_read_lock_nmisafe() NMI-safe. However it > triggers a lockdep if used from NMI because lockdep expects a deadlock > since nothing disables NMIs while the lock is acquired. > > Use a try-lock annotation for srcu_read_lock_nmisafe() to avoid lockdep > complains if used from NMI. > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- > > The splat: > | ================================ > | WARNING: inconsistent lock state > | 6.6.0-rc3-rt5+ #85 Not tainted > | -------------------------------- > | inconsistent {INITIAL USE} -> {IN-NMI} usage. > | swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes: > | ffffffff828e6c90 (console_srcu){....}-{0:0}, at: console_srcu_read_lock+0x3a/0x50 > | {INITIAL USE} state was registered at: > … > | CPU0 > | ---- > | lock(console_srcu); > | <Interrupt> > | lock(console_srcu); > | > | *** DEADLOCK *** > | > > My guess is that trylock annotation should not apply to > rcu_lock_acquire(). This would distinguish it from from non-NMI safe > srcu_read_lock_nmisafe() and NMI check in rcu_read_unlock() is only > there to survive if accidentally used in-NMI. I think this is a "side-effect" of commit f0f44752f5f6 ("rcu: Annotate SRCU's update-side lockdep dependencies"). In verify_lock_unused(), i.e. the checking for NMI lock usages, the logic is that 1) read lock usages in NMI conflicts with write lock usage in normal context (i.e. LOCKF_USED) 2) write lock usage in NMI conflicts with read and write lock usage in normal context (i.e. LOCKF_USED | LOCKF_USED_READ) before that commit, only read-side of SRCU is annotated, in other words, SRCU only has read lock usage from lockdep PoV, but after that commit, we annotate synchronize_srcu() as a write lock usage, so that we can detect deadlocks between *normal* srcu_read_lock() and synchronize_srcu(), however the side effect is now SRCU has a write lock usage from lockdep PoV. Actually in the above commit, I explicitly leave srcu_read_lock_nmisafe() alone since its locking rules may be different compared to srcu_read_lock(). In lockdep terms, srcu_read_lock_nmisafe() is a !check read lock and srcu_read_lock() is a check read lock. Maybe instead of using the trylock trick, we change lockdep to igore !check locks for NMI context detection? Untested code as below: diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index e85b5ad3e206..1af8d44e5eb4 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -5727,8 +5727,9 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass, return; if (unlikely(!lockdep_enabled())) { + /* Only do NMI context checking if it's a check lock */ /* XXX allow trylock from NMI ?!? */ - if (lockdep_nmi() && !trylock) { + if (check && lockdep_nmi() && !trylock) { struct held_lock hlock; hlock.acquire_ip = ip; Peter, thoughts? Of course, either way, we need Fixes: f0f44752f5f6 ("rcu: Annotate SRCU's update-side lockdep dependencies") Regards, Boqun > > include/linux/rcupdate.h | 6 ++++++ > include/linux/srcu.h | 2 +- > 2 files changed, 7 insertions(+), 1 deletion(-) > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index 5e5f920ade909..44aab5c0bd2c1 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -303,6 +303,11 @@ static inline void rcu_lock_acquire(struct lockdep_map *map) > lock_acquire(map, 0, 0, 2, 0, NULL, _THIS_IP_); > } > > +static inline void rcu_try_lock_acquire(struct lockdep_map *map) > +{ > + lock_acquire(map, 0, 1, 2, 0, NULL, _THIS_IP_); > +} > + > static inline void rcu_lock_release(struct lockdep_map *map) > { > lock_release(map, _THIS_IP_); > @@ -317,6 +322,7 @@ int rcu_read_lock_any_held(void); > #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */ > > # define rcu_lock_acquire(a) do { } while (0) > +# define rcu_try_lock_acquire(a) do { } while (0) > # define rcu_lock_release(a) do { } while (0) > > static inline int rcu_read_lock_held(void) > diff --git a/include/linux/srcu.h b/include/linux/srcu.h > index 127ef3b2e6073..236610e4a8fa5 100644 > --- a/include/linux/srcu.h > +++ b/include/linux/srcu.h > @@ -229,7 +229,7 @@ static inline int srcu_read_lock_nmisafe(struct srcu_struct *ssp) __acquires(ssp > > srcu_check_nmi_safety(ssp, true); > retval = __srcu_read_lock_nmisafe(ssp); > - rcu_lock_acquire(&ssp->dep_map); > + rcu_try_lock_acquire(&ssp->dep_map); > return retval; > } > > -- > 2.40.1 >
On 2023-09-27 23:06:09 [-0700], Boqun Feng wrote: > SRCU only has read lock usage from lockdep PoV, but after that commit, > we annotate synchronize_srcu() as a write lock usage, so that we can > detect deadlocks between *normal* srcu_read_lock() and > synchronize_srcu(), however the side effect is now SRCU has a write lock > usage from lockdep PoV. Ach. There is a write annotation for SRCU and RCU has none. Okay that explains it. > Actually in the above commit, I explicitly leave > srcu_read_lock_nmisafe() alone since its locking rules may be different > compared to srcu_read_lock(). In lockdep terms, srcu_read_lock_nmisafe() > is a !check read lock and srcu_read_lock() is a check read lock. This was on v6.6-rc3 so it has the commit f0f44752f5f61 ("rcu: Annotate SRCU's update-side lockdep dependencies"). > Maybe > instead of using the trylock trick, we change lockdep to igore !check > locks for NMI context detection? Untested code as below: Just tested, no splat for the SRCU-in-NMI usage. Sebastian
On Wed, Sep 27, 2023 at 11:06:09PM -0700, Boqun Feng wrote: > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > index e85b5ad3e206..1af8d44e5eb4 100644 > --- a/kernel/locking/lockdep.c > +++ b/kernel/locking/lockdep.c > @@ -5727,8 +5727,9 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass, > return; > > if (unlikely(!lockdep_enabled())) { > + /* Only do NMI context checking if it's a check lock */ > /* XXX allow trylock from NMI ?!? */ > - if (lockdep_nmi() && !trylock) { > + if (check && lockdep_nmi() && !trylock) { > struct held_lock hlock; > > hlock.acquire_ip = ip; > > Peter, thoughts? > I think I prefer the trylock one. Fundamentally trylock conveys the 'we wont block' thing. Making 'lock' sometimes work for NMI is just confusing.
On Wed, Sep 27, 2023 at 11:06:09PM -0700, Boqun Feng wrote: > I think this is a "side-effect" of commit f0f44752f5f6 ("rcu: Annotate > SRCU's update-side lockdep dependencies"). In verify_lock_unused(), i.e. > the checking for NMI lock usages, the logic is that I think I'm having a problem with this commit -- that is, by adding lockdep you're adding tracepoint, which rely on RCU being active. The result is that SRCU is now no longer usable from !RCU regions. Was this considered and intended?
On Thu, Sep 28, 2023 at 10:09:00AM +0200, Peter Zijlstra wrote: > On Wed, Sep 27, 2023 at 11:06:09PM -0700, Boqun Feng wrote: > > > I think this is a "side-effect" of commit f0f44752f5f6 ("rcu: Annotate > > SRCU's update-side lockdep dependencies"). In verify_lock_unused(), i.e. > > the checking for NMI lock usages, the logic is that > > I think I'm having a problem with this commit -- that is, by adding > lockdep you're adding tracepoint, which rely on RCU being active. > > The result is that SRCU is now no longer usable from !RCU regions. > Interesting > Was this considered and intended? > No, I don't think I have considered this before, I think I may still miss something here, maybe you or Paul can provide an example for such a case? One thing though, before the commit, srcu_read_lock() already has an rcu_lock_acquire() annotation which eventually calls lock_acquire() which has a tracepoint in it. Regards, Boqun
On Thu, Sep 28, 2023 at 07:54:10AM -0700, Boqun Feng wrote: > On Thu, Sep 28, 2023 at 10:09:00AM +0200, Peter Zijlstra wrote: > > On Wed, Sep 27, 2023 at 11:06:09PM -0700, Boqun Feng wrote: > > > > > I think this is a "side-effect" of commit f0f44752f5f6 ("rcu: Annotate > > > SRCU's update-side lockdep dependencies"). In verify_lock_unused(), i.e. > > > the checking for NMI lock usages, the logic is that > > > > I think I'm having a problem with this commit -- that is, by adding > > lockdep you're adding tracepoint, which rely on RCU being active. > > > > The result is that SRCU is now no longer usable from !RCU regions. > > > > Interesting > > > Was this considered and intended? > > > > No, I don't think I have considered this before, I think I may still > miss something here, maybe you or Paul can provide an example for such > a case? The whole trace_.*_rcuidle() machinery. Which I thought I had fully eradicated, but apparently still exists (with *one* user) :-/ Search for rcuidle in include/linux/tracepoint.h Also, git grep trace_.*_rcuidle
On Thu, Sep 28, 2023 at 05:29:42PM +0200, Peter Zijlstra wrote: > On Thu, Sep 28, 2023 at 07:54:10AM -0700, Boqun Feng wrote: > > On Thu, Sep 28, 2023 at 10:09:00AM +0200, Peter Zijlstra wrote: > > > On Wed, Sep 27, 2023 at 11:06:09PM -0700, Boqun Feng wrote: > > > > > > > I think this is a "side-effect" of commit f0f44752f5f6 ("rcu: Annotate > > > > SRCU's update-side lockdep dependencies"). In verify_lock_unused(), i.e. > > > > the checking for NMI lock usages, the logic is that > > > > > > I think I'm having a problem with this commit -- that is, by adding > > > lockdep you're adding tracepoint, which rely on RCU being active. > > > > > > The result is that SRCU is now no longer usable from !RCU regions. > > > > > > > Interesting > > > > > Was this considered and intended? > > > > > > > No, I don't think I have considered this before, I think I may still > > miss something here, maybe you or Paul can provide an example for such > > a case? > > The whole trace_.*_rcuidle() machinery. Which I thought I had fully > eradicated, but apparently still exists (with *one* user) :-/ > > Search for rcuidle in include/linux/tracepoint.h > > Also, git grep trace_.*_rcuidle > Thanks! But as I mentioned in the IRC, trace_.*_rcuidle() call the special APIs srcu_read_{un,}lock_notrace(), and these won't call lockdep annotation functions. And what the commit did was only changing the lockdep annotation of srcu_read_{un,}lock(), so we are still fine here? Regards, Boqun >
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 5e5f920ade909..44aab5c0bd2c1 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -303,6 +303,11 @@ static inline void rcu_lock_acquire(struct lockdep_map *map) lock_acquire(map, 0, 0, 2, 0, NULL, _THIS_IP_); } +static inline void rcu_try_lock_acquire(struct lockdep_map *map) +{ + lock_acquire(map, 0, 1, 2, 0, NULL, _THIS_IP_); +} + static inline void rcu_lock_release(struct lockdep_map *map) { lock_release(map, _THIS_IP_); @@ -317,6 +322,7 @@ int rcu_read_lock_any_held(void); #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */ # define rcu_lock_acquire(a) do { } while (0) +# define rcu_try_lock_acquire(a) do { } while (0) # define rcu_lock_release(a) do { } while (0) static inline int rcu_read_lock_held(void) diff --git a/include/linux/srcu.h b/include/linux/srcu.h index 127ef3b2e6073..236610e4a8fa5 100644 --- a/include/linux/srcu.h +++ b/include/linux/srcu.h @@ -229,7 +229,7 @@ static inline int srcu_read_lock_nmisafe(struct srcu_struct *ssp) __acquires(ssp srcu_check_nmi_safety(ssp, true); retval = __srcu_read_lock_nmisafe(ssp); - rcu_lock_acquire(&ssp->dep_map); + rcu_try_lock_acquire(&ssp->dep_map); return retval; }
It is claimed that srcu_read_lock_nmisafe() NMI-safe. However it triggers a lockdep if used from NMI because lockdep expects a deadlock since nothing disables NMIs while the lock is acquired. Use a try-lock annotation for srcu_read_lock_nmisafe() to avoid lockdep complains if used from NMI. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- The splat: | ================================ | WARNING: inconsistent lock state | 6.6.0-rc3-rt5+ #85 Not tainted | -------------------------------- | inconsistent {INITIAL USE} -> {IN-NMI} usage. | swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes: | ffffffff828e6c90 (console_srcu){....}-{0:0}, at: console_srcu_read_lock+0x3a/0x50 | {INITIAL USE} state was registered at: … | CPU0 | ---- | lock(console_srcu); | <Interrupt> | lock(console_srcu); | | *** DEADLOCK *** | My guess is that trylock annotation should not apply to rcu_lock_acquire(). This would distinguish it from from non-NMI safe srcu_read_lock_nmisafe() and NMI check in rcu_read_unlock() is only there to survive if accidentally used in-NMI. include/linux/rcupdate.h | 6 ++++++ include/linux/srcu.h | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-)