Message ID: 20230831151414.2714750-1-mjguzik@gmail.com (mailing list archive)
State: New, archived
Series: [RFC] vfs: add inode lockdep assertions
On Thu, Aug 31, 2023 at 05:14:14PM +0200, Mateusz Guzik wrote:
> Thread "Use exclusive lock for file_remove_privs" [1] reports an issue
> which should have been found by asserts -- inode not write locked by the
> caller.
>
> It did not happen because the attempt to do it in notify_change:
>
> 	WARN_ON_ONCE(!inode_is_locked(inode));
>
> passes if the inode is only read-locked:
>
> static inline int rwsem_is_locked(struct rw_semaphore *sem)
> {
> 	return atomic_long_read(&sem->count) != 0;
> }
>
> According to git blame this regressed from 2 commits:
> 1. 5955102c9984 ("wrappers for ->i_mutex access") which replaced a
>    bunch of mutex_is_locked with inode_is_locked
> 2. 9902af79c01a ("parallel lookups: actual switch to rwsem") which
>    implemented inode_is_locked as a mere check on the semaphore being
>    held in *any* manner
>
> In order to remedy this I'm proposing lockdep-ing the check with 2
> helpers: inode_assert_locked and inode_assert_write_locked
>
> Below I'm adding the helpers and converting *some* of the spots modified
> by the first patch. I boot tested it and nothing blew up on ext4, but
> btrfs should cause a complaint.
>
> I can finish the other spots originally touched by 1 and touch up the 3
> uses I grepped in fs/namei.c, but ultimately filesystem maintainers are
> going to have to patch their code at their leisure. On top of that there
> are probably quite a few places which should assert, but don't.
>
> Comments?

I think this is useful and I would be supportive of this.
On Thu, Aug 31, 2023 at 05:14:14PM +0200, Mateusz Guzik wrote:
> +++ b/include/linux/fs.h
> @@ -842,6 +842,16 @@ static inline void inode_lock_shared_nested(struct inode *inode, unsigned subcla
>  	down_read_nested(&inode->i_rwsem, subclass);
>  }
>  
> +static inline void inode_assert_locked(struct inode *inode)
> +{
> +	lockdep_assert_held(&inode->i_rwsem);
> +}
> +
> +static inline void inode_assert_write_locked(struct inode *inode)
> +{
> +	lockdep_assert_held_write(&inode->i_rwsem);
> +}

This mirrors what we have in mm, but it's only going to trigger on
builds that have lockdep enabled.  Lockdep is very expensive; it
easily doubles the time it takes to run xfstests on my laptop, so
I don't generally enable it.  So what we also have in MM is:

static inline void mmap_assert_write_locked(struct mm_struct *mm)
{
	lockdep_assert_held_write(&mm->mmap_lock);
	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
}

Now if you have lockdep enabled, you get the lockdep check which
gives you all the lovely lockdep information, but if you don't, you
at least get the cheap check that someone is holding the lock at all.

ie I would make this:

+static inline void inode_assert_write_locked(struct inode *inode)
+{
+	lockdep_assert_held_write(&inode->i_rwsem);
+	WARN_ON_ONCE(!inode_is_locked(inode));
+}

Maybe the locking people could give us a rwsem_is_write_locked()
predicate, but until then, this is the best solution we came up with.
On 9/6/23 17:20, Matthew Wilcox wrote:
> On Thu, Aug 31, 2023 at 05:14:14PM +0200, Mateusz Guzik wrote:
>> +++ b/include/linux/fs.h
>> +static inline void inode_assert_write_locked(struct inode *inode)
>> +{
>> +	lockdep_assert_held_write(&inode->i_rwsem);
>> +}
[...]
> ie I would make this:
>
> +static inline void inode_assert_write_locked(struct inode *inode)
> +{
> +	lockdep_assert_held_write(&inode->i_rwsem);
> +	WARN_ON_ONCE(!inode_is_locked(inode));
> +}
>
> Maybe the locking people could give us a rwsem_is_write_locked()
> predicate, but until then, this is the best solution we came up with.

Which is exactly what I had suggested in the other thread :)
On Wed, Sep 06, 2023 at 05:23:42PM +0200, Bernd Schubert wrote:
> On 9/6/23 17:20, Matthew Wilcox wrote:
[...]
> > Maybe the locking people could give us a rwsem_is_write_locked()
> > predicate, but until then, this is the best solution we came up with.
>
> Which is exactly what I had suggested in the other thread :)

Yes, but apparently comments in that thread don't count :eyeroll:
On Wed, Sep 06, 2023 at 05:23:42PM +0200, Bernd Schubert wrote:
> On 9/6/23 17:20, Matthew Wilcox wrote:
[...]
> > Maybe the locking people could give us a rwsem_is_write_locked()
> > predicate, but until then, this is the best solution we came up with.
>
> Which is exactly what I had suggested in the other thread :)

Or hoist the XFS mrlock, because it actually /does/ know if the rwsem is
held in shared or exclusive mode.

--D
On Wed, Sep 06, 2023 at 08:29:48AM -0700, Darrick J. Wong wrote:
> Or hoist the XFS mrlock, because it actually /does/ know if the rwsem is
> held in shared or exclusive mode.

... or to put it another way, if we had rwsem_is_write_locked(),
we could get rid of mrlock?

diff --git a/fs/xfs/mrlock.h b/fs/xfs/mrlock.h
index 79155eec341b..5530f03aaed1 100644
--- a/fs/xfs/mrlock.h
+++ b/fs/xfs/mrlock.h
@@ -10,18 +10,10 @@
 
 typedef struct {
 	struct rw_semaphore	mr_lock;
-#if defined(DEBUG) || defined(XFS_WARN)
-	int			mr_writer;
-#endif
 } mrlock_t;
 
-#if defined(DEBUG) || defined(XFS_WARN)
-#define mrinit(mrp, name)	\
-	do { (mrp)->mr_writer = 0; init_rwsem(&(mrp)->mr_lock); } while (0)
-#else
 #define mrinit(mrp, name)	\
 	do { init_rwsem(&(mrp)->mr_lock); } while (0)
-#endif
 
 #define mrlock_init(mrp, t,n,s)	mrinit(mrp, n)
 #define mrfree(mrp)		do { } while (0)
@@ -34,9 +26,6 @@ static inline void mraccess_nested(mrlock_t *mrp, int subclass)
 static inline void mrupdate_nested(mrlock_t *mrp, int subclass)
 {
 	down_write_nested(&mrp->mr_lock, subclass);
-#if defined(DEBUG) || defined(XFS_WARN)
-	mrp->mr_writer = 1;
-#endif
 }
 
 static inline int mrtryaccess(mrlock_t *mrp)
@@ -48,17 +37,11 @@ static inline int mrtryupdate(mrlock_t *mrp)
 {
 	if (!down_write_trylock(&mrp->mr_lock))
 		return 0;
-#if defined(DEBUG) || defined(XFS_WARN)
-	mrp->mr_writer = 1;
-#endif
 	return 1;
 }
 
 static inline void mrunlock_excl(mrlock_t *mrp)
 {
-#if defined(DEBUG) || defined(XFS_WARN)
-	mrp->mr_writer = 0;
-#endif
 	up_write(&mrp->mr_lock);
 }
 
@@ -69,9 +52,6 @@ static inline void mrunlock_shared(mrlock_t *mrp)
 
 static inline void mrdemote(mrlock_t *mrp)
 {
-#if defined(DEBUG) || defined(XFS_WARN)
-	mrp->mr_writer = 0;
-#endif
 	downgrade_write(&mrp->mr_lock);
 }
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 9e62cc500140..b99c3bd78c5e 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -361,7 +361,7 @@ xfs_isilocked(
 {
 	if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
 		if (!(lock_flags & XFS_ILOCK_SHARED))
-			return !!ip->i_lock.mr_writer;
+			return rwsem_is_write_locked(&ip->i_lock.mr_lock);
 		return rwsem_is_locked(&ip->i_lock.mr_lock);
 	}
 
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index e05e167dbd16..277b8c96bbf9 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -69,7 +69,7 @@ static inline void mmap_assert_locked(struct mm_struct *mm)
 static inline void mmap_assert_write_locked(struct mm_struct *mm)
 {
 	lockdep_assert_held_write(&mm->mmap_lock);
-	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+	VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm);
 }
 
 #ifdef CONFIG_PER_VMA_LOCK
diff --git a/include/linux/rwbase_rt.h b/include/linux/rwbase_rt.h
index 1d264dd08625..3c25b14edc05 100644
--- a/include/linux/rwbase_rt.h
+++ b/include/linux/rwbase_rt.h
@@ -31,6 +31,11 @@ static __always_inline bool rw_base_is_locked(struct rwbase_rt *rwb)
 	return atomic_read(&rwb->readers) != READER_BIAS;
 }
 
+static __always_inline bool rw_base_is_write_locked(struct rwbase_rt *rwb)
+{
+	return atomic_read(&rwb->readers) == WRITER_BIAS;
+}
+
 static __always_inline bool rw_base_is_contended(struct rwbase_rt *rwb)
 {
 	return atomic_read(&rwb->readers) > 0;
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 1dd530ce8b45..241a12c6019e 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -72,6 +72,11 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
 	return atomic_long_read(&sem->count) != 0;
 }
 
+static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
+{
+	return atomic_long_read(&sem->count) & 1;
+}
+
 #define RWSEM_UNLOCKED_VALUE		0L
 #define __RWSEM_COUNT_INIT(name)	.count = ATOMIC_LONG_INIT(RWSEM_UNLOCKED_VALUE)
 
@@ -157,6 +162,11 @@ static __always_inline int rwsem_is_locked(struct rw_semaphore *sem)
 	return rw_base_is_locked(&sem->rwbase);
 }
 
+static __always_inline int rwsem_is_write_locked(struct rw_semaphore *sem)
+{
+	return rw_base_is_write_locked(&sem->rwbase);
+}
+
 static __always_inline int rwsem_is_contended(struct rw_semaphore *sem)
 {
 	return rw_base_is_contended(&sem->rwbase);
On Wed, Sep 06, 2023 at 05:00:14PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 06, 2023 at 08:29:48AM -0700, Darrick J. Wong wrote:
> > Or hoist the XFS mrlock, because it actually /does/ know if the rwsem is
> > held in shared or exclusive mode.
>
> ... or to put it another way, if we had rwsem_is_write_locked(),
> we could get rid of mrlock?
>
[...]
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 9e62cc500140..b99c3bd78c5e 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -361,7 +361,7 @@ xfs_isilocked(
>  {
>  	if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
>  		if (!(lock_flags & XFS_ILOCK_SHARED))
> -			return !!ip->i_lock.mr_writer;
> +			return rwsem_is_write_locked(&ip->i_lock.mr_lock);

You'd be better off converting this to:

	return __xfs_rwsem_islocked(&ip->i_lock.mr_lock,
			(lock_flags & XFS_ILOCK_SHARED));

And then fixing __xfs_rwsem_islocked to do:

static inline bool
__xfs_rwsem_islocked(
	struct rw_semaphore	*rwsem,
	bool			shared)
{
	if (!debug_locks) {
		if (!shared)
			return rwsem_is_write_locked(rwsem);
		return rwsem_is_locked(rwsem);
	}

	...
}

(and then getting rid of mrlock_t.h entirely)

>  		return rwsem_is_locked(&ip->i_lock.mr_lock);
>  	}
>
[...]
> --- a/include/linux/rwsem.h
> +++ b/include/linux/rwsem.h
> @@ -72,6 +72,11 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
>  	return atomic_long_read(&sem->count) != 0;
>  }
>
> +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> +{
> +	return atomic_long_read(&sem->count) & 1;

atomic_long_read(&sem->count) & RWSEM_WRITER_LOCKED ?

In one of the past "replace mrlock_t" threads I complained about
hardcoding this value instead of using the symbol.  I saw it go by,
neglected to copy the url, and ten minutes later I can't find it. :(

--D

> +}
> +
[...]
On Wed, Sep 06, 2023 at 10:07:24AM -0700, Darrick J. Wong wrote:
> On Wed, Sep 06, 2023 at 05:00:14PM +0100, Matthew Wilcox wrote:
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -361,7 +361,7 @@ xfs_isilocked(
> >  {
> >  	if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
> >  		if (!(lock_flags & XFS_ILOCK_SHARED))
> > -			return !!ip->i_lock.mr_writer;
> > +			return rwsem_is_write_locked(&ip->i_lock.mr_lock);
>
> You'd be better off converting this to:
>
> 	return __xfs_rwsem_islocked(&ip->i_lock.mr_lock,
> 			(lock_flags & XFS_ILOCK_SHARED));
>
> And then fixing __xfs_rwsem_islocked to do:
>
> static inline bool
> __xfs_rwsem_islocked(
> 	struct rw_semaphore	*rwsem,
> 	bool			shared)
> {
> 	if (!debug_locks) {
> 		if (!shared)
> 			return rwsem_is_write_locked(rwsem);
> 		return rwsem_is_locked(rwsem);
> 	}
>
> 	...
> }

Thanks.

> > +++ b/include/linux/rwsem.h
> > @@ -72,6 +72,11 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
> >  	return atomic_long_read(&sem->count) != 0;
> >  }
> >
> > +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> > +{
> > +	return atomic_long_read(&sem->count) & 1;
>
> atomic_long_read(&sem->count) & RWSEM_WRITER_LOCKED ?

Then this would either have to be in rwsem.c or we'd have to move the
definition of RWSEM_WRITER_LOCKED to rwsem.h.  All three options are
kind of bad.  I think I hate the bare '1' least.
On Wed, Sep 06, 2023 at 07:33:48PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 06, 2023 at 10:07:24AM -0700, Darrick J. Wong wrote:
[...]
> > > +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> > > +{
> > > +	return atomic_long_read(&sem->count) & 1;
> >
> > atomic_long_read(&sem->count) & RWSEM_WRITER_LOCKED ?
>
> Then this would either have to be in rwsem.c or we'd have to move the
> definition of RWSEM_WRITER_LOCKED to rwsem.h.  All three options are
> kind of bad.  I think I hate the bare '1' least.

I disagree, because using the bare 1 brings the most risk that someone
will subtly break the locking assertions some day when they get the
bright idea to move RWSEM_WRITER_LOCKED to the upper bit and fail to
notice this predicate and its magic number.  IMO moving it to the header
file (possibly with the usual __ prefix) would be preferable to leaving
a landmine.

--D
On Wed, Sep 06, 2023 at 11:43:36AM -0700, Darrick J. Wong wrote:
> On Wed, Sep 06, 2023 at 07:33:48PM +0100, Matthew Wilcox wrote:
[...]
> > Then this would either have to be in rwsem.c or we'd have to move the
> > definition of RWSEM_WRITER_LOCKED to rwsem.h.  All three options are
> > kind of bad.  I think I hate the bare '1' least.
>
> I disagree, because using the bare 1 brings the most risk that someone
> will subtly break the locking assertions some day when they get the
> bright idea to move RWSEM_WRITER_LOCKED to the upper bit and fail to
> notice this predicate and its magic number.  IMO moving it to the header
> file (possibly with the usual __ prefix) would be preferable to leaving
> a landmine.

+	return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */;

works for you?
On Wed, Sep 06, 2023 at 08:01:56PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 06, 2023 at 11:43:36AM -0700, Darrick J. Wong wrote:
[...]
> > I disagree, because using the bare 1 brings the most risk that someone
> > will subtly break the locking assertions some day when they get the
> > bright idea to move RWSEM_WRITER_LOCKED to the upper bit and fail to
> > notice this predicate and its magic number.  IMO moving it to the header
> > file (possibly with the usual __ prefix) would be preferable to leaving
> > a landmine.
>
> +	return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */;
>
> works for you?

Yeah I guess that works.

--D
On Wed, Sep 06, 2023 at 10:07:24AM -0700, Darrick J. Wong wrote:
> You'd be better off converting this to:
>
> 	return __xfs_rwsem_islocked(&ip->i_lock.mr_lock,
> 			(lock_flags & XFS_ILOCK_SHARED));
>
> And then fixing __xfs_rwsem_islocked to do:
>
> static inline bool
> __xfs_rwsem_islocked(
> 	struct rw_semaphore	*rwsem,
> 	bool			shared)
> {
> 	if (!debug_locks) {
> 		if (!shared)
> 			return rwsem_is_write_locked(rwsem);
> 		return rwsem_is_locked(rwsem);
> 	}
>
> 	...
> }

so ... I did that.  And then many isilocked() calls started failing.
generic/017 was the first one:

00030 XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_trans_inode.c, line: 93
00030 ------------[ cut here ]------------
00030 WARNING: CPU: 5 PID: 77 at fs/xfs/xfs_message.c:104 assfail+0x2c/0x40
00030 Modules linked in:
00030 CPU: 5 PID: 77 Comm: kworker/5:1 Kdump: loaded Not tainted 6.5.0-00006-g88a61c17df8f-dirty #14
00030 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
00030 Workqueue: xfsalloc xfs_btree_split_worker
00030 RIP: 0010:assfail+0x2c/0x40
00030 Code: 89 d0 41 89 c9 48 c7 c2 80 70 0f 82 48 89 f1 48 89 e5 48 89 fe 48 c7 c7 08 4f 07 82 e8 fd fd ff ff 80 3d 26 cc ed 00 00 75 04 <0f> 0b 5d c3 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48
00030 RSP: 0018:ffff888003f0bc28 EFLAGS: 00010246
00030 RAX: 00000000ffffffea RBX: ffff88800d9a0000 RCX: 000000007fffffff
00030 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff82074f08
00030 RBP: ffff888003f0bc28 R08: 0000000000000000 R09: 000000000000000a
00030 R10: 000000000000000a R11: 0fffffffffffffff R12: ffff888009ff6000
00030 R13: ffff88800ac8a000 R14: 0000000000000001 R15: ffff88800af0a000
00030 FS:  0000000000000000(0000) GS:ffff88807d940000(0000) knlGS:0000000000000000
00030 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
00030 CR2: 00007ff177b05d58 CR3: 0000000007aa2002 CR4: 0000000000770ea0
00030 PKRU: 55555554
00030 Call Trace:
00030  <TASK>
00030  ? show_regs+0x5c/0x70
00030  ? __warn+0x84/0x180
00030  ? assfail+0x2c/0x40
00030  ? report_bug+0x18e/0x1c0
00030  ? handle_bug+0x3e/0x70
00030  ? exc_invalid_op+0x18/0x70
00030  ? asm_exc_invalid_op+0x1b/0x20
00030  ? assfail+0x2c/0x40
00030  ? assfail+0x23/0x40
00030  xfs_trans_log_inode+0xf9/0x120
00030  xfs_bmbt_alloc_block+0xf0/0x1c0
00030  __xfs_btree_split+0xf8/0x6c0
00030  ? __this_cpu_preempt_check+0x13/0x20
00030  ? lock_acquire+0xc8/0x280
00030  xfs_btree_split_worker+0x8a/0x110
00030  process_one_work+0x23f/0x510
00030  worker_thread+0x4d/0x3b0
00030  ? process_one_work+0x510/0x510
00030  kthread+0x106/0x140
00030  ? kthread_complete_and_exit+0x20/0x20
00030  ret_from_fork+0x31/0x50
00030  ? kthread_complete_and_exit+0x20/0x20
00030  ret_from_fork_asm+0x11/0x20
00030  </TASK>
00030 irq event stamp: 2901
00030 hardirqs last  enabled at (2909): [<ffffffff810e4d83>] __up_console_sem+0x53/0x60
00030 hardirqs last disabled at (2916): [<ffffffff810e4d68>] __up_console_sem+0x38/0x60
00030 softirqs last  enabled at (0): [<ffffffff81067bd0>] copy_process+0x830/0x1c10
00030 softirqs last disabled at (0): [<0000000000000000>] 0x0
00030 ---[ end trace 0000000000000000 ]---

but here's the thing, I have lockdep enabled.  So it's not testing my
new rwsem_is_write_locked() code, it's testing the current lockdep
stuff.

So I have a feeling that we're not telling lockdep that we've acquired
the i_lock.  Or something?  Seems unlikely that this is a real bug;
surely we'd've noticed before now.

Code here:
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/mrlock

ie git clone git://git.infradead.org/users/willy/pagecache.git mrlock

You don't need the top commit.  754fb6a68dae is enough to provoke it,
as long as you have CONFIG_LOCKDEP enabled.
On Wed, Sep 06, 2023 at 10:10:25PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 06, 2023 at 10:07:24AM -0700, Darrick J. Wong wrote:
> > You'd be better off converting this to:
> >
> > 	return __xfs_rwsem_islocked(&ip->i_lock.mr_lock,
> > 			(lock_flags & XFS_ILOCK_SHARED));
[...]
> so ... I did that.  And then many isilocked() calls started failing.
> generic/017 was the first one:
>
> 00030 XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_trans_inode.c, line: 93
> 00030 WARNING: CPU: 5 PID: 77 at fs/xfs/xfs_message.c:104 assfail+0x2c/0x40
> 00030 Workqueue: xfsalloc xfs_btree_split_worker
[...]
> but here's the thing, I have lockdep enabled.  So it's not testing my
> new rwsem_is_write_locked() code, it's testing the current lockdep
> stuff.
>
> So I have a feeling that we're not telling lockdep that we've acquired
> the i_lock.  Or something?  Seems unlikely that this is a real bug;
> surely we'd've noticed before now.
>
> Code here:
> https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/mrlock
>
> ie git clone git://git.infradead.org/users/willy/pagecache.git mrlock
>
> You don't need the top commit.  754fb6a68dae is enough to provoke it,
> as long as you have CONFIG_LOCKDEP enabled.

Yeah.  The lockdep assertions in __xfs_rwsem_islocked are broken because
the fsstress thread takes the ILOCK_EXCL, decides to defer a bmap btree
split to a workqueue to avoid overflowing the kernel stack, and doesn't
tell lockdep that the workqueue is (effectively) the rwsem owner until
it completes.

The last time we tried to get rid of mrlock_t, dchinner suggested using
lockdep_assert_held_non_owner to fix the assertions, but (a) that
doesn't seem to exist in TOT 6.5 and (b) the inode rwsems protect the
inode data structure, not threads.  Hence you could just get rid of the
lockdep assertions and use the rwsem_* versions unconditionally.

--D
On 9/6/23, Matthew Wilcox <willy@infradead.org> wrote:
> On Wed, Sep 06, 2023 at 05:23:42PM +0200, Bernd Schubert wrote:
>>
>>
>> On 9/6/23 17:20, Matthew Wilcox wrote:
>> > On Thu, Aug 31, 2023 at 05:14:14PM +0200, Mateusz Guzik wrote:
>> > > +++ b/include/linux/fs.h
>> > > @@ -842,6 +842,16 @@ static inline void inode_lock_shared_nested(struct inode *inode, unsigned subcla
>> > >  	down_read_nested(&inode->i_rwsem, subclass);
>> > >  }
>> > > +static inline void inode_assert_locked(struct inode *inode)
>> > > +{
>> > > +	lockdep_assert_held(&inode->i_rwsem);
>> > > +}
>> > > +
>> > > +static inline void inode_assert_write_locked(struct inode *inode)
>> > > +{
>> > > +	lockdep_assert_held_write(&inode->i_rwsem);
>> > > +}
>> >
>> > This mirrors what we have in mm, but it's only going to trigger on
>> > builds that have lockdep enabled.  Lockdep is very expensive; it
>> > easily doubles the time it takes to run xfstests on my laptop, so
>> > I don't generally enable it.  So what we also have in MM is:
>> >
>> > static inline void mmap_assert_write_locked(struct mm_struct *mm)
>> > {
>> > 	lockdep_assert_held_write(&mm->mmap_lock);
>> > 	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
>> > }
>> >
>> > Now if you have lockdep enabled, you get the lockdep check which
>> > gives you all the lovely lockdep information, but if you don't, you
>> > at least get the cheap check that someone is holding the lock at all.
>> >
>> > ie I would make this:
>> >
>> > +static inline void inode_assert_write_locked(struct inode *inode)
>> > +{
>> > +	lockdep_assert_held_write(&inode->i_rwsem);
>> > +	WARN_ON_ONCE(!inode_is_locked(inode));
>> > +}
>> >
>> > Maybe the locking people could give us a rwsem_is_write_locked()
>> > predicate, but until then, this is the best solution we came up with.
>>
>>
>> Which is exactly what I had suggested in the other thread :)
>
> Yes, but apparently comments in that thread don't count :eyeroll:
>

Pretty weird reaction, mate; they very much *do* count, which is why I'm
confused why you re-sent an e-mail with the bogus is_locked check (which
I explicitly pointed out).

Since you posted a separate patch to add a write-locking check to rwsem,
I'm going to wait for that bit to get sorted out (unless it stalls for a
long time).

fwiw I'm confused what's up with people making kernel changes without
running lockdep. If it adds too much overhead for use in normal
development, someone(tm) should have fixed that aspect a long time ago.
diff --git a/fs/attr.c b/fs/attr.c
index a8ae5f6d9b16..90dec999a952 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry,
 	struct timespec64 now;
 	unsigned int ia_valid = attr->ia_valid;
 
-	WARN_ON_ONCE(!inode_is_locked(inode));
+	inode_assert_write_locked(inode);
 
 	error = may_setattr(idmap, inode, ia_valid);
 	if (error)
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index 96828a13dd43..46b268a433dd 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -120,7 +120,7 @@ int btrfs_setxattr(struct btrfs_trans_handle *trans, struct inode *inode,
 	 * locks the inode's i_mutex before calling setxattr or removexattr.
 	 */
 	if (flags & XATTR_REPLACE) {
-		ASSERT(inode_is_locked(inode));
+		inode_assert_write_locked(inode);
 		di = btrfs_lookup_xattr(NULL, root, path, btrfs_ino(BTRFS_I(inode)),
 					name, name_len, 0);
 		if (!di)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 481491e892df..df428f22f624 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3364,8 +3364,8 @@ do {								\
 /* Update i_disksize. Requires i_rwsem to avoid races with truncate */
 static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
 {
-	WARN_ON_ONCE(S_ISREG(inode->i_mode) &&
-		     !inode_is_locked(inode));
+	if (S_ISREG(inode->i_mode))
+		inode_assert_write_locked(inode);
 	down_write(&EXT4_I(inode)->i_data_sem);
 	if (newsize > EXT4_I(inode)->i_disksize)
 		WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 202c76996b62..149783ecfe16 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -5588,8 +5588,8 @@ ext4_swap_extents(handle_t *handle, struct inode *inode1,
 	BUG_ON(!rwsem_is_locked(&EXT4_I(inode1)->i_data_sem));
 	BUG_ON(!rwsem_is_locked(&EXT4_I(inode2)->i_data_sem));
 
-	BUG_ON(!inode_is_locked(inode1));
-	BUG_ON(!inode_is_locked(inode2));
+	inode_assert_write_locked(inode1);
+	inode_assert_write_locked(inode2);
 
 	ext4_es_remove_extent(inode1, lblk1, count);
 	ext4_es_remove_extent(inode2, lblk2, count);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 89737d5a1614..2ecdef6ddc88 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3797,7 +3797,7 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
 	loff_t size = i_size_read(inode);
 
-	WARN_ON(!inode_is_locked(inode));
+	inode_assert_write_locked(inode);
 	if (offset > size || offset + len < size)
 		return 0;
 
@@ -4068,7 +4068,7 @@ int ext4_truncate(struct inode *inode)
 	 * have i_rwsem locked because it's not necessary.
 	 */
 	if (!(inode->i_state & (I_NEW|I_FREEING)))
-		WARN_ON(!inode_is_locked(inode));
+		inode_assert_write_locked(inode);
 
 	trace_ext4_truncate_enter(inode);
 	if (!ext4_can_truncate(inode))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c8ff4156a0a1..93d48b6b9f67 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -842,6 +842,16 @@ static inline void inode_lock_shared_nested(struct inode *inode, unsigned subcla
 	down_read_nested(&inode->i_rwsem, subclass);
 }
 
+static inline void inode_assert_locked(struct inode *inode)
+{
+	lockdep_assert_held(&inode->i_rwsem);
+}
+
+static inline void inode_assert_write_locked(struct inode *inode)
+{
+	lockdep_assert_held_write(&inode->i_rwsem);
+}
+
 static inline void filemap_invalidate_lock(struct address_space *mapping)
 {
 	down_write(&mapping->invalidate_lock);