Message ID: 20230907174705.2976191-2-willy@infradead.org (mailing list archive)
State: New
Series: Remove the XFS mrlock
On 9/7/23 13:47, Matthew Wilcox (Oracle) wrote:
> Several places want to know whether the lock is held by a writer, instead
> of just whether it's held.  We can implement this for both normal and
> rt rwsems.  RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
> it outside that file might tempt other people to use it, so just use
> a comment to note that's what the 1 means, and help anybody find it if
> they're looking to change the implementation.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/rwbase_rt.h |  5 +++++
>  include/linux/rwsem.h     | 10 ++++++++++
>  2 files changed, 15 insertions(+)
>
> diff --git a/include/linux/rwbase_rt.h b/include/linux/rwbase_rt.h
> index 1d264dd08625..3c25b14edc05 100644
> --- a/include/linux/rwbase_rt.h
> +++ b/include/linux/rwbase_rt.h
> @@ -31,6 +31,11 @@ static __always_inline bool rw_base_is_locked(struct rwbase_rt *rwb)
>  	return atomic_read(&rwb->readers) != READER_BIAS;
>  }
>
> +static __always_inline bool rw_base_is_write_locked(struct rwbase_rt *rwb)
> +{
> +	return atomic_read(&rwb->readers) == WRITER_BIAS;
> +}
> +
>  static __always_inline bool rw_base_is_contended(struct rwbase_rt *rwb)
>  {
>  	return atomic_read(&rwb->readers) > 0;
> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
> index 1dd530ce8b45..0f78b8d2e653 100644
> --- a/include/linux/rwsem.h
> +++ b/include/linux/rwsem.h
> @@ -72,6 +72,11 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
>  	return atomic_long_read(&sem->count) != 0;
>  }
>
> +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> +{
> +	return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */;
> +}

I would prefer you move the various RWSEM_* count bit macros from
kernel/locking/rwsem.c to under the !PREEMPT_RT block and directly use
RWSEM_WRITER_LOCKED instead of hardcoding a value of 1.

Cheers,
Longman
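[Archive note: for readers following the patch, the non-PREEMPT_RT check hinges on the low flag bits of sem->count, where bit 0 means "write locked" and reader acquisitions add a large bias. A user-space sketch of the idea, with a plain integer standing in for the kernel's atomic count word (constants mirrored from kernel/locking/rwsem.c):]

```c
#include <assert.h>

/* Flag bits of the rwsem count word (non-PREEMPT_RT), as in rwsem.c. */
#define RWSEM_WRITER_LOCKED	(1UL << 0)
#define RWSEM_FLAG_WAITERS	(1UL << 1)
#define RWSEM_READER_SHIFT	8
#define RWSEM_READER_BIAS	(1UL << RWSEM_READER_SHIFT)

/* Stand-ins for rwsem_is_locked()/rwsem_is_write_locked(); the real
 * helpers read sem->count with atomic_long_read(). */
static int toy_is_locked(unsigned long count)
{
	return count != 0;
}

static int toy_is_write_locked(unsigned long count)
{
	return (count & RWSEM_WRITER_LOCKED) != 0;
}
```

Readers don't touch bit 0, so a count of, say, three reader biases is "locked" but not "write locked", which is exactly the distinction the patch exposes.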
On Thu, Sep 07, 2023 at 06:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> Several places want to know whether the lock is held by a writer, instead
> of just whether it's held.  We can implement this for both normal and
> rt rwsems.  RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
> it outside that file might tempt other people to use it, so just use
> a comment to note that's what the 1 means, and help anybody find it if
> they're looking to change the implementation.

I'm presuming this is deep in a callchain where they know they hold the
lock, but they lost in what capacity?

In general I strongly dislike the whole _is_locked family, because it
gives very poorly defined semantics if used by anybody but the owner.

If these new functions are indeed to be used only by lock holders to
determine what kind of lock they hold, could we please put:

	lockdep_assert_held()

in them?
On Thu, Sep 07, 2023 at 09:08:10PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 07, 2023 at 06:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > Several places want to know whether the lock is held by a writer, instead
> > of just whether it's held.  We can implement this for both normal and
> > rt rwsems.  RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
> > it outside that file might tempt other people to use it, so just use
> > a comment to note that's what the 1 means, and help anybody find it if
> > they're looking to change the implementation.
>
> I'm presuming this is deep in a callchain where they know they hold the
> lock, but they lost in what capacity?

No, it's just assertions.  You can see that in patch 3 where it's
used in functions called things like "xfs_islocked".

> In general I strongly dislike the whole _is_locked family, because it
> gives very poorly defined semantics if used by anybody but the owner.
>
> If these new functions are indeed to be used only by lock holders to
> determine what kind of lock they hold, could we please put:
>
> 	lockdep_assert_held()
>
> in them?

Patch 2 shows it in use in the MM code.  We already have a
lockdep_assert_held_write(), but most people don't enable lockdep, so
we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm)
to give us a good assertion when lockdep is disabled.

XFS has a problem with using lockdep in general, which is that a worker
thread can be spawned and use the fact that the spawner is holding the
lock.  There's no mechanism for the worker thread to ask "Does struct
task_struct *p hold the lock?".
On Thu, Sep 07, 2023 at 02:05:54PM -0400, Waiman Long wrote:
> On 9/7/23 13:47, Matthew Wilcox (Oracle) wrote:
> > +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> > +{
> > +	return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */;
> > +}
>
> I would prefer you move the various RWSEM_* count bit macros from
> kernel/locking/rwsem.c to under the !PREEMPT_RT block and directly use
> RWSEM_WRITER_LOCKED instead of hardcoding a value of 1.

Just to be clear, you want the ~50 lines from:

/*
 * On 64-bit architectures, the bit definitions of the count are:
...
#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
				 RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READFAIL)

moved from rwsem.c to rwsem.h?

Or just these four lines:

#define RWSEM_WRITER_LOCKED	(1UL << 0)
#define RWSEM_FLAG_WAITERS	(1UL << 1)
#define RWSEM_FLAG_HANDOFF	(1UL << 2)
#define RWSEM_FLAG_READFAIL	(1UL << (BITS_PER_LONG - 1))
On Thu, Sep 07, 2023 at 08:20:30PM +0100, Matthew Wilcox wrote:
> On Thu, Sep 07, 2023 at 09:08:10PM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 07, 2023 at 06:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > > Several places want to know whether the lock is held by a writer, instead
> > > of just whether it's held.  We can implement this for both normal and
> > > rt rwsems.  RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
> > > it outside that file might tempt other people to use it, so just use
> > > a comment to note that's what the 1 means, and help anybody find it if
> > > they're looking to change the implementation.
> >
> > I'm presuming this is deep in a callchain where they know they hold the
> > lock, but they lost in what capacity?
>
> No, it's just assertions.  You can see that in patch 3 where it's
> used in functions called things like "xfs_islocked".

Right, but if you're not the lock owner, your answer to the question is
a dice-roll, it might be locked, it might not be.

> > In general I strongly dislike the whole _is_locked family, because it
> > gives very poorly defined semantics if used by anybody but the owner.
> >
> > If these new functions are indeed to be used only by lock holders to
> > determine what kind of lock they hold, could we please put:
> >
> > 	lockdep_assert_held()
> >
> > in them?
>
> Patch 2 shows it in use in the MM code.  We already have a
> lockdep_assert_held_write(), but most people don't enable lockdep, so

Most devs should run with lockdep on when writing new code, and I know
the sanitizer robots run with lockdep on.

In general there seems to be a ton of lockdep on coverage.

> we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm)
> to give us a good assertion when lockdep is disabled.

Is that really worth it still? I mean, much of these assertions pre-date
lockdep.

> XFS has a problem with using lockdep in general, which is that a worker
> thread can be spawned and use the fact that the spawner is holding the
> lock.  There's no mechanism for the worker thread to ask "Does struct
> task_struct *p hold the lock?".

Will be somewhat tricky to make happen -- but might be doable. It is
however an interface that is *very* hard to use correctly. Basically I
think you want to also assert that your target task 'p' is blocked,
right?

That is: assert @p is blocked and holds @lock.
On 9/7/23 15:33, Matthew Wilcox wrote:
> On Thu, Sep 07, 2023 at 02:05:54PM -0400, Waiman Long wrote:
>> On 9/7/23 13:47, Matthew Wilcox (Oracle) wrote:
>>> +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
>>> +{
>>> +	return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */;
>>> +}
>> I would prefer you move the various RWSEM_* count bit macros from
>> kernel/locking/rwsem.c to under the !PREEMPT_RT block and directly use
>> RWSEM_WRITER_LOCKED instead of hardcoding a value of 1.
> Just to be clear, you want the ~50 lines from:
>
> /*
>  * On 64-bit architectures, the bit definitions of the count are:
> ...
> #define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
> 				 RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READFAIL)
>
> moved from rwsem.c to rwsem.h?
>
> Or just these four lines:
>
> #define RWSEM_WRITER_LOCKED	(1UL << 0)
> #define RWSEM_FLAG_WAITERS	(1UL << 1)
> #define RWSEM_FLAG_HANDOFF	(1UL << 2)
> #define RWSEM_FLAG_READFAIL	(1UL << (BITS_PER_LONG - 1))

I think just the first 3 lines will be enough. Maybe a bit of comment
about these bit flags in the count atomic_long value.

Cheers,
Longman
On Thu, Sep 07, 2023 at 09:38:38PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 07, 2023 at 08:20:30PM +0100, Matthew Wilcox wrote:
> > On Thu, Sep 07, 2023 at 09:08:10PM +0200, Peter Zijlstra wrote:
> > > On Thu, Sep 07, 2023 at 06:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > > > Several places want to know whether the lock is held by a writer, instead
> > > > of just whether it's held.  We can implement this for both normal and
> > > > rt rwsems.  RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
> > > > it outside that file might tempt other people to use it, so just use
> > > > a comment to note that's what the 1 means, and help anybody find it if
> > > > they're looking to change the implementation.
> > >
> > > I'm presuming this is deep in a callchain where they know they hold the
> > > lock, but they lost in what capacity?
> >
> > No, it's just assertions.  You can see that in patch 3 where it's
> > used in functions called things like "xfs_islocked".
>
> Right, but if you're not the lock owner, your answer to the question is
> a dice-roll, it might be locked, it might not be.

Except that the person writing the code knows the call chain that
leads up to that code, and so they have a pretty good idea whether
the object should be locked or not. If we are running that code, and
the object is locked, then it's pretty much guaranteed that the
owner of the lock is code that executed the check, because otherwise
we have a *major lock implementation bug*.

i.e. if we get to a place where rwsem_is_write_locked() fires
because some other task holds the lock, it almost always means we
have *two* tasks holding the lock exclusively. Yes, it's these
non-lockdep checks in XFS that have found rwsem implementation bugs
in the past.

We've had them fire when the lock was write locked when we know a
few lines earlier it was taken as a read lock, or marked write
locked when they should have been unlocked, etc because the rwsem
code failed to enforce rw exclusion semantics correctly.

So, really, these lock checks should be considered in the context of
the code that is running them and what such a "false detection"
would actually mean. In my experience, a false detection like you
talk about above means "rwsems are broken", not that there is a
problem with the code using the rwsems or the rwsem state check.

> > > In general I strongly dislike the whole _is_locked family, because it
> > > gives very poorly defined semantics if used by anybody but the owner.
> > >
> > > If these new functions are indeed to be used only by lock holders to
> > > determine what kind of lock they hold, could we please put:
> > >
> > > 	lockdep_assert_held()
> > >
> > > in them?
> >
> > Patch 2 shows it in use in the MM code.  We already have a
> > lockdep_assert_held_write(), but most people don't enable lockdep, so
>
> Most devs should run with lockdep on when writing new code, and I know
> the sanitizer robots run with lockdep on.
>
> In general there seems to be a ton of lockdep on coverage.

*cough*

Bit locks, semaphores, and all sorts of other constructs for IO
serialisation (like inode_dio_wait()) have no lockdep coverage at
all. IOWs, large chunks of many filesystems, the VFS and the VM have
little to no lockdep coverage at all.

> > we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm)
> > to give us a good assertion when lockdep is disabled.
>
> Is that really worth it still? I mean, much of these assertions pre-date
> lockdep.

And we're trying to propagate them because lockdep isn't a viable
option for day to day testing of filesystems because of its
overhead vs how infrequently it finds new problems.

> > XFS has a problem with using lockdep in general, which is that a worker
> > thread can be spawned and use the fact that the spawner is holding the
> > lock.  There's no mechanism for the worker thread to ask "Does struct
> > task_struct *p hold the lock?".
>
> Will be somewhat tricky to make happen -- but might be doable. It is
> however an interface that is *very* hard to use correctly. Basically I
> think you want to also assert that your target task 'p' is blocked,
> right?
>
> That is: assert @p is blocked and holds @lock.

That addresses the immediate symptom; it doesn't address the larger
problem with lockdep and needing non-owner rwsem semantics.

i.e. synchronous task based locking models don't work for
asynchronous multi-stage pipeline processing engines like XFS. The
lock protects the data object and follows the data object through
the processing pipeline, whilst the original submitter moves on to
the next operation to process without blocking.

This is the non-blocking, async processing model that io_uring
development is pushing filesystems towards, so assuming that we only
hand a lock to a single worker task and then wait for it to complete
(i.e. synchronous operation) flies in the face of current
development directions...

-Dave.
On 9/7/23 17:06, Waiman Long wrote:
>
> On 9/7/23 15:33, Matthew Wilcox wrote:
>> On Thu, Sep 07, 2023 at 02:05:54PM -0400, Waiman Long wrote:
>>> On 9/7/23 13:47, Matthew Wilcox (Oracle) wrote:
>>>> +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
>>>> +{
>>>> +	return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */;
>>>> +}
>>> I would prefer you move the various RWSEM_* count bit macros from
>>> kernel/locking/rwsem.c to under the !PREEMPT_RT block and directly use
>>> RWSEM_WRITER_LOCKED instead of hardcoding a value of 1.
>> Just to be clear, you want the ~50 lines from:
>>
>> /*
>>  * On 64-bit architectures, the bit definitions of the count are:
>> ...
>> #define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
>> 				 RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READFAIL)
>>
>> moved from rwsem.c to rwsem.h?
>>
>> Or just these four lines:
>>
>> #define RWSEM_WRITER_LOCKED	(1UL << 0)
>> #define RWSEM_FLAG_WAITERS	(1UL << 1)
>> #define RWSEM_FLAG_HANDOFF	(1UL << 2)
>> #define RWSEM_FLAG_READFAIL	(1UL << (BITS_PER_LONG - 1))
>
> I think just the first 3 lines will be enough. Maybe a bit of comment
> about these bit flags in the count atomic_long value.

Actually, the old rwsem implementation won't allow you to reliably
determine if a rwsem is write locked, because the xadd instruction is
used for write locking and the code had to back out the WRITER_BIAS if
the attempt failed. Maybe that is why XFS has its own code to check if
a rwsem is write locked, which was needed with the old rwsem
implementation. The new implementation makes this check reliable.
Still, it is not easy to check if a rwsem is read locked, as the check
will be rather complicated and probably racy.

Cheers,
Longman
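[Archive note: Waiman's point about the old xadd-based implementation can be illustrated with a toy additive scheme (invented constants, not the historical ones). A failed write-lock attempt momentarily leaves the writer bias in the count before backing it out, so a naive "is it write locked?" probe can see a false positive in that window:]

```c
#include <assert.h>

#define TOY_WRITER_BIAS	0x10000L	/* invented value, illustration only */

static long toy_count;	/* stand-in for the old sem->count */

/* Old-style write trylock: xadd the bias first, inspect the old value,
 * and back the bias out if the lock turned out to be held. */
static int toy_down_write_trylock(void)
{
	long old = toy_count;		/* models the xadd return value */
	toy_count += TOY_WRITER_BIAS;
	if (old != 0) {			/* was already held: undo */
		toy_count -= TOY_WRITER_BIAS;
		return 0;
	}
	return 1;
}

/* Naive probe: "is the writer bias present in the count?" */
static int toy_looks_write_locked(void)
{
	return toy_count >= TOY_WRITER_BIAS;
}
```

In the real old rwsem, the window between the xadd and the backout made this probe unreliable; the current implementation sets the writer bit atomically only on success, which is what makes rwsem_is_write_locked() dependable.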
On Thu, Sep 07, 2023 at 09:38:38PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 07, 2023 at 08:20:30PM +0100, Matthew Wilcox wrote:
> > On Thu, Sep 07, 2023 at 09:08:10PM +0200, Peter Zijlstra wrote:
> > > On Thu, Sep 07, 2023 at 06:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > > > Several places want to know whether the lock is held by a writer, instead
> > > > of just whether it's held.  We can implement this for both normal and
> > > > rt rwsems.  RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
> > > > it outside that file might tempt other people to use it, so just use
> > > > a comment to note that's what the 1 means, and help anybody find it if
> > > > they're looking to change the implementation.
> > >
> > > I'm presuming this is deep in a callchain where they know they hold the
> > > lock, but they lost in what capacity?
> >
> > No, it's just assertions.  You can see that in patch 3 where it's
> > used in functions called things like "xfs_islocked".
>
> Right, but if you're not the lock owner, your answer to the question is
> a dice-roll, it might be locked, it might not be.

Sure, but if you are the lock owner, it's guaranteed to work. And if
you hold it for read, and check that it's held for write, that will
definitely fail. I agree "somebody else is holding it" gives you a
false negative, but most locks aren't contended, so it'll fail the
assertion often enough.

> > Patch 2 shows it in use in the MM code.  We already have a
> > lockdep_assert_held_write(), but most people don't enable lockdep, so
>
> Most devs should run with lockdep on when writing new code, and I know
> the sanitizer robots run with lockdep on.
>
> In general there seems to be a ton of lockdep on coverage.

Hm. Enabling lockdep causes an xfstests run to go up from 6000 seconds
to 8000 seconds for me. That's a significant reduction in the number
of test cycles I can run per day.

> > we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm)
> > to give us a good assertion when lockdep is disabled.
>
> Is that really worth it still? I mean, much of these assertions pre-date
> lockdep.

That's true. Is it possible to do a cheaper version of lockdep where it
only records that you have the lock and doesn't, eg, check for locking
dependencies? I haven't analysed lockdep to see what the expensive part
is; for all I know it's recording the holders, and this idea is
worthless.

> > XFS has a problem with using lockdep in general, which is that a worker
> > thread can be spawned and use the fact that the spawner is holding the
> > lock.  There's no mechanism for the worker thread to ask "Does struct
> > task_struct *p hold the lock?".
>
> Will be somewhat tricky to make happen -- but might be doable. It is
> however an interface that is *very* hard to use correctly. Basically I
> think you want to also assert that your target task 'p' is blocked,
> right?
>
> That is: assert @p is blocked and holds @lock.

Probably, although I think there might be a race where the worker
thread starts running before the waiter starts waiting?

In conversation with Darrick & Dave, XFS does this in order to avoid
overflowing the kernel stack. So if you've been thinking about "we
could support segmented stacks", it would avoid having to support this
lockdep handoff. It'd probably work a lot better for the scheduler too.
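[Archive note: one shape the "cheaper lockdep" idea could take, as a hedged sketch only: record just the owner pointer at write-acquire time and compare it in the assert, skipping dependency tracking entirely. All names here are hypothetical; the real kernel rwsem does keep an owner field for optimistic spinning, but it is not a stable query API, and this user-space mock glosses over the atomicity the kernel would need:]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical owner-only tracking: cheap enough to leave enabled in
 * production, unlike full lockdep dependency analysis. */
struct toy_sem {
	int write_locked;
	const void *owner;	/* "task" that took it for write, or NULL */
};

static void toy_down_write(struct toy_sem *sem, const void *task)
{
	sem->write_locked = 1;
	sem->owner = task;
}

static void toy_up_write(struct toy_sem *sem)
{
	sem->owner = NULL;
	sem->write_locked = 0;
}

/* The "does task p hold the lock?" query XFS would want a worker
 * thread to be able to ask about its spawner. */
static int toy_held_by(const struct toy_sem *sem, const void *task)
{
	return sem->write_locked && sem->owner == task;
}
```

This only answers the write-holder question; per-reader tracking is the part that gets expensive, which may well be why no such interface exists today.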
On Thu, Sep 07, 2023 at 07:47:31PM -0400, Waiman Long wrote:
>
> On 9/7/23 17:06, Waiman Long wrote:
> >
> > On 9/7/23 15:33, Matthew Wilcox wrote:
> > > On Thu, Sep 07, 2023 at 02:05:54PM -0400, Waiman Long wrote:
> > > > On 9/7/23 13:47, Matthew Wilcox (Oracle) wrote:
> > > > > +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> > > > > +{
> > > > > +	return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */;
> > > > > +}
> > > > I would prefer you move the various RWSEM_* count bit macros from
> > > > kernel/locking/rwsem.c to under the !PREEMPT_RT block and directly use
> > > > RWSEM_WRITER_LOCKED instead of hardcoding a value of 1.
> > > Just to be clear, you want the ~50 lines from:
> > >
> > > /*
> > >  * On 64-bit architectures, the bit definitions of the count are:
> > > ...
> > > #define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
> > > 				 RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READFAIL)
> > >
> > > moved from rwsem.c to rwsem.h?
> > >
> > > Or just these four lines:
> > >
> > > #define RWSEM_WRITER_LOCKED	(1UL << 0)
> > > #define RWSEM_FLAG_WAITERS	(1UL << 1)
> > > #define RWSEM_FLAG_HANDOFF	(1UL << 2)
> > > #define RWSEM_FLAG_READFAIL	(1UL << (BITS_PER_LONG - 1))
> >
> > I think just the first 3 lines will be enough. Maybe a bit of comment
> > about these bit flags in the count atomic_long value.
>
> Actually, the old rwsem implementation won't allow you to reliably determine
> if a rwsem is write locked because the xadd instruction is used for write
> locking and the code had to back out the WRITER_BIAS if the attempt failed.
> Maybe that is why XFS has its own code to check if a rwsem is write locked
> which is needed with the old rwsem implementation.

mrlocks pre-date rwsems entirely on Linux. mrlocks were introduced
to XFS as part of the port from Irix back in 2000. This originally
had an 'ismrlocked()' function for checking lock state.

In 2003, this was expanded to allow explicit lock type checks via
'mrislocked_access()' and 'mrislocked_update()' wrappers that checked
internal counters to determine how it was locked.

In 2004, the mrlock was converted to use the generic kernel rwsem
implementation, and because that couldn't be used to track writers,
the mrlock included a mr_writer boolean field to indicate it was
write locked for the purpose of implementing the existing debug
checks.

Hence the mrlock debug code has always had reliable differentiation
of read vs write state, whereas we couldn't do that natively with
rwsems for a real long time. The mrlocks have essentially remained
unchanged since 2004 - this long predates lockdep, and it lives on
because it gives us something lockdep doesn't: zero overhead locking
validation.

> The new implementation makes this check reliable. Still it is not easy to
> check if a rwsem is read locked as the check will be rather complicated and
> probably racy.

You can't look at these locking checks in isolation. These checks
are done in code paths where we expect the caller to have already
locked the rwsem in the manner required. Hence there should be no
races with rwsem state changes at all.

If we see a locking assert fire, it means we either screwed up the
XFS locking completely (no races necessary), or there's a bug in the
rwsem implementation. The latter case has occurred several times;
the rwsem locking checks in XFS have uncovered more than one rwsem
implementation bug in the past...

IOWs, the explicit lock state checks and asserts provide a lock
implementation validation mechanism that lockdep doesn't....

Cheers,
Dave.
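[Archive note: the mrlock wrapper Dave describes is tiny; its shape, paraphrased here as a user-space mock with the rwsem calls stubbed out (the real thing wraps a struct rw_semaphore and only keeps mr_writer in debug builds), is just a rwsem plus a flag set after down_write() succeeds:]

```c
#include <assert.h>

/* Mock of the mrlock pattern: an rwsem plus a write-owner flag that
 * exists purely to back zero-overhead debug assertions. */
struct toy_mrlock {
	int locked;	/* stands in for the embedded rw_semaphore */
	int mr_writer;	/* set iff held for write */
};

static void toy_mrupdate(struct toy_mrlock *mrp)
{
	mrp->locked = 1;	/* down_write(&mrp->mr_lock) in the real code */
	mrp->mr_writer = 1;	/* safe: only the write holder sets this */
}

static void toy_mrunlock_excl(struct toy_mrlock *mrp)
{
	mrp->mr_writer = 0;	/* clear before dropping the write lock */
	mrp->locked = 0;	/* up_write(&mrp->mr_lock) */
}

/* The zero-overhead check behind XFS's ASSERT()s. */
static int toy_mr_is_write_locked(const struct toy_mrlock *mrp)
{
	return mrp->mr_writer;
}
```

Because mr_writer is only ever written by the task that holds the lock for write, reading it from an assertion in that same task is race-free, which is the property the whole thread is circling around.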
On Fri, Sep 08, 2023 at 09:00:08AM +1000, Dave Chinner wrote:
> > Right, but if you're not the lock owner, your answer to the question is
> > a dice-roll, it might be locked, it might not be.
>
> Except that the person writing the code knows the call chain that
> leads up to that code, and so they have a pretty good idea whether
> the object should be locked or not. If we are running that code, and
> the object is locked, then it's pretty much guaranteed that the
> owner of the lock is code that executed the check, because otherwise
> we have a *major lock implementation bug*.

Agreed, and this is fine. However there's been some very creative
'use' of the _is_locked() class of functions in the past that did not
follow 'common' sense.

If all usage was: I should be holding this, lets check. I probably
wouldn't have this bad feeling about things.

> > Most devs should run with lockdep on when writing new code, and I know
> > the sanitizer robots run with lockdep on.
> >
> > In general there seems to be a ton of lockdep on coverage.
>
> *cough*
>
> Bit locks, semaphores, and all sorts of other constructs for IO
> serialisation (like inode_dio_wait()) have no lockdep coverage at
> all. IOWs, large chunks of many filesystems, the VFS and the VM have
> little to no lockdep coverage at all.

True, however I was commenting on the assertion that vm code has
duplicate asserts with the implication that was because not a lot of
people run with lockdep on.

> > > we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm)
> > > to give us a good assertion when lockdep is disabled.
> >
> > Is that really worth it still? I mean, much of these assertions pre-date
> > lockdep.
>
> And we're trying to propagate them because lockdep isn't a viable
> option for day to day testing of filesystems because of its
> overhead vs how infrequently it finds new problems.

... in XFS. Lockdep avoids a giant pile of broken from entering the
kernel and the robots still report plenty.

> > > XFS has a problem with using lockdep in general, which is that a worker
> > > thread can be spawned and use the fact that the spawner is holding the
> > > lock.  There's no mechanism for the worker thread to ask "Does struct
> > > task_struct *p hold the lock?".
> >
> > Will be somewhat tricky to make happen -- but might be doable. It is
> > however an interface that is *very* hard to use correctly. Basically I
> > think you want to also assert that your target task 'p' is blocked,
> > right?
> >
> > That is: assert @p is blocked and holds @lock.
>
> That addresses the immediate symptom; it doesn't address the larger
> problem with lockdep and needing non-owner rwsem semantics.
>
> i.e. synchronous task based locking models don't work for
> asynchronous multi-stage pipeline processing engines like XFS. The
> lock protects the data object and follows the data object through
> the processing pipeline, whilst the original submitter moves on to
> the next operation to process without blocking.
>
> This is the non-blocking, async processing model that io_uring
> development is pushing filesystems towards, so assuming that we only
> hand a lock to a single worker task and then wait for it to complete
> (i.e. synchronous operation) flies in the face of current
> development directions...

I was looking at things from an interface abuse perspective. How easy is
it to do the wrong thing. As said, we've had a bunch of really dodgy
code with the _is_locked class of functions, hence my desire to find
something else.

As to the whole non-owner locking, yes, that's problematic. I'm not
convinced async operations require non-owner locking, at the same time I
do see that IO completions pose a challenge.

Coming from the schedulability and real-time corner, non-owner locks are
a nightmare because of the inversions. So yeah, fun to be had I'm sure.
On Fri, Sep 08, 2023 at 12:44:34PM +0200, Peter Zijlstra wrote: > On Fri, Sep 08, 2023 at 09:00:08AM +1000, Dave Chinner wrote: > > > > Right, but if you're not the lock owner, your answer to the question is > > > a dice-roll, it might be locked, it might not be. > > > > Except that the person writing the code knows the call chain that > > leads up to that code, and so they have a pretty good idea whether > > the object should be locked or not. If we are running that code, and > > the object is locked, then it's pretty much guaranteed that the > > owner of the lock is code that executed the check, because otherwise > > we have a *major lock implementation bug*. > > Agreed, and this is fine. However there's been some very creative > 'use' of the _is_locked() class of functions in the past that did not > follow 'common' sense. > > If all usage was: I should be holding this, lets check. I probably > wouldn't have this bad feeling about things. So your argument against such an interface is essentially "we can't have nice things because someone might abuse them"? > > > Most devs should run with lockdep on when writing new code, and I know > > > the sanitizer robots run with lockdep on. > > > > > > In general there seems to be a ton of lockdep on coverage. > > > > *cough* > > > > Bit locks, semaphores, and all sorts of other constructs for IO > > serialisation (like inode_dio_wait()) have no lockdep coverage at > > all. IOWs, large chunks of many filesystems, the VFS and the VM have > > little to no lockdep coverage at all. > > True, however I was commenting on the assertion that vm code has > duplicate asserts with the implication that was because not a lot of > people run with lockdep on. I think that implication is pretty much spot on the money for any subsystem that has significant correctness testing overhead. A single fstests regression test pass for a single configuration for XFS takes well over 4 hours to run. If I add lockdep, it's about 7 hours. 
If I add lockdep and KASAN, it's closer to 12 hours. It just isn't viable to run test kernels with these options day-in, day-out. Maybe once a release I'll run a weekend sanity check with them enabled, but otherwise the rare issue they find just isn't worth the cost of enabling them.... > > > > we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm) > > > > to give us a good assertion when lockdep is disabled. > > > > > > Is that really worth it still? I mean, much of these assertions pre-date > > > lockdep. > > > > And we're trying to propagate them because lockdep isn't a viable > > option for day to day testing of filesystems because of it's > > overhead vs how infrequently it finds new problems. > > ... in XFS. Lockdep avoids a giant pile of broken from entering the > kernel and the robots still report plenty. Nobody is suggesting that lockdep gets replaced by these functions. They are *in addition* to lockdep, and are used to give use 99.9% certainty that locks are being used correctly without adding any runtime overhead at all. That's the whole point - it is simple introspection code that will find most gross locking mistakes that people make very quickly, without any real costs. If you're worried about people abusing introspection code, then you're forgetting that they can just dig directly at the rwsem guts directly themselves. Adding an interface that does introspection right and enables the internal imp[lementation to change without breaking anything is a no brainer; it stops people from just digging in the guts of the structure and guaranteeing that code will break if the implementation changes... > > > > XFS has a problem with using lockdep in general, which is that a worker > > > > thread can be spawned and use the fact that the spawner is holding the > > > > lock. There's no mechanism for the worker thread to ask "Does struct > > > > task_struct *p hold the lock?". 
> > > > > > Will be somewhat tricky to make happen -- but might be doable. It is > > > however an interface that is *very* hard to use correctly. Basically I > > > think you want to also assert that your target task 'p' is blocked, > > > right? > > > > > > That is: assert @p is blocked and holds @lock. > > > > That addresses the immediate symptom; it doesn't address the large > > problem with lockdep and needing non-owner rwsem semantics. > > > > i.e. synchronous task based locking models don't work for > > asynchronous multi-stage pipeline processing engines like XFS. The > > lock protects the data object and follows the data object through > > the processing pipeline, whilst the original submitter moves on to > > the next operation to processes without blocking. > > > > This is the non-blocking, async processing model that io_uring > > development is pushing filesystems towards, so assuming that we only > > hand a lock to a single worker task and then wait for it complete > > (i.e. synchronous operation) flies in the face of current > > development directions... > > I was looking at things from an interface abuse perspective. How easy is > it to do the wrong thing. As said, we've had a bunch of really dodgy > code with the _is_locked class of functions, hence my desire to find > something else. > > As to the whole non-owner locking, yes, that's problematic. I'm not > convinced async operations require non-owner locking, at the same time I > do see that IO completions pose a challence. > > Coming from the schedulability and real-time corner, non-owner locks are > a nightmare because of the inversions. So yeah, fun to be had I'm sure. I'm not sure you understand the scope of the problem with modern filesystems vs RT processing. The moment code enters a modern filesystem, it gives up all hope of real-time response guarantees. 
There is currently nothing a RT process can do but wait for the filesystem to finish with the locks it holds, and the wait times are effectively unbounded because there may be a requirement for tens of thousands of IOs to be done before the lock is dropped and the RT task can make progress. Priority inheritance for the lock owner won't make any difference here, because the latency is not caused by something running on a CPU.

IOWs, lock inversions and non-owner locks are the very least of the problems for RT priority apps when it comes to filesystem operations.

The solution for RT priority apps avoiding priority inversions in filesystems is going to be io_uring. i.e. the initial NOWAIT operation is done with RT priority in the RT task itself, but if that is going to be blocked it gets punted to a background worker for async processing and the RT priority task goes on to process the next thing it needs to do.

All the background async operations are performed with the same (non-RT) priority and we just don't need to care about priority inversions or the problems RT has with non-owner lock contexts. The RT tasks themselves don't care, either, because they don't ever get stuck waiting on a filesystem lock that a lower priority task might hold, or get stuck on an operation that might require unbounded amounts of IO to complete (e.g. transaction reservations).

IOWs, if we want to make "RT with filesystems" a reality, we need to stop worrying about constraining lock implementations and handling priority inversions. Instead, we need to look towards making filesystem infrastructure fully non-blocking for RT priority tasks and writing RT applications to use that infrastructure....

Cheers,
Dave.
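[The NOWAIT-submit-then-punt model Dave describes above can be sketched in userspace C. This is purely illustrative: none of these names are real io_uring or XFS interfaces, and `would_block` stands in for whatever contention or IO requirement the real NOWAIT path detects.]

```c
#include <assert.h>
#include <errno.h>

/*
 * Illustrative-only sketch of the submit pattern: try the operation in
 * non-blocking (NOWAIT) mode from the RT task; if it would block, punt
 * it to a background worker instead of waiting. All names here are
 * hypothetical stand-ins.
 */
static int punted;	/* count of operations handed to the async worker */

/* Pretend filesystem op: would_block models lock contention / IO needed. */
static int try_op_nowait(int would_block)
{
	return would_block ? -EAGAIN : 0;
}

static int rt_submit(int would_block)
{
	int ret = try_op_nowait(would_block);

	if (ret == -EAGAIN) {
		punted++;	/* a non-RT worker will complete it later */
		return 0;	/* the RT submitter never blocks here */
	}
	return ret;
}
```

[The point of the shape is that `rt_submit()` returns to the RT caller on both paths; only the background worker, running at normal priority, ever waits on filesystem locks.]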
On Mon, Sep 11, 2023 at 08:56:45AM +1000, Dave Chinner wrote: > On Fri, Sep 08, 2023 at 12:44:34PM +0200, Peter Zijlstra wrote: > > Agreed, and this is fine. However there's been some very creative > > 'use' of the _is_locked() class of functions in the past that did not > > follow 'common' sense. > > > > If all usage was: I should be holding this, lets check. I probably > > wouldn't have this bad feeling about things. > > So your argument against such an interface is essentially "we can't > have nice things because someone might abuse them"? Some people are very creative ... I was thinking about how to handle this better. We could have static inline void rwsem_assert_locked(const struct rw_semaphore *sem) { BUG_ON(atomic_long_read(&sem->count) == 0); } static inline void rwsem_assert_write_locked(const struct rw_semaphore *sem) { BUG_ON((atomic_long_read(&sem->count) & 1) != 1); } but then we'd also need to change how XFS currently uses the ASSERT() macro to be ASSERT_LOCKED(ip, flags), and I understand this code is also used in userspace, so it'd involve changing that shim, and we're getting way past the amount of code I'm comfortable changing, and way past the amount of time I should be spending on this. And then there'd be the inevitable bikeshedding about "don't use BUG_ON" and it's probably just for the best if I walk away at this point, becoming the third person to fail to remove the mrlock. I'll keep reading this thread to see if some consensus emerges, but I'm not optimistic.
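[For reference, the count-word tests in the helpers above can be modelled with a toy userspace counter. This is illustrative only: the real writer bit is RWSEM_WRITER_LOCKED (bit 0 of sem->count) in kernel/locking/rwsem.c, real reader accounting is different, and real rwsems also track handoff and waiter state.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Toy model of the rwsem count word. TOY_WRITER_LOCKED plays the role
 * of the kernel's RWSEM_WRITER_LOCKED; TOY_READER_BIAS is an arbitrary
 * per-reader increment kept clear of the writer bit.
 */
#define TOY_WRITER_LOCKED	1L
#define TOY_READER_BIAS		256L

static atomic_long toy_count;

static void toy_write_lock(void)   { atomic_fetch_or(&toy_count, TOY_WRITER_LOCKED); }
static void toy_write_unlock(void) { atomic_fetch_and(&toy_count, ~TOY_WRITER_LOCKED); }
static void toy_read_lock(void)    { atomic_fetch_add(&toy_count, TOY_READER_BIAS); }
static void toy_read_unlock(void)  { atomic_fetch_sub(&toy_count, TOY_READER_BIAS); }

/* Analogue of rwsem_is_locked(): any non-zero count means readers or a writer. */
static bool toy_is_locked(void)
{
	return atomic_load(&toy_count) != 0;
}

/* Analogue of rwsem_is_write_locked(): only the writer bit matters. */
static bool toy_is_write_locked(void)
{
	return atomic_load(&toy_count) & TOY_WRITER_LOCKED;
}
```

[This also shows why `is_locked` and `is_write_locked` are distinct predicates: a reader-held lock satisfies the first but not the second.]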
On Mon, Sep 11, 2023 at 12:17:18AM +0100, Matthew Wilcox wrote: > On Mon, Sep 11, 2023 at 08:56:45AM +1000, Dave Chinner wrote: > > On Fri, Sep 08, 2023 at 12:44:34PM +0200, Peter Zijlstra wrote: > > > Agreed, and this is fine. However there's been some very creative > > > 'use' of the _is_locked() class of functions in the past that did not > > > follow 'common' sense. > > > > > > If all usage was: I should be holding this, lets check. I probably > > > wouldn't have this bad feeling about things. > > > > So your argument against such an interface is essentially "we can't > > have nice things because someone might abuse them"? > > Some people are very creative ... Sure, but that's no reason to stop anyone else from making progress. > I was thinking about how to handle this better. We could have > > static inline void rwsem_assert_locked(const struct rw_semaphore *sem) > { > BUG_ON(atomic_long_read(&sem->count) == 0); > } > > static inline void rwsem_assert_write_locked(const struct rw_semaphore *sem) > { > BUG_ON((atomic_long_read(&sem->count) & 1) != 1); > } We already have CONFIG_DEBUG_RWSEMS, so we can put these introspection interfaces inside debug code, and make any attempt to use them outside of debug builds break the build. e.g: #if DEBUG_RWSEMS /* * rwsem locked checks can only be used by conditionally compiled * subsystem debug code. It is not valid to use them in normal * production code. */ static inline bool rwsem_is_write_locked() { .... } static inline bool rwsem_is_locked() { .... 
} #else /* !DEBUG_RWSEMS */ #define rwsem_is_write_locked() BUILD_BUG() #define rwsem_is_locked() BUILD_BUG() #endif /* DEBUG_RWSEMS */ And now we simply add a single line to subsystem Kconfig debug options to turn on rwsem introspection for their debug checks like so: config XFS_DEBUG bool "XFS Debugging support" depends on XFS_FS + select RWSEM_DEBUG help Say Y here to get an XFS build with many debugging features, including ASSERT checks, function wrappers around macros, > but then we'd also need to change how XFS currently uses the ASSERT() > macro to be ASSERT_LOCKED(ip, flags), and I understand this code is also > used in userspace, so it'd involve changing that shim, and we're getting > way past the amount of code I'm comfortable changing, and way past the > amount of time I should be spending on this. > > And then there'd be the inevitable bikeshedding about "don't use BUG_ON" > and it's probably just for the best if I walk away at this point, > becoming the third person to fail to remove the mrlock. Yeah, which further points out how ridiculous the situation is. This is useful debug code and it can *obviously* and *easily* be constrained to debug environments. Cheers, Dave.
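[Dave's compile-time gating idea can be modelled in userspace C. Here TOY_DEBUG_RWSEMS stands in for the proposed Kconfig option, `struct toy_rwsem` for `struct rw_semaphore`, and the negative-sized-array trick plays the role of BUILD_BUG(); only the gating pattern, not the kernel code, is being shown.]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of debug-only introspection: the helpers exist when the debug
 * option is configured; any use in a non-debug build fails to compile.
 */
#define TOY_DEBUG_RWSEMS 1

struct toy_rwsem { long count; };

#if TOY_DEBUG_RWSEMS
static inline bool toy_rwsem_is_locked(const struct toy_rwsem *sem)
{
	return sem->count != 0;
}

static inline bool toy_rwsem_is_write_locked(const struct toy_rwsem *sem)
{
	return sem->count & 1;	/* writer bit, like RWSEM_WRITER_LOCKED */
}
#else /* !TOY_DEBUG_RWSEMS */
/* Compile-time failure on any use, standing in for BUILD_BUG(). */
#define toy_rwsem_is_locked(sem)	((void)sizeof(char[-1]), false)
#define toy_rwsem_is_write_locked(sem)	((void)sizeof(char[-1]), false)
#endif
```

[With TOY_DEBUG_RWSEMS set to 0, any caller of either helper breaks the build, which is the property Dave wants from the `select RWSEM_DEBUG` scheme.]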
On 9/10/23 20:55, Dave Chinner wrote: > On Mon, Sep 11, 2023 at 12:17:18AM +0100, Matthew Wilcox wrote: >> On Mon, Sep 11, 2023 at 08:56:45AM +1000, Dave Chinner wrote: >>> On Fri, Sep 08, 2023 at 12:44:34PM +0200, Peter Zijlstra wrote: >>>> Agreed, and this is fine. However there's been some very creative >>>> 'use' of the _is_locked() class of functions in the past that did not >>>> follow 'common' sense. >>>> >>>> If all usage was: I should be holding this, lets check. I probably >>>> wouldn't have this bad feeling about things. >>> So your argument against such an interface is essentially "we can't >>> have nice things because someone might abuse them"? >> Some people are very creative ... > Sure, but that's no reason to stop anyone else from making progress. > >> I was thinking about how to handle this better. We could have >> >> static inline void rwsem_assert_locked(const struct rw_semaphore *sem) >> { >> BUG_ON(atomic_long_read(&sem->count) == 0); >> } >> >> static inline void rwsem_assert_write_locked(const struct rw_semaphore *sem) >> { >> BUG_ON((atomic_long_read(&sem->count) & 1) != 1); >> } > We already have CONFIG_DEBUG_RWSEMS, so we can put these > introspection interfaces inside debug code, and make any attempt to > use them outside of debug builds break the build. e.g: > > #if DEBUG_RWSEMS > /* > * rwsem locked checks can only be used by conditionally compiled > * subsystem debug code. It is not valid to use them in normal > * production code. > */ > static inline bool rwsem_is_write_locked() > { > .... > } > > static inline bool rwsem_is_locked() > { > .... 
> }
> #else /* !DEBUG_RWSEMS */
> #define rwsem_is_write_locked() BUILD_BUG()
> #define rwsem_is_locked() BUILD_BUG()
> #endif /* DEBUG_RWSEMS */
>
> And now we simply add a single line to subsystem Kconfig debug
> options to turn on rwsem introspection for their debug checks like
> so:
>
> config XFS_DEBUG
> 	bool "XFS Debugging support"
> 	depends on XFS_FS
> +	select RWSEM_DEBUG
> 	help
> 	  Say Y here to get an XFS build with many debugging features,
> 	  including ASSERT checks, function wrappers around macros,

That may be a possible compromise. Actually, I am not against having them defined even outside DEBUG_RWSEMS. We already have mutex_is_locked() defined and used in a lot of places, so this is just providing the rwsem equivalents.

I also agree that these APIs can be misused by other users, so we should clearly document the caveats of using them: unless there is some other means of keeping the lock state stable, the test result may no longer be true right after the test. It is simply the lock state at a certain moment in time. Callers use them at their own risk.

Cheers,
Longman
On Sun, Sep 10, 2023 at 10:15:59PM -0400, Waiman Long wrote: > > On 9/10/23 20:55, Dave Chinner wrote: > > On Mon, Sep 11, 2023 at 12:17:18AM +0100, Matthew Wilcox wrote: > > > On Mon, Sep 11, 2023 at 08:56:45AM +1000, Dave Chinner wrote: > > > > On Fri, Sep 08, 2023 at 12:44:34PM +0200, Peter Zijlstra wrote: > > > > > Agreed, and this is fine. However there's been some very creative > > > > > 'use' of the _is_locked() class of functions in the past that did not > > > > > follow 'common' sense. > > > > > > > > > > If all usage was: I should be holding this, lets check. I probably > > > > > wouldn't have this bad feeling about things. > > > > So your argument against such an interface is essentially "we can't > > > > have nice things because someone might abuse them"? > > > Some people are very creative ... > > Sure, but that's no reason to stop anyone else from making progress. > > > > > I was thinking about how to handle this better. We could have > > > > > > static inline void rwsem_assert_locked(const struct rw_semaphore *sem) > > > { > > > BUG_ON(atomic_long_read(&sem->count) == 0); > > > } > > > > > > static inline void rwsem_assert_write_locked(const struct rw_semaphore *sem) > > > { > > > BUG_ON((atomic_long_read(&sem->count) & 1) != 1); > > > } > > We already have CONFIG_DEBUG_RWSEMS, so we can put these > > introspection interfaces inside debug code, and make any attempt to > > use them outside of debug builds break the build. e.g: > > > > #if DEBUG_RWSEMS > > /* > > * rwsem locked checks can only be used by conditionally compiled > > * subsystem debug code. It is not valid to use them in normal > > * production code. > > */ > > static inline bool rwsem_is_write_locked() > > { > > .... > > } > > > > static inline bool rwsem_is_locked() > > { > > .... 
> > }
> > #else /* !DEBUG_RWSEMS */
> > #define rwsem_is_write_locked() BUILD_BUG()
> > #define rwsem_is_locked() BUILD_BUG()
> > #endif /* DEBUG_RWSEMS */
> >
> > And now we simply add a single line to subsystem Kconfig debug
> > options to turn on rwsem introspection for their debug checks like
> > so:
> >
> > config XFS_DEBUG
> > 	bool "XFS Debugging support"
> > 	depends on XFS_FS
> > +	select RWSEM_DEBUG
> > 	help
> > 	  Say Y here to get an XFS build with many debugging features,
> > 	  including ASSERT checks, function wrappers around macros,
>
> That may be a possible compromise. Actually, I am not against having them
> defined even outside the DEBUG_RWSEMS. We already have mutex_is_locked()
> defined and used in a lot of places. So this is just providing the rwsem
> equivalents.

So, once again, we have mixed messages from the lock maintainers. One says "no, it might get abused", another says "I'm fine with that", and now we have a maintainer disagreement stalemate.

This is dysfunctional.

Peter, Waiman, please make a decision one way or the other about allowing rwsems to support native write lock checking. In the absence of an actual yes/no decision, do we assume that the maintainers don't actually care about it and we should just submit it straight to Linus?

-Dave.
On Tue, Sep 12, 2023 at 08:29:55AM +1000, Dave Chinner wrote:
> So, once again, we have mixed messages from the lock maintainers.
> One says "no, it might get abused", another says "I'm fine with
> that", and now we have a maintainer disagreement stalemate.

I didn't say no, I was trying to see if there are alternatives because the is_locked pattern has a history of abuse.

If not, then sure we can do this; it's not like I managed to get rid of mutex_is_locked() -- and I actually tried at some point :/

And just now I grepped for it, and look what I find:

drivers/hid/hid-nintendo.c:	if (unlikely(mutex_is_locked(&ctlr->output_mutex))) {
drivers/nvdimm/btt.c:	if (mutex_is_locked(&arena->err_lock)

And there's more :-(

Also, please just calm down already..
On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote: > If not, then sure we can do this; it's not like I managed to get rid of > muteX_is_locked() -- and I actually tried at some point :/ > > And just now I grepped for it, and look what I find: > > drivers/hid/hid-nintendo.c: if (unlikely(mutex_is_locked(&ctlr->output_mutex))) { > drivers/nvdimm/btt.c: if (mutex_is_locked(&arena->err_lock) > > And there's more :-( Are these actually abuse? I looked at these two, and they both seem to be asking "Does somebody else currently have this mutex?" rather than "Do I have this mutex?".
On Tue, Sep 12, 2023 at 01:28:13PM +0100, Matthew Wilcox wrote: > On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote: > > If not, then sure we can do this; it's not like I managed to get rid of > > muteX_is_locked() -- and I actually tried at some point :/ > > > > And just now I grepped for it, and look what I find: > > > > drivers/hid/hid-nintendo.c: if (unlikely(mutex_is_locked(&ctlr->output_mutex))) { > > drivers/nvdimm/btt.c: if (mutex_is_locked(&arena->err_lock) > > > > And there's more :-( > > Are these actually abuse? I looked at these two, and they both seem to > be asking "Does somebody else currently have this mutex?" rather than > "Do I have this mutex?". It's effectively a random number generator in that capacity. Someone might have it or might have had it when you looked and no longer have it, or might have it now but not when you asked.
On Tue, Sep 12, 2023 at 03:52:13PM +0200, Peter Zijlstra wrote: > On Tue, Sep 12, 2023 at 01:28:13PM +0100, Matthew Wilcox wrote: > > On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote: > > > If not, then sure we can do this; it's not like I managed to get rid of > > > muteX_is_locked() -- and I actually tried at some point :/ > > > > > > And just now I grepped for it, and look what I find: > > > > > > drivers/hid/hid-nintendo.c: if (unlikely(mutex_is_locked(&ctlr->output_mutex))) { > > > drivers/nvdimm/btt.c: if (mutex_is_locked(&arena->err_lock) > > > > > > And there's more :-( > > > > Are these actually abuse? I looked at these two, and they both seem to > > be asking "Does somebody else currently have this mutex?" rather than > > "Do I have this mutex?". > > It's effectively a random number generator in that capacity. Someone > might have it or might have had it when you looked and no longer have > it, or might have it now but not when you asked. Well, no. if (mutex_is_locked(&arena->err_lock) || arena->freelist[lane].has_err) { nd_region_release_lane(btt->nd_region, lane); ret = arena_clear_freelist_error(arena, lane); So that's "Is somebody currently processing an error, or have they already finished setting an error". Sure, it's somewhat racy, but it looks like a performance optimisation, not something that needs 100% accuracy. The other one's in a similar boat; an optimisation if anyone else is currently holding this mutex: /* * Immediately after receiving a report is the most reliable time to * send a subcommand to the controller. Wake any subcommand senders * waiting for a report. */ if (unlikely(mutex_is_locked(&ctlr->output_mutex))) { spin_lock_irqsave(&ctlr->lock, flags); ctlr->received_input_report = true; spin_unlock_irqrestore(&ctlr->lock, flags); wake_up(&ctlr->wait); } Sure, they might not still be holding it, or it may have been grabbed one clock tick later; that just means they miss out on this optimisation.
On Tue, Sep 12, 2023 at 03:52:13PM +0200, Peter Zijlstra wrote: > On Tue, Sep 12, 2023 at 01:28:13PM +0100, Matthew Wilcox wrote: > > On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote: > > > If not, then sure we can do this; it's not like I managed to get rid of > > > muteX_is_locked() -- and I actually tried at some point :/ > > > > > > And just now I grepped for it, and look what I find: > > > > > > drivers/hid/hid-nintendo.c: if (unlikely(mutex_is_locked(&ctlr->output_mutex))) { > > > drivers/nvdimm/btt.c: if (mutex_is_locked(&arena->err_lock) > > > > > > And there's more :-( > > > > Are these actually abuse? I looked at these two, and they both seem to > > be asking "Does somebody else currently have this mutex?" rather than > > "Do I have this mutex?". > > It's effectively a random number generator in that capacity. Someone > might have it or might have had it when you looked and no longer have > it, or might have it now but not when you asked. Also, there's more fun; the 'is_locked' store from spin_lock() (or mutex, or whatever) is not ordered vs any other write inside the critical section. So something like: bar = 0; CPU0 CPU1 spin_lock(&foo) bar = 1; x = READ_ONCE(bar) y = spin_is_locked(&foo); spin_unlock(&foo); can have x==1 && y==0, even though CPU0 is currently inside the critical section. Normally that doesn't matter, and for the program-order case where you ask 'am I holding the lock' this obviously cannot go wrong. But the moment you ask: 'is someone else holding the lock' it all goes sideways real fast. We've been there, done that, got a t-shirt etc..
On Tue, Sep 12, 2023 at 02:58:32PM +0100, Matthew Wilcox wrote: > On Tue, Sep 12, 2023 at 03:52:13PM +0200, Peter Zijlstra wrote: > > On Tue, Sep 12, 2023 at 01:28:13PM +0100, Matthew Wilcox wrote: > > > On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote: > > > > If not, then sure we can do this; it's not like I managed to get rid of > > > > muteX_is_locked() -- and I actually tried at some point :/ > > > > > > > > And just now I grepped for it, and look what I find: > > > > > > > > drivers/hid/hid-nintendo.c: if (unlikely(mutex_is_locked(&ctlr->output_mutex))) { > > > > drivers/nvdimm/btt.c: if (mutex_is_locked(&arena->err_lock) > > > > > > > > And there's more :-( > > > > > > Are these actually abuse? I looked at these two, and they both seem to > > > be asking "Does somebody else currently have this mutex?" rather than > > > "Do I have this mutex?". > > > > It's effectively a random number generator in that capacity. Someone > > might have it or might have had it when you looked and no longer have > > it, or might have it now but not when you asked. > > Well, no. > > if (mutex_is_locked(&arena->err_lock) > || arena->freelist[lane].has_err) { > nd_region_release_lane(btt->nd_region, lane); > > ret = arena_clear_freelist_error(arena, lane); > > So that's "Is somebody currently processing an error, or have they > already finished setting an error". Sure, it's somewhat racy, but > it looks like a performance optimisation, not something that needs > 100% accuracy. We're arguing past one another I think. Yes mutex_is_locked() is a random number generator when asked for something you don't own. But it might not be a bug because the code is ok with races. It is still fully dodgy IMO, such usage is pretty close to UB.
On Tue, Sep 12, 2023 at 04:23:00PM +0200, Peter Zijlstra wrote: > On Tue, Sep 12, 2023 at 02:58:32PM +0100, Matthew Wilcox wrote: > > On Tue, Sep 12, 2023 at 03:52:13PM +0200, Peter Zijlstra wrote: > > > On Tue, Sep 12, 2023 at 01:28:13PM +0100, Matthew Wilcox wrote: > > > > On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote: > > > > > If not, then sure we can do this; it's not like I managed to get rid of > > > > > muteX_is_locked() -- and I actually tried at some point :/ > > > > > > > > > > And just now I grepped for it, and look what I find: > > > > > > > > > > drivers/hid/hid-nintendo.c: if (unlikely(mutex_is_locked(&ctlr->output_mutex))) { > > > > > drivers/nvdimm/btt.c: if (mutex_is_locked(&arena->err_lock) > > > > > > > > > > And there's more :-( > > > > > > > > Are these actually abuse? I looked at these two, and they both seem to > > > > be asking "Does somebody else currently have this mutex?" rather than > > > > "Do I have this mutex?". > > > > > > It's effectively a random number generator in that capacity. Someone > > > might have it or might have had it when you looked and no longer have > > > it, or might have it now but not when you asked. > > > > Well, no. > > > > if (mutex_is_locked(&arena->err_lock) > > || arena->freelist[lane].has_err) { > > nd_region_release_lane(btt->nd_region, lane); > > > > ret = arena_clear_freelist_error(arena, lane); > > > > So that's "Is somebody currently processing an error, or have they > > already finished setting an error". Sure, it's somewhat racy, but > > it looks like a performance optimisation, not something that needs > > 100% accuracy. > > We're arguing past one another I think. Yes mutex_is_locked() is a > random number generator when asked for something you don't own. But it > might not be a bug because the code is ok with races. > > It is still fully dodgy IMO, such usage is pretty close to UB. 
My 2 cents here:

I could live with Longman's suggestion of an rwsem_assert_is_locked that only exists if DEBUG_RWSEMS is enabled. Something like:

#ifdef CONFIG_DEBUG_RWSEMS
static inline bool __rwsem_assert_is_locked(struct rw_semaphore *rwsem,
		const char *file, int line)
{
	bool ret = rwsem_is_locked(rwsem);

	if (!ret)
		WARN(1, "!rwsem_is_locked(rwsem) at %s line %d", file, line);
	return ret;
}
#define rwsem_assert_is_locked(r) \
	__rwsem_assert_is_locked((r), __FILE__, __LINE__)
#endif

and then XFS could do:

	ASSERT(rwsem_assert_is_locked(&VFS_I(ip)->i_rwsem));

Wherein ASSERT is only #defined if CONFIG_XFS_DEBUG, and XFS_DEBUG selects DEBUG_RWSEMS, per Longman's suggestion. That'll work for what we want it for (simple cheap lock checking) without becoming a general lock-abuse predicate.

--D
On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote:
> On Tue, Sep 12, 2023 at 08:29:55AM +1000, Dave Chinner wrote:
>
> > So, once again, we have mixed messages from the lock maintainers.
> > One says "no, it might get abused", another says "I'm fine with
> > that", and now we have a maintainer disagreement stalemate.
>
> I didn't say no, I was trying to see if there's alternatives because the
> is_locked pattern has a history of abuse.

Yet you haven't suggested or commented on the proposed methods to avoid abuse - you are still arguing that it can be abused. Go back and read what I proposed before continuing to argue about mutex_is_locked()....

> If not, then sure we can do this; it's not like I managed to get rid of
> mutex_is_locked() -- and I actually tried at some point :/
>
> And just now I grepped for it, and look what I find:
>
> drivers/hid/hid-nintendo.c:	if (unlikely(mutex_is_locked(&ctlr->output_mutex))) {
> drivers/nvdimm/btt.c:	if (mutex_is_locked(&arena->err_lock)
>
> And there's more :-(

.... like this.

Seriously: if we put it behind CONFIG_DEBUG_RWSEM, and then turn that on when other subsystem debug code wants the rwsem introspection, why does anyone outside that subsystem even care how it gets used? It won't even get used in production kernels, because nobody will turn something on that requires rwsem debug in a production kernel.

If you are still concerned that this will happen, then do the same as we've done for trace_printk and other debug-only functions: dump a big warning at boot time that rwsem debug is enabled and this is not a supported production kernel configuration.

> Also, please just calm down already..

I'm perfectly calm and relaxed. Asking for a definitive decision between co-maintainers who are giving decidedly mixed signals is a very reasonable request to make.
Just because you may not like what such a request implies about how the code is being maintained, it doesn't mean I'm the slightest bit upset, hysterical or irrational. -Dave.
On Tue, Sep 12, 2023 at 08:27:15AM -0700, Darrick J. Wong wrote: > I could live with Longman's suggestion of an rwsem_assert_is_locked that > only exists if DEBUG_RWSEMS is enabled. Something like: > > #ifdef CONFIG_DEBUG_RWSEMS > static inline bool __rwsem_assert_is_locked(struct rw_semaphore *rwsem, > const char *file, int line) > { > bool ret = rwsem_is_locked(rwsem); > if (!ret) > WARN(1, "!rwsem_is_locked(rwsem) at %s line %d", file, line); > return ret; > } > #define rwsem_assert_is_locked(r) \ > __rwsem_assert_is_locked((r), __FILE__, __LINE__) > #endif > > and then XFS could do: > > ASSERT(rwsem_assert_is_locked(&VFS_I(ip)->i_rwsem)); > > Wherein ASSERT is only #defined if CONFIG_XFS_DEBUG, and XFS_DEBUG > selects DEBUG_RWSEMS, per Longman's suggestion. That's work for what we > want it for (simple cheap lock checking) without becoming a general > lockabuse predicate. Ack.
diff --git a/include/linux/rwbase_rt.h b/include/linux/rwbase_rt.h index 1d264dd08625..3c25b14edc05 100644 --- a/include/linux/rwbase_rt.h +++ b/include/linux/rwbase_rt.h @@ -31,6 +31,11 @@ static __always_inline bool rw_base_is_locked(struct rwbase_rt *rwb) return atomic_read(&rwb->readers) != READER_BIAS; } +static __always_inline bool rw_base_is_write_locked(struct rwbase_rt *rwb) +{ + return atomic_read(&rwb->readers) == WRITER_BIAS; +} + static __always_inline bool rw_base_is_contended(struct rwbase_rt *rwb) { return atomic_read(&rwb->readers) > 0; diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index 1dd530ce8b45..0f78b8d2e653 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -72,6 +72,11 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem) return atomic_long_read(&sem->count) != 0; } +static inline int rwsem_is_write_locked(struct rw_semaphore *sem) +{ + return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */; +} + #define RWSEM_UNLOCKED_VALUE 0L #define __RWSEM_COUNT_INIT(name) .count = ATOMIC_LONG_INIT(RWSEM_UNLOCKED_VALUE) @@ -157,6 +162,11 @@ static __always_inline int rwsem_is_locked(struct rw_semaphore *sem) return rw_base_is_locked(&sem->rwbase); } +static __always_inline int rwsem_is_write_locked(struct rw_semaphore *sem) +{ + return rw_base_is_write_locked(&sem->rwbase); +} + static __always_inline int rwsem_is_contended(struct rw_semaphore *sem) { return rw_base_is_contended(&sem->rwbase);
Several places want to know whether the lock is held by a writer, instead of just whether it's held. We can implement this for both normal and rt rwsems. RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing it outside that file might tempt other people to use it, so just use a comment to note that's what the 1 means, and help anybody find it if they're looking to change the implementation. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> --- include/linux/rwbase_rt.h | 5 +++++ include/linux/rwsem.h | 10 ++++++++++ 2 files changed, 15 insertions(+)