Message ID | 20230712211115.2174650-12-kent.overstreet@linux.dev (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | bcachefs prereqs patch series | expand |
On 7/12/23 17:11, Kent Overstreet wrote: > These are used by bcachefs's six locks. > > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Waiman Long <longman@redhat.com> > Cc: Boqun Feng <boqun.feng@gmail.com> > --- > kernel/locking/osq_lock.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c > index d5610ad52b..b752ec5cc6 100644 > --- a/kernel/locking/osq_lock.c > +++ b/kernel/locking/osq_lock.c > @@ -203,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock) > > return false; > } > +EXPORT_SYMBOL_GPL(osq_lock); > > void osq_unlock(struct optimistic_spin_queue *lock) > { > @@ -230,3 +231,4 @@ void osq_unlock(struct optimistic_spin_queue *lock) > if (next) > WRITE_ONCE(next->locked, 1); > } > +EXPORT_SYMBOL_GPL(osq_unlock); Have you considered extending the current rw_semaphore to support a SIX lock semantics? There are a number of instances in the kernel that a up_read() is followed by a down_write(). Basically, the code try to upgrade the lock from read to write. I have been thinking about adding a upgrade_read() API to do that. However, the concern that I had was that another writer may come in and make modification before the reader can be upgraded to have exclusive write access and will make the task to repeat what has been done in the read lock part. By adding a read with intent to upgrade to write, we can have that guarantee. With that said, I would prefer to keep osq_{lock/unlock} for internal use by some higher level locking primitives - mutex, rwsem and rt_mutex. Cheers, Longman
On Wed, Aug 02, 2023 at 04:16:12PM -0400, Waiman Long wrote: > On 7/12/23 17:11, Kent Overstreet wrote: > > These are used by bcachefs's six locks. > > > > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> > > Cc: Peter Zijlstra <peterz@infradead.org> > > Cc: Ingo Molnar <mingo@redhat.com> > > Cc: Waiman Long <longman@redhat.com> > > Cc: Boqun Feng <boqun.feng@gmail.com> > > --- > > kernel/locking/osq_lock.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c > > index d5610ad52b..b752ec5cc6 100644 > > --- a/kernel/locking/osq_lock.c > > +++ b/kernel/locking/osq_lock.c > > @@ -203,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock) > > return false; > > } > > +EXPORT_SYMBOL_GPL(osq_lock); > > void osq_unlock(struct optimistic_spin_queue *lock) > > { > > @@ -230,3 +231,4 @@ void osq_unlock(struct optimistic_spin_queue *lock) > > if (next) > > WRITE_ONCE(next->locked, 1); > > } > > +EXPORT_SYMBOL_GPL(osq_unlock); > > Have you considered extending the current rw_semaphore to support a SIX lock > semantics? There are a number of instances in the kernel that a up_read() is > followed by a down_write(). Basically, the code try to upgrade the lock from > read to write. I have been thinking about adding a upgrade_read() API to do > that. However, the concern that I had was that another writer may come in > and make modification before the reader can be upgraded to have exclusive > write access and will make the task to repeat what has been done in the read > lock part. By adding a read with intent to upgrade to write, we can have > that guarantee. It's been discussed, Linus had the same thought. But it'd be a massive change to the rw semaphore code; this "read with intent" really is a third lock state which needs all the same lock/trylock/unlock paths, and with the way rw semaphore has separate entry points for read and write it'd be a _ton_ of new code. It really touches everything - waitlist handling included. And six locks have several other features that bcachefs needs, and other users may also end up wanting, that rw semaphores don't have; the two main features being a percpu read lock mode and support for an external cycle detector (which requires exposing lock waitlists, with some guarantees about how those waitlists are used). > With that said, I would prefer to keep osq_{lock/unlock} for internal use by > some higher level locking primitives - mutex, rwsem and rt_mutex. Yeah, I'm aware, but it seems like exposing osq_(lock|unlock) is the most palatable solution for now. Long term, I'd like to get six locks promoted to kernel/locking.
On 8/2/23 16:44, Kent Overstreet wrote: > On Wed, Aug 02, 2023 at 04:16:12PM -0400, Waiman Long wrote: >> On 7/12/23 17:11, Kent Overstreet wrote: >>> These are used by bcachefs's six locks. >>> >>> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> >>> Cc: Peter Zijlstra <peterz@infradead.org> >>> Cc: Ingo Molnar <mingo@redhat.com> >>> Cc: Waiman Long <longman@redhat.com> >>> Cc: Boqun Feng <boqun.feng@gmail.com> >>> --- >>> kernel/locking/osq_lock.c | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c >>> index d5610ad52b..b752ec5cc6 100644 >>> --- a/kernel/locking/osq_lock.c >>> +++ b/kernel/locking/osq_lock.c >>> @@ -203,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock) >>> return false; >>> } >>> +EXPORT_SYMBOL_GPL(osq_lock); >>> void osq_unlock(struct optimistic_spin_queue *lock) >>> { >>> @@ -230,3 +231,4 @@ void osq_unlock(struct optimistic_spin_queue *lock) >>> if (next) >>> WRITE_ONCE(next->locked, 1); >>> } >>> +EXPORT_SYMBOL_GPL(osq_unlock); >> Have you considered extending the current rw_semaphore to support a SIX lock >> semantics? There are a number of instances in the kernel that a up_read() is >> followed by a down_write(). Basically, the code try to upgrade the lock from >> read to write. I have been thinking about adding a upgrade_read() API to do >> that. However, the concern that I had was that another writer may come in >> and make modification before the reader can be upgraded to have exclusive >> write access and will make the task to repeat what has been done in the read >> lock part. By adding a read with intent to upgrade to write, we can have >> that guarantee. > It's been discussed, Linus had the same thought. > > But it'd be a massive change to the rw semaphore code; this "read with > intent" really is a third lock state which needs all the same > lock/trylock/unlock paths, and with the way rw semaphore has separate > entry points for read and write it'd be a _ton_ of new code. It really > touches everything - waitlist handling included. Yes, it is a major change, but I had done that before and it is certainly doable. There are spare bits in the low byte of rwsem->count that can be used as an intent bit. We also need to add a new rwsem_wake_type for that for waitlist handling. > > And six locks have several other features that bcachefs needs, and other > users may also end up wanting, that rw semaphores don't have; the two > main features being a percpu read lock mode and support for an external > cycle detector (which requires exposing lock waitlists, with some > guarantees about how those waitlists are used). Can you provide more information about those features? > >> With that said, I would prefer to keep osq_{lock/unlock} for internal use by >> some higher level locking primitives - mutex, rwsem and rt_mutex. > Yeah, I'm aware, but it seems like exposing osq_(lock|unlock) is the > most palatable solution for now. Long term, I'd like to get six locks > promoted to kernel/locking. Your SIX overlaps with rwsem in term of features. So we will have to somehow merge them instead of having 2 APIs with somewhat similar functionality. Cheers, Longman >
On Wed, Aug 02, 2023 at 05:09:13PM -0400, Waiman Long wrote: > On 8/2/23 16:44, Kent Overstreet wrote: > > On Wed, Aug 02, 2023 at 04:16:12PM -0400, Waiman Long wrote: > > > On 7/12/23 17:11, Kent Overstreet wrote: > > > > These are used by bcachefs's six locks. > > > > > > > > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> > > > > Cc: Peter Zijlstra <peterz@infradead.org> > > > > Cc: Ingo Molnar <mingo@redhat.com> > > > > Cc: Waiman Long <longman@redhat.com> > > > > Cc: Boqun Feng <boqun.feng@gmail.com> > > > > --- > > > > kernel/locking/osq_lock.c | 2 ++ > > > > 1 file changed, 2 insertions(+) > > > > > > > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c > > > > index d5610ad52b..b752ec5cc6 100644 > > > > --- a/kernel/locking/osq_lock.c > > > > +++ b/kernel/locking/osq_lock.c > > > > @@ -203,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock) > > > > return false; > > > > } > > > > +EXPORT_SYMBOL_GPL(osq_lock); > > > > void osq_unlock(struct optimistic_spin_queue *lock) > > > > { > > > > @@ -230,3 +231,4 @@ void osq_unlock(struct optimistic_spin_queue *lock) > > > > if (next) > > > > WRITE_ONCE(next->locked, 1); > > > > } > > > > +EXPORT_SYMBOL_GPL(osq_unlock); > > > Have you considered extending the current rw_semaphore to support a SIX lock > > > semantics? There are a number of instances in the kernel that a up_read() is > > > followed by a down_write(). Basically, the code try to upgrade the lock from > > > read to write. I have been thinking about adding a upgrade_read() API to do > > > that. However, the concern that I had was that another writer may come in > > > and make modification before the reader can be upgraded to have exclusive > > > write access and will make the task to repeat what has been done in the read > > > lock part. By adding a read with intent to upgrade to write, we can have > > > that guarantee. > > It's been discussed, Linus had the same thought. > > > > But it'd be a massive change to the rw semaphore code; this "read with > > intent" really is a third lock state which needs all the same > > lock/trylock/unlock paths, and with the way rw semaphore has separate > > entry points for read and write it'd be a _ton_ of new code. It really > > touches everything - waitlist handling included. > > Yes, it is a major change, but I had done that before and it is certainly > doable. There are spare bits in the low byte of rwsem->count that can be > used as an intent bit. We also need to add a new rwsem_wake_type for that > for waitlist handling. > > > > > > And six locks have several other features that bcachefs needs, and other > > users may also end up wanting, that rw semaphores don't have; the two > > main features being a percpu read lock mode and support for an external > > cycle detector (which requires exposing lock waitlists, with some > > guarantees about how those waitlists are used). > > Can you provide more information about those features? > > > > > > With that said, I would prefer to keep osq_{lock/unlock} for internal use by > > > some higher level locking primitives - mutex, rwsem and rt_mutex. > > Yeah, I'm aware, but it seems like exposing osq_(lock|unlock) is the > > most palatable solution for now. Long term, I'd like to get six locks > > promoted to kernel/locking. > > Your SIX overlaps with rwsem in term of features. So we will have to somehow > merge them instead of having 2 APIs with somewhat similar functionality. Waiman, if you think you can add all the features of six locks to rwsem, knock yourself out - but right now this is a vaporware idea for you, not something I can seriously entertain. I'm looking to merge bcachefs next cycle, not sit around and bikeshed for the next six months. If you start making a serious effort on adding those features to rwsem I'll start walking you through everything six locks has, but right now this is a major digression on a patch that just exports two symbols.
* Kent Overstreet <kent.overstreet@linux.dev> wrote: > On Wed, Aug 02, 2023 at 05:09:13PM -0400, Waiman Long wrote: > > On 8/2/23 16:44, Kent Overstreet wrote: > > > On Wed, Aug 02, 2023 at 04:16:12PM -0400, Waiman Long wrote: > > > > On 7/12/23 17:11, Kent Overstreet wrote: > > > > > These are used by bcachefs's six locks. > > > > > > > > > > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> > > > > > Cc: Peter Zijlstra <peterz@infradead.org> > > > > > Cc: Ingo Molnar <mingo@redhat.com> > > > > > Cc: Waiman Long <longman@redhat.com> > > > > > Cc: Boqun Feng <boqun.feng@gmail.com> > > > > > --- > > > > > kernel/locking/osq_lock.c | 2 ++ > > > > > 1 file changed, 2 insertions(+) > > > > > > > > > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c > > > > > index d5610ad52b..b752ec5cc6 100644 > > > > > --- a/kernel/locking/osq_lock.c > > > > > +++ b/kernel/locking/osq_lock.c > > > > > @@ -203,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock) > > > > > return false; > > > > > } > > > > > +EXPORT_SYMBOL_GPL(osq_lock); > > > > > void osq_unlock(struct optimistic_spin_queue *lock) > > > > > { > > > > > @@ -230,3 +231,4 @@ void osq_unlock(struct optimistic_spin_queue *lock) > > > > > if (next) > > > > > WRITE_ONCE(next->locked, 1); > > > > > } > > > > > +EXPORT_SYMBOL_GPL(osq_unlock); > > > > Have you considered extending the current rw_semaphore to support a SIX lock > > > > semantics? There are a number of instances in the kernel that a up_read() is > > > > followed by a down_write(). Basically, the code try to upgrade the lock from > > > > read to write. I have been thinking about adding a upgrade_read() API to do > > > > that. However, the concern that I had was that another writer may come in > > > > and make modification before the reader can be upgraded to have exclusive > > > > write access and will make the task to repeat what has been done in the read > > > > lock part. By adding a read with intent to upgrade to write, we can have > > > > that guarantee. > > > It's been discussed, Linus had the same thought. > > > > > > But it'd be a massive change to the rw semaphore code; this "read with > > > intent" really is a third lock state which needs all the same > > > lock/trylock/unlock paths, and with the way rw semaphore has separate > > > entry points for read and write it'd be a _ton_ of new code. It really > > > touches everything - waitlist handling included. > > > > Yes, it is a major change, but I had done that before and it is certainly > > doable. There are spare bits in the low byte of rwsem->count that can be > > used as an intent bit. We also need to add a new rwsem_wake_type for that > > for waitlist handling. > > > > > > > > > > And six locks have several other features that bcachefs needs, and other > > > users may also end up wanting, that rw semaphores don't have; the two > > > main features being a percpu read lock mode and support for an external > > > cycle detector (which requires exposing lock waitlists, with some > > > guarantees about how those waitlists are used). > > > > Can you provide more information about those features? > > > > > > > > > With that said, I would prefer to keep osq_{lock/unlock} for internal use by > > > > some higher level locking primitives - mutex, rwsem and rt_mutex. > > > Yeah, I'm aware, but it seems like exposing osq_(lock|unlock) is the > > > most palatable solution for now. Long term, I'd like to get six locks > > > promoted to kernel/locking. > > > > Your SIX overlaps with rwsem in term of features. So we will have to somehow > > merge them instead of having 2 APIs with somewhat similar functionality. > > Waiman, if you think you can add all the features of six locks to rwsem, > knock yourself out - but right now this is a vaporware idea for you, not > something I can seriously entertain. I'm looking to merge bcachefs next > cycle, not sit around and bikeshed for the next six months. That's an entirely inappropriate response to valid review feedback. Not having two overlapping locking facilities is not 'bikeshedding' at all ... > If you start making a serious effort on adding those features to rwsem > I'll start walking you through everything six locks has, but right now > this is a major digression on a patch that just exports two symbols. In Linux the burden of work is on people submitting new code, not on reviewers. The rule is that you should not reinvent the wheel in new features - extend existing locking facilities please. Waiman gave you some pointers as to how to extend rwsems. Meanwhile, NAK on the export of osq_(lock|unlock): NAKed-by: Ingo Molnar <mingo@kernel.org> Ie. NAK on this commit in linux-next: 97da2065b7cb ("locking/osq: Export osq_(lock|unlock)") This is an internal function of Linux locking subsystem we would not like to expose or export. This commit was applied without an Ack or Reviewed-by by a locking subsystem maintainer or reviewer (which Waiman Long is). Thanks, Ingo
On Tue, Oct 10, 2023 at 10:09:38AM +0200, Ingo Molnar wrote: > > Waiman, if you think you can add all the features of six locks to rwsem, > > knock yourself out - but right now this is a vaporware idea for you, not > > something I can seriously entertain. I'm looking to merge bcachefs next > > cycle, not sit around and bikeshed for the next six months. > > That's an entirely inappropriate response to valid review feedback. > > Not having two overlapping locking facilities is not 'bikeshedding' at all ... Well, there was already a long off-list discussion about adding six lock features to rwsem. Basically, it looks to me like a total redesign of rwsem in order to do it correctly, and I don't think that would fly. The rwsem code has separate entrypoints for every lock state, and adding a third lock state would at a minimum add a lot of new - nearly duplicate - code. There's also features and optimizations in six locks that rwsem doesn't have, and it's not clear to me that it would be appropriate to add them to rwsem - each of them would need real discussion. The big ones are: - percpu reader mode, used for locks for interior nodes and subvolume keys in bcachefs - exposing of waitlist entries (and this requires nontrivial guarantees to do correctly!), so that bcachefs can do cycle detection deadlock avoidance on top. In short, this would _not_ be a small project, and I think the saner approach if we really did want to condense down to a single locking implementation would be to replace rwsem with six locks. But before even contemplating that we'd want to see six locks getting wider usage and testing first. Hence why we're at leaving six locks in fs/bcachefs/ for now. > > If you start making a serious effort on adding those features to rwsem > > I'll start walking you through everything six locks has, but right now > > this is a major digression on a patch that just exports two symbols. > > In Linux the burden of work is on people submitting new code, not on > reviewers. The rule is that you should not reinvent the wheel in new > features - extend existing locking facilities please. > > Waiman gave you some pointers as to how to extend rwsems. > > Meanwhile, NAK on the export of osq_(lock|unlock): Perhaps we could get some justification for why you want osq locks to be private? My initial pull request had six locks in kernel/locking/, specifically to keep osq locks private, as requested by locking people (some years back). But since Linus shot that down, I need an alternative. If you're really dead set against exporting osq locks (and again, why?), my only alternative will be to either take optimistic spinning out of six locks, or implement optimistic spinning another way (which is something I was already looking at before; the way lock handoff works in six locks now makes that an attractive idea anyways, but of course the devil is in the details with locking code).
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c index d5610ad52b..b752ec5cc6 100644 --- a/kernel/locking/osq_lock.c +++ b/kernel/locking/osq_lock.c @@ -203,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock) return false; } +EXPORT_SYMBOL_GPL(osq_lock); void osq_unlock(struct optimistic_spin_queue *lock) { @@ -230,3 +231,4 @@ void osq_unlock(struct optimistic_spin_queue *lock) if (next) WRITE_ONCE(next->locked, 1); } +EXPORT_SYMBOL_GPL(osq_unlock);
These are used by bcachefs's six locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Waiman Long <longman@redhat.com> Cc: Boqun Feng <boqun.feng@gmail.com> --- kernel/locking/osq_lock.c | 2 ++ 1 file changed, 2 insertions(+)