[11/20] locking/osq: Export osq_(lock|unlock)

Message ID	20230712211115.2174650-12-kent.overstreet@linux.dev (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@vger.kernel.org> From: Kent Overstreet <kent.overstreet@linux.dev> To: linux-bcachefs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Kent Overstreet <kent.overstreet@linux.dev>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Waiman Long <longman@redhat.com>, Boqun Feng <boqun.feng@gmail.com> Subject: [PATCH 11/20] locking/osq: Export osq_(lock\|unlock) Date: Wed, 12 Jul 2023 17:11:06 -0400 Message-Id: <20230712211115.2174650-12-kent.overstreet@linux.dev> In-Reply-To: <20230712211115.2174650-1-kent.overstreet@linux.dev> References: <20230712211115.2174650-1-kent.overstreet@linux.dev> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	bcachefs prereqs patch series \| expand [00/20] bcachefs prereqs patch series [01/20] sched: Add task_struct->faults_disabled_mapping [02/20] fs: factor out d_mark_tmpfile() [03/20] iov_iter: Handle compound highmem pages in copy_page_from_iter_atomic() [04/20] block: Add some exports for bcachefs [05/20] block: Allow bio_iov_iter_get_pages() with bio->bi_bdev unset [06/20] block: Bring back zero_fill_bio_iter [07/20] block: Don't block on s_umount from __invalidate_super() [08/20] stacktrace: Export stack_trace_save_tsk [09/20] lib/string_helpers: string_get_size() now returns characters wrote [10/20] lib: Export errname [11/20] locking/osq: Export osq_(lock\|unlock) [12/20] bcache: move closures to lib/ [13/20] MAINTAINERS: Add entry for closures [14/20] closures: closure_wait_event() [15/20] closures: closure_nr_remaining() [16/20] closures: Add a missing include [17/20] MAINTAINERS: Add entry for generic-radix-tree [18/20] lib/generic-radix-tree.c: Don't overflow in peek() [19/20] lib/generic-radix-tree.c: Add a missing include [20/20] lib/generic-radix-tree.c: Add peek_prev()

Kent Overstreet July 12, 2023, 9:11 p.m. UTC

These are used by bcachefs's six locks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/locking/osq_lock.c | 2 ++
 1 file changed, 2 insertions(+)

Waiman Long Aug. 2, 2023, 8:16 p.m. UTC | #1

On 7/12/23 17:11, Kent Overstreet wrote:
> These are used by bcachefs's six locks.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Waiman Long <longman@redhat.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> ---
>   kernel/locking/osq_lock.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index d5610ad52b..b752ec5cc6 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -203,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>   
>   	return false;
>   }
> +EXPORT_SYMBOL_GPL(osq_lock);
>   
>   void osq_unlock(struct optimistic_spin_queue *lock)
>   {
> @@ -230,3 +231,4 @@ void osq_unlock(struct optimistic_spin_queue *lock)
>   	if (next)
>   		WRITE_ONCE(next->locked, 1);
>   }
> +EXPORT_SYMBOL_GPL(osq_unlock);

Have you considered extending the current rw_semaphore to support a SIX 
lock semantics? There are a number of instances in the kernel that a 
up_read() is followed by a down_write(). Basically, the code try to 
upgrade the lock from read to write. I have been thinking about adding a 
upgrade_read() API to do that. However, the concern that I had was that 
another writer may come in and make modification before the reader can 
be upgraded to have exclusive write access and will make the task to 
repeat what has been done in the read lock part. By adding a read with 
intent to upgrade to write, we can have that guarantee.

With that said, I would prefer to keep osq_{lock/unlock} for internal 
use by some higher level locking primitives - mutex, rwsem and rt_mutex.

Cheers,
Longman

Kent Overstreet Aug. 2, 2023, 8:44 p.m. UTC | #2

On Wed, Aug 02, 2023 at 04:16:12PM -0400, Waiman Long wrote:
> On 7/12/23 17:11, Kent Overstreet wrote:
> > These are used by bcachefs's six locks.
> > 
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Waiman Long <longman@redhat.com>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > ---
> >   kernel/locking/osq_lock.c | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > index d5610ad52b..b752ec5cc6 100644
> > --- a/kernel/locking/osq_lock.c
> > +++ b/kernel/locking/osq_lock.c
> > @@ -203,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> >   	return false;
> >   }
> > +EXPORT_SYMBOL_GPL(osq_lock);
> >   void osq_unlock(struct optimistic_spin_queue *lock)
> >   {
> > @@ -230,3 +231,4 @@ void osq_unlock(struct optimistic_spin_queue *lock)
> >   	if (next)
> >   		WRITE_ONCE(next->locked, 1);
> >   }
> > +EXPORT_SYMBOL_GPL(osq_unlock);
> 
> Have you considered extending the current rw_semaphore to support a SIX lock
> semantics? There are a number of instances in the kernel that a up_read() is
> followed by a down_write(). Basically, the code try to upgrade the lock from
> read to write. I have been thinking about adding a upgrade_read() API to do
> that. However, the concern that I had was that another writer may come in
> and make modification before the reader can be upgraded to have exclusive
> write access and will make the task to repeat what has been done in the read
> lock part. By adding a read with intent to upgrade to write, we can have
> that guarantee.

It's been discussed, Linus had the same thought.

But it'd be a massive change to the rw semaphore code; this "read with
intent" really is a third lock state which needs all the same
lock/trylock/unlock paths, and with the way rw semaphore has separate
entry points for read and write it'd be a _ton_ of new code. It really
touches everything - waitlist handling included.

And six locks have several other features that bcachefs needs, and other
users may also end up wanting, that rw semaphores don't have; the two
main features being a percpu read lock mode and support for an external
cycle detector (which requires exposing lock waitlists, with some
guarantees about how those waitlists are used).

> With that said, I would prefer to keep osq_{lock/unlock} for internal use by
> some higher level locking primitives - mutex, rwsem and rt_mutex.

Yeah, I'm aware, but it seems like exposing osq_(lock|unlock) is the
most palatable solution for now. Long term, I'd like to get six locks
promoted to kernel/locking.

Waiman Long Aug. 2, 2023, 9:09 p.m. UTC | #3

On 8/2/23 16:44, Kent Overstreet wrote:
> On Wed, Aug 02, 2023 at 04:16:12PM -0400, Waiman Long wrote:
>> On 7/12/23 17:11, Kent Overstreet wrote:
>>> These are used by bcachefs's six locks.
>>>
>>> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: Waiman Long <longman@redhat.com>
>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>> ---
>>>    kernel/locking/osq_lock.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
>>> index d5610ad52b..b752ec5cc6 100644
>>> --- a/kernel/locking/osq_lock.c
>>> +++ b/kernel/locking/osq_lock.c
>>> @@ -203,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>>>    	return false;
>>>    }
>>> +EXPORT_SYMBOL_GPL(osq_lock);
>>>    void osq_unlock(struct optimistic_spin_queue *lock)
>>>    {
>>> @@ -230,3 +231,4 @@ void osq_unlock(struct optimistic_spin_queue *lock)
>>>    	if (next)
>>>    		WRITE_ONCE(next->locked, 1);
>>>    }
>>> +EXPORT_SYMBOL_GPL(osq_unlock);
>> Have you considered extending the current rw_semaphore to support a SIX lock
>> semantics? There are a number of instances in the kernel that a up_read() is
>> followed by a down_write(). Basically, the code try to upgrade the lock from
>> read to write. I have been thinking about adding a upgrade_read() API to do
>> that. However, the concern that I had was that another writer may come in
>> and make modification before the reader can be upgraded to have exclusive
>> write access and will make the task to repeat what has been done in the read
>> lock part. By adding a read with intent to upgrade to write, we can have
>> that guarantee.
> It's been discussed, Linus had the same thought.
>
> But it'd be a massive change to the rw semaphore code; this "read with
> intent" really is a third lock state which needs all the same
> lock/trylock/unlock paths, and with the way rw semaphore has separate
> entry points for read and write it'd be a _ton_ of new code. It really
> touches everything - waitlist handling included.

Yes, it is a major change, but I had done that before and it is 
certainly doable. There are spare bits in the low byte of rwsem->count 
that can be used as an intent bit. We also need to add a new 
rwsem_wake_type for that for waitlist handling.


>
> And six locks have several other features that bcachefs needs, and other
> users may also end up wanting, that rw semaphores don't have; the two
> main features being a percpu read lock mode and support for an external
> cycle detector (which requires exposing lock waitlists, with some
> guarantees about how those waitlists are used).

Can you provide more information about those features?

>
>> With that said, I would prefer to keep osq_{lock/unlock} for internal use by
>> some higher level locking primitives - mutex, rwsem and rt_mutex.
> Yeah, I'm aware, but it seems like exposing osq_(lock|unlock) is the
> most palatable solution for now. Long term, I'd like to get six locks
> promoted to kernel/locking.

Your SIX overlaps with rwsem in term of features. So we will have to 
somehow merge them instead of having 2 APIs with somewhat similar 
functionality.

Cheers,
Longman

>

Kent Overstreet Aug. 2, 2023, 9:42 p.m. UTC | #4

On Wed, Aug 02, 2023 at 05:09:13PM -0400, Waiman Long wrote:
> On 8/2/23 16:44, Kent Overstreet wrote:
> > On Wed, Aug 02, 2023 at 04:16:12PM -0400, Waiman Long wrote:
> > > On 7/12/23 17:11, Kent Overstreet wrote:
> > > > These are used by bcachefs's six locks.
> > > > 
> > > > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: Ingo Molnar <mingo@redhat.com>
> > > > Cc: Waiman Long <longman@redhat.com>
> > > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > > > ---
> > > >    kernel/locking/osq_lock.c | 2 ++
> > > >    1 file changed, 2 insertions(+)
> > > > 
> > > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > > > index d5610ad52b..b752ec5cc6 100644
> > > > --- a/kernel/locking/osq_lock.c
> > > > +++ b/kernel/locking/osq_lock.c
> > > > @@ -203,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > > >    	return false;
> > > >    }
> > > > +EXPORT_SYMBOL_GPL(osq_lock);
> > > >    void osq_unlock(struct optimistic_spin_queue *lock)
> > > >    {
> > > > @@ -230,3 +231,4 @@ void osq_unlock(struct optimistic_spin_queue *lock)
> > > >    	if (next)
> > > >    		WRITE_ONCE(next->locked, 1);
> > > >    }
> > > > +EXPORT_SYMBOL_GPL(osq_unlock);
> > > Have you considered extending the current rw_semaphore to support a SIX lock
> > > semantics? There are a number of instances in the kernel that a up_read() is
> > > followed by a down_write(). Basically, the code try to upgrade the lock from
> > > read to write. I have been thinking about adding a upgrade_read() API to do
> > > that. However, the concern that I had was that another writer may come in
> > > and make modification before the reader can be upgraded to have exclusive
> > > write access and will make the task to repeat what has been done in the read
> > > lock part. By adding a read with intent to upgrade to write, we can have
> > > that guarantee.
> > It's been discussed, Linus had the same thought.
> > 
> > But it'd be a massive change to the rw semaphore code; this "read with
> > intent" really is a third lock state which needs all the same
> > lock/trylock/unlock paths, and with the way rw semaphore has separate
> > entry points for read and write it'd be a _ton_ of new code. It really
> > touches everything - waitlist handling included.
> 
> Yes, it is a major change, but I had done that before and it is certainly
> doable. There are spare bits in the low byte of rwsem->count that can be
> used as an intent bit. We also need to add a new rwsem_wake_type for that
> for waitlist handling.
> 
> 
> > 
> > And six locks have several other features that bcachefs needs, and other
> > users may also end up wanting, that rw semaphores don't have; the two
> > main features being a percpu read lock mode and support for an external
> > cycle detector (which requires exposing lock waitlists, with some
> > guarantees about how those waitlists are used).
> 
> Can you provide more information about those features?
> 
> > 
> > > With that said, I would prefer to keep osq_{lock/unlock} for internal use by
> > > some higher level locking primitives - mutex, rwsem and rt_mutex.
> > Yeah, I'm aware, but it seems like exposing osq_(lock|unlock) is the
> > most palatable solution for now. Long term, I'd like to get six locks
> > promoted to kernel/locking.
> 
> Your SIX overlaps with rwsem in term of features. So we will have to somehow
> merge them instead of having 2 APIs with somewhat similar functionality.

Waiman, if you think you can add all the features of six locks to rwsem,
knock yourself out - but right now this is a vaporware idea for you, not
something I can seriously entertain. I'm looking to merge bcachefs next
cycle, not sit around and bikeshed for the next six months.

If you start making a serious effort on adding those features to rwsem
I'll start walking you through everything six locks has, but right now
this is a major digression on a patch that just exports two symbols.

Ingo Molnar Oct. 10, 2023, 8:09 a.m. UTC | #5

* Kent Overstreet <kent.overstreet@linux.dev> wrote:

> On Wed, Aug 02, 2023 at 05:09:13PM -0400, Waiman Long wrote:
> > On 8/2/23 16:44, Kent Overstreet wrote:
> > > On Wed, Aug 02, 2023 at 04:16:12PM -0400, Waiman Long wrote:
> > > > On 7/12/23 17:11, Kent Overstreet wrote:
> > > > > These are used by bcachefs's six locks.
> > > > > 
> > > > > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > Cc: Ingo Molnar <mingo@redhat.com>
> > > > > Cc: Waiman Long <longman@redhat.com>
> > > > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > > > > ---
> > > > >    kernel/locking/osq_lock.c | 2 ++
> > > > >    1 file changed, 2 insertions(+)
> > > > > 
> > > > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > > > > index d5610ad52b..b752ec5cc6 100644
> > > > > --- a/kernel/locking/osq_lock.c
> > > > > +++ b/kernel/locking/osq_lock.c
> > > > > @@ -203,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > > > >    	return false;
> > > > >    }
> > > > > +EXPORT_SYMBOL_GPL(osq_lock);
> > > > >    void osq_unlock(struct optimistic_spin_queue *lock)
> > > > >    {
> > > > > @@ -230,3 +231,4 @@ void osq_unlock(struct optimistic_spin_queue *lock)
> > > > >    	if (next)
> > > > >    		WRITE_ONCE(next->locked, 1);
> > > > >    }
> > > > > +EXPORT_SYMBOL_GPL(osq_unlock);
> > > > Have you considered extending the current rw_semaphore to support a SIX lock
> > > > semantics? There are a number of instances in the kernel that a up_read() is
> > > > followed by a down_write(). Basically, the code try to upgrade the lock from
> > > > read to write. I have been thinking about adding a upgrade_read() API to do
> > > > that. However, the concern that I had was that another writer may come in
> > > > and make modification before the reader can be upgraded to have exclusive
> > > > write access and will make the task to repeat what has been done in the read
> > > > lock part. By adding a read with intent to upgrade to write, we can have
> > > > that guarantee.
> > > It's been discussed, Linus had the same thought.
> > > 
> > > But it'd be a massive change to the rw semaphore code; this "read with
> > > intent" really is a third lock state which needs all the same
> > > lock/trylock/unlock paths, and with the way rw semaphore has separate
> > > entry points for read and write it'd be a _ton_ of new code. It really
> > > touches everything - waitlist handling included.
> > 
> > Yes, it is a major change, but I had done that before and it is certainly
> > doable. There are spare bits in the low byte of rwsem->count that can be
> > used as an intent bit. We also need to add a new rwsem_wake_type for that
> > for waitlist handling.
> > 
> > 
> > > 
> > > And six locks have several other features that bcachefs needs, and other
> > > users may also end up wanting, that rw semaphores don't have; the two
> > > main features being a percpu read lock mode and support for an external
> > > cycle detector (which requires exposing lock waitlists, with some
> > > guarantees about how those waitlists are used).
> > 
> > Can you provide more information about those features?
> > 
> > > 
> > > > With that said, I would prefer to keep osq_{lock/unlock} for internal use by
> > > > some higher level locking primitives - mutex, rwsem and rt_mutex.
> > > Yeah, I'm aware, but it seems like exposing osq_(lock|unlock) is the
> > > most palatable solution for now. Long term, I'd like to get six locks
> > > promoted to kernel/locking.
> > 
> > Your SIX overlaps with rwsem in term of features. So we will have to somehow
> > merge them instead of having 2 APIs with somewhat similar functionality.
> 
> Waiman, if you think you can add all the features of six locks to rwsem,
> knock yourself out - but right now this is a vaporware idea for you, not
> something I can seriously entertain. I'm looking to merge bcachefs next
> cycle, not sit around and bikeshed for the next six months.

That's an entirely inappropriate response to valid review feedback.

Not having two overlapping locking facilities is not 'bikeshedding' at all ...

> If you start making a serious effort on adding those features to rwsem
> I'll start walking you through everything six locks has, but right now
> this is a major digression on a patch that just exports two symbols.

In Linux the burden of work is on people submitting new code, not on 
reviewers. The rule is that you should not reinvent the wheel in new
features - extend existing locking facilities please.

Waiman gave you some pointers as to how to extend rwsems.

Meanwhile, NAK on the export of osq_(lock|unlock):

    NAKed-by: Ingo Molnar <mingo@kernel.org>

Ie. NAK on this commit in linux-next:

    97da2065b7cb ("locking/osq: Export osq_(lock|unlock)")

This is an internal function of Linux locking subsystem we would not like to
expose or export.

This commit was applied without an Ack or Reviewed-by by a locking subsystem
maintainer or reviewer (which Waiman Long is).

Thanks,

	Ingo

Kent Overstreet Oct. 18, 2023, 9:04 p.m. UTC | #6

On Tue, Oct 10, 2023 at 10:09:38AM +0200, Ingo Molnar wrote:
> > Waiman, if you think you can add all the features of six locks to rwsem,
> > knock yourself out - but right now this is a vaporware idea for you, not
> > something I can seriously entertain. I'm looking to merge bcachefs next
> > cycle, not sit around and bikeshed for the next six months.
> 
> That's an entirely inappropriate response to valid review feedback.
> 
> Not having two overlapping locking facilities is not 'bikeshedding' at all ...

Well, there was already a long off-list discussion about adding six lock
features to rwsem.

Basically, it looks to me like a total redesign of rwsem in order to do
it correctly, and I don't think that would fly. The rwsem code has
separate entrypoints for every lock state, and adding a third lock state
would at a minimum add a lot of new - nearly duplicate - code.

There's also features and optimizations in six locks that rwsem doesn't
have, and it's not clear to me that it would be appropriate to add them
to rwsem - each of them would need real discussion. The big ones are:

 - percpu reader mode, used for locks for interior nodes and subvolume
   keys in bcachefs
 - exposing of waitlist entries (and this requires nontrivial guarantees
   to do correctly!), so that bcachefs can do cycle detection deadlock
   avoidance on top.

In short, this would _not_ be a small project, and I think the saner
approach if we really did want to condense down to a single locking
implementation would be to replace rwsem with six locks. But before even
contemplating that we'd want to see six locks getting wider usage and
testing first.

Hence why we're at leaving six locks in fs/bcachefs/ for now.

> > If you start making a serious effort on adding those features to rwsem
> > I'll start walking you through everything six locks has, but right now
> > this is a major digression on a patch that just exports two symbols.
> 
> In Linux the burden of work is on people submitting new code, not on 
> reviewers. The rule is that you should not reinvent the wheel in new
> features - extend existing locking facilities please.
> 
> Waiman gave you some pointers as to how to extend rwsems.
> 
> Meanwhile, NAK on the export of osq_(lock|unlock):

Perhaps we could get some justification for why you want osq locks to be
private?

My initial pull request had six locks in kernel/locking/, specifically
to keep osq locks private, as requested by locking people (some years
back). But since Linus shot that down, I need an alternative.

If you're really dead set against exporting osq locks (and again, why?),
my only alternative will be to either take optimistic spinning out of
six locks, or implement optimistic spinning another way (which is
something I was already looking at before; the way lock handoff works in
six locks now makes that an attractive idea anyways, but of course the
devil is in the details with locking code).

[11/20] locking/osq: Export osq_(lock|unlock)

Commit Message

Comments

Patch