
refcount: Strengthen inc_not_zero()

Message ID 20250115160011.GG8385@noisy.programming.kicks-ass.net (mailing list archive)
State: New
Series: refcount: Strengthen inc_not_zero()

Commit Message

Peter Zijlstra Jan. 15, 2025, 4 p.m. UTC
On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:

> Notably, it means refcount_t is entirely unsuitable for anything
> SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> conditions after the refcount succeeds.
> 
> And this is probably fine, but let me ponder this all a little more.

Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
fix this, these things are hard enough as they are.

Will, others, wdyt?

---
Subject: refcount: Strengthen inc_not_zero()

For speculative lookups where a successful inc_not_zero() pins the
object, but where we still need to double check if the object acquired
is indeed the one we set out to acquire, this validation needs to happen
*after* the increment.

Notably, SLAB_TYPESAFE_BY_RCU is one such example.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/refcount.h | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)
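
As a point of reference, the lookup pattern this is about goes roughly like
the sketch below: take the reference under RCU, then re-check the identity of
the object once the reference is held. The struct obj layout and the
obj_find_rcu()/obj_put() helpers are made up for illustration; only
refcount_inc_not_zero() and the RCU primitives are real kernel API.

#include <linux/rcupdate.h>
#include <linux/refcount.h>

/* Hypothetical object type living in a SLAB_TYPESAFE_BY_RCU cache. */
struct obj {
	unsigned long	key;
	refcount_t	ref;
};

struct obj *obj_lookup(unsigned long key)
{
	struct obj *obj;

	rcu_read_lock();
	obj = obj_find_rcu(key);	/* hypothetical RCU hash/list walk */
	if (obj && !refcount_inc_not_zero(&obj->ref))
		obj = NULL;		/* already on its way to being freed */
	rcu_read_unlock();

	if (!obj)
		return NULL;

	/*
	 * Secondary validation: under SLAB_TYPESAFE_BY_RCU the memory may
	 * have been freed and reused for a different object of the same type
	 * between the walk and the increment.  The acquire ordering added by
	 * this patch keeps this re-check ordered after the successful
	 * increment.
	 */
	if (obj->key != key) {
		obj_put(obj);		/* hypothetical: drop the reference */
		return NULL;
	}

	return obj;
}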

Comments

Suren Baghdasaryan Jan. 16, 2025, 3:12 p.m. UTC | #1
On Wed, Jan 15, 2025 at 8:00 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
>
> > Notably, it means refcount_t is entirely unsuitable for anything
> > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > conditions after the refcount succeeds.
> >
> > And this is probably fine, but let me ponder this all a little more.
>
> Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> fix this, these things are hard enough as they are.
>
> Will, others, wdyt?

I'll wait for the verdict on this patch before proceeding with my
series. It obviously simplifies my job. Thanks Peter!

>
> ---
> Subject: refcount: Strengthen inc_not_zero()
>
> For speculative lookups where a successful inc_not_zero() pins the
> object, but where we still need to double check if the object acquired
> is indeed the one we set out to acquire, this validation needs to happen
> *after* the increment.
>
> Notably, SLAB_TYPESAFE_BY_RCU is one such example.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/refcount.h | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> index 35f039ecb272..340e7ffa445e 100644
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -69,9 +69,10 @@
>   * its the lock acquire, for RCU/lockless data structures its the dependent
>   * load.
>   *
> - * Do note that inc_not_zero() provides a control dependency which will order
> - * future stores against the inc, this ensures we'll never modify the object
> - * if we did not in fact acquire a reference.
> + * Do note that inc_not_zero() does provide acquire order, which will order
> + * future load and stores against the inc, this ensures all subsequent accesses
> + * are from this object and not anything previously occupying this memory --
> + * consider SLAB_TYPESAFE_BY_RCU.
>   *
>   * The decrements will provide release order, such that all the prior loads and
>   * stores will be issued before, it also provides a control dependency, which
> @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
>         do {
>                 if (!old)
>                         break;
> -       } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> +       } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
>
>         if (oldp)
>                 *oldp = old;
> @@ -225,9 +226,9 @@ static inline __must_check bool __refcount_inc_not_zero(refcount_t *r, int *oldp
>   * Similar to atomic_inc_not_zero(), but will saturate at REFCOUNT_SATURATED
>   * and WARN.
>   *
> - * Provides no memory ordering, it is assumed the caller has guaranteed the
> - * object memory to be stable (RCU, etc.). It does provide a control dependency
> - * and thereby orders future stores. See the comment on top.
> + * Provides acquire ordering, such that subsequent accesses are after the
> + * increment. This is important for the cases where secondary validation is
> + * required, eg. SLAB_TYPESAFE_BY_RCU.
>   *
>   * Return: true if the increment was successful, false otherwise
>   */
Will Deacon Jan. 17, 2025, 3:41 p.m. UTC | #2
On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
> 
> > Notably, it means refcount_t is entirely unsuitable for anything
> > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > conditions after the refcount succeeds.
> > 
> > And this is probably fine, but let me ponder this all a little more.
> 
> Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> fix this, these things are hard enough as they are.
> 
> Will, others, wdyt?

We should also update the Documentation (atomic_t.txt and
refcount-vs-atomic.rst) if we strengthen this.

> ---
> Subject: refcount: Strengthen inc_not_zero()
> 
> For speculative lookups where a successful inc_not_zero() pins the
> object, but where we still need to double check if the object acquired
> is indeed the one we set out to acquire, this validation needs to happen
> *after* the increment.
>
> Notably, SLAB_TYPESAFE_BY_RCU is one such example.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/refcount.h | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> index 35f039ecb272..340e7ffa445e 100644
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -69,9 +69,10 @@
>   * its the lock acquire, for RCU/lockless data structures its the dependent
>   * load.
>   *
> - * Do note that inc_not_zero() provides a control dependency which will order
> - * future stores against the inc, this ensures we'll never modify the object
> - * if we did not in fact acquire a reference.
> + * Do note that inc_not_zero() does provide acquire order, which will order
> + * future load and stores against the inc, this ensures all subsequent accesses
> + * are from this object and not anything previously occupying this memory --
> + * consider SLAB_TYPESAFE_BY_RCU.
>   *
>   * The decrements will provide release order, such that all the prior loads and
>   * stores will be issued before, it also provides a control dependency, which
> @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
>  	do {
>  		if (!old)
>  			break;
> -	} while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> +	} while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));

Hmm, do the later memory accesses need to be ordered against the store
part of the increment or just the read? If it's the former, then I don't
think that _acquire is sufficient -- accesses can still get in-between
the read and write parts of the RmW.

Will
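
To spell out the distinction being asked about (illustration only, reusing the
r/old/i names from __refcount_add_not_zero(), not a proposed change):

	/*
	 * (a) _acquire: subsequent accesses are ordered after the LOAD half
	 *     of the RmW only; they can still be satisfied between the load
	 *     and the store, i.e. before the new count is visible to other
	 *     CPUs.
	 */
	ok = atomic_try_cmpxchg_acquire(&r->refs, &old, old + i);

	/*
	 * (b) fully ordered: when it succeeds, atomic_try_cmpxchg()
	 *     additionally orders subsequent accesses against the STORE half
	 *     of the RmW.
	 */
	ok = atomic_try_cmpxchg(&r->refs, &old, old + i);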
Matthew Wilcox Jan. 17, 2025, 4:13 p.m. UTC | #3
On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> Subject: refcount: Strengthen inc_not_zero()
> 
> For speculative lookups where a successful inc_not_zero() pins the
> object, but where we still need to double check if the object acquired
> is indeed the one we set out to acquire, this validation needs to happen
> *after* the increment.
>
> Notably, SLAB_TYPESAFE_BY_RCU is one such example.

While you're looking at inc_not_zero(), have you already thought about
doing something like this, i.e. failing rather than saturating, since
all users of this already have to check for failure?  It looks like
two comparisons per call rather than three.

diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 35f039ecb272..3ef7d316e870 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -142,16 +142,13 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
 	int old = refcount_read(r);
 
 	do {
-		if (!old)
+		if (old <= 0 || old + i < 0)
 			break;
 	} while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
 
 	if (oldp)
 		*oldp = old;
 
-	if (unlikely(old < 0 || old + i < 0))
-		refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);
-
 	return old;
 }
 

$ ./scripts/bloat-o-meter before.o after.o 
add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-91 (-91)
Function                                     old     new   delta
io_wq_for_each_worker.isra                   162     158      -4
io_worker_handle_work                       1403    1387     -16
io_wq_activate_free_worker                   187     158     -29
io_queue_worker_create                       367     325     -42
Total: Before=10068, After=9977, chg -0.90%

(that's io_uring/io-wq.o as an example user of refcount_inc_not_zero())
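
For what it's worth, the "all users ... have to check for failure" part is
enforced by the API itself: the __refcount_inc_not_zero() prototype quoted
earlier in the thread is __must_check, so callers already look roughly like
the sketch below (reusing the hypothetical struct obj from the sketch further
up) and would not need any change if overflow failed instead of saturating.

/* Hypothetical caller -- the return value has to be checked either way. */
static struct obj *obj_tryget(struct obj *obj)
{
	if (!refcount_inc_not_zero(&obj->ref))
		return NULL;	/* count was zero -- or, with the change
				 * above, could not be safely incremented */
	return obj;
}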

Patch

diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 35f039ecb272..340e7ffa445e 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -69,9 +69,10 @@ 
  * its the lock acquire, for RCU/lockless data structures its the dependent
  * load.
  *
- * Do note that inc_not_zero() provides a control dependency which will order
- * future stores against the inc, this ensures we'll never modify the object
- * if we did not in fact acquire a reference.
+ * Do note that inc_not_zero() does provide acquire order, which will order
+ * future load and stores against the inc, this ensures all subsequent accesses
+ * are from this object and not anything previously occupying this memory --
+ * consider SLAB_TYPESAFE_BY_RCU.
  *
  * The decrements will provide release order, such that all the prior loads and
  * stores will be issued before, it also provides a control dependency, which
@@ -144,7 +145,7 @@  bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
 	do {
 		if (!old)
 			break;
-	} while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
+	} while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
 
 	if (oldp)
 		*oldp = old;
@@ -225,9 +226,9 @@  static inline __must_check bool __refcount_inc_not_zero(refcount_t *r, int *oldp
  * Similar to atomic_inc_not_zero(), but will saturate at REFCOUNT_SATURATED
  * and WARN.
  *
- * Provides no memory ordering, it is assumed the caller has guaranteed the
- * object memory to be stable (RCU, etc.). It does provide a control dependency
- * and thereby orders future stores. See the comment on top.
+ * Provides acquire ordering, such that subsequent accesses are after the
+ * increment. This is important for the cases where secondary validation is
+ * required, eg. SLAB_TYPESAFE_BY_RCU.
  *
  * Return: true if the increment was successful, false otherwise
  */