
[v3,00/15] futex: More futex2 bits

Message ID 20230921104505.717750284@noisy.programming.kicks-ass.net (mailing list archive)


Peter Zijlstra Sept. 21, 2023, 10:45 a.m. UTC
Hi!

New version of the futex2 patches. Futex2 is a new interface to the same 'old'
futex core. An attempt to get away from the multiplex syscall and add a little
room for extensions.

Changes since v2:
 - Rebased to v6.6-rc
 - New FUTEX_STRICT flag (Andre)
 - Reordered futex_size() helper (tglx)
 - Updated some comments (tglx)
 - Folded some tags

My plan is to push the first 10 patches (all the syscalls) into
tip/locking/core this afternoon. All those patches have plenty of review tags,
including from Thomas, who is the actual maintainer of this lot :-)

This should be plenty for Jens to get a move on with the io-uring stuff.

I'm holding off on the NUMA bits for now, because I want to write some
userspace for it since there is some confusion on that -- but I seem to keep
getting side-tracked :/

Patches also available at:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/core
  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/futex

Where the locking/core thing is the first 10 patches only, and barring Link
tags (which I'll harvest from this posting), will be what I'll push out to tip.

Comments

Peter Zijlstra Sept. 22, 2023, 8:01 p.m. UTC | #1
Hi!

Updated version of patch 15/15 and a few extra patches for testing the
FUTEX2_NUMA bits. The last patch (17/15) should never be applied for anything
you care about and exists purely because I'm too lazy to generate actual
hash-bucket contention.

On my 2 node IVB-EP:

 $ echo FUTEX_SQUASH > /debug/sched/features

Effectively reducing each node to 1 bucket.

 $ numactl -m0 -N0 ./futex_numa -c10 -t2 -n0 -N0 &
   numactl -m1 -N1 ./futex_numa -c10 -t2 -n0 -N0

 ...
 contenders: 16154935
 contenders: 16202472

 $ numactl -m0 -N0 ./futex_numa -c10 -t2 -n0 -N0 &
   numactl -m1 -N1 ./futex_numa -c10 -t2 -n0 -N1

 contenders: 48584991
 contenders: 48680560

(loop counts, higher is better)

Clearly showing how separating the hashes works. 

The first one runs 10 contenders on each node but forces the (numa) futex to
hash to node 0 for both. This ensures all 20 contenders hash to the same
bucket and *ouch*.

The second one does the same, except now fully separates the nodes. Performance
is much improved.

Proving the per-node hashing actually works as advertised.

Further:

 $ ./futex_numa -t2 -n50000 -s1 -N
 ...
 node: -1
 node: -1
 node: 0
 node: 0
 node: -1
 node: -1
 node: 1
 node: 1
 ...
 total: 8980

Shows how a FUTEX2_NUMA lock can bounce around the nodes. The test has some
trivial asserts trying to show critical section integrity, but otherwise does
lock+unlock cycles with a nanosleep.

This both illustrates how to build a (trivial) lock using FUTEX2_NUMA and
proves the functionality works.
Davidlohr Bueso Sept. 28, 2023, 1:40 p.m. UTC | #2
On Fri, 22 Sep 2023, Peter Zijlstra wrote:

>Hi!
>
>Updated version of patch 15/15 and a few extra patches for testing the
>FUTEX2_NUMA bits. The last patch (17/15) should never be applied for anything
>you care about and exists purely because I'm too lazy to generate actual
>hash-bucket contention.
>
>On my 2 node IVB-EP:
>
> $ echo FUTEX_SQUASH > /debug/sched/features
>
>Effectively reducing each node to 1 bucket.
>
> $ numactl -m0 -N0 ./futex_numa -c10 -t2 -n0 -N0 &
>   numactl -m1 -N1 ./futex_numa -c10 -t2 -n0 -N0
>
> ...
> contenders: 16154935
> contenders: 16202472
>
> $ numactl -m0 -N0 ./futex_numa -c10 -t2 -n0 -N0 &
>   numactl -m1 -N1 ./futex_numa -c10 -t2 -n0 -N1
>
> contenders: 48584991
> contenders: 48680560
>
>(loop counts, higher is better)
>
>Clearly showing how separating the hashes works.
>
>The first one runs 10 contenders on each node but forces the (numa) futex to
>hash to node 0 for both. This ensures all 20 contenders hash to the same
>bucket and *ouch*.
>
>The second one does the same, except now fully separates the nodes. Performance
>is much improved.
>
>Proving the per-node hashing actually works as advertised.

Very nice.