[v3,00/10] fs: lockless mntns lookup

Message ID 20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org (mailing list archive)

Christian Brauner Dec. 12, 2024, 11:03 p.m. UTC
Hey,

This now also includes selftests for iterating mount namespaces both
backwards and forwards.

Currently we take the read lock when looking for a mount namespace to
list mounts in. We can make this lockless. The simple search case can
just use a sequence counter to detect concurrent changes to the rbtree.
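
To illustrate the retry pattern described above, here is a minimal
userspace sketch. It is not the code from this series: the fixed-size
slot table stands in for the rbtree, and the names ns_slots, ns_seq,
ns_insert(), ns_remove() and ns_lookup() are made up for the example.
It assumes ids are nonzero and that writers are serialized by some
external lock. C11 atomics with their default ordering are used
instead of the kernel's seqcount helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NS_SLOTS 8

/* Hypothetical stand-in for the namespace tree: slots hold ids, 0 = empty. */
static _Atomic uint64_t ns_slots[NS_SLOTS];

/* Sequence counter: odd while a writer is mid-update, even otherwise. */
static atomic_uint ns_seq;

/* Writer side. Writers are assumed to be serialized by an external lock. */
static bool ns_insert(uint64_t id)
{
    bool done = false;

    atomic_fetch_add(&ns_seq, 1);           /* counter becomes odd */
    for (int i = 0; i < NS_SLOTS && !done; i++) {
        if (atomic_load(&ns_slots[i]) == 0) {
            atomic_store(&ns_slots[i], id);
            done = true;
        }
    }
    atomic_fetch_add(&ns_seq, 1);           /* counter becomes even again */
    return done;
}

static bool ns_remove(uint64_t id)
{
    bool done = false;

    atomic_fetch_add(&ns_seq, 1);
    for (int i = 0; i < NS_SLOTS && !done; i++) {
        if (atomic_load(&ns_slots[i]) == id) {
            atomic_store(&ns_slots[i], 0);
            done = true;
        }
    }
    atomic_fetch_add(&ns_seq, 1);
    return done;
}

/* Reader side: no lock is taken; the search is repeated if a writer
 * was active at the start or changed the table while we searched. */
static bool ns_lookup(uint64_t id)
{
    unsigned int seq;
    bool found;

    do {
        seq = atomic_load(&ns_seq);
        found = false;
        for (int i = 0; i < NS_SLOTS; i++) {
            if (atomic_load(&ns_slots[i]) == id) {
                found = true;
                break;
            }
        }
    } while ((seq & 1) || seq != atomic_load(&ns_seq));

    return found;
}
```

A reader calls ns_lookup(id) without taking any lock. The cost of a
concurrent insert or remove is paid by the reader as an extra pass
over the data, which fits a workload where writes are rare.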

For walking the list of mount namespaces sequentially via nsfs, we keep
a separate RCU list because rb_prev() and rb_next() can't be used safely
under RCU.
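
As a rough userspace illustration of a list that can be walked in both
directions while entries are being removed, consider the sketch below.
It is not the kernel implementation: struct ns_node and the ns_list_*()
helpers are hypothetical names. The assumption it demonstrates is that
removal unlinks a node from its neighbours but leaves the node's own
next and prev pointers intact, so a reader that is standing on the
removed node can still step forwards or backwards. In the kernel the
memory would only be freed after an RCU grace period; the sketch never
frees anything.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct ns_node {
    struct ns_node *_Atomic next;
    struct ns_node *_Atomic prev;
    uint64_t id;
};

/* Make @head an empty circular list. */
static void ns_list_init(struct ns_node *head)
{
    atomic_store(&head->next, head);
    atomic_store(&head->prev, head);
    head->id = 0;
}

/* Insert @node right after @pos. Writers are assumed to hold a lock. */
static void ns_list_add_after(struct ns_node *node, struct ns_node *pos)
{
    struct ns_node *next = atomic_load(&pos->next);

    /* Initialise the new node fully before any reader can reach it. */
    atomic_store(&node->next, next);
    atomic_store(&node->prev, pos);
    atomic_store(&pos->next, node);
    atomic_store(&next->prev, node);
}

/* Unlink @node from its neighbours but keep its own links, so that a
 * reader currently looking at @node can continue in either direction. */
static void ns_list_del_keep_links(struct ns_node *node)
{
    struct ns_node *next = atomic_load(&node->next);
    struct ns_node *prev = atomic_load(&node->prev);

    atomic_store(&prev->next, next);
    atomic_store(&next->prev, prev);
}

/* Step forwards from @pos; NULL means the end of the list was reached. */
static struct ns_node *ns_list_next(struct ns_node *head, struct ns_node *pos)
{
    struct ns_node *n = atomic_load(&pos->next);

    return n == head ? NULL : n;
}

/* Step backwards from @pos; NULL means the start of the list was reached. */
static struct ns_node *ns_list_prev(struct ns_node *head, struct ns_node *pos)
{
    struct ns_node *p = atomic_load(&pos->prev);

    return p == head ? NULL : p;
}
```

With nodes 1, 2 and 3 on the list, removing node 2 leaves a walk from
the head seeing 1 and 3, while a reader that still holds node 2 can
reach node 3 via ns_list_next() and node 1 via ns_list_prev().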

Since creating mount namespaces is a relatively rare event compared with
querying mounts in a foreign mount namespace, this is worth it. Once
libmount and systemd pick up this mechanism to list mounts in foreign
mount namespaces, it will be used very frequently.

Thanks!
Christian

---
Changes in v3:
- Add selftests.
- Put list_head into a union with the wait_queue_head_t for poll instead
  of with the mnt_ns_tree_node, which would've risked breaking rbtree
  traversal.
- Handle insertion into the mount namespace list correctly by making use
  of the rbtree position information after the mount namespace has been
  added to it.
- Improve the documentation for the new list_bidir_{del,prev}_rcu().
- Link to v2: https://lore.kernel.org/r/20241212-work-mount-rbtree-lockless-v2-0-4fe6cef02534@kernel.org
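
One way to picture the list insertion item above is the following
userspace sketch. It is only an illustration under stated assumptions:
an unbalanced binary search tree stands in for the rbtree, the list is
a plain (non-RCU) circular list, and struct ns, struct ns_index,
ns_index_init() and ns_index_add() are hypothetical names. The point
is that once the new entry has its place in the sorted tree, its
in-order predecessor is known, and the entry can be linked into the
list directly after that predecessor (or at the list head if there is
none), keeping the list in the same order as the tree.

```c
#include <stddef.h>
#include <stdint.h>

struct ns {
    uint64_t id;
    struct ns *left, *right;    /* stand-in for the rbtree node */
    struct ns *next, *prev;     /* linkage in the id-sorted list */
};

struct ns_index {
    struct ns *root;            /* tree used for lookup by id */
    struct ns list;             /* sentinel head of the sorted list */
};

static void ns_index_init(struct ns_index *idx)
{
    idx->root = NULL;
    idx->list.next = idx->list.prev = &idx->list;
}

/* Add @ns to the tree first, then use its tree position to pick the
 * list position. */
static void ns_index_add(struct ns_index *idx, struct ns *ns)
{
    struct ns **link = &idx->root;
    struct ns *pred = NULL;     /* in-order predecessor, if any */
    struct ns *pos;

    ns->left = ns->right = NULL;
    while (*link) {
        if (ns->id < (*link)->id) {
            link = &(*link)->left;
        } else {
            /* The last node we turn right at precedes the new leaf. */
            pred = *link;
            link = &(*link)->right;
        }
    }
    *link = ns;

    /* No predecessor means @ns has the smallest id: insert at the head. */
    pos = pred ? pred : &idx->list;
    ns->next = pos->next;
    ns->prev = pos;
    pos->next->prev = ns;
    pos->next = ns;
}
```

Walking the list from idx.list.next then visits the entries in
ascending id order regardless of the order in which they were added,
and walking from idx.list.prev visits them in descending order.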

Changes in v2:
- Remove mnt_ns_find_id_at() by switching to rb_find_rcu().
- Add a separate list to look up mount namespaces sequentially.
- Link to v1: https://lore.kernel.org/r/20241210-work-mount-rbtree-lockless-v1-0-338366b9bbe4@kernel.org

---
Christian Brauner (10):
      mount: remove inlude/nospec.h include
      fs: add mount namespace to rbtree late
      fs: lockless mntns rbtree lookup
      rculist: add list_bidir_{del,prev}_rcu()
      fs: lockless mntns lookup for nsfs
      fs: simplify rwlock to spinlock
      seltests: move nsfs into filesystems subfolder
      selftests: add tests for mntns iteration
      selftests: remove unneeded include
      samples: add test-list-all-mounts

 fs/mount.h                                         |  18 +-
 fs/namespace.c                                     | 163 ++++++++------
 fs/nsfs.c                                          |   5 +-
 include/linux/rculist.h                            |  47 +++++
 samples/vfs/.gitignore                             |   1 +
 samples/vfs/Makefile                               |   2 +-
 samples/vfs/test-list-all-mounts.c                 | 235 +++++++++++++++++++++
 .../selftests/{ => filesystems}/nsfs/.gitignore    |   1 +
 .../selftests/{ => filesystems}/nsfs/Makefile      |   4 +-
 .../selftests/{ => filesystems}/nsfs/config        |   0
 .../selftests/filesystems/nsfs/iterate_mntns.c     | 149 +++++++++++++
 .../selftests/{ => filesystems}/nsfs/owner.c       |   0
 .../selftests/{ => filesystems}/nsfs/pidns.c       |   0
 tools/testing/selftests/pidfd/pidfd.h              |   1 -
 14 files changed, 546 insertions(+), 80 deletions(-)
---
base-commit: 40384c840ea1944d7c5a392e8975ed088ecf0b37
change-id: 20241207-work-mount-rbtree-lockless-7d4071b74f18

Comments

Jeff Layton Dec. 13, 2024, 7:03 p.m. UTC | #1
Reviewed-by: Jeff Layton <jlayton@kernel.org>