[GIT,PULL] vfs mount

Message ID 20250322-vfs-mount-b08c842965f4@brauner (mailing list archive)
State New

Pull-request

git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.15-rc1.mount

Commit Message

Christian Brauner March 22, 2025, 10:13 a.m. UTC
Hey Linus,

/* Summary */

This contains the first batch of mount updates for this cycle:

- Mount notifications

  The day has come where we finally provide a new api to listen for
  mount topology changes outside of /proc/<pid>/mountinfo. A mount
  namespace file descriptor can be supplied and registered with fanotify
  to listen for mount topology changes.

  Currently notifications for mount, umount and moving mounts are
  generated. The generated notification record contains the unique mount
  id of the mount.

  The listmount() and statmount() api can be used to query detailed
  information about the mount using the received unique mount id.

  This allows userspace to figure out exactly how the mount topology
  changed without having to generate diffs of /proc/<pid>/mountinfo in
  userspace.

- Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount api.

- Support detached mounts in overlayfs.

  Since last cycle we support specifying overlayfs layers via file
  descriptors. However, we don't allow detached mounts, which means
  userspace cannot use file descriptors received via
  open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to attach
  them to a mount namespace via move_mount() first. This is cumbersome
  and means they have to undo mounts via umount(). With this change
  they can use detached mounts directly.

- Allow retrieving idmappings with statmount().

  Currently it isn't possible to figure out what idmapping has been
  attached to an idmapped mount. Add an extension to statmount() which
  allows reading the idmapping from the mount.

- Allow creating idmapped mounts from mounts that are already idmapped.

  So far it hasn't been possible to create idmapped mounts from
  already idmapped mounts, as this has significant lifetime implications.
  Make the creation of idmapped mounts atomic by allowing struct
  mount_attr to be passed to the open_tree_attr() system call. This
  solves these issues without complicating VFS lookup in any way.

  In general, the system call has the benefit that creating a detached
  mount and applying mount attributes to it becomes an atomic operation
  for userspace.

- Add a way to query statmount() for supported options.

  Allow userspace to query which mount information can be retrieved
  through statmount().

- Allow superblock owners to force unmount.
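
For the mount notification item above, a rough userspace sketch may help.
It is not taken from the series itself: the fallback constant values are
assumed to match the uapi header added here and should be checked against
include/uapi/linux/fanotify.h, and watch_mount_ns(), collect_mount_events()
and the two local structs are hypothetical names used only for illustration.
The first helper registers a mount namespace fd with fanotify, the second
walks a buffer returned by read() and extracts the unique mount ids.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Fallbacks for older headers. ASSUMPTION: values match the 6.15 uapi header. */
#ifndef FAN_REPORT_MNT
#define FAN_REPORT_MNT 0x00004000
#endif
#ifndef FAN_MARK_MNTNS
#define FAN_MARK_MNTNS 0x00000110
#endif
#ifndef FAN_MNT_ATTACH
#define FAN_MNT_ATTACH 0x01000000
#endif
#ifndef FAN_MNT_DETACH
#define FAN_MNT_DETACH 0x02000000
#endif
#ifndef FAN_EVENT_INFO_TYPE_MNT
#define FAN_EVENT_INFO_TYPE_MNT 7
#endif

/* Local mirrors of the info record layout (hypothetical names). */
struct info_hdr {
	uint8_t  info_type;
	uint8_t  pad;
	uint16_t len;
};

struct mnt_info_record {
	struct info_hdr hdr;
	uint64_t mnt_id;	/* unique mount id, usable with statmount() */
};

/* Watch attach/detach events on the mount namespace behind ns_fd
 * (e.g. an fd opened on /proc/self/ns/mnt). Returns a fanotify fd or -1. */
int watch_mount_ns(int ns_fd)
{
	int fan_fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_MNT, 0);

	if (fan_fd < 0)
		return -1;
	if (fanotify_mark(fan_fd, FAN_MARK_ADD | FAN_MARK_MNTNS,
			  FAN_MNT_ATTACH | FAN_MNT_DETACH, ns_fd, NULL) < 0) {
		close(fan_fd);
		return -1;
	}
	return fan_fd;
}

/* Walk an event buffer and copy out up to max (mount id, event mask) pairs.
 * Returns the number of pairs stored. */
size_t collect_mount_events(const void *buf, size_t len,
			    uint64_t *ids, uint64_t *masks, size_t max)
{
	const struct fanotify_event_metadata *meta = buf;
	long remaining = (long)len;
	size_t n = 0;

	while (n < max && FAN_EVENT_OK(meta, remaining)) {
		const char *p = (const char *)meta + meta->metadata_len;
		size_t left = meta->event_len > meta->metadata_len ?
			      meta->event_len - meta->metadata_len : 0;

		/* Info records follow the fixed metadata, each with its own length. */
		while (left >= sizeof(struct info_hdr)) {
			struct info_hdr hdr;

			memcpy(&hdr, p, sizeof(hdr));
			if (hdr.len < sizeof(hdr) || hdr.len > left)
				break;	/* malformed record, stop parsing this event */
			if (hdr.info_type == FAN_EVENT_INFO_TYPE_MNT &&
			    hdr.len >= sizeof(struct mnt_info_record) && n < max) {
				struct mnt_info_record rec;

				memcpy(&rec, p, sizeof(rec));
				ids[n] = rec.mnt_id;
				masks[n] = meta->mask;
				n++;
			}
			p += hdr.len;
			left -= hdr.len;
		}
		meta = FAN_EVENT_NEXT(meta, remaining);
	}
	return n;
}
```

A caller would read() from the fd returned by watch_mount_ns() into a
buffer, pass it to collect_mount_events(), and then feed each id to
statmount() to learn what changed.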

/* Testing */

gcc version 14.2.0 (Debian 14.2.0-6)
Debian clang version 16.0.6 (27+b1)

No build failures or warnings were observed.

/* Conflicts */

Merge conflicts with mainline
=============================

No known conflicts.

Merge conflicts with other trees
================================

This contains a merge conflict with the vfs-6.15.misc pull request:

+++ b/fs/internal.h
@@@ -337,4 -338,4 +337,5 @@@ static inline bool path_mounted(const s
        return path->mnt->mnt_root == path->dentry;
  }
  void file_f_owner_release(struct file *file);
 +bool file_seek_cur_needs_f_lock(struct file *file);
+ int statmount_mnt_idmap(struct mnt_idmap *idmap, struct seq_file *seq, bool uid_map);

The following changes since commit 2014c95afecee3e76ca4a56956a936e23283f05b:

  Linux 6.14-rc1 (2025-02-02 15:39:26 -0800)

are available in the Git repository at:

  git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.15-rc1.mount

for you to fetch changes up to e1ff7aa34dec7e650159fd7ca8ec6af7cc428d9f:

  umount: Allow superblock owners to force umount (2025-03-19 09:19:04 +0100)

Please consider pulling these changes from the signed vfs-6.15-rc1.mount tag.

Thanks!
Christian

----------------------------------------------------------------
vfs-6.15-rc1.mount

----------------------------------------------------------------
Arnd Bergmann (1):
      samples/vfs: fix printf format string for size_t

Christian Brauner (18):
      Merge patch series "mount notification"
      fs: support O_PATH fds with FSCONFIG_SET_FD
      selftests/overlayfs: test specifying layers as O_PATH file descriptors
      Merge patch series "ovl: allow O_PATH file descriptor when specifying layers"
      fs: allow detached mounts in clone_private_mount()
      uidgid: add map_id_range_up()
      statmount: allow to retrieve idmappings
      samples/vfs: check whether flag was raised
      selftests: add tests for using detached mount with overlayfs
      samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP
      Merge patch series "fs: allow detached mounts in clone_private_mount()"
      fs: add vfs_open_tree() helper
      fs: add copy_mount_setattr() helper
      fs: add open_tree_attr()
      fs: add kflags member to struct mount_kattr
      fs: allow changing idmappings
      Merge patch series "statmount: allow to retrieve idmappings"
      Merge patch series "fs: allow changing idmappings"

Jeff Layton (1):
      statmount: add a new supported_mask field

Miklos Szeredi (5):
      fsnotify: add mount notification infrastructure
      fanotify: notify on mount attach and detach
      vfs: add notifications for mount attach and detach
      selinux: add FILE__WATCH_MOUNTNS
      selftests: add tests for mount notification

Trond Myklebust (1):
      umount: Allow superblock owners to force umount

 arch/alpha/kernel/syscalls/syscall.tbl             |   1 +
 arch/arm/tools/syscall.tbl                         |   1 +
 arch/arm64/tools/syscall_32.tbl                    |   1 +
 arch/m68k/kernel/syscalls/syscall.tbl              |   1 +
 arch/microblaze/kernel/syscalls/syscall.tbl        |   1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl          |   1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl          |   1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl          |   1 +
 arch/parisc/kernel/syscalls/syscall.tbl            |   1 +
 arch/powerpc/kernel/syscalls/syscall.tbl           |   1 +
 arch/s390/kernel/syscalls/syscall.tbl              |   1 +
 arch/sh/kernel/syscalls/syscall.tbl                |   1 +
 arch/sparc/kernel/syscalls/syscall.tbl             |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl             |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl             |   1 +
 arch/xtensa/kernel/syscalls/syscall.tbl            |   1 +
 fs/autofs/autofs_i.h                               |   2 +
 fs/fsopen.c                                        |   2 +-
 fs/internal.h                                      |   1 +
 fs/mnt_idmapping.c                                 |  51 ++
 fs/mount.h                                         |  26 ++
 fs/namespace.c                                     | 485 ++++++++++++++-----
 fs/notify/fanotify/fanotify.c                      |  38 +-
 fs/notify/fanotify/fanotify.h                      |  18 +
 fs/notify/fanotify/fanotify_user.c                 |  89 +++-
 fs/notify/fdinfo.c                                 |   5 +
 fs/notify/fsnotify.c                               |  47 +-
 fs/notify/fsnotify.h                               |  11 +
 fs/notify/mark.c                                   |  14 +-
 fs/pnode.c                                         |   4 +-
 include/linux/fanotify.h                           |  12 +-
 include/linux/fsnotify.h                           |  20 +
 include/linux/fsnotify_backend.h                   |  42 ++
 include/linux/mnt_idmapping.h                      |   5 +
 include/linux/syscalls.h                           |   4 +
 include/linux/uidgid.h                             |   6 +
 include/uapi/asm-generic/unistd.h                  |   4 +-
 include/uapi/linux/fanotify.h                      |  10 +
 include/uapi/linux/mount.h                         |  10 +-
 kernel/user_namespace.c                            |  26 +-
 samples/vfs/samples-vfs.h                          |  14 +-
 samples/vfs/test-list-all-mounts.c                 |  35 +-
 scripts/syscall.tbl                                |   1 +
 security/selinux/hooks.c                           |   3 +
 security/selinux/include/classmap.h                |   2 +-
 tools/testing/selftests/Makefile                   |   1 +
 .../selftests/filesystems/mount-notify/.gitignore  |   2 +
 .../selftests/filesystems/mount-notify/Makefile    |   6 +
 .../filesystems/mount-notify/mount-notify_test.c   | 516 +++++++++++++++++++++
 .../filesystems/overlayfs/set_layers_via_fds.c     | 195 ++++++++
 .../selftests/filesystems/overlayfs/wrappers.h     |  17 +
 .../selftests/filesystems/statmount/statmount.h    |   2 +-
 52 files changed, 1567 insertions(+), 175 deletions(-)
 create mode 100644 tools/testing/selftests/filesystems/mount-notify/.gitignore
 create mode 100644 tools/testing/selftests/filesystems/mount-notify/Makefile
 create mode 100644 tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c

Comments

pr-tracker-bot@kernel.org March 24, 2025, 9 p.m. UTC | #1
The pull request you sent on Sat, 22 Mar 2025 11:13:18 +0100:

> git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.15-rc1.mount

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/fd101da676362aaa051b4f5d8a941bd308603041

Thank you!
Leon Romanovsky April 1, 2025, 5:07 p.m. UTC | #2
On Mon, Mar 24, 2025 at 09:00:59PM +0000, pr-tracker-bot@kernel.org wrote:
> The pull request you sent on Sat, 22 Mar 2025 11:13:18 +0100:
> 
> > git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.15-rc1.mount
> 
> has been merged into torvalds/linux.git:
> https://git.kernel.org/torvalds/c/fd101da676362aaa051b4f5d8a941bd308603041

I didn't bisect, but this PR looks like the most relevant candidate.
Linus's latest master generates the following slab-use-after-free:

 [ 1845.404658] ==================================================================
 [ 1845.405460] BUG: KASAN: slab-use-after-free in clone_private_mount+0x309/0x390
 [ 1845.406205] Read of size 8 at addr ffff8881507b5ab0 by task dockerd/8697
 [ 1845.406847]
 [ 1845.407081] CPU: 5 UID: 0 PID: 8697 Comm: dockerd Not tainted 6.14.0master_fbece6d #1 NONE
 [ 1845.407086] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 [ 1845.407097] Call Trace:
 [ 1845.407102]  <TASK>
 [ 1845.407104]  dump_stack_lvl+0x69/0xa0
 [ 1845.407114]  print_report+0x156/0x523
 [ 1845.407120]  ? __virt_addr_valid+0x1de/0x3c0
 [ 1845.407124]  ? clone_private_mount+0x309/0x390
 [ 1845.407128]  kasan_report+0xc1/0xf0
 [ 1845.407134]  ? clone_private_mount+0x309/0x390
 [ 1845.407138]  clone_private_mount+0x309/0x390
 [ 1845.407144]  ovl_fill_super+0x2965/0x59e0 [overlay]
 [ 1845.407165]  ? ovl_workdir_create+0x900/0x900 [overlay]
 [ 1845.407177]  ? wait_for_completion_io_timeout+0x20/0x20
 [ 1845.407182]  ? lockdep_init_map_type+0x58/0x220
 [ 1845.407186]  ? lockdep_init_map_type+0x58/0x220
 [ 1845.407189]  ? shrinker_register+0x177/0x200
 [ 1845.407194]  ? sget_fc+0x449/0xb30
 [ 1845.407199]  ? ovl_workdir_create+0x900/0x900 [overlay]
 [ 1845.407211]  ? get_tree_nodev+0xa5/0x130
 [ 1845.407214]  get_tree_nodev+0xa5/0x130
 [ 1845.407218]  ? cap_capable+0xd0/0x320
 [ 1845.407223]  vfs_get_tree+0x83/0x2e0
 [ 1845.407227]  ? ns_capable+0x55/0xb0
 [ 1845.407232]  path_mount+0x891/0x1aa0
 [ 1845.407237]  ? finish_automount+0x860/0x860
 [ 1845.407240]  ? kmem_cache_free+0x14c/0x4f0
 [ 1845.407245]  ? user_path_at+0x3d/0x50
 [ 1845.407250]  __x64_sys_mount+0x2d4/0x3a0
 [ 1845.407254]  ? path_mount+0x1aa0/0x1aa0
 [ 1845.407259]  do_syscall_64+0x6d/0x140
 [ 1845.407263]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 [ 1845.407267] RIP: 0033:0x55e3487f1fea
 [ 1845.407274] Code: e8 1b 96 fa ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 4c 8b 54 24 28 4c 8b 44 24 30 4c 8b 4c 24 38 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 40 ff ff ff ff 48 c7 44 24 48
 [ 1845.407278] RSP: 002b:000000c000b563b8 EFLAGS: 00000212 ORIG_RAX: 00000000000000a5
 [ 1845.407282] RAX: ffffffffffffffda RBX: 000000c00006c000 RCX: 000055e3487f1fea
 [ 1845.407285] RDX: 000000c0012cf7d8 RSI: 000000c0012616c0 RDI: 000000c0012cf7d0
 [ 1845.407287] RBP: 000000c000b56458 R08: 000000c0004fa600 R09: 0000000000000000
 [ 1845.407289] R10: 0000000000000000 R11: 0000000000000212 R12: 000000c0012cf7d0
 [ 1845.407291] R13: 0000000000000000 R14: 000000c00098b6c0 R15: ffffffffffffffff
 [ 1845.407296]  </TASK>
 [ 1845.407297]
 [ 1845.431635] Allocated by task 17044:
 [ 1845.432033]  kasan_save_stack+0x1e/0x40
 [ 1845.432463]  kasan_save_track+0x10/0x30
 [ 1845.432882]  __kasan_slab_alloc+0x62/0x70
 [ 1845.433308]  kmem_cache_alloc_noprof+0x1a0/0x4a0
 [ 1845.433781]  alloc_vfsmnt+0x23/0x6c0
 [ 1845.434195]  vfs_create_mount+0x82/0x4a0
 [ 1845.434623]  path_mount+0x939/0x1aa0
 [ 1845.435018]  __x64_sys_mount+0x2d4/0x3a0
 [ 1845.435440]  do_syscall_64+0x6d/0x140
 [ 1845.435842]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 [ 1845.436355]
 [ 1845.436601] Freed by task 0:
 [ 1845.436945]  kasan_save_stack+0x1e/0x40
 [ 1845.437354]  kasan_save_track+0x10/0x30
 [ 1845.437770]  kasan_save_free_info+0x37/0x60
 [ 1845.438217]  __kasan_slab_free+0x33/0x40
 [ 1845.438646]  kmem_cache_free+0x14c/0x4f0
 [ 1845.439068]  rcu_core+0x605/0x1d50
 [ 1845.439451]  handle_softirqs+0x192/0x810
 [ 1845.439880]  irq_exit_rcu+0x106/0x190
 [ 1845.440280]  sysvec_apic_timer_interrupt+0x7c/0xb0
 [ 1845.440785]  asm_sysvec_apic_timer_interrupt+0x16/0x20
 [ 1845.441300]
 [ 1845.441544] Last potentially related work creation:
 [ 1845.442048]  kasan_save_stack+0x1e/0x40
 [ 1845.442465]  kasan_record_aux_stack+0x97/0xa0
 [ 1845.442921]  __call_rcu_common.constprop.0+0x6d/0xb40
 [ 1845.443437]  task_work_run+0x111/0x1f0
 [ 1845.443851]  syscall_exit_to_user_mode+0x1df/0x1f0
 [ 1845.444337]  do_syscall_64+0x79/0x140
 [ 1845.444758]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 [ 1845.445272]
 [ 1845.445505] Second to last potentially related work creation:
 [ 1845.446078]  kasan_save_stack+0x1e/0x40
 [ 1845.446494]  kasan_record_aux_stack+0x97/0xa0
 [ 1845.446947]  task_work_add+0x178/0x250
 [ 1845.447356]  mntput_no_expire+0x4fc/0x9f0
 [ 1845.447789]  path_umount+0x4ed/0x10d0
 [ 1845.448190]  __x64_sys_umount+0xfb/0x120
 [ 1845.448617]  do_syscall_64+0x6d/0x140
 [ 1845.449016]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 [ 1845.449529]
 [ 1845.449766] The buggy address belongs to the object at ffff8881507b5a40
 [ 1845.449766]  which belongs to the cache mnt_cache of size 368
 [ 1845.450898] The buggy address is located 112 bytes inside of
 [ 1845.450898]  freed 368-byte region [ffff8881507b5a40, ffff8881507b5bb0)
 [ 1845.452009]
 [ 1845.452250] The buggy address belongs to the physical page:
 [ 1845.452808] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1507b4
 [ 1845.453595] head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
 [ 1845.454363] anon flags: 0x200000000000040(head|node=0|zone=2)
 [ 1845.454936] page_type: f5(slab)
 [ 1845.455300] raw: 0200000000000040 ffff8881009f5680 0000000000000000 dead000000000001
 [ 1845.456077] raw: 0000000000000000 0000000080240024 00000000f5000000 0000000000000000
 [ 1845.456857] head: 0200000000000040 ffff8881009f5680 0000000000000000 dead000000000001
 [ 1845.457616] head: 0000000000000000 0000000080240024 00000000f5000000 0000000000000000
 [ 1845.458399] head: 0200000000000002 ffffea000541ed01 ffffffffffffffff 0000000000000000
 [ 1845.459169] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
 [ 1845.459945] page dumped because: kasan: bad access detected
 [ 1845.460506]
 [ 1845.460745] Memory state around the buggy address:
 [ 1845.461228]  ffff8881507b5980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
 [ 1845.461963]  ffff8881507b5a00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
 [ 1845.462759] >ffff8881507b5a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 [ 1845.463480]                                      ^
 [ 1845.463968]  ffff8881507b5b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 [ 1845.464704]  ffff8881507b5b80: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
 [ 1845.465430] ==================================================================
 [ 1845.466181] Disabling lock debugging due to kernel taint
 [ 1845.466717] ==================================================================
 [ 1845.467443] BUG: KASAN: slab-use-after-free in clone_private_mount+0x313/0x390
 [ 1845.468192] Read of size 8 at addr ffff8881507b5a58 by task dockerd/8697
 [ 1845.468837]
 [ 1845.469072] CPU: 5 UID: 0 PID: 8697 Comm: dockerd Tainted: G    B               6.14.0master_fbece6d #1 NONE
 [ 1845.469078] Tainted: [B]=BAD_PAGE
 [ 1845.469079] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 [ 1845.469082] Call Trace:
 [ 1845.469084]  <TASK>
 [ 1845.469086]  dump_stack_lvl+0x69/0xa0
 [ 1845.469093]  print_report+0x156/0x523
 [ 1845.469098]  ? __virt_addr_valid+0x1de/0x3c0
 [ 1845.469103]  ? clone_private_mount+0x313/0x390
 [ 1845.469107]  kasan_report+0xc1/0xf0
 [ 1845.469112]  ? clone_private_mount+0x313/0x390
 [ 1845.469116]  clone_private_mount+0x313/0x390
 [ 1845.469121]  ovl_fill_super+0x2965/0x59e0 [overlay]
 [ 1845.469140]  ? ovl_workdir_create+0x900/0x900 [overlay]
 [ 1845.469152]  ? wait_for_completion_io_timeout+0x20/0x20
 [ 1845.469157]  ? lockdep_init_map_type+0x58/0x220
 [ 1845.469161]  ? lockdep_init_map_type+0x58/0x220
 [ 1845.469164]  ? shrinker_register+0x177/0x200
 [ 1845.469169]  ? sget_fc+0x449/0xb30
 [ 1845.469174]  ? ovl_workdir_create+0x900/0x900 [overlay]
 [ 1845.469185]  ? get_tree_nodev+0xa5/0x130
 [ 1845.469189]  get_tree_nodev+0xa5/0x130
 [ 1845.469192]  ? cap_capable+0xd0/0x320
 [ 1845.469198]  vfs_get_tree+0x83/0x2e0
 [ 1845.469202]  ? ns_capable+0x55/0xb0
 [ 1845.469206]  path_mount+0x891/0x1aa0
 [ 1845.469210]  ? finish_automount+0x860/0x860
 [ 1845.469217]  ? kmem_cache_free+0x14c/0x4f0
 [ 1845.469221]  ? user_path_at+0x3d/0x50
 [ 1845.469227]  __x64_sys_mount+0x2d4/0x3a0
 [ 1845.469231]  ? path_mount+0x1aa0/0x1aa0
 [ 1845.469235]  do_syscall_64+0x6d/0x140
 [ 1845.469239]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 [ 1845.469242] RIP: 0033:0x55e3487f1fea
 [ 1845.469246] Code: e8 1b 96 fa ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 4c 8b 54 24 28 4c 8b 44 24 30 4c 8b 4c 24 38 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 40 ff ff ff ff 48 c7 44 24 48
 [ 1845.469249] RSP: 002b:000000c000b563b8 EFLAGS: 00000212 ORIG_RAX: 00000000000000a5
 [ 1845.469253] RAX: ffffffffffffffda RBX: 000000c00006c000 RCX: 000055e3487f1fea
 [ 1845.469256] RDX: 000000c0012cf7d8 RSI: 000000c0012616c0 RDI: 000000c0012cf7d0
 [ 1845.469260] RBP: 000000c000b56458 R08: 000000c0004fa600 R09: 0000000000000000
 [ 1845.469261] R10: 0000000000000000 R11: 0000000000000212 R12: 000000c0012cf7d0
 [ 1845.469263] R13: 0000000000000000 R14: 000000c00098b6c0 R15: ffffffffffffffff
 [ 1845.469268]  </TASK>
 [ 1845.469269]
 [ 1845.494368] Allocated by task 17044:
 [ 1845.494768]  kasan_save_stack+0x1e/0x40
 [ 1845.495185]  kasan_save_track+0x10/0x30
 [ 1845.495594]  __kasan_slab_alloc+0x62/0x70
 [ 1845.496024]  kmem_cache_alloc_noprof+0x1a0/0x4a0
 [ 1845.496518]  alloc_vfsmnt+0x23/0x6c0
 [ 1845.496911]  vfs_create_mount+0x82/0x4a0
 [ 1845.497333]  path_mount+0x939/0x1aa0
 [ 1845.497728]  __x64_sys_mount+0x2d4/0x3a0
 [ 1845.498167]  do_syscall_64+0x6d/0x140
 [ 1845.498563]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 [ 1845.499064]
 [ 1845.499295] Freed by task 0:
 [ 1845.499636]  kasan_save_stack+0x1e/0x40
 [ 1845.500052]  kasan_save_track+0x10/0x30
 [ 1845.500494]  kasan_save_free_info+0x37/0x60
 [ 1845.500934]  __kasan_slab_free+0x33/0x40
 [ 1845.501355]  kmem_cache_free+0x14c/0x4f0
 [ 1845.501774]  rcu_core+0x605/0x1d50
 [ 1845.502162]  handle_softirqs+0x192/0x810
 [ 1845.502587]  irq_exit_rcu+0x106/0x190
 [ 1845.502995]  sysvec_apic_timer_interrupt+0x7c/0xb0
 [ 1845.503487]  asm_sysvec_apic_timer_interrupt+0x16/0x20
 [ 1845.504002]
 [ 1845.504236] Last potentially related work creation:
 [ 1845.504748]  kasan_save_stack+0x1e/0x40
 [ 1845.505164]  kasan_record_aux_stack+0x97/0xa0
 [ 1845.505621]  __call_rcu_common.constprop.0+0x6d/0xb40
 [ 1845.506136]  task_work_run+0x111/0x1f0
 [ 1845.506545]  syscall_exit_to_user_mode+0x1df/0x1f0
 [ 1845.507038]  do_syscall_64+0x79/0x140
 [ 1845.507439]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 [ 1845.507949]
 [ 1845.508187] Second to last potentially related work creation:
 [ 1845.508760]  kasan_save_stack+0x1e/0x40
 [ 1845.509175]  kasan_record_aux_stack+0x97/0xa0
 [ 1845.509630]  task_work_add+0x178/0x250
 [ 1845.510040]  mntput_no_expire+0x4fc/0x9f0
 [ 1845.510468]  path_umount+0x4ed/0x10d0
 [ 1845.510870]  __x64_sys_umount+0xfb/0x120
 [ 1845.511298]  do_syscall_64+0x6d/0x140
 [ 1845.511700]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 [ 1845.512210]
 [ 1845.512442] The buggy address belongs to the object at ffff8881507b5a40
 [ 1845.512442]  which belongs to the cache mnt_cache of size 368
 [ 1845.513553] The buggy address is located 24 bytes inside of
 [ 1845.513553]  freed 368-byte region [ffff8881507b5a40, ffff8881507b5bb0)
 [ 1845.514650]
 [ 1845.514883] The buggy address belongs to the physical page:
 [ 1845.515436] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1507b4
 [ 1845.516221] head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
 [ 1845.516986] anon flags: 0x200000000000040(head|node=0|zone=2)
 [ 1845.517549] page_type: f5(slab)
 [ 1845.517912] raw: 0200000000000040 ffff8881009f5680 0000000000000000 dead000000000001
 [ 1845.518684] raw: 0000000000000000 0000000080240024 00000000f5000000 0000000000000000
 [ 1845.519445] head: 0200000000000040 ffff8881009f5680 0000000000000000 dead000000000001
 [ 1845.520220] head: 0000000000000000 0000000080240024 00000000f5000000 0000000000000000
 [ 1845.521006] head: 0200000000000002 ffffea000541ed01 ffffffffffffffff 0000000000000000
 [ 1845.521812] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
 [ 1845.522581] page dumped because: kasan: bad access detected
 [ 1845.523131]
 [ 1845.523362] Memory state around the buggy address:
 [ 1845.523851]  ffff8881507b5900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 [ 1845.524588]  ffff8881507b5980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
 [ 1845.525321] >ffff8881507b5a00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
 [ 1845.526059]                                                     ^
 [ 1845.526651]  ffff8881507b5a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 [ 1845.527378]  ffff8881507b5b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 [ 1845.528095] ==================================================================
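
For anyone cross-checking the two reports, the offsets KASAN prints can be
re-derived from the addresses in the log. The small sketch below does that
arithmetic. It assumes a generic-mode KASAN build with one shadow byte per
8 bytes of memory (the log does not state the mode), and the helper names
are made up for illustration.

```c
#include <stdint.h>

/* ASSUMPTION: generic KASAN, one shadow byte per 8-byte granule. */
#define KASAN_GRANULE 8u

/* Byte offset of a faulting address inside its slab object. */
uint64_t offset_in_object(uint64_t access, uint64_t object_base)
{
	return access - object_base;
}

/* Column (0..15) of the shadow byte covering `access` within the dumped
 * shadow row that is labelled with address `row_base`. */
unsigned int shadow_column(uint64_t access, uint64_t row_base)
{
	return (unsigned int)((access - row_base) / KASAN_GRANULE);
}
```

With the object at ffff8881507b5a40, the first read at ffff8881507b5ab0 is
0x70 = 112 bytes in and the second at ffff8881507b5a58 is 0x18 = 24 bytes
in, matching the "located N bytes inside" lines. Both reads fall within the
same freed 368-byte mnt_cache object, and under the assumption above the
shadow columns are 6 (row ffff8881507b5a80) and 11 (row ffff8881507b5a00).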

> 
> Thank you!
> 
> -- 
> Deet-doot-dot, I am a bot.
> https://korg.docs.kernel.org/prtracker.html
Christian Brauner April 3, 2025, 8:29 a.m. UTC | #3
On Tue, Apr 01, 2025 at 08:07:15PM +0300, Leon Romanovsky wrote:
> On Mon, Mar 24, 2025 at 09:00:59PM +0000, pr-tracker-bot@kernel.org wrote:
> > The pull request you sent on Sat, 22 Mar 2025 11:13:18 +0100:
> > 
> > > git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.15-rc1.mount
> > 
> > has been merged into torvalds/linux.git:
> > https://git.kernel.org/torvalds/c/fd101da676362aaa051b4f5d8a941bd308603041
> 
> I didn't bisect, but this PR looks like the most relevant candidate.
> Linus's latest master generates the following slab-use-after-free:

Sorry, I only saw this today. I'll take a look now.

> 
>  [ 1845.404658] ==================================================================
>  [ 1845.405460] BUG: KASAN: slab-use-after-free in clone_private_mount+0x309/0x390
>  [ 1845.406205] Read of size 8 at addr ffff8881507b5ab0 by task dockerd/8697
>  [ 1845.406847]
>  [ 1845.407081] CPU: 5 UID: 0 PID: 8697 Comm: dockerd Not tainted 6.14.0master_fbece6d #1 NONE
>  [ 1845.407086] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>  [ 1845.407097] Call Trace:
>  [ 1845.407102]  <TASK>
>  [ 1845.407104]  dump_stack_lvl+0x69/0xa0
>  [ 1845.407114]  print_report+0x156/0x523
>  [ 1845.407120]  ? __virt_addr_valid+0x1de/0x3c0
>  [ 1845.407124]  ? clone_private_mount+0x309/0x390
>  [ 1845.407128]  kasan_report+0xc1/0xf0
>  [ 1845.407134]  ? clone_private_mount+0x309/0x390
>  [ 1845.407138]  clone_private_mount+0x309/0x390
>  [ 1845.407144]  ovl_fill_super+0x2965/0x59e0 [overlay]
>  [ 1845.407165]  ? ovl_workdir_create+0x900/0x900 [overlay]
>  [ 1845.407177]  ? wait_for_completion_io_timeout+0x20/0x20
>  [ 1845.407182]  ? lockdep_init_map_type+0x58/0x220
>  [ 1845.407186]  ? lockdep_init_map_type+0x58/0x220
>  [ 1845.407189]  ? shrinker_register+0x177/0x200
>  [ 1845.407194]  ? sget_fc+0x449/0xb30
>  [ 1845.407199]  ? ovl_workdir_create+0x900/0x900 [overlay]
>  [ 1845.407211]  ? get_tree_nodev+0xa5/0x130
>  [ 1845.407214]  get_tree_nodev+0xa5/0x130
>  [ 1845.407218]  ? cap_capable+0xd0/0x320
>  [ 1845.407223]  vfs_get_tree+0x83/0x2e0
>  [ 1845.407227]  ? ns_capable+0x55/0xb0
>  [ 1845.407232]  path_mount+0x891/0x1aa0
>  [ 1845.407237]  ? finish_automount+0x860/0x860
>  [ 1845.407240]  ? kmem_cache_free+0x14c/0x4f0
>  [ 1845.407245]  ? user_path_at+0x3d/0x50
>  [ 1845.407250]  __x64_sys_mount+0x2d4/0x3a0
>  [ 1845.407254]  ? path_mount+0x1aa0/0x1aa0
>  [ 1845.407259]  do_syscall_64+0x6d/0x140
>  [ 1845.407263]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>  [ 1845.407267] RIP: 0033:0x55e3487f1fea
>  [ 1845.407274] Code: e8 1b 96 fa ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 4c 8b 54 24 28 4c 8b 44 24 30 4c 8b 4c 24 38 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 40 ff ff ff ff 48 c7 44 24 48
>  [ 1845.407278] RSP: 002b:000000c000b563b8 EFLAGS: 00000212 ORIG_RAX: 00000000000000a5
>  [ 1845.407282] RAX: ffffffffffffffda RBX: 000000c00006c000 RCX: 000055e3487f1fea
>  [ 1845.407285] RDX: 000000c0012cf7d8 RSI: 000000c0012616c0 RDI: 000000c0012cf7d0
>  [ 1845.407287] RBP: 000000c000b56458 R08: 000000c0004fa600 R09: 0000000000000000
>  [ 1845.407289] R10: 0000000000000000 R11: 0000000000000212 R12: 000000c0012cf7d0
>  [ 1845.407291] R13: 0000000000000000 R14: 000000c00098b6c0 R15: ffffffffffffffff
>  [ 1845.407296]  </TASK>
>  [ 1845.407297]
>  [ 1845.431635] Allocated by task 17044:
>  [ 1845.432033]  kasan_save_stack+0x1e/0x40
>  [ 1845.432463]  kasan_save_track+0x10/0x30
>  [ 1845.432882]  __kasan_slab_alloc+0x62/0x70
>  [ 1845.433308]  kmem_cache_alloc_noprof+0x1a0/0x4a0
>  [ 1845.433781]  alloc_vfsmnt+0x23/0x6c0
>  [ 1845.434195]  vfs_create_mount+0x82/0x4a0
>  [ 1845.434623]  path_mount+0x939/0x1aa0
>  [ 1845.435018]  __x64_sys_mount+0x2d4/0x3a0
>  [ 1845.435440]  do_syscall_64+0x6d/0x140
>  [ 1845.435842]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>  [ 1845.436355]
>  [ 1845.436601] Freed by task 0:
>  [ 1845.436945]  kasan_save_stack+0x1e/0x40
>  [ 1845.437354]  kasan_save_track+0x10/0x30
>  [ 1845.437770]  kasan_save_free_info+0x37/0x60
>  [ 1845.438217]  __kasan_slab_free+0x33/0x40
>  [ 1845.438646]  kmem_cache_free+0x14c/0x4f0
>  [ 1845.439068]  rcu_core+0x605/0x1d50
>  [ 1845.439451]  handle_softirqs+0x192/0x810
>  [ 1845.439880]  irq_exit_rcu+0x106/0x190
>  [ 1845.440280]  sysvec_apic_timer_interrupt+0x7c/0xb0
>  [ 1845.440785]  asm_sysvec_apic_timer_interrupt+0x16/0x20
>  [ 1845.441300]
>  [ 1845.441544] Last potentially related work creation:
>  [ 1845.442048]  kasan_save_stack+0x1e/0x40
>  [ 1845.442465]  kasan_record_aux_stack+0x97/0xa0
>  [ 1845.442921]  __call_rcu_common.constprop.0+0x6d/0xb40
>  [ 1845.443437]  task_work_run+0x111/0x1f0
>  [ 1845.443851]  syscall_exit_to_user_mode+0x1df/0x1f0
>  [ 1845.444337]  do_syscall_64+0x79/0x140
>  [ 1845.444758]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>  [ 1845.445272]
>  [ 1845.445505] Second to last potentially related work creation:
>  [ 1845.446078]  kasan_save_stack+0x1e/0x40
>  [ 1845.446494]  kasan_record_aux_stack+0x97/0xa0
>  [ 1845.446947]  task_work_add+0x178/0x250
>  [ 1845.447356]  mntput_no_expire+0x4fc/0x9f0
>  [ 1845.447789]  path_umount+0x4ed/0x10d0
>  [ 1845.448190]  __x64_sys_umount+0xfb/0x120
>  [ 1845.448617]  do_syscall_64+0x6d/0x140
>  [ 1845.449016]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>  [ 1845.449529]
>  [ 1845.449766] The buggy address belongs to the object at ffff8881507b5a40
>  [ 1845.449766]  which belongs to the cache mnt_cache of size 368
>  [ 1845.450898] The buggy address is located 112 bytes inside of
>  [ 1845.450898]  freed 368-byte region [ffff8881507b5a40, ffff8881507b5bb0)
>  [ 1845.452009]
>  [ 1845.452250] The buggy address belongs to the physical page:
>  [ 1845.452808] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1507b4
>  [ 1845.453595] head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>  [ 1845.454363] anon flags: 0x200000000000040(head|node=0|zone=2)
>  [ 1845.454936] page_type: f5(slab)
>  [ 1845.455300] raw: 0200000000000040 ffff8881009f5680 0000000000000000 dead000000000001
>  [ 1845.456077] raw: 0000000000000000 0000000080240024 00000000f5000000 0000000000000000
>  [ 1845.456857] head: 0200000000000040 ffff8881009f5680 0000000000000000 dead000000000001
>  [ 1845.457616] head: 0000000000000000 0000000080240024 00000000f5000000 0000000000000000
>  [ 1845.458399] head: 0200000000000002 ffffea000541ed01 ffffffffffffffff 0000000000000000
>  [ 1845.459169] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
>  [ 1845.459945] page dumped because: kasan: bad access detected
>  [ 1845.460506]
>  [ 1845.460745] Memory state around the buggy address:
>  [ 1845.461228]  ffff8881507b5980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
>  [ 1845.461963]  ffff8881507b5a00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
>  [ 1845.462759] >ffff8881507b5a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  [ 1845.463480]                                      ^
>  [ 1845.463968]  ffff8881507b5b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  [ 1845.464704]  ffff8881507b5b80: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>  [ 1845.465430] ==================================================================
>  [ 1845.466181] Disabling lock debugging due to kernel taint
>  [ 1845.466717] ==================================================================
>  [ 1845.467443] BUG: KASAN: slab-use-after-free in clone_private_mount+0x313/0x390
>  [ 1845.468192] Read of size 8 at addr ffff8881507b5a58 by task dockerd/8697
>  [ 1845.468837]
>  [ 1845.469072] CPU: 5 UID: 0 PID: 8697 Comm: dockerd Tainted: G    B               6.14.0master_fbece6d #1 NONE
>  [ 1845.469078] Tainted: [B]=BAD_PAGE
>  [ 1845.469079] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>  [ 1845.469082] Call Trace:
>  [ 1845.469084]  <TASK>
>  [ 1845.469086]  dump_stack_lvl+0x69/0xa0
>  [ 1845.469093]  print_report+0x156/0x523
>  [ 1845.469098]  ? __virt_addr_valid+0x1de/0x3c0
>  [ 1845.469103]  ? clone_private_mount+0x313/0x390
>  [ 1845.469107]  kasan_report+0xc1/0xf0
>  [ 1845.469112]  ? clone_private_mount+0x313/0x390
>  [ 1845.469116]  clone_private_mount+0x313/0x390
>  [ 1845.469121]  ovl_fill_super+0x2965/0x59e0 [overlay]
>  [ 1845.469140]  ? ovl_workdir_create+0x900/0x900 [overlay]
>  [ 1845.469152]  ? wait_for_completion_io_timeout+0x20/0x20
>  [ 1845.469157]  ? lockdep_init_map_type+0x58/0x220
>  [ 1845.469161]  ? lockdep_init_map_type+0x58/0x220
>  [ 1845.469164]  ? shrinker_register+0x177/0x200
>  [ 1845.469169]  ? sget_fc+0x449/0xb30
>  [ 1845.469174]  ? ovl_workdir_create+0x900/0x900 [overlay]
>  [ 1845.469185]  ? get_tree_nodev+0xa5/0x130
>  [ 1845.469189]  get_tree_nodev+0xa5/0x130
>  [ 1845.469192]  ? cap_capable+0xd0/0x320
>  [ 1845.469198]  vfs_get_tree+0x83/0x2e0
>  [ 1845.469202]  ? ns_capable+0x55/0xb0
>  [ 1845.469206]  path_mount+0x891/0x1aa0
>  [ 1845.469210]  ? finish_automount+0x860/0x860
>  [ 1845.469217]  ? kmem_cache_free+0x14c/0x4f0
>  [ 1845.469221]  ? user_path_at+0x3d/0x50
>  [ 1845.469227]  __x64_sys_mount+0x2d4/0x3a0
>  [ 1845.469231]  ? path_mount+0x1aa0/0x1aa0
>  [ 1845.469235]  do_syscall_64+0x6d/0x140
>  [ 1845.469239]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>  [ 1845.469242] RIP: 0033:0x55e3487f1fea
>  [ 1845.469246] Code: e8 1b 96 fa ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 4c 8b 54 24 28 4c 8b 44 24 30 4c 8b 4c 24 38 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 40 ff ff ff ff 48 c7 44 24 48
>  [ 1845.469249] RSP: 002b:000000c000b563b8 EFLAGS: 00000212 ORIG_RAX: 00000000000000a5
>  [ 1845.469253] RAX: ffffffffffffffda RBX: 000000c00006c000 RCX: 000055e3487f1fea
>  [ 1845.469256] RDX: 000000c0012cf7d8 RSI: 000000c0012616c0 RDI: 000000c0012cf7d0
>  [ 1845.469260] RBP: 000000c000b56458 R08: 000000c0004fa600 R09: 0000000000000000
>  [ 1845.469261] R10: 0000000000000000 R11: 0000000000000212 R12: 000000c0012cf7d0
>  [ 1845.469263] R13: 0000000000000000 R14: 000000c00098b6c0 R15: ffffffffffffffff
>  [ 1845.469268]  </TASK>
>  [ 1845.469269]
>  [ 1845.494368] Allocated by task 17044:
>  [ 1845.494768]  kasan_save_stack+0x1e/0x40
>  [ 1845.495185]  kasan_save_track+0x10/0x30
>  [ 1845.495594]  __kasan_slab_alloc+0x62/0x70
>  [ 1845.496024]  kmem_cache_alloc_noprof+0x1a0/0x4a0
>  [ 1845.496518]  alloc_vfsmnt+0x23/0x6c0
>  [ 1845.496911]  vfs_create_mount+0x82/0x4a0
>  [ 1845.497333]  path_mount+0x939/0x1aa0
>  [ 1845.497728]  __x64_sys_mount+0x2d4/0x3a0
>  [ 1845.498167]  do_syscall_64+0x6d/0x140
>  [ 1845.498563]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>  [ 1845.499064]
>  [ 1845.499295] Freed by task 0:
>  [ 1845.499636]  kasan_save_stack+0x1e/0x40
>  [ 1845.500052]  kasan_save_track+0x10/0x30
>  [ 1845.500494]  kasan_save_free_info+0x37/0x60
>  [ 1845.500934]  __kasan_slab_free+0x33/0x40
>  [ 1845.501355]  kmem_cache_free+0x14c/0x4f0
>  [ 1845.501774]  rcu_core+0x605/0x1d50
>  [ 1845.502162]  handle_softirqs+0x192/0x810
>  [ 1845.502587]  irq_exit_rcu+0x106/0x190
>  [ 1845.502995]  sysvec_apic_timer_interrupt+0x7c/0xb0
>  [ 1845.503487]  asm_sysvec_apic_timer_interrupt+0x16/0x20
>  [ 1845.504002]
>  [ 1845.504236] Last potentially related work creation:
>  [ 1845.504748]  kasan_save_stack+0x1e/0x40
>  [ 1845.505164]  kasan_record_aux_stack+0x97/0xa0
>  [ 1845.505621]  __call_rcu_common.constprop.0+0x6d/0xb40
>  [ 1845.506136]  task_work_run+0x111/0x1f0
>  [ 1845.506545]  syscall_exit_to_user_mode+0x1df/0x1f0
>  [ 1845.507038]  do_syscall_64+0x79/0x140
>  [ 1845.507439]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>  [ 1845.507949]
>  [ 1845.508187] Second to last potentially related work creation:
>  [ 1845.508760]  kasan_save_stack+0x1e/0x40
>  [ 1845.509175]  kasan_record_aux_stack+0x97/0xa0
>  [ 1845.509630]  task_work_add+0x178/0x250
>  [ 1845.510040]  mntput_no_expire+0x4fc/0x9f0
>  [ 1845.510468]  path_umount+0x4ed/0x10d0
>  [ 1845.510870]  __x64_sys_umount+0xfb/0x120
>  [ 1845.511298]  do_syscall_64+0x6d/0x140
>  [ 1845.511700]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>  [ 1845.512210]
>  [ 1845.512442] The buggy address belongs to the object at ffff8881507b5a40
>  [ 1845.512442]  which belongs to the cache mnt_cache of size 368
>  [ 1845.513553] The buggy address is located 24 bytes inside of
>  [ 1845.513553]  freed 368-byte region [ffff8881507b5a40, ffff8881507b5bb0)
>  [ 1845.514650]
>  [ 1845.514883] The buggy address belongs to the physical page:
>  [ 1845.515436] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1507b4
>  [ 1845.516221] head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>  [ 1845.516986] anon flags: 0x200000000000040(head|node=0|zone=2)
>  [ 1845.517549] page_type: f5(slab)
>  [ 1845.517912] raw: 0200000000000040 ffff8881009f5680 0000000000000000 dead000000000001
>  [ 1845.518684] raw: 0000000000000000 0000000080240024 00000000f5000000 0000000000000000
>  [ 1845.519445] head: 0200000000000040 ffff8881009f5680 0000000000000000 dead000000000001
>  [ 1845.520220] head: 0000000000000000 0000000080240024 00000000f5000000 0000000000000000
>  [ 1845.521006] head: 0200000000000002 ffffea000541ed01 ffffffffffffffff 0000000000000000
>  [ 1845.521812] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
>  [ 1845.522581] page dumped because: kasan: bad access detected
>  [ 1845.523131]
>  [ 1845.523362] Memory state around the buggy address:
>  [ 1845.523851]  ffff8881507b5900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  [ 1845.524588]  ffff8881507b5980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
>  [ 1845.525321] >ffff8881507b5a00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
>  [ 1845.526059]                                                     ^
>  [ 1845.526651]  ffff8881507b5a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  [ 1845.527378]  ffff8881507b5b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  [ 1845.528095] ==================================================================
> 
> > 
> > Thank you!
> > 
> > -- 
> > Deet-doot-dot, I am a bot.
> > https://korg.docs.kernel.org/prtracker.html
Christian Brauner April 3, 2025, 3:15 p.m. UTC | #4
On Thu, Apr 03, 2025 at 10:29:37AM +0200, Christian Brauner wrote:
> On Tue, Apr 01, 2025 at 08:07:15PM +0300, Leon Romanovsky wrote:
> > On Mon, Mar 24, 2025 at 09:00:59PM +0000, pr-tracker-bot@kernel.org wrote:
> > > The pull request you sent on Sat, 22 Mar 2025 11:13:18 +0100:
> > > 
> > > > git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.15-rc1.mount
> > > 
> > > has been merged into torvalds/linux.git:
> > > https://git.kernel.org/torvalds/c/fd101da676362aaa051b4f5d8a941bd308603041
> > 
> > I didn't bisect, but this PR looks like the most relevant candidate.
> > The latest Linus's master generates the following slab-use-after-free:
> 
> Sorry, did just see this today. I'll take a look now.

So in light of "Liberation Day" and the bug that caused this splat it's
time to quote Max Liebermann:

"Ich kann nicht so viel fressen, wie ich kotzen möchte."
("I cannot eat as much as I would like to vomit.")
James Bottomley April 3, 2025, 3:34 p.m. UTC | #5
On Thu, 2025-04-03 at 17:15 +0200, Christian Brauner wrote:
> On Thu, Apr 03, 2025 at 10:29:37AM +0200, Christian Brauner wrote:
> > On Tue, Apr 01, 2025 at 08:07:15PM +0300, Leon Romanovsky wrote:
> > > On Mon, Mar 24, 2025 at 09:00:59PM +0000,
> > > pr-tracker-bot@kernel.org wrote:
> > > > The pull request you sent on Sat, 22 Mar 2025 11:13:18 +0100:
> > > > 
> > > > > git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
> > > > > tags/vfs-6.15-rc1.mount
> > > > 
> > > > has been merged into torvalds/linux.git:
> > > > https://git.kernel.org/torvalds/c/fd101da676362aaa051b4f5d8a941bd308603041
> > > 
> > > I didn't bisect, but this PR looks like the most relevant
> > > candidate.
> > > The latest Linus's master generates the following slab-use-after-
> > > free:
> > 
> > Sorry, did just see this today. I'll take a look now.
> 
> So in light of "Liberation Day" and the bug that caused this splat
> it's
> time to quote Max Liebermann:
> 
> "Ich kann nicht so viel fressen, wie ich kotzen möchte."

> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2478,7 +2478,8 @@ struct vfsmount *clone_private_mount(const
> struct path *path)
>  	struct mount *old_mnt = real_mount(path->mnt);
>  	struct mount *new_mnt;
>  
> -	scoped_guard(rwsem_read, &namespace_sem)
> +	guard(rwsem_read, &namespace_sem);
> +
>  	if (IS_MNT_UNBINDABLE(old_mnt))
>  		return ERR_PTR(-EINVAL);
> 

Well that's a barfworthy oopsie, yes.  However, it does strike me as an
easy one to make for a lot of these cleanup.h things since we have a
lot of scoped and unscoped variants.  We should, at least, get
checkpatch to issue a warning about indentation expectations as it does
for our other scoped statements like for, while, if etc.
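
For anyone who has not hit this before, the failure mode can be sketched
in userspace. This is only an illustration, assuming GCC or Clang (it
relies on the cleanup attribute); fake_lock, fake_guard and
fake_scoped_guard are hypothetical stand-ins, not the real cleanup.h
macros. The scoped variant is a one-iteration for loop, so without
braces it covers exactly one statement:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock: counts how many holders there are. */
struct fake_lock { int held; };

static struct fake_lock *fake_lock_acquire(struct fake_lock *l)
{
	l->held++;
	return l;
}

/* Cleanup handler: receives a pointer to the guarded variable. */
static void fake_lock_release(struct fake_lock **l)
{
	(*l)->held--;
}

/* Unscoped: the lock is held until the enclosing block is left. */
#define fake_guard(lock)						\
	struct fake_lock *guard_var_					\
		__attribute__((cleanup(fake_lock_release), unused)) =	\
		fake_lock_acquire(lock)

/* Scoped: a for loop that runs once, guarding only its body. */
#define fake_scoped_guard(lock)						\
	for (struct fake_lock *scope_					\
		__attribute__((cleanup(fake_lock_release))) =		\
		fake_lock_acquire(lock), *done_ = NULL;			\
	     !done_; done_ = scope_)

static struct fake_lock ns_lock;

/* Same shape as the bug: no braces, so only one statement is covered. */
static int buggy_shape(void)
{
	int held_during_check = -1, held_during_work;

	fake_scoped_guard(&ns_lock)
	held_during_check = ns_lock.held;	/* loop body: lock held */

	held_during_work = ns_lock.held;	/* lock already dropped */
	return held_during_check * 10 + held_during_work;
}

/* Same shape as the fix: held until the function returns. */
static int fixed_shape(void)
{
	int held_during_check, held_during_work;

	fake_guard(&ns_lock);
	held_during_check = ns_lock.held;
	held_during_work = ns_lock.held;
	return held_during_check * 10 + held_during_work;
}
```

Each function encodes "held during the check" in the tens digit and
"held during the rest of the work" in the ones digit, so the two shapes
can be compared directly.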

It looks quite simple if I got my perl right (it's a bit rusty).

Regards,

James

---

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 7b28ad331742..805b65098149 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -4347,7 +4347,7 @@ sub process {
 		}
 
 # Check relative indent for conditionals and blocks.
-		if ($line =~ /\b(?:(?:if|while|for|(?:[a-z_]+|)for_each[a-z_]+)\s*\(|(?:do|else)\b)/ && $line !~ /^.\s*#/ && $line !~ /\}\s*while\s*/) {
+		if ($line =~ /\b(?:(?:if|while|scoped_[a-z_]+|for|(?:[a-z_]+|)for_each[a-z_]+)\s*\(|(?:do|else)\b)/ && $line !~ /^.\s*#/ && $line !~ /\}\s*while\s*/) {
 			($stat, $cond, $line_nr_next, $remain_next, $off_next) =
 				ctx_statement_block($linenr, $realcnt, 0)
 					if (!defined $stat);
Mateusz Guzik April 3, 2025, 5:21 p.m. UTC | #6
On Thu, Apr 03, 2025 at 11:34:34AM -0400, James Bottomley wrote:
> On Thu, 2025-04-03 at 17:15 +0200, Christian Brauner wrote:
> > On Thu, Apr 03, 2025 at 10:29:37AM +0200, Christian Brauner wrote:
> > > On Tue, Apr 01, 2025 at 08:07:15PM +0300, Leon Romanovsky wrote:
> > > > On Mon, Mar 24, 2025 at 09:00:59PM +0000,
> > > > pr-tracker-bot@kernel.org wrote:
> > > > > The pull request you sent on Sat, 22 Mar 2025 11:13:18 +0100:
> > > > > 
> > > > > > git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
> > > > > > tags/vfs-6.15-rc1.mount
> > > > > 
> > > > > has been merged into torvalds/linux.git:
> > > > > https://git.kernel.org/torvalds/c/fd101da676362aaa051b4f5d8a941bd308603041
> > > > 
> > > > I didn't bisect, but this PR looks like the most relevant
> > > > candidate.
> > > > The latest Linus's master generates the following slab-use-after-
> > > > free:
> > > 
> > > Sorry, did just see this today. I'll take a look now.
> > 
> > So in light of "Liberation Day" and the bug that caused this splat
> > it's
> > time to quote Max Liebermann:
> > 
> > "Ich kann nicht so viel fressen, wie ich kotzen möchte."
> 
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -2478,7 +2478,8 @@ struct vfsmount *clone_private_mount(const
> > struct path *path)
> >  	struct mount *old_mnt = real_mount(path->mnt);
> >  	struct mount *new_mnt;
> >  
> > -	scoped_guard(rwsem_read, &namespace_sem)
> > +	guard(rwsem_read, &namespace_sem);
> > +
> >  	if (IS_MNT_UNBINDABLE(old_mnt))
> >  		return ERR_PTR(-EINVAL);
> > 
> 
> Well that's a barfworthy oopsie, yes.  However, it does strike me as an
> easy one to make for a lot of these cleanup.h things since we have a
> lot of scoped and unscoped variants.  We should, at least, get
> checkpatch to issue a warning about indentation expectations as it does
> for our other scoped statements like for, while, if etc.
> 

I think this is too easy of a mistake to make to try to detect in
checkpatch.

I would argue it would be best if a language wizard came up with a way
to *demand* explicit use of { } and fail compilation if not present.

This would also provide a nice side effect of explicitly delineating
what's protected.

There are some legitimate { }-less users already, but it should not be
difficult to patch them. I can do the churn, provided someone fixes the
problem.
Linus Torvalds April 3, 2025, 6:09 p.m. UTC | #7
On Thu, 3 Apr 2025 at 10:21, Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> I would argue it would be best if a language wizard came up with a way
> to *demand* explicit use of { } and fail compilation if not present.

I tried to think of some sane model for it, but there isn't any good syntax.

The only way to enforce it would be to also have an "end" marker, i.e. do
something like

        scoped_guard(x) {
                ...
        } end_scoped_guard;

and that you could more-or-less enforce by having

    #define scoped_guard(..) ... real guard stuff .. \
                do {

    #define end_scope } while (0)

where in addition we could add some dummy variable declaration inside
scoped_guard(), and have a dummy use of that variable in the
end_scope, just to further make sure the two pair up.

It does have the advantage of allowing more flexibility with fewer
tricks when you can define your scope in the macros. Right now
"scoped_guard()" plays some rather ugly games internally, just in
order to avoid this pattern.

And that pattern isn't actually new. We *used* to have this pattern in

        do_each_thread(g, t) {
                ...
        } while_each_thread(g, t);

and honestly, people seemed to hate it.

(Also, sparse has that pattern as

        FOR_EACH_PTR(filelist, file) {
                ...
        } END_FOR_EACH_PTR(file);

and it actually works quite well and once you get used to it it's
nice, but I do think people tend to find it really really odd)
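
A rough userspace sketch of that begin/end pairing, assuming GCC or
Clang for the cleanup attribute. All names here (fake_lock,
fake_scoped_guard, end_fake_scoped_guard, scope_open_) are hypothetical
and this is not a proposal for the real macros. The opening macro
supplies the braces and a dummy variable, and the closing macro uses
that variable, so leaving out either half fails to compile:

```c
#include <assert.h>

/* Hypothetical stand-in for a lock: counts how many holders there are. */
struct fake_lock { int held; };

static struct fake_lock *fake_lock_acquire(struct fake_lock *l)
{
	l->held++;
	return l;
}

/* Cleanup handler: receives a pointer to the guarded variable. */
static void fake_lock_release(struct fake_lock **l)
{
	(*l)->held--;
}

/*
 * Opening half: starts an outer block holding the guard and a dummy
 * pairing variable, then opens the inner do { that the caller's body
 * lands in.
 */
#define fake_scoped_guard(lock)						\
	do {								\
		struct fake_lock *scope_				\
			__attribute__((cleanup(fake_lock_release),	\
				       unused)) =			\
			fake_lock_acquire(lock);			\
		int scope_open_ = 1;					\
		do {

/*
 * Closing half: ends the inner loop, touches the dummy variable so a
 * stray end marker without an opener does not compile, and closes the
 * outer block, which is where the lock is released.
 */
#define end_fake_scoped_guard						\
		} while (0);						\
		(void)scope_open_;					\
	} while (0)

static struct fake_lock ns_lock;

static int demo_paired(void)
{
	int inside = -1, after;

	fake_scoped_guard(&ns_lock) {
		inside = ns_lock.held;	/* lock held here */
	} end_fake_scoped_guard;

	after = ns_lock.held;		/* lock released at the end marker */
	return inside * 10 + after;
}
```

One side effect of this shape is that a break inside the body only
leaves the inner do/while, which is one more way the pattern can
surprise people.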

> This would also provide a nice side effect of explicitly delineating
> what's protected.

Sadly, I think we have too many uses for this to be worth it any more.
And I do suspect people would hate the odd "both beginning and end"
thing even if it adds some safety.

I dunno. I personally don't mind the "delineate both the beginning and
the end", but we don't have a great history of it.

               Linus
Leon Romanovsky April 3, 2025, 6:24 p.m. UTC | #8
On Thu, Apr 03, 2025 at 05:15:38PM +0200, Christian Brauner wrote:
> On Thu, Apr 03, 2025 at 10:29:37AM +0200, Christian Brauner wrote:
> > On Tue, Apr 01, 2025 at 08:07:15PM +0300, Leon Romanovsky wrote:
> > > On Mon, Mar 24, 2025 at 09:00:59PM +0000, pr-tracker-bot@kernel.org wrote:
> > > > The pull request you sent on Sat, 22 Mar 2025 11:13:18 +0100:
> > > > 
> > > > > git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.15-rc1.mount
> > > > 
> > > > has been merged into torvalds/linux.git:
> > > > https://git.kernel.org/torvalds/c/fd101da676362aaa051b4f5d8a941bd308603041
> > > 
> > > I didn't bisect, but this PR looks like the most relevant candidate.
> > > The latest Linus's master generates the following slab-use-after-free:
> > 
> > Sorry, did just see this today. I'll take a look now.
> 
> So in light of "Liberation Day" and the bug that caused this splat it's
> time to quote Max Liebermann:
> 
> "Ich kann nicht so viel fressen, wie ich kotzen möchte."

> From 8822177b7a8a7315446b4227c7eb7a36916a6d6d Mon Sep 17 00:00:00 2001
> From: Christian Brauner <brauner@kernel.org>
> Date: Thu, 3 Apr 2025 16:43:50 +0200
> Subject: [PATCH] fs: actually hold the namespace semaphore
> 
> Don't use a scoped guard use a regular guard to make sure that the
> namespace semaphore is held across the whole function.
> 
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---
>  fs/namespace.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 16292ff760c9..348008b9683b 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2478,7 +2478,8 @@ struct vfsmount *clone_private_mount(const struct path *path)
>  	struct mount *old_mnt = real_mount(path->mnt);
>  	struct mount *new_mnt;
>  
> -	scoped_guard(rwsem_read, &namespace_sem)
> +	guard(rwsem_read, &namespace_sem);

I'm looking at Linus's master commit a2cc6ff5ec8f ("Merge tag
'firewire-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394")
and guard is declared as macro which gets only one argument: include/linux/cleanup.h
  318 #define guard(_name) \
  319         CLASS(_name, __UNIQUE_ID(guard))



20:52:24  fs/namespace.c: In function 'clone_private_mount':
20:52:24  fs/namespace.c:2481:41: error: macro "guard" passed 2 arguments, but takes just 1
20:52:24   2481 |         guard(rwsem_read, &namespace_sem);
20:52:24        |                                         ^
20:52:24  In file included from ./include/linux/preempt.h:11,
20:52:24                   from ./include/linux/spinlock.h:56,
20:52:24                   from ./include/linux/wait.h:9,
20:52:24                   from ./include/linux/wait_bit.h:8,
20:52:24                   from ./include/linux/fs.h:7,
20:52:24                   from ./include/uapi/linux/aio_abi.h:31,
20:52:24                   from ./include/linux/syscalls.h:83,
20:52:24                   from fs/namespace.c:11:
20:52:24  ./include/linux/cleanup.h:318:9: note: macro "guard" defined here
20:52:24    318 | #define guard(_name) \
20:52:24        |         ^~~~~
20:52:24  fs/namespace.c:2481:9: error: 'guard' undeclared (first use in this function)
20:52:24   2481 |         guard(rwsem_read, &namespace_sem);
20:52:24        |         ^~~~~
20:52:24  fs/namespace.c:2481:9: note: each undeclared identifier is reported only once for each function it appears in

Do I need to apply an extra patch?

Thanks
Mateusz Guzik April 3, 2025, 7:17 p.m. UTC | #9
On Thu, Apr 3, 2025 at 8:10 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, 3 Apr 2025 at 10:21, Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > I would argue it would be best if a language wizard came up with a way
> > to *demand* explicit use of { } and fail compilation if not present.
>
> I tried to think of some sane model for it, but there isn't any good syntax.
>
> The only way to enforce it would be to also have a "end" marker, ie do
> something like
>
>         scoped_guard(x) {
>                 ...
>         } end_scoped_guard;
>
> and that you could more-or-less enforce by having
>
>     #define scoped_guard(..) ... real guard stuff .. \
>                 do {
>
>     #define end_scope } while (0)
>

Yeah, I was thinking about something like that but was thoroughly
dissatisfied with the idea.

Perhaps a tolerable fallback would be to rely on checkpatch after all,
but have it detect missing { } instead of relying on indentation
level?
Linus Torvalds April 3, 2025, 7:18 p.m. UTC | #10
On Thu, 3 Apr 2025 at 11:25, Leon Romanovsky <leon@kernel.org> wrote:
> >
> > -     scoped_guard(rwsem_read, &namespace_sem)
> > +     guard(rwsem_read, &namespace_sem);
>
> I'm looking at Linus's master commit a2cc6ff5ec8f ("Merge tag
> 'firewire-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394")
> and guard is declared as macro which gets only one argument: include/linux/cleanup.h
>   318 #define guard(_name) \
>   319         CLASS(_name, __UNIQUE_ID(guard))

Christian didn't test his patch, obviously.

It should be

        guard(rwsem_read)(&namespace_sem);

the guard() macro is kind of odd, but the oddity relates to how it
kind of takes a "class" thing as its argument, and that then expands
to the constructor that may or may not take arguments itself.

That made some of the macros simpler, although in retrospect the odd
syntax probably wasn't worth it.
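
To show why the second pair of parentheses is needed, here is a
simplified userspace sketch, assuming GCC or Clang (cleanup attribute
and __COUNTER__). The FAKE_* macros and fake_rwsem are hypothetical and
only loosely modeled on the shape of the cleanup.h macros quoted above;
they are not the kernel definitions. The point is that fake_guard(name)
expands to a declaration that ends in the constructor name, so the
arguments have to follow as a separate call:

```c
#include <assert.h>

/* Hypothetical stand-in for a read/write semaphore. */
struct fake_rwsem { int readers; };

/*
 * Defines a "class": a typedef plus a destructor and a constructor.
 * exit_expr may refer to the guarded object as obj; the trailing
 * arguments become the constructor's parameter list.
 */
#define FAKE_DEFINE_CLASS(name, type, exit_expr, init_expr, ...)	\
	typedef type class_##name##_t;					\
	static inline void class_##name##_destructor(type *p)		\
	{ type obj = *p; exit_expr; }					\
	static inline type class_##name##_constructor(__VA_ARGS__)	\
	{ type t = init_expr; return t; }

/* Declares var with a cleanup handler, ending in the constructor name. */
#define FAKE_CLASS(name, var)						\
	class_##name##_t var						\
		__attribute__((cleanup(class_##name##_destructor),	\
			       unused)) =				\
		class_##name##_constructor

#define FAKE_PASTE_(a, b) a##b
#define FAKE_PASTE(a, b) FAKE_PASTE_(a, b)

/* Takes only the class name; the constructor arguments come after it. */
#define fake_guard(name) \
	FAKE_CLASS(name, FAKE_PASTE(fake_guard_, __COUNTER__))

FAKE_DEFINE_CLASS(rwsem_read, struct fake_rwsem *,
		  obj->readers--,
		  (sem->readers++, sem),
		  struct fake_rwsem *sem)

static struct fake_rwsem ns_sem;

static int readers_while_guarded(void)
{
	/*
	 * Expands to roughly:
	 *   class_rwsem_read_t fake_guard_N __attribute__((cleanup(...))) =
	 *           class_rwsem_read_constructor(&ns_sem);
	 */
	fake_guard(rwsem_read)(&ns_sem);
	return ns_sem.readers;
}
```

With this shape, writing fake_guard(rwsem_read, &ns_sem) fails in the
preprocessor for the same reason as in the build log above: the macro
takes one argument.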

            Linus
James Bottomley April 3, 2025, 7:38 p.m. UTC | #11
On Thu, 2025-04-03 at 21:24 +0300, Leon Romanovsky wrote:
> On Thu, Apr 03, 2025 at 05:15:38PM +0200, Christian Brauner wrote:
> > On Thu, Apr 03, 2025 at 10:29:37AM +0200, Christian Brauner wrote:
> > > On Tue, Apr 01, 2025 at 08:07:15PM +0300, Leon Romanovsky wrote:
> > > > On Mon, Mar 24, 2025 at 09:00:59PM +0000,
> > > > pr-tracker-bot@kernel.org wrote:
> > > > > The pull request you sent on Sat, 22 Mar 2025 11:13:18 +0100:
> > > > > 
> > > > > > git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
> > > > > > tags/vfs-6.15-rc1.mount
> > > > > 
> > > > > has been merged into torvalds/linux.git:
> > > > > https://git.kernel.org/torvalds/c/fd101da676362aaa051b4f5d8a941bd308603041
> > > > 
> > > > I didn't bisect, but this PR looks like the most relevant
> > > > candidate.
> > > > The latest Linus's master generates the following slab-use-
> > > > after-free:
> > > 
> > > Sorry, did just see this today. I'll take a look now.
> > 
> > So in light of "Liberation Day" and the bug that caused this splat
> > it's time to quote Max Liebermann:
> > 
> > "Ich kann nicht so viel fressen, wie ich kotzen möchte."
> 
> > From 8822177b7a8a7315446b4227c7eb7a36916a6d6d Mon Sep 17 00:00:00
> > 2001
> > From: Christian Brauner <brauner@kernel.org>
> > Date: Thu, 3 Apr 2025 16:43:50 +0200
> > Subject: [PATCH] fs: actually hold the namespace semaphore
> > 
> > Don't use a scoped guard use a regular guard to make sure that the
> > namespace semaphore is held across the whole function.
> > 
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > ---
> >  fs/namespace.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 16292ff760c9..348008b9683b 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -2478,7 +2478,8 @@ struct vfsmount *clone_private_mount(const
> > struct path *path)
> >  	struct mount *old_mnt = real_mount(path->mnt);
> >  	struct mount *new_mnt;
> >  
> > -	scoped_guard(rwsem_read, &namespace_sem)
> > +	guard(rwsem_read, &namespace_sem);
> 
> I'm looking at Linus's master commit a2cc6ff5ec8f ("Merge tag
> 'firewire-updates-6.15' of
> git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394")
> and guard is declared as macro which gets only one argument:
> include/linux/cleanup.h
>   318 #define guard(_name) \
>   319         CLASS(_name, __UNIQUE_ID(guard))
> 
> 
> 
> 20:52:24  fs/namespace.c: In function 'clone_private_mount':
> 20:52:24  fs/namespace.c:2481:41: error: macro "guard" passed 2
> arguments, but takes just 1
> 20:52:24   2481 |         guard(rwsem_read, &namespace_sem);
> 20:52:24        |                                         ^
> 20:52:24  In file included from ./include/linux/preempt.h:11,
> 20:52:24                   from ./include/linux/spinlock.h:56,
> 20:52:24                   from ./include/linux/wait.h:9,
> 20:52:24                   from ./include/linux/wait_bit.h:8,
> 20:52:24                   from ./include/linux/fs.h:7,
> 20:52:24                   from ./include/uapi/linux/aio_abi.h:31,
> 20:52:24                   from ./include/linux/syscalls.h:83,
> 20:52:24                   from fs/namespace.c:11:
> 20:52:24  ./include/linux/cleanup.h:318:9: note: macro "guard"
> defined here
> 20:52:24    318 | #define guard(_name) \
> 20:52:24        |         ^~~~~
> 20:52:24  fs/namespace.c:2481:9: error: 'guard' undeclared (first use
> in this function)
> 20:52:24   2481 |         guard(rwsem_read, &namespace_sem);
> 20:52:24        |         ^~~~~
> 20:52:24  fs/namespace.c:2481:9: note: each undeclared identifier is
> reported only once for each function it appears in
> 
> Do I need to apply extra patch?

I think the statement should be

guard(rwsem_read)(&namespace_sem);

Regards,

James
Christian Brauner April 3, 2025, 7:45 p.m. UTC | #12
On Thu, Apr 03, 2025 at 12:18:45PM -0700, Linus Torvalds wrote:
> On Thu, 3 Apr 2025 at 11:25, Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > -     scoped_guard(rwsem_read, &namespace_sem)
> > > +     guard(rwsem_read, &namespace_sem);
> >
> > I'm looking at Linus's master commit a2cc6ff5ec8f ("Merge tag
> > 'firewire-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394")
> > and guard is declared as macro which gets only one argument: include/linux/cleanup.h
> >   318 #define guard(_name) \
> >   319         CLASS(_name, __UNIQUE_ID(guard))
> 
> Christian didn't test his patch, obviously.

Yes, I just sent this out as "I get why this happens." after my
screaming "dammit" moment. Sorry that I didn't make this clear. I had a
pretty strong "ffs" 10 minutes after I had waded through the overlayfs
code I added without being able to figure out how the fsck this could've
happened. In any case, there's the obviously correct version now sitting
in the tree and it's seen testing obviously.
Christian Brauner April 3, 2025, 7:55 p.m. UTC | #13
On Thu, Apr 03, 2025 at 09:45:59PM +0200, Christian Brauner wrote:
> On Thu, Apr 03, 2025 at 12:18:45PM -0700, Linus Torvalds wrote:
> > On Thu, 3 Apr 2025 at 11:25, Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > -     scoped_guard(rwsem_read, &namespace_sem)
> > > > +     guard(rwsem_read, &namespace_sem);
> > >
> > > I'm looking at Linus's master commit a2cc6ff5ec8f ("Merge tag
> > > 'firewire-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394")
> > > and guard is declared as macro which gets only one argument: include/linux/cleanup.h
> > >   318 #define guard(_name) \
> > >   319         CLASS(_name, __UNIQUE_ID(guard))
> > 
> > Christian didn't test his patch, obviously.
> 
> Yes, I just sent this out as "I get why this happens." after my
> screaming "dammit" moment. Sorry that I didn't make this clear. I had a
> pretty strong "ffs" 10 minutes after I had waded through the overlayfs
> code I added without being able to figure out how the fsck this could've
> happened. In any case, there's the obviously correct version now sitting
> in the tree and it's seen testing obviously.

I'll also append it here just in case you want to apply it right now.
Leon Romanovsky April 4, 2025, 6:16 a.m. UTC | #14
On Thu, Apr 03, 2025 at 12:18:45PM -0700, Linus Torvalds wrote:
> On Thu, 3 Apr 2025 at 11:25, Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > -     scoped_guard(rwsem_read, &namespace_sem)
> > > +     guard(rwsem_read, &namespace_sem);
> >
> > I'm looking at Linus's master commit a2cc6ff5ec8f ("Merge tag
> > 'firewire-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394")
> > and guard is declared as macro which gets only one argument: include/linux/cleanup.h
> >   318 #define guard(_name) \
> >   319         CLASS(_name, __UNIQUE_ID(guard))
> 
> Christian didn't test his patch, obviously.
> 
> It should be
> 
>         guard(rwsem_read)(&namespace_sem);
> 
> the guard() macro is kind of odd, but the oddity relates to how it
> kind of takes a "class" thing as it's argument, and that then expands
> to the constructor that may or may not take arguments itself.

Thanks, fixed.

Regarding syntax, in my opinion it is too odd and not intuitive.

> 
> That made some of the macros simpler, although in retrospect the odd
> syntax probably wasn't worth it.
> 
>             Linus
Christoph Hellwig April 4, 2025, 8:28 a.m. UTC | #15
On Thu, Apr 03, 2025 at 11:09:41AM -0700, Linus Torvalds wrote:
> On Thu, 3 Apr 2025 at 10:21, Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > I would argue it would be best if a language wizard came up with a way
> > to *demand* explicit use of { } and fail compilation if not present.
> 
> I tried to think of some sane model for it, but there isn't any good syntax.
> 
> The only way to enforce it would be to also have a "end" marker, ie do
> something like

Or just kill the non-scoped guard because it simply is an insane API.
Patch

diff --cc fs/internal.h
index 82127c69e641,db6094d5cb0b..000000000000
--- a/fs/internal.h