mbox series

[0/2] two nits for path lookup

Message ID 20250416221626.2710239-1-mjguzik@gmail.com (mailing list archive)
Headers show
Series two nits for path lookup | expand

Message

Mateusz Guzik April 16, 2025, 10:16 p.m. UTC
since path looku is being looked at, two extra nits from me:

1. some trivial jump avoidance in inode_permission()

2. but more importantly avoiding a memory access which is most likely a
cache miss when descending into devcgroup_inode_permission()

the file seems to have no maintainer fwiw

anyhow I'm confident the way forward is to add IOP_FAST_MAY_EXEC (or
similar) to elide inode_permission() in the common case to begin with.
There are quite a few branches which straight up don't need execute.
On top of that btrfs has a permission hook only to check for MAY_WRITE,
which in case of path lookup is not set. With the above flag the call
will be avoided.

Mateusz Guzik (2):
  fs: touch up predicts in inode_permission()
  device_cgroup: avoid access to ->i_rdev in the common case in
    devcgroup_inode_permission()

 fs/namei.c                    | 10 +++++-----
 include/linux/device_cgroup.h |  7 ++++---
 2 files changed, 9 insertions(+), 8 deletions(-)

Comments

Mateusz Guzik April 16, 2025, 10:35 p.m. UTC | #1
On Thu, Apr 17, 2025 at 12:16 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> since path looku is being looked at, two extra nits from me:
>
> 1. some trivial jump avoidance in inode_permission()
>
> 2. but more importantly avoiding a memory access which is most likely a
> cache miss when descending into devcgroup_inode_permission()
>
> the file seems to have no maintainer fwiw
>
> anyhow I'm confident the way forward is to add IOP_FAST_MAY_EXEC (or
> similar) to elide inode_permission() in the common case to begin with.
> There are quite a few branches which straight up don't need execute.

.. the bit would be set if everyone has the x perm on the inode, there
are no acls and the thing is a directory

The perm to check being MAY_EXEC elides the MAY_WRITE check in sb_permission().
The bit only showing up on directories means this is not a device,
eliding  devcgroup_inode_permission()
The bit being set means there is no need to separately check for the
mode and acls.

I have hooks in the same spot as security_* callbacks for setattr and
setacl + a CONFIG_DEBUG_VFS-guarded runtime check that the bit is only
set if there are indeed no acls and the mode grants x for everyone. It
also handles races against setattr/getacl. I just need to clean this
up + do more testing.

> On top of that btrfs has a permission hook only to check for MAY_WRITE,
> which in case of path lookup is not set. With the above flag the call
> will be avoided.
>
> Mateusz Guzik (2):
>   fs: touch up predicts in inode_permission()
>   device_cgroup: avoid access to ->i_rdev in the common case in
>     devcgroup_inode_permission()
>
>  fs/namei.c                    | 10 +++++-----
>  include/linux/device_cgroup.h |  7 ++++---
>  2 files changed, 9 insertions(+), 8 deletions(-)
>
> --
> 2.48.1
>
Linus Torvalds April 16, 2025, 10:39 p.m. UTC | #2
On Wed, 16 Apr 2025 at 15:16, Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> since path looku is being looked at, two extra nits from me:

Ack, both look sane to me.

             Linus
Christian Brauner April 17, 2025, 8:13 a.m. UTC | #3
On Thu, Apr 17, 2025 at 12:16:24AM +0200, Mateusz Guzik wrote:
> since path looku is being looked at, two extra nits from me:
> 
> 1. some trivial jump avoidance in inode_permission()
> 
> 2. but more importantly avoiding a memory access which is most likely a
> cache miss when descending into devcgroup_inode_permission()

Serge did maintain this for a while. But honestly it is an absolute
legacy eyesore from the cgroup v1 days. Somehow it was decided that
device permission management is a good fit for cgroups. Idk, I have
ranted about this in other places at length. No use warming that back
up.

They later decided to reimplement device access management as a
dedicated bpf program type. That imho is another bad design decision.

What should've happend is that device access management should've just
been implemented through the bpf-LSM infrastructure. That way all this
stuff would've gone through security_inode_permission() instead of us
having to have two separate calls in inode_permission().

I would love to kill this call. And cgroup v1 is deprecated and systemd
has dropped any support for it last year as well.

> 
> the file seems to have no maintainer fwiw
> 
> anyhow I'm confident the way forward is to add IOP_FAST_MAY_EXEC (or
> similar) to elide inode_permission() in the common case to begin with.
> There are quite a few branches which straight up don't need execute.

Yes.

> On top of that btrfs has a permission hook only to check for MAY_WRITE,
> which in case of path lookup is not set. With the above flag the call
> will be avoided.
> 
> Mateusz Guzik (2):
>   fs: touch up predicts in inode_permission()
>   device_cgroup: avoid access to ->i_rdev in the common case in
>     devcgroup_inode_permission()
> 
>  fs/namei.c                    | 10 +++++-----
>  include/linux/device_cgroup.h |  7 ++++---
>  2 files changed, 9 insertions(+), 8 deletions(-)
> 
> -- 
> 2.48.1
>
Christian Brauner April 17, 2025, 9:04 a.m. UTC | #4
On Thu, 17 Apr 2025 00:16:24 +0200, Mateusz Guzik wrote:
> since path looku is being looked at, two extra nits from me:
> 
> 1. some trivial jump avoidance in inode_permission()
> 
> 2. but more importantly avoiding a memory access which is most likely a
> cache miss when descending into devcgroup_inode_permission()
> 
> [...]

Applied to the vfs-6.16.misc branch of the vfs/vfs.git tree.
Patches in the vfs-6.16.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.16.misc

[1/2] fs: touch up predicts in inode_permission()
      https://git.kernel.org/vfs/vfs/c/305a4329d07c
[2/2] device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()
      https://git.kernel.org/vfs/vfs/c/328ba0291442