Message ID | 20250416221626.2710239-1-mjguzik@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | two nits for path lookup | expand |
On Thu, Apr 17, 2025 at 12:16 AM Mateusz Guzik <mjguzik@gmail.com> wrote: > > since path looku is being looked at, two extra nits from me: > > 1. some trivial jump avoidance in inode_permission() > > 2. but more importantly avoiding a memory access which is most likely a > cache miss when descending into devcgroup_inode_permission() > > the file seems to have no maintainer fwiw > > anyhow I'm confident the way forward is to add IOP_FAST_MAY_EXEC (or > similar) to elide inode_permission() in the common case to begin with. > There are quite a few branches which straight up don't need execute. .. the bit would be set if everyone has the x perm on the inode, there are no acls and the thing is a directory The perm to check being MAY_EXEC elides the MAY_WRITE check in sb_permission(). The bit only showing up on directories means this is not a device, eliding devcgroup_inode_permission() The bit being set means there is no need to separately check for the mode and acls. I have hooks in the same spot as security_* callbacks for setattr and setacl + a CONFIG_DEBUG_VFS-guarded runtime check that the bit is only set if there are indeed no acls and the mode grants x for everyone. It also handles races against setattr/getacl. I just need to clean this up + do more testing. > On top of that btrfs has a permission hook only to check for MAY_WRITE, > which in case of path lookup is not set. With the above flag the call > will be avoided. > > Mateusz Guzik (2): > fs: touch up predicts in inode_permission() > device_cgroup: avoid access to ->i_rdev in the common case in > devcgroup_inode_permission() > > fs/namei.c | 10 +++++----- > include/linux/device_cgroup.h | 7 ++++--- > 2 files changed, 9 insertions(+), 8 deletions(-) > > -- > 2.48.1 >
On Wed, 16 Apr 2025 at 15:16, Mateusz Guzik <mjguzik@gmail.com> wrote: > > since path looku is being looked at, two extra nits from me: Ack, both look sane to me. Linus
On Thu, Apr 17, 2025 at 12:16:24AM +0200, Mateusz Guzik wrote: > since path looku is being looked at, two extra nits from me: > > 1. some trivial jump avoidance in inode_permission() > > 2. but more importantly avoiding a memory access which is most likely a > cache miss when descending into devcgroup_inode_permission() Serge did maintain this for a while. But honestly it is an absolute legacy eyesore from the cgroup v1 days. Somehow it was decided that device permission management is a good fit for cgroups. Idk, I have ranted about this in other places at length. No use warming that back up. They later decided to reimplement device access management as a dedicated bpf program type. That imho is another bad design decision. What should've happend is that device access management should've just been implemented through the bpf-LSM infrastructure. That way all this stuff would've gone through security_inode_permission() instead of us having to have two separate calls in inode_permission(). I would love to kill this call. And cgroup v1 is deprecated and systemd has dropped any support for it last year as well. > > the file seems to have no maintainer fwiw > > anyhow I'm confident the way forward is to add IOP_FAST_MAY_EXEC (or > similar) to elide inode_permission() in the common case to begin with. > There are quite a few branches which straight up don't need execute. Yes. > On top of that btrfs has a permission hook only to check for MAY_WRITE, > which in case of path lookup is not set. With the above flag the call > will be avoided. > > Mateusz Guzik (2): > fs: touch up predicts in inode_permission() > device_cgroup: avoid access to ->i_rdev in the common case in > devcgroup_inode_permission() > > fs/namei.c | 10 +++++----- > include/linux/device_cgroup.h | 7 ++++--- > 2 files changed, 9 insertions(+), 8 deletions(-) > > -- > 2.48.1 >
On Thu, 17 Apr 2025 00:16:24 +0200, Mateusz Guzik wrote: > since path looku is being looked at, two extra nits from me: > > 1. some trivial jump avoidance in inode_permission() > > 2. but more importantly avoiding a memory access which is most likely a > cache miss when descending into devcgroup_inode_permission() > > [...] Applied to the vfs-6.16.misc branch of the vfs/vfs.git tree. Patches in the vfs-6.16.misc branch should appear in linux-next soon. Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it. It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated. Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch. tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs-6.16.misc [1/2] fs: touch up predicts in inode_permission() https://git.kernel.org/vfs/vfs/c/305a4329d07c [2/2] device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission() https://git.kernel.org/vfs/vfs/c/328ba0291442