Message ID | 20200605142300.14591-1-linux@rasmusvillemoes.dk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [resend] fs/namei.c: micro-optimize acl_permission_check | expand |
On Fri, Jun 5, 2020 at 7:23 AM Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > > + /* > + * If the "group" and "other" permissions are the same, > + * there's no point calling in_group_p() to decide which > + * set to use. > + */ > + if ((((mode >> 3) ^ mode) & 7) && in_group_p(inode->i_gid)) > mode >>= 3; Ugh. Not only is this ugly, but it's not even the best optimization. We don't care that group and other match exactly. We only care that they match in the low 3 bits of the "mask" bits. So if we want this optimization - and it sounds worth it - I think we should do it right. But I also think it should be written more legibly. And the "& 7" is the same "& (MAY_READ | MAY_WRITE | MAY_EXEC)" we do later. In other words, if we do this, I'd like it to be done even more aggressively, but I'd also like the end result to be a lot more readable and have more comments about why we do that odd thing. Something like this *UNTESTED* patch, perhaps? I might have gotten something wrong, so this would need double-checking, but if it's right, I find it a _lot_ more easy to understand than making one expression that is pretty complicated and opaque. Hmm? Linus
On 05/06/2020 22.18, Linus Torvalds wrote: > On Fri, Jun 5, 2020 at 7:23 AM Rasmus Villemoes > <linux@rasmusvillemoes.dk> wrote: >> >> + /* >> + * If the "group" and "other" permissions are the same, >> + * there's no point calling in_group_p() to decide which >> + * set to use. >> + */ >> + if ((((mode >> 3) ^ mode) & 7) && in_group_p(inode->i_gid)) >> mode >>= 3; > > Ugh. Not only is this ugly, but it's not even the best optimization. > > We don't care that group and other match exactly. We only care that > they match in the low 3 bits of the "mask" bits. Yes, I did think about that, but I thought this was the more obviously correct approach, and that in practice one only sees the 0X44 and 0X55 cases. > So if we want this optimization - and it sounds worth it - I think we > should do it right. But I also think it should be written more > legibly. > > And the "& 7" is the same "& (MAY_READ | MAY_WRITE | MAY_EXEC)" we do later. > > In other words, if we do this, I'd like it to be done even more > aggressively, but I'd also like the end result to be a lot more > readable and have more comments about why we do that odd thing. > > Something like this *UNTESTED* patch, perhaps? That will kinda work, except you do that mask &= MAY_RWX before check_acl(), which cares about MAY_NOT_BLOCK and who knows what other bits. > I might have gotten something wrong, so this would need > double-checking, but if it's right, I find it a _lot_ more easy to > understand than making one expression that is pretty complicated and > opaque. Well, I thought this was readable enough with the added comment. There's already that magic constant 3 in the shifts, so the 7 seemed entirely sensible, though one could spell it 0007. Whatever. Perhaps this? As a whole function, I think that's a bit easier for brain-storming. It's your patch, just with that rwx thing used instead of mask, except for the call to check_acl(). static int acl_permission_check(struct inode *inode, int mask) { unsigned int mode = inode->i_mode; unsigned int rwx = mask & (MAY_READ | MAY_WRITE | MAY_EXEC); /* Are we the owner? If so, ACL's don't matter */ if (likely(uid_eq(current_fsuid(), inode->i_uid))) { if ((rwx << 6) & ~mode) return -EACCES; return 0; } /* Do we have ACL's? */ if (IS_POSIXACL(inode) && (mode & S_IRWXG)) { int error = check_acl(inode, mask); if (error != -EAGAIN) return error; } /* * Are the group permissions different from * the other permissions in the bits we care * about? Need to check group ownership if so. */ if (rwx & (mode ^ (mode >> 3))) { if (in_group_p(inode->i_gid)) mode >>= 3; } /* Bits in 'mode' clear that we require? */ return (rwx & ~mode) ? -EACCES : 0; } Rasmus
On Sun, Jun 7, 2020 at 6:22 AM Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > > Yes, I did think about that, but I thought this was the more obviously > correct approach, and that in practice one only sees the 0X44 and 0X55 > cases. I'm not sure about that - it probably depends on your umask. Because I see a lot of -rw-rw-r--. in my home directory, and it looks like I have a umask of 0002. That's just the Fedora default, I think. Looking at /etc/bashrc, it does if [ $UID -gt 199 ] && [ "`/usr/bin/id -gn`" = "`/usr/bin/id -un`" ]; then umask 002 else umask 022 fi iow, if you have the same user-name and group name, then umask is 002 by default for regular users. Honestly, I'm not sure why Fedora has that "each user has its own group" thing, but it's at least one common setup. So I think that the system you are looking at just happens to have umask 0022, which is traditional when you have just a 'user' group. > That will kinda work, except you do that mask &= MAY_RWX before > check_acl(), which cares about MAY_NOT_BLOCK and who knows what other bits. Good catch. > Perhaps this? As a whole function, I think that's a bit easier for > brain-storming. It's your patch, just with that rwx thing used instead > of mask, except for the call to check_acl(). Looks fine to me. Once we have to have rwx/mask separate, I'm not sure it's worth having that early masking at all (didn't check what the register pressure is over that "check_acl()" call, but at least it is fairly easy to follow along. Send me a patch with commit message etc, and I'll apply it. Linus
On Sun, Jun 7, 2020 at 9:37 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > That will kinda work, except you do that mask &= MAY_RWX before > > check_acl(), which cares about MAY_NOT_BLOCK and who knows what other bits. > > Good catch. With the change to not clear the non-rwx bits in general, the owner case now wants to do the masking, and then the "shift left by 6" modification makes no sense since it only makes for a bigger constant (the only reason to do the shift-right was so that the bitwise not of the i_mode could be done in parallel with the shift, but with the masking that instruction scheduling optimization becomes kind of immaterial too). So I modified that patch to not bother, and add a comment about MAY_NOT_BLOCK. And since I was looking at the MAY_NOT_BLOCK logic, it was not only not mentioned in comments, it also had some really confusing code around it. The posix_acl_permission() looked like it tried to conserve that bit, which is completely wrong. It wasn't a bug only for the simple reason that the only two call-sites had either explicitly cleared the bit when calling, or had tested that the bit wasn't set in the first place. So as a result, I wrote a second patch to clear that confusion up. Rasmus, say the word and I'll mark you for authorship on the first one. Comments? Can you find something else wrong here, or some other fixup to do? Al, any reaction? Linus
On Sun, Jun 07, 2020 at 12:48:53PM -0700, Linus Torvalds wrote: > Rasmus, say the word and I'll mark you for authorship on the first one. > > Comments? Can you find something else wrong here, or some other fixup to do? > > Al, any reaction? It's correct, but this > + if (mask & (mode ^ (mode >> 3))) { > + if (in_group_p(inode->i_gid)) > + mode >>= 3; > + } > + > + /* Bits in 'mode' clear that we require? */ > + return (mask & ~mode) ? -EACCES : 0; might be easier to follow if we had, from the very beginning done unsigned int deny = ~inode->i_mode; and turned that into // for group the bits 3..5 apply, for others - 0..2 // we only care which to use when they do not // agree anyway. if (mask & (deny ^ (deny >> 3))) // mask & deny != mask & (deny >> 3) if (in_... deny >>= 3; return mask & deny ? -EACCES : 0; Hell knows...
On 07/06/2020 21.48, Linus Torvalds wrote: > On Sun, Jun 7, 2020 at 9:37 AM Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >>> That will kinda work, except you do that mask &= MAY_RWX before >>> check_acl(), which cares about MAY_NOT_BLOCK and who knows what other bits. >> >> Good catch. > > With the change to not clear the non-rwx bits in general, the owner > case now wants to do the masking, and then the "shift left by 6" > modification makes no sense since it only makes for a bigger constant > (the only reason to do the shift-right was so that the bitwise not of > the i_mode could be done in parallel with the shift, but with the > masking that instruction scheduling optimization becomes kind of > immaterial too). So I modified that patch to not bother, and add a > comment about MAY_NOT_BLOCK. > > And since I was looking at the MAY_NOT_BLOCK logic, it was not only > not mentioned in comments, it also had some really confusing code > around it. > > The posix_acl_permission() looked like it tried to conserve that bit, > which is completely wrong. It wasn't a bug only for the simple reason > that the only two call-sites had either explicitly cleared the bit > when calling, or had tested that the bit wasn't set in the first > place. > > So as a result, I wrote a second patch to clear that confusion up. > > Rasmus, say the word and I'll mark you for authorship on the first one. It might be a bit confusing with me mentioned in the third person and then also author, and it's really mostly your patch, so reported-by is fine with me. But it's up to you. > Comments? Can you find something else wrong here, or some other fixup to do? No, I think it's ok. I took a look at the disassembly and it looks fine. There's an extra push/pop of %r14 [that's where gcc computes mode>>3, then CSE allows it to do cmovne %r14d,%ebx after in_group_p), so the owner case gets slightly penalized. I think/hope the savings from avoiding the in_group_p should compensate for that - any absolute path open() by non-root saves at least two in_group_p. YMMV. Rasmus
On Sun, Jun 7, 2020 at 7:05 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > return mask & deny ? -EACCES : 0; I agree that 'deny' would be simpler to read here, but in other places it would look odd. ie the IS_POSIXACL() thing that checks for "are group bits set" still wants the mode. And I'd hate to have us use code that then mixes 'deny' and 'mode' when they are directly related to each other. Anyway, I merged the patch as-is, I guess we can make future changes to this if you feel strongly about it. Linus
diff --git a/fs/namei.c b/fs/namei.c index d81f73ff1a8b..c6f0c6643db5 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -303,7 +303,12 @@ static int acl_permission_check(struct inode *inode, int mask) return error; } - if (in_group_p(inode->i_gid)) + /* + * If the "group" and "other" permissions are the same, + * there's no point calling in_group_p() to decide which + * set to use. + */ + if ((((mode >> 3) ^ mode) & 7) && in_group_p(inode->i_gid)) mode >>= 3; }
Just something like open(/usr/include/sys/stat.h) causes five calls of generic_permission -> acl_permission_check -> in_group_p; if the compiler first tried /usr/local/include/..., that would be a few more. Altogether, on a bog-standard Ubuntu 20.04 install, a workload consisting of compiling lots of userspace programs (i.e., calling lots of short-lived programs that all need to get their shared libs mapped in, and the compilers poking around looking for system headers - lots of /usr/lib, /usr/bin, /usr/include/ accesses) puts in_group_p around 0.1% according to perf top. With an artificial load such as while true ; do find /usr/ -print0 | xargs -0 stat > /dev/null ; done that jumps to over 0.4%. System-installed files are almost always 0755 (directories and binaries) or 0644, so in most cases, we can avoid the binary search and the cost of pulling the cred->groups array and in_group_p() .text into the cpu cache. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> --- fs/namei.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)