Message ID | 20240526034506.GZ2118490@ZenIV (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [CFT,experimental] net/socket.c: use straight fdget/fdput (resend) | expand |
On Sat, 25 May 2024 at 20:45, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Checking the theory that the important part in sockfd_lookup_light() is > avoiding needless file refcount operations, not the marginal reduction > of the register pressure from not keeping a struct file pointer in the > caller. Yeah, the register pressure thing is not likely an issue. That said, I think we can fix it. > If that's true, we should get the same benefits from straight fdget()/fdput(). Agreed. That said, your patch is just horrendous. > +static inline bool fd_empty(struct fd f) > +{ > + // better code with gcc that way > + return unlikely(!(f.flags | (unsigned long)f.file)); > +} Ok, this is disgusting. I went all "WTF?" And then looked at why you would say that testing two different fields in a 'struct fd' would possibly be better than just checking one. And I see why that's the case - it basically short-circuits the (inlined) __to_fd() logic, and the compiler can undo it and look at the original single-word value. But it's still disgusting. If there is anything else in between, the compiler wouldn't then notice any more. What we *really* should do is have 'struct fd' just *be* that single-word value, never expand it to two words at all, and instead of doing "fd.file" we'd do "fd_file(fd)" and "fd_flags(fd)". Maybe it's not too late to do that? This is *particularly* true for the socket code that doesn't even want the 'struct file *' at all outside of that "check that it's a socket, then turn it into a socket pointer". So _particularly_ in that context, having a "fd_file()" helper to do the (trivial) unpacking would work very well. But even without changing 'struct fd', maybe we could just have "__fdget()" and friends not return a "unsigned long", but a "struct rawfd". Which would is a struct with an unsigned long. There's actually a possible future standard C++ extension for what we want to do ("C++ tagged pointers") and while it might make it into C eventually, we'd have to do it manully with ugly inline helpers (LLVM does it manually in C++ with a PointerIntPair class). IOW, we could do something like the attached. I think it's actually almost a cleanup, and now your socket things can use "struct rawfd" and that fd_empty() becomes static inline bool rawfd_empty(struct rawfd raw) { return !raw.word; } instead. Still somewhat disgusting, but now it's a "C doesn't have tagged pointers, so we do this by hand" _understandable_ disgusting. Hmm? The attached patch compiles. It looks "ObviouslyCorrect(tm)". But it might be "TotallyBroken(tm)". You get the idea. Linus
On Sat, 25 May 2024 at 22:07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > IOW, we could do something like the attached. I think it's actually > almost a cleanup [..] The reason I say that is this part of the patch: -static inline struct fd __to_fd(unsigned long v) +static inline struct fd __to_fd(struct rawfd raw) { - return (struct fd){(struct file *)(v & ~3),v & 3}; + return (struct fd){fdfile(raw),fdflags(raw)}; } which I think actually improves on our current situation. No, my fdfile/fdflags functions aren't pretty, but they have more type safety than the bare "unsigned long", and in that "__to_fd()" case I think it actually makes that unpacking operation much more obvious. It might be good to have the reverse "packing" helper inline function too, so that __fdget() wouldn't do this: return (struct rawfd) { FDPUT_FPUT | (unsigned long)file }; but instead have some kind of "mkrawfd()" helper that does the above, and we'd have return mkrawfd(file, FDPUT_FPUT); instead. That would obviate my EMPTY_RAWFD macro, and we'd just use "mkrawfd(NULL, 0)"? Maybe this is bikeshedding. But I'd rather have this kind of explicit one-word interface with a wrapped tagged pointer than have something as subtle as your fd_empty() that magically generates better code in certain very specific circumstances. Linus
On Sat, May 25, 2024 at 10:07:39PM -0700, Linus Torvalds wrote: > On Sat, 25 May 2024 at 20:45, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Checking the theory that the important part in sockfd_lookup_light() is > > avoiding needless file refcount operations, not the marginal reduction > > of the register pressure from not keeping a struct file pointer in the > > caller. > > Yeah, the register pressure thing is not likely an issue. That said, I > think we can fix it. > > > If that's true, we should get the same benefits from straight fdget()/fdput(). > > Agreed. > > That said, your patch is just horrendous. > > > +static inline bool fd_empty(struct fd f) > > +{ > > + // better code with gcc that way > > + return unlikely(!(f.flags | (unsigned long)f.file)); > > +} > > Ok, this is disgusting. I went all "WTF?" > > And then looked at why you would say that testing two different fields > in a 'struct fd' would possibly be better than just checking one. > > And I see why that's the case - it basically short-circuits the > (inlined) __to_fd() logic, and the compiler can undo it and look at > the original single-word value. Not really. The real reason is different - there is a constraint on possible values of struct fd. No valid instance can ever have NULL file and non-zero flags. The usual pattern is this: fd = something; if (!fd.file) return -EBADF; // we know that fd.file != NULL .... fdput(fd); return something_else; Switch to __cleanup would expand into something like fd = something; if (!fd.file) { fdput(fd); return -EBADF; } // we know that fd.file != NULL ... fdput(fd); return something_else; Look at the early exit there in more details: if (!fd.file) { if (fd.flags & FDPUT_FPUT) fput(fd.file); // would oops return -EBADF; } It *is* correct, but only because NULL fd.file can only happen with zero fd.flags, so this fput() call is a dead code. Boths your and mine variants are equivalent to this: if (!fd.file && !fd.flags) { fpdut(fd); // compiler will eliminate it return -EBADF; } // we assume that fd.file is non-NULL now ... fdput(fd); return something_else; and IMO both attack the problem at the wrong place. If we keep the check as it is (NULL fd.file) and turn fdput() into if ((fd.flags & FDPUT_FPUT) && fd.file) fput(fd.file); we will get obviously correct behaviour with good code generation - pretty much all current users of fdput() are downstream of the check that fd.file is non-NULL and gcc will be able to optimize the second part of condition away. IOW, we don't disturb the existing explicit callers and the only implicit ones (from __cleanup()) that might get an extra check are the ones on failure exits between getting fd and checking it to be empty. Not a lot of those... How about the following? commit b3692a6e34e9dcfb1d9705782bc928bd04d5640d Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sat May 25 23:32:20 2024 -0400 [experimental] net/socket.c: use straight fdget/fdput Checking the theory that the important part in sockfd_lookup_light() is avoiding needless file refcount operations, not the marginal reduction of the register pressure from not keeping a struct file pointer in the caller. If that's true, we should get the same benefits from straight fdget()/fdput(). And AFAICS with sane use of CLASS(fd) we can get a better code generation... Would be nice if somebody tested it on networking test suites (including benchmarks)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff --git a/include/linux/file.h b/include/linux/file.h index 45d0f4800abd..c482c1e39ebc 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -29,12 +29,6 @@ extern struct file *alloc_file_pseudo_noaccount(struct inode *, struct vfsmount extern struct file *alloc_file_clone(struct file *, int flags, const struct file_operations *); -static inline void fput_light(struct file *file, int fput_needed) -{ - if (fput_needed) - fput(file); -} - struct fd { struct file *file; unsigned int flags; @@ -44,7 +38,7 @@ struct fd { static inline void fdput(struct fd fd) { - if (fd.flags & FDPUT_FPUT) + if ((fd.flags & FDPUT_FPUT) && fd.file) fput(fd.file); } @@ -76,6 +70,11 @@ static inline struct fd fdget_pos(int fd) return __to_fd(__fdget_pos(fd)); } +static inline bool fd_empty(struct fd f) +{ + return unlikely(!f.file); +} + static inline void fdput_pos(struct fd f) { if (f.flags & FDPUT_POS_UNLOCK) diff --git a/net/socket.c b/net/socket.c index e416920e9399..297293797df2 100644 --- a/net/socket.c +++ b/net/socket.c @@ -510,7 +510,7 @@ static int sock_map_fd(struct socket *sock, int flags) struct socket *sock_from_file(struct file *file) { - if (file->f_op == &socket_file_ops) + if (likely(file->f_op == &socket_file_ops)) return file->private_data; /* set in sock_alloc_file */ return NULL; @@ -550,24 +550,6 @@ struct socket *sockfd_lookup(int fd, int *err) } EXPORT_SYMBOL(sockfd_lookup); -static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed) -{ - struct fd f = fdget(fd); - struct socket *sock; - - *err = -EBADF; - if (f.file) { - sock = sock_from_file(f.file); - if (likely(sock)) { - *fput_needed = f.flags & FDPUT_FPUT; - return sock; - } - *err = -ENOTSOCK; - fdput(f); - } - return NULL; -} - static ssize_t sockfs_listxattr(struct dentry *dentry, char *buffer, size_t size) { @@ -1834,23 +1816,25 @@ int __sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen) { struct socket *sock; struct sockaddr_storage address; - int err, fput_needed; - - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (sock) { - err = move_addr_to_kernel(umyaddr, addrlen, &address); - if (!err) { - err = security_socket_bind(sock, - (struct sockaddr *)&address, - addrlen); - if (!err) - err = READ_ONCE(sock->ops)->bind(sock, - (struct sockaddr *) - &address, addrlen); - } - fput_light(sock->file, fput_needed); - } - return err; + CLASS(fd, f)(fd); + int err; + + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; + + err = move_addr_to_kernel(umyaddr, addrlen, &address); + if (unlikely(err)) + return err; + + err = security_socket_bind(sock, (struct sockaddr *)&address, addrlen); + if (unlikely(err)) + return err; + + return READ_ONCE(sock->ops)->bind(sock, + (struct sockaddr *)&address, addrlen); } SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen) @@ -1867,21 +1851,24 @@ SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen) int __sys_listen(int fd, int backlog) { struct socket *sock; - int err, fput_needed; + CLASS(fd, f)(fd); int somaxconn; + int err; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (sock) { - somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn); - if ((unsigned int)backlog > somaxconn) - backlog = somaxconn; + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; - err = security_socket_listen(sock, backlog); - if (!err) - err = READ_ONCE(sock->ops)->listen(sock, backlog); + somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn); + if ((unsigned int)backlog > somaxconn) + backlog = somaxconn; + + err = security_socket_listen(sock, backlog); + if (!err) + err = READ_ONCE(sock->ops)->listen(sock, backlog); - fput_light(sock->file, fput_needed); - } return err; } @@ -1992,17 +1979,12 @@ static int __sys_accept4_file(struct file *file, struct sockaddr __user *upeer_s int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr, int __user *upeer_addrlen, int flags) { - int ret = -EBADF; - struct fd f; + CLASS(fd, f)(fd); - f = fdget(fd); - if (f.file) { - ret = __sys_accept4_file(f.file, upeer_sockaddr, + if (fd_empty(f)) + return -EBADF; + return __sys_accept4_file(f.file, upeer_sockaddr, upeer_addrlen, flags); - fdput(f); - } - - return ret; } SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr, @@ -2054,20 +2036,18 @@ int __sys_connect_file(struct file *file, struct sockaddr_storage *address, int __sys_connect(int fd, struct sockaddr __user *uservaddr, int addrlen) { - int ret = -EBADF; - struct fd f; + struct sockaddr_storage address; + CLASS(fd, f)(fd); + int ret; - f = fdget(fd); - if (f.file) { - struct sockaddr_storage address; + if (fd_empty(f)) + return -EBADF; - ret = move_addr_to_kernel(uservaddr, addrlen, &address); - if (!ret) - ret = __sys_connect_file(f.file, &address, addrlen, 0); - fdput(f); - } + ret = move_addr_to_kernel(uservaddr, addrlen, &address); + if (ret) + return ret; - return ret; + return __sys_connect_file(f.file, &address, addrlen, 0); } SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr, @@ -2086,26 +2066,25 @@ int __sys_getsockname(int fd, struct sockaddr __user *usockaddr, { struct socket *sock; struct sockaddr_storage address; - int err, fput_needed; + CLASS(fd, f)(fd); + int err; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - goto out; + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; err = security_socket_getsockname(sock); if (err) - goto out_put; + return err; err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 0); if (err < 0) - goto out_put; - /* "err" is actually length in this case */ - err = move_addr_to_user(&address, err, usockaddr, usockaddr_len); + return err; -out_put: - fput_light(sock->file, fput_needed); -out: - return err; + /* "err" is actually length in this case */ + return move_addr_to_user(&address, err, usockaddr, usockaddr_len); } SYSCALL_DEFINE3(getsockname, int, fd, struct sockaddr __user *, usockaddr, @@ -2124,26 +2103,25 @@ int __sys_getpeername(int fd, struct sockaddr __user *usockaddr, { struct socket *sock; struct sockaddr_storage address; - int err, fput_needed; + CLASS(fd, f)(fd); + int err; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (sock != NULL) { - const struct proto_ops *ops = READ_ONCE(sock->ops); + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; - err = security_socket_getpeername(sock); - if (err) { - fput_light(sock->file, fput_needed); - return err; - } + err = security_socket_getpeername(sock); + if (err) + return err; - err = ops->getname(sock, (struct sockaddr *)&address, 1); - if (err >= 0) - /* "err" is actually length in this case */ - err = move_addr_to_user(&address, err, usockaddr, - usockaddr_len); - fput_light(sock->file, fput_needed); - } - return err; + err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 1); + if (err < 0) + return err; + + /* "err" is actually length in this case */ + return move_addr_to_user(&address, err, usockaddr, usockaddr_len); } SYSCALL_DEFINE3(getpeername, int, fd, struct sockaddr __user *, usockaddr, @@ -2162,16 +2140,13 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags, { struct socket *sock; struct sockaddr_storage address; - int err; struct msghdr msg; - int fput_needed; + CLASS(fd, f)(fd); + int err; err = import_ubuf(ITER_SOURCE, buff, len, &msg.msg_iter); if (unlikely(err)) return err; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - goto out; msg.msg_name = NULL; msg.msg_control = NULL; @@ -2181,20 +2156,22 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags, if (addr) { err = move_addr_to_kernel(addr, addr_len, &address); if (err < 0) - goto out_put; + return err; msg.msg_name = (struct sockaddr *)&address; msg.msg_namelen = addr_len; } + + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; + flags &= ~MSG_INTERNAL_SENDMSG_FLAGS; if (sock->file->f_flags & O_NONBLOCK) flags |= MSG_DONTWAIT; msg.msg_flags = flags; - err = __sock_sendmsg(sock, &msg); - -out_put: - fput_light(sock->file, fput_needed); -out: - return err; + return __sock_sendmsg(sock, &msg); } SYSCALL_DEFINE6(sendto, int, fd, void __user *, buff, size_t, len, @@ -2229,14 +2206,17 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags, }; struct socket *sock; int err, err2; - int fput_needed; + CLASS(fd, f)(fd); + + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; err = import_ubuf(ITER_DEST, ubuf, size, &msg.msg_iter); if (unlikely(err)) return err; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - goto out; if (sock->file->f_flags & O_NONBLOCK) flags |= MSG_DONTWAIT; @@ -2248,9 +2228,6 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags, if (err2 < 0) err = err2; } - - fput_light(sock->file, fput_needed); -out: return err; } @@ -2325,17 +2302,16 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, { sockptr_t optval = USER_SOCKPTR(user_optval); bool compat = in_compat_syscall(); - int err, fput_needed; struct socket *sock; + CLASS(fd, f)(fd); - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - return err; + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; - err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen); - - fput_light(sock->file, fput_needed); - return err; + return do_sock_setsockopt(sock, compat, level, optname, optval, optlen); } SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname, @@ -2391,20 +2367,17 @@ EXPORT_SYMBOL(do_sock_getsockopt); int __sys_getsockopt(int fd, int level, int optname, char __user *optval, int __user *optlen) { - int err, fput_needed; struct socket *sock; - bool compat; + CLASS(fd, f)(fd); - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - return err; + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; - compat = in_compat_syscall(); - err = do_sock_getsockopt(sock, compat, level, optname, + return do_sock_getsockopt(sock, in_compat_syscall(), level, optname, USER_SOCKPTR(optval), USER_SOCKPTR(optlen)); - - fput_light(sock->file, fput_needed); - return err; } SYSCALL_DEFINE5(getsockopt, int, fd, int, level, int, optname, @@ -2430,15 +2403,16 @@ int __sys_shutdown_sock(struct socket *sock, int how) int __sys_shutdown(int fd, int how) { - int err, fput_needed; struct socket *sock; + CLASS(fd, f)(fd); - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (sock != NULL) { - err = __sys_shutdown_sock(sock, how); - fput_light(sock->file, fput_needed); - } - return err; + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; + + return __sys_shutdown_sock(sock, how); } SYSCALL_DEFINE2(shutdown, int, fd, int, how) @@ -2654,22 +2628,20 @@ long __sys_sendmsg_sock(struct socket *sock, struct msghdr *msg, long __sys_sendmsg(int fd, struct user_msghdr __user *msg, unsigned int flags, bool forbid_cmsg_compat) { - int fput_needed, err; struct msghdr msg_sys; struct socket *sock; + CLASS(fd, f)(fd); + + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT)) return -EINVAL; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - goto out; - - err = ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0); - - fput_light(sock->file, fput_needed); -out: - return err; + return ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0); } SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int, flags) @@ -2684,13 +2656,20 @@ SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen, unsigned int flags, bool forbid_cmsg_compat) { - int fput_needed, err, datagrams; + int err, datagrams; struct socket *sock; struct mmsghdr __user *entry; struct compat_mmsghdr __user *compat_entry; struct msghdr msg_sys; struct used_address used_address; unsigned int oflags = flags; + CLASS(fd, f)(fd); + + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT)) return -EINVAL; @@ -2700,10 +2679,6 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen, datagrams = 0; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - return err; - used_address.name_len = UINT_MAX; entry = mmsg; compat_entry = (struct compat_mmsghdr __user *)mmsg; @@ -2739,8 +2714,6 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen, cond_resched(); } - fput_light(sock->file, fput_needed); - /* We only return an error if no datagrams were able to be sent */ if (datagrams != 0) return datagrams; @@ -2862,22 +2835,20 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg, long __sys_recvmsg(int fd, struct user_msghdr __user *msg, unsigned int flags, bool forbid_cmsg_compat) { - int fput_needed, err; struct msghdr msg_sys; struct socket *sock; + CLASS(fd, f)(fd); + + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT)) return -EINVAL; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - goto out; - - err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0); - - fput_light(sock->file, fput_needed); -out: - return err; + return ___sys_recvmsg(sock, msg, &msg_sys, flags, 0); } SYSCALL_DEFINE3(recvmsg, int, fd, struct user_msghdr __user *, msg, @@ -2894,31 +2865,32 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen, unsigned int flags, struct timespec64 *timeout) { - int fput_needed, err, datagrams; + int err, datagrams; struct socket *sock; struct mmsghdr __user *entry; struct compat_mmsghdr __user *compat_entry; struct msghdr msg_sys; struct timespec64 end_time; struct timespec64 timeout64; + CLASS(fd, f)(fd); if (timeout && poll_select_set_timeout(&end_time, timeout->tv_sec, timeout->tv_nsec)) return -EINVAL; - datagrams = 0; + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - return err; + datagrams = 0; if (likely(!(flags & MSG_ERRQUEUE))) { err = sock_error(sock->sk); - if (err) { - datagrams = err; - goto out_put; - } + if (err) + return err; } entry = mmsg; @@ -2975,12 +2947,10 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg, } if (err == 0) - goto out_put; + return datagrams; - if (datagrams == 0) { - datagrams = err; - goto out_put; - } + if (datagrams == 0) + return err; /* * We may return less entries than requested (vlen) if the @@ -2995,9 +2965,6 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg, */ WRITE_ONCE(sock->sk->sk_err, -err); } -out_put: - fput_light(sock->file, fput_needed); - return datagrams; }
On Sun, 26 May 2024 at 12:27, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Not really. The real reason is different - there is a constraint on > possible values of struct fd. No valid instance can ever have NULL > file and non-zero flags. > > The usual pattern is this: [ snip snip ] Ugh. I still hate it, including your new version. I suspect it will easily generate the extra test at fd_empty() time, and your new version would instead just move that extra test at fdput() time instead. Hopefully in most cases the compiler sees the previous test for fd.file, realizes the new test is unnecessary and optimizes it away. Except we most definitely pass around 'struct fd *' in some places (at least ovlfs), so I doubt that will be the case everywhere. What would make more sense is if you make the "fd_empty()" test be about the _flags_, and then both the fp_empty() test and the test inside fdput() would be testing the same things. Sadly, we'd need another bit in the flags. One option is easy enough - we'd just have to make 'struct file' always be 8-byte aligned, which it effectively always is. Or we'd need to make the rule be that FDPUT_POS_UNLOCK only gets set if FDPUT_FPUT is set. Because I think we could have even a two-bit tag value have that "no fd" case: 00 - no fd 01 - fd but no need for fput 10 - fd needs fput 11 - fd needs pos unlock and fput but as it is, that's not what we have. Right now we have 00 - no fd or fd with no need for fput ("look at fd.file to decide") 01 - fd needs fput 10 - fd pos unlock but no fput 11 - fd pos unlock and fput but that 10 case looks odd to me. Why would we ever need a pos unlock but no fput? The reason we don't need an fput is that we're the only thread that has access to the file pointer, but that also implies that we shouldn't need to lock the position. So now I've just confused myself. Why *do* we have that 10 pattern? Adding a separate bit would certainly avoid any complexity, and then you'd have "flags==0 means no file pointer" and the "fd_empty()" test would then make the fdput) test obviously unnecessary in the usual pattern. Linus
On Sun, May 26, 2024 at 03:16:37PM -0700, Linus Torvalds wrote: > Hopefully in most cases the compiler sees the previous test for > fd.file, realizes the new test is unnecessary and optimizes it away. It does. > Except we most definitely pass around 'struct fd *' in some places (at > least ovlfs), so I doubt that will be the case everywhere. ... and those places need care, really. I've done that review quite recently. I don't think you'd been on Cc in related thread... <checks> nope. || Another is overlayfs ovl_real_fdget() and ovl_real_fdget_meta(). || Those two are arguably too clever for their readers' good. || It's still "borrow or clone, and remember which one had it been", || but that's mixed with returning errors: || err = ovl_read_fdget(file, &fd); || may construct and store a junk value in fd if err is non-zero. || Not a bug, but only because all callers remember to ignore that || value in such case. Junk value as in e.g. <ERR_PTR(-EIO), FDPUT_FPUT>; it's trivial to avoid ( file = ovl_open_realfile(file, &realpath); if (IS_ERR(file)) return ERR_PTR(file); real->flags = FDPUT_FPUT; real->file = file; return 0; instead of real->flags = FDPUT_FPUT; real->file = ovl_open_realfile(file, &realpath); return PTR_ERR_OR_ZERO(real->file); we have there right now and similar for <ERR_PTR(-EPERM), 0> pairs), but note that anything like your switch to struct-wrapped unsigned long would need similar massage there anyway. > What would make more sense is if you make the "fd_empty()" test be > about the _flags_, and then both the fp_empty() test and the test > inside fdput() would be testing the same things. Umm... What encoding would you use? > Sadly, we'd need another bit in the flags. One option is easy enough - > we'd just have to make 'struct file' always be 8-byte aligned, which > it effectively always is. > > Or we'd need to make the rule be that FDPUT_POS_UNLOCK only gets set > if FDPUT_FPUT is set. Huh? That makes no sense - open a file, fork, and there you go; same file reference in unshared descriptor tables in child and parent. You definitely don't want to bump refcount on fdget_pos() in either and you don't want to lose read/write/lseek serialization. FDPUT_FPUT is *not* a property of file; it's about the original reference to file not being guaranteed to stay pinned for the lifetime of struct fd in question...
On Sun, 26 May 2024 at 16:16, Al Viro <viro@zeniv.linux.org.uk> wrote: > > FDPUT_FPUT is *not* a property of file; it's about the original > reference to file not being guaranteed to stay pinned for the lifetime > of struct fd in question... Yup. I forgot the rules and thought we set FDPUT_PUT for cases where we had exclusive access to 'struct file *', but it's for cases where we have exclusive access to the 'int fd'. Linus
On Mon, May 27, 2024 at 12:16:41AM +0100, Al Viro wrote: > > What would make more sense is if you make the "fd_empty()" test be > > about the _flags_, and then both the fp_empty() test and the test > > inside fdput() would be testing the same things. > > Umm... What encoding would you use? FWIW, _if_ we go for replacement of struct fd with struct-wrapped unsigned long and actually steal another bit from pointer, we could actually use that to encode errors as well... If bit 0 is clear for file reference and set for an error, we could use bit 1 to represent borrowed vs. cloned and bit 2 for need to unlock... overlayfs would be happier that way - we could have those functions return struct fd directly, and pass the error value that way. fdget() would report EBADF explicitly on empty slot - not a big deal. Hmm... Let me play with that a bit and see if I can come up with something tolerable. We could add that in parallel to existing struct fd; we'd need a name for replacement, though - anything like "rawfd" is asking for confusion around fdget_raw(), and we'd need replacements for fdget() et.al. that would yield those. Alternatively, with a sane set of helpers we could actually _replace_ struct fd definition without a huge flagday commit - start with adding #define fd_file(fd) ((fd).file) #define fd_valid(fd) (fd_file(fd) != NULL) convert the current users, one by one (fairly mechanical patches), then change the definition of struct fd. The last step would affect * definitions of helpers * adding fd-constructing macros (ERR_FD, etc.) * converting the few places that really construct those suckers (fdget() et.al., overlayfs, a couple of explicit initializers to {NULL, 0}). It's not a huge amount of work and the flagday step would be fairly small, but we'd probably need to split that over a couple of cycles - helpers in one, along with renaming fd.file to e.g. fd.__file_use_the_damn_helpers, then in the next cycle do the switchover. Hell knows; let me play with that a bit and let's see what falls out...
On Mon, 27 May 2024 at 09:31, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Alternatively, with a sane set of helpers we could actually _replace_ > struct fd definition without a huge flagday commit It wouldn't even be all that huge. We've got less than 200 of those 'struct fd' users, and some of them hardly even look at the result, but just pass it off to other helpers (ie the kernel/bpf/syscalls.c pattern). With just a couple of helpers it would be mostly a cleanup. Linus
On Mon, 27 May 2024 at 12:20, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > With just a couple of helpers it would be mostly a cleanup. Looking around, I think we could even take four bits. Without any debugging, 'struct file' is 160 bytes on 32-bit, it would not hurt at all to just force a 16-byte alignment. In fact, in practice it is aligned more than that - we already use SLAB_HWCACHE_ALIGN, which in most cases means that it's 64-byte aligned. But to be safe, we should specify the 16 bytes in the kmem_cache_create() call, and we should just make this all very explicit: --- a/fs/file_table.c +++ b/fs/file_table.c @@ -512,7 +512,7 @@ EXPORT_SYMBOL(__fput_sync); void __init files_init(void) { - filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0, + filp_cachep = kmem_cache_create("filp", sizeof(struct file), 16, SLAB_TYPESAFE_BY_RCU | SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL); percpu_counter_init(&nr_files, 0, GFP_KERNEL); --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1025,7 +1025,7 @@ struct file { errseq_t f_wb_err; errseq_t f_sb_err; /* for syncfs */ } __randomize_layout - __attribute__((aligned(4))); /* lest something weird decides that 2 is OK */ + __attribute__((aligned(16))); /* Up to four tag bits */ struct file_handle { __u32 handle_bytes; and while four tag bits isn't something to waste, it looks pretty reasonable here. Linus
On Mon, May 27, 2024 at 05:31:16PM +0100, Al Viro wrote:
> Hell knows; let me play with that a bit and let's see what falls out...
gcc optimizer is... interesting:
if (v & 1)
if (!(v & 3))
foo();
is *NOT* collapsed into a no-op. Not even
if (v & 1)
if (!(v & 1) && !(v & 2))
foo();
is good enough for it.
if (v & 1)
if (!(v & 1))
if (!(v & 2))
foo();
is recognized, thankfully, but... WTF? As far as I can tell,
if (!(v & 1) && !(v & 2))
gets turned into if (!(v & 3)), and then gcc gets stuck on that.
if (!(v & 1)) if (!(v & 2))
is turned into the same, but apparently only after it actually
looks at both parts in context of what it knows about the earlier
checks.
clang handles all forms just fine, but gcc does not ;-/
diff --git a/include/linux/file.h b/include/linux/file.h index 45d0f4800abd..831fda778e23 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -29,12 +29,6 @@ extern struct file *alloc_file_pseudo_noaccount(struct inode *, struct vfsmount extern struct file *alloc_file_clone(struct file *, int flags, const struct file_operations *); -static inline void fput_light(struct file *file, int fput_needed) -{ - if (fput_needed) - fput(file); -} - struct fd { struct file *file; unsigned int flags; @@ -76,6 +70,12 @@ static inline struct fd fdget_pos(int fd) return __to_fd(__fdget_pos(fd)); } +static inline bool fd_empty(struct fd f) +{ + // better code with gcc that way + return unlikely(!(f.flags | (unsigned long)f.file)); +} + static inline void fdput_pos(struct fd f) { if (f.flags & FDPUT_POS_UNLOCK) diff --git a/net/socket.c b/net/socket.c index e416920e9399..297293797df2 100644 --- a/net/socket.c +++ b/net/socket.c @@ -510,7 +510,7 @@ static int sock_map_fd(struct socket *sock, int flags) struct socket *sock_from_file(struct file *file) { - if (file->f_op == &socket_file_ops) + if (likely(file->f_op == &socket_file_ops)) return file->private_data; /* set in sock_alloc_file */ return NULL; @@ -550,24 +550,6 @@ struct socket *sockfd_lookup(int fd, int *err) } EXPORT_SYMBOL(sockfd_lookup); -static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed) -{ - struct fd f = fdget(fd); - struct socket *sock; - - *err = -EBADF; - if (f.file) { - sock = sock_from_file(f.file); - if (likely(sock)) { - *fput_needed = f.flags & FDPUT_FPUT; - return sock; - } - *err = -ENOTSOCK; - fdput(f); - } - return NULL; -} - static ssize_t sockfs_listxattr(struct dentry *dentry, char *buffer, size_t size) { @@ -1834,23 +1816,25 @@ int __sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen) { struct socket *sock; struct sockaddr_storage address; - int err, fput_needed; - - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (sock) { - err = move_addr_to_kernel(umyaddr, addrlen, &address); - if (!err) { - err = security_socket_bind(sock, - (struct sockaddr *)&address, - addrlen); - if (!err) - err = READ_ONCE(sock->ops)->bind(sock, - (struct sockaddr *) - &address, addrlen); - } - fput_light(sock->file, fput_needed); - } - return err; + CLASS(fd, f)(fd); + int err; + + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; + + err = move_addr_to_kernel(umyaddr, addrlen, &address); + if (unlikely(err)) + return err; + + err = security_socket_bind(sock, (struct sockaddr *)&address, addrlen); + if (unlikely(err)) + return err; + + return READ_ONCE(sock->ops)->bind(sock, + (struct sockaddr *)&address, addrlen); } SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen) @@ -1867,21 +1851,24 @@ SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen) int __sys_listen(int fd, int backlog) { struct socket *sock; - int err, fput_needed; + CLASS(fd, f)(fd); int somaxconn; + int err; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (sock) { - somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn); - if ((unsigned int)backlog > somaxconn) - backlog = somaxconn; + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; - err = security_socket_listen(sock, backlog); - if (!err) - err = READ_ONCE(sock->ops)->listen(sock, backlog); + somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn); + if ((unsigned int)backlog > somaxconn) + backlog = somaxconn; + + err = security_socket_listen(sock, backlog); + if (!err) + err = READ_ONCE(sock->ops)->listen(sock, backlog); - fput_light(sock->file, fput_needed); - } return err; } @@ -1992,17 +1979,12 @@ static int __sys_accept4_file(struct file *file, struct sockaddr __user *upeer_s int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr, int __user *upeer_addrlen, int flags) { - int ret = -EBADF; - struct fd f; + CLASS(fd, f)(fd); - f = fdget(fd); - if (f.file) { - ret = __sys_accept4_file(f.file, upeer_sockaddr, + if (fd_empty(f)) + return -EBADF; + return __sys_accept4_file(f.file, upeer_sockaddr, upeer_addrlen, flags); - fdput(f); - } - - return ret; } SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr, @@ -2054,20 +2036,18 @@ int __sys_connect_file(struct file *file, struct sockaddr_storage *address, int __sys_connect(int fd, struct sockaddr __user *uservaddr, int addrlen) { - int ret = -EBADF; - struct fd f; + struct sockaddr_storage address; + CLASS(fd, f)(fd); + int ret; - f = fdget(fd); - if (f.file) { - struct sockaddr_storage address; + if (fd_empty(f)) + return -EBADF; - ret = move_addr_to_kernel(uservaddr, addrlen, &address); - if (!ret) - ret = __sys_connect_file(f.file, &address, addrlen, 0); - fdput(f); - } + ret = move_addr_to_kernel(uservaddr, addrlen, &address); + if (ret) + return ret; - return ret; + return __sys_connect_file(f.file, &address, addrlen, 0); } SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr, @@ -2086,26 +2066,25 @@ int __sys_getsockname(int fd, struct sockaddr __user *usockaddr, { struct socket *sock; struct sockaddr_storage address; - int err, fput_needed; + CLASS(fd, f)(fd); + int err; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - goto out; + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; err = security_socket_getsockname(sock); if (err) - goto out_put; + return err; err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 0); if (err < 0) - goto out_put; - /* "err" is actually length in this case */ - err = move_addr_to_user(&address, err, usockaddr, usockaddr_len); + return err; -out_put: - fput_light(sock->file, fput_needed); -out: - return err; + /* "err" is actually length in this case */ + return move_addr_to_user(&address, err, usockaddr, usockaddr_len); } SYSCALL_DEFINE3(getsockname, int, fd, struct sockaddr __user *, usockaddr, @@ -2124,26 +2103,25 @@ int __sys_getpeername(int fd, struct sockaddr __user *usockaddr, { struct socket *sock; struct sockaddr_storage address; - int err, fput_needed; + CLASS(fd, f)(fd); + int err; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (sock != NULL) { - const struct proto_ops *ops = READ_ONCE(sock->ops); + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; - err = security_socket_getpeername(sock); - if (err) { - fput_light(sock->file, fput_needed); - return err; - } + err = security_socket_getpeername(sock); + if (err) + return err; - err = ops->getname(sock, (struct sockaddr *)&address, 1); - if (err >= 0) - /* "err" is actually length in this case */ - err = move_addr_to_user(&address, err, usockaddr, - usockaddr_len); - fput_light(sock->file, fput_needed); - } - return err; + err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 1); + if (err < 0) + return err; + + /* "err" is actually length in this case */ + return move_addr_to_user(&address, err, usockaddr, usockaddr_len); } SYSCALL_DEFINE3(getpeername, int, fd, struct sockaddr __user *, usockaddr, @@ -2162,16 +2140,13 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags, { struct socket *sock; struct sockaddr_storage address; - int err; struct msghdr msg; - int fput_needed; + CLASS(fd, f)(fd); + int err; err = import_ubuf(ITER_SOURCE, buff, len, &msg.msg_iter); if (unlikely(err)) return err; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - goto out; msg.msg_name = NULL; msg.msg_control = NULL; @@ -2181,20 +2156,22 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags, if (addr) { err = move_addr_to_kernel(addr, addr_len, &address); if (err < 0) - goto out_put; + return err; msg.msg_name = (struct sockaddr *)&address; msg.msg_namelen = addr_len; } + + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; + flags &= ~MSG_INTERNAL_SENDMSG_FLAGS; if (sock->file->f_flags & O_NONBLOCK) flags |= MSG_DONTWAIT; msg.msg_flags = flags; - err = __sock_sendmsg(sock, &msg); - -out_put: - fput_light(sock->file, fput_needed); -out: - return err; + return __sock_sendmsg(sock, &msg); } SYSCALL_DEFINE6(sendto, int, fd, void __user *, buff, size_t, len, @@ -2229,14 +2206,17 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags, }; struct socket *sock; int err, err2; - int fput_needed; + CLASS(fd, f)(fd); + + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; err = import_ubuf(ITER_DEST, ubuf, size, &msg.msg_iter); if (unlikely(err)) return err; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - goto out; if (sock->file->f_flags & O_NONBLOCK) flags |= MSG_DONTWAIT; @@ -2248,9 +2228,6 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags, if (err2 < 0) err = err2; } - - fput_light(sock->file, fput_needed); -out: return err; } @@ -2325,17 +2302,16 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, { sockptr_t optval = USER_SOCKPTR(user_optval); bool compat = in_compat_syscall(); - int err, fput_needed; struct socket *sock; + CLASS(fd, f)(fd); - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - return err; + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; - err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen); - - fput_light(sock->file, fput_needed); - return err; + return do_sock_setsockopt(sock, compat, level, optname, optval, optlen); } SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname, @@ -2391,20 +2367,17 @@ EXPORT_SYMBOL(do_sock_getsockopt); int __sys_getsockopt(int fd, int level, int optname, char __user *optval, int __user *optlen) { - int err, fput_needed; struct socket *sock; - bool compat; + CLASS(fd, f)(fd); - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - return err; + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; - compat = in_compat_syscall(); - err = do_sock_getsockopt(sock, compat, level, optname, + return do_sock_getsockopt(sock, in_compat_syscall(), level, optname, USER_SOCKPTR(optval), USER_SOCKPTR(optlen)); - - fput_light(sock->file, fput_needed); - return err; } SYSCALL_DEFINE5(getsockopt, int, fd, int, level, int, optname, @@ -2430,15 +2403,16 @@ int __sys_shutdown_sock(struct socket *sock, int how) int __sys_shutdown(int fd, int how) { - int err, fput_needed; struct socket *sock; + CLASS(fd, f)(fd); - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (sock != NULL) { - err = __sys_shutdown_sock(sock, how); - fput_light(sock->file, fput_needed); - } - return err; + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; + + return __sys_shutdown_sock(sock, how); } SYSCALL_DEFINE2(shutdown, int, fd, int, how) @@ -2654,22 +2628,20 @@ long __sys_sendmsg_sock(struct socket *sock, struct msghdr *msg, long __sys_sendmsg(int fd, struct user_msghdr __user *msg, unsigned int flags, bool forbid_cmsg_compat) { - int fput_needed, err; struct msghdr msg_sys; struct socket *sock; + CLASS(fd, f)(fd); + + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT)) return -EINVAL; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - goto out; - - err = ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0); - - fput_light(sock->file, fput_needed); -out: - return err; + return ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0); } SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int, flags) @@ -2684,13 +2656,20 @@ SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen, unsigned int flags, bool forbid_cmsg_compat) { - int fput_needed, err, datagrams; + int err, datagrams; struct socket *sock; struct mmsghdr __user *entry; struct compat_mmsghdr __user *compat_entry; struct msghdr msg_sys; struct used_address used_address; unsigned int oflags = flags; + CLASS(fd, f)(fd); + + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT)) return -EINVAL; @@ -2700,10 +2679,6 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen, datagrams = 0; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - return err; - used_address.name_len = UINT_MAX; entry = mmsg; compat_entry = (struct compat_mmsghdr __user *)mmsg; @@ -2739,8 +2714,6 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen, cond_resched(); } - fput_light(sock->file, fput_needed); - /* We only return an error if no datagrams were able to be sent */ if (datagrams != 0) return datagrams; @@ -2862,22 +2835,20 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg, long __sys_recvmsg(int fd, struct user_msghdr __user *msg, unsigned int flags, bool forbid_cmsg_compat) { - int fput_needed, err; struct msghdr msg_sys; struct socket *sock; + CLASS(fd, f)(fd); + + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT)) return -EINVAL; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - goto out; - - err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0); - - fput_light(sock->file, fput_needed); -out: - return err; + return ___sys_recvmsg(sock, msg, &msg_sys, flags, 0); } SYSCALL_DEFINE3(recvmsg, int, fd, struct user_msghdr __user *, msg, @@ -2894,31 +2865,32 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen, unsigned int flags, struct timespec64 *timeout) { - int fput_needed, err, datagrams; + int err, datagrams; struct socket *sock; struct mmsghdr __user *entry; struct compat_mmsghdr __user *compat_entry; struct msghdr msg_sys; struct timespec64 end_time; struct timespec64 timeout64; + CLASS(fd, f)(fd); if (timeout && poll_select_set_timeout(&end_time, timeout->tv_sec, timeout->tv_nsec)) return -EINVAL; - datagrams = 0; + if (fd_empty(f)) + return -EBADF; + sock = sock_from_file(f.file); + if (unlikely(!sock)) + return -ENOTSOCK; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - return err; + datagrams = 0; if (likely(!(flags & MSG_ERRQUEUE))) { err = sock_error(sock->sk); - if (err) { - datagrams = err; - goto out_put; - } + if (err) + return err; } entry = mmsg; @@ -2975,12 +2947,10 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg, } if (err == 0) - goto out_put; + return datagrams; - if (datagrams == 0) { - datagrams = err; - goto out_put; - } + if (datagrams == 0) + return err; /* * We may return less entries than requested (vlen) if the @@ -2995,9 +2965,6 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg, */ WRITE_ONCE(sock->sk->sk_err, -err); } -out_put: - fput_light(sock->file, fput_needed); - return datagrams; }
[sigh... turns out that my .muttrc had no netdev alias ;-/] Checking the theory that the important part in sockfd_lookup_light() is avoiding needless file refcount operations, not the marginal reduction of the register pressure from not keeping a struct file pointer in the caller. If that's true, we should get the same benefits from straight fdget()/fdput(). And AFAICS with sane use of CLASS(fd) we can get a better code generation... Would be nice if somebody tested it on networking test suites (including benchmarks)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- ----- End forwarded message -----