diff mbox series

[net-next,v6,01/18] net: Declare MSG_SPLICE_PAGES internal sendmsg() flag

Message ID 20230411160902.4134381-2-dhowells@redhat.com (mailing list archive)
State New
Headers show
Series splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 | expand

Commit Message

David Howells April 11, 2023, 4:08 p.m. UTC
Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a
network protocol that it should splice pages from the source iterator
rather than copying the data if it can.  This flag is added to a list that
is cleared by sendmsg syscalls on entry.

This is intended as a replacement for the ->sendpage() op, allowing a way
to splice in several multipage folios in one go.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
 include/linux/socket.h | 3 +++
 net/socket.c           | 2 ++
 2 files changed, 5 insertions(+)

Comments

Al Viro April 13, 2023, 12:51 a.m. UTC | #1
On Tue, Apr 11, 2023 at 05:08:45PM +0100, David Howells wrote:

> @@ -2483,6 +2484,7 @@ static int ____sys_sendmsg(struct socket *sock, struct msghdr *msg_sys,
>  	}
>  	msg_sys->msg_flags = flags;
>  
> +	flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
>  	if (sock->file->f_flags & O_NONBLOCK)
>  		msg_sys->msg_flags |= MSG_DONTWAIT;

A bit too late, innit?  There's no users of 'flags' downstream of that
assignment to ->msg_flags, so your &= is a no-op; it should be done
*before* that assignment...
Al Viro April 13, 2023, 4:29 a.m. UTC | #2
On Thu, Apr 13, 2023 at 01:51:29AM +0100, Al Viro wrote:
> On Tue, Apr 11, 2023 at 05:08:45PM +0100, David Howells wrote:
> 
> > @@ -2483,6 +2484,7 @@ static int ____sys_sendmsg(struct socket *sock, struct msghdr *msg_sys,
> >  	}
> >  	msg_sys->msg_flags = flags;
> >  
> > +	flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
> >  	if (sock->file->f_flags & O_NONBLOCK)
> >  		msg_sys->msg_flags |= MSG_DONTWAIT;
> 
> A bit too late, innit?  There's no users of 'flags' downstream of that
> assignment to ->msg_flags, so your &= is a no-op; it should be done
> *before* that assignment...

While we are at it, io-uring has this:
int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
        struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
...
        sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;

and

int io_send(struct io_kiocb *req, unsigned int issue_flags)
{
        struct sockaddr_storage __address;
        struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);

...
        flags = sr->msg_flags;
        if (issue_flags & IO_URING_F_NONBLOCK)
                flags |= MSG_DONTWAIT;
        if (flags & MSG_WAITALL)
                min_ret = iov_iter_count(&msg.msg_iter);

        msg.msg_flags = flags;
        ret = sock_sendmsg(sock, &msg);

Note that io_sendmsg_prep() handles both IORING_OP_SENDMSG and IORING_OP_SEND,
so this pair of functions can hit the same request.  And sqe->msg_flags is
not sanitized at all - it comes straight from user buffer.
David Howells April 13, 2023, 8:39 p.m. UTC | #3
Al Viro <viro@zeniv.linux.org.uk> wrote:

> Note that io_sendmsg_prep() handles both IORING_OP_SENDMSG and IORING_OP_SEND,
> so this pair of functions can hit the same request.  And sqe->msg_flags is
> not sanitized at all - it comes straight from user buffer.

Assuming ____sys_sendmsg() is fixed, I think it should be sufficient to make
io_send() and io_send_zc().  io_sendmsg() and io_sendmsg_zc() will go through
____sys_sendmsg().

David
Al Viro April 13, 2023, 8:49 p.m. UTC | #4
On Thu, Apr 13, 2023 at 09:39:22PM +0100, David Howells wrote:
> Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> > Note that io_sendmsg_prep() handles both IORING_OP_SENDMSG and IORING_OP_SEND,
> > so this pair of functions can hit the same request.  And sqe->msg_flags is
> > not sanitized at all - it comes straight from user buffer.
> 
> Assuming ____sys_sendmsg() is fixed, I think it should be sufficient to make
> io_send() and io_send_zc().  io_sendmsg() and io_sendmsg_zc() will go through
> ____sys_sendmsg().

	Sure; what I wanted to point out was that despite the name,
io_sendmsg_prep() gets used not only with io_sendmsg().  io_sendmsg()
does go through ____sys_sendmsg(), but io_send() goes straight to
sock_sendmsg() and evades all your checks...
Al Viro April 13, 2023, 9:01 p.m. UTC | #5
On Thu, Apr 13, 2023 at 09:49:18PM +0100, Al Viro wrote:
> On Thu, Apr 13, 2023 at 09:39:22PM +0100, David Howells wrote:
> > Al Viro <viro@zeniv.linux.org.uk> wrote:
> > 
> > > Note that io_sendmsg_prep() handles both IORING_OP_SENDMSG and IORING_OP_SEND,
> > > so this pair of functions can hit the same request.  And sqe->msg_flags is
> > > not sanitized at all - it comes straight from user buffer.
> > 
> > Assuming ____sys_sendmsg() is fixed, I think it should be sufficient to make
> > io_send() and io_send_zc().  io_sendmsg() and io_sendmsg_zc() will go through
> > ____sys_sendmsg().
> 
> 	Sure; what I wanted to point out was that despite the name,
> io_sendmsg_prep() gets used not only with io_sendmsg().  io_sendmsg()
> does go through ____sys_sendmsg(), but io_send() goes straight to
> sock_sendmsg() and evades all your checks...

	Incidentally, having ____sendmsg and ___sendmsg in the same file
is more than slightly antisocial - compiler can sort it out, but there
are human readers as well.  We have
____sys_sendmsg
___sys_sendmsg
__sys_sendmsg
__sys_sendmmsg
next to each other.  Maze of twisty little identifiers, all alike...
diff mbox series

Patch

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 13c3a237b9c9..bd1cc3238851 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -327,6 +327,7 @@  struct ucred {
 					  */
 
 #define MSG_ZEROCOPY	0x4000000	/* Use user data in kernel path */
+#define MSG_SPLICE_PAGES 0x8000000	/* Splice the pages from the iterator in sendmsg() */
 #define MSG_FASTOPEN	0x20000000	/* Send data in TCP SYN */
 #define MSG_CMSG_CLOEXEC 0x40000000	/* Set close_on_exec for file
 					   descriptor received through
@@ -337,6 +338,8 @@  struct ucred {
 #define MSG_CMSG_COMPAT	0		/* We never have 32 bit fixups */
 #endif
 
+/* Flags to be cleared on entry by sendmsg and sendmmsg syscalls */
+#define MSG_INTERNAL_SENDMSG_FLAGS (MSG_SPLICE_PAGES)
 
 /* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */
 #define SOL_IP		0
diff --git a/net/socket.c b/net/socket.c
index 73e493da4589..b3fd3f7f7e03 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2136,6 +2136,7 @@  int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 		msg.msg_name = (struct sockaddr *)&address;
 		msg.msg_namelen = addr_len;
 	}
+	flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	msg.msg_flags = flags;
@@ -2483,6 +2484,7 @@  static int ____sys_sendmsg(struct socket *sock, struct msghdr *msg_sys,
 	}
 	msg_sys->msg_flags = flags;
 
+	flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
 	if (sock->file->f_flags & O_NONBLOCK)
 		msg_sys->msg_flags |= MSG_DONTWAIT;
 	/*