From patchwork Fri Jun  3 05:35:48 2022
X-Patchwork-Submitter: Arseniy Krasnov
X-Patchwork-Id: 12868536
From: Arseniy Krasnov
To: Stefano Garzarella, Stefan Hajnoczi, "Michael S. Tsirkin", Jason Wang,
 "David S. Miller", "Jakub Kicinski", Paolo Abeni
CC: "linux-kernel@vger.kernel.org", "kvm@vger.kernel.org",
 "virtualization@lists.linux-foundation.org", "netdev@vger.kernel.org",
 kernel, Krasnov Arseniy, Arseniy Krasnov
Subject: [RFC PATCH v2 3/8] af_vsock: add zerocopy receive logic
Date: Fri, 3 Jun 2022 05:35:48 +0000
Message-ID: <129aa328-ad4d-cb2c-4a51-4a2bf9c9be37@sberdevices.ru>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-State: RFC

This:
1) Adds a callback for 'mmap()' on the socket. It checks the vm area
   flags (writable or executable mappings are rejected) and sets the
   vm area ops.
2) Adds a special 'getsockopt()' case which calls the transport's
   zerocopy callback. The input argument is the vm area address.
3) Adds 'getsockopt()/setsockopt()' for switching rx zerocopy mode
   on and off.

Signed-off-by: Arseniy Krasnov
---
 include/net/af_vsock.h          |   7 +++
 include/uapi/linux/vm_sockets.h |   3 +
 net/vmw_vsock/af_vsock.c        | 100 ++++++++++++++++++++++++++++++++
 3 files changed, 110 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index f742e50207fb..f15f84c648ff 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -135,6 +135,13 @@ struct vsock_transport {
 	bool (*stream_is_active)(struct vsock_sock *);
 	bool (*stream_allow)(u32 cid, u32 port);
 
+	int (*rx_zerocopy_set)(struct vsock_sock *vsk,
+			       bool enable);
+	int (*rx_zerocopy_get)(struct vsock_sock *vsk);
+	int (*zerocopy_dequeue)(struct vsock_sock *vsk,
+				struct vm_area_struct *vma,
+				unsigned long addr);
+
 	/* SEQ_PACKET. */
 	ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
 				     int flags);
diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index c60ca33eac59..d1f792bed1a7 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -83,6 +83,9 @@
 
 #define SO_VM_SOCKETS_CONNECT_TIMEOUT_NEW 8
 
+#define SO_VM_SOCKETS_MAP_RX 9
+#define SO_VM_SOCKETS_ZEROCOPY 10
+
 #if !defined(__KERNEL__)
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
 #define SO_VM_SOCKETS_CONNECT_TIMEOUT SO_VM_SOCKETS_CONNECT_TIMEOUT_OLD
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index f04abf662ec6..10061ef21730 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1644,6 +1644,17 @@ static int vsock_connectible_setsockopt(struct socket *sock,
 		}
 		break;
 	}
+	case SO_VM_SOCKETS_ZEROCOPY: {
+		if (!transport || !transport->rx_zerocopy_set) {
+			err = -EOPNOTSUPP;
+		} else {
+			COPY_IN(val);
+
+			if (transport->rx_zerocopy_set(vsk, val))
+				err = -EINVAL;
+		}
+		break;
+	}
 
 	default:
 		err = -ENOPROTOOPT;
@@ -1657,6 +1668,48 @@ static int vsock_connectible_setsockopt(struct socket *sock,
 	return err;
 }
 
+static const struct vm_operations_struct afvsock_vm_ops = {
+};
+
+static int vsock_recv_zerocopy(struct socket *sock,
+			       unsigned long address)
+{
+	struct sock *sk = sock->sk;
+	struct vsock_sock *vsk = vsock_sk(sk);
+	struct vm_area_struct *vma;
+	const struct vsock_transport *transport;
+	int res;
+
+	transport = vsk->transport;
+
+	if (!transport->rx_zerocopy_get)
+		return -EOPNOTSUPP;
+
+	if (!transport->rx_zerocopy_get(vsk))
+		return -EOPNOTSUPP;
+
+	if (!transport->zerocopy_dequeue)
+		return -EOPNOTSUPP;
+
+	lock_sock(sk);
+	mmap_write_lock(current->mm);
+
+	vma = vma_lookup(current->mm, address);
+
+	if (!vma || vma->vm_ops != &afvsock_vm_ops) {
+		mmap_write_unlock(current->mm);
+		release_sock(sk);
+		return -EINVAL;
+	}
+
+	res = transport->zerocopy_dequeue(vsk, vma, address);
+
+	mmap_write_unlock(current->mm);
+	release_sock(sk);
+
+	return res;
+}
+
 static int vsock_connectible_getsockopt(struct socket *sock,
 					int level, int optname,
 					char __user *optval,
@@ -1701,6 +1754,39 @@ static int vsock_connectible_getsockopt(struct socket *sock,
 		lv = sock_get_timeout(vsk->connect_timeout, &v,
 				      optname == SO_VM_SOCKETS_CONNECT_TIMEOUT_OLD);
 		break;
+	case SO_VM_SOCKETS_ZEROCOPY: {
+		const struct vsock_transport *transport;
+		int res;
+
+		transport = vsk->transport;
+
+		if (!transport->rx_zerocopy_get)
+			return -EOPNOTSUPP;
+
+		lock_sock(sk);
+
+		res = transport->rx_zerocopy_get(vsk);
+
+		release_sock(sk);
+
+		if (res < 0)
+			return -EINVAL;
+
+		v.val64 = res;
+
+		break;
+	}
+	case SO_VM_SOCKETS_MAP_RX: {
+		unsigned long vma_addr;
+
+		if (len < sizeof(vma_addr))
+			return -EINVAL;
+
+		if (copy_from_user(&vma_addr, optval, sizeof(vma_addr)))
+			return -EFAULT;
+
+		return vsock_recv_zerocopy(sock, vma_addr);
+	}
 
 	default:
 		return -ENOPROTOOPT;
@@ -2129,6 +2215,19 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 	return err;
 }
 
+static int afvsock_mmap(struct file *file, struct socket *sock,
+			struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+		return -EPERM;
+
+	vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+	vma->vm_flags |= (VM_MIXEDMAP);
+	vma->vm_ops = &afvsock_vm_ops;
+
+	return 0;
+}
+
 static const struct proto_ops vsock_stream_ops = {
 	.family = PF_VSOCK,
 	.owner = THIS_MODULE,
@@ -2148,6 +2247,7 @@ static const struct proto_ops vsock_stream_ops = {
 	.recvmsg = vsock_connectible_recvmsg,
 	.mmap = sock_no_mmap,
 	.sendpage = sock_no_sendpage,
+	.mmap = afvsock_mmap,
 };
 
 static const struct proto_ops vsock_seqpacket_ops = {
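
For reference, the three pieces described in the commit message (the rx zerocopy switch, 'mmap()' on the socket, and the 'getsockopt()' that takes the vm area address) could be driven from userspace roughly as sketched below. This is only an illustration under stated assumptions, not code from the series. The helper names are hypothetical. The option numbers are the ones proposed in this patch and are defined locally because released uapi headers do not carry them. The 64-bit option value, the AF_VSOCK option level, the MAP_SHARED flag and the mapping length are assumptions based on how the existing vsock socket options are handled. The layout of the mapped area is defined by the transport and is not covered here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/socket.h>

#ifndef AF_VSOCK
#define AF_VSOCK 40
#endif

/* Option numbers as proposed by this patch (not in released headers). */
#ifndef SO_VM_SOCKETS_MAP_RX
#define SO_VM_SOCKETS_MAP_RX 9
#endif
#ifndef SO_VM_SOCKETS_ZEROCOPY
#define SO_VM_SOCKETS_ZEROCOPY 10
#endif

/* Hypothetical helper: switch rx zerocopy mode on or off.
 * Assumes the value is passed as a 64-bit integer at level AF_VSOCK.
 * Returns 0 on success or a negative errno value.
 */
int vsock_rx_zerocopy_set(int fd, int enable)
{
	uint64_t val = enable ? 1 : 0;

	if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_ZEROCOPY,
		       &val, sizeof(val)) < 0)
		return -errno;

	return 0;
}

/* Hypothetical helper: map the rx area. The mapping must be read-only,
 * because the socket's mmap callback rejects writable or executable
 * mappings. MAP_SHARED and the length are assumptions.
 * Returns MAP_FAILED on error, like mmap() itself.
 */
void *vsock_rx_map(int fd, size_t len)
{
	return mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
}

/* Hypothetical helper: ask the transport to fill the mapped area.
 * The address of the mapping is the input argument of the option.
 * Returns the getsockopt() result or a negative errno value.
 */
int vsock_rx_dequeue(int fd, void *area)
{
	unsigned long addr = (unsigned long)area;
	socklen_t len = sizeof(addr);
	int res;

	res = getsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_MAP_RX, &addr, &len);

	return res < 0 ? -errno : res;
}
```

A caller would typically enable the mode once on a connected socket with vsock_rx_zerocopy_set(fd, 1), create the mapping with vsock_rx_map(), and then call vsock_rx_dequeue() with the mapping address whenever data is expected. On a kernel or transport without this series the calls are expected to fail, so the error paths matter as much as the success path.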