From patchwork Sun Nov 6 19:36:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 13033533 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B0F2C433FE for ; Sun, 6 Nov 2022 19:36:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230020AbiKFTgi (ORCPT ); Sun, 6 Nov 2022 14:36:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40752 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229947AbiKFTgh (ORCPT ); Sun, 6 Nov 2022 14:36:37 -0500 Received: from mx.sberdevices.ru (mx.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0998BE084; Sun, 6 Nov 2022 11:36:35 -0800 (PST) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mx.sberdevices.ru (Postfix) with ESMTP id 5438B5FD04; Sun, 6 Nov 2022 22:36:33 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1667763393; bh=yQ6suEWJBzxoUrocwoSmUoXHvbi6PvhSnqawY+nZJZ8=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=YV1yrrDKglvFQLa34c6eqjzoqkTQQDdNQdyDxpaC6X/S6vOK5GLypskjvi+zp++Dm AhAfoS6eplCCgnaql70kq2zZIgZ4x20bZ15dsHACujm9LlO/N2UjuvrVMWWaIVbmy6 /q8GXlCZXIo9wDlCZKG/+dDz7dr2VwRtVrDXVsEl97SFZ3HreObFKRGRnZOBbn+P8v K9cSd98Nbz19+MBQq04/Y386P6CS1NzpglnFgdgaJwbgJFSzYpStKvR15+PLcozJUf E+6Mi/7mMASbAzeDOfaC75mlFgGA73t1xEFi0/+4kOB7mrjmx0Rl7/LceX84b6e8qq wlREl7z+nWDVw== Received: from S-MS-EXCH01.sberdevices.ru (S-MS-EXCH01.sberdevices.ru [172.16.1.4]) by mx.sberdevices.ru (Postfix) with ESMTP; Sun, 6 Nov 2022 22:36:33 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , "edumazet@google.com" , Jakub Kicinski , Paolo Abeni , Krasnov Arseniy CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel Subject: [RFC PATCH v3 01/11] virtio/vsock: rework packet allocation logic Thread-Topic: [RFC PATCH v3 01/11] virtio/vsock: rework packet allocation logic Thread-Index: AQHY8hcBEbxDmf4aG021vuOLUMusCQ== Date: Sun, 6 Nov 2022 19:36:02 +0000 Message-ID: In-Reply-To: Accept-Language: en-US, ru-RU Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/11/06 12:52:00 #20573807 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC To support zerocopy receive, packet's buffer allocation is changed: for buffers which could be mapped to user's vma we can't use 'kmalloc()'(as kernel restricts to map slab pages to user's vma) and raw buddy allocator now called. But, for tx packets(such packets won't be mapped to user), previous 'kmalloc()' way is used, but with special flag in packet's structure which allows to distinguish between 'kmalloc()' and raw pages buffers. Signed-off-by: Arseniy Krasnov --- drivers/vhost/vsock.c | 1 + include/linux/virtio_vsock.h | 1 + net/vmw_vsock/virtio_transport.c | 8 ++++++-- net/vmw_vsock/virtio_transport_common.c | 10 +++++++++- 4 files changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 5703775af129..65475d128a1d 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -399,6 +399,7 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, return NULL; } + pkt->slab_buf = true; pkt->buf_len = pkt->len; nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter); diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 35d7eedb5e8e..d02cb7aa922f 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -50,6 +50,7 @@ struct virtio_vsock_pkt { u32 off; bool reply; bool tap_delivered; + bool slab_buf; }; struct virtio_vsock_pkt_info { diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index ad64f403536a..19909c1e9ba3 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -255,16 +255,20 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock) vq = vsock->vqs[VSOCK_VQ_RX]; do { + struct page *buf_page; + pkt = kzalloc(sizeof(*pkt), GFP_KERNEL); if (!pkt) break; - pkt->buf = kmalloc(buf_len, GFP_KERNEL); - if (!pkt->buf) { + buf_page = alloc_page(GFP_KERNEL); + + if (!buf_page) { virtio_transport_free_pkt(pkt); break; } + pkt->buf = page_to_virt(buf_page); pkt->buf_len = buf_len; pkt->len = buf_len; diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index a9980e9b9304..37e8dbfe2f5d 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -69,6 +69,7 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info, if (!pkt->buf) goto out_pkt; + pkt->slab_buf = true; pkt->buf_len = len; err = memcpy_from_msg(pkt->buf, info->msg, len); @@ -1339,7 +1340,14 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt); void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt) { - kvfree(pkt->buf); + if (pkt->buf_len) { + if (pkt->slab_buf) + kvfree(pkt->buf); + else + free_pages((unsigned long)pkt->buf, + get_order(pkt->buf_len)); + } + kfree(pkt); } EXPORT_SYMBOL_GPL(virtio_transport_free_pkt); From patchwork Sun Nov 6 19:38:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 13033537 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4600DC43217 for ; Sun, 6 Nov 2022 19:38:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230134AbiKFTih (ORCPT ); Sun, 6 Nov 2022 14:38:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42116 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229947AbiKFTig (ORCPT ); Sun, 6 Nov 2022 14:38:36 -0500 Received: from mx.sberdevices.ru (mx.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BC5F52DC0; Sun, 6 Nov 2022 11:38:34 -0800 (PST) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mx.sberdevices.ru (Postfix) with ESMTP id 2A8055FD04; Sun, 6 Nov 2022 22:38:33 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1667763513; bh=EEoKcv2eN2h4TIogPlImzmkvyAMOSyDfOwiRdxm5/Uc=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=a5pOFkdns/nGHJF2r6nW/22qh5m7oJXUahsvuEBQLu/LxnAIvk1vOItoJewBTKi+M tMAr4D7LOi1HkHOR81DTcrw0U4KexZtPgLBlmM+V1VJj9ZmoFL78xFDQcRKSBQiiMH hGbm53187KsXpDmUsh9JXd9Cx1kZc84y+lXtGjhzlqiM3VLdxnwZIqXt3YR3C8g4Bu HIWDpl75SJ2tN7KJICVcy37rvNnFLNj/EubHp+TQrvME5ZzwvCdp/nLJ1Dp8a1trWt Rd4uIl4wQJVaCJjtOqFVkQ4YXeBLfrjxsK/mW0x3TmhsOXKNRYRnzCOtxVKN2gKh4H qtsK72AbxY27w== Received: from S-MS-EXCH01.sberdevices.ru (S-MS-EXCH01.sberdevices.ru [172.16.1.4]) by mx.sberdevices.ru (Postfix) with ESMTP; Sun, 6 Nov 2022 22:38:32 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , "edumazet@google.com" , Jakub Kicinski , Paolo Abeni , Krasnov Arseniy CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel Subject: [RFC PATCH v3 02/11] virtio/vsock: update, 'virtio_transport_recv_pkt()' Thread-Topic: [RFC PATCH v3 02/11] virtio/vsock: update, 'virtio_transport_recv_pkt()' Thread-Index: AQHY8hdI/Cz9wqLJ60mqLRsUlqPepQ== Date: Sun, 6 Nov 2022 19:38:01 +0000 Message-ID: In-Reply-To: Accept-Language: en-US, ru-RU Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: <271160C92372DC4C8A0D7E0DB977117E@sberdevices.ru> MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/11/06 12:52:00 #20573807 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC Pass destination socket to 'virtio_transport_recv_pkt()'. This is needed, because caller may need socket structure, thus to avoid socket extra lookup in connect/bind lists, socket is passed as parameter(if it is NULL, then lookup is performed in the same way as before). Signed-off-by: Arseniy Krasnov --- drivers/vhost/vsock.c | 2 +- include/linux/virtio_vsock.h | 1 + net/vmw_vsock/virtio_transport.c | 2 +- net/vmw_vsock/virtio_transport_common.c | 12 +++++++----- net/vmw_vsock/vsock_loopback.c | 2 +- 5 files changed, 11 insertions(+), 8 deletions(-) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 65475d128a1d..6f3d9f02cc1d 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -548,7 +548,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid && le64_to_cpu(pkt->hdr.dst_cid) == vhost_transport_get_local_cid()) - virtio_transport_recv_pkt(&vhost_transport, pkt); + virtio_transport_recv_pkt(&vhost_transport, NULL, pkt); else virtio_transport_free_pkt(pkt); diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index d02cb7aa922f..c1be40f89a89 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -150,6 +150,7 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk, void virtio_transport_destruct(struct vsock_sock *vsk); void virtio_transport_recv_pkt(struct virtio_transport *t, + struct sock *sk, struct virtio_vsock_pkt *pkt); void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt); void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt); diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index 19909c1e9ba3..0305f94c98bb 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -560,7 +560,7 @@ static void virtio_transport_rx_work(struct work_struct *work) pkt->len = len - sizeof(pkt->hdr); virtio_transport_deliver_tap_pkt(pkt); - virtio_transport_recv_pkt(&virtio_transport, pkt); + virtio_transport_recv_pkt(&virtio_transport, NULL, pkt); } } while (!virtqueue_enable_cb(vq)); diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index 37e8dbfe2f5d..444764869670 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -1238,11 +1238,11 @@ static bool virtio_transport_valid_type(u16 type) * lock. */ void virtio_transport_recv_pkt(struct virtio_transport *t, + struct sock *sk, struct virtio_vsock_pkt *pkt) { struct sockaddr_vm src, dst; struct vsock_sock *vsk; - struct sock *sk; bool space_available; vsock_addr_init(&src, le64_to_cpu(pkt->hdr.src_cid), @@ -1267,12 +1267,14 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, /* The socket must be in connected or bound table * otherwise send reset back */ - sk = vsock_find_connected_socket(&src, &dst); if (!sk) { - sk = vsock_find_bound_socket(&dst); + sk = vsock_find_connected_socket(&src, &dst); if (!sk) { - (void)virtio_transport_reset_no_sock(t, pkt); - goto free_pkt; + sk = vsock_find_bound_socket(&dst); + if (!sk) { + (void)virtio_transport_reset_no_sock(t, pkt); + goto free_pkt; + } } } diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c index 169a8cf65b39..37c28bbeafaf 100644 --- a/net/vmw_vsock/vsock_loopback.c +++ b/net/vmw_vsock/vsock_loopback.c @@ -134,7 +134,7 @@ static void vsock_loopback_work(struct work_struct *work) list_del_init(&pkt->list); virtio_transport_deliver_tap_pkt(pkt); - virtio_transport_recv_pkt(&loopback_transport, pkt); + virtio_transport_recv_pkt(&loopback_transport, NULL, pkt); } } From patchwork Sun Nov 6 19:40:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 13033538 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 949D5C433FE for ; Sun, 6 Nov 2022 19:40:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230158AbiKFTks (ORCPT ); Sun, 6 Nov 2022 14:40:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230002AbiKFTkr (ORCPT ); Sun, 6 Nov 2022 14:40:47 -0500 Received: from mx.sberdevices.ru (mx.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B1C1FEE2A; Sun, 6 Nov 2022 11:40:45 -0800 (PST) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mx.sberdevices.ru (Postfix) with ESMTP id 0B4475FD04; Sun, 6 Nov 2022 22:40:44 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1667763644; bh=0umJpkLmREN7M8YV0WN1Ff+SQdE0Z+GkV2xpTDm1Q/w=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=ZfUXWCHLIkDt2hwa7b1dUyHFDcmGekjzqmCm175Wv9QrnmKlKbhIO/y3wpC8jnPvv gaosu092pebh6pjUl5kxNR5hIZaQWdrenos0LfY/sYCkIhQK0qaCJPQNd8hm8UI6eH Elw0gqHnb6WDh3aPeTJfyPicD9uffT89wW2NYVEEoJsr5emfPKJvr8AUy5ESgwiFsb B20aH53DP9LkgHJmWzhFL+PkDgsiIldcZJHNKNFd74GrFTb/AA7tixqGw86GP5MSvt 3rfXZAL1mA7nRMEyH9gydMCog2hZQhj0SIrjBL85vl3h0q692f01AycfwwaZtlgZ1J ZcjuHGGi0pZIA== Received: from S-MS-EXCH01.sberdevices.ru (S-MS-EXCH01.sberdevices.ru [172.16.1.4]) by mx.sberdevices.ru (Postfix) with ESMTP; Sun, 6 Nov 2022 22:40:43 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , "edumazet@google.com" , Jakub Kicinski , Paolo Abeni , Krasnov Arseniy CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel Subject: [RFC PATCH v3 03/11] af_vsock: add zerocopy receive logic Thread-Topic: [RFC PATCH v3 03/11] af_vsock: add zerocopy receive logic Thread-Index: AQHY8heWueHDDL4x/0iPxprlEpp8Wg== Date: Sun, 6 Nov 2022 19:40:12 +0000 Message-ID: <7aeba781-db09-9be1-a9a3-a4c16da38fb5@sberdevices.ru> In-Reply-To: Accept-Language: en-US, ru-RU Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: <11A927F904AFCB4D871B16A0244EDE99@sberdevices.ru> MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/11/06 12:52:00 #20573807 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC This: 1) Adds callback for 'mmap()' call on socket. It checks vm area flags and sets vm area ops. 2) Adds special 'getsockopt()' case which calls transport zerocopy callback. Input argument is vm area address. 3) Adds 'getsockopt()/setsockopt()' for switching on/off rx zerocopy mode. Signed-off-by: Arseniy Krasnov --- include/net/af_vsock.h | 8 ++ include/uapi/linux/vm_sockets.h | 3 + net/vmw_vsock/af_vsock.c | 187 +++++++++++++++++++++++++++++++- 3 files changed, 196 insertions(+), 2 deletions(-) diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index 568a87c5e0d0..e4f12ef8e623 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -73,6 +73,8 @@ struct vsock_sock { /* Private to transport. */ void *trans; + + bool rx_zerocopy_on; }; s64 vsock_stream_has_data(struct vsock_sock *vsk); @@ -138,6 +140,12 @@ struct vsock_transport { bool (*stream_allow)(u32 cid, u32 port); int (*set_rcvlowat)(struct vsock_sock *vsk, int val); + int (*zerocopy_rx_mmap)(struct vsock_sock *vsk, + struct vm_area_struct *vma); + int (*zerocopy_dequeue)(struct vsock_sock *vsk, + struct page **pages, + unsigned long *pages_num); + /* SEQ_PACKET. */ ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg, int flags); diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h index c60ca33eac59..d1f792bed1a7 100644 --- a/include/uapi/linux/vm_sockets.h +++ b/include/uapi/linux/vm_sockets.h @@ -83,6 +83,9 @@ #define SO_VM_SOCKETS_CONNECT_TIMEOUT_NEW 8 +#define SO_VM_SOCKETS_MAP_RX 9 +#define SO_VM_SOCKETS_ZEROCOPY 10 + #if !defined(__KERNEL__) #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__)) #define SO_VM_SOCKETS_CONNECT_TIMEOUT SO_VM_SOCKETS_CONNECT_TIMEOUT_OLD diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index ee418701cdee..21a915eb0820 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1663,6 +1663,16 @@ static int vsock_connectible_setsockopt(struct socket *sock, } break; } + case SO_VM_SOCKETS_ZEROCOPY: { + if (sock->state != SS_UNCONNECTED) { + err = -EOPNOTSUPP; + break; + } + + COPY_IN(val); + vsk->rx_zerocopy_on = val; + break; + } default: err = -ENOPROTOOPT; @@ -1676,6 +1686,124 @@ static int vsock_connectible_setsockopt(struct socket *sock, return err; } +static const struct vm_operations_struct afvsock_vm_ops = { +}; + +static int vsock_recv_zerocopy(struct socket *sock, + unsigned long address) +{ + const struct vsock_transport *transport; + struct vm_area_struct *vma; + unsigned long vma_pages; + struct vsock_sock *vsk; + struct page **pages; + struct sock *sk; + int err; + int i; + + sk = sock->sk; + vsk = vsock_sk(sk); + err = 0; + + lock_sock(sk); + + if (!vsk->rx_zerocopy_on) { + err = -EOPNOTSUPP; + goto out_unlock_sock; + } + + transport = vsk->transport; + + if (!transport->zerocopy_dequeue) { + err = -EOPNOTSUPP; + goto out_unlock_sock; + } + + mmap_write_lock(current->mm); + + vma = vma_lookup(current->mm, address); + + if (!vma || vma->vm_ops != &afvsock_vm_ops) { + err = -EINVAL; + goto out_unlock_vma; + } + + /* Allow to use vm area only from the first page. */ + if (vma->vm_start != address) { + err = -EINVAL; + goto out_unlock_vma; + } + + vma_pages = (vma->vm_end - vma->vm_start) / PAGE_SIZE; + pages = kmalloc_array(vma_pages, sizeof(pages[0]), + GFP_KERNEL | __GFP_ZERO); + + if (!pages) { + err = -EINVAL; + goto out_unlock_vma; + } + + err = transport->zerocopy_dequeue(vsk, pages, &vma_pages); + + if (err) + goto out_unlock_vma; + + /* Now 'vma_pages' contains number of pages in array. + * If array element is NULL, skip it, go to next page. + */ + for (i = 0; i < vma_pages; i++) { + if (pages[i]) { + unsigned long pages_inserted; + + pages_inserted = 1; + err = vm_insert_pages(vma, address, &pages[i], &pages_inserted); + + if (err || pages_inserted) { + /* Failed to insert some pages, we have "partially" + * mapped vma. Do not return, set error code. This + * code will be returned to user. User needs to call + * 'madvise()/mmap()' to clear this vma. Anyway, + * references to all pages will to be dropped below. + */ + if (!err) { + err = -EFAULT; + break; + } + } + } + + address += PAGE_SIZE; + } + + i = 0; + + while (i < vma_pages) { + /* Drop ref count for all pages, returned by transport. + * We call 'put_page()' only once, as transport needed + * to 'get_page()' at least only once also, to prevent + * pages being freed. If transport calls 'get_page()' + * more twice or more for every page - we don't care, + * if transport calls 'get_page()' only one time, this + * meanse that every page had ref count equal to 1,then + * 'vm_insert_pages()' increments it to 2. After this + * loop, ref count will be 1 again, and page will be + * returned to allocator by user. + */ + if (pages[i]) + put_page(pages[i]); + i++; + } + + kfree(pages); + +out_unlock_vma: + mmap_write_unlock(current->mm); +out_unlock_sock: + release_sock(sk); + + return err; +} + static int vsock_connectible_getsockopt(struct socket *sock, int level, int optname, char __user *optval, @@ -1720,6 +1848,26 @@ static int vsock_connectible_getsockopt(struct socket *sock, lv = sock_get_timeout(vsk->connect_timeout, &v, optname == SO_VM_SOCKETS_CONNECT_TIMEOUT_OLD); break; + case SO_VM_SOCKETS_ZEROCOPY: { + lock_sock(sk); + + v.val64 = vsk->rx_zerocopy_on; + + release_sock(sk); + + break; + } + case SO_VM_SOCKETS_MAP_RX: { + unsigned long vma_addr; + + if (len < sizeof(vma_addr)) + return -EINVAL; + + if (copy_from_user(&vma_addr, optval, sizeof(vma_addr))) + return -EFAULT; + + return vsock_recv_zerocopy(sock, vma_addr); + } default: return -ENOPROTOOPT; @@ -2167,6 +2315,41 @@ static int vsock_set_rcvlowat(struct sock *sk, int val) return 0; } +static int afvsock_mmap(struct file *file, struct socket *sock, + struct vm_area_struct *vma) +{ + const struct vsock_transport *transport; + struct vsock_sock *vsk; + struct sock *sk; + int err; + + if (vma->vm_flags & (VM_WRITE | VM_EXEC)) + return -EPERM; + + vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC); + vma->vm_flags |= (VM_MIXEDMAP); + vma->vm_ops = &afvsock_vm_ops; + + sk = sock->sk; + vsk = vsock_sk(sk); + + lock_sock(sk); + + transport = vsk->transport; + + if (!transport || !transport->zerocopy_rx_mmap) { + err = -EOPNOTSUPP; + goto out_unlock; + } + + err = transport->zerocopy_rx_mmap(vsk, vma); + +out_unlock: + release_sock(sk); + + return err; +} + static const struct proto_ops vsock_stream_ops = { .family = PF_VSOCK, .owner = THIS_MODULE, @@ -2184,7 +2367,7 @@ static const struct proto_ops vsock_stream_ops = { .getsockopt = vsock_connectible_getsockopt, .sendmsg = vsock_connectible_sendmsg, .recvmsg = vsock_connectible_recvmsg, - .mmap = sock_no_mmap, + .mmap = afvsock_mmap, .sendpage = sock_no_sendpage, .set_rcvlowat = vsock_set_rcvlowat, }; @@ -2206,7 +2389,7 @@ static const struct proto_ops vsock_seqpacket_ops = { .getsockopt = vsock_connectible_getsockopt, .sendmsg = vsock_connectible_sendmsg, .recvmsg = vsock_connectible_recvmsg, - .mmap = sock_no_mmap, + .mmap = afvsock_mmap, .sendpage = sock_no_sendpage, }; From patchwork Sun Nov 6 19:41:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 13033542 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CCE2AC4332F for ; Sun, 6 Nov 2022 19:42:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230174AbiKFTmZ (ORCPT ); Sun, 6 Nov 2022 14:42:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43576 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229932AbiKFTmY (ORCPT ); Sun, 6 Nov 2022 14:42:24 -0500 Received: from mx.sberdevices.ru (mx.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BC2C1EE2A; Sun, 6 Nov 2022 11:42:22 -0800 (PST) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mx.sberdevices.ru (Postfix) with ESMTP id 1BF1E5FD04; Sun, 6 Nov 2022 22:42:21 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1667763741; bh=31uDZyV25yChx94dZ34mNWQ4MPrUAqdHZOxeJxUoubY=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=Z4eMdwANTQaG8sr8Mhq6wmHYDlxRh8kggaAk51SmGnaZ9lcBcK+7t5D3UgP35vagq jU5/kZQH5ixufz4GgYvXggryvVN3Wiui+NT5DgUdm8dTdSzX6EhYYvEhSj9AIxxl+o BLQuM6zF0qDYxtedIqEGptPLMnDgRD7NKAV5r/8krzrJqfAQSExzdVDa/pPynbOg7C ERjOWxi9EZP5XUdkdN+Oq/NSdc+fwr+YYmnzophKfIuyX4koKkIX+b6zbNbhOo9Fw4 wYLeJaRuNjgf2FfEUTrd5gIGiQEeI86DcAAnWOBbKFJ7OqzNu9vtoZMOcBmN5uq0ps wfOVEaVrwrCpQ== Received: from S-MS-EXCH01.sberdevices.ru (S-MS-EXCH01.sberdevices.ru [172.16.1.4]) by mx.sberdevices.ru (Postfix) with ESMTP; Sun, 6 Nov 2022 22:42:20 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , "edumazet@google.com" , Paolo Abeni , Jakub Kicinski , Krasnov Arseniy CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel Subject: [RFC PATCH v3 04/11] virtio/vsock: add transport zerocopy callback Thread-Topic: [RFC PATCH v3 04/11] virtio/vsock: add transport zerocopy callback Thread-Index: AQHY8hfQ+kdlJNbB20agFKggEXiUiw== Date: Sun, 6 Nov 2022 19:41:50 +0000 Message-ID: In-Reply-To: Accept-Language: en-US, ru-RU Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: <5EFFF9ED28A7A444B376B91C829C247E@sberdevices.ru> MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/11/06 12:52:00 #20573807 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC This adds transport callback which processes rx queue of socket and instead of copying data to user provided buffer, it inserts data pages of each packet to user's vm area. Signed-off-by: Arseniy Krasnov --- include/linux/virtio_vsock.h | 7 + include/uapi/linux/virtio_vsock.h | 14 ++ net/vmw_vsock/virtio_transport_common.c | 244 +++++++++++++++++++++++- 3 files changed, 261 insertions(+), 4 deletions(-) diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index c1be40f89a89..d10fdfd8d144 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -37,6 +37,7 @@ struct virtio_vsock_sock { u32 buf_alloc; struct list_head rx_queue; u32 msg_count; + struct page *usr_poll_page; }; struct virtio_vsock_pkt { @@ -51,6 +52,7 @@ struct virtio_vsock_pkt { bool reply; bool tap_delivered; bool slab_buf; + bool split; }; struct virtio_vsock_pkt_info { @@ -131,6 +133,11 @@ int virtio_transport_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr); bool virtio_transport_dgram_allow(u32 cid, u32 port); +int virtio_transport_zerocopy_init(struct vsock_sock *vsk, + struct vm_area_struct *vma); +int virtio_transport_zerocopy_dequeue(struct vsock_sock *vsk, + struct page **pages, + unsigned long *pages_num); int virtio_transport_connect(struct vsock_sock *vsk); int virtio_transport_shutdown(struct vsock_sock *vsk, int mode); diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h index 64738838bee5..2a0e4f309918 100644 --- a/include/uapi/linux/virtio_vsock.h +++ b/include/uapi/linux/virtio_vsock.h @@ -66,6 +66,20 @@ struct virtio_vsock_hdr { __le32 fwd_cnt; } __attribute__((packed)); +struct virtio_vsock_usr_hdr { + u32 flags; + u32 len; +} __attribute__((packed)); + +#define VIRTIO_VSOCK_USR_POLL_NO_DATA 0 +#define VIRTIO_VSOCK_USR_POLL_HAS_DATA 1 +#define VIRTIO_VSOCK_USR_POLL_SHUTDOWN ~0 + +struct virtio_vsock_usr_hdr_pref { + u32 poll_value; + u32 hdr_num; +} __attribute__((packed)); + enum virtio_vsock_type { VIRTIO_VSOCK_TYPE_STREAM = 1, VIRTIO_VSOCK_TYPE_SEQPACKET = 2, diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index 444764869670..fa4a2688a5d5 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -241,6 +242,14 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt) { + if (vvs->usr_poll_page) { + struct virtio_vsock_usr_hdr_pref *hdr; + + hdr = (struct virtio_vsock_usr_hdr_pref *)page_to_virt(vvs->usr_poll_page); + + hdr->poll_value = VIRTIO_VSOCK_USR_POLL_HAS_DATA; + } + if (vvs->rx_bytes + pkt->len > vvs->buf_alloc) return false; @@ -253,6 +262,14 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs, { vvs->rx_bytes -= pkt->len; vvs->fwd_cnt += pkt->len; + + if (!vvs->rx_bytes && vvs->usr_poll_page) { + struct virtio_vsock_usr_hdr_pref *hdr; + + hdr = (struct virtio_vsock_usr_hdr_pref *)page_to_virt(vvs->usr_poll_page); + + hdr->poll_value = VIRTIO_VSOCK_USR_POLL_NO_DATA; + } } void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt) @@ -347,6 +364,191 @@ virtio_transport_stream_do_peek(struct vsock_sock *vsk, return err; } +int virtio_transport_zerocopy_init(struct vsock_sock *vsk, + struct vm_area_struct *vma) +{ + struct virtio_vsock_sock *vvs; + int err = 0; + + if (vma->vm_end - vma->vm_start < 2 * PAGE_SIZE) + return -EINVAL; + + vvs = vsk->trans; + + spin_lock_bh(&vvs->rx_lock); + + if (!vvs->usr_poll_page) { + /* GFP_ATOMIC because of spinlock. */ + vvs->usr_poll_page = alloc_page(GFP_KERNEL | GFP_ATOMIC); + + if (!vvs->usr_poll_page) { + err = -ENOMEM; + } else { + struct virtio_vsock_usr_hdr_pref *usr_hdr_pref; + unsigned long one_page = 1; + + usr_hdr_pref = page_to_virt(vvs->usr_poll_page); + + if (vsk->peer_shutdown & SHUTDOWN_MASK) { + usr_hdr_pref->poll_value = VIRTIO_VSOCK_USR_POLL_SHUTDOWN; + } else { + usr_hdr_pref->poll_value = vvs->rx_bytes ? + VIRTIO_VSOCK_USR_POLL_HAS_DATA : + VIRTIO_VSOCK_USR_POLL_NO_DATA; + } + + usr_hdr_pref->hdr_num = 0; + + err = vm_insert_pages(vma, vma->vm_start, + &vvs->usr_poll_page, + &one_page); + + if (one_page) + err = -EINVAL; + } + } else { + err = -EINVAL; + } + + spin_unlock_bh(&vvs->rx_lock); + + return err; +} +EXPORT_SYMBOL_GPL(virtio_transport_zerocopy_init); + +int virtio_transport_zerocopy_dequeue(struct vsock_sock *vsk, + struct page **pages, + unsigned long *pages_num) +{ + struct virtio_vsock_usr_hdr_pref *usr_hdr_pref; + struct virtio_vsock_usr_hdr *usr_hdr_buffer; + struct virtio_vsock_sock *vvs; + unsigned long max_usr_hdrs; + struct page *usr_hdr_page; + int pages_cnt; + + if (*pages_num < 2) + return -EINVAL; + + vvs = vsk->trans; + + max_usr_hdrs = (PAGE_SIZE - sizeof(*usr_hdr_pref)) / sizeof(*usr_hdr_buffer); + *pages_num = min(max_usr_hdrs, *pages_num); + pages_cnt = 0; + + spin_lock_bh(&vvs->rx_lock); + + if (!vvs->usr_poll_page) { + spin_unlock_bh(&vvs->rx_lock); + return -EINVAL; + } + + usr_hdr_page = vvs->usr_poll_page; + usr_hdr_pref = page_to_virt(usr_hdr_page); + usr_hdr_buffer = (struct virtio_vsock_usr_hdr *)(usr_hdr_pref + 1); + usr_hdr_pref->hdr_num = 0; + + /* If ref counter is 1, then page owned during + * allocation and not mapped, so insert it to + * the output array. It will be mapped. + */ + if (page_ref_count(usr_hdr_page) == 1) { + pages[pages_cnt++] = usr_hdr_page; + /* Inc ref one more, as AF_VSOCK layer calls + * 'put_page()' for each returned page. + */ + get_page(usr_hdr_page); + } else { + pages[pages_cnt++] = NULL; + } + + /* Polling page is already mapped. */ + while (!list_empty(&vvs->rx_queue) && + pages_cnt < *pages_num) { + struct virtio_vsock_pkt *pkt; + ssize_t rest_data_bytes; + size_t moved_data_bytes; + unsigned long pg_offs; + + pkt = list_first_entry(&vvs->rx_queue, + struct virtio_vsock_pkt, list); + + rest_data_bytes = le32_to_cpu(pkt->hdr.len) - pkt->off; + + /* For packets, bigger than one page, split it's + * high order allocated buffer to 0 order pages. + * Otherwise 'vm_insert_pages()' will fail, for + * all pages except first. + */ + if (rest_data_bytes > PAGE_SIZE) { + /* High order buffer not split yet. */ + if (!pkt->split) { + split_page(virt_to_page(pkt->buf), + get_order(le32_to_cpu(pkt->hdr.len))); + pkt->split = true; + } + } + + pg_offs = pkt->off; + moved_data_bytes = 0; + + while (rest_data_bytes && + pages_cnt < *pages_num) { + struct page *buf_page; + + buf_page = virt_to_page(pkt->buf + pg_offs); + + pages[pages_cnt++] = buf_page; + /* Get reference to prevent this page being + * returned to page allocator when packet will + * be freed. Ref count will be 2. + */ + get_page(buf_page); + pg_offs += PAGE_SIZE; + + if (rest_data_bytes >= PAGE_SIZE) { + moved_data_bytes += PAGE_SIZE; + rest_data_bytes -= PAGE_SIZE; + } else { + moved_data_bytes += rest_data_bytes; + rest_data_bytes = 0; + } + } + + if (!rest_data_bytes) + usr_hdr_buffer->flags = le32_to_cpu(pkt->hdr.flags); + else + usr_hdr_buffer->flags = 0; + + usr_hdr_buffer->len = moved_data_bytes; + + usr_hdr_buffer++; + usr_hdr_pref->hdr_num++; + + pkt->off = pg_offs; + + if (rest_data_bytes == 0) { + list_del(&pkt->list); + virtio_transport_dec_rx_pkt(vvs, pkt); + virtio_transport_free_pkt(pkt); + } + + /* Now ref count for all pages of packet is 1. */ + } + + if (*pages_num - 1 < max_usr_hdrs) + usr_hdr_buffer->len = 0; + + spin_unlock_bh(&vvs->rx_lock); + + virtio_transport_send_credit_update(vsk); + + *pages_num = pages_cnt; + + return 0; +} +EXPORT_SYMBOL_GPL(virtio_transport_zerocopy_dequeue); + static ssize_t virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, struct msghdr *msg, @@ -969,11 +1171,21 @@ void virtio_transport_release(struct vsock_sock *vsk) { struct sock *sk = &vsk->sk; bool remove_sock = true; + struct virtio_vsock_sock *vvs = vsk->trans; if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) remove_sock = virtio_transport_close(vsk); if (remove_sock) { + spin_lock_bh(&vvs->rx_lock); + + if (vvs->usr_poll_page) { + __free_page(vvs->usr_poll_page); + vvs->usr_poll_page = NULL; + } + + spin_unlock_bh(&vvs->rx_lock); + sock_set_flag(sk, SOCK_DONE); virtio_transport_remove_sock(vsk); } @@ -1077,6 +1289,7 @@ virtio_transport_recv_connected(struct sock *sk, struct virtio_vsock_pkt *pkt) { struct vsock_sock *vsk = vsock_sk(sk); + struct virtio_vsock_sock *vvs = vsk->trans; int err = 0; switch (le16_to_cpu(pkt->hdr.op)) { @@ -1095,6 +1308,19 @@ virtio_transport_recv_connected(struct sock *sk, vsk->peer_shutdown |= RCV_SHUTDOWN; if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SHUTDOWN_SEND) vsk->peer_shutdown |= SEND_SHUTDOWN; + + spin_lock_bh(&vvs->rx_lock); + + if (vvs->usr_poll_page) { + struct virtio_vsock_usr_hdr_pref *hdr; + + hdr = (struct virtio_vsock_usr_hdr_pref *)page_to_virt(vvs->usr_poll_page); + + hdr->poll_value = 0xffffffff; + } + + spin_unlock_bh(&vvs->rx_lock); + if (vsk->peer_shutdown == SHUTDOWN_MASK && vsock_stream_has_data(vsk) <= 0 && !sock_flag(sk, SOCK_DONE)) { @@ -1343,11 +1569,21 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt); void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt) { if (pkt->buf_len) { - if (pkt->slab_buf) + if (pkt->slab_buf) { kvfree(pkt->buf); - else - free_pages((unsigned long)pkt->buf, - get_order(pkt->buf_len)); + } else { + unsigned int order = get_order(pkt->buf_len); + unsigned long buf = (unsigned long)pkt->buf; + + if (pkt->split) { + int i; + + for (i = 0; i < (1 << order); i++) + free_page(buf + i * PAGE_SIZE); + } else { + free_pages(buf, order); + } + } } kfree(pkt); From patchwork Sun Nov 6 19:43:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 13033543 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 923B2C4332F for ; Sun, 6 Nov 2022 19:44:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230194AbiKFTog (ORCPT ); Sun, 6 Nov 2022 14:44:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44760 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230085AbiKFToe (ORCPT ); Sun, 6 Nov 2022 14:44:34 -0500 Received: from mx.sberdevices.ru (mx.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 827DC5FF0; Sun, 6 Nov 2022 11:44:32 -0800 (PST) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mx.sberdevices.ru (Postfix) with ESMTP id 9F4DF5FD04; Sun, 6 Nov 2022 22:44:30 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1667763870; bh=Z6oRYbVeFw0g+nrdQFitukonMwA9rcY/1A4nrqySmrs=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=faJzqqiq51dEe/HKUIM/zX8RewjNgVOscJEuKRN+PlyILw5cb/A/FUZF3YERwAump oXJ1PEKjynCZDt/PxN1O/ckwIpeuIOlxEvodJdn2S+szcHzmBxlJzw/L7Y9wCJeHvE IjZsCXHBOr0W3UfY6Pj9uZ46Dc0penD7JmXB6GWmGFBRgDI/V3Ewsn8xY6ckK3+f2W M/ug+yD9yziT0p0aIRXThpGhlBpnKub6ZKePZHjik+VN9MUJ44NVexH9cIa6LZwxig IMOR8z08s89EHLQY9k/0cARdGDW/pr6a3oSIbCh0mMr/BZ4q8mlksU9VGXnfclUa9j 3BojpOMUhyRfA== Received: from S-MS-EXCH02.sberdevices.ru (S-MS-EXCH02.sberdevices.ru [172.16.1.5]) by mx.sberdevices.ru (Postfix) with ESMTP; Sun, 6 Nov 2022 22:44:30 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , "edumazet@google.com" , Jakub Kicinski , Paolo Abeni , Krasnov Arseniy CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel Subject: [RFC PATCH v3 05/11] vhost/vsock: switch packet's buffer allocation Thread-Topic: [RFC PATCH v3 05/11] vhost/vsock: switch packet's buffer allocation Thread-Index: AQHY8hgdDSQFggGg1UqSYuJmlIYHyQ== Date: Sun, 6 Nov 2022 19:43:59 +0000 Message-ID: <6ec232ab-fcc6-3afb-8c38-849ad25ef6c5@sberdevices.ru> In-Reply-To: Accept-Language: en-US, ru-RU Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: <4547A82DCB7E9943BE87B15E62BD6DE3@sberdevices.ru> MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/11/06 12:52:00 #20573807 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC This changes packets buffer allocation logic,it depends on whether rx zerocopy enabled or disabled on destination socket. Thus, now socket lookup performed here, not in 'virtio_transport_common.c', and for zerocopy case, buffer is allocated using raw calls to the buddy allocator. If zerocopy is disabled, then buffers allocated by 'kmalloc()'(like before this patch). Signed-off-by: Arseniy Krasnov --- drivers/vhost/vsock.c | 56 +++++++++++++++++++++++++++++++++++-------- 1 file changed, 46 insertions(+), 10 deletions(-) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 6f3d9f02cc1d..191a5b94aa7c 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -354,10 +354,14 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk) static struct virtio_vsock_pkt * vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, - unsigned int out, unsigned int in) + unsigned int out, unsigned int in, + struct sock **sk) { struct virtio_vsock_pkt *pkt; + struct sockaddr_vm src, dst; + struct vhost_vsock *vsock; struct iov_iter iov_iter; + struct vsock_sock *vsk; size_t nbytes; size_t len; @@ -381,6 +385,18 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, return NULL; } + vsock_addr_init(&src, le64_to_cpu(pkt->hdr.src_cid), + le32_to_cpu(pkt->hdr.src_port)); + vsock_addr_init(&dst, le64_to_cpu(pkt->hdr.dst_cid), + le32_to_cpu(pkt->hdr.dst_port)); + + *sk = vsock_find_connected_socket(&src, &dst); + if (!(*sk)) { + *sk = vsock_find_bound_socket(&dst); + if (!(*sk)) + return pkt; + } + pkt->len = le32_to_cpu(pkt->hdr.len); /* No payload */ @@ -393,14 +409,32 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, return NULL; } - pkt->buf = kvmalloc(pkt->len, GFP_KERNEL); - if (!pkt->buf) { - kfree(pkt); - return NULL; - } - - pkt->slab_buf = true; pkt->buf_len = pkt->len; + vsock = container_of(vq->dev, struct vhost_vsock, dev); + + vsk = vsock_sk(*sk); + + if (!vsk->rx_zerocopy_on) { + pkt->buf = kvmalloc(pkt->len, GFP_KERNEL); + + if (!pkt->buf) { + kfree(pkt); + return NULL; + } + + pkt->slab_buf = true; + } else { + struct page *buf_page; + + buf_page = alloc_pages(GFP_KERNEL, get_order(pkt->len)); + + if (buf_page == NULL) { + kfree(pkt); + return NULL; + } + + pkt->buf = page_to_virt(buf_page); + } nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter); if (nbytes != pkt->len) { @@ -512,6 +546,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) vhost_disable_notify(&vsock->dev, vq); do { + struct sock *sk = NULL; + if (!vhost_vsock_more_replies(vsock)) { /* Stop tx until the device processes already * pending replies. Leave tx virtqueue @@ -533,7 +569,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) break; } - pkt = vhost_vsock_alloc_pkt(vq, out, in); + pkt = vhost_vsock_alloc_pkt(vq, out, in, &sk); if (!pkt) { vq_err(vq, "Faulted on pkt\n"); continue; @@ -548,7 +584,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid && le64_to_cpu(pkt->hdr.dst_cid) == vhost_transport_get_local_cid()) - virtio_transport_recv_pkt(&vhost_transport, NULL, pkt); + virtio_transport_recv_pkt(&vhost_transport, sk, pkt); else virtio_transport_free_pkt(pkt); From patchwork Sun Nov 6 19:45:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 13033544 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A607AC4332F for ; Sun, 6 Nov 2022 19:46:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230191AbiKFTqL (ORCPT ); Sun, 6 Nov 2022 14:46:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45538 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230003AbiKFTqK (ORCPT ); Sun, 6 Nov 2022 14:46:10 -0500 Received: from mx.sberdevices.ru (mx.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8BB6B5FF0; Sun, 6 Nov 2022 11:46:09 -0800 (PST) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mx.sberdevices.ru (Postfix) with ESMTP id E8D395FD03; Sun, 6 Nov 2022 22:46:07 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1667763967; bh=KB7/nCwB5US3fSlJ/o7Lc7D8vetXoSvCwrP1v/i0ne8=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=XeGSlEKE/VjncH5U7TdNwzy9UxBlJcIT2/xTuGWgLBB12Hglf9PApiqWwlHapALc8 XA7Quq+EcrxEZJIKVN70LwA/61i09ERzTAodu16jaf35sKSBS0d5VvhOAxbfS58WBA tkqsAD3IoacKrmIh+DEMAFjfeADcDn5CcvXZyjozqLZRl2nxF80p0/9YLH6ckEOm9F 1ZyE1cv6AZ7YkkutrzJX+QwbaB2u1V0fIFfF2EZJxhVLxVcly6J0uReDooTaFvN1ag 1txwynxHvvhiuNpC/f+vqYJCbR84FawafpIwL9r3LIEm7yqslkmA+CfMXBQYjSaEv8 Qx8m2V8PXd4Hg== Received: from S-MS-EXCH02.sberdevices.ru (S-MS-EXCH02.sberdevices.ru [172.16.1.5]) by mx.sberdevices.ru (Postfix) with ESMTP; Sun, 6 Nov 2022 22:46:07 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , Jason Wang , "Michael S. Tsirkin" , "David S. Miller" , "edumazet@google.com" , Paolo Abeni , Jakub Kicinski , Krasnov Arseniy CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel Subject: [RFC PATCH v3 06/11] vhost/vsock: enable zerocopy callback Thread-Topic: [RFC PATCH v3 06/11] vhost/vsock: enable zerocopy callback Thread-Index: AQHY8hhXqtpTwM0Q2ka0p5q420lJUQ== Date: Sun, 6 Nov 2022 19:45:36 +0000 Message-ID: <7a7f6f28-aff2-8e4b-140a-c47be778febf@sberdevices.ru> In-Reply-To: Accept-Language: en-US, ru-RU Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: <9F0BA688F5AE444C9C5C493604D9E891@sberdevices.ru> MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/11/06 18:31:00 #20575158 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC This adds zerocopy callback to vhost transport. Signed-off-by: Arseniy Krasnov --- drivers/vhost/vsock.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 191a5b94aa7c..fc6c6e667e36 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -486,6 +486,8 @@ static struct virtio_transport vhost_transport = { .stream_rcvhiwat = virtio_transport_stream_rcvhiwat, .stream_is_active = virtio_transport_stream_is_active, .stream_allow = virtio_transport_stream_allow, + .zerocopy_dequeue = virtio_transport_zerocopy_dequeue, + .zerocopy_rx_mmap = virtio_transport_zerocopy_init, .seqpacket_dequeue = virtio_transport_seqpacket_dequeue, .seqpacket_enqueue = virtio_transport_seqpacket_enqueue, From patchwork Sun Nov 6 19:47:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 13033549 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 45339C433FE for ; Sun, 6 Nov 2022 19:47:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230215AbiKFTrk (ORCPT ); Sun, 6 Nov 2022 14:47:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46096 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229991AbiKFTrj (ORCPT ); Sun, 6 Nov 2022 14:47:39 -0500 Received: from mx.sberdevices.ru (mx.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C0015FF5; Sun, 6 Nov 2022 11:47:38 -0800 (PST) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mx.sberdevices.ru (Postfix) with ESMTP id AFA625FD03; Sun, 6 Nov 2022 22:47:36 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1667764056; bh=rkPnlyhLuVRW8SnsMpIB4rs4sReUJk836iqZy6DVi/8=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=KQp6oe7xLJx2SI+CN2IxqnylOlMZXZNSluMyFmJKgx02ot9NKrYdAHjC7BELJG8fO o5YOgqbp+8Bn/w27DavoaapCqMXK2F2E1Aw2d3YLYiYb1U+al/nyQ+sl/RhvpnMyJV MUolT/KsqjsfwHORV3veNki2ELsL/EnZFLDH2aux87wjHS2HIdrC6ETumvjej2qobU CFmSZWQoWh/Ld4OaGu+vHN1KnVd5X9ZuzwC6iFVFQhOI+U/k+tLMAZEss1zPY5IBCp XVdawsIHR+5zJ/xEWzAxGw7bUFmctmW1UYHLPZd1Uf0SkAl0dh1QJq1b+MxcuMsyx8 hnqf0I0gEq/mA== Received: from S-MS-EXCH02.sberdevices.ru (S-MS-EXCH02.sberdevices.ru [172.16.1.5]) by mx.sberdevices.ru (Postfix) with ESMTP; Sun, 6 Nov 2022 22:47:36 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , "edumazet@google.com" , Paolo Abeni , Jakub Kicinski , Krasnov Arseniy CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel Subject: [RFC PATCH v3 07/11] virtio/vsock: enable zerocopy callback Thread-Topic: [RFC PATCH v3 07/11] virtio/vsock: enable zerocopy callback Thread-Index: AQHY8hiMyajXktRACUOON+IpWz23oQ== Date: Sun, 6 Nov 2022 19:47:05 +0000 Message-ID: In-Reply-To: Accept-Language: en-US, ru-RU Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/11/06 18:31:00 #20575158 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC This adds zerocopy callback for virtio transport. Signed-off-by: Arseniy Krasnov --- net/vmw_vsock/virtio_transport.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index 0305f94c98bb..cbdc91acd9d4 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -480,6 +480,8 @@ static struct virtio_transport virtio_transport = { .stream_rcvhiwat = virtio_transport_stream_rcvhiwat, .stream_is_active = virtio_transport_stream_is_active, .stream_allow = virtio_transport_stream_allow, + .zerocopy_dequeue = virtio_transport_zerocopy_dequeue, + .zerocopy_rx_mmap = virtio_transport_zerocopy_init, .seqpacket_dequeue = virtio_transport_seqpacket_dequeue, .seqpacket_enqueue = virtio_transport_seqpacket_enqueue, From patchwork Sun Nov 6 19:48:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 13033550 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 05612C4332F for ; Sun, 6 Nov 2022 19:49:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230235AbiKFTtc (ORCPT ); Sun, 6 Nov 2022 14:49:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47106 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229991AbiKFTtb (ORCPT ); Sun, 6 Nov 2022 14:49:31 -0500 Received: from mx.sberdevices.ru (mx.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A76C6227; Sun, 6 Nov 2022 11:49:29 -0800 (PST) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mx.sberdevices.ru (Postfix) with ESMTP id 0C2975FD03; Sun, 6 Nov 2022 22:49:28 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1667764168; bh=aLAoY68EOFtako+FG+2TWg1WJ4inLy9K4esIGX6FXTc=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=Dn5QtWXsZUw2NtP1g3EiPm8MgQJyzlYK3duDvEmEh4bdOdAFcO/kBnyC5ax+QlPuM AGEr9joSHccLjhRp+Gg4XK0IYuEoEY9AzlrdoYNgoi81YnO0EFh3u8DQFAbYJOFG8k 8c3HjcU+ngApjZStrVurHL3leqPCiYVqe3XzNrHyAI77DHVIFK1i0gTu5M9y3g9BLS EIQtmiOeZlnAY18M8iLpWCgn6ZTpiLfrEd/1uNLP4foHNt7MH/2/TRicWkQ6GdwOHu q66ZHBZk3b+Cu3a5ho1sVpsToTEMcL0l0/YRWYTg5Bh2hw61TqmLLqtIDALJna+GcH 3Log381XxV5/w== Received: from S-MS-EXCH01.sberdevices.ru (S-MS-EXCH01.sberdevices.ru [172.16.1.4]) by mx.sberdevices.ru (Postfix) with ESMTP; Sun, 6 Nov 2022 22:49:27 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , "edumazet@google.com" , Jakub Kicinski , Paolo Abeni , Krasnov Arseniy CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel Subject: [RFC PATCH v3 08/11] test/vsock: rework message bound test Thread-Topic: [RFC PATCH v3 08/11] test/vsock: rework message bound test Thread-Index: AQHY8hjODfI0ULYbgUSWOiAerLW3Cw== Date: Sun, 6 Nov 2022 19:48:56 +0000 Message-ID: In-Reply-To: Accept-Language: en-US, ru-RU Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/11/06 18:31:00 #20575158 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC This updates message bound test making it more complex. Instead of sending 1 bytes messages with one MSG_EOR bit, it sends messages of random length(one half of messages are smaller than page size, second half are bigger) with random number of MSG_EOR bits set. Receiver also don't know total number of messages. Signed-off-by: Arseniy Krasnov --- tools/testing/vsock/control.c | 34 +++++++++ tools/testing/vsock/control.h | 2 + tools/testing/vsock/util.c | 13 ++++ tools/testing/vsock/util.h | 1 + tools/testing/vsock/vsock_test.c | 115 +++++++++++++++++++++++++++---- 5 files changed, 152 insertions(+), 13 deletions(-) diff --git a/tools/testing/vsock/control.c b/tools/testing/vsock/control.c index 4874872fc5a3..bed1649bdf3d 100644 --- a/tools/testing/vsock/control.c +++ b/tools/testing/vsock/control.c @@ -141,6 +141,40 @@ void control_writeln(const char *str) timeout_end(); } +void control_writeulong(unsigned long value) +{ + char str[32]; + + if (snprintf(str, sizeof(str), "%lu", value) >= sizeof(str)) { + perror("snprintf"); + exit(EXIT_FAILURE); + } + + control_writeln(str); +} + +unsigned long control_readulong(bool *ok) +{ + unsigned long value; + char *str; + + if (ok) + *ok = false; + + str = control_readln(); + + if (str == NULL) + return 0; + + value = strtoul(str, NULL, 10); + free(str); + + if (ok) + *ok = true; + + return value; +} + /* Return the next line from the control socket (without the trailing newline). * * The program terminates if a timeout occurs. diff --git a/tools/testing/vsock/control.h b/tools/testing/vsock/control.h index 51814b4f9ac1..cdd922dfea68 100644 --- a/tools/testing/vsock/control.h +++ b/tools/testing/vsock/control.h @@ -9,7 +9,9 @@ void control_init(const char *control_host, const char *control_port, void control_cleanup(void); void control_writeln(const char *str); char *control_readln(void); +unsigned long control_readulong(bool *ok); void control_expectln(const char *str); bool control_cmpln(char *line, const char *str, bool fail); +void control_writeulong(unsigned long value); #endif /* CONTROL_H */ diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c index 2acbb7703c6a..351903836774 100644 --- a/tools/testing/vsock/util.c +++ b/tools/testing/vsock/util.c @@ -395,3 +395,16 @@ void skip_test(struct test_case *test_cases, size_t test_cases_len, test_cases[test_id].skip = true; } + +unsigned long djb2(const void *data, size_t len) +{ + unsigned long hash = 5381; + int i = 0; + + while (i < len) { + hash = ((hash << 5) + hash) + ((unsigned char *)data)[i]; + i++; + } + + return hash; +} diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h index a3375ad2fb7f..988cc69a4642 100644 --- a/tools/testing/vsock/util.h +++ b/tools/testing/vsock/util.h @@ -49,4 +49,5 @@ void run_tests(const struct test_case *test_cases, void list_tests(const struct test_case *test_cases); void skip_test(struct test_case *test_cases, size_t test_cases_len, const char *test_id_str); +unsigned long djb2(const void *data, size_t len); #endif /* UTIL_H */ diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c index bb6d691cb30d..107c11165887 100644 --- a/tools/testing/vsock/vsock_test.c +++ b/tools/testing/vsock/vsock_test.c @@ -284,10 +284,14 @@ static void test_stream_msg_peek_server(const struct test_opts *opts) close(fd); } -#define MESSAGES_CNT 7 -#define MSG_EOR_IDX (MESSAGES_CNT / 2) +#define SOCK_BUF_SIZE (2 * 1024 * 1024) +#define MAX_MSG_SIZE (32 * 1024) + static void test_seqpacket_msg_bounds_client(const struct test_opts *opts) { + unsigned long curr_hash; + int page_size; + int msg_count; int fd; fd = vsock_seqpacket_connect(opts->peer_cid, 1234); @@ -296,18 +300,69 @@ static void test_seqpacket_msg_bounds_client(const struct test_opts *opts) exit(EXIT_FAILURE); } - /* Send several messages, one with MSG_EOR flag */ - for (int i = 0; i < MESSAGES_CNT; i++) - send_byte(fd, 1, (i == MSG_EOR_IDX) ? MSG_EOR : 0); + /* Wait, until receiver sets buffer size. */ + control_expectln("SRVREADY"); + + curr_hash = 0; + page_size = getpagesize(); + msg_count = SOCK_BUF_SIZE / MAX_MSG_SIZE; + + for (int i = 0; i < msg_count; i++) { + ssize_t send_size; + size_t buf_size; + int flags; + void *buf; + + /* Use "small" buffers and "big" buffers. */ + if (i & 1) + buf_size = page_size + + (rand() % (MAX_MSG_SIZE - page_size)); + else + buf_size = 1 + (rand() % page_size); + + buf = malloc(buf_size); + + if (!buf) { + perror("malloc"); + exit(EXIT_FAILURE); + } + + /* Set at least one MSG_EOR + some random. */ + if (i == (msg_count / 2) || (rand() & 1)) { + flags = MSG_EOR; + curr_hash++; + } else { + flags = 0; + } + + send_size = send(fd, buf, buf_size, flags); + + if (send_size < 0) { + perror("send"); + exit(EXIT_FAILURE); + } + + if (send_size != buf_size) { + fprintf(stderr, "Invalid send size\n"); + exit(EXIT_FAILURE); + } + + curr_hash += send_size; + curr_hash = djb2(&curr_hash, sizeof(curr_hash)); + } control_writeln("SENDDONE"); + control_writeulong(curr_hash); close(fd); } static void test_seqpacket_msg_bounds_server(const struct test_opts *opts) { + unsigned long sock_buf_size; + unsigned long remote_hash; + unsigned long curr_hash; int fd; - char buf[16]; + char buf[MAX_MSG_SIZE]; struct msghdr msg = {0}; struct iovec iov = {0}; @@ -317,25 +372,58 @@ static void test_seqpacket_msg_bounds_server(const struct test_opts *opts) exit(EXIT_FAILURE); } + sock_buf_size = SOCK_BUF_SIZE; + + if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_MAX_SIZE, + &sock_buf_size, sizeof(sock_buf_size))) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE, + &sock_buf_size, sizeof(sock_buf_size))) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + /* Ready to receive data. */ + control_writeln("SRVREADY"); + /* Wait, until peer sends whole data. */ control_expectln("SENDDONE"); iov.iov_base = buf; iov.iov_len = sizeof(buf); msg.msg_iov = &iov; msg.msg_iovlen = 1; - for (int i = 0; i < MESSAGES_CNT; i++) { - if (recvmsg(fd, &msg, 0) != 1) { - perror("message bound violated"); - exit(EXIT_FAILURE); - } + curr_hash = 0; - if ((i == MSG_EOR_IDX) ^ !!(msg.msg_flags & MSG_EOR)) { - perror("MSG_EOR"); + while (1) { + ssize_t recv_size; + + recv_size = recvmsg(fd, &msg, 0); + + if (!recv_size) + break; + + if (recv_size < 0) { + perror("recvmsg"); exit(EXIT_FAILURE); } + + if (msg.msg_flags & MSG_EOR) + curr_hash++; + + curr_hash += recv_size; + curr_hash = djb2(&curr_hash, sizeof(curr_hash)); } close(fd); + remote_hash = control_readulong(NULL); + + if (curr_hash != remote_hash) { + fprintf(stderr, "Message bounds broken\n"); + exit(EXIT_FAILURE); + } } #define MESSAGE_TRUNC_SZ 32 @@ -837,6 +925,7 @@ int main(int argc, char **argv) .peer_cid = VMADDR_CID_ANY, }; + srand(time(NULL)); init_signals(); for (;;) { From patchwork Sun Nov 6 19:50:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 13033551 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36B62C4332F for ; Sun, 6 Nov 2022 19:51:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230289AbiKFTvA (ORCPT ); Sun, 6 Nov 2022 14:51:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48070 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230280AbiKFTuy (ORCPT ); Sun, 6 Nov 2022 14:50:54 -0500 Received: from mx.sberdevices.ru (mx.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D544D10FDE; Sun, 6 Nov 2022 11:50:52 -0800 (PST) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mx.sberdevices.ru (Postfix) with ESMTP id 425565FD03; Sun, 6 Nov 2022 22:50:51 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1667764251; bh=JLlpkwTSfIBGrLuXfogAgrdiNEl7/2KG3a3rEAt/O1g=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=dG9nmPY0hrwjfsuUxP/e2rXABuYHuQSKZhpi51SK88W3Z6wSNvdo6FoqaxIwXjaHv 6JMjYPa36C1miwBwJZxIx6JXxh9tPlb0HHkN0DwGYMcy8wDICso8J/4a60hMxRPvBH xRUJGZaVo6CB6PMPs8OVrRZ0dhjdpjI64az3dZkveXHu3oTSdGq8qOa2IAjibI0xF+ Ak47x4AQhYfGN6cCKFefqOCZabzgwDLXQJEp14Wyvtf6V4wiShRKVdI7LS23AhXiGK YVY01bGUWfqH0B7VDM1SBKP8WfukPcHIcrBqZ14PduiaPwCLasuHvujfjqNCqSw6yV wZoeWmVHS+HKw== Received: from S-MS-EXCH02.sberdevices.ru (S-MS-EXCH02.sberdevices.ru [172.16.1.5]) by mx.sberdevices.ru (Postfix) with ESMTP; Sun, 6 Nov 2022 22:50:51 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , "edumazet@google.com" , Jakub Kicinski , Paolo Abeni , Krasnov Arseniy CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel Subject: [RFC PATCH v3 09/11] test/vsock: add big message test Thread-Topic: [RFC PATCH v3 09/11] test/vsock: add big message test Thread-Index: AQHY8hkAptW4O6wxGUmfu9tKRKX0oQ== Date: Sun, 6 Nov 2022 19:50:19 +0000 Message-ID: <4d2cac96-d829-fcf6-e042-2ed65493c602@sberdevices.ru> In-Reply-To: Accept-Language: en-US, ru-RU Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/11/06 18:31:00 #20575158 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC This adds test for sending message, bigger than peer's buffer size. For SOCK_SEQPACKET socket it must fail, as this type of socket has message size limit. Signed-off-by: Arseniy Krasnov --- tools/testing/vsock/vsock_test.c | 62 ++++++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c index 107c11165887..bb4e8657f1d6 100644 --- a/tools/testing/vsock/vsock_test.c +++ b/tools/testing/vsock/vsock_test.c @@ -560,6 +560,63 @@ static void test_seqpacket_timeout_server(const struct test_opts *opts) close(fd); } +static void test_seqpacket_bigmsg_client(const struct test_opts *opts) +{ + unsigned long sock_buf_size; + ssize_t send_size; + socklen_t len; + void *data; + int fd; + + len = sizeof(sock_buf_size); + + fd = vsock_seqpacket_connect(opts->peer_cid, 1234); + if (fd < 0) { + perror("connect"); + exit(EXIT_FAILURE); + } + + if (getsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE, + &sock_buf_size, &len)) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + sock_buf_size++; + + data = malloc(sock_buf_size); + if (!data) { + perror("malloc"); + exit(EXIT_FAILURE); + } + + send_size = send(fd, data, sock_buf_size, 0); + if (send_size != -1) { + fprintf(stderr, "expected 'send(2)' failure, got %zi\n", + send_size); + } + + control_writeln("CLISENT"); + + free(data); + close(fd); +} + +static void test_seqpacket_bigmsg_server(const struct test_opts *opts) +{ + int fd; + + fd = vsock_seqpacket_accept(VMADDR_CID_ANY, 1234, NULL); + if (fd < 0) { + perror("accept"); + exit(EXIT_FAILURE); + } + + control_expectln("CLISENT"); + + close(fd); +} + #define BUF_PATTERN_1 'a' #define BUF_PATTERN_2 'b' @@ -832,6 +889,11 @@ static struct test_case test_cases[] = { .run_client = test_seqpacket_timeout_client, .run_server = test_seqpacket_timeout_server, }, + { + .name = "SOCK_SEQPACKET big message", + .run_client = test_seqpacket_bigmsg_client, + .run_server = test_seqpacket_bigmsg_server, + }, { .name = "SOCK_SEQPACKET invalid receive buffer", .run_client = test_seqpacket_invalid_rec_buffer_client, From patchwork Sun Nov 6 19:51:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 13033555 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7DBB2C433FE for ; Sun, 6 Nov 2022 19:52:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230321AbiKFTwb (ORCPT ); Sun, 6 Nov 2022 14:52:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48972 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230085AbiKFTwa (ORCPT ); Sun, 6 Nov 2022 14:52:30 -0500 Received: from mx.sberdevices.ru (mx.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0A9E210B5A; Sun, 6 Nov 2022 11:52:28 -0800 (PST) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mx.sberdevices.ru (Postfix) with ESMTP id 4F7125FD03; Sun, 6 Nov 2022 22:52:26 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1667764346; bh=gkvy/9shRw8d/tUYq9QGxng4hQ6wEGNdn6OjSeac5Qk=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=dp9GorC2YJARu3hVnsjnbplvYohV9PDLX509Ma2/E7RgLkFWUHaMUvji3QZDtB4Zc wHSf+cvrPzAAT4MHiEuvXZEk4fl7oEC4b5KixBgmoFjLDbZQWbob+n0po7snI9kjML XQS5y4Ggds9InqLO7VvzP6JsG1c0gAc281tqykdb26JdoleGmIY/fPnp0zHeIC/1mT fa3ECa8yDL1TBUYmcZA5p/V2rsPWJaCaEpQsyGKcpuBvBabAq123pjfLtTSAmAl/G6 13eJPVyCF4L2WexgqdbLLmXRO4TNeU+T7bV6kGHv5CV7ajKmd5J5ctCYkvRWnSgEja IQuuRtTJyIiqQ== Received: from S-MS-EXCH02.sberdevices.ru (S-MS-EXCH02.sberdevices.ru [172.16.1.5]) by mx.sberdevices.ru (Postfix) with ESMTP; Sun, 6 Nov 2022 22:52:25 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , "edumazet@google.com" , Jakub Kicinski , Paolo Abeni , Krasnov Arseniy CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel Subject: [RFC PATCH v3 10/11] test/vsock: add receive zerocopy tests Thread-Topic: [RFC PATCH v3 10/11] test/vsock: add receive zerocopy tests Thread-Index: AQHY8hk4EUPAFCDBikm4E+sDar4nhA== Date: Sun, 6 Nov 2022 19:51:55 +0000 Message-ID: In-Reply-To: Accept-Language: en-US, ru-RU Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/11/06 18:31:00 #20575158 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC This adds tests for zerocopy feature: one test checks data transmission with simple integrity control. Second test covers 'error' branches in zerocopy logic(to check invalid arguments handling). Signed-off-by: Arseniy Krasnov --- tools/include/uapi/linux/virtio_vsock.h | 15 + tools/include/uapi/linux/vm_sockets.h | 8 + tools/testing/vsock/Makefile | 2 +- tools/testing/vsock/util.c | 27 +- tools/testing/vsock/util.h | 4 + tools/testing/vsock/vsock_test.c | 21 + tools/testing/vsock/vsock_test_zerocopy.c | 530 ++++++++++++++++++++++ tools/testing/vsock/vsock_test_zerocopy.h | 14 + 8 files changed, 617 insertions(+), 4 deletions(-) create mode 100644 tools/include/uapi/linux/virtio_vsock.h create mode 100644 tools/include/uapi/linux/vm_sockets.h create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h diff --git a/tools/include/uapi/linux/virtio_vsock.h b/tools/include/uapi/linux/virtio_vsock.h new file mode 100644 index 000000000000..f393062e0394 --- /dev/null +++ b/tools/include/uapi/linux/virtio_vsock.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _UAPI_LINUX_VIRTIO_VSOCK_H +#define _UAPI_LINUX_VIRTIO_VSOCK_H +#include + +struct virtio_vsock_usr_hdr { + u32 flags; + u32 len; +} __attribute__((packed)); + +struct virtio_vsock_usr_hdr_pref { + u32 poll_value; + u32 hdr_num; +} __attribute__((packed)); +#endif /* _UAPI_LINUX_VIRTIO_VSOCK_H */ diff --git a/tools/include/uapi/linux/vm_sockets.h b/tools/include/uapi/linux/vm_sockets.h new file mode 100644 index 000000000000..cac0bc3a7041 --- /dev/null +++ b/tools/include/uapi/linux/vm_sockets.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _UAPI_LINUX_VM_SOCKETS_H +#define _UAPI_LINUX_VM_SOCKETS_H + +#define SO_VM_SOCKETS_MAP_RX 9 +#define SO_VM_SOCKETS_ZEROCOPY 10 + +#endif /* _UAPI_LINUX_VM_SOCKETS_H */ diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile index f8293c6910c9..7172c21fbd8d 100644 --- a/tools/testing/vsock/Makefile +++ b/tools/testing/vsock/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only all: test test: vsock_test vsock_diag_test -vsock_test: vsock_test.o timeout.o control.o util.o +vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c index 351903836774..7ae5219d0fe8 100644 --- a/tools/testing/vsock/util.c +++ b/tools/testing/vsock/util.c @@ -84,7 +84,8 @@ void vsock_wait_remote_close(int fd) } /* Connect to and return the file descriptor. */ -static int vsock_connect(unsigned int cid, unsigned int port, int type) +static int vsock_connect(unsigned int cid, unsigned int port, int type, + int optname, void *optval, socklen_t optlen) { union { struct sockaddr sa; @@ -103,6 +104,13 @@ static int vsock_connect(unsigned int cid, unsigned int port, int type) fd = socket(AF_VSOCK, type, 0); + if (optval) { + if (setsockopt(fd, AF_VSOCK, optname, optval, optlen)) { + close(fd); + return -1; + } + } + timeout_begin(TIMEOUT); do { ret = connect(fd, &addr.sa, sizeof(addr.svm)); @@ -122,12 +130,25 @@ static int vsock_connect(unsigned int cid, unsigned int port, int type) int vsock_stream_connect(unsigned int cid, unsigned int port) { - return vsock_connect(cid, port, SOCK_STREAM); + return vsock_connect(cid, port, SOCK_STREAM, 0, NULL, 0); +} + +int vsock_stream_connect_opt(unsigned int cid, unsigned int port, + int optname, void *optval, socklen_t optlen) +{ + return vsock_connect(cid, port, SOCK_STREAM, optname, optval, optlen); } int vsock_seqpacket_connect(unsigned int cid, unsigned int port) { - return vsock_connect(cid, port, SOCK_SEQPACKET); + return vsock_connect(cid, port, SOCK_SEQPACKET, 0, NULL, 0); +} + +int vsock_seqpacket_connect_opt(unsigned int cid, unsigned int port, + int optname, void *optval, socklen_t optlen) +{ + return vsock_connect(cid, port, SOCK_SEQPACKET, optname, optval, + optlen); } /* Listen on and return the first incoming connection. The remote diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h index 988cc69a4642..23f9f0c6853d 100644 --- a/tools/testing/vsock/util.h +++ b/tools/testing/vsock/util.h @@ -36,7 +36,11 @@ struct test_case { void init_signals(void); unsigned int parse_cid(const char *str); int vsock_stream_connect(unsigned int cid, unsigned int port); +int vsock_stream_connect_opt(unsigned int cid, unsigned int port, + int optname, void *optval, socklen_t optlen); int vsock_seqpacket_connect(unsigned int cid, unsigned int port); +int vsock_seqpacket_connect_opt(unsigned int cid, unsigned int port, + int optname, void *optval, socklen_t optlen); int vsock_stream_accept(unsigned int cid, unsigned int port, struct sockaddr_vm *clientaddrp); int vsock_seqpacket_accept(unsigned int cid, unsigned int port, diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c index bb4e8657f1d6..a6ed076e42f4 100644 --- a/tools/testing/vsock/vsock_test.c +++ b/tools/testing/vsock/vsock_test.c @@ -23,6 +23,7 @@ #include "timeout.h" #include "control.h" #include "util.h" +#include "vsock_test_zerocopy.h" static void test_stream_connection_reset(const struct test_opts *opts) { @@ -904,6 +905,26 @@ static struct test_case test_cases[] = { .run_client = test_stream_poll_rcvlowat_client, .run_server = test_stream_poll_rcvlowat_server, }, + { + .name = "SOCK_STREAM zerocopy receive", + .run_client = test_stream_zerocopy_rx_client, + .run_server = test_stream_zerocopy_rx_server, + }, + { + .name = "SOCK_SEQPACKET zerocopy receive", + .run_client = test_seqpacket_zerocopy_rx_client, + .run_server = test_seqpacket_zerocopy_rx_server, + }, + { + .name = "SOCK_STREAM zerocopy receive loop poll", + .run_client = test_stream_zerocopy_rx_client_loop_poll, + .run_server = test_stream_zerocopy_rx_server_loop_poll, + }, + { + .name = "SOCK_STREAM zerocopy invalid", + .run_client = test_stream_zerocopy_rx_inv_client, + .run_server = test_stream_zerocopy_rx_inv_server, + }, {}, }; diff --git a/tools/testing/vsock/vsock_test_zerocopy.c b/tools/testing/vsock/vsock_test_zerocopy.c new file mode 100644 index 000000000000..1876f14c0cec --- /dev/null +++ b/tools/testing/vsock/vsock_test_zerocopy.c @@ -0,0 +1,530 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * + * + * Copyright (C) 2022 Sberdevices, Inc. + * + * Author: Arseniy Krasnov + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "timeout.h" +#include "control.h" +#include "util.h" +#include "vsock_test_zerocopy.h" + +#define PAGE_SIZE 4096 +#define RX_MAPPING_PAGES 3 + +#define TX_BUF_SIZE 40000 +#define TX_SEND_LOOPS 3 + +static void test_connectible_zerocopy_rx_client(const struct test_opts *opts, + bool stream) +{ + unsigned long remote_hash; + unsigned long curr_hash; + unsigned long total_sum; + unsigned long msg_bytes; + unsigned long zc_on; + size_t rx_map_len; + void *rx_va; + int fd; + + zc_on = 1; + + if (stream) + fd = vsock_stream_connect_opt(opts->peer_cid, 1234, + SO_VM_SOCKETS_ZEROCOPY, + (void *)&zc_on, sizeof(zc_on)); + else + fd = vsock_seqpacket_connect_opt(opts->peer_cid, 1234, + SO_VM_SOCKETS_ZEROCOPY, + (void *)&zc_on, sizeof(zc_on)); + if (fd < 0) { + perror("connect"); + exit(EXIT_FAILURE); + } + + rx_map_len = PAGE_SIZE * RX_MAPPING_PAGES; + + rx_va = mmap(NULL, rx_map_len, PROT_READ, MAP_SHARED, fd, 0); + if (rx_va == MAP_FAILED) { + perror("mmap"); + exit(EXIT_FAILURE); + } + + total_sum = 0; + msg_bytes = 0; + curr_hash = 0; + + while (1) { + struct pollfd fds = { 0 }; + int leave_loop; + int res; + + fds.fd = fd; + fds.events = POLLIN | POLLERR | POLLHUP | + POLLRDHUP | POLLNVAL; + + res = poll(&fds, 1, -1); + + if (res < 0) { + perror("poll"); + exit(EXIT_FAILURE); + } + + if (fds.revents & POLLERR) { + perror("poll error"); + exit(EXIT_FAILURE); + } + + leave_loop = 0; + + if (fds.revents & POLLIN) { + struct virtio_vsock_usr_hdr_pref *poll_hdr; + struct virtio_vsock_usr_hdr *data_hdr; + unsigned char *data_va; + unsigned char *end_va; + socklen_t len; + int hdr_cnt; + + poll_hdr = (struct virtio_vsock_usr_hdr_pref *)rx_va; + len = sizeof(rx_va); + + if (getsockopt(fd, AF_VSOCK, + SO_VM_SOCKETS_MAP_RX, + &rx_va, &len) < 0) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + data_hdr = (struct virtio_vsock_usr_hdr *)(poll_hdr + 1); + /* Skip headers page for data. */ + data_va = rx_va + PAGE_SIZE; + end_va = (unsigned char *)(rx_va + rx_map_len); + hdr_cnt = 0; + + while (data_va != end_va) { + int data_len = data_hdr->len; + + if (!data_hdr->len) { + if (fds.revents & (POLLHUP | POLLRDHUP) && + !hdr_cnt) + leave_loop = 1; + + break; + } + + while (data_len > 0) { + int i; + int to_read = (data_len < PAGE_SIZE) ? + data_len : PAGE_SIZE; + + for (i = 0; i < to_read; i++) + total_sum += data_va[i]; + + data_va += PAGE_SIZE; + data_len -= PAGE_SIZE; + } + + if (!stream) { + msg_bytes += data_hdr->len; + + if (data_hdr->flags) { + curr_hash += msg_bytes; + curr_hash = djb2(&curr_hash, + sizeof(curr_hash)); + msg_bytes = 0; + } + } + + data_hdr++; + hdr_cnt++; + } + + if (madvise((void *)rx_va, + rx_map_len, + MADV_DONTNEED)) { + perror("madvise"); + exit(EXIT_FAILURE); + } + + if (leave_loop) + break; + } + } + + curr_hash += total_sum; + curr_hash = djb2(&curr_hash, sizeof(curr_hash)); + + if (munmap(rx_va, rx_map_len)) { + perror("munmap"); + exit(EXIT_FAILURE); + } + + remote_hash = control_readulong(NULL); + + if (curr_hash != remote_hash) { + fprintf(stderr, "sum mismatch %lu != %lu\n", + curr_hash, remote_hash); + exit(EXIT_FAILURE); + } + + close(fd); +} + +void test_seqpacket_zerocopy_rx_client(const struct test_opts *opts) +{ + test_connectible_zerocopy_rx_client(opts, false); +} + +void test_stream_zerocopy_rx_client(const struct test_opts *opts) +{ + test_connectible_zerocopy_rx_client(opts, true); +} + +static void test_connectible_zerocopy_rx_server(const struct test_opts *opts, + bool stream) +{ + size_t max_buf_size = TX_BUF_SIZE; + unsigned long curr_hash; + long total_sum = 0; + int n = TX_SEND_LOOPS; + int fd; + + if (stream) + fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL); + else + fd = vsock_seqpacket_accept(VMADDR_CID_ANY, 1234, NULL); + + if (fd < 0) { + perror("accept"); + exit(EXIT_FAILURE); + } + + curr_hash = 0; + + while (n) { + unsigned char *data; + size_t send_size; + size_t buf_size; + int i; + + buf_size = 1 + (rand() % max_buf_size); + + data = malloc(buf_size); + + if (!data) { + perror("malloc"); + exit(EXIT_FAILURE); + } + + for (i = 0; i < buf_size; i++) { + data[i] = rand() & 0xff; + total_sum += data[i]; + } + + send_size = write(fd, data, buf_size); + + if (send_size != buf_size) { + perror("write"); + exit(EXIT_FAILURE); + } + + if (!stream) { + curr_hash += send_size; + curr_hash = djb2(&curr_hash, sizeof(curr_hash)); + } + + free(data); + n--; + } + + curr_hash += total_sum; + curr_hash = djb2(&curr_hash, sizeof(curr_hash)); + control_writeulong(curr_hash); + + close(fd); +} + +void test_stream_zerocopy_rx_server(const struct test_opts *opts) +{ + test_connectible_zerocopy_rx_server(opts, true); +} + +void test_seqpacket_zerocopy_rx_server(const struct test_opts *opts) +{ + test_connectible_zerocopy_rx_server(opts, false); +} + +void test_stream_zerocopy_rx_client_loop_poll(const struct test_opts *opts) +{ + unsigned long remote_sum; + unsigned long total_sum; + unsigned long zc_on = 1; + size_t rx_map_len; + u32 poll_value = 0; + void *rx_va; + int fd; + + fd = vsock_stream_connect_opt(opts->peer_cid, 1234, + SO_VM_SOCKETS_ZEROCOPY, + (void *)&zc_on, sizeof(zc_on)); + if (fd < 0) { + perror("connect"); + exit(EXIT_FAILURE); + } + + rx_map_len = PAGE_SIZE * RX_MAPPING_PAGES; + + rx_va = mmap(NULL, rx_map_len, PROT_READ, MAP_SHARED, fd, 0); + if (rx_va == MAP_FAILED) { + perror("mmap"); + exit(EXIT_FAILURE); + } + + total_sum = 0; + + while (1) { + volatile struct virtio_vsock_usr_hdr_pref *poll_hdr; + struct virtio_vsock_usr_hdr *data_hdr; + unsigned char *data_va; + unsigned char *end_va; + int leave_loop; + socklen_t len; + int hdr_cnt; + + poll_hdr = (struct virtio_vsock_usr_hdr_pref *)rx_va; + + if (poll_value != ~0) { + do { + poll_value = poll_hdr->poll_value; + } while (!poll_value); + } + + len = sizeof(rx_va); + + if (getsockopt(fd, AF_VSOCK, + SO_VM_SOCKETS_MAP_RX, + &rx_va, &len) < 0) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + data_va = rx_va + PAGE_SIZE; + end_va = (unsigned char *)(rx_va + rx_map_len); + data_hdr = (struct virtio_vsock_usr_hdr *)(poll_hdr + 1); + hdr_cnt = 0; + leave_loop = 0; + + while (data_va != end_va) { + int data_len = data_hdr->len; + + if (!data_hdr->len) { + /* Zero length in first header and there will + * be no more data, leave processing loop. + */ + if (!hdr_cnt && (poll_value == ~0)) + leave_loop = 1; + + break; + } + + while (data_len > 0) { + int i; + int to_read = (data_len < PAGE_SIZE) ? + data_len : PAGE_SIZE; + + for (i = 0; i < to_read; i++) + total_sum += data_va[i]; + + data_va += PAGE_SIZE; + data_len -= PAGE_SIZE; + } + + data_hdr++; + hdr_cnt++; + } + + if (madvise((void *)rx_va + PAGE_SIZE, + rx_map_len - PAGE_SIZE, + MADV_DONTNEED)) { + perror("madvise"); + exit(EXIT_FAILURE); + } + + if (leave_loop) + break; + } + + if (munmap(rx_va, rx_map_len)) { + perror("munmap"); + exit(EXIT_FAILURE); + } + + remote_sum = control_readulong(NULL); + + if (total_sum != remote_sum) { + fprintf(stderr, "loop sum mismatch %lu != %lu\n", + total_sum, remote_sum); + exit(EXIT_FAILURE); + } + + close(fd); +} + +void test_stream_zerocopy_rx_server_loop_poll(const struct test_opts *opts) +{ + size_t max_buf_size = TX_BUF_SIZE; + unsigned long total_sum; + int n = TX_SEND_LOOPS; + int fd; + + fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL); + if (fd < 0) { + perror("accept"); + exit(EXIT_FAILURE); + } + + total_sum = 0; + + while (n) { + unsigned char *data; + size_t buf_size; + int i; + + buf_size = 1 + (rand() % max_buf_size); + + data = malloc(buf_size); + + if (!data) { + perror("malloc"); + exit(EXIT_FAILURE); + } + + for (i = 0; i < buf_size; i++) { + data[i] = rand() & 0xff; + total_sum += data[i]; + } + + if (write(fd, data, buf_size) != buf_size) { + perror("write"); + exit(EXIT_FAILURE); + } + + free(data); + n--; + } + + control_writeulong(total_sum); + + close(fd); +} + +void test_stream_zerocopy_rx_inv_client(const struct test_opts *opts) +{ + size_t map_size = PAGE_SIZE * 5; + unsigned long zc_on = 1; + socklen_t len; + void *map_va; + int fd; + + /* Try zerocopy with disable option. */ + fd = vsock_stream_connect_opt(opts->peer_cid, 1234, SO_VM_SOCKETS_ZEROCOPY, + (void *)&zc_on, sizeof(zc_on)); + if (fd < 0) { + perror("connect"); + exit(EXIT_FAILURE); + } + + len = sizeof(map_va); + map_va = 0; + + /* Try zerocopy with invalid mapping address. */ + if (getsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_MAP_RX, + &map_va, &len) == 0) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + /* Try zerocopy with valid, but not socket mapping. */ + map_va = mmap(NULL, map_size, PROT_READ, + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + if (map_va == MAP_FAILED) { + perror("anon mmap"); + exit(EXIT_FAILURE); + } + + if (getsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_MAP_RX, + &map_va, &len) == 0) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + if (munmap(map_va, map_size)) { + perror("munmap"); + exit(EXIT_FAILURE); + } + + /* Try zerocopy with valid, but too small mapping. */ + map_va = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, fd, 0); + if (map_va != MAP_FAILED) { + perror("socket mmap small"); + exit(EXIT_FAILURE); + } + + if (getsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_MAP_RX, + &map_va, &len) == 0) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + /* Try zerocopy with valid mapping, but not from first byte. */ + map_va = mmap(NULL, map_size, PROT_READ, MAP_SHARED, fd, 0); + if (map_va == MAP_FAILED) { + perror("socket mmap"); + exit(EXIT_FAILURE); + } + + map_va += PAGE_SIZE; + + if (getsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_MAP_RX, + &map_va, &len) == 0) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + if (munmap(map_va - PAGE_SIZE, map_size)) { + perror("munmap"); + exit(EXIT_FAILURE); + } + + control_writeln("DONE"); + + close(fd); +} + +void test_stream_zerocopy_rx_inv_server(const struct test_opts *opts) +{ + int fd; + + fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL); + + if (fd < 0) { + perror("accept"); + exit(EXIT_FAILURE); + } + + control_expectln("DONE"); + + close(fd); +} + diff --git a/tools/testing/vsock/vsock_test_zerocopy.h b/tools/testing/vsock/vsock_test_zerocopy.h new file mode 100644 index 000000000000..a818bd65b376 --- /dev/null +++ b/tools/testing/vsock/vsock_test_zerocopy.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef VSOCK_TEST_ZEROCOPY_H +#define VSOCK_TEST_ZEROCOPY_H + +void test_stream_zerocopy_rx_client(const struct test_opts *opts); +void test_stream_zerocopy_rx_server(const struct test_opts *opts); +void test_seqpacket_zerocopy_rx_client(const struct test_opts *opts); +void test_seqpacket_zerocopy_rx_server(const struct test_opts *opts); +void test_stream_zerocopy_rx_client_loop_poll(const struct test_opts *opts); +void test_stream_zerocopy_rx_server_loop_poll(const struct test_opts *opts); +void test_stream_zerocopy_rx_inv_client(const struct test_opts *opts); +void test_stream_zerocopy_rx_inv_server(const struct test_opts *opts); + +#endif /* VSOCK_TEST_ZEROCOPY_H */ From patchwork Sun Nov 6 19:54:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 13033556 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B67CC433FE for ; Sun, 6 Nov 2022 19:54:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230223AbiKFTyi (ORCPT ); Sun, 6 Nov 2022 14:54:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50052 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230051AbiKFTyh (ORCPT ); Sun, 6 Nov 2022 14:54:37 -0500 Received: from mx.sberdevices.ru (mx.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A906610FF2; Sun, 6 Nov 2022 11:54:34 -0800 (PST) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mx.sberdevices.ru (Postfix) with ESMTP id D6FBA5FD03; Sun, 6 Nov 2022 22:54:32 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1667764472; bh=ulGuIRNxVQm/+k4w6lPcFueVXIy279HBcRtIxxR7snY=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=PTrrXrIha7+bxLNwf8Q3ze08c4yHjReD94hDtKTjrFBoXYbl8lcGP14SL8Jjti+8m deBDwmetoGBAT2I8MtL2orj8Nhvo9ujJ6+I7pUg41wtM3BhBDW+kN5DCbYB+xud8jr cVI34OCfInqPgpBB2jm8YxZVZ2swopDraRHpQnk47/x9M0lZFteZzy1QcIBdYD137r BS5QbOJJw2a4ZXtaA8R3/8GDvFtZBn72uHDhLjeXCn7PRLMFcVJOGPsGATUGDQAjwv 1h57mLtT5zJxdQWuIBf/tLW05Hp/DVmDdnl23OJ91gfjaGK+iWQlL4hCg8kB3IbBza P4SUgPcZMwwrw== Received: from S-MS-EXCH02.sberdevices.ru (S-MS-EXCH02.sberdevices.ru [172.16.1.5]) by mx.sberdevices.ru (Postfix) with ESMTP; Sun, 6 Nov 2022 22:54:32 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , "edumazet@google.com" , Jakub Kicinski , Paolo Abeni , Krasnov Arseniy CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel Subject: [RFC PATCH v3 11/11] test/vsock: vsock_rx_perf utility Thread-Topic: [RFC PATCH v3 11/11] test/vsock: vsock_rx_perf utility Thread-Index: AQHY8hmED7B5vfedB0aVNaKlpc9CKg== Date: Sun, 6 Nov 2022 19:54:01 +0000 Message-ID: In-Reply-To: Accept-Language: en-US, ru-RU Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: <8E7EBF8A82A3884B9D1EEF697DF4DF3D@sberdevices.ru> MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/11/06 18:31:00 #20575158 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC This adds utility to check vsock receive throughput. Signed-off-by: Arseniy Krasnov --- tools/testing/vsock/Makefile | 1 + tools/testing/vsock/vsock_rx_perf.c | 604 ++++++++++++++++++++++++++++ 2 files changed, 605 insertions(+) create mode 100644 tools/testing/vsock/vsock_rx_perf.c diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile index 7172c21fbd8d..5aea346ba2bc 100644 --- a/tools/testing/vsock/Makefile +++ b/tools/testing/vsock/Makefile @@ -3,6 +3,7 @@ all: test test: vsock_test vsock_diag_test vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o +vsock_rx_perf: vsock_rx_perf.o CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE .PHONY: all test clean diff --git a/tools/testing/vsock/vsock_rx_perf.c b/tools/testing/vsock/vsock_rx_perf.c new file mode 100644 index 000000000000..323626089043 --- /dev/null +++ b/tools/testing/vsock/vsock_rx_perf.c @@ -0,0 +1,604 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * vsock_rx_perf - benchmark utility for zerocopy receive. + * + * Copyright (C) 2022 SberDevices. + * + * Author: Arseniy Krasnov + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define PAGE_SIZE 4096 +#define DEFAULT_BUF_SIZE_BYTES (128*1024) +#define DEFAULT_TO_SEND_BYTES (65*1024) +#define DEFAULT_VSOCK_BUF_BYTES (256*1024) +#define DEFAULT_RCVLOWAT_BYTES 1 +#define DEFAULT_PORT 1234 + +static bool client_mode = true; +static int peer_cid = -1; +static int port = DEFAULT_PORT; +static unsigned long rcvlowat_bytes = DEFAULT_RCVLOWAT_BYTES; +static unsigned long buf_size_bytes = DEFAULT_BUF_SIZE_BYTES; +static unsigned long to_send_bytes = DEFAULT_TO_SEND_BYTES; +static unsigned long vsock_buf_bytes = DEFAULT_VSOCK_BUF_BYTES; + +#define ZEROCOPY_MODE_NO 0 +#define ZEROCOPY_MODE_POLL 1 +#define ZEROCOPY_MODE_USER_POLL 2 + +static int zerocopy_mode = ZEROCOPY_MODE_NO; + +static inline time_t current_nsec(void) +{ + struct timespec ts; + + if (clock_gettime(CLOCK_REALTIME, &ts)) { + perror("clock_gettime"); + exit(EXIT_FAILURE); + } + + return (ts.tv_sec * 1000000000ULL) + ts.tv_nsec; +} + +/* From lib/cmdline.c. */ +static unsigned long memparse(const char *ptr) +{ + char *endptr; + + unsigned long long ret = strtoull(ptr, &endptr, 0); + + switch (*endptr) { + case 'E': + case 'e': + ret <<= 10; + case 'P': + case 'p': + ret <<= 10; + case 'T': + case 't': + ret <<= 10; + case 'G': + case 'g': + ret <<= 10; + case 'M': + case 'm': + ret <<= 10; + case 'K': + case 'k': + ret <<= 10; + endptr++; + default: + break; + } + + return ret; +} + +static void map_rx(int fd, void *va) +{ + socklen_t len = sizeof(va); + + if (getsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_MAP_RX, &va, &len) < 0) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } +} + +static void vsock_increase_buf_size(int fd) +{ + if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_MAX_SIZE, + &vsock_buf_bytes, sizeof(vsock_buf_bytes))) { + perror("setsockopt"); + exit(EXIT_FAILURE); + } + + if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE, + &vsock_buf_bytes, sizeof(vsock_buf_bytes))) { + perror("setsockopt"); + exit(EXIT_FAILURE); + } +} + +static int vsock_connect(unsigned int cid, unsigned int port) +{ + union { + struct sockaddr sa; + struct sockaddr_vm svm; + } addr = { + .svm = { + .svm_family = AF_VSOCK, + .svm_port = port, + .svm_cid = cid, + }, + }; + unsigned long zc_on; + int fd; + + fd = socket(AF_VSOCK, SOCK_STREAM, 0); + + if (fd < 0) + return -1; + + vsock_increase_buf_size(fd); + + zc_on = 1; + + if (zerocopy_mode != ZEROCOPY_MODE_NO) { + if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_ZEROCOPY, + (void *)&zc_on, sizeof(zc_on))) { + close(fd); + return -1; + } + } + + if (connect(fd, &addr.sa, sizeof(addr.svm)) < 0) { + close(fd); + return -1; + } + + return fd; +} + +static void run_server(void) +{ + time_t tx_begin_ns; + size_t total_send; + int client_fd; + char *data; + int fd; + union { + struct sockaddr sa; + struct sockaddr_vm svm; + } addr = { + .svm = { + .svm_family = AF_VSOCK, + .svm_port = port, + .svm_cid = VMADDR_CID_ANY, + }, + }; + union { + struct sockaddr sa; + struct sockaddr_vm svm; + } clientaddr; + + socklen_t clientaddr_len = sizeof(clientaddr.svm); + + fprintf(stderr, "Run as server, listen %i, send %lu, tx buf %lu vsock buf %lu\n", + port, to_send_bytes, buf_size_bytes, + vsock_buf_bytes); + + fd = socket(AF_VSOCK, SOCK_STREAM, 0); + + if (fd < 0) { + perror("socket"); + exit(EXIT_FAILURE); + } + + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) { + perror("bind"); + exit(EXIT_FAILURE); + } + + if (listen(fd, 1) < 0) { + perror("listen"); + exit(EXIT_FAILURE); + } + + client_fd = accept(fd, &clientaddr.sa, &clientaddr_len); + + if (client_fd < 0) { + perror("accept"); + exit(EXIT_FAILURE); + } + + vsock_increase_buf_size(client_fd); + vsock_increase_buf_size(fd); + + data = malloc(buf_size_bytes); + + if (!data) { + fprintf(stderr, "malloc failed\n"); + close(client_fd); + close(fd); + exit(EXIT_FAILURE); + } + + memset(data, 0, buf_size_bytes); + total_send = 0; + tx_begin_ns = current_nsec(); + + while (total_send < to_send_bytes) { + ssize_t sent; + + sent = write(client_fd, data, buf_size_bytes); + + if (sent <= 0) { + perror("write"); + exit(EXIT_FAILURE); + } + + total_send += sent; + } + + fprintf(stderr, "Tx loop time %f sec\n", + (float)(current_nsec() - tx_begin_ns) / (1000000000.0)); + + close(client_fd); + close(fd); + + free(data); +} + +static void run_client(void) +{ + unsigned int read_cnt; + time_t rx_begin_ns; + time_t in_read_ns; + void *data; + int fd; + + fprintf(stderr, "Running client, copy, peer %i:%i, rx buf %lu, vsock buf %lu\n", + peer_cid, port, buf_size_bytes, vsock_buf_bytes); + + fd = vsock_connect(peer_cid, port); + + if (fd < 0) { + perror("socket"); + exit(EXIT_FAILURE); + } + + if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, + &rcvlowat_bytes, + sizeof(rcvlowat_bytes))) { + perror("setsockopt 1"); + exit(EXIT_FAILURE); + } + + data = mmap(NULL, buf_size_bytes, PROT_READ | PROT_WRITE, + MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0); + + if (data == MAP_FAILED) { + perror("mmap"); + exit(EXIT_FAILURE); + } + + in_read_ns = 0; + read_cnt = 0; + rx_begin_ns = current_nsec(); + + while (1) { + struct pollfd fds = { 0 }; + + fds.fd = fd; + fds.events = POLLIN | POLLERR | POLLHUP | + POLLRDHUP | POLLNVAL; + + if (poll(&fds, 1, -1) < 0) { + perror("poll"); + exit(EXIT_FAILURE); + } + + if (fds.revents & POLLERR) { + fprintf(stderr, "Poll error\n"); + break; + } + + if (fds.revents & (POLLHUP | POLLRDHUP)) + break; + + if (fds.revents & POLLIN) { + ssize_t bytes_read; + time_t t; + + t = current_nsec(); + bytes_read = read(fd, data, buf_size_bytes); + in_read_ns += (current_nsec() - t); + read_cnt++; + + if (!bytes_read) + break; + + if (bytes_read < 0) { + perror("recv"); + exit(EXIT_FAILURE); + } + } + } + + fprintf(stderr, "Rx loop time %f sec\n", + (float)(current_nsec() - rx_begin_ns) / (1000000000.0)); + fprintf(stderr, "Total in 'read()' %f sec\n", in_read_ns / 1000000000.0); + fprintf(stderr, "POLLIN wakeups: %i\n", read_cnt); + fprintf(stderr, "Average in 'read()' %f ns\n", + (float)in_read_ns / read_cnt); + + munmap(data, buf_size_bytes); + close(fd); +} + +static void run_client_zerocopy(void) +{ + unsigned int rx_cnt; + time_t rx_begin_ns; + time_t in_rx_ns; + void *rx_va; + int done; + int fd; + + fprintf(stderr, "Running client, zerocopy, peer %i:%i, rx buf %lu, vsock buf %lu\n", + peer_cid, port, buf_size_bytes, vsock_buf_bytes); + + fd = socket(AF_VSOCK, SOCK_STREAM, 0); + + if (fd < 0) { + perror("socket"); + exit(EXIT_FAILURE); + } + + fd = vsock_connect(peer_cid, port); + + if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, + &rcvlowat_bytes, + sizeof(rcvlowat_bytes))) { + perror("setsockopt 1"); + exit(EXIT_FAILURE); + } + + rx_va = mmap(NULL, buf_size_bytes, PROT_READ, MAP_SHARED, fd, 0); + + if (rx_va == MAP_FAILED) { + perror("mmap"); + exit(EXIT_FAILURE); + } + + done = 0; + + in_rx_ns = 0; + rx_cnt = 0; + rx_begin_ns = current_nsec(); + + while (1) { + struct pollfd fds = { 0 }; + + fds.fd = fd; + fds.events = POLLIN | POLLERR | POLLHUP | + POLLRDHUP | POLLNVAL; + + if (poll(&fds, 1, -1) < 0) { + perror("poll"); + exit(EXIT_FAILURE); + } + + if (fds.revents & (POLLHUP | POLLRDHUP)) + done = 1; + + if (fds.revents & POLLERR) { + fprintf(stderr, "Poll error\n"); + break; + } + + if (fds.revents & POLLIN) { + struct virtio_vsock_usr_hdr_pref *hdr_pref; + size_t t; + + t = current_nsec(); + map_rx(fd, rx_va); + in_rx_ns += (current_nsec() - t); + + hdr_pref = (struct virtio_vsock_usr_hdr_pref *)rx_va; + + if (!hdr_pref->hdr_num && done) + break; + + t = current_nsec(); + if (madvise((void *)rx_va + PAGE_SIZE, buf_size_bytes - PAGE_SIZE, + MADV_DONTNEED)) { + perror("madvise"); + exit(EXIT_FAILURE); + } + in_rx_ns += (current_nsec() - t); + rx_cnt++; + } + + if (done) + break; + } + + fprintf(stderr, "Rx loop time %f sec\n", + (float)(current_nsec() - rx_begin_ns) / (1000000000.0)); + fprintf(stderr, "Total in 'getsockopt()' + 'madvise()' %f sec\n", + in_rx_ns / 1000000000.0); + fprintf(stderr, "POLLIN wakeups: %i\n", rx_cnt); + fprintf(stderr, "Average in 'getsockopt()' + 'madvise()' %f ns\n", + (float)in_rx_ns / rx_cnt); + + close(fd); +} + +static void run_client_user_poll(void) +{ + unsigned int rx_cnt; + time_t rx_begin_ns; + time_t in_rx_ns; + u32 poll_value; + void *rx_va; + int fd; + + fprintf(stderr, "Running client, user poll, peer %i:%i, rx buf %lu, vsock buf %lu\n", + peer_cid, port, buf_size_bytes, vsock_buf_bytes); + + fd = vsock_connect(peer_cid, port); + + if (fd < 0) { + perror("connect"); + exit(EXIT_FAILURE); + } + + rx_va = mmap(NULL, buf_size_bytes, PROT_READ, MAP_SHARED, fd, 0); + + if (rx_va == MAP_FAILED) { + perror("mmap"); + exit(EXIT_FAILURE); + } + + poll_value = 0; + in_rx_ns = 0; + rx_cnt = 0; + rx_begin_ns = current_nsec(); + + while (1) { + volatile struct virtio_vsock_usr_hdr_pref *poll_hdr; + int leave_loop = 0; + size_t t; + + poll_hdr = (struct virtio_vsock_usr_hdr_pref *)rx_va; + + if (poll_value != ~0) { + do { + poll_value = poll_hdr->poll_value; + } while (!poll_value); + } + + t = current_nsec(); + map_rx(fd, rx_va); + in_rx_ns += (current_nsec() - t); + + if (!poll_hdr->hdr_num && (poll_value == ~0)) + leave_loop = 1; + + t = current_nsec(); + if (madvise((void *)rx_va + PAGE_SIZE, + buf_size_bytes - PAGE_SIZE, + MADV_DONTNEED)) { + perror("madvise"); + exit(EXIT_FAILURE); + } + in_rx_ns += (current_nsec() - t); + rx_cnt++; + + if (leave_loop) + break; + } + + fprintf(stderr, "Rx loop time %f sec\n", + (float)(current_nsec() - rx_begin_ns) / (1000000000.0)); + fprintf(stderr, "Total in 'getsockopt()' + 'madvise()' %f sec\n", + in_rx_ns / 1000000000.0); + fprintf(stderr, "Busyloop wakeups: %i\n", rx_cnt); + fprintf(stderr, "Average in 'getsockopt()' + 'madvise()' %f ns\n", + (float)in_rx_ns / rx_cnt); +} + +static void usage(void) +{ + fprintf(stderr, "Usage: vsock_rx_perf [-h] -m c|s\n" + "-z n|y|u -c -m \n" + "-b \n" + "\n" + "Server: vsock_rx_perf -m s\n" + "This is benchmarking utility, to test vsock receive performance.\n" + "It runs in two modes: server or client. In server mode, it waits\n" + "connection from client, and when established, server starts data\n" + "transmission. Total size of data to send is set by '-m' option.\n" + "\n" + "Client could read this data in three different modes:\n" + "1) Using 'read()' system call. Default mode.\n" + "2) Zerocopy mode, use 'poll()' to wait data.\n" + "3) Zerocopy mode, use busyloop in userpace to wait data.\n" + "\n" + "Meaning of '-b' depends of server or client mode. In server\n" + "mode, it is size of tx buffer, passed to 'write()'. In client mode,\n" + "it is size of rx buffer(without zerocopy) passed to 'read()'. With\n" + "zerocopy enabled, it is size of rx mapping.\n" + "\n" + "Options:\n" + " -h This help message\n" + " -m c|s Server or client(client default)\n" + " -p Port\n" + " -z n|y|u Data waiting mode\n" + " -c CID of the peer\n" + " -m Megabytes to send\n" + " -b Depends on server/client mode\n" + "\n"); + exit(EXIT_FAILURE); +} + +int main(int argc, char **argv) +{ + int c; + + while ((c = getopt(argc, argv, "sc:p:m:b:z:r:hv:")) != -1) { + switch (c) { + case 'v': + vsock_buf_bytes = memparse(optarg); + break; + case 'r': + rcvlowat_bytes = memparse(optarg); + break; + case 's': + client_mode = false; + break; + case 'c': + peer_cid = atoi(optarg); + break; + case 'p': + port = atoi(optarg); + break; + case 'm': + to_send_bytes = memparse(optarg); + break; + case 'b': + buf_size_bytes = memparse(optarg); + break; + case 'z': + if (!strcmp(optarg, "n")) + zerocopy_mode = ZEROCOPY_MODE_NO; + else if (!strcmp(optarg, "y")) + zerocopy_mode = ZEROCOPY_MODE_POLL; + else if (!strcmp(optarg, "u")) + zerocopy_mode = ZEROCOPY_MODE_USER_POLL; + else + usage(); + break; + case 'h': + usage(); + break; + default: + usage(); + + } + } + + if (client_mode) { + switch (zerocopy_mode) { + case ZEROCOPY_MODE_NO: + run_client(); + break; + case ZEROCOPY_MODE_POLL: + run_client_zerocopy(); + break; + case ZEROCOPY_MODE_USER_POLL: + run_client_user_poll(); + break; + } + } else { + run_server(); + } + + return 0; +}