Message ID | 20240517144607.2595798-1-niuxuewei.nxw@antgroup.com (mailing list archive) |
---|---|
Headers | show |
Series | vsock/virtio: Add support for multi-devices | expand |
Hi, thanks for this RFC! On Fri, May 17, 2024 at 10:46:02PM GMT, Xuewei Niu wrote: ># Motivition > >Vsock is a lightweight and widely used data exchange mechanism between host >and guest. Kata Containers, a secure container runtime, leverages the >capability to exchange control data between the shim and the kata-agent. > >The Linux kernel only supports one vsock device for virtio-vsock transport, >resulting in the following limitations: > >* Poor performance isolation: All vsock connections share the same >virtqueue. This might be fixed if we implement multi-queue in virtio-vsock. >* Cannot enable more than one backend: Virtio-vsock, vhost-vsock, and >vhost-user-vsock cannot be enabled simultaneously on the transport. > >We’d like to transfer networking data, such as TSI (Transparent Socket >Impersonation), over vsock via the vhost-user protocol to reduce overhead. >However, by default, the vsock device is occupied by the kata-agent. > ># Usages > >Principle: **Supporting virtio-vsock multi-devices while also being >compatible with existing ones.** > >## Connection from Guest to Host > >There are two valuable questions to take about: > >1. How to be compatible with the existing usages? >2. How do we specify a virtio-vsock device? > >### Question 1 > >Before we delve into question 1, I'd like to provide a piece of pseudocode >as an example of one of the existing use cases from the guest's >perspective. > >Assuming there is one virtio-vsock device with CID 4. One of existing >usages to connect to host is shown as following. > >``` >fd = socket(AF_VSOCK); >connect(fd, 2, 1234); >n = write(fd, buffer); >``` > >The result is that a connection is established from the guest (4, ?) to the >host (2, 1234), where "?" denotes a random port. > >In the context of multi-devices, there are more than two devices. If the >users don’t specify one CID explicitly, the kernel becomes confused about >which device to use. The new implementation should be compatible with the >old one. > >We expanded the virtio-vsock specification to address this issue. The >specification now includes a new field called "order". > >``` >struct virtio_vsock_config { > __le64 guest_cid; > __le64 order; >} _attribute_((packed)); >``` > >In the phase of virtio-vsock driver probing, the guest kernel reads >from >VMM to get the order of each device. **We stipulate that the device with the >smallest order is regarded as the default device**(this mechanism functions >as a 'default gateway' in networking). > >Assuming there are three virtio-vsock devices: device1 (CID=3), device2 >(CID=4), and device3 (CID=5). The arrangement of the list is as follows >from the perspective of the guest kernel: > >``` >virtio_vsock_list = >virtio_vsock { cid: 4, order: 0 } -> virtio_vsock { cid: 3, order: 1 } -> virtio_vsock { cid: 5, order: 10 } >``` > >At this time, the guest kernel realizes that the device2 (CID=4) is the >default device. Execute the same code as before. > >``` >fd = socket(AF_VSOCK); >connect(fd, 2, 1234); >n = write(fd, buffer); >``` > >A connection will be established from the guest (4, ?) to the host (2, 1234). It seems that only the one with order 0 is used here though, so what is the ordering for? Wouldn't it suffice to simply indicate the default device (e.g., like the default gateway for networking)? > >### Question 2 > >Now, the user wants to specify a device instead of the default one. An >explicit binding operation is required to be performed. > >Use the device (CID=3), where “-1” represents any port, the kernel will We have a macro: VMADDR_PORT_ANY (which is -1) >search an available port automatically. > >``` >fd = socket(AF_VSOCK); >bind(fd, 3, -1); >connect(fd, 2, 1234);) >n = write(fd, buffer); >``` > >Use the device (CID=4). > >``` >fd = socket(AF_VSOCK); >bind(fd, 4, -1); >connect(fd, 2, 1234); >n = write(fd, buffer); >``` > >## Connection from Host to Guest > >Connection from host to guest is quite similar to the existing usages. The >device’s CID is specified by the bind operation. > >Listen at the device (CID=3)’s port 10000. > >``` >fd = socket(AF_VSOCK); >bind(fd, 3, 10000); >listen(fd); >new_fd = accept(fd, &host_cid, &host_port); >n = write(fd, buffer); >``` > >Listen at the device (CID=4)’s port 10000. > >``` >fd = socket(AF_VSOCK); >bind(fd, 4, 10000); >listen(fd); >new_fd = accept(fd, &host_cid, &host_port); >n = write(fd, buffer); >``` > ># Use Cases > >We've completed a POC with Kata Containers, Ztunnel, which is a >purpose-built per-node proxy for Istio ambient mesh, and TSI. Please refer >to the following link for more details. > >Link: https://bit.ly/4bdPJbU Thank you for this RFC, I left several comments in the patches, we still have some work to do, but I think it is something we can support :-) Here I summarize the things that I think we need to fix: 1. Avoid adding transport-specific things in af_vsock.c We need to have a generic API to allow other transports to implement the same functionality. 2. We need to add negotiation of a new feature in virtio/vhost transports We need to enable or disable support depending on whether the feature is negotiated, since guest and host may not support it. 3. Re-work the patch order for bisectability (more detail on patches 3/4) 4. Do we really need the order or just a default device? 5. Check if we can add some tests in tools/testing/vsock 6. When we agree on the RFC, we should discuss the spec changes in the virtio ML before sending a non-RFC series on Linux These are the main things, but I left other comments in the patches. Thanks, Stefano
On Fri, May 17, 2024 at 10:46:02PM +0800, Xuewei Niu wrote: > include/linux/virtio_vsock.h | 2 +- > include/net/af_vsock.h | 25 ++- > include/uapi/linux/virtio_vsock.h | 1 + > include/uapi/linux/vm_sockets.h | 14 ++ > net/vmw_vsock/af_vsock.c | 116 +++++++++-- > net/vmw_vsock/virtio_transport.c | 255 ++++++++++++++++++------ > net/vmw_vsock/virtio_transport_common.c | 16 +- > net/vmw_vsock/vsock_loopback.c | 4 +- > 8 files changed, 352 insertions(+), 81 deletions(-) As any change to virtio device/driver interface, this has to go through the virtio TC. Please subscribe at virtio-comment+subscribe@lists.linux.dev and then contact the TC at virtio-comment@lists.linux.dev You will likely eventually need to write a spec draft document, too.
Hi, MST! > > include/linux/virtio_vsock.h | 2 +- > > include/net/af_vsock.h | 25 ++- > > include/uapi/linux/virtio_vsock.h | 1 + > > include/uapi/linux/vm_sockets.h | 14 ++ > > net/vmw_vsock/af_vsock.c | 116 +++++++++-- > > net/vmw_vsock/virtio_transport.c | 255 ++++++++++++++++++------ > > net/vmw_vsock/virtio_transport_common.c | 16 +- > > net/vmw_vsock/vsock_loopback.c | 4 +- > > 8 files changed, 352 insertions(+), 81 deletions(-) > > As any change to virtio device/driver interface, this has to > go through the virtio TC. Please subscribe at > virtio-comment+subscribe@lists.linux.dev and then > contact the TC at virtio-comment@lists.linux.dev > > You will likely eventually need to write a spec draft document, too. Thanks for your reply. I've sent a series of RFC documents for the spec changes to virtio TC [1]. [1]: https://lore.kernel.org/virtio-comment/20240528054725.268173-1-niuxuewei.nxw@antgroup.com/ Thanks Xuewei