Message ID | 20240701-pinna-v2-2-ac396d181f59@outlook.com (mailing list archive) |
---|---|
State | New, archived |
Series | vsock: avoid queuing on workqueue if possible |
Hi all,

> +	/* Inside RCU, can't sleep! */
> +	ret = mutex_trylock(&vsock->tx_lock);
> +	if (unlikely(ret == 0))
> +		goto out_worker;

I just realized that here I don't release the tx_lock and
that the email subject is "PATCH PATCH".
I will fix this in the next version.

Any feedback is welcome!

Thanks,
Luigi
On Mon, Jul 01, 2024 at 04:49:41PM GMT, Luigi Leonardi wrote:
>Hi all,
>
>> +	/* Inside RCU, can't sleep! */
>> +	ret = mutex_trylock(&vsock->tx_lock);
>> +	if (unlikely(ret == 0))
>> +		goto out_worker;
>
>I just realized that here I don't release the tx_lock and
>that the email subject is "PATCH PATCH".
>I will fix this in the next version.

What about adding a function to handle all these steps? That way the
error path in this code block can be handled more cleanly.

IMHO, to simplify the code, it can just return true or false depending on
whether the packet was queued. If the driver is disappearing while we are
still queuing, the release path will clean up all the queues, so we may
not need to worry about that edge case.

Thanks,
Stefano

>Any feedback is welcome!
>
>Thanks,
>Luigi
>
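To make the suggestion concrete, here is one possible shape such a helper could take. This is a sketch only: the name `virtio_transport_send_skb_fast_path`, its signature, and the exact error handling are assumptions for illustration, not code from this series. It returns true only if the skb was actually put on the virtqueue, and it releases `tx_lock` on every path, which addresses the missing unlock Luigi mentions above.

```c
/* Hypothetical helper along the lines suggested in the review: try to put
 * the skb straight on the TX virtqueue. Returns true only if the skb was
 * queued, false if the caller should fall back to the send worker.
 * Called under RCU, so it must not sleep: mutex_trylock() instead of
 * mutex_lock().
 */
static bool virtio_transport_send_skb_fast_path(struct virtio_vsock *vsock,
						struct sk_buff *skb)
{
	struct virtqueue *vq = vsock->vqs[VSOCK_VQ_TX];
	bool restart_rx = false;
	bool sent = false;

	/* Inside RCU, can't sleep! */
	if (!mutex_trylock(&vsock->tx_lock))
		return false;

	/* Driver is being removed: fall back to the worker; the release
	 * path will clean up the queues, so no special handling needed.
	 */
	if (!vsock->tx_run)
		goto out_unlock;

	if (!virtio_transport_send_skb(skb, vq, vsock, &restart_rx)) {
		sent = true;
		virtqueue_kick(vq);
	}

out_unlock:
	mutex_unlock(&vsock->tx_lock);

	if (restart_rx)
		queue_work(virtio_vsock_workqueue, &vsock->rx_work);

	return sent;
}
```

Returning a bool keeps all locking inside the helper, so the caller no longer needs the `out_worker` label or the `use_worker` flag.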
On Mon, Jul 01, 2024 at 04:28:03PM GMT, Luigi Leonardi via B4 Relay wrote:
>From: Marco Pinna <marco.pinn95@gmail.com>
>
>Introduce an optimization in virtio_transport_send_pkt:
>when the work queue (send_pkt_queue) is empty the packet is
>put directly in the virtqueue, reducing latency.
>
>In the following benchmark (pingpong mode) the host sends
>a payload to the guest and waits for the same payload back.
>
>All vCPUs pinned individually to pCPUs.
>vhost process pinned to a pCPU.
>fio process pinned both inside the host and the guest system.
>
>Host CPU: Intel i7-10700KF CPU @ 3.80GHz
>Tool: Fio version 3.37-56
>Env: Phys host + L1 Guest
>Payload: 512
>Runtime-per-test: 50s
>Mode: pingpong (h-g-h)
>Test runs: 50
>Type: SOCK_STREAM
>
>Before (Linux 6.8.11)
>------
>mean(1st percentile): 380.56 ns
>mean(overall): 780.83 ns
>mean(99th percentile): 8300.24 ns
>
>After
>------
>mean(1st percentile): 370.59 ns
>mean(overall): 720.66 ns
>mean(99th percentile): 7600.27 ns
>
>Same setup, using 4K payload:
>
>Before (Linux 6.8.11)
>------
>mean(1st percentile): 458.84 ns
>mean(overall): 1650.17 ns
>mean(99th percentile): 42240.68 ns
>
>After
>------
>mean(1st percentile): 450.12 ns
>mean(overall): 1460.84 ns
>mean(99th percentile): 37632.45 ns
>
>virtqueue.
>
>Throughput: iperf-vsock
>
>Before (Linux 6.8.11)
>G2H 28.7 Gb/s
>
>After
>G2H 40.8 Gb/s

Cool! I'd suggest adding the buffer length used (-l param), and also
checking more lengths, e.g. at least 4k, 64k, 128k.

>
>The performance improvement is related to this optimization,
>I checked that each packet was put directly on the vq
>avoiding the work queue.

How?

>
>Co-developed-by: Luigi Leonardi <luigi.leonardi@outlook.com>
>Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
>Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>

I think you might want to change the author of this patch, since it's
changed a lot from Marco's original one. Obviously, only if you both
agree on this.

Thanks,
Stefano

>---
> net/vmw_vsock/virtio_transport.c | 38 ++++++++++++++++++++++++++++++++++++--
> 1 file changed, 36 insertions(+), 2 deletions(-)
>
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index a74083d28120..3815aa8d956b 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -213,6 +213,7 @@ virtio_transport_send_pkt(struct sk_buff *skb)
> {
> 	struct virtio_vsock_hdr *hdr;
> 	struct virtio_vsock *vsock;
>+	bool use_worker = true;
> 	int len = skb->len;
>
> 	hdr = virtio_vsock_hdr(skb);
>@@ -234,8 +235,41 @@ virtio_transport_send_pkt(struct sk_buff *skb)
> 	if (virtio_vsock_skb_reply(skb))
> 		atomic_inc(&vsock->queued_replies);
>
>-	virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb);
>-	queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
>+	/* If the workqueue (send_pkt_queue) is empty there is no need to enqueue the packet.
>+	 * Just put it on the virtqueue using virtio_transport_send_skb.
>+	 */
>+	if (skb_queue_empty_lockless(&vsock->send_pkt_queue)) {
>+		bool restart_rx = false;
>+		struct virtqueue *vq;
>+		int ret;
>+
>+		/* Inside RCU, can't sleep! */
>+		ret = mutex_trylock(&vsock->tx_lock);
>+		if (unlikely(ret == 0))
>+			goto out_worker;
>+
>+		/* Driver is being removed, no need to enqueue the packet */
>+		if (!vsock->tx_run)
>+			goto out_rcu;
>+
>+		vq = vsock->vqs[VSOCK_VQ_TX];
>+
>+		if (!virtio_transport_send_skb(skb, vq, vsock, &restart_rx)) {
>+			use_worker = false;
>+			virtqueue_kick(vq);
>+		}
>+
>+		mutex_unlock(&vsock->tx_lock);
>+
>+		if (restart_rx)
>+			queue_work(virtio_vsock_workqueue, &vsock->rx_work);
>+	}
>+
>+out_worker:
>+	if (use_worker) {
>+		virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb);
>+		queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
>+	}
>
> out_rcu:
> 	rcu_read_unlock();
>
>--
>2.45.2
>
>
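If a helper like the one sketched earlier in the thread were adopted, the fast-path branch in virtio_transport_send_pkt() (still running under the existing rcu_read_lock()/rcu_read_unlock() pair) could collapse to something like the following. This is an illustrative assumption about how a later revision might look, not the posted code; `virtio_transport_send_skb_fast_path` is the hypothetical helper from that sketch.

```c
	/* Fast path: if the send worker has nothing queued, try to put the
	 * packet on the virtqueue directly; otherwise preserve ordering by
	 * going through send_pkt_queue as before. The helper returns false
	 * whenever the skb was not queued (trylock failed, driver being
	 * removed, or no space in the vq), so the worker handles it.
	 */
	if (!skb_queue_empty_lockless(&vsock->send_pkt_queue) ||
	    !virtio_transport_send_skb_fast_path(vsock, skb)) {
		virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb);
		queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
	}
```

With this shape there is no `out_worker` label and no `use_worker` flag, and every error path releases `tx_lock` inside the helper.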
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index a74083d28120..3815aa8d956b 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -213,6 +213,7 @@ virtio_transport_send_pkt(struct sk_buff *skb)
 {
 	struct virtio_vsock_hdr *hdr;
 	struct virtio_vsock *vsock;
+	bool use_worker = true;
 	int len = skb->len;
 
 	hdr = virtio_vsock_hdr(skb);
@@ -234,8 +235,41 @@ virtio_transport_send_pkt(struct sk_buff *skb)
 	if (virtio_vsock_skb_reply(skb))
 		atomic_inc(&vsock->queued_replies);
 
-	virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb);
-	queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
+	/* If the workqueue (send_pkt_queue) is empty there is no need to enqueue the packet.
+	 * Just put it on the virtqueue using virtio_transport_send_skb.
+	 */
+	if (skb_queue_empty_lockless(&vsock->send_pkt_queue)) {
+		bool restart_rx = false;
+		struct virtqueue *vq;
+		int ret;
+
+		/* Inside RCU, can't sleep! */
+		ret = mutex_trylock(&vsock->tx_lock);
+		if (unlikely(ret == 0))
+			goto out_worker;
+
+		/* Driver is being removed, no need to enqueue the packet */
+		if (!vsock->tx_run)
+			goto out_rcu;
+
+		vq = vsock->vqs[VSOCK_VQ_TX];
+
+		if (!virtio_transport_send_skb(skb, vq, vsock, &restart_rx)) {
+			use_worker = false;
+			virtqueue_kick(vq);
+		}
+
+		mutex_unlock(&vsock->tx_lock);
+
+		if (restart_rx)
+			queue_work(virtio_vsock_workqueue, &vsock->rx_work);
+	}
+
+out_worker:
+	if (use_worker) {
+		virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb);
+		queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
+	}
 
 out_rcu:
 	rcu_read_unlock();