Message ID | AS2P194MB21706E349197C1466937052C9AC22@AS2P194MB2170.EURP194.PROD.OUTLOOK.COM (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | vsock: avoid queuing on workqueue if possible |
On Fri, Jun 14, 2024 at 03:55:43PM GMT, Luigi Leonardi wrote:
>From: Marco Pinna <marco.pinn95@gmail.com>
>
>This introduces an optimization in virtio_transport_send_pkt:
>when the work queue (send_pkt_queue) is empty the packet is
>put directly in the virtqueue reducing latency.
>
>In the following benchmark (pingpong mode) the host sends
>a payload to the guest and waits for the same payload back.
>
>Tool: Fio version 3.37-56
>Env: Phys host + L1 Guest
>Payload: 4k
>Runtime-per-test: 50s
>Mode: pingpong (h-g-h)
>Test runs: 50
>Type: SOCK_STREAM
>
>Before (Linux 6.8.11)
>------
>mean(1st percentile): 722.45 ns
>mean(overall): 1686.23 ns
>mean(99th percentile): 35379.27 ns
>
>After
>------
>mean(1st percentile): 602.62 ns
>mean(overall): 1248.83 ns
>mean(99th percentile): 17557.33 ns

Cool, thanks for this improvement!
Can you also report your host CPU details?

>
>Co-developed-by: Luigi Leonardi <luigi.leonardi@outlook.com>
>Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
>Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
>---
> net/vmw_vsock/virtio_transport.c | 32 ++++++++++++++++++++++++++++++--
> 1 file changed, 30 insertions(+), 2 deletions(-)
>
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index c930235ecaec..e89bf87282b2 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -214,7 +214,9 @@ virtio_transport_send_pkt(struct sk_buff *skb)
> {
> 	struct virtio_vsock_hdr *hdr;
> 	struct virtio_vsock *vsock;
>+	bool use_worker = true;
> 	int len = skb->len;
>+	int ret = -1;

Please define ret in the block where we use it. Also, we don't need to
initialize it.

> 
> 	hdr = virtio_vsock_hdr(skb);
> 
>@@ -235,8 +237,34 @@ virtio_transport_send_pkt(struct sk_buff *skb)
> 	if (virtio_vsock_skb_reply(skb))
> 		atomic_inc(&vsock->queued_replies);
> 
>-	virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb);
>-	queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
>+	/* If the send_pkt_queue is empty there is no need to enqueue the packet.

We should clarify which queue. I mean, we are always queueing the packet
somewhere, either in the internal queue for the worker or in the
virtqueue, so this comment is not really clear.

>+	 * Just put it on the ringbuff using virtio_transport_send_skb.

ringbuff? Do you mean virtqueue?

>+	 */
>+

We can avoid this empty line.

>+	if (skb_queue_empty_lockless(&vsock->send_pkt_queue)) {
>+		bool restart_rx = false;
>+		struct virtqueue *vq;

... `int ret;` here.

>+
>+		mutex_lock(&vsock->tx_lock);
>+
>+		vq = vsock->vqs[VSOCK_VQ_TX];
>+
>+		ret = virtio_transport_send_skb(skb, vq, vsock, &restart_rx);

Ah, in the end we don't need `ret` at all.
What about just `if (!virtio_transport_send_skb())`?

>+		if (ret == 0) {
>+			use_worker = false;
>+			virtqueue_kick(vq);
>+		}
>+
>+		mutex_unlock(&vsock->tx_lock);
>+
>+		if (restart_rx)
>+			queue_work(virtio_vsock_workqueue, &vsock->rx_work);
>+	}
>+
>+	if (use_worker) {
>+		virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb);
>+		queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
>+	}
> 
> out_rcu:
> 	rcu_read_unlock();
>-- 
>2.45.2
>
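To make the review suggestions concrete, here is a userspace model of the send path with them applied: there is no function-scope `ret`, and the send helper's return value is tested directly, as in `if (!virtio_transport_send_skb(...))`. Everything is hypothetical: `struct mock_vsock`, `mock_send_skb()` and `mock_send_pkt()` stand in for `struct virtio_vsock`, `virtio_transport_send_skb()` and `virtio_transport_send_pkt()`, and the `tx_lock` mutex, RCU and `restart_rx` handling are reduced to comments or left out. Treat it as a sketch of the control flow, not as the V2 of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct virtio_vsock (not the kernel type). */
struct mock_vsock {
	size_t send_pkt_queue_len; /* packets waiting for the worker */
	size_t tx_free_slots;      /* free descriptors in the TX virtqueue */
	unsigned int kicks;        /* models virtqueue_kick() calls */
	unsigned int worker_kicks; /* models queue_work(..., &send_pkt_work) */
};

/* Models virtio_transport_send_skb(): 0 on success, non-zero on failure
 * (in this model the only failure is a full TX ring). */
int mock_send_skb(struct mock_vsock *vsock)
{
	if (vsock->tx_free_slots == 0)
		return -1;
	vsock->tx_free_slots--;
	return 0;
}

/* Returns true if the packet bypassed the worker. */
bool mock_send_pkt(struct mock_vsock *vsock)
{
	bool use_worker = true;

	assert(vsock != NULL);

	/* Fast path only when nothing is already waiting for the worker. */
	if (vsock->send_pkt_queue_len == 0) {
		/* kernel: mutex_lock(&vsock->tx_lock); */
		if (!mock_send_skb(vsock)) { /* no `ret` variable needed */
			use_worker = false;
			vsock->kicks++;
		}
		/* kernel: mutex_unlock(&vsock->tx_lock); */
	}

	if (use_worker) {
		/* Slow path: queue for the worker and schedule it. */
		vsock->send_pkt_queue_len++;
		vsock->worker_kicks++;
	}
	return !use_worker;
}
```

In this model, once a packet is waiting for the worker, later packets also go to the worker even if the virtqueue has room again. That is the job of the emptiness check: it keeps the fast path from overtaking packets that are already queued.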
Hello,

thanks for working on this! I have some minor thoughts.

On Fri, Jun 14, 2024 at 03:55:43PM +0200, Luigi Leonardi wrote:
> From: Marco Pinna <marco.pinn95@gmail.com>
> 
> This introduces an optimization in virtio_transport_send_pkt:
> when the work queue (send_pkt_queue) is empty the packet is
> put directly in the virtqueue reducing latency.
> 
> In the following benchmark (pingpong mode) the host sends
> a payload to the guest and waits for the same payload back.
> 
> Tool: Fio version 3.37-56
> Env: Phys host + L1 Guest
> Payload: 4k
> Runtime-per-test: 50s
> Mode: pingpong (h-g-h)
> Test runs: 50
> Type: SOCK_STREAM
> 
> Before (Linux 6.8.11)
> ------
> mean(1st percentile): 722.45 ns
> mean(overall): 1686.23 ns
> mean(99th percentile): 35379.27 ns
> 
> After
> ------
> mean(1st percentile): 602.62 ns
> mean(overall): 1248.83 ns
> mean(99th percentile): 17557.33 ns
> 

I think it would be interesting to know what exactly the test does and if
the test is triggering the improvement, i.e., if the better results are due
to enqueuing packets directly to the virtqueue instead of letting the worker
do it. If I understand correctly, this patch focuses on the case in which
the worker queue is empty. I think the test can always send packets at a
frequency such that the worker queue is always empty, but maybe this is a
corner case and most of the time the worker queue is not empty in a
non-testing environment.

Matias
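As a quick sanity check on the figures quoted above, the relative reduction of each mean can be computed directly from the commit message's Before/After values. The helper names `reduction_pct()` and `print_fio_reductions()` are hypothetical; only the six numbers come from the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Relative reduction, in percent, from `before_ns` to `after_ns`. */
double reduction_pct(double before_ns, double after_ns)
{
	assert(before_ns > 0.0);
	return (before_ns - after_ns) / before_ns * 100.0;
}

/* Print the reduction for each mean reported in the commit message. */
void print_fio_reductions(void)
{
	static const struct {
		const char *label;
		double before_ns, after_ns;
	} rows[] = {
		{ "1st percentile", 722.45, 602.62 },
		{ "overall", 1686.23, 1248.83 },
		{ "99th percentile", 35379.27, 17557.33 },
	};

	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%-16s %9.2f ns -> %9.2f ns (-%.1f%%)\n", rows[i].label,
		       rows[i].before_ns, rows[i].after_ns,
		       reduction_pct(rows[i].before_ns, rows[i].after_ns));
}
```

This works out to roughly a 17% lower 1st-percentile mean, a 26% lower overall mean and a 50% lower 99th-percentile mean. The tail gains the most, which fits the idea that skipping the workqueue hop removes a scheduling delay.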
Hi Stefano and Matias,

@Stefano Thanks for your review(s)! I'll send a V2 by the end of the week.

@Matias

Thanks for your feedback!

> I think it would be interesting to know what exactly the test does

It's relatively easy: I used fio's pingpong mode. This mode is specifically
for measuring latency: it works by sending packets, in my case from the host
to the guest, and waiting for the other side to send them back. The latency
I wrote in the commit is the "completion latency". The total throughput on
my system is around 16 Gb/sec.

> if the test is triggering the improvement

Yes! I did some additional testing and I can confirm that during this
test, the worker queue is never used!

> If I understand correctly, this patch focuses on the
> case in which the worker queue is empty

Correct!

> I think the test can always send packets at a frequency such that the worker
> queue is always empty, but maybe this is a corner case and most of the time
> the worker queue is not empty in a non-testing environment.

I'm not sure about this, but IMHO this optimization is free: there is no
penalty for using it, and in the worst case the system will work as usual.
In any case, I'm more than happy to do some additional testing. Do you have
anything in mind?

Luigi
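The pingpong measurement Luigi describes (send a payload, wait for it to come back, time the round trip) can be sketched in plain C. This is neither fio nor vsock. As an illustrative assumption, it uses an AF_UNIX `socketpair(2)` and a forked echo process in place of the host/guest pair, times each 4 KiB round trip with `clock_gettime(CLOCK_MONOTONIC)`, and returns the mean. `pingpong_mean_ns()` and `xfer_all()` are hypothetical names. The absolute numbers are not comparable with the fio results above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define PAYLOAD_SIZE 4096 /* 4k payload, as in the benchmark above */

/* Transfer exactly n bytes (write if do_write, else read); -1 on error/EOF. */
static int xfer_all(int fd, char *buf, size_t n, int do_write)
{
	size_t done = 0;

	while (done < n) {
		ssize_t r = do_write ? write(fd, buf + done, n - done)
				     : read(fd, buf + done, n - done);
		if (r <= 0)
			return -1;
		done += (size_t)r;
	}
	return 0;
}

/* Mean round-trip time in ns over `iters` ping-pongs, or -1.0 on error. */
double pingpong_mean_ns(int iters)
{
	char buf[PAYLOAD_SIZE];
	int64_t total_ns = 0;
	int sv[2], i, ok = 1;
	pid_t pid;

	assert(iters > 0);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1.0;

	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1.0;
	}
	if (pid == 0) {
		/* Echo side (plays the "guest"): send every payload back. */
		close(sv[0]);
		while (xfer_all(sv[1], buf, PAYLOAD_SIZE, 0) == 0 &&
		       xfer_all(sv[1], buf, PAYLOAD_SIZE, 1) == 0)
			;
		_exit(0);
	}

	/* Sender side (plays the "host"): send, wait for the echo, time it. */
	close(sv[1]);
	memset(buf, 'x', sizeof(buf));
	for (i = 0; i < iters && ok; i++) {
		struct timespec t0, t1;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		ok = xfer_all(sv[0], buf, PAYLOAD_SIZE, 1) == 0 &&
		     xfer_all(sv[0], buf, PAYLOAD_SIZE, 0) == 0;
		clock_gettime(CLOCK_MONOTONIC, &t1);
		total_ns += (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
			    (t1.tv_nsec - t0.tv_nsec);
	}

	close(sv[0]); /* EOF makes the echo process exit */
	waitpid(pid, NULL, 0);
	return ok ? (double)total_ns / iters : -1.0;
}
```

Because the sender waits for each echo before sending again, at most one payload is in flight at a time. That is also why a pingpong test tends to find the transmit queue empty, which relates to Matias's question.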
On Tue, Jun 18, 2024 at 07:05:54PM +0200, Luigi Leonardi wrote:
> Hi Stefano and Matias,
> 
> @Stefano Thanks for your review(s)! I'll send a V2 by the end of the week.
> 
> @Matias
> 
> Thanks for your feedback!
> 
> > I think it would be interesting to know what exactly the test does
> 
> It's relatively easy: I used fio's pingpong mode. This mode is specifically
> for measuring latency: it works by sending packets, in my case from the host
> to the guest, and waiting for the other side to send them back. The latency
> I wrote in the commit is the "completion latency". The total throughput on
> my system is around 16 Gb/sec.
> 

Thanks for the explanation!

> > if the test is triggering the improvement
> 
> Yes! I did some additional testing and I can confirm that during this
> test, the worker queue is never used!
> 

Cool.

> > If I understand correctly, this patch focuses on the
> > case in which the worker queue is empty
> 
> Correct!
> 
> > I think the test can always send packets at a frequency such that the worker
> > queue is always empty, but maybe this is a corner case and most of the time
> > the worker queue is not empty in a non-testing environment.
> 
> I'm not sure about this, but IMHO this optimization is free: there is no
> penalty for using it, and in the worst case the system will work as usual.
> In any case, I'm more than happy to do some additional testing. Do you have
> anything in mind?
> 

Sure! This is a very interesting improvement and I am in favor of it! I
was only thinking out loud ;)

I asked the previous questions because, in my mind, I was thinking that
this improvement would trigger only for the first bunch of packets, i.e.,
when the worker queue is empty, so its effect would be seen "only at the
beginning of the transmission" until the worker queue begins to fill. If I
understand correctly, the worker queue starts to fill only after the
virtqueue is full, am I right?

Matias
Hi Matias,

> > > I think the test can always send packets at a frequency such that the worker
> > > queue is always empty, but maybe this is a corner case and most of the time
> > > the worker queue is not empty in a non-testing environment.
> > 
> > I'm not sure about this, but IMHO this optimization is free: there is no
> > penalty for using it, and in the worst case the system will work as usual.
> > In any case, I'm more than happy to do some additional testing. Do you have
> > anything in mind?
> > 
> Sure! This is a very interesting improvement and I am in favor of it! I
> was only thinking out loud ;)

No worries :)

> I asked the previous questions because, in my mind, I was thinking that
> this improvement would trigger only for the first bunch of packets, i.e.,
> when the worker queue is empty, so its effect would be seen "only at the
> beginning of the transmission" until the worker queue begins to fill. If I
> understand correctly, the worker queue starts to fill only after the
> virtqueue is full, am I right?

Correct! Packets are enqueued in the worker queue only if the virtqueue is
full.

Luigi
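The exchange above can be checked with a toy model of the two queues. This is a hypothetical illustration, not kernel code. A FIFO capped at `RING_SLOTS = 4` entries stands in for the TX virtqueue, a second FIFO stands in for `send_pkt_queue`, and a "device" consumes entries from the ring. As in the patch's `skb_queue_empty_lockless()` check, the sender takes the fast path only when the worker queue is empty and the ring has room. The model confirms that the worker queue only starts to fill once the ring is full. It also shows one nuance: once packets are waiting there, a new packet queues behind them even if the ring has free slots again, which is what preserves ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 4 /* hypothetical TX virtqueue capacity */
#define MAX_PKTS 32  /* total pushes per FIFO in this toy model */

/* Linear FIFO: `head` only advances, which is fine for a short demo. */
struct fifo {
	int items[MAX_PKTS];
	size_t head, len;
};

static void fifo_push(struct fifo *f, int v)
{
	assert(f->head + f->len < MAX_PKTS);
	f->items[f->head + f->len++] = v;
}

static int fifo_pop(struct fifo *f)
{
	assert(f->len > 0);
	f->len--;
	return f->items[f->head++];
}

struct tx_model {
	struct fifo ring;      /* stands in for the TX virtqueue */
	struct fifo backlog;   /* stands in for send_pkt_queue */
	struct fifo delivered; /* order in which the "device" consumed packets */
	size_t fast_path;      /* packets that skipped the worker */
};

/* Sender: fast path only if the backlog is empty and the ring has room. */
bool model_send(struct tx_model *m, int id)
{
	if (m->backlog.len == 0 && m->ring.len < RING_SLOTS) {
		fifo_push(&m->ring, id);
		m->fast_path++;
		return true;
	}
	fifo_push(&m->backlog, id);
	return false;
}

/* Worker: move backlog entries into the ring, in order, while there is room. */
void model_worker(struct tx_model *m)
{
	while (m->backlog.len > 0 && m->ring.len < RING_SLOTS)
		fifo_push(&m->ring, fifo_pop(&m->backlog));
}

/* Device: consume up to n packets from the ring. */
void model_device(struct tx_model *m, size_t n)
{
	while (n-- > 0 && m->ring.len > 0)
		fifo_push(&m->delivered, fifo_pop(&m->ring));
}
```

For example, sending packets 1 to 6 puts 1 to 4 on the ring via the fast path and 5 and 6 in the backlog. If the device then consumes two entries and packet 7 arrives, packet 7 still goes to the backlog. After the worker and the device catch up, packets are delivered as 1, 2, ..., 7, and the next packet takes the fast path again.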
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index c930235ecaec..e89bf87282b2 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -214,7 +214,9 @@ virtio_transport_send_pkt(struct sk_buff *skb)
 {
 	struct virtio_vsock_hdr *hdr;
 	struct virtio_vsock *vsock;
+	bool use_worker = true;
 	int len = skb->len;
+	int ret = -1;
 
 	hdr = virtio_vsock_hdr(skb);
 
@@ -235,8 +237,34 @@ virtio_transport_send_pkt(struct sk_buff *skb)
 	if (virtio_vsock_skb_reply(skb))
 		atomic_inc(&vsock->queued_replies);
 
-	virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb);
-	queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
+	/* If the send_pkt_queue is empty there is no need to enqueue the packet.
+	 * Just put it on the ringbuff using virtio_transport_send_skb.
+	 */
+
+	if (skb_queue_empty_lockless(&vsock->send_pkt_queue)) {
+		bool restart_rx = false;
+		struct virtqueue *vq;
+
+		mutex_lock(&vsock->tx_lock);
+
+		vq = vsock->vqs[VSOCK_VQ_TX];
+
+		ret = virtio_transport_send_skb(skb, vq, vsock, &restart_rx);
+		if (ret == 0) {
+			use_worker = false;
+			virtqueue_kick(vq);
+		}
+
+		mutex_unlock(&vsock->tx_lock);
+
+		if (restart_rx)
+			queue_work(virtio_vsock_workqueue, &vsock->rx_work);
+	}
+
+	if (use_worker) {
+		virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb);
+		queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
+	}
 
 out_rcu:
 	rcu_read_unlock();