[RFC,v1,1/2] virtio/vsock: rework deferred credit update logic

Message ID 20240621192541.2082657-2-avkrasnov@salutedevices.com (mailing list archive)
State New, archived
Series virtio/vsock: some updates for deferred credit update

Commit Message

Arseniy Krasnov June 21, 2024, 7:25 p.m. UTC
The previous calculation of 'free_space' was wrong (but worked as
expected in most cases, see below), because it didn't account for the
number of bytes in the rx queue. Let's rework the 'free_space'
calculation in the following way: as this value is considered the free
space at the rx side from the tx point of view, it must be equal to the
return value of 'virtio_transport_get_credit()' at the tx side. This
function uses the 'tx_cnt' counter and 'peer_fwd_cnt': the first is the
number of transmitted bytes (without wrap), the second is the last
'fwd_cnt' value received from rx. So let's use the same approach at the
rx side for the 'free_space' calculation: add an 'rx_cnt' counter, which
is the number of received bytes (also without wrap), and subtract
'last_fwd_cnt' from it.
Now we have:
1) 'rx_cnt' at the rx side == 'tx_cnt' at the tx side.
2) 'last_fwd_cnt' == 'peer_fwd_cnt' - because the first is the last
   'fwd_cnt' sent to tx, while the second is the last 'fwd_cnt' received
   from rx.

Now 'free_space' is handled correctly, and we also don't need the
'low_rx_bytes' flag - this was more like a hack.
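
The symmetry described above can be sketched outside the kernel as
follows. This is only an illustrative model, not the kernel code: the
two structs and the helper names 'tx_credit()', 'rx_free_space()' and
'check_views_agree()' are hypothetical, and only the counter names are
taken from the patch. It assumes nothing is in flight, so both sides
have seen the same counter values.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only: field names follow the patch, the structs do not. */
struct tx_side {
	uint32_t tx_cnt;         /* bytes sent so far (free-running counter) */
	uint32_t peer_fwd_cnt;   /* last fwd_cnt received from the receiver */
	uint32_t peer_buf_alloc; /* receiver's advertised buffer size */
};

struct rx_side {
	uint32_t rx_cnt;         /* bytes received so far (free-running counter) */
	uint32_t last_fwd_cnt;   /* last fwd_cnt sent to the sender */
	uint32_t buf_alloc;      /* local receive buffer size */
};

/* Free space at the receiver as the sender sees it. */
static uint32_t tx_credit(const struct tx_side *tx)
{
	return tx->peer_buf_alloc - (tx->tx_cnt - tx->peer_fwd_cnt);
}

/* Free space as the receiver computes it with the 'rx_cnt' based formula. */
static uint32_t rx_free_space(const struct rx_side *rx)
{
	return rx->buf_alloc - (rx->rx_cnt - rx->last_fwd_cnt);
}

/*
 * Hypothetical check: with nothing in flight both views agree, even when
 * the byte counter has already wrapped past 2^32 and fwd_cnt has not.
 */
static void check_views_agree(void)
{
	struct tx_side tx = {
		.tx_cnt = 100,
		.peer_fwd_cnt = UINT32_MAX - 99,
		.peer_buf_alloc = 262144,
	};
	struct rx_side rx = {
		.rx_cnt = 100,
		.last_fwd_cnt = UINT32_MAX - 99,
		.buf_alloc = 262144,
	};

	/* Unsigned subtraction is modulo 2^32, so the difference is 200 bytes. */
	assert(tx_credit(&tx) == rx_free_space(&rx));
}
```

Because the counters are unsigned 32-bit values, the subtraction stays
correct across a wrap, which is why the same expression can be used on
both sides.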

The previous calculation of 'free_space' worked (in 99% of cases),
because if we look at the behaviour of both expressions (new and
previous):

'(rx_cnt - last_fwd_cnt)' and '(fwd_cnt - last_fwd_cnt)'

Both of them always grow, at almost the same "speed": the only
difference is that 'rx_cnt' is incremented earlier, when a packet is
received, while 'fwd_cnt' is incremented when the packet is read by the
user. So if 'rx_cnt' grows "faster", the resulting 'free_space' becomes
smaller, and we send credit updates a little more often, but:

  * The 'free_space' calculation based on 'rx_cnt' gives the same value
    that tx sees as free space at the rx side, so the original idea of
    'free_space' is now implemented as planned.
  * The hack with 'low_rx_bytes' is no longer needed.
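
To make the difference between the two expressions concrete, here is a
minimal standalone sketch of both credit update conditions. It is an
illustration under stated assumptions, not the kernel code: the struct
'rx_state', the helper names and the 'rcvlowat' field (a stand-in for
the result of 'sock_rcvlowat()') are hypothetical, and the 64 KiB value
used for MAX_PKT_BUF_SIZE is assumed here in place of
VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration, standing in for VIRTIO_VSOCK_MAX_PKT_BUF_SIZE. */
#define MAX_PKT_BUF_SIZE (64u * 1024u)

/* Hypothetical snapshot of the rx-side counters taken under rx_lock. */
struct rx_state {
	uint32_t buf_alloc;
	uint32_t rx_cnt;
	uint32_t fwd_cnt;
	uint32_t last_fwd_cnt;
	uint32_t rx_bytes;  /* bytes currently queued */
	uint32_t rcvlowat;  /* stand-in for the sock_rcvlowat() result */
};

/* Previous condition: 'fwd_cnt' based free space plus the 'low_rx_bytes' flag. */
static bool old_should_update(const struct rx_state *s)
{
	uint32_t fwd_cnt_delta = s->fwd_cnt - s->last_fwd_cnt;
	uint32_t free_space = s->buf_alloc - fwd_cnt_delta;
	bool low_rx_bytes = s->rx_bytes < s->rcvlowat;

	return fwd_cnt_delta &&
	       (free_space < MAX_PKT_BUF_SIZE || low_rx_bytes);
}

/* New condition: 'rx_cnt' based free space, no extra flag. */
static bool new_should_update(const struct rx_state *s)
{
	uint32_t fwd_cnt_delta = s->fwd_cnt - s->last_fwd_cnt;
	uint32_t free_space = s->buf_alloc - (s->rx_cnt - s->last_fwd_cnt);

	return fwd_cnt_delta && free_space < MAX_PKT_BUF_SIZE;
}

/* Hypothetical scenario: the reader drains slowly while the rx queue fills up. */
static void demo_divergence(void)
{
	struct rx_state s = {
		.buf_alloc = 256u * 1024u,
		.rx_cnt = 250000,
		.fwd_cnt = 10000,
		.last_fwd_cnt = 0,
		.rx_bytes = 240000,
		.rcvlowat = 1,
	};

	/* Old view: 262144 - 10000 = 252144 bytes "free", so no update. */
	assert(!old_should_update(&s));
	/* New view: 262144 - 250000 = 12144 bytes free, so an update is sent. */
	assert(new_should_update(&s));
}
```

In this scenario the sender has almost run out of credit, but only the
'rx_cnt' based expression notices it.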

Here is a performance comparison between the two versions of the
'free_space' calculation:

 *------*----------*----------*
 |      | 'rx_cnt' | previous |
 *------*----------*----------*
 |H -> G|   8.42   |   7.82   |
 *------*----------*----------*
 |G -> H|   11.6   |   12.1   |
 *------*----------*----------*

'vsock-iperf' with default arguments was used as the benchmark. There is
no significant performance difference before and after this patch.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
---
 include/linux/virtio_vsock.h            | 1 +
 net/vmw_vsock/virtio_transport_common.c | 8 +++-----
 2 files changed, 4 insertions(+), 5 deletions(-)

Comments

Stefano Garzarella June 25, 2024, 1:46 p.m. UTC | #1
On Fri, Jun 21, 2024 at 10:25:40PM GMT, Arseniy Krasnov wrote:
>Previous calculation of 'free_space' was wrong (but worked as expected
>in most cases, see below), because it didn't account number of bytes in
>rx queue. Let's rework 'free_space' calculation in the following way:
>as this value is considered free space at rx side from tx point of view,
>it must be equal to return value of 'virtio_transport_get_credit()' at
>tx side. This function uses 'tx_cnt' counter and 'peer_fwd_cnt': first
>is number of transmitted bytes (without wrap), second is last 'fwd_cnt'
>value received from rx. So let's use same approach at rx side during
>'free_space' calculation: add 'rx_cnt' counter which is number of
>received bytes (also without wrap) and subtract 'last_fwd_cnt' from it.
>Now we have:
>1) 'rx_cnt' == 'tx_cnt' at both sides.
>2) 'last_fwd_cnt' == 'peer_fwd_cnt' - because first is last 'fwd_cnt'
>   sent to tx, while second is last 'fwd_cnt' received from rx.
>
>Now 'free_space' is handled correctly and also we don't need

mmm, I don't know if it was wrong before, maybe we could say it was less 
accurate.

That said, could we have the same problem now if we have a lot of 
producers and the virtqueue becomes full?

>'low_rx_bytes' flag - this was more like a hack.
>
>Previous calculation of 'free_space' worked (in 99% cases), because if
>we take a look on behaviour of both expressions (new and previous):
>
>'(rx_cnt - last_fwd_cnt)' and '(fwd_cnt - last_fwd_cnt)'
>
>Both of them always grows up, with almost same "speed": only difference
>is that 'rx_cnt' is incremented earlier during packet is received,
>while 'fwd_cnt' in incremented when packet is read by user. So if 'rx_cnt'
>grows "faster", then resulting 'free_space' become smaller also, so we
>send credit updates a little bit more, but:
>
>  * 'free_space' calculation based on 'rx_cnt' gives the same value,
>    which tx sees as free space at rx side, so original idea of

Ditto, what happens if the virtqueue is full?

>    'free_space' is now implemented as planned.
>  * Hack with 'low_rx_bytes' now is not needed.

Yeah, so this patch should also mitigate the issue reported by Alex
(added in CC), right?

If yes, please mention that problem and add a Reported-by giving credit 
to Alex.

>
>Also here is some performance comparison between both versions of
>'free_space' calculation:
>
> *------*----------*----------*
> |      | 'rx_cnt' | previous |
> *------*----------*----------*
> |H -> G|   8.42   |   7.82   |
> *------*----------*----------*
> |G -> H|   11.6   |   12.1   |
> *------*----------*----------*

How many seconds did you run it? How many repetitions? There's a little 
discrepancy anyway, but I can't tell if it's just noise.

>
>As benchmark 'vsock-iperf' with default arguments was used. There is no
>significant performance difference before and after this patch.
>
>Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
>---
> include/linux/virtio_vsock.h            | 1 +
> net/vmw_vsock/virtio_transport_common.c | 8 +++-----
> 2 files changed, 4 insertions(+), 5 deletions(-)

Thanks for working on this, I'll do more tests but the approach LGTM.

Thanks,
Stefano

Arseniy Krasnov June 25, 2024, 1:49 p.m. UTC | #2
On 25.06.2024 16:46, Stefano Garzarella wrote:
> On Fri, Jun 21, 2024 at 10:25:40PM GMT, Arseniy Krasnov wrote:
>> Previous calculation of 'free_space' was wrong (but worked as expected
>> in most cases, see below), because it didn't account number of bytes in
>> rx queue. Let's rework 'free_space' calculation in the following way:
>> as this value is considered free space at rx side from tx point of view,
>> it must be equal to return value of 'virtio_transport_get_credit()' at
>> tx side. This function uses 'tx_cnt' counter and 'peer_fwd_cnt': first
>> is number of transmitted bytes (without wrap), second is last 'fwd_cnt'
>> value received from rx. So let's use same approach at rx side during
>> 'free_space' calculation: add 'rx_cnt' counter which is number of
>> received bytes (also without wrap) and subtract 'last_fwd_cnt' from it.
>> Now we have:
>> 1) 'rx_cnt' == 'tx_cnt' at both sides.
>> 2) 'last_fwd_cnt' == 'peer_fwd_cnt' - because first is last 'fwd_cnt'
>>   sent to tx, while second is last 'fwd_cnt' received from rx.
>>
>> Now 'free_space' is handled correctly and also we don't need
> 
> mmm, I don't know if it was wrong before, maybe we could say it was less accurate.

Maybe "now 'free_space' is handled in a more precise way and also we ..." ?

> 
> That said, could we have the same problem now if we have a lot of producers and the virtqueue becomes full?
> 

I guess if the virtqueue is full, we just wait by returning the skb back to the tx queue... i.e.
data exchange between the two sockets just freezes?

>> 'low_rx_bytes' flag - this was more like a hack.
>>
>> Previous calculation of 'free_space' worked (in 99% cases), because if
>> we take a look on behaviour of both expressions (new and previous):
>>
>> '(rx_cnt - last_fwd_cnt)' and '(fwd_cnt - last_fwd_cnt)'
>>
>> Both of them always grows up, with almost same "speed": only difference
>> is that 'rx_cnt' is incremented earlier during packet is received,
>> while 'fwd_cnt' in incremented when packet is read by user. So if 'rx_cnt'
>> grows "faster", then resulting 'free_space' become smaller also, so we
>> send credit updates a little bit more, but:
>>
>>  * 'free_space' calculation based on 'rx_cnt' gives the same value,
>>    which tx sees as free space at rx side, so original idea of
> 
> Ditto, what happen if the virtqueue is full?
> 
>>    'free_space' is now implemented as planned.
>>  * Hack with 'low_rx_bytes' now is not needed.
> 
> Yeah, so this patch should also mitigate issue reported by Alex (added in CC), right?
> 
> If yes, please mention that problem and add a Reported-by giving credit to Alex.

Yes, of course!

> 
>>
>> Also here is some performance comparison between both versions of
>> 'free_space' calculation:
>>
>> *------*----------*----------*
>> |      | 'rx_cnt' | previous |
>> *------*----------*----------*
>> |H -> G|   8.42   |   7.82   |
>> *------*----------*----------*
>> |G -> H|   11.6   |   12.1   |
>> *------*----------*----------*
> 
> How many seconds did you run it? How many repetitions? There's a little discrepancy anyway, but I can't tell if it's just noise.

I ran it 4 times, each run for ~10 seconds... I think I can also add the number of credit update messages to this report.

> 
>>
>> As benchmark 'vsock-iperf' with default arguments was used. There is no
>> significant performance difference before and after this patch.
>>
>> Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
>> ---
>> include/linux/virtio_vsock.h            | 1 +
>> net/vmw_vsock/virtio_transport_common.c | 8 +++-----
>> 2 files changed, 4 insertions(+), 5 deletions(-)
> 
> Thanks for working on this, I'll do more tests but the approach LGTM.

Got it, Thanks

> 
> Thanks,
> Stefano
> 
Stefano Garzarella July 1, 2024, 3:32 p.m. UTC | #3
Hi Arseniy,

On Fri, Jun 21, 2024 at 10:25:40PM GMT, Arseniy Krasnov wrote:
>Previous calculation of 'free_space' was wrong (but worked as expected
>in most cases, see below), because it didn't account number of bytes in
>rx queue. Let's rework 'free_space' calculation in the following way:
>as this value is considered free space at rx side from tx point of 
>view,
>it must be equal to return value of 'virtio_transport_get_credit()' at
>tx side. This function uses 'tx_cnt' counter and 'peer_fwd_cnt': first
>is number of transmitted bytes (without wrap), second is last 'fwd_cnt'
>value received from rx. So let's use same approach at rx side during
>'free_space' calculation: add 'rx_cnt' counter which is number of
>received bytes (also without wrap) and subtract 'last_fwd_cnt' from it.
>Now we have:
>1) 'rx_cnt' == 'tx_cnt' at both sides.
>2) 'last_fwd_cnt' == 'peer_fwd_cnt' - because first is last 'fwd_cnt'
>   sent to tx, while second is last 'fwd_cnt' received from rx.
>
>Now 'free_space' is handled correctly and also we don't need
>'low_rx_bytes' flag - this was more like a hack.
>
>Previous calculation of 'free_space' worked (in 99% cases), because if
>we take a look on behaviour of both expressions (new and previous):
>
>'(rx_cnt - last_fwd_cnt)' and '(fwd_cnt - last_fwd_cnt)'
>
>Both of them always grows up, with almost same "speed": only difference
>is that 'rx_cnt' is incremented earlier during packet is received,
>while 'fwd_cnt' in incremented when packet is read by user. So if 
>'rx_cnt'
>grows "faster", then resulting 'free_space' become smaller also, so we
>send credit updates a little bit more, but:
>
>  * 'free_space' calculation based on 'rx_cnt' gives the same value,
>    which tx sees as free space at rx side, so original idea of
>    'free_space' is now implemented as planned.
>  * Hack with 'low_rx_bytes' now is not needed.
>
>Also here is some performance comparison between both versions of
>'free_space' calculation:
>
> *------*----------*----------*
> |      | 'rx_cnt' | previous |
> *------*----------*----------*
> |H -> G|   8.42   |   7.82   |
> *------*----------*----------*
> |G -> H|   11.6   |   12.1   |
> *------*----------*----------*

I did some tests on an Intel(R) Xeon(R) Silver 4410Y using iperf-vsock:
- kernel 6.9.0
pkt_size     G->H     H->G
4k            4.6      6.4
64k          13.8     11.5
128k         13.4     11.7

- kernel 6.9.0 with this series applied
pkt_size     G->H     H->G
4k            4.6     8.16
64k          12.2     8.9
128k         12.8     8.8

I see a big drop, especially on H->G with big packets. Can you try to 
replicate on your env?
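
For reference, the relative change can be derived from the two tables
above with a short calculation. This is only a sketch: the helper names
'rel_change_pct()' and 'print_h2g_changes()' are hypothetical, and the
input values are copied from the H->G columns as reported.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change in percent; a negative result means a drop. */
static double rel_change_pct(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}

/* Prints the H->G change per packet size, using the numbers reported above. */
static void print_h2g_changes(void)
{
	static const char *pkt_size[] = { "4k", "64k", "128k" };
	static const double before[] = { 6.4, 11.5, 11.7 }; /* kernel 6.9.0 */
	static const double after[] = { 8.16, 8.9, 8.8 };   /* with this series */

	for (int i = 0; i < 3; i++)
		printf("%-5s %+.1f%%\n", pkt_size[i],
		       rel_change_pct(before[i], after[i]));
}
```

With these inputs the H->G results drop by roughly 23% for 64k and 25%
for 128k packets, while the 4k case improves.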

I'll try to understand more and also test on an i7 in the next few days.

Thanks,
Stefano

Patch

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index c82089dee0c8..3579491c411e 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -135,6 +135,7 @@  struct virtio_vsock_sock {
 	u32 peer_buf_alloc;
 
 	/* Protected by rx_lock */
+	u32 rx_cnt;
 	u32 fwd_cnt;
 	u32 last_fwd_cnt;
 	u32 rx_bytes;
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 16ff976a86e3..1d4e2328e06e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -441,6 +441,7 @@  static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
 		return false;
 
 	vvs->rx_bytes += len;
+	vvs->rx_cnt += len;
 	return true;
 }
 
@@ -558,7 +559,6 @@  virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	size_t bytes, total = 0;
 	struct sk_buff *skb;
 	u32 fwd_cnt_delta;
-	bool low_rx_bytes;
 	int err = -EFAULT;
 	u32 free_space;
 
@@ -603,9 +603,7 @@  virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	}
 
 	fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt;
-	free_space = vvs->buf_alloc - fwd_cnt_delta;
-	low_rx_bytes = (vvs->rx_bytes <
-			sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX));
+	free_space = vvs->buf_alloc - (vvs->rx_cnt - vvs->last_fwd_cnt);
 
 	spin_unlock_bh(&vvs->rx_lock);
 
@@ -619,7 +617,7 @@  virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	 * number of bytes in rx queue is not enough to wake up reader.
 	 */
 	if (fwd_cnt_delta &&
-	    (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE || low_rx_bytes))
+	    (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE))
 		virtio_transport_send_credit_update(vsk);
 
 	return total;