
[RFC,v3,00/13] virtio/vsock: introduce SOCK_SEQPACKET support

Message ID 20210125110903.597155-1-arseny.krasnov@kaspersky.com (mailing list archive)

Message

Arseny Krasnov Jan. 25, 2021, 11:09 a.m. UTC
This patchset implements support for SOCK_SEQPACKET in the virtio
transport.
	SOCK_SEQPACKET guarantees that record boundaries are preserved.
To provide this, a new packet operation is added: it marks the start
of a record (with the record length in the header), and such a packet
carries no data. To send a record, a packet with the start marker is
sent first, then all data is sent as usual 'RW' packets. On the
receiver's side, the record length is known from the packet with the
start marker. Since packets of one socket are reordered neither on the
vsock nor on the vhost transport layer, this marker is enough to
restore the original record on the receiver's side. If the user's
buffer is smaller than the record length, then all data that does not
fit is dropped.
	The maximum length of a datagram is not limited, just as for a
stream socket, because the same credit logic is used. The difference
from a stream socket is that the user is not woken up until the whole
record is received or an error occurs. The implementation also
supports the 'MSG_EOR' and 'MSG_TRUNC' flags.
	Tests are also implemented.
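
The receive path described above can be sketched in plain userspace C.
This is only a toy model, not code from the series: 'toy_op',
'toy_pkt' and 'toy_recv_record()' are hypothetical names, and real
packets carry a full vsock header. It shows a start marker announcing
the record length, 'RW' packets carrying the data, and data that does
not fit in the user's buffer being dropped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical packet types for this sketch; not the values used by the series. */
enum toy_op { TOY_OP_SEQ_BEGIN, TOY_OP_RW };

struct toy_pkt {
	enum toy_op op;
	size_t len;		/* SEQ_BEGIN: record length; RW: payload length */
	const char *data;	/* NULL for SEQ_BEGIN, which carries no data */
};

/*
 * Consume one record from pkts[]: a SEQ_BEGIN marker followed by RW packets.
 * At most buf_len bytes are copied to buf; the rest of the record is dropped.
 * Returns the number of bytes copied, or -1 if the sequence is malformed.
 * *rec_len receives the announced record length, *truncated is set to 1
 * when part of the record did not fit.
 */
static long toy_recv_record(const struct toy_pkt *pkts, size_t npkts,
			    char *buf, size_t buf_len,
			    size_t *rec_len, int *truncated)
{
	size_t copied = 0, seen = 0, i;

	if (npkts == 0 || pkts[0].op != TOY_OP_SEQ_BEGIN)
		return -1;

	*rec_len = pkts[0].len;

	for (i = 1; i < npkts && seen < *rec_len; i++) {
		size_t room = buf_len - copied;
		size_t n = pkts[i].len;

		if (pkts[i].op != TOY_OP_RW)
			return -1;
		if (n > *rec_len - seen)
			return -1;	/* more data than the marker announced */
		if (n > room)
			n = room;	/* out-of-size data is dropped */

		memcpy(buf + copied, pkts[i].data, n);
		copied += n;
		seen += pkts[i].len;
	}

	if (seen != *rec_len)
		return -1;		/* record is incomplete */

	*truncated = copied < *rec_len;
	return (long)copied;
}
```

In this model the 'truncated' output plays the role that 'MSG_TRUNC'
plays for a real socket: the caller learns that the record was longer
than the buffer it supplied.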

 Arseny Krasnov (13):
  af_vsock: prepare for SOCK_SEQPACKET support
  af_vsock: prepare 'vsock_connectible_recvmsg()'
  af_vsock: implement SEQPACKET rx loop
  af_vsock: implement send logic for SOCK_SEQPACKET
  af_vsock: rest of SEQPACKET support
  af_vsock: update comments for stream sockets
  virtio/vsock: dequeue callback for SOCK_SEQPACKET
  virtio/vsock: fetch length for SEQPACKET record
  virtio/vsock: add SEQPACKET receive logic
  virtio/vsock: rest of SOCK_SEQPACKET support
  virtio/vsock: setup SEQPACKET ops for transport
  vhost/vsock: setup SEQPACKET ops for transport
  vsock_test: add SOCK_SEQPACKET tests

 drivers/vhost/vsock.c                   |   7 +-
 include/linux/virtio_vsock.h            |  12 +
 include/net/af_vsock.h                  |   6 +
 include/uapi/linux/virtio_vsock.h       |   9 +
 net/vmw_vsock/af_vsock.c                | 543 ++++++++++++++++------
 net/vmw_vsock/virtio_transport.c        |   4 +
 net/vmw_vsock/virtio_transport_common.c | 295 ++++++++++--
 tools/testing/vsock/util.c              |  32 +-
 tools/testing/vsock/util.h              |   3 +
 tools/testing/vsock/vsock_test.c        | 126 +++++
 10 files changed, 862 insertions(+), 175 deletions(-)

 TODO:
 - Support for record integrity control. As the transport could drop
   some packets, something like a "record id" and a record end marker
   needs to be implemented. The idea is that the SEQ_BEGIN packet
   carries both the record length and the record id, while the end
   marker (let it be SEQ_END) carries only the record id. To be sure
   that no packet was lost, the receiver checks the length of data
   between SEQ_BEGIN and SEQ_END (it must match the value in
   SEQ_BEGIN) and the record ids of SEQ_BEGIN and SEQ_END (a match
   means that neither marker was dropped). I think the easiest way
   to implement the record id for SEQ_BEGIN is to reuse another field
   of the packet header (SEQ_BEGIN already uses 'flags' as the record
   length). For SEQ_END the record id could be stored in 'flags'.
     Another way to implement it is to move the metadata of both
   SEQ_BEGIN and SEQ_END to the payload. But this approach has a
   problem: anything moved to the payload is accounted by the credit
   logic, which may fragment the payload, while a payload with the
   record length and id must not be fragmented. One way to overcome
   this is to ignore credit updates for SEQ_BEGIN/SEQ_END packets.
   Another solution is to update the 'stream_has_space()' function:
   the current implementation returns non-zero when at least 1 byte
   may be used, but the updated version would take an extra argument,
   the needed length. For an 'RW' packet this argument is 1, for
   SEQ_BEGIN it is sizeof(record len + record id) and for SEQ_END it
   is sizeof(record id).

 - What to do when the server doesn't support SOCK_SEQPACKET. In the
   current implementation RST is sent in reply, the same way as when
   the listening port is not found. I think the current RST is
   enough, because the case when the server doesn't support
   SOCK_SEQPACKET is the same as when the listener is missing (i.e.
   there is no listener in both cases).
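
 The integrity check from the first TODO item could look roughly like
 the sketch below. It is a userspace toy under stated assumptions, not
 part of the series: 'toy_pkt' and 'toy_record_intact()' are
 hypothetical, and the sketch ignores where the record id would
 actually be stored in the header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet types for this sketch only. */
enum toy_op { TOY_SEQ_BEGIN, TOY_RW, TOY_SEQ_END };

struct toy_pkt {
	enum toy_op op;
	uint32_t rec_id;	/* meaningful for SEQ_BEGIN and SEQ_END */
	uint32_t len;		/* SEQ_BEGIN: announced length; RW: payload bytes */
};

/*
 * p[0..n-1] is everything received for one record, markers included.
 * The record is accepted only if both markers are present, their ids
 * match, and the RW payload between them adds up to the announced length.
 */
static bool toy_record_intact(const struct toy_pkt *p, size_t n)
{
	uint64_t seen = 0;
	size_t i;

	if (n < 2 || p[0].op != TOY_SEQ_BEGIN || p[n - 1].op != TOY_SEQ_END)
		return false;

	/* Different ids mean a marker of some record was dropped. */
	if (p[0].rec_id != p[n - 1].rec_id)
		return false;

	for (i = 1; i + 1 < n; i++) {
		if (p[i].op != TOY_RW)
			return false;
		seen += p[i].len;
	}

	/* A shorter sum means at least one RW packet was lost. */
	return seen == p[0].len;
}
```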

 v2 -> v3:
 - patches reorganized: split into preparation and implementation
   patches
 - local variables are declared in "reverse Christmas tree" order
 - virtio_transport_common.c: correct leXX_to_cpu() usage for vsock
   header field access
 - af_vsock.c: 'vsock_connectible_*sockopt()' added as shared code
   between stream and seqpacket sockets
 - af_vsock.c: loops in '__vsock_*_recvmsg()' refactored
 - af_vsock.c: 'vsock_wait_data()' refactored

 v1 -> v2:
 - patches reordered: af_vsock.c related changes now come before
   virtio vsock changes
 - patches reorganized: more small patches, where +/- are not mixed
 - tests for SOCK_SEQPACKET added
 - all commit messages updated
 - af_vsock.c: 'vsock_pre_recv_check()' inlined into
   'vsock_connectible_recvmsg()'
 - af_vsock.c: 'vsock_assign_transport()' returns ENODEV if the
   transport was not found
 - virtio_transport_common.c: transport callback for seqpacket dequeue
 - virtio_transport_common.c: simplified
   'virtio_transport_recv_connected()'
 - virtio_transport_common.c: send reset on socket and packet type
   mismatch

Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>

Comments

Stefano Garzarella Jan. 26, 2021, 11:23 a.m. UTC | #1
Hi Arseny,
thanks for this new series!
I'm a bit busy but I hope to review it tomorrow or on Thursday.

Stefano

Stefano Garzarella Jan. 28, 2021, 5:19 p.m. UTC | #2
Hi Arseny,
I reviewed part of the series; I hope to finish the other patches tomorrow.

Just a couple of comments in the TODOs below.

On Mon, Jan 25, 2021 at 02:09:00PM +0300, Arseny Krasnov wrote:
> TODO:
> - Support for record integrity control. As the transport could drop
>   some packets, something like a "record id" and a record end marker
>   needs to be implemented. The idea is that the SEQ_BEGIN packet
>   carries both the record length and the record id, while the end
>   marker (let it be SEQ_END) carries only the record id. To be sure
>   that no packet was lost, the receiver checks the length of data
>   between SEQ_BEGIN and SEQ_END (it must match the value in
>   SEQ_BEGIN) and the record ids of SEQ_BEGIN and SEQ_END (a match
>   means that neither marker was dropped). I think the easiest way
>   to implement the record id for SEQ_BEGIN is to reuse another field
>   of the packet header (SEQ_BEGIN already uses 'flags' as the record
>   length). For SEQ_END the record id could be stored in 'flags'.

I don't really like the idea of reusing the 'flags' field for this 
purpose.

>     Another way to implement it is to move the metadata of both
>   SEQ_BEGIN and SEQ_END to the payload. But this approach has a
>   problem: anything moved to the payload is accounted by the credit
>   logic, which may fragment the payload, while a payload with the
>   record length and id must not be fragmented. One way to overcome
>   this is to ignore credit updates for SEQ_BEGIN/SEQ_END packets.
>   Another solution is to update the 'stream_has_space()' function:
>   the current implementation returns non-zero when at least 1 byte
>   may be used, but the updated version would take an extra argument,
>   the needed length. For an 'RW' packet this argument is 1, for
>   SEQ_BEGIN it is sizeof(record len + record id) and for SEQ_END it
>   is sizeof(record id).

Is the payload accounted by credit logic also if hdr.op is not 
VIRTIO_VSOCK_OP_RW?

I think that we can define a specific header to put after the 
virtio_vsock_hdr when hdr.op is SEQ_BEGIN or SEQ_END, and in this header 
we can store the id and the length of the message.
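
As a rough illustration of the idea above, a small fixed-size header
could follow the virtio_vsock_hdr for SEQ_BEGIN/SEQ_END packets. The
sketch below is plain userspace C and everything in it is an
assumption: the name 'toy_seq_hdr', its two fields and their order are
made up here. The fields are encoded as little-endian 32-bit values,
the byte order virtio headers use.

```c
#include <stdint.h>

#define TOY_SEQ_HDR_SIZE 8	/* two 32-bit little-endian fields on the wire */

/* Hypothetical extra header; host-order view used by the code. */
struct toy_seq_hdr {
	uint32_t msg_id;	/* record id, same in SEQ_BEGIN and SEQ_END */
	uint32_t msg_len;	/* record length in bytes */
};

/* Store v at p in little-endian byte order, independent of host endianness. */
static void toy_put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load a little-endian 32-bit value from p. */
static uint32_t toy_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Serialize the header into the bytes that would follow the vsock header. */
static void toy_seq_hdr_pack(uint8_t out[TOY_SEQ_HDR_SIZE],
			     const struct toy_seq_hdr *h)
{
	toy_put_le32(out, h->msg_id);
	toy_put_le32(out + 4, h->msg_len);
}

/* Parse the header back from its wire representation. */
static void toy_seq_hdr_unpack(struct toy_seq_hdr *h,
			       const uint8_t in[TOY_SEQ_HDR_SIZE])
{
	h->msg_id = toy_get_le32(in);
	h->msg_len = toy_get_le32(in + 4);
}
```

Because the header has a fixed size and is tied to the operation type,
the receiver can always read it in one piece, so the id and length are
never split across packets.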

>
> - What to do when the server doesn't support SOCK_SEQPACKET. In the
>   current implementation RST is sent in reply, the same way as when
>   the listening port is not found. I think the current RST is
>   enough, because the case when the server doesn't support
>   SOCK_SEQPACKET is the same as when the listener is missing (i.e.
>   there is no listener in both cases).

I think so, but I'll check more carefully whether this can cause any issues.

Thanks,
Stefano

Arseny Krasnov Jan. 29, 2021, 6:41 a.m. UTC | #3
On 28.01.2021 20:19, Stefano Garzarella wrote:
> Hi Arseny,
> I reviewed a part, tomorrow I hope to finish the other patches.
>
> Just a couple of comments in the TODOs below.
>
> On Mon, Jan 25, 2021 at 02:09:00PM +0300, Arseny Krasnov wrote:
>>     Another way to implement it is to move the metadata of both
>>   SEQ_BEGIN and SEQ_END to the payload. But this approach has a
>>   problem: anything moved to the payload is accounted by the credit
>>   logic, which may fragment the payload, while a payload with the
>>   record length and id must not be fragmented. One way to overcome
>>   this is to ignore credit updates for SEQ_BEGIN/SEQ_END packets.
>>   Another solution is to update the 'stream_has_space()' function:
>>   the current implementation returns non-zero when at least 1 byte
>>   may be used, but the updated version would take an extra argument,
>>   the needed length. For an 'RW' packet this argument is 1, for
>>   SEQ_BEGIN it is sizeof(record len + record id) and for SEQ_END it
>>   is sizeof(record id).
> Is the payload accounted by credit logic also if hdr.op is not 
> VIRTIO_VSOCK_OP_RW?

Yes, on send any packet with a payload could be fragmented if there is
not enough space at the receiver. On receive, 'fwd_cnt' and 'buf_alloc'
are updated from the header of every packet. Of course, for every case
I've described I can add a check for an 'RW' packet, to exclude the
payload from credit accounting, but that is a bunch of dumb checks.
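
To make the accounting concrete, here is a simplified userspace model
of the credit logic. It is a sketch, not the kernel code: the struct
and function names are hypothetical, only the 'fwd_cnt' and
'buf_alloc' counters come from the discussion, and locking is left
out. 'toy_has_space()' models the "needed length" variant of
'stream_has_space()' from the TODO.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sender-side view of the peer's receive buffer (hypothetical model). */
struct toy_credit {
	uint32_t tx_cnt;		/* payload bytes sent so far */
	uint32_t peer_fwd_cnt;		/* bytes the peer reports as consumed */
	uint32_t peer_buf_alloc;	/* size of the peer's receive buffer */
};

/* Bytes that may still be sent without overrunning the peer's buffer. */
static uint32_t toy_free_space(const struct toy_credit *c)
{
	/* Unsigned subtraction stays correct when the counters wrap. */
	uint32_t in_flight = c->tx_cnt - c->peer_fwd_cnt;

	return in_flight >= c->peer_buf_alloc ? 0 : c->peer_buf_alloc - in_flight;
}

/*
 * Reserve up to 'want' bytes of credit. A smaller return value means the
 * payload has to be fragmented: only that many bytes can go out now.
 */
static uint32_t toy_get_credit(struct toy_credit *c, uint32_t want)
{
	uint32_t got = toy_free_space(c);

	if (got > want)
		got = want;
	c->tx_cnt += got;
	return got;
}

/* Every received packet header refreshes the peer's counters. */
static void toy_update_from_hdr(struct toy_credit *c,
				uint32_t fwd_cnt, uint32_t buf_alloc)
{
	c->peer_fwd_cnt = fwd_cnt;
	c->peer_buf_alloc = buf_alloc;
}

/* True only if 'needed' bytes fit as a whole, so no fragmentation occurs. */
static bool toy_has_space(const struct toy_credit *c, uint32_t needed)
{
	return toy_free_space(c) >= needed;
}
```

With 'needed' set to 1 this behaves like the existing check for 'RW'
packets, while a marker whose metadata lives in the payload would pass
the full metadata size.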

>
> I think that we can define a specific header to put after the 
> virtio_vsock_hdr when hdr.op is SEQ_BEGIN or SEQ_END, and in this header 
> we can store the id and the length of the message.

I think that is better than using the payload and touching the credit logic.

Stefano Garzarella Jan. 29, 2021, 9:26 a.m. UTC | #4
On Fri, Jan 29, 2021 at 09:41:50AM +0300, Arseny Krasnov wrote:
>
>On 28.01.2021 20:19, Stefano Garzarella wrote:
>> I think that we can define a specific header to put after the
>> virtio_vsock_hdr when hdr.op is SEQ_BEGIN or SEQ_END, and in this header
>> we can store the id and the length of the message.
>
>I think that is better than using the payload and touching the credit logic.
>

Cool, so let's try this option, hoping there aren't a lot of issues.

Another TODO item could be adding SOCK_SEQPACKET support to
vsock_loopback as well. It should be simple, since it also uses the
virtio_transport_common APIs, and it can be useful for testing and
debugging.

Thanks,
Stefano
Stefano Garzarella Feb. 1, 2021, 11:02 a.m. UTC | #5
On Fri, Jan 29, 2021 at 06:52:23PM +0300, Arseny Krasnov wrote:
>
>On 29.01.2021 12:26, Stefano Garzarella wrote:
>> On Fri, Jan 29, 2021 at 09:41:50AM +0300, Arseny Krasnov wrote:
>>> On 28.01.2021 20:19, Stefano Garzarella wrote:
>>>> Is the payload accounted by credit logic also if hdr.op is not
>>>> VIRTIO_VSOCK_OP_RW?
>>> Yes, on send any packet with payload could be fragmented if
>>>
>>> there is not enough space at receiver. On receive 'fwd_cnt' and
>>>
>>> 'buf_alloc' are updated with header of every packet. Of course,
>>>
>>> to every such case i've described i can add check for 'RW'
>>>
>>> packet, to exclude payload from credit accounting, but this is
>>>
>>> bunch of dumb checks.
>>>
>>>> I think that we can define a specific header to put after the
>>>> virtio_vsock_hdr when hdr.op is SEQ_BEGIN or SEQ_END, and in this header
>>>> we can store the id and the length of the message.
>>> I think it is better than use payload and touch credit logic
>>>
>> Cool, so let's try this option, hoping there aren't a lot of issues.
>
>If i understand, current implementation has 'struct virtio_vsock_hdr',
>
>then i'll add 'struct virtio_vsock_hdr_seq' with message length and id.
>
>After that, in 'struct virtio_vsock_pkt' which describes packet, field for
>
>header(which is 'struct virtio_vsock_hdr') must be replaced with new
>
>structure which  contains both 'struct virtio_vsock_hdr' and 'struct
>
>virtio_vsock_hdr_seq', because header field of 'struct virtio_vsock_pkt'
>
>is buffer for virtio layer. After it all accesses to header(for example to
>
>'buf_alloc' field will go accross new  structure with both headers:
>
>pkt->hdr.buf_alloc   ->   pkt->extended_hdr.classic_hdr.buf_alloc
>
>May be to avoid this, packet's header could be allocated dynamically
>
>in the same manner as packet's buffer? Size of allocation is always
>
>sizeof(classic header) + sizeof(seq header). In 'struct virtio_vsock_pkt'
>
>such header will be implemented as union of two pointers: class header
>
>and extended header containing classic and seq header. Which pointer
>
>to use is depends on packet's op.

I think that the 'classic header' can stay as is, and the extended 
header can be dynamically allocated, as we do for the payload.

But we have to be careful what happens if the other peer doesn't support 
SEQPACKET and if it counts this extra header as a payload for the credit 
mechanism.
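
A minimal user-space sketch of that layout, for illustration only. It is not the kernel code: `struct pkt_sketch`, `pkt_alloc_sketch()`, the reduced header and the numeric op values are all hypothetical. Only the idea comes from this thread: the classic header stays embedded in the packet, and the extra header is allocated dynamically for SEQ_BEGIN/SEQ_END.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical op values for this sketch; not the real UAPI numbers. */
enum sketch_op {
	SKETCH_OP_RW = 1,
	SKETCH_OP_SEQ_BEGIN = 2,
	SKETCH_OP_SEQ_END = 3,
};

/* Reduced stand-in for the classic header: only the fields used here. */
struct classic_hdr_sketch {
	uint16_t op;
	uint32_t len;		/* payload length seen by the credit logic */
	uint32_t buf_alloc;
	uint32_t fwd_cnt;
};

/* Extra header as discussed in the thread: message length and id. */
struct seq_hdr_sketch {
	uint32_t msg_len;
	uint32_t msg_id;
};

struct pkt_sketch {
	/* Stays embedded, so accesses such as pkt->hdr.buf_alloc do not change. */
	struct classic_hdr_sketch hdr;
	/* NULL unless op is SEQ_BEGIN or SEQ_END. */
	struct seq_hdr_sketch *seq_hdr;
};

/* Allocate a packet; the extra header is allocated only for SEQ ops. */
static struct pkt_sketch *pkt_alloc_sketch(uint16_t op)
{
	struct pkt_sketch *pkt = calloc(1, sizeof(*pkt));

	if (!pkt)
		return NULL;

	pkt->hdr.op = op;
	if (op == SKETCH_OP_SEQ_BEGIN || op == SKETCH_OP_SEQ_END) {
		pkt->seq_hdr = calloc(1, sizeof(*pkt->seq_hdr));
		if (!pkt->seq_hdr) {
			free(pkt);
			return NULL;
		}
	}
	return pkt;
}

/* Free a packet together with its optional extra header. */
static void pkt_free_sketch(struct pkt_sketch *pkt)
{
	if (!pkt)
		return;
	free(pkt->seq_hdr);
	free(pkt);
}
```

With this shape an 'RW' packet costs nothing extra, and existing code that reads the classic header is untouched. Whether a peer without SEQPACKET support would count the extra bytes as payload is a separate question that the sketch does not answer.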

I'll try to take a closer look in the next few days.

Thanks,
Stefano
Arseny Krasnov Feb. 1, 2021, 1:57 p.m. UTC | #6
On 01.02.2021 14:02, Stefano Garzarella wrote:
> On Fri, Jan 29, 2021 at 06:52:23PM +0300, Arseny Krasnov wrote:
>> On 29.01.2021 12:26, Stefano Garzarella wrote:
>>> On Fri, Jan 29, 2021 at 09:41:50AM +0300, Arseny Krasnov wrote:
>>>> On 28.01.2021 20:19, Stefano Garzarella wrote:
>>>>> Hi Arseny,
>>>>> I reviewed a part, tomorrow I hope to finish the other patches.
>>>>>
>>>>> Just a couple of comments in the TODOs below.
>>>>>
>>>>> On Mon, Jan 25, 2021 at 02:09:00PM +0300, Arseny Krasnov wrote:
>>>>>> 	This patchset impelements support of SOCK_SEQPACKET for virtio
>>>>>> transport.
>>>>>> 	As SOCK_SEQPACKET guarantees to save record boundaries, so to
>>>>>> do it, new packet operation was added: it marks start of record (with
>>>>>> record length in header), such packet doesn't carry any data.  To send
>>>>>> record, packet with start marker is sent first, then all data is sent
>>>>>> as usual 'RW' packets. On receiver's side, length of record is known
>>>>>> from packet with start record marker. Now as  packets of one socket
>>>>>> are not reordered neither on vsock nor on vhost transport layers, such
>>>>>> marker allows to restore original record on receiver's side. If user's
>>>>>> buffer is smaller that record length, when all out of size data is
>>>>>> dropped.
>>>>>> 	Maximum length of datagram is not limited as in stream socket,
>>>>>> because same credit logic is used. Difference with stream socket is
>>>>>> that user is not woken up until whole record is received or error
>>>>>> occurred. Implementation also supports 'MSG_EOR' and 'MSG_TRUNC' flags.
>>>>>> 	Tests also implemented.
>>>>>>
>>>>>> Arseny Krasnov (13):
>>>>>>  af_vsock: prepare for SOCK_SEQPACKET support
>>>>>>  af_vsock: prepare 'vsock_connectible_recvmsg()'
>>>>>>  af_vsock: implement SEQPACKET rx loop
>>>>>>  af_vsock: implement send logic for SOCK_SEQPACKET
>>>>>>  af_vsock: rest of SEQPACKET support
>>>>>>  af_vsock: update comments for stream sockets
>>>>>>  virtio/vsock: dequeue callback for SOCK_SEQPACKET
>>>>>>  virtio/vsock: fetch length for SEQPACKET record
>>>>>>  virtio/vsock: add SEQPACKET receive logic
>>>>>>  virtio/vsock: rest of SOCK_SEQPACKET support
>>>>>>  virtio/vsock: setup SEQPACKET ops for transport
>>>>>>  vhost/vsock: setup SEQPACKET ops for transport
>>>>>>  vsock_test: add SOCK_SEQPACKET tests
>>>>>>
>>>>>> drivers/vhost/vsock.c                   |   7 +-
>>>>>> include/linux/virtio_vsock.h            |  12 +
>>>>>> include/net/af_vsock.h                  |   6 +
>>>>>> include/uapi/linux/virtio_vsock.h       |   9 +
>>>>>> net/vmw_vsock/af_vsock.c                | 543 ++++++++++++++++------
>>>>>> net/vmw_vsock/virtio_transport.c        |   4 +
>>>>>> net/vmw_vsock/virtio_transport_common.c | 295 ++++++++++--
>>>>>> tools/testing/vsock/util.c              |  32 +-
>>>>>> tools/testing/vsock/util.h              |   3 +
>>>>>> tools/testing/vsock/vsock_test.c        | 126 +++++
>>>>>> 10 files changed, 862 insertions(+), 175 deletions(-)
>>>>>>
>>>>>> TODO:
>>>>>> - Support for record integrity control. As transport could drop some
>>>>>>   packets, something like "record-id" and record end marker need to
>>>>>>   be implemented. Idea is that SEQ_BEGIN packet carries both record
>>>>>>   length and record id, end marker(let it be SEQ_END) carries only
>>>>>>   record id. To be sure that no one packet was lost, receiver checks
>>>>>>   length of data between SEQ_BEGIN and SEQ_END(it must be same with
>>>>>>   value in SEQ_BEGIN) and record ids of SEQ_BEGIN and SEQ_END(this
>>>>>>   means that both markers were not dropped. I think that easiest way
>>>>>>   to implement record id for SEQ_BEGIN is to reuse another field of
>>>>>>   packet header(SEQ_BEGIN already uses 'flags' as record length).For
>>>>>>   SEQ_END record id could be stored in 'flags'.
>>>>> I don't really like the idea of reusing the 'flags' field for this
>>>>> purpose.
>>>>>
>>>>>>     Another way to implement it, is to move metadata of both SEQ_END
>>>>>>   and SEQ_BEGIN to payload. But this approach has problem, because
>>>>>>   if we move something to payload, such payload is accounted by
>>>>>>   credit logic, which fragments payload, while payload with record
>>>>>>   length and id couldn't be fragmented. One way to overcome it is to
>>>>>>   ignore credit update for SEQ_BEGIN/SEQ_END packet.Another solution
>>>>>>   is to update 'stream_has_space()' function: current implementation
>>>>>>   return non-zero when at least 1 byte is allowed to use,but updated
>>>>>>   version will have extra argument, which is needed length. For 'RW'
>>>>>>   packet this argument is 1, for SEQ_BEGIN it is sizeof(record len +
>>>>>>   record id) and for SEQ_END it is sizeof(record id).
>>>>> Is the payload accounted by credit logic also if hdr.op is not
>>>>> VIRTIO_VSOCK_OP_RW?
>>>> Yes, on send any packet with payload could be fragmented if
>>>>
>>>> there is not enough space at receiver. On receive 'fwd_cnt' and
>>>>
>>>> 'buf_alloc' are updated with header of every packet. Of course,
>>>>
>>>> to every such case i've described i can add check for 'RW'
>>>>
>>>> packet, to exclude payload from credit accounting, but this is
>>>>
>>>> bunch of dumb checks.
>>>>
>>>>> I think that we can define a specific header to put after the
>>>>> virtio_vsock_hdr when hdr.op is SEQ_BEGIN or SEQ_END, and in this header
>>>>> we can store the id and the length of the message.
>>>> I think it is better than use payload and touch credit logic
>>>>
>>> Cool, so let's try this option, hoping there aren't a lot of issues.
>> If i understand, current implementation has 'struct virtio_vsock_hdr',
>>
>> then i'll add 'struct virtio_vsock_hdr_seq' with message length and id.
>>
>> After that, in 'struct virtio_vsock_pkt' which describes packet, field for
>>
>> header(which is 'struct virtio_vsock_hdr') must be replaced with new
>>
>> structure which  contains both 'struct virtio_vsock_hdr' and 'struct
>>
>> virtio_vsock_hdr_seq', because header field of 'struct virtio_vsock_pkt'
>>
>> is buffer for virtio layer. After it all accesses to header(for example to
>>
>> 'buf_alloc' field will go accross new  structure with both headers:
>>
>> pkt->hdr.buf_alloc   ->   pkt->extended_hdr.classic_hdr.buf_alloc
>>
>> May be to avoid this, packet's header could be allocated dynamically
>>
>> in the same manner as packet's buffer? Size of allocation is always
>>
>> sizeof(classic header) + sizeof(seq header). In 'struct virtio_vsock_pkt'
>>
>> such header will be implemented as union of two pointers: class header
>>
>> and extended header containing classic and seq header. Which pointer
>>
>> to use is depends on packet's op.
> I think that the 'classic header' can stay as is, and the extended 
> header can be dynamically allocated, as we do for the payload.
>
> But we have to be careful what happens if the other peer doesn't support 
> SEQPACKET and if it counts this extra header as a payload for the credit 
> mechanism.

You mean putting the extra header into the payload (the buffer of the
second virtio desc)? In that case, auxiliary 'if's are needed on
send/receive to bypass the credit logic (or the length field in the
header of such packets must be set to 0). But what about placing the
extra header after the classic header, in the buffer of the first virtio
desc? In that case the extra header is not payload and credit works as
is. Or is it critical that the size of the first buffer will no longer
be the same as the size of the classic header?
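
For reference, the sender-side credit accounting this question depends on can be sketched as below. The free-space formula is the one the virtio socket device uses (`peer_buf_alloc - (tx_cnt - peer_fwd_cnt)`). The struct and function names are hypothetical and this is not the kernel implementation. The sketch shows why a packet whose header length is 0 leaves the credit state unchanged, while a larger request is cut down (fragmented) to what the receiver can take.

```c
#include <stdint.h>

/* Hypothetical per-socket credit state, named after the header fields. */
struct credit_sketch {
	uint32_t peer_buf_alloc;	/* receive buffer size advertised by the peer */
	uint32_t peer_fwd_cnt;		/* bytes the peer reports as consumed */
	uint32_t tx_cnt;		/* bytes we have sent so far */
};

/*
 * Reserve up to 'want' bytes of credit and return how many were granted.
 * A request of 0 bytes (hdr.len == 0) is granted 0 and changes nothing.
 */
static uint32_t credit_take_sketch(struct credit_sketch *c, uint32_t want)
{
	uint32_t free_space = c->peer_buf_alloc - (c->tx_cnt - c->peer_fwd_cnt);
	uint32_t granted = want < free_space ? want : free_space;

	c->tx_cnt += granted;
	return granted;
}
```

For example, with `peer_buf_alloc = 100`, `peer_fwd_cnt = 0` and `tx_cnt = 90`, a request for 50 bytes is granted only 10, which is why metadata carried as counted payload could be split.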

>
> I'll try to take a closer look in the next few days.
>
> Thanks,
> Stefano
>
>
Stefano Garzarella Feb. 1, 2021, 2:23 p.m. UTC | #7
On Mon, Feb 01, 2021 at 04:57:18PM +0300, Arseny Krasnov wrote:
>
>On 01.02.2021 14:02, Stefano Garzarella wrote:
>> On Fri, Jan 29, 2021 at 06:52:23PM +0300, Arseny Krasnov wrote:
>>> On 29.01.2021 12:26, Stefano Garzarella wrote:
>>>> On Fri, Jan 29, 2021 at 09:41:50AM +0300, Arseny Krasnov wrote:
>>>>> On 28.01.2021 20:19, Stefano Garzarella wrote:
>>>>>> Hi Arseny,
>>>>>> I reviewed a part, tomorrow I hope to finish the other patches.
>>>>>>
>>>>>> Just a couple of comments in the TODOs below.
>>>>>>
>>>>>> On Mon, Jan 25, 2021 at 02:09:00PM +0300, Arseny Krasnov wrote:
>>>>>>> 	This patchset impelements support of SOCK_SEQPACKET for virtio
>>>>>>> transport.
>>>>>>> 	As SOCK_SEQPACKET guarantees to save record boundaries, so to
>>>>>>> do it, new packet operation was added: it marks start of record (with
>>>>>>> record length in header), such packet doesn't carry any data.  To send
>>>>>>> record, packet with start marker is sent first, then all data is sent
>>>>>>> as usual 'RW' packets. On receiver's side, length of record is known
>>>>>>> from packet with start record marker. Now as  packets of one socket
>>>>>>> are not reordered neither on vsock nor on vhost transport layers, such
>>>>>>> marker allows to restore original record on receiver's side. If user's
>>>>>>> buffer is smaller that record length, when all out of size data is
>>>>>>> dropped.
>>>>>>> 	Maximum length of datagram is not limited as in stream socket,
>>>>>>> because same credit logic is used. Difference with stream socket is
>>>>>>> that user is not woken up until whole record is received or error
>>>>>>> occurred. Implementation also supports 'MSG_EOR' and 'MSG_TRUNC' flags.
>>>>>>> 	Tests also implemented.
>>>>>>>
>>>>>>> Arseny Krasnov (13):
>>>>>>>  af_vsock: prepare for SOCK_SEQPACKET support
>>>>>>>  af_vsock: prepare 'vsock_connectible_recvmsg()'
>>>>>>>  af_vsock: implement SEQPACKET rx loop
>>>>>>>  af_vsock: implement send logic for SOCK_SEQPACKET
>>>>>>>  af_vsock: rest of SEQPACKET support
>>>>>>>  af_vsock: update comments for stream sockets
>>>>>>>  virtio/vsock: dequeue callback for SOCK_SEQPACKET
>>>>>>>  virtio/vsock: fetch length for SEQPACKET record
>>>>>>>  virtio/vsock: add SEQPACKET receive logic
>>>>>>>  virtio/vsock: rest of SOCK_SEQPACKET support
>>>>>>>  virtio/vsock: setup SEQPACKET ops for transport
>>>>>>>  vhost/vsock: setup SEQPACKET ops for transport
>>>>>>>  vsock_test: add SOCK_SEQPACKET tests
>>>>>>>
>>>>>>> drivers/vhost/vsock.c                   |   7 +-
>>>>>>> include/linux/virtio_vsock.h            |  12 +
>>>>>>> include/net/af_vsock.h                  |   6 +
>>>>>>> include/uapi/linux/virtio_vsock.h       |   9 +
>>>>>>> net/vmw_vsock/af_vsock.c                | 543 ++++++++++++++++------
>>>>>>> net/vmw_vsock/virtio_transport.c        |   4 +
>>>>>>> net/vmw_vsock/virtio_transport_common.c | 295 ++++++++++--
>>>>>>> tools/testing/vsock/util.c              |  32 +-
>>>>>>> tools/testing/vsock/util.h              |   3 +
>>>>>>> tools/testing/vsock/vsock_test.c        | 126 +++++
>>>>>>> 10 files changed, 862 insertions(+), 175 deletions(-)
>>>>>>>
>>>>>>> TODO:
>>>>>>> - Support for record integrity control. As transport could drop some
>>>>>>>   packets, something like "record-id" and record end marker need to
>>>>>>>   be implemented. Idea is that SEQ_BEGIN packet carries both record
>>>>>>>   length and record id, end marker(let it be SEQ_END) carries only
>>>>>>>   record id. To be sure that no one packet was lost, receiver checks
>>>>>>>   length of data between SEQ_BEGIN and SEQ_END(it must be same with
>>>>>>>   value in SEQ_BEGIN) and record ids of SEQ_BEGIN and SEQ_END(this
>>>>>>>   means that both markers were not dropped. I think that easiest way
>>>>>>>   to implement record id for SEQ_BEGIN is to reuse another field of
>>>>>>>   packet header(SEQ_BEGIN already uses 'flags' as record length).For
>>>>>>>   SEQ_END record id could be stored in 'flags'.
>>>>>> I don't really like the idea of reusing the 'flags' field for this
>>>>>> purpose.
>>>>>>
>>>>>>>     Another way to implement it, is to move metadata of both SEQ_END
>>>>>>>   and SEQ_BEGIN to payload. But this approach has problem, because
>>>>>>>   if we move something to payload, such payload is accounted by
>>>>>>>   credit logic, which fragments payload, while payload with record
>>>>>>>   length and id couldn't be fragmented. One way to overcome it is to
>>>>>>>   ignore credit update for SEQ_BEGIN/SEQ_END packet.Another solution
>>>>>>>   is to update 'stream_has_space()' function: current implementation
>>>>>>>   return non-zero when at least 1 byte is allowed to use,but updated
>>>>>>>   version will have extra argument, which is needed length. For 'RW'
>>>>>>>   packet this argument is 1, for SEQ_BEGIN it is sizeof(record len +
>>>>>>>   record id) and for SEQ_END it is sizeof(record id).
>>>>>> Is the payload accounted by credit logic also if hdr.op is not
>>>>>> VIRTIO_VSOCK_OP_RW?
>>>>> Yes, on send any packet with payload could be fragmented if
>>>>>
>>>>> there is not enough space at receiver. On receive 'fwd_cnt' and
>>>>>
>>>>> 'buf_alloc' are updated with header of every packet. Of course,
>>>>>
>>>>> to every such case i've described i can add check for 'RW'
>>>>>
>>>>> packet, to exclude payload from credit accounting, but this is
>>>>>
>>>>> bunch of dumb checks.
>>>>>
>>>>>> I think that we can define a specific header to put after the
>>>>>> virtio_vsock_hdr when hdr.op is SEQ_BEGIN or SEQ_END, and in this header
>>>>>> we can store the id and the length of the message.
>>>>> I think it is better than use payload and touch credit logic
>>>>>
>>>> Cool, so let's try this option, hoping there aren't a lot of issues.
>>> If i understand, current implementation has 'struct 
>>> virtio_vsock_hdr',
>>>
>>> then i'll add 'struct virtio_vsock_hdr_seq' with message length and id.
>>>
>>> After that, in 'struct virtio_vsock_pkt' which describes packet, field for
>>>
>>> header(which is 'struct virtio_vsock_hdr') must be replaced with new
>>>
>>> structure which  contains both 'struct virtio_vsock_hdr' and 'struct
>>>
>>> virtio_vsock_hdr_seq', because header field of 'struct virtio_vsock_pkt'
>>>
>>> is buffer for virtio layer. After it all accesses to header(for example to
>>>
>>> 'buf_alloc' field will go accross new  structure with both headers:
>>>
>>> pkt->hdr.buf_alloc   ->   pkt->extended_hdr.classic_hdr.buf_alloc
>>>
>>> May be to avoid this, packet's header could be allocated dynamically
>>>
>>> in the same manner as packet's buffer? Size of allocation is always
>>>
>>> sizeof(classic header) + sizeof(seq header). In 'struct virtio_vsock_pkt'
>>>
>>> such header will be implemented as union of two pointers: class header
>>>
>>> and extended header containing classic and seq header. Which pointer
>>>
>>> to use is depends on packet's op.
>> I think that the 'classic header' can stay as is, and the extended
>> header can be dynamically allocated, as we do for the payload.
>>
>> But we have to be careful what happens if the other peer doesn't support
>> SEQPACKET and if it counts this extra header as a payload for the credit
>> mechanism.
>
>You mean put extra header to payload(buffer of second virtio desc),
>
>in this way on send/receive auxiliary 'if's are needed to avoid credit
>
>logic(or set length field in header of such packets to 0). But what
>
>about placing extra header after classic header in buffer of first virtio
>
>desc? In this case extra header is not payload and credit works as is.
>
>Or it is critical, that size of first buffer will be not same as size of
>
>classic header?

We need to think about compatibility with old drivers.

What would happen in this case?

I think it's easier to use the second buffer, usually used for the 
payload, to carry the extra header. Also, we can leave hdr.len = 0, so 
we are sure that it is not counted in credit mechanism.

If the driver supports SEQPACKET, it knows it must fetch extra header 
when it must handle SEQ_BEGIN/SEQ_END.
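
A rough user-space sketch of this idea, under stated assumptions: `struct wire_pkt_sketch` models the two buffers of one packet, the op values are made up, the reduced header keeps only `op` and `len`, and the little-endian encoding of the real headers is ignored. None of these names exist in the kernel. The extra header travels in the second buffer while the classic header's length stays 0, so nothing is counted by the credit mechanism.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical op values for this sketch; not the real UAPI numbers. */
enum {
	OP_RW_SKETCH = 1,
	OP_SEQ_BEGIN_SKETCH = 2,
	OP_SEQ_END_SKETCH = 3,
};

/* Reduced stand-in for the classic header. */
struct hdr_sketch {
	uint16_t op;
	uint32_t len;	/* payload bytes counted by the credit mechanism */
};

/* Extra header: message length and id. */
struct seq_hdr_sketch {
	uint32_t msg_len;
	uint32_t msg_id;
};

/* First buffer: classic header. Second buffer: normally the payload. */
struct wire_pkt_sketch {
	struct hdr_sketch hdr;
	unsigned char buf[64];
	size_t buf_len;
};

/* Sender: put the extra header in the second buffer, keep hdr.len = 0. */
static void make_seq_pkt_sketch(struct wire_pkt_sketch *pkt, uint16_t op,
				uint32_t msg_len, uint32_t msg_id)
{
	struct seq_hdr_sketch seq = { .msg_len = msg_len, .msg_id = msg_id };

	memset(pkt, 0, sizeof(*pkt));
	pkt->hdr.op = op;
	pkt->hdr.len = 0;	/* extra header is not counted as payload */
	memcpy(pkt->buf, &seq, sizeof(seq));
	pkt->buf_len = sizeof(seq);
}

/* Bytes the credit mechanism would account for this packet. */
static uint32_t credit_bytes_sketch(const struct wire_pkt_sketch *pkt)
{
	return pkt->hdr.len;
}

/*
 * Receiver that supports SEQPACKET: fetch the extra header, but only for
 * SEQ_BEGIN/SEQ_END. Returns 0 on success, -1 otherwise.
 */
static int fetch_seq_hdr_sketch(const struct wire_pkt_sketch *pkt,
				struct seq_hdr_sketch *out)
{
	if (pkt->hdr.op != OP_SEQ_BEGIN_SKETCH &&
	    pkt->hdr.op != OP_SEQ_END_SKETCH)
		return -1;
	if (pkt->buf_len < sizeof(*out))
		return -1;

	memcpy(out, pkt->buf, sizeof(*out));
	return 0;
}
```

The sketch does not model an old driver that receives an unknown op; how such a peer reacts is exactly the compatibility question raised above.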

If it is not clear, I'll try to provide a simple PoC of a patch.

Thanks,
Stefano
Arseny Krasnov Feb. 1, 2021, 2:32 p.m. UTC | #8
On 01.02.2021 17:23, Stefano Garzarella wrote:
> On Mon, Feb 01, 2021 at 04:57:18PM +0300, Arseny Krasnov wrote:
>> On 01.02.2021 14:02, Stefano Garzarella wrote:
>>> On Fri, Jan 29, 2021 at 06:52:23PM +0300, Arseny Krasnov wrote:
>>>> On 29.01.2021 12:26, Stefano Garzarella wrote:
>>>>> On Fri, Jan 29, 2021 at 09:41:50AM +0300, Arseny Krasnov wrote:
>>>>>> On 28.01.2021 20:19, Stefano Garzarella wrote:
>>>>>>> Hi Arseny,
>>>>>>> I reviewed a part, tomorrow I hope to finish the other patches.
>>>>>>>
>>>>>>> Just a couple of comments in the TODOs below.
>>>>>>>
>>>>>>> On Mon, Jan 25, 2021 at 02:09:00PM +0300, Arseny Krasnov wrote:
>>>>>>>> 	This patchset impelements support of SOCK_SEQPACKET for virtio
>>>>>>>> transport.
>>>>>>>> 	As SOCK_SEQPACKET guarantees to save record boundaries, so to
>>>>>>>> do it, new packet operation was added: it marks start of record (with
>>>>>>>> record length in header), such packet doesn't carry any data.  To send
>>>>>>>> record, packet with start marker is sent first, then all data is sent
>>>>>>>> as usual 'RW' packets. On receiver's side, length of record is known
>>>>>>>> from packet with start record marker. Now as  packets of one socket
>>>>>>>> are not reordered neither on vsock nor on vhost transport layers, such
>>>>>>>> marker allows to restore original record on receiver's side. If user's
>>>>>>>> buffer is smaller that record length, when all out of size data is
>>>>>>>> dropped.
>>>>>>>> 	Maximum length of datagram is not limited as in stream socket,
>>>>>>>> because same credit logic is used. Difference with stream socket is
>>>>>>>> that user is not woken up until whole record is received or error
>>>>>>>> occurred. Implementation also supports 'MSG_EOR' and 'MSG_TRUNC' flags.
>>>>>>>> 	Tests also implemented.
>>>>>>>>
>>>>>>>> Arseny Krasnov (13):
>>>>>>>>  af_vsock: prepare for SOCK_SEQPACKET support
>>>>>>>>  af_vsock: prepare 'vsock_connectible_recvmsg()'
>>>>>>>>  af_vsock: implement SEQPACKET rx loop
>>>>>>>>  af_vsock: implement send logic for SOCK_SEQPACKET
>>>>>>>>  af_vsock: rest of SEQPACKET support
>>>>>>>>  af_vsock: update comments for stream sockets
>>>>>>>>  virtio/vsock: dequeue callback for SOCK_SEQPACKET
>>>>>>>>  virtio/vsock: fetch length for SEQPACKET record
>>>>>>>>  virtio/vsock: add SEQPACKET receive logic
>>>>>>>>  virtio/vsock: rest of SOCK_SEQPACKET support
>>>>>>>>  virtio/vsock: setup SEQPACKET ops for transport
>>>>>>>>  vhost/vsock: setup SEQPACKET ops for transport
>>>>>>>>  vsock_test: add SOCK_SEQPACKET tests
>>>>>>>>
>>>>>>>> drivers/vhost/vsock.c                   |   7 +-
>>>>>>>> include/linux/virtio_vsock.h            |  12 +
>>>>>>>> include/net/af_vsock.h                  |   6 +
>>>>>>>> include/uapi/linux/virtio_vsock.h       |   9 +
>>>>>>>> net/vmw_vsock/af_vsock.c                | 543 ++++++++++++++++------
>>>>>>>> net/vmw_vsock/virtio_transport.c        |   4 +
>>>>>>>> net/vmw_vsock/virtio_transport_common.c | 295 ++++++++++--
>>>>>>>> tools/testing/vsock/util.c              |  32 +-
>>>>>>>> tools/testing/vsock/util.h              |   3 +
>>>>>>>> tools/testing/vsock/vsock_test.c        | 126 +++++
>>>>>>>> 10 files changed, 862 insertions(+), 175 deletions(-)
>>>>>>>>
>>>>>>>> TODO:
>>>>>>>> - Support for record integrity control. As transport could drop some
>>>>>>>>   packets, something like "record-id" and record end marker need to
>>>>>>>>   be implemented. Idea is that SEQ_BEGIN packet carries both record
>>>>>>>>   length and record id, end marker(let it be SEQ_END) carries only
>>>>>>>>   record id. To be sure that no one packet was lost, receiver checks
>>>>>>>>   length of data between SEQ_BEGIN and SEQ_END(it must be same with
>>>>>>>>   value in SEQ_BEGIN) and record ids of SEQ_BEGIN and SEQ_END(this
>>>>>>>>   means that both markers were not dropped. I think that easiest way
>>>>>>>>   to implement record id for SEQ_BEGIN is to reuse another field of
>>>>>>>>   packet header(SEQ_BEGIN already uses 'flags' as record length).For
>>>>>>>>   SEQ_END record id could be stored in 'flags'.
>>>>>>> I don't really like the idea of reusing the 'flags' field for this
>>>>>>> purpose.
>>>>>>>
>>>>>>>>     Another way to implement it, is to move metadata of both SEQ_END
>>>>>>>>   and SEQ_BEGIN to payload. But this approach has problem, because
>>>>>>>>   if we move something to payload, such payload is accounted by
>>>>>>>>   credit logic, which fragments payload, while payload with record
>>>>>>>>   length and id couldn't be fragmented. One way to overcome it is to
>>>>>>>>   ignore credit update for SEQ_BEGIN/SEQ_END packet.Another solution
>>>>>>>>   is to update 'stream_has_space()' function: current implementation
>>>>>>>>   return non-zero when at least 1 byte is allowed to use,but updated
>>>>>>>>   version will have extra argument, which is needed length. For 'RW'
>>>>>>>>   packet this argument is 1, for SEQ_BEGIN it is sizeof(record len +
>>>>>>>>   record id) and for SEQ_END it is sizeof(record id).
>>>>>>> Is the payload accounted by credit logic also if hdr.op is not
>>>>>>> VIRTIO_VSOCK_OP_RW?
>>>>>> Yes, on send any packet with payload could be fragmented if
>>>>>>
>>>>>> there is not enough space at receiver. On receive 'fwd_cnt' and
>>>>>>
>>>>>> 'buf_alloc' are updated with header of every packet. Of course,
>>>>>>
>>>>>> to every such case i've described i can add check for 'RW'
>>>>>>
>>>>>> packet, to exclude payload from credit accounting, but this is
>>>>>>
>>>>>> bunch of dumb checks.
>>>>>>
>>>>>>> I think that we can define a specific header to put after the
>>>>>>> virtio_vsock_hdr when hdr.op is SEQ_BEGIN or SEQ_END, and in this header
>>>>>>> we can store the id and the length of the message.
>>>>>> I think it is better than use payload and touch credit logic
>>>>>>
>>>>> Cool, so let's try this option, hoping there aren't a lot of issues.
>>>> If i understand, current implementation has 'struct 
>>>> virtio_vsock_hdr',
>>>>
>>>> then i'll add 'struct virtio_vsock_hdr_seq' with message length and id.
>>>>
>>>> After that, in 'struct virtio_vsock_pkt' which describes packet, field for
>>>>
>>>> header(which is 'struct virtio_vsock_hdr') must be replaced with new
>>>>
>>>> structure which  contains both 'struct virtio_vsock_hdr' and 'struct
>>>>
>>>> virtio_vsock_hdr_seq', because header field of 'struct virtio_vsock_pkt'
>>>>
>>>> is buffer for virtio layer. After it all accesses to header(for example to
>>>>
>>>> 'buf_alloc' field will go accross new  structure with both headers:
>>>>
>>>> pkt->hdr.buf_alloc   ->   pkt->extended_hdr.classic_hdr.buf_alloc
>>>>
>>>> May be to avoid this, packet's header could be allocated dynamically
>>>>
>>>> in the same manner as packet's buffer? Size of allocation is always
>>>>
>>>> sizeof(classic header) + sizeof(seq header). In 'struct virtio_vsock_pkt'
>>>>
>>>> such header will be implemented as union of two pointers: class header
>>>>
>>>> and extended header containing classic and seq header. Which pointer
>>>>
>>>> to use is depends on packet's op.
>>> I think that the 'classic header' can stay as is, and the extended
>>> header can be dynamically allocated, as we do for the payload.
>>>
>>> But we have to be careful what happens if the other peer doesn't support
>>> SEQPACKET and if it counts this extra header as a payload for the credit
>>> mechanism.
>> You mean put extra header to payload(buffer of second virtio desc),
>>
>> in this way on send/receive auxiliary 'if's are needed to avoid credit
>>
>> logic(or set length field in header of such packets to 0). But what
>>
>> about placing extra header after classic header in buffer of first virtio
>>
>> desc? In this case extra header is not payload and credit works as is.
>>
>> Or it is critical, that size of first buffer will be not same as size of
>>
>> classic header?
> We need to think about compatibility with old drivers.
Yes, compatibility looks like a problem.
>
> What would happen in this case?
>
> I think it's easier to use the second buffer, usually used for the 
> payload, to carry the extra header. Also, we can leave hdr.len = 0, so 
> we are sure that it is not counted in credit mechanism.

OK, that is one of the possible solutions. I just wanted to let you
know: that is the way I'll use in v4.

> If the driver supports SEQPACKET, it knows it must fetch extra header 
> when it must handle SEQ_BEGIN/SEQ_END.
>
> If it is not clear, I'll try to provide a simple PoC of a patch.

No, it is clear to me. I'll implement it in v4 and also take care of
the review comments.

Thank you

>
> Thanks,
> Stefano
>
>
Stefano Garzarella Feb. 1, 2021, 2:34 p.m. UTC | #9
On Mon, Feb 01, 2021 at 05:32:00PM +0300, Arseny Krasnov wrote:
>
>On 01.02.2021 17:23, Stefano Garzarella wrote:
>> On Mon, Feb 01, 2021 at 04:57:18PM +0300, Arseny Krasnov wrote:
>>> On 01.02.2021 14:02, Stefano Garzarella wrote:
>>>> On Fri, Jan 29, 2021 at 06:52:23PM +0300, Arseny Krasnov wrote:
>>>>> On 29.01.2021 12:26, Stefano Garzarella wrote:
>>>>>> On Fri, Jan 29, 2021 at 09:41:50AM +0300, Arseny Krasnov wrote:
>>>>>>> On 28.01.2021 20:19, Stefano Garzarella wrote:
>>>>>>>> Hi Arseny,
>>>>>>>> I reviewed a part, tomorrow I hope to finish the other patches.
>>>>>>>>
>>>>>>>> Just a couple of comments in the TODOs below.
>>>>>>>>
>>>>>>>> On Mon, Jan 25, 2021 at 02:09:00PM +0300, Arseny Krasnov wrote:
>>>>>>>>> 	This patchset impelements support of SOCK_SEQPACKET for virtio
>>>>>>>>> transport.
>>>>>>>>> 	As SOCK_SEQPACKET guarantees to save record boundaries, so to
>>>>>>>>> do it, new packet operation was added: it marks start of record (with
>>>>>>>>> record length in header), such packet doesn't carry any data.  To send
>>>>>>>>> record, packet with start marker is sent first, then all data is sent
>>>>>>>>> as usual 'RW' packets. On receiver's side, length of record is known
>>>>>>>>> from packet with start record marker. Now as  packets of one socket
>>>>>>>>> are not reordered neither on vsock nor on vhost transport layers, such
>>>>>>>>> marker allows to restore original record on receiver's side. If user's
>>>>>>>>> buffer is smaller that record length, when all out of size data is
>>>>>>>>> dropped.
>>>>>>>>> 	Maximum length of datagram is not limited as in stream socket,
>>>>>>>>> because same credit logic is used. Difference with stream socket is
>>>>>>>>> that user is not woken up until whole record is received or error
>>>>>>>>> occurred. Implementation also supports 'MSG_EOR' and 'MSG_TRUNC' flags.
>>>>>>>>> 	Tests also implemented.
>>>>>>>>>
>>>>>>>>> Arseny Krasnov (13):
>>>>>>>>>  af_vsock: prepare for SOCK_SEQPACKET support
>>>>>>>>>  af_vsock: prepare 'vsock_connectible_recvmsg()'
>>>>>>>>>  af_vsock: implement SEQPACKET rx loop
>>>>>>>>>  af_vsock: implement send logic for SOCK_SEQPACKET
>>>>>>>>>  af_vsock: rest of SEQPACKET support
>>>>>>>>>  af_vsock: update comments for stream sockets
>>>>>>>>>  virtio/vsock: dequeue callback for SOCK_SEQPACKET
>>>>>>>>>  virtio/vsock: fetch length for SEQPACKET record
>>>>>>>>>  virtio/vsock: add SEQPACKET receive logic
>>>>>>>>>  virtio/vsock: rest of SOCK_SEQPACKET support
>>>>>>>>>  virtio/vsock: setup SEQPACKET ops for transport
>>>>>>>>>  vhost/vsock: setup SEQPACKET ops for transport
>>>>>>>>>  vsock_test: add SOCK_SEQPACKET tests
>>>>>>>>>
>>>>>>>>> drivers/vhost/vsock.c                   |   7 +-
>>>>>>>>> include/linux/virtio_vsock.h            |  12 +
>>>>>>>>> include/net/af_vsock.h                  |   6 +
>>>>>>>>> include/uapi/linux/virtio_vsock.h       |   9 +
>>>>>>>>> net/vmw_vsock/af_vsock.c                | 543 ++++++++++++++++------
>>>>>>>>> net/vmw_vsock/virtio_transport.c        |   4 +
>>>>>>>>> net/vmw_vsock/virtio_transport_common.c | 295 ++++++++++--
>>>>>>>>> tools/testing/vsock/util.c              |  32 +-
>>>>>>>>> tools/testing/vsock/util.h              |   3 +
>>>>>>>>> tools/testing/vsock/vsock_test.c        | 126 +++++
>>>>>>>>> 10 files changed, 862 insertions(+), 175 deletions(-)
>>>>>>>>>
>>>>>>>>> TODO:
>>>>>>>>> - Support for record integrity control. As the transport could drop
>>>>>>>>>   some packets, something like a "record-id" and a record end marker
>>>>>>>>>   need to be implemented. The idea is that the SEQ_BEGIN packet
>>>>>>>>>   carries both record length and record id, while the end marker
>>>>>>>>>   (let it be SEQ_END) carries only the record id. To be sure that no
>>>>>>>>>   packet was lost, the receiver checks the length of data between
>>>>>>>>>   SEQ_BEGIN and SEQ_END (it must match the value in SEQ_BEGIN) and
>>>>>>>>>   the record ids of SEQ_BEGIN and SEQ_END (a match means that neither
>>>>>>>>>   marker was dropped). I think the easiest way to implement the
>>>>>>>>>   record id for SEQ_BEGIN is to reuse another field of the packet
>>>>>>>>>   header (SEQ_BEGIN already uses 'flags' as record length). For
>>>>>>>>>   SEQ_END the record id could be stored in 'flags'.
>>>>>>>> I don't really like the idea of reusing the 'flags' field for this
>>>>>>>> purpose.
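The receiver-side check described in the TODO above can be sketched as
follows. This is only an illustration under stated assumptions: the
'struct seq_pkt' layout, the 'enum seq_op' values and the function name
'seq_record_intact()' are hypothetical and are not taken from the
patchset; only the SEQ_BEGIN/SEQ_END names come from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet kinds; only SEQ_BEGIN/SEQ_END are named in the thread. */
enum seq_op { SEQ_BEGIN, SEQ_RW, SEQ_END };

/* Hypothetical, simplified view of a received packet. */
struct seq_pkt {
	enum seq_op op;
	uint32_t rec_id;   /* meaningful for SEQ_BEGIN and SEQ_END */
	uint32_t rec_len;  /* meaningful for SEQ_BEGIN only */
	uint32_t data_len; /* meaningful for SEQ_RW only */
};

/*
 * Return true if pkts[0..n) form exactly one intact record:
 * SEQ_BEGIN first, SEQ_END last, matching record ids, and the sum of
 * the data lengths in between equal to the length announced in SEQ_BEGIN.
 */
static bool seq_record_intact(const struct seq_pkt *pkts, size_t n)
{
	uint64_t received = 0;
	size_t i;

	assert(pkts != NULL);

	if (n < 2 || pkts[0].op != SEQ_BEGIN || pkts[n - 1].op != SEQ_END)
		return false;

	/* Different ids mean one of the two markers belongs to another record. */
	if (pkts[0].rec_id != pkts[n - 1].rec_id)
		return false;

	for (i = 1; i + 1 < n; i++) {
		if (pkts[i].op != SEQ_RW)
			return false;
		received += pkts[i].data_len;
	}

	/* A dropped data packet shows up as a length mismatch. */
	return received == pkts[0].rec_len;
}
```

Where the id and length are stored (in 'flags', in the payload or in a
separate header) does not change this check; it only changes how the
fields of 'struct seq_pkt' would be filled in.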
>>>>>>>>
>>>>>>>>>     Another way to implement it is to move the metadata of both
>>>>>>>>>   SEQ_END and SEQ_BEGIN to the payload. But this approach has a
>>>>>>>>>   problem: if we move something to the payload, that payload is
>>>>>>>>>   accounted by the credit logic, which fragments payload, while a
>>>>>>>>>   payload with record length and id can't be fragmented. One way to
>>>>>>>>>   overcome this is to ignore credit update for SEQ_BEGIN/SEQ_END
>>>>>>>>>   packets. Another solution is to update the 'stream_has_space()'
>>>>>>>>>   function: the current implementation returns non-zero when at
>>>>>>>>>   least 1 byte is allowed to be used, but the updated version will
>>>>>>>>>   have an extra argument, which is the needed length. For an 'RW'
>>>>>>>>>   packet this argument is 1, for SEQ_BEGIN it is sizeof(record len +
>>>>>>>>>   record id) and for SEQ_END it is sizeof(record id).
>>>>>>>> Is the payload accounted by credit logic also if hdr.op is not
>>>>>>>> VIRTIO_VSOCK_OP_RW?
>>>>>>> Yes, on send any packet with payload could be fragmented if there is
>>>>>>> not enough space at the receiver. On receive, 'fwd_cnt' and
>>>>>>> 'buf_alloc' are updated from the header of every packet. Of course,
>>>>>>> in every such case I've described I can add a check for 'RW'
>>>>>>> packets, to exclude payload from credit accounting, but this is a
>>>>>>> bunch of dumb checks.
>>>>>>>
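The 'stream_has_space()' variant with a "needed length" argument from the
TODO can be sketched as below. This is a simplified model, not the
kernel code: 'struct credit', 'credit_free()' and 'has_space_for()' are
hypothetical names, and the three counters are assumed to behave like
the peer's 'buf_alloc' and 'fwd_cnt' plus a local count of bytes sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified credit state kept by the sender. */
struct credit {
	uint32_t peer_buf_alloc; /* receive buffer size advertised by the peer */
	uint32_t peer_fwd_cnt;   /* bytes the peer reports as consumed */
	uint32_t tx_cnt;         /* bytes sent so far by this side */
};

/* Bytes that may still be sent without overrunning the peer's buffer. */
static uint32_t credit_free(const struct credit *c)
{
	/* Unsigned subtraction keeps working when the counters wrap. */
	uint32_t in_flight = c->tx_cnt - c->peer_fwd_cnt;

	if (in_flight >= c->peer_buf_alloc)
		return 0;
	return c->peer_buf_alloc - in_flight;
}

/*
 * True if at least 'needed' bytes fit. An 'RW' packet would pass 1
 * (any space is useful, the data can be fragmented); a marker whose
 * metadata must not be fragmented would pass its full size.
 */
static bool has_space_for(const struct credit *c, uint32_t needed)
{
	assert(needed > 0);
	return credit_free(c) >= needed;
}
```

With 4 bytes of credit left, a call with needed = 1 succeeds while a
call with needed = 8 (for example, record length plus record id) does
not, which is the behaviour the TODO asks for.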
>>>>>>>> I think that we can define a specific header to put after the
>>>>>>>> virtio_vsock_hdr when hdr.op is SEQ_BEGIN or SEQ_END, and in this header
>>>>>>>> we can store the id and the length of the message.
>>>>>>> I think it is better than using the payload and touching the credit logic.
>>>>>>>
>>>>>> Cool, so let's try this option, hoping there aren't a lot of issues.
>>>>> If I understand correctly, the current implementation has 'struct
>>>>> virtio_vsock_hdr', and I'll add 'struct virtio_vsock_hdr_seq' with the
>>>>> message length and id. After that, in 'struct virtio_vsock_pkt', which
>>>>> describes a packet, the header field (which is 'struct
>>>>> virtio_vsock_hdr') must be replaced with a new structure which contains
>>>>> both 'struct virtio_vsock_hdr' and 'struct virtio_vsock_hdr_seq',
>>>>> because the header field of 'struct virtio_vsock_pkt' is the buffer for
>>>>> the virtio layer. After that, all accesses to the header (for example
>>>>> to the 'buf_alloc' field) will go through the new structure with both
>>>>> headers:
>>>>>
>>>>> pkt->hdr.buf_alloc   ->   pkt->extended_hdr.classic_hdr.buf_alloc
>>>>>
>>>>> Maybe, to avoid this, the packet's header could be allocated
>>>>> dynamically, in the same manner as the packet's buffer? The size of the
>>>>> allocation is always sizeof(classic header) + sizeof(seq header). In
>>>>> 'struct virtio_vsock_pkt' such a header would be implemented as a union
>>>>> of two pointers: the classic header, and the extended header containing
>>>>> the classic and seq headers. Which pointer to use depends on the
>>>>> packet's op.
>>>> I think that the 'classic header' can stay as is, and the extended
>>>> header can be dynamically allocated, as we do for the payload.
>>>>
>>>> But we have to be careful what happens if the other peer doesn't support
>>>> SEQPACKET and if it counts this extra header as a payload for the credit
>>>> mechanism.
>>> You mean put the extra header in the payload (the buffer of the second
>>> virtio desc)? In that case, auxiliary 'if's are needed on send/receive
>>> to bypass the credit logic (or the length field in the header of such
>>> packets must be set to 0). But what about placing the extra header after
>>> the classic header in the buffer of the first virtio desc? In this case
>>> the extra header is not payload and credit works as is. Or is it
>>> critical that the size of the first buffer would not be the same as the
>>> size of the classic header?
>> We need to think about compatibility with old drivers.
>Yes, compatibility seems to be a problem.
>>
>> What would happen in this case?
>>
>> I think it's easier to use the second buffer, usually used for the
>> payload, to carry the extra header. Also, we can leave hdr.len = 0, so
>> we are sure that it is not counted in credit mechanism.
>
>OK, that is one of the possible solutions. Just to let you know, that is
>the way I'll use in v4.
>
>> If the driver supports SEQPACKET, it knows it must fetch extra header
>> when it must handle SEQ_BEGIN/SEQ_END.
>>
>> If it is not clear, I'll try to provide a simple PoC of a patch.
>
>No, it is clear to me. I'll implement it in v4 and also take care of the
>review comments.

Great! Let me know if any issues we haven't considered come up.

Stefano
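
For reference, the option agreed on above (extra header carried in the
second buffer, with hdr.len left at 0 so that credit accounting ignores
it) can be sketched roughly as below. This is not the v4 patch and not
a PoC from either author: 'struct hdr', 'struct hdr_seq', 'struct pkt',
the helper names and the numeric op values are all hypothetical
simplifications, and the fields are kept in host byte order for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder op values, chosen only for this sketch. */
enum { OP_RW = 5, OP_SEQ_BEGIN = 100, OP_SEQ_END = 101 };

/* Simplified stand-in for the classic header (first buffer). */
struct hdr {
	uint16_t op;
	uint32_t len; /* payload length seen by the credit mechanism */
};

/* Hypothetical extra header, carried in the second (payload) buffer. */
struct hdr_seq {
	uint32_t msg_id;
	uint32_t msg_len;
};

struct pkt {
	struct hdr hdr;        /* first buffer */
	unsigned char buf[64]; /* second buffer */
	uint32_t buf_len;      /* bytes actually placed in buf */
};

/* Build a SEQ_BEGIN/SEQ_END packet: extra header in buf, hdr.len == 0. */
static void pkt_init_seq(struct pkt *p, uint16_t op, uint32_t id, uint32_t len)
{
	struct hdr_seq seq = { id, len };

	assert(op == OP_SEQ_BEGIN || op == OP_SEQ_END);
	memset(p, 0, sizeof(*p));
	p->hdr.op = op;
	p->hdr.len = 0; /* the extra header is not counted as payload */
	memcpy(p->buf, &seq, sizeof(seq));
	p->buf_len = sizeof(seq);
}

/* Credit accounting looks only at hdr.len, so markers cost nothing. */
static uint32_t pkt_credit_bytes(const struct pkt *p)
{
	return p->hdr.len;
}

/* A SEQPACKET-aware receiver fetches the extra header for marker ops only. */
static int pkt_get_seq(const struct pkt *p, struct hdr_seq *out)
{
	if (p->hdr.op != OP_SEQ_BEGIN && p->hdr.op != OP_SEQ_END)
		return -1;
	if (p->buf_len < sizeof(*out))
		return -1;
	memcpy(out, p->buf, sizeof(*out));
	return 0;
}
```

The point of the sketch is the compatibility argument from the thread:
since hdr.len stays 0, a peer that only looks at the classic header
does not count the extra header against the credit.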