Message ID | 20210218053347.1066159-1-arseny.krasnov@kaspersky.com (mailing list archive) |
---|---|
Series | virtio/vsock: introduce SOCK_SEQPACKET support |
Hi Arseny,

On Thu, Feb 18, 2021 at 08:33:44AM +0300, Arseny Krasnov wrote:
> This patchset implements support of SOCK_SEQPACKET for virtio
> transport.
> [...]
> Tests also implemented.

I reviewed the first part (af_vsock.c changes); tomorrow I'll review the rest. That part looks great to me; I only found a few minor issues.

In the meantime, however, I'm having a doubt, especially with regard to transports other than virtio.

Should we hide the begin/end marker sending in the transport?

I mean, should the transport just provide a seqpacket_enqueue() callback? Inside it, the transport would then send the markers. This is because some transports might not need to send markers.

But thinking about it more, they could actually implement stubs for those calls if they don't need to send markers.

So I think for now it's fine, since it allows us to reuse a lot of code, unless someone has an objection.

Thanks,
Stefano
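The two options weighed above can be contrasted in a minimal userspace sketch. Everything in it is hypothetical: `struct seq_transport_ops`, its members, and the helper functions are illustrative names, not the real `struct vsock_transport` from `include/net/af_vsock.h`. In the first option the core brackets the data with begin/end markers itself, and a transport that needs no markers supplies a stub; in the second the core hands the whole record to a transport-provided `seqpacket_enqueue()` callback that sends the markers internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker kinds, modelled on SEQ_BEGIN / SEQ_END. */
enum marker { MARK_BEGIN, MARK_END };

/* Hypothetical transport state that only records what was "sent". */
struct fake_transport {
	int markers_sent;  /* begin/end markers emitted */
	size_t bytes_sent; /* payload bytes emitted */
};

/* Hypothetical ops table (NOT the kernel's struct vsock_transport). */
struct seq_transport_ops {
	/* Option 1: the core calls these around the data. */
	void (*send_marker)(struct fake_transport *t, enum marker m);
	void (*send_data)(struct fake_transport *t, const void *buf, size_t len);
	/* Option 2: the transport hides the markers in its own callback. */
	size_t (*seqpacket_enqueue)(struct fake_transport *t,
				    const void *buf, size_t len);
};

static void counting_marker(struct fake_transport *t, enum marker m)
{
	(void)m;
	t->markers_sent++;
}

/* Stub for a transport that keeps boundaries without any markers. */
static void stub_marker(struct fake_transport *t, enum marker m)
{
	(void)t;
	(void)m;
}

static void plain_data(struct fake_transport *t, const void *buf, size_t len)
{
	(void)buf;
	t->bytes_sent += len;
}

/* Option 2: markers become an internal detail of the transport. */
static size_t marker_enqueue(struct fake_transport *t,
			     const void *buf, size_t len)
{
	counting_marker(t, MARK_BEGIN);
	plain_data(t, buf, len);
	counting_marker(t, MARK_END);
	return len;
}

static const struct seq_transport_ops core_markers_ops = {
	.send_marker = counting_marker,
	.send_data = plain_data,
};

static const struct seq_transport_ops stub_markers_ops = {
	.send_marker = stub_marker,
	.send_data = plain_data,
};

static const struct seq_transport_ops enqueue_ops = {
	.seqpacket_enqueue = marker_enqueue,
};

/* Core send path: prefer the transport callback when one is provided. */
static size_t core_send_record(const struct seq_transport_ops *ops,
			       struct fake_transport *t,
			       const void *buf, size_t len)
{
	assert(ops != NULL && t != NULL);
	if (ops->seqpacket_enqueue)
		return ops->seqpacket_enqueue(t, buf, len);

	ops->send_marker(t, MARK_BEGIN);
	ops->send_data(t, buf, len);
	ops->send_marker(t, MARK_END);
	return len;
}
```

With the callback variant the core does not need to know whether markers exist at all; the stub variant keeps more of the sending logic shared in the core.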
On Mon, Feb 22, 2021 at 03:23:11PM +0100, Stefano Garzarella wrote:
>Hi Arseny,
>
>On Thu, Feb 18, 2021 at 08:33:44AM +0300, Arseny Krasnov wrote:
>> [...]
>
>I reviewed the first part (af_vsock.c changes); tomorrow I'll review
>the rest. That part looks great to me; I only found a few minor issues.

I reviewed the rest of it as well and left a few minor comments, but I think we're well on track.

I'll take a better look at the specification patch tomorrow.

Thanks,
Stefano
On 23.02.2021 17:50, Stefano Garzarella wrote:
> On Mon, Feb 22, 2021 at 03:23:11PM +0100, Stefano Garzarella wrote:
>> [...]
> I reviewed the rest of it as well and left a few minor comments, but I
> think we're well on track.
>
> I'll take a better look at the specification patch tomorrow.

Great, thank you.

>> Should we hide the begin/end marker sending in the transport?
>>
>> I mean, should the transport just provide a seqpacket_enqueue()
>> callback? Inside it, the transport would then send the markers. This is
>> because some transports might not need to send markers.
>> [...]

I thought about that; I'll try to implement it in the next version. Let's see...
On Wed, Feb 24, 2021 at 07:29:25AM +0300, Arseny Krasnov wrote:
>On 23.02.2021 17:50, Stefano Garzarella wrote:
>> [...]
>>> So I think for now it's fine, since it allows us to reuse a lot of
>>> code, unless someone has an objection.
>
>I thought about that; I'll try to implement it in the next version. Let's see...

If you want to discuss it first, write down the idea you want to implement; I wouldn't want to make you do unnecessary work. :-)

Cheers,
Stefano
On 24.02.2021 11:23, Stefano Garzarella wrote:
> On Wed, Feb 24, 2021 at 07:29:25AM +0300, Arseny Krasnov wrote:
>> [...]
>> I thought about that; I'll try to implement it in the next version. Let's see...
> If you want to discuss it first, write down the idea you want to
> implement; I wouldn't want to make you do unnecessary work. :-)

The idea is simple. The iov iterator of the 'struct msghdr' passed to the enqueue callback has two fields: 'iov_offset', the byte offset inside the I/O vector from which the next data must be picked, and 'count', the number of bytes in the I/O vector not yet processed. So in the seqpacket enqueue callback, if 'iov_offset' is 0 I'll send SEQ_BEGIN, and if 'count' is 0 I'll send SEQ_END.
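The rule described above can be sketched in userspace C. `struct fake_iter` is a hypothetical stand-in that exposes only the two fields named in the message; it assumes a single contiguous buffer, so `iov_offset` equals the number of bytes already consumed (the kernel's `struct iov_iter` carries more state). `seqpacket_enqueue_step()` and the one-letter log are also illustrative names: each call sends at most one RW packet, preceded by SEQ_BEGIN on the first call and followed by SEQ_END once `count` reaches zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified iterator over ONE contiguous buffer. */
struct fake_iter {
	size_t iov_offset; /* bytes already consumed */
	size_t count;      /* bytes still to be sent */
};

/* Hypothetical log: 'B' = SEQ_BEGIN, 'R' = RW packet, 'E' = SEQ_END. */
struct tx_log {
	char ops[64];
	size_t n;
};

static void emit(struct tx_log *tx, char op)
{
	assert(tx->n < sizeof(tx->ops));
	tx->ops[tx->n++] = op;
}

/*
 * One enqueue call: send at most max_pkt bytes as an RW packet, and
 * bracket the whole record with SEQ_BEGIN / SEQ_END based only on the
 * iterator state.
 */
static size_t seqpacket_enqueue_step(struct fake_iter *it, size_t max_pkt,
				     struct tx_log *tx)
{
	size_t chunk = it->count < max_pkt ? it->count : max_pkt;

	if (it->iov_offset == 0)
		emit(tx, 'B');  /* nothing consumed yet: record starts */

	if (chunk > 0)
		emit(tx, 'R');  /* ordinary data packet */

	it->iov_offset += chunk;
	it->count -= chunk;

	if (it->count == 0)
		emit(tx, 'E');  /* everything consumed: record ends */

	return chunk;
}
```

For a 10-byte record and a 4-byte packet limit, calling the step until `count` is 0 emits B, R, R, R, E, so a record split across several calls still gets exactly one marker of each kind.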
On Wed, Feb 24, 2021 at 11:28:50AM +0300, Arseny Krasnov wrote:
>On 24.02.2021 11:23, Stefano Garzarella wrote:
>> [...]
>
>The idea is simple. The iov iterator of the 'struct msghdr' passed to
>the enqueue callback has two fields: 'iov_offset', the byte offset
>inside the I/O vector from which the next data must be picked, and
>'count', the number of bytes in the I/O vector not yet processed. So
>in the seqpacket enqueue callback, if 'iov_offset' is 0 I'll send
>SEQ_BEGIN, and if 'count' is 0 I'll send SEQ_END.

Got it, makes sense, and it's definitely more transparent for the vsock core!

Go ahead; maybe add a comment in the vsock core explaining this, so other developers can understand it better if they want to support SEQPACKET in other transports.

Thanks,
Stefano
On 24.02.2021 11:35, Stefano Garzarella wrote:
> On Wed, Feb 24, 2021 at 11:28:50AM +0300, Arseny Krasnov wrote:
>> [...]
> Got it, makes sense, and it's definitely more transparent for the vsock
> core!
> Go ahead; maybe add a comment in the vsock core explaining this, so
> other developers can understand it better if they want to support
> SEQPACKET in other transports.

Ack
This patchset implements support of SOCK_SEQPACKET for virtio transport.

Because SOCK_SEQPACKET guarantees that record boundaries are preserved, two new packet operations were added: the first marks the start of a record and the second marks its end (SEQ_BEGIN and SEQ_END hereafter). Both operations also carry metadata to maintain boundaries and payload integrity. The metadata is introduced by adding a special header with two fields, message count and message length:

```c
struct virtio_vsock_seq_hdr {
	__le32 msg_cnt;
	__le32 msg_len;
} __attribute__((packed));
```

This header is transmitted as the payload of SEQ_BEGIN and SEQ_END packets (the buffer of the second virtio descriptor in the chain), in the same way as data is transmitted in RW packets. The payload was chosen as the buffer for this header to avoid touching the first virtio buffer, which carries the packet header, because someone could check that the size of that buffer is equal to the size of the packet header.

To send a record, the packet with the start marker is sent first (its header contains the length of the record and the counter). Then the counter is incremented and all data is sent as usual 'RW' packets. Finally SEQ_END is sent (it also carries the message counter, which is the counter of SEQ_BEGIN + 1), and after sending SEQ_END the counter is incremented again.

On the receiver's side, the length of the record is known from the packet with the start marker. To check that no packets were dropped by the transport, the counters of two sequential SEQ_BEGIN and SEQ_END packets are checked (the counter of SEQ_END must be greater than the counter of SEQ_BEGIN by 1), and the length of the data between the two markers is compared to the length in the SEQ_BEGIN header.

Since packets of one socket are reordered neither by the vsock nor by the vhost transport layer, such markers allow the original record to be restored on the receiver's side. If the user's buffer is smaller than the record length, all data that does not fit is dropped.

As with stream sockets, the maximum record length is not limited, because the same credit logic is used. The difference from a stream socket is that the user is not woken up until the whole record is received or an error occurs. The implementation also supports the 'MSG_EOR' and 'MSG_TRUNC' flags.

Tests are also implemented.

```
Arseny Krasnov (19):
  af_vsock: update functions for connectible socket
  af_vsock: separate wait data loop
  af_vsock: separate receive data loop
  af_vsock: implement SEQPACKET receive loop
  af_vsock: separate wait space loop
  af_vsock: implement send logic for SEQPACKET
  af_vsock: rest of SEQPACKET support
  af_vsock: update comments for stream sockets
  virtio/vsock: set packet's type in send
  virtio/vsock: simplify credit update function API
  virtio/vsock: dequeue callback for SOCK_SEQPACKET
  virtio/vsock: fetch length for SEQPACKET record
  virtio/vsock: add SEQPACKET receive logic
  virtio/vsock: rest of SOCK_SEQPACKET support
  virtio/vsock: setup SEQPACKET ops for transport
  vhost/vsock: setup SEQPACKET ops for transport
  vsock/loopback: setup SEQPACKET ops for transport
  vsock_test: add SOCK_SEQPACKET tests
  virtio/vsock: update trace event for SEQPACKET

 drivers/vhost/vsock.c                         |   8 +-
 include/linux/virtio_vsock.h                  |  14 +
 include/net/af_vsock.h                        |   9 +
 .../events/vsock_virtio_transport_common.h    |  48 +-
 include/uapi/linux/virtio_vsock.h             |  16 +
 net/vmw_vsock/af_vsock.c                      | 590 +++++++++++------
 net/vmw_vsock/virtio_transport.c              |   5 +
 net/vmw_vsock/virtio_transport_common.c       | 342 ++++++++--
 net/vmw_vsock/vsock_loopback.c                |   5 +
 tools/testing/vsock/util.c                    |  32 +-
 tools/testing/vsock/util.h                    |   3 +
 tools/testing/vsock/vsock_test.c              | 126 ++++
 12 files changed, 951 insertions(+), 247 deletions(-)
```

v4 -> v5:
- patches reorganized:
  1) Setting of the packet's type in 'virtio_transport_send_pkt_info()' is moved to a separate patch.
  2) Simplification of 'virtio_transport_send_credit_update()' is moved to a separate patch, placed before the main virtio/vsock patches.
- style problem fixed
- in 'af_vsock: separate receive data loop', an extra 'release_sock()' is removed
- added trace event fields for SEQPACKET
- in 'af_vsock: separate wait data loop':
  1) removed 'goto out;' from 'vsock_wait_data()'
  2) the comment for an invalid data amount is changed
- in 'af_vsock: rest of SEQPACKET support', the 'new_transport' pointer check is moved after 'try_module_get()'
- in 'af_vsock: update comments for stream sockets', 'connect-oriented' is replaced with 'connection-oriented'
- in 'loopback/vsock: setup SEQPACKET ops for transport', 'loopback/vsock' is replaced with 'vsock/loopback'

v3 -> v4:
- SEQPACKET-specific metadata moved from the packet header to the payload and called 'virtio_vsock_seq_hdr'
- record integrity check:
  1) the SEQ_END operation was added, which marks the end of a record.
  2) both SEQ_BEGIN and SEQ_END carry a counter, which is incremented on every marker sent.
- af_vsock.c: socket operations for STREAM and SEQPACKET call the same functions instead of having their own "gates" that differ only by name: 'vsock_seqpacket/stream_getsockopt()' are now replaced with 'vsock_connectible_getsockopt()'.
- af_vsock.c: the 'seqpacket_dequeue' callback returns an error and a flag indicating that the record is ready. There is no need to return the number of copied bytes, because the case when a record is received successfully is checked at the virtio transport layer, when SEQ_END is processed. The user also doesn't need the number of copied bytes, because 'recv()' on a SEQPACKET socket can return an error, the length of the user's buffer, or the length of the whole record (both lengths are known in af_vsock.c).
- af_vsock.c: both wait loops in af_vsock.c (for data and for space) moved to separate functions, because both are now called from several places.
- af_vsock.c: 'vsock_assign_transport()' checks that the 'new_transport' pointer is not NULL and returns 'ESOCKTNOSUPPORT' instead of 'ENODEV' if it failed to use the transport.
- tools/testing/vsock/vsock_test.c: tests renamed

v2 -> v3:
- patches reorganized: split into preparation and implementation patches
- local variables are declared in "Reverse Christmas tree" order
- virtio_transport_common.c: proper leXX_to_cpu() used to access vsock header fields
- af_vsock.c: 'vsock_connectible_*sockopt()' added as shared code between stream and seqpacket sockets.
- af_vsock.c: loops in '__vsock_*_recvmsg()' refactored.
- af_vsock.c: 'vsock_wait_data()' refactored.

v1 -> v2:
- patches reordered: af_vsock.c-related changes now come before the virtio vsock ones
- patches reorganized: more, smaller patches, where +/- are not mixed
- tests for SOCK_SEQPACKET added
- all commit messages updated
- af_vsock.c: 'vsock_pre_recv_check()' inlined into 'vsock_connectible_recvmsg()'
- af_vsock.c: 'vsock_assign_transport()' returns ENODEV if the transport was not found
- virtio_transport_common.c: transport callback for seqpacket dequeue
- virtio_transport_common.c: simplified 'virtio_transport_recv_connected()'
- virtio_transport_common.c: send reset on socket and packet type mismatch.

Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
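The marker and counter scheme from the cover letter can be made concrete with a minimal userspace sketch. It is not code from the patchset: `struct seq_hdr` mirrors `struct virtio_vsock_seq_hdr` but uses host-endian `uint32_t` instead of `__le32`, and `sender_record()`, `rx_begin()`, `rx_data()` and `rx_end()` are hypothetical names. It assumes in-order delivery (as stated above) and assumes SEQ_END repeats the record length, which the description does not specify; only SEQ_END's counter is checked. Data that does not fit in the user's buffer is counted but dropped, and the `truncated` flag stands in for reporting MSG_TRUNC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct virtio_vsock_seq_hdr; host-endian here for brevity. */
struct seq_hdr {
	uint32_t msg_cnt;
	uint32_t msg_len;
} __attribute__((packed));

/* Sender side: one counter per socket, bumped after every marker. */
struct seq_sender {
	uint32_t msg_cnt;
};

/* Fills the headers carried by SEQ_BEGIN and SEQ_END for one record. */
static void sender_record(struct seq_sender *s, uint32_t len,
			  struct seq_hdr *begin, struct seq_hdr *end)
{
	begin->msg_cnt = s->msg_cnt++; /* SEQ_BEGIN, then increment */
	begin->msg_len = len;
	/* ... 'len' bytes of RW packets would be sent here ... */
	end->msg_cnt = s->msg_cnt++;   /* SEQ_BEGIN + 1, then increment */
	end->msg_len = len;            /* assumption: length repeated */
}

/* Receiver state for the record being reassembled. */
struct seq_receiver {
	bool in_record;
	struct seq_hdr begin;   /* header from SEQ_BEGIN */
	uint32_t received;      /* RW payload bytes seen so far */
	size_t copied;          /* bytes copied to the user buffer */
	bool truncated;         /* would be reported as MSG_TRUNC */
};

static void rx_begin(struct seq_receiver *r, const struct seq_hdr *h)
{
	memset(r, 0, sizeof(*r));
	r->in_record = true;
	r->begin = *h;
}

/* RW packet: copy what fits into the user buffer, drop the rest. */
static void rx_data(struct seq_receiver *r, const char *data, uint32_t len,
		    char *ubuf, size_t ubuf_len)
{
	size_t room = ubuf_len - r->copied;
	size_t n = len < room ? len : room;

	assert(r->in_record);
	memcpy(ubuf + r->copied, data, n);
	r->copied += n;
	r->received += len;
	if (n < len)
		r->truncated = true;
}

/* SEQ_END: true only if both the counter and length checks pass. */
static bool rx_end(struct seq_receiver *r, const struct seq_hdr *h)
{
	bool ok = r->in_record &&
		  h->msg_cnt == r->begin.msg_cnt + 1 &&
		  r->received == r->begin.msg_len;

	r->in_record = false;
	return ok;
}
```

For example, with a 4-byte user buffer a 6-byte record still passes both integrity checks but is flagged as truncated, while a record that loses one of its RW packets fails the length check at SEQ_END.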