[2/3] public/io/netif.h: document control ring and toeplitz hashing

Message ID 1450865195-12883-3-git-send-email-paul.durrant@citrix.com (mailing list archive)
State New, archived

Commit Message

Paul Durrant Dec. 23, 2015, 10:06 a.m. UTC
This patch documents a new shared (variable message length) ring between
frontend and backend that can be used to pass bulk out-of-band data, such
as that required to implement toeplitz hashing in the backend that is
configurable by the frontend.

The patch then goes on to document the messages passed over the control
ring that can be used to configure toeplitz hashing.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
 xen/include/public/io/netif.h | 320 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 320 insertions(+)
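
For readers unfamiliar with the hash the commit message refers to, the following is a minimal sketch of a bitwise Toeplitz hash over an octet buffer. It is not part of the patch; the function name toeplitz_hash and its signature are assumptions made for illustration. For every set bit of the input (most significant bit first) it XORs in the 32-bit window of the key that starts at that bit position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash of 'len' input octets under 'key' (illustrative sketch).
 * The key must hold at least len + 4 octets so that a full 32-bit window
 * exists for every input bit.
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *buf, size_t len)
{
    uint32_t window, hash = 0;
    size_t i;
    int bit;

    assert(key_len >= len + 4);

    /* Initial window: the first 32 bits of the key. */
    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (i = 0; i < len; i++) {
        for (bit = 7; bit >= 0; bit--) {
            if (buf[i] & (1u << bit))
                hash ^= window;
            /* Slide the window left one bit, pulling in the next key bit. */
            window = (window << 1) | ((uint32_t)(key[i + 4] >> bit) & 1u);
        }
    }
    return hash;
}
```

Because the hash is linear over XOR, the hash of an input with two bits set equals the XOR of the hashes of the two single-bit inputs, which is a convenient sanity check for an implementation.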

Comments

Andrew Cooper Dec. 23, 2015, 11:45 a.m. UTC | #1
On 23/12/2015 10:06, Paul Durrant wrote:
> +#define NETIF_CTRL_RING_SIZE 1024
> +
> +struct netif_ctrl_ring {
> +	RING_IDX req_cons;
> +	RING_IDX req_prod;
> +	RING_IDX rsp_cons;
> +	RING_IDX rsp_prod;
> +	uint8_t req[NETIF_CTRL_RING_SIZE];
> +	uint8_t rsp[NETIF_CTRL_RING_SIZE];

To avoid making the same mistake as the xenstore ring, this at the very
minimum needs a defined reset protocol.  It should also at least have a
version number (currently expected to be zero) which is used to
delineate the use of the remaining space in the page.

> +};
> +
> +struct xen_netif_ctrl_msg_hdr {
> +	uint16_t type;
> +	uint16_t len;

These don't match your documentation above.  uint32_t's ?

> +};
> +
> +#define NETIF_CTRL_MSG_ACK                  1
> +#define NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS   2
> +#define NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS   3
> +#define NETIF_CTRL_MSG_SET_TOEPLITZ_KEY     4
> +#define NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING 5

What about 0?  Again learning from the xenstore case, can we define 0 as
explicitly an invalid value, so a page of zeroes doesn't appear to be a
valid sequence of messages.

> +
> +/* Control messages: */
> +
> +/*
> + * NETIF_CTRL_MSG_ACK:
> + *
> + * This is the only valid type of message sent by the backend to the
> + * frontend. It carries a payload of the following format:
> + *
> + *    0     1     2     3     4     5     6     7  octet
> + * +-----+-----+-----+-----+-----+-----+-----+-----+

Can I recommend that all ack packets contain the control type they are
responding to.  In the normal case, it indeed shouldn't be needed, but
if the front and back ever get out of sync, it will make debugging far
easier.

~Andrew
Paul Durrant Dec. 23, 2015, 11:56 a.m. UTC | #2
> -----Original Message-----
> From: Andrew Cooper [mailto:amc96@hermes.cam.ac.uk] On Behalf Of
> Andrew Cooper
> Sent: 23 December 2015 11:45
> To: Paul Durrant; xen-devel@lists.xenproject.org
> Cc: Keir (Xen.org); Ian Campbell; Tim (Xen.org); Ian Jackson; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH 2/3] public/io/netif.h: document control
> ring and toeplitz hashing
> 
> On 23/12/2015 10:06, Paul Durrant wrote:
> > +#define NETIF_CTRL_RING_SIZE 1024
> > +
> > +struct netif_ctrl_ring {
> > +	RING_IDX req_cons;
> > +	RING_IDX req_prod;
> > +	RING_IDX rsp_cons;
> > +	RING_IDX rsp_prod;
> > +	uint8_t req[NETIF_CTRL_RING_SIZE];
> > +	uint8_t rsp[NETIF_CTRL_RING_SIZE];
> 
> To avoid making the same mistake as the xenstore ring, this at the very
> minimum needs a defined reset protocol.  It should also at least have a
> version number (currently expected to be zero) which is used to
> delineate the use of the remaining space in the page.

It doesn't need a reset protocol any more than the rx or tx rings do. xenstore is a special case, because you can't use xenstore to handle (re)connection (as you can in this case) ;-)
Given the boolean feature flag in xenstore then I agree a version number could be useful... or the xenstore flag could be changed into a version number.

> 
> > +};
> > +
> > +struct xen_netif_ctrl_msg_hdr {
> > +	uint16_t type;
> > +	uint16_t len;
> 
> These don't match your documentation above.  uint32_t's ?

Yikes, you're right. They should be uint32_ts.

> 
> > +};
> > +
> > +#define NETIF_CTRL_MSG_ACK                  1
> > +#define NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS   2
> > +#define NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS   3
> > +#define NETIF_CTRL_MSG_SET_TOEPLITZ_KEY     4
> > +#define NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING 5
> 
> What about 0?  Again learning from the xenstore case, can we define 0 as
> explicitly an invalid value, so a page of zeroes doesn't appear to be a
> valid sequence of messages.

I thought 0 being invalid was kind of obvious from the fact I started at 1, but I'll make it explicit.

> 
> > +
> > +/* Control messages: */
> > +
> > +/*
> > + * NETIF_CTRL_MSG_ACK:
> > + *
> > + * This is the only valid type of message sent by the backend to the
> > + * frontend. It carries a payload of the following format:
> > + *
> > + *    0     1     2     3     4     5     6     7  octet
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> 
> Can I recommend that all ack packets contain the control type they are
> responding to.  In the normal case, it indeed shouldn't be needed, but
> if the front and back ever get out of sync, it will make debugging far
> easier.

Yep, that's a good idea.

  Paul

> 
> ~Andrew
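
The three points agreed in this exchange (32-bit header fields, type 0 reserved as invalid, and acks that echo the request type) could look roughly like the sketch below. All names here (ctrl_msg_hdr, ctrl_ack, CTRL_MSG_INVALID and the helper functions) are hypothetical and are not taken from a later revision of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical revised header: 32-bit fields, 8 octets in total. */
struct ctrl_msg_hdr {
    uint32_t type;
    uint32_t size;   /* octets of payload following the header */
};

enum {
    CTRL_MSG_INVALID = 0,   /* reserved: a zeroed page never parses as a message */
    CTRL_MSG_ACK = 1,
    CTRL_MSG_GET_TOEPLITZ_FLAGS = 2,
    CTRL_MSG_SET_TOEPLITZ_FLAGS = 3,
    CTRL_MSG_SET_TOEPLITZ_KEY = 4,
    CTRL_MSG_SET_TOEPLITZ_MAPPING = 5,
    CTRL_MSG_MAX = 5
};

/* Hypothetical ack payload prefix: echoes the type of the request acked. */
struct ctrl_ack {
    uint32_t req_type;
    uint32_t status;
};

/* Returns 1 if 'type' is a request the frontend may send, 0 otherwise. */
static int ctrl_request_type_valid(uint32_t type)
{
    return type > CTRL_MSG_ACK && type <= CTRL_MSG_MAX;
}

/* Returns 1 if the ack answers the outstanding request, 0 if out of sync. */
static int ctrl_ack_matches(const struct ctrl_ack *ack, uint32_t outstanding)
{
    assert(ack != NULL);
    return ack->req_type == outstanding;
}
```

With this shape, a frontend that sees ctrl_ack_matches() return 0 knows immediately that the two ends have lost synchronisation, which is the debugging aid Andrew asks for.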
David Vrabel Dec. 23, 2015, 1:27 p.m. UTC | #3
On 23/12/15 10:06, Paul Durrant wrote:
> This patch documents a new shared (variable message length) ring between
> frontend and backend that can be used to pass bulk out-of-band data, such
> as that required to implement toeplitz hashing in the backend that is
> configurable by the frontend.
> 
> The patch then goes on to document the messages passed over the control
[...]
> ring that can be used to configure toeplitz hashing.
> --- a/xen/include/public/io/netif.h
> +++ b/xen/include/public/io/netif.h
> @@ -151,6 +151,326 @@
>   */
>  
>  /*
> + * Control ring:
[...]
> + *
> + * The layout of the shared page is as follows:
> + *
> + *    0     1     2     3     4     5     6     7  octet
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * |        req_cons       |        req_prod       |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * |        rsp_cons       |        rsp_prod       |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * |                                               |
> + * +                                               +
> + * |                      req[1024]                |
> + *                         .
> + *                         .
> + * |                                               |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * |                                               |
> + * +                                               +
> + * |                      rsp[1024]                |
> + *                         .
> + *                         .
> + * |                                               |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+

You should use the standard ring format and infrastructure.

David
Paul Durrant Jan. 4, 2016, 9:37 a.m. UTC | #4
> -----Original Message-----
> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: 23 December 2015 13:28
> To: Paul Durrant; xen-devel@lists.xenproject.org
> Cc: Keir (Xen.org); Ian Campbell; Tim (Xen.org); Ian Jackson; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH 2/3] public/io/netif.h: document control
> ring and toeplitz hashing
> 
> On 23/12/15 10:06, Paul Durrant wrote:
> > This patch documents a new shared (variable message length) ring
> between
> > frontend and backend that can be used to pass bulk out-of-band data, such
> > as that required to implement toeplitz hashing in the backend that is
> > configurable by the frontend.
> >
> > The patch then goes on to document the messages passed over the control
> [...]
> > ring that can be used to configure toeplitz hashing.
> > --- a/xen/include/public/io/netif.h
> > +++ b/xen/include/public/io/netif.h
> > @@ -151,6 +151,326 @@
> >   */
> >
> >  /*
> > + * Control ring:
> [...]
> > + *
> > + * The layout of the shared page is as follows:
> > + *
> > + *    0     1     2     3     4     5     6     7  octet
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |        req_cons       |        req_prod       |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |        rsp_cons       |        rsp_prod       |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |                                               |
> > + * +                                               +
> > + * |                      req[1024]                |
> > + *                         .
> > + *                         .
> > + * |                                               |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |                                               |
> > + * +                                               +
> > + * |                      rsp[1024]                |
> > + *                         .
> > + *                         .
> > + * |                                               |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> 
> You should use the standard ring format and infrastructure.

Is there one for variable message size rings? I didn't find one. I don't want to use the fixed size balanced ring macros for control messages as fixed size messages really aren't appropriate in this case.

  Paul

> 
> David
David Vrabel Jan. 4, 2016, 10:55 a.m. UTC | #5
On 04/01/16 09:37, Paul Durrant wrote:
>>> + * The layout of the shared page is as follows:
>>> + *
>>> + *    0     1     2     3     4     5     6     7  octet
>>> + * +-----+-----+-----+-----+-----+-----+-----+-----+
>>> + * |        req_cons       |        req_prod       |
>>> + * +-----+-----+-----+-----+-----+-----+-----+-----+
>>> + * |        rsp_cons       |        rsp_prod       |
>>> + * +-----+-----+-----+-----+-----+-----+-----+-----+
>>> + * |                                               |
>>> + * +                                               +
>>> + * |                      req[1024]                |
>>> + *                         .
>>> + *                         .
>>> + * |                                               |
>>> + * +-----+-----+-----+-----+-----+-----+-----+-----+
>>> + * |                                               |
>>> + * +                                               +
>>> + * |                      rsp[1024]                |
>>> + *                         .
>>> + *                         .
>>> + * |                                               |
>>> + * +-----+-----+-----+-----+-----+-----+-----+-----+
>>
>> You should use the standard ring format and infrastructure.
> 
> Is there one for variable message size rings? I didn't find one. I
> don't want to use the fixed size balanced ring macros for control
> messages as fixed size messages really aren't appropriate in this case.

Perhaps union the request/response message types with a uint8_t
pad[1024] and use this as the request/response type?

You can use the standard macros like so (to avoid copying the full 1024
bytes every time):

hdr = RING_GET_REQUEST(...);
switch (READ_ONCE(hdr->type)) {
case FOO:
    {
        struct foo foo;
        RING_COPY_REQUEST(ring, cons, &foo);
        handle_foo(&foo);
    }
    break;
case ...:
    ...
}

David
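
A self-contained sketch of the union-with-padding idea follows. It does not use the real ring macros from public/io/ring.h; the types ctrl_hdr, ctrl_set_flags and ctrl_req and the function ctrl_handle_slot are assumptions for illustration only. The point it shows is that each slot is a fixed 1024 octets while the handler copies only the octets the decoded message type needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CTRL_SLOT_SIZE 1024
#define CTRL_TYPE_SET_FLAGS 3u

struct ctrl_hdr {
    uint32_t type;
    uint32_t size;
};

/* Hypothetical request: header plus a 32-bit flags word. */
struct ctrl_set_flags {
    struct ctrl_hdr hdr;
    uint32_t flags;
};

/* Every ring slot has the same size, whichever message it carries. */
union ctrl_req {
    struct ctrl_hdr hdr;
    struct ctrl_set_flags set_flags;
    uint8_t pad[CTRL_SLOT_SIZE];
};

/*
 * Handle one slot. Returns 0 on success, -1 for an unknown or
 * inconsistent message.
 */
static int ctrl_handle_slot(const union ctrl_req *slot, uint32_t *flags_out)
{
    uint32_t type;

    assert(slot != NULL && flags_out != NULL);

    /* Read the type exactly once (the role READ_ONCE() plays above). */
    type = *(const volatile uint32_t *)&slot->hdr.type;

    switch (type) {
    case CTRL_TYPE_SET_FLAGS: {
        struct ctrl_set_flags req;

        /* Snapshot only the octets this message type needs. */
        memcpy(&req, slot, sizeof(req));
        if (req.hdr.type != type)   /* changed underneath us: reject */
            return -1;
        *flags_out = req.flags;     /* use the private copy only */
        return 0;
    }
    default:
        return -1;
    }
}
```

After the copy, the handler works exclusively on its private snapshot, so later writes to the shared slot by the other end cannot affect the decision already made.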
Paul Durrant Jan. 4, 2016, 11:14 a.m. UTC | #6
> -----Original Message-----
> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: 04 January 2016 10:56
> To: Paul Durrant; David Vrabel; xen-devel@lists.xenproject.org
> Cc: Tim (Xen.org); Keir (Xen.org); Ian Campbell; Jan Beulich; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH 2/3] public/io/netif.h: document control
> ring and toeplitz hashing
> 
> On 04/01/16 09:37, Paul Durrant wrote:
> >>> + * The layout of the shared page is as follows:
> >>> + *
> >>> + *    0     1     2     3     4     5     6     7  octet
> >>> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> >>> + * |        req_cons       |        req_prod       |
> >>> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> >>> + * |        rsp_cons       |        rsp_prod       |
> >>> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> >>> + * |                                               |
> >>> + * +                                               +
> >>> + * |                      req[1024]                |
> >>> + *                         .
> >>> + *                         .
> >>> + * |                                               |
> >>> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> >>> + * |                                               |
> >>> + * +                                               +
> >>> + * |                      rsp[1024]                |
> >>> + *                         .
> >>> + *                         .
> >>> + * |                                               |
> >>> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> >>
> >> You should use the standard ring format and infrastructure.
> >
> > Is there one for variable message size rings? I didn't find one. I
> > don't want to use the fixed size balanced ring macros for control
> > messages as fixed size messages really aren't appropriate in this case.
> 
> Perhaps union the request/response message types with a uint8_t
> pad[1024] and use this as the request/response type?
> 

The problem is that this places a 1k limit on the message size, which is not there in the scheme I'm proposing. I'd rather not bake that limit in if I don't have to.

  Paul

> You can use the standard macros like so (to avoid copying the full 1024
> bytes every time):
> 
> hdr = RING_GET_REQUEST(...);
> switch (READ_ONCE(hdr->type)) {
> case FOO:
>     {
>         struct foo foo;
>         RING_COPY_REQUEST(ring, cons, &foo);
>         handle_foo(&foo);
>     }
>     break;
> case ...:
>     ...
> }
> 
> David
David Vrabel Jan. 4, 2016, 11:18 a.m. UTC | #7
On 04/01/16 11:14, Paul Durrant wrote:
>>>> You should use the standard ring format and infrastructure.
>>>
>>> Is there one for variable message size rings? I didn't find one. I
>>> don't want to use the fixed size balanced ring macros for control
>>> messages as fixed size messages really aren't appropriate in this case.
>>
>> Perhaps union the request/response message types with a uint8_t
>> pad[1024] and use this as the request/response type?
>>
> 
> The problem is that this places a 1k limit on the message size,
> which
> is not there in the scheme I'm proposing. I'd rather not bake that limit
> in if I don't have to.

>>>>> + * |                      req[1024]                |
                                 ^^^^^^^^^
Surely this limits your size to 1024 bytes?

Also if you need bigger messages you can grant those areas separately
and pass a grant ref through the ring, or you can chunk the message to
fit in several requests/responses.

David
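
The chunking alternative mentioned here can be sketched as follows: a long message is split across several fixed-size requests, each carrying its offset and length so the receiver can reassemble and bounds-check it. The struct layout, the 1008-octet data size and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: a 1024-octet slot minus a 16-octet chunk header. */
#define CHUNK_DATA_SIZE 1008u

/* Fixed-size request carrying one piece of a longer message. */
struct ctrl_chunk {
    uint32_t type;      /* message type, repeated in every chunk */
    uint32_t total;     /* total message length in octets */
    uint32_t offset;    /* offset of this chunk within the message */
    uint32_t len;       /* octets of data[] used by this chunk */
    uint8_t data[CHUNK_DATA_SIZE];
};

/* Number of chunks needed for a message of 'total' octets (at least one). */
static uint32_t chunk_count(uint32_t total)
{
    return total ? (total + CHUNK_DATA_SIZE - 1) / CHUNK_DATA_SIZE : 1;
}

/* Fill 'c' with chunk number 'idx' of the message. */
static void chunk_fill(struct ctrl_chunk *c, uint32_t type,
                       const uint8_t *msg, uint32_t total, uint32_t idx)
{
    uint32_t offset = idx * CHUNK_DATA_SIZE;
    uint32_t len;

    assert(idx < chunk_count(total));
    len = total - offset;
    if (len > CHUNK_DATA_SIZE)
        len = CHUNK_DATA_SIZE;

    c->type = type;
    c->total = total;
    c->offset = offset;
    c->len = len;
    memcpy(c->data, msg + offset, len);
}

/*
 * Copy a received chunk into the reassembly buffer after bounds checks.
 * Returns 1 if this chunk ends the message, 0 if more are expected and
 * -1 if the chunk is malformed.
 */
static int chunk_accept(uint8_t *out, uint32_t out_size,
                        const struct ctrl_chunk *c)
{
    if (c->total > out_size || c->len > CHUNK_DATA_SIZE ||
        c->offset > c->total || c->len > c->total - c->offset)
        return -1;
    memcpy(out + c->offset, c->data, c->len);
    return c->offset + c->len == c->total;
}
```

A 4096-octet message needs five such chunks; the receiver only has to provide a buffer for the largest message it is prepared to accept.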
Paul Durrant Jan. 4, 2016, 11:21 a.m. UTC | #8
> -----Original Message-----
> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: 04 January 2016 11:18
> To: Paul Durrant; xen-devel@lists.xenproject.org
> Cc: Tim (Xen.org); Keir (Xen.org); Ian Campbell; Jan Beulich; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH 2/3] public/io/netif.h: document control
> ring and toeplitz hashing
> 
> On 04/01/16 11:14, Paul Durrant wrote:
> >>>> You should use the standard ring format and infrastructure.
> >>>
> >>> Is there one for variable message size rings? I didn't find one. I
> >>> don't want to use the fixed size balanced ring macros for control
> >>> messages as fixed size messages really aren't appropriate in this case.
> >>
> >> Perhaps union the request/response message types with a uint8_t
> >> pad[1024] and use this as the request/response type?
> >>
> >
> > The problem is that this places a 1k limit on the message size,
> > which
> > is not there in the scheme I'm proposing. I'd rather not bake that limit
> > in if I don't have to.
> 
> >>>>> + * |                      req[1024]                |
>                                  ^^^^^^^^^
> Surely this limits your size to 1024 bytes?

No, I've already got prototype code that can pass 4k messages. Nothing in the protocol says the whole message has to fit in the buffer. In fact I've explicitly documented how to handle partial messages.

> 
> Also if you need bigger messages you can grant those areas separately
> and pass a grant ref through the ring, or you can chunk the message to
> fit in several requests/responses.

Why go to that trouble to wedge a square peg into a round hole? What is the fundamental problem with what I've proposed that you want to avoid?

  Paul

> 
> David
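
To make the partial-message argument concrete, here is a minimal sketch of a one-direction byte ring with free-running indices, in the spirit of the scheme the patch documents. The names byte_ring, ring_write and ring_read are assumptions, and the memory barriers and event-channel notifications that a real shared-memory implementation needs are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 1024u

/* Hypothetical one-direction byte ring with free-running indices. */
struct byte_ring {
    uint32_t cons;
    uint32_t prod;
    uint8_t buf[RING_SIZE];
};

/* Copy in as much of 'data' as currently fits; returns octets written. */
static size_t ring_write(struct byte_ring *r, const uint8_t *data, size_t len)
{
    uint32_t space = r->cons + RING_SIZE - r->prod;   /* free octets */
    size_t n = len < space ? len : space;
    size_t i;

    assert(space <= RING_SIZE);
    /* Write at prod modulo the ring size, wrapping as needed. */
    for (i = 0; i < n; i++)
        r->buf[(r->prod + (uint32_t)i) % RING_SIZE] = data[i];
    r->prod += (uint32_t)n;
    return n;
}

/* Copy out up to 'len' pending octets; returns octets read. */
static size_t ring_read(struct byte_ring *r, uint8_t *out, size_t len)
{
    uint32_t avail = r->prod - r->cons;               /* pending octets */
    size_t n = len < avail ? len : avail;
    size_t i;

    assert(avail <= RING_SIZE);
    for (i = 0; i < n; i++)
        out[i] = r->buf[(r->cons + (uint32_t)i) % RING_SIZE];
    r->cons += (uint32_t)n;
    return n;
}
```

A 4096-octet message passes through the 1024-octet buffer in four rounds: the writer fills the ring, the reader drains it into a private buffer, and the writer continues from where it stopped.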
David Vrabel Jan. 4, 2016, 11:28 a.m. UTC | #9
On 04/01/16 11:21, Paul Durrant wrote:
>> -----Original Message-----
>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>> Sent: 04 January 2016 11:18
>> To: Paul Durrant; xen-devel@lists.xenproject.org
>> Cc: Tim (Xen.org); Keir (Xen.org); Ian Campbell; Jan Beulich; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH 2/3] public/io/netif.h: document control
>> ring and toeplitz hashing
>>
>> On 04/01/16 11:14, Paul Durrant wrote:
>>>>>> You should use the standard ring format and infrastructure.
>>>>>
>>>>> Is there one for variable message size rings? I didn't find one. I
>>>>> don't want to use the fixed size balanced ring macros for control
>>>>> messages as fixed size messages really aren't appropriate in this case.
>>>>
>>>> Perhaps union the request/response message types with a uint8_t
>>>> pad[1024] and use this as the request/response type?
>>>>
>>>
>>> The problem is that this places a 1k limit on the message size,
>>> which
>>> is not there in the scheme I'm proposing. I'd rather not bake that limit
>>> in if I don't have to.
>>
>>>>>>> + * |                      req[1024]                |
>>                                  ^^^^^^^^^
>> Surely this limits your size to 1024 bytes?
> 
> No, I've already got prototype code that can pass 4k messages. Nothing in the protocol says the whole message has to fit in the buffer. In fact I've explicitly documented how to handle partial messages.

Then the standard ring infrastructure will work just fine.

>> Also if you need bigger messages you can grant those areas separately
>> and pass a grant ref through the ring, or you can chunk the message to
>> fit in several requests/responses.
> 
> Why go to that trouble to wedge a square peg into a round hole? What is the fundamental problem with what I've proposed that you want to avoid?

You've put the consumer values into the shared page. I'd rather not have
to scrutinize your shared ring implementation for other security bugs.
Similarly, if there's another security issue like XSA-155 I'd rather
not have to look at another non-standard shared ring implementation.

IMO, it's you who should be presenting compelling reasons for /not/
using the standard infrastructure, not the other way around.

David
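
One way to read the concern about consumer values in the shared page: the backend should keep its own consumer index in private memory, fetch the frontend's producer index exactly once, and validate it before use. The sketch below illustrates that pattern under stated assumptions; shared_idx, backend_state and ring_pending are hypothetical names, and this is not the code of the standard ring macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 1024u

/* Lives in the page the frontend can write at any time. */
struct shared_idx {
    volatile uint32_t req_prod;
};

/* Private to the backend; the frontend cannot modify it. */
struct backend_state {
    uint32_t req_cons;
};

/*
 * Snapshot the producer index once and validate it against the private
 * consumer index. Returns the number of octets that may be consumed, or
 * -1 if the frontend published an impossible index.
 */
static int32_t ring_pending(const struct shared_idx *s,
                            const struct backend_state *b)
{
    uint32_t prod, pending;

    assert(s != NULL && b != NULL);

    prod = s->req_prod;            /* single fetch from shared memory */
    pending = prod - b->req_cons;  /* modular distance, wrap-safe */

    if (pending > RING_SIZE)
        return -1;
    return (int32_t)pending;
}
```

All later decisions use the local copy of the producer index, so a frontend that rewrites the shared value afterwards cannot make the backend read beyond what was validated.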
Paul Durrant Jan. 4, 2016, 11:34 a.m. UTC | #10
> -----Original Message-----
> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: 04 January 2016 11:29
> To: Paul Durrant; xen-devel@lists.xenproject.org
> Cc: Tim (Xen.org); Keir (Xen.org); Ian Campbell; Jan Beulich; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH 2/3] public/io/netif.h: document control
> ring and toeplitz hashing
> 
> On 04/01/16 11:21, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: David Vrabel [mailto:david.vrabel@citrix.com]
> >> Sent: 04 January 2016 11:18
> >> To: Paul Durrant; xen-devel@lists.xenproject.org
> >> Cc: Tim (Xen.org); Keir (Xen.org); Ian Campbell; Jan Beulich; Ian Jackson
> >> Subject: Re: [Xen-devel] [PATCH 2/3] public/io/netif.h: document control
> >> ring and toeplitz hashing
> >>
> >> On 04/01/16 11:14, Paul Durrant wrote:
> >>>>>> You should use the standard ring format and infrastructure.
> >>>>>
> >>>>> Is there one for variable message size rings? I didn't find one. I
> >>>>> don't want to use the fixed size balanced ring macros for control
> >>>>> messages as fixed size messages really aren't appropriate in this case.
> >>>>
> >>>> Perhaps union the request/response message types with a uint8_t
> >>>> pad[1024] and use this as the request/response type?
> >>>>
> >>>
> >>> The problem is that this places a 1k limit on the message size,
> >>> which
> >>> is not there in the scheme I'm proposing. I'd rather not bake that limit
> >>> in if I don't have to.
> >>
> >>>>>>> + * |                      req[1024]                |
> >>                                  ^^^^^^^^^
> >> Surely this limits your size to 1024 bytes?
> >
> > No, I've already got prototype code that can pass 4k messages. Nothing in
> the protocol says the whole message has to fit in the buffer. In fact I've
> explicitly documented how to handle partial messages.
> 
> Then the standard ring infrastructure will work just fine.
> 
> >> Also if you need bigger messages you can grant those areas separately
> >> and pass a grant ref through the ring, or you can chunk the message to
> >> fit in several requests/responses.
> >
> > Why go to that trouble to wedge a square peg into a round hole? What is
> the fundamental problem with what I've proposed that you want to avoid?
> 
> You've put the consumer values into the shared page. I'd rather not have
> to scrutinize your shared ring implementation for other security bugs.
> Similarly, if there's another security issues like XSA-155 I'd rather
> not have to look at another non-standard shared ring implementation.

Ok. That's a good enough reason. I'll come up with a new prototype.

> 
> IMO, it's you who should be presenting compelling reasons for /not/
> using the standard infrastructure, not the other way around.
> 

There is no 'standard' here though. There's convention, but that's a different thing. If we're going to have a 'no more variable size message protocols' policy then that needs writing down somewhere.

  Paul

> David
Konrad Rzeszutek Wilk Jan. 4, 2016, 8:19 p.m. UTC | #11
> > You've put the consumer values into the shared page. I'd rather not have
> > to scrutinize your shared ring implementation for other security bugs.
> > Similarly, if there's another security issues like XSA-155 I'd rather
> > not have to look at another non-standard shared ring implementation.
> 
> Ok. That's a good enough reason. I'll come up with a new prototype.

Could I suggest that you make this a more generic one? That is not
just limited to network out of band - but other drivers could 
use it as well.


> 
> > 
> > IMO, it's you who should be presenting compelling reasons for /not/
> > using the standard infrastructure, not the other way around.
> > 
> 
> There is no 'standard' here though. There's convention, but that's a different thing. If we're going to have a 'no more variable size message protocols' policy than that needs writing down somewhere.
> 
>   Paul
> 
> > David
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
Paul Durrant Jan. 5, 2016, 9:40 a.m. UTC | #12
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Konrad Rzeszutek Wilk
> Sent: 04 January 2016 20:19
> To: Paul Durrant
> Cc: Keir (Xen.org); Ian Campbell; Tim (Xen.org); David Vrabel; Jan Beulich; Ian
> Jackson; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [PATCH 2/3] public/io/netif.h: document control
> ring and toeplitz hashing
> 
> > > You've put the consumer values into the shared page. I'd rather not have
> > > to scrutinize your shared ring implementation for other security bugs.
> > > Similarly, if there's another security issues like XSA-155 I'd rather
> > > not have to look at another non-standard shared ring implementation.
> >
> > Ok. That's a good enough reason. I'll come up with a new prototype.
> 
> Could I suggest that you make this a more generic one? That is not
> just limited to network out of band - but other drivers could
> use it as well.
> 

Well, if I use the usual balanced ring macros then they are already common. The next level is the actual message format and content, which is clearly going to be specific to a particular use-case.

  Paul

> 
> >
> > >
> > > IMO, it's you who should be presenting compelling reasons for /not/
> > > using the standard infrastructure, not the other way around.
> > >
> >
> > There is no 'standard' here though. There's convention, but that's a
> different thing. If we're going to have a 'no more variable size message
> protocols' policy than that needs writing down somewhere.
> >
> >   Paul
> >
> > > David

Patch

diff --git a/xen/include/public/io/netif.h b/xen/include/public/io/netif.h
index 1790ea0..612dbd0 100644
--- a/xen/include/public/io/netif.h
+++ b/xen/include/public/io/netif.h
@@ -151,6 +151,326 @@ 
  */
 
 /*
+ * Control ring:
+ *
+ * Some features, such as toeplitz hashing (detailed below), require a
+ * significant amount of out-of-band data to be passed from frontend to
+ * backend. Use of xenstore is not suitable for large quantities of data
+ * because of quota limitations and so a dedicated 'control ring' is used.
+ * The ability of the backend to use a control ring is advertised by
+ * setting:
+ *
+ * /local/domain/X/backend/<domid>/<vif>/feature-control-ring = "1"
+ *
+ * The frontend provides a control ring to the backend by setting:
+ *
+ * /local/domain/<domid>/device/vif/<vif>/ctrl-ring-ref = <gref>
+ * /local/domain/<domid>/device/vif/<vif>/event-channel-ctrl = <port>
+ *
+ * where <gref> is the grant reference of the shared page used to
+ * implement the control ring and <port> is an event channel to be used
+ * as a mailbox interrupt, before the frontend moves into the connected
+ * state.
+ *
+ * The layout of the shared page is as follows:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |        req_cons       |        req_prod       |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |        rsp_cons       |        rsp_prod       |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                                               |
+ * +                                               +
+ * |                      req[1024]                |
+ *                         .
+ *                         .
+ * |                                               |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                                               |
+ * +                                               +
+ * |                      rsp[1024]                |
+ *                         .
+ *                         .
+ * |                                               |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * This provides a 1024 byte request buffer, a 1024 byte response buffer
+ * and producer/consumer counts for both. The frontend and backend
+ * communicate using message structures prefaced with the following
+ * header:
+ *
+ * netif_ctrl_msg_hdr_t:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * | type                  | size                  |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * The type is one of the NETIF_CTRL_MSG_* values defined below and the
+ * size field specifies how many octets of payload follow the header
+ * (hence size may be 0 for messages not requiring a payload).
+ *
+ * The frontend makes a request by writing a message into the req buffer
+ * (at req_prod modulo 1024, taking care to wrap correctly), incrementing
+ * req_prod by the number of octets written and then sending a mailbox
+ * event to the backend.
+ * The message length may exceed the available space in the buffer
+ * (which can be calculated as req_cons + NETIF_CTRL_RING_SIZE - req_prod)
+ * in which case, as much data should be written as is possible and
+ * req_prod should be incremented by the number of octets written. A
+ * mailbox interrupt should then be sent to the backend to start message
+ * processing and the frontend should not write any more message data
+ * into the req buffer until the backend sends a mailbox interrupt
+ * to the frontend.
+ *
+ * The backend receives a request (when triggered to do so by a mailbox
+ * event) by reading as many octets as it can (which can be calculated
+ * as req_prod - req_cons) from the req buffer (from offset req_cons
+ * modulo 1024, taking care to wrap correctly) into a private buffer and
+ * then incrementing req_cons with the number of octets read.
+ * If a complete header (8 octets) has been read then the backend can
+ * determine how many payload octets it should expect and whether they
+ * have all been read. If they have then the message can be processed.
+ * If they have not then a mailbox event should be sent to the frontend
+ * and backend processing should be suspended until the next mailbox
+ * event arrives.
+ *
+ * The backend sends responses to the frontend using the rsp buffer in
+ * much the same way that the frontend sends requests to the backend and
+ * frontend processes the responses in much the same way that the backend
+ * processes requests.
+ * The protocol allows for a maximum of one outstanding request at any
+ * point in time. Hence the frontend should not send a new request until it
+ * has received a complete response for a previous request. Similarly
+ * the backend need only provide buffer space for the maximum size
+ * of request that it is prepared to handle (see specification of request
+ * types below).
+ */
+
+#define NETIF_CTRL_RING_SIZE 1024
+
+struct netif_ctrl_ring {
+	RING_IDX req_cons;
+	RING_IDX req_prod;
+	RING_IDX rsp_cons;
+	RING_IDX rsp_prod;
+	uint8_t req[NETIF_CTRL_RING_SIZE];
+	uint8_t rsp[NETIF_CTRL_RING_SIZE];
+};
+
+struct xen_netif_ctrl_msg_hdr {
+	uint16_t type;
+	uint16_t len;
+};
+
+#define NETIF_CTRL_MSG_ACK                  1
+#define NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS   2
+#define NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS   3
+#define NETIF_CTRL_MSG_SET_TOEPLITZ_KEY     4
+#define NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING 5
+
+/* Control messages: */
+
+/*
+ * NETIF_CTRL_MSG_ACK:
+ *
+ * This is the only valid type of message sent by the backend to the
+ * frontend. It carries a payload of the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                     status                    |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                                               |
+ * +                                               +
+ * |                      data[]                   |
+ *                         .
+ *                         .
+ * |                                               |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * The status field is always present and the correct size of the data
+ * field is determined by the type of request message and the value of the
+ * status field.
+ * If the backend receives a request from a frontend that it does not
+ * implement then it should respond with an ack message containing no
+ * data and status set to NETIF_CTRL_STATUS_NOT_SUPPORTED.
+ */
+
+#define NETIF_CTRL_STATUS_SUCCESS           0
+#define NETIF_CTRL_STATUS_NOT_SUPPORTED     1
+#define NETIF_CTRL_STATUS_INVALID_PARAMETER 2
+#define NETIF_CTRL_STATUS_BUFFER_OVERFLOW   3
+
+/*
+ * NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS:
+ *
+ * This is sent by the frontend to query the types of toeplitz
+ * hash supported by the backend. It carries no payload.
+ *
+ * A successful ack message has the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |           NETIF_CTRL_STATUS_SUCCESS           |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    flags                      |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * where flags is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values
+ * defined below.
+ * An unsuccessful ack message carries no data, only a status value.
+ */
+
+/*
+ * For the purposes of the definitions below, 'Packet[]' is an array of
+ * octets containing an IP packet without options, 'Array[X..Y]' means a
+ * sub-array of 'Array' containing bytes X through Y inclusive, and '+' is
+ * used to indicate concatenation of arrays.
+ */
+
+/*
+ * A hash calculated over an IP version 4 header as follows:
+ *
+ * Buffer[0..7] = Packet[12..15] + Packet[16..19]
+ * Result = ToeplitzHash(Buffer, 8)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV4     0
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV4      (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV4)
+
+/*
+ * A hash calculated over an IP version 4 header and TCP header as
+ * follows:
+ *
+ * Buffer[0..11] = Packet[12..15] + Packet[16..19] +
+ *                 Packet[20..21] + Packet[22..23]
+ * Result = ToeplitzHash(Buffer, 12)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV4_TCP 1
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV4_TCP  (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV4_TCP)
+
+/*
+ * A hash calculated over an IP version 6 header as follows:
+ *
+ * Buffer[0..31] = Packet[8..23] + Packet[24..39]
+ * Result = ToeplitzHash(Buffer, 32)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV6     2
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV6      (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV6)
+
+/*
+ * A hash calculated over an IP version 6 header and TCP header as
+ * follows:
+ *
+ * Buffer[0..35] = Packet[8..23] + Packet[24..39] +
+ *                 Packet[40..41] + Packet[42..43]
+ * Result = ToeplitzHash(Buffer, 36)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV6_TCP 3
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV6_TCP  (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV6_TCP)
+
+/*
+ * NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS:
+ *
+ * This is sent by the frontend to set the types of toeplitz hash that
+ * the backend should calculate. Note that the 'maximal' type of hash
+ * should always be chosen. For example, if the frontend sets both IPV4
+ * and IPV4_TCP hash types then the latter hash type should be calculated
+ * for any TCP packet and the former only calculated for non-TCP packets.
+ * The message carries a payload of the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    flags                      |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * where flags is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values
+ * defined above.
+ *
+ * NOTE: Setting flags to 0 disables toeplitz hashing and the backend
+ *       is free to choose how it steers packets to queues (which is the
+ *       default state).
+ *
+ * A successful or unsuccessful ack message carries no data, only a
+ * status value.
+ */
+
+/*
+ * NETIF_CTRL_MSG_SET_TOEPLITZ_KEY:
+ *
+ * This is sent by the frontend to set the key that the backend should
+ * use to calculate the toeplitz hash. The toeplitz algorithm is illustrated
+ * by the following pseudo-code:
+ *
+ * (Buffer[] and Key[] are treated as shift-registers where the MSB of
+ * Buffer/Key[0] is considered 'left-most' and the LSB of Buffer/Key[N-1]
+ * is the 'right-most').
+ *
+ * Value = 0
+ * For number of bits in Buffer[]
+ *    If (left-most bit of Buffer[] is 1)
+ *        Value ^= left-most 32 bits of Key[]
+ *    Key[] << 1
+ *    Buffer[] << 1
+ *
+ * Key[] is always 40 octets in length and so the message carries a
+ * payload of the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                                               |
+ * +                                               +
+ * |                                               |
+ * +                                               +
+ * |                   key[40]                     |
+ * +                                               +
+ * |                                               |
+ * +                                               +
+ * |                                               |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * A successful or unsuccessful ack message carries no data, only a
+ * status value.
+ */
+
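+/*
+ * For illustration only (not part of the protocol definition): a minimal
+ * C sketch of the toeplitz pseudo-code above. The name toeplitz_hash()
+ * and its signature are hypothetical, len is in octets, and any Key[]
+ * bits beyond the 40th octet are assumed to be zero.
+ *
+ * static uint32_t toeplitz_hash(const uint8_t key[40],
+ *                               const uint8_t *buf, unsigned int len)
+ * {
+ *     // 'window' always holds the left-most 32 bits of the shifted Key[]
+ *     uint32_t window = ((uint32_t)key[0] << 24) |
+ *                       ((uint32_t)key[1] << 16) |
+ *                       ((uint32_t)key[2] << 8) | key[3];
+ *     uint32_t value = 0;
+ *     unsigned int i, bit;
+ *
+ *     for (i = 0; i < len; i++) {
+ *         for (bit = 0; bit < 8; bit++) {
+ *             // Test the left-most remaining bit of Buffer[]
+ *             if (buf[i] & (0x80u >> bit))
+ *                 value ^= window;
+ *             // Key[] << 1: shift in the next key bit from the right
+ *             window <<= 1;
+ *             if (i + 4 < 40 && (key[i + 4] & (0x80u >> bit)))
+ *                 window |= 1;
+ *         }
+ *     }
+ *     return value;
+ * }
+ */
+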
+/*
+ * NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING:
+ *
+ * This is sent by the frontend to set the mapping of toeplitz hash to
+ * queue number to be applied by the backend.
+ * The message carries a payload of the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    queue[0]                   |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    queue[1]                   |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    queue[2]                   |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    queue[3]                   |
+ *                         .
+ *                         .
+ * |                    queue[N-1]                 |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * N can be calculated from the payload length and only power-of-2
+ * values are valid.
+ *
+ * NOTE: Before a specific mapping is set using this request, the backend
+ *       should map all toeplitz hash values to queue 0 (which is the only
+ *       queue guaranteed to exist in all cases).
+ *
+ * A successful or unsuccessful ack message carries no data, only a
+ * status value. If the value of N is not a power of 2 or any of the
+ * queue values exceeds the number of queues in operation then status
+ * should be set to NETIF_CTRL_STATUS_INVALID_PARAMETER. If N is larger
+ * than the backend's maximal size of mapping table then status should
+ * be set to NETIF_CTRL_STATUS_BUFFER_OVERFLOW.
+ */
+
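+/*
+ * For illustration only (not part of the protocol definition): a minimal
+ * C sketch of the status selection described above. The function name,
+ * its parameters (max_n being the backend's maximal mapping table size
+ * and num_queues the number of queues in operation), the 8-octet entry
+ * width (taken from the diagram), zero-based queue numbering and the
+ * order in which the checks are made are all assumptions.
+ *
+ * static unsigned int toeplitz_mapping_status(const uint64_t *queue,
+ *                                             unsigned int n,
+ *                                             unsigned int max_n,
+ *                                             unsigned int num_queues)
+ * {
+ *     unsigned int i;
+ *
+ *     // N must be a non-zero power of 2
+ *     if (n == 0 || (n & (n - 1)) != 0)
+ *         return NETIF_CTRL_STATUS_INVALID_PARAMETER;
+ *     if (n > max_n)
+ *         return NETIF_CTRL_STATUS_BUFFER_OVERFLOW;
+ *     for (i = 0; i < n; i++)
+ *         if (queue[i] >= num_queues)
+ *             return NETIF_CTRL_STATUS_INVALID_PARAMETER;
+ *     return NETIF_CTRL_STATUS_SUCCESS;
+ * }
+ */
+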
+/*
  * Guest transmit
  * ==============
  *