[2/3] public/io/netif.h: document control ring and toeplitz hashing

Message ID	1450865195-12883-3-git-send-email-paul.durrant@citrix.com (mailing list archive)
State	New, archived
Headers
Return-Path: <xen-devel-bounces@lists.xen.org>
From: Paul Durrant <paul.durrant@citrix.com>
To: <xen-devel@lists.xenproject.org>
Date: Wed, 23 Dec 2015 10:06:34 +0000
Message-ID: <1450865195-12883-3-git-send-email-paul.durrant@citrix.com>
In-Reply-To: <1450865195-12883-1-git-send-email-paul.durrant@citrix.com>
References: <1450865195-12883-1-git-send-email-paul.durrant@citrix.com>
MIME-Version: 1.0
Cc: Keir Fraser <keir@xen.org>, Ian Campbell <ian.campbell@citrix.com>, Tim Deegan <tim@xen.org>, Ian Jackson <ian.jackson@eu.citrix.com>, Paul Durrant <paul.durrant@citrix.com>, Jan Beulich <jbeulich@suse.com>
Subject: [Xen-devel] [PATCH 2/3] public/io/netif.h: document control ring and toeplitz hashing
Precedence: list
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org

diff --git a/xen/include/public/io/netif.h b/xen/include/public/io/netif.h
index 1790ea0..612dbd0 100644
--- a/xen/include/public/io/netif.h
+++ b/xen/include/public/io/netif.h
@@ -151,6 +151,326 @@
  */
 
 /*
+ * Control ring:
+ *
+ * Some features, such as toeplitz hashing (detailed below), require a
+ * significant amount of out-of-band data to be passed from frontend to
+ * backend. Use of xenstore is not suitable for large quantities of data
+ * because of quota limitations and so a dedicated 'control ring' is used.
+ * The ability of the backend to use a control ring is advertised by
+ * setting:
+ *
+ * /local/domain/X/backend/<domid>/<vif>/feature-control-ring = "1"
+ *
+ * The frontend provides a control ring to the backend by setting:
+ *
+ * /local/domain/<domid>/device/vif/<vif>/ctrl-ring-ref = <gref>
+ * /local/domain/<domid>/device/vif/<vif>/event-channel-ctrl = <port>
+ *
+ * where <gref> is the grant reference of the shared page used to
+ * implement the control ring and <port> is an event channel to be used
+ * as a mailbox interrupt, before the frontend moves into the connected
+ * state.
+ *
+ * The layout of the shared page is as follows:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |       req_cons        |       req_prod        |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |       rsp_cons        |       rsp_prod        |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                                               |
+ * +                                               +
+ * |                   req[1024]                   |
+ * .
+ * .
+ * |                                               |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                                               |
+ * +                                               +
+ * |                   rsp[1024]                   |
+ * .
+ * .
+ * |                                               |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * This provides a 1024 byte request buffer, a 1024 byte response buffer
+ * and producer/consumer counts for both. The frontend and backend
+ * communicate using message structures prefaced with the following
+ * header:
+ *
+ * netif_ctrl_msg_hdr_t:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |         type          |         size          |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * The type is one of the NETIF_CTRL_MSG_* values defined below and the
+ * size field specifies how many octets of payload follow the header
+ * (hence size may be 0 for messages not requiring a payload).
+ *
+ * The frontend makes a request by writing a message into the req buffer
+ * (at req_prod modulo 1024, taking care to wrap correctly), incrementing
+ * req_prod by the number of octets written and then sending a mailbox
+ * event to the backend.
+ * The message length may exceed the available space in the buffer
+ * (which can be calculated as req_cons + NETIF_CTRL_RING_SIZE - req_prod)
+ * in which case, as much data should be written as is possible and
+ * req_prod should be incremented by the number of octets written. A
+ * mailbox interrupt should then be sent to the backend to start message
+ * processing and the frontend should not write any more message data
+ * into the req buffer until the backend sends a mailbox interrupt
+ * to the frontend.
+ *
+ * The backend receives a request (when triggered to do so by a mailbox
+ * event) by reading as many octets as it can (which can be calculated
+ * as req_prod - req_cons) from the req buffer (from offset req_cons
+ * modulo 1024, taking care to wrap correctly) into a private buffer and
+ * then incrementing req_cons with the number of octets read.
+ * If a complete header (8 octets) has been read then the backend can
+ * determine how many payload octets it should expect and whether they
+ * have all been read. If they have then the message can be processed.
+ * If they have not then a mailbox event should be sent to the frontend
+ * and backend processing should be suspended until the next mailbox
+ * event arrives.
+ *
+ * The backend sends responses to the frontend using the rsp buffer in
+ * much the same way that the frontend sends requests to the backend and
+ * the frontend processes the responses in much the same way that the
+ * backend processes requests.
+ * The protocol allows for a maximum of one outstanding request at any
+ * point in time. Hence the frontend should not send a new request until it
+ * has received a complete response for a previous request. Similarly
+ * the backend need only provide buffer space for the maximum size
+ * of request that it is prepared to handle (see specification of request
+ * types below).
+ */
+
+#define NETIF_CTRL_RING_SIZE 1024
+
+struct netif_ctrl_ring {
+    RING_IDX req_cons;
+    RING_IDX req_prod;
+    RING_IDX rsp_cons;
+    RING_IDX rsp_prod;
+    uint8_t req[NETIF_CTRL_RING_SIZE];
+    uint8_t rsp[NETIF_CTRL_RING_SIZE];
+};
+
+struct xen_netif_ctrl_msg_hdr {
+    uint16_t type;
+    uint16_t len;
+};
+
+#define NETIF_CTRL_MSG_ACK                  1
+#define NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS   2
+#define NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS   3
+#define NETIF_CTRL_MSG_SET_TOEPLITZ_KEY     4
+#define NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING 5
+
+/* Control messages: */
+
+/*
+ * NETIF_CTRL_MSG_ACK:
+ *
+ * This is the only valid type of message sent by the backend to the
+ * frontend. It carries a payload of the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    status                     |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                                               |
+ * +                                               +
+ * |                    data[]                     |
+ * .
+ * .
+ * |                                               |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * The status field is always present and the correct size of the data
+ * field is determined by the type of request message and the value of the
+ * status field.
+ * If the backend receives a request from a frontend that it does not
+ * implement then it should respond with an ack message containing no
+ * data and status set to NETIF_CTRL_STATUS_NOT_SUPPORTED.
+ */
+
+#define NETIF_CTRL_STATUS_SUCCESS           0
+#define NETIF_CTRL_STATUS_NOT_SUPPORTED     1
+#define NETIF_CTRL_STATUS_INVALID_PARAMETER 2
+#define NETIF_CTRL_STATUS_BUFFER_OVERFLOW   3
+
+/*
+ * NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS:
+ *
+ * This is sent by the frontend to query the types of toeplitz
+ * hash supported by the backend. It carries no payload.
+ *
+ * A successful ack message has the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |           NETIF_CTRL_STATUS_SUCCESS           |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                     flags                     |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * where flags is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values
+ * defined below.
+ * An unsuccessful ack message carries no data, only a status value.
+ */
+
+/*
+ * For the purposes of the definitions below, 'Packet[]' is an array of
+ * octets containing an IP packet without options, 'Array[X..Y]' means a
+ * sub-array of 'Array' containing bytes X thru Y inclusive, and '+' is
+ * used to indicate concatenation of arrays.
+ */
+
+/*
+ * A hash calculated over an IP version 4 header as follows:
+ *
+ * Buffer[0..7] = Packet[12..15] + Packet[16..19]
+ * Result = ToeplitzHash(Buffer, 8)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV4 0
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV4 (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV4)
+
+/*
+ * A hash calculated over an IP version 4 header and TCP header as
+ * follows:
+ *
+ * Buffer[0..11] = Packet[12..15] + Packet[16..19] +
+ *                 Packet[20..21] + Packet[22..23]
+ * Result = ToeplitzHash(Buffer, 12)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV4_TCP 1
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV4_TCP (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV4_TCP)
+
+/*
+ * A hash calculated over an IP version 6 header as follows:
+ *
+ * Buffer[0..31] = Packet[8..23] + Packet[24..39]
+ * Result = ToeplitzHash(Buffer, 32)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV6 2
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV6 (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV6)
+
+/*
+ * A hash calculated over an IP version 6 header and TCP header as
+ * follows:
+ *
+ * Buffer[0..35] = Packet[8..23] + Packet[24..39] +
+ *                 Packet[40..41] + Packet[42..43]
+ * Result = ToeplitzHash(Buffer, 36)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV6_TCP 3
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV6_TCP (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV6_TCP)
+
+/*
+ * NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS:
+ *
+ * This is sent by the frontend to set the types of toeplitz hash that
+ * the backend should calculate. Note that the 'maximal' type of hash
+ * should always be chosen. For example, if the frontend sets both IPV4
+ * and IPV4_TCP hash types then the latter hash type should be calculated
+ * for any TCP packet and the former only calculated for non-TCP packets.
+ * The message carries a payload of the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                     flags                     |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * where flags is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values
+ * defined above.
+ *
+ * NOTE: Setting flags to 0 disables toeplitz hashing and the backend
+ * is free to choose how it steers packets to queues (which is the
+ * default state).
+ *
+ * A successful or unsuccessful ack message carries no data, only a
+ * status value.
+ */
+
+/*
+ * NETIF_CTRL_MSG_SET_TOEPLITZ_KEY:
+ *
+ * This is sent by the frontend to set the key of toeplitz hash that
+ * the backend should calculate. The toeplitz algorithm is illustrated
+ * by the following pseudo-code:
+ *
+ * (Buffer[] and Key[] are treated as shift-registers where the MSB of
+ * Buffer/Key[0] is considered 'left-most' and the LSB of Buffer/Key[N-1]
+ * is the 'right-most').
+ *
+ * Value = 0
+ * For number of bits in Buffer[]
+ *    If (left-most bit of Buffer[] is 1)
+ *        Value ^= left-most 32 bits of Key[]
+ *    Key[] << 1
+ *    Buffer[] << 1
+ *
+ * Key[] is always 40 octets in length and so the message carries a
+ * payload of the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                                               |
+ * +                                               +
+ * |                                               |
+ * +                                               +
+ * |                    key[40]                    |
+ * +                                               +
+ * |                                               |
+ * +                                               +
+ * |                                               |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * A successful or unsuccessful ack message carries no data, only a
+ * status value.
+ */
+
+/*
+ * NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING:
+ *
+ * This is sent by the frontend to set the mapping of toeplitz hash to
+ * queue number to be applied by the backend.
+ * The message carries a payload of the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                   queue[0]                    |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                   queue[1]                    |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                   queue[2]                    |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                   queue[3]                    |
+ * .
+ * .
+ * |                  queue[N-1]                   |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * N can be calculated from the payload length and only power-of-2
+ * values are valid.
+ *
+ * NOTE: Before a specific mapping is set using this request, the backend
+ * should map all toeplitz hash values to queue 0 (which is the only
+ * queue guaranteed to exist in all cases).
+ *
+ * A successful or unsuccessful ack message carries no data, only a
+ * status value. If the value of N is not a power of 2 or any of the
+ * queue values exceeds the number of queues in operation then status
+ * should be set to NETIF_CTRL_STATUS_INVALID_PARAMETER. If N is larger
+ * than the backend's maximal size of mapping table then status should
+ * be set to NETIF_CTRL_STATUS_BUFFER_OVERFLOW.
+ */
+
+/*
  * Guest transmit
  * ==============
  *
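
The pseudo-code in the NETIF_CTRL_MSG_SET_TOEPLITZ_KEY comment can be sketched in C roughly as follows. This is an illustration only and not part of the patch: toeplitz_hash() and TOEPLITZ_KEY_LEN are hypothetical names. Instead of shifting Key[] and Buffer[] in place, the sketch keeps a 32-bit window over the key and walks the input bits from the MSB of octet 0. It assumes the input is at most 36 octets, which covers the largest case described in the patch (IPv6 with TCP).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOEPLITZ_KEY_LEN 40 /* Key[] is always 40 octets in the patch */

/*
 * Hypothetical helper (not part of the patch). 'window' holds the
 * left-most 32 bits of the conceptually shifted key; one further key
 * bit is shifted in for every input bit consumed.
 */
static uint32_t toeplitz_hash(const uint8_t key[TOEPLITZ_KEY_LEN],
                              const uint8_t *buf, size_t len)
{
    uint32_t window, value = 0;
    size_t next_bit = 32; /* index of the next key bit to shift in */
    size_t i;
    int b;

    /* Assumption: at most 36 input octets, so key reads stay in range. */
    assert(len <= TOEPLITZ_KEY_LEN - 4);

    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (i = 0; i < len; i++) {
        for (b = 7; b >= 0; b--) { /* MSB of each octet is left-most */
            uint32_t key_bit;

            if (buf[i] & (1u << b))
                value ^= window; /* Value ^= left-most 32 bits of Key[] */

            /* Key[] << 1 */
            key_bit = (key[next_bit / 8] >> (7 - next_bit % 8)) & 1u;
            window = (window << 1) | key_bit;
            next_bit++;
        }
    }

    return value;
}
```

For the IPV4_TCP hash type a caller would pass the 12-octet buffer built from the source address, destination address, source port and destination port, in that order, as described in the flag definitions.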

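
The request path described in the control ring comment (write at the producer offset, wrap at 1024, advance the count by the octets actually copied) can be sketched as below. This is a minimal illustration under stated assumptions, not the patch's implementation: ctrl_ring_write_req() and ctrl_ring_read_req() are hypothetical helpers, RING_IDX is stood in for by uint32_t, and the memory barriers and event channel notifications that real frontend and backend code would need are only marked by comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NETIF_CTRL_RING_SIZE 1024

typedef uint32_t RING_IDX; /* assumption: stand-in for Xen's RING_IDX */

struct netif_ctrl_ring {
    RING_IDX req_cons;
    RING_IDX req_prod;
    RING_IDX rsp_cons;
    RING_IDX rsp_prod;
    uint8_t req[NETIF_CTRL_RING_SIZE];
    uint8_t rsp[NETIF_CTRL_RING_SIZE];
};

/* Hypothetical frontend helper: returns the number of octets written. */
static size_t ctrl_ring_write_req(struct netif_ctrl_ring *ring,
                                  const uint8_t *msg, size_t len)
{
    RING_IDX prod = ring->req_prod;
    RING_IDX space = ring->req_cons + NETIF_CTRL_RING_SIZE - prod;
    size_t n = len < space ? len : space;
    size_t off = prod % NETIF_CTRL_RING_SIZE;
    size_t first = NETIF_CTRL_RING_SIZE - off; /* octets before the wrap */

    assert(space <= NETIF_CTRL_RING_SIZE);
    if (first > n)
        first = n;
    memcpy(&ring->req[off], msg, first);
    memcpy(&ring->req[0], msg + first, n - first);

    /* Real code needs a write barrier here, then a mailbox event. */
    ring->req_prod = prod + (RING_IDX)n;
    return n;
}

/* Hypothetical backend helper: returns the number of octets read. */
static size_t ctrl_ring_read_req(struct netif_ctrl_ring *ring,
                                 uint8_t *out, size_t max)
{
    RING_IDX cons = ring->req_cons;
    RING_IDX avail = ring->req_prod - cons;
    size_t n, off, first;

    /* The peer is untrusted: never accept more than one full buffer. */
    if (avail > NETIF_CTRL_RING_SIZE)
        avail = NETIF_CTRL_RING_SIZE;

    n = max < avail ? max : avail;
    off = cons % NETIF_CTRL_RING_SIZE;
    first = NETIF_CTRL_RING_SIZE - off;
    if (first > n)
        first = n;
    memcpy(out, &ring->req[off], first);
    memcpy(out + first, &ring->req[0], n - first);

    ring->req_cons = cons + (RING_IDX)n;
    return n;
}
```

The counts are free-running and only reduced modulo the buffer size when indexing, so req_prod - req_cons gives the number of unread octets even after the 32-bit counters wrap.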