Message ID | 1290708.1628244534@warthog.procyon.org.uk (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [RFC] rxrpc: Support reception of extended-SACK ACK packet | expand |
David, Thanks for working on this. For the benefit of other readers. The RxRPC protocol is not standardized by the IETF or any other standards organization. The best description of the protocol is present in the OpenAFS source repository (doc/txt/rx-spec.txt). Proposed updates to this specification are available in OpenAFS Gerrit doc: rx-spec Update for accuracy with current Rx implementations https://gerrit.openafs.org/#/c/14692 doc: rx-spec Document the Extended SACK Table protocol extension https://gerrit.openafs.org/#/c/14693 On 8/6/2021 6:08 AM, David Howells (dhowells@redhat.com) wrote: > The RxRPC ACK packet supports selective ACK of up to 255 DATA packets. It > contains a variable length array with one octet allocated for each DATA > packet to be ACK'd. Each octet is either 0 or 1 depending on whether it is > a negative or positive ACK. 7 bits in each octet are effectively unused > and, further, there are three reserved octets following the ACK array that > are all set to 0. The proposed EXTENDED_SACK ACK packet format assigns usage for all three of the unused octets. The first is used to extend the SACK table from 255 octets to 256 octets. This octet will be zero unless it is actively being used to represent a positive acknowledgement of a DATA packet. The second unused octet is designated as the <Trailer Count> field. The three unused octets are the result of a math error in the mid-90s when IBM extended the ACK packet format with a series of 32-bit fields. Over time additional fields were added but there has never been a clear method for the ACK packet receiver to determine how many fields were sent. The EXTENDED_SACK proposal addresses this oversight with the <Trailer Count> field which should be set to the number of 32-bit trailer fields. At present this value is 4, not 0. This field will permit additional trailers to be defined in the future. The third unused octet is designated as the <Extra SACK Count> field. This field designates how many additional variable length SACK tables are present following the designated number of trailer fields. Each additional SACK table can represent up to 2048 additional DATA packets. > > To extend the ACK window up to 2048 ACKs, it is proposed[1]: > > (1) that the ACKs for DATA packets first+0...first+254 in the Rx window > are in bit 0 of the octets in the array, ie. acks[0...254], pretty > much as now; and > > (2) that if the ACK count is >=256, the first reserved byte after the ACK > table is annexed to the ACK table as acks[255] and contains the ACK > for packet first+255 in bit 0; and > > (3) that if the ACK count is >256, horizontal striping be employed such > that the ACK for packet first+256 in the window is then in bit 1 of > acks[0], first+257 is in bit 1 of acks[1], up to first+511 being in > bit 1 of the borrowed reserved byte (ie. acks[255]). > > first+512 is then in bit 2 of acks[0], going all the way up to > first+2048 being in bit 7 of acks[255]. I think this should say "first + 2047 in bit 7 of acks[255]." > If extended SACK is employed in an ACK packet, it should have EXTENDED-SACK > (0x08) set in the RxRPC packet header. The EXTENDED_SACK format proposal also addresses a historical deficiency in the specification and usage of the <previousPacket> field. https://gerrit.openafs.org/#/c/14692 adds version specific details to rx-spec.txt but in summary, the <previousPacket> field is unreliable at present and cannot be used in any meaningful way. Various Rx implementations (including OpenAFS) attempt to use it to filter out-of-sequence ACK packets but this is unreliable because the value sent by many Rx peers can move backwards, or represent a sequence number that was out of range, or even not be a sequence number at all. When the EXTENDED_SACK flag is set the <previousPacket> field is defined to be the largest DATA packet sequence number accepted by the sender of the ACK packet. As such the <previousPacket> field can reliably be used for two purposes: 1. <previousPacket> can be used to detect out-of-sequence ACK packets 2. <previousPacket> - <firstPacket> + 1 represents the maximum number of SACK table bits that contain valid acknowledgements. Regardless of how many SACK tables are present in the ACK packet. > Alter rxrpc_input_ack() to sanity check the ACK count. > > Alter rxrpc_input_ack() to limit the number of bytes it extracts from the > packet for the ack array to 256. > > Alter rxrpc_input_soft_acks() to handle an extended SACK table. EXTENDED_SACK is still a proposal awaiting review. What has been agreed upon at this point in time is that the acks[] array be interpreted as bit fields. I suggest that the first patch to rxrpc implement the equivalent change to OpenAFS rx: compare RX_ACK_TYPE_ACK as a bit-field https://gerrit.openafs.org/#/c/14465/4 I also suggest that rxrpc be updated if necessary to always set <previousPacket> as described by the EXTENDED_SACK proposal regardless of whether or not the EXTENDED_SACK proposal is implemented. Existing OpenAFS Rx peers expect that behavior when receiving ACK packets even though they do not the field according to those expectations. Thanks again for taking the time to review and implement the proposal. Jeffrey Altman
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index dc201363f2c4..0a7f7462b617 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -767,15 +767,17 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, u8 *acks, rxrpc_seq_t seq, int nr_acks, struct rxrpc_ack_summary *summary) { - int ix; - u8 annotation, anno_type; + int ix, i; + u8 annotation, anno_type, ack; - for (; nr_acks > 0; nr_acks--, seq++) { + for (i = 0; i < nr_acks; i++, seq++) { ix = seq & RXRPC_RXTX_BUFF_MASK; annotation = call->rxtx_annotations[ix]; anno_type = annotation & RXRPC_TX_ANNO_MASK; annotation &= ~RXRPC_TX_ANNO_MASK; - switch (*acks++) { + ack = acks[i % RXRPC_EXTENDED_SACK_SIZE]; + ack >>= i / RXRPC_EXTENDED_SACK_SIZE; + switch (ack) { case RXRPC_ACK_TYPE_ACK: summary->nr_acks++; if (anno_type == RXRPC_TX_ANNO_ACK) @@ -846,7 +848,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) union { struct rxrpc_ackpacket ack; struct rxrpc_ackinfo info; - u8 acks[RXRPC_MAXACKS]; + u8 acks[RXRPC_EXTENDED_SACK_SIZE]; } buf; rxrpc_serial_t ack_serial, acked_serial; rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt; @@ -874,6 +876,10 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) first_soft_ack, prev_pkt, summary.ack_reason, nr_acks); + if ((nr_acks > RXRPC_MAXACKS && !(sp->hdr.flags & RXRPC_EXTENDED_SACK)) || + (nr_acks > RXRPC_MAXACKS_EXTENDED)) + return rxrpc_proto_abort("AKC", call, 0); + switch (buf.ack.reason) { case RXRPC_ACK_PING_RESPONSE: rxrpc_input_ping_response(call, skb->tstamp, acked_serial, @@ -912,7 +918,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) } buf.info.rxMTU = 0; - ioffset = offset + nr_acks + 3; + ioffset = offset + min(nr_acks, RXRPC_MAXACKS) + 3; if (skb->len >= ioffset + sizeof(buf.info) && skb_copy_bits(skb, ioffset, &buf.info, sizeof(buf.info)) < 0) return rxrpc_proto_abort("XAI", call, 0); @@ -969,7 +975,8 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) } if (nr_acks > 0) { - if (skb_copy_bits(skb, offset, buf.acks, nr_acks) < 0) { + if (skb_copy_bits(skb, offset, buf.acks, + min_t(unsigned int, nr_acks, sizeof(buf.acks))) < 0) { rxrpc_proto_abort("XSA", call, 0); goto out; } diff --git a/net/rxrpc/protocol.h b/net/rxrpc/protocol.h index 49bb972539aa..287986012cd9 100644 --- a/net/rxrpc/protocol.h +++ b/net/rxrpc/protocol.h @@ -51,7 +51,8 @@ struct rxrpc_wire_header { #define RXRPC_CLIENT_INITIATED 0x01 /* signifies a packet generated by a client */ #define RXRPC_REQUEST_ACK 0x02 /* request an unconditional ACK of this packet */ #define RXRPC_LAST_PACKET 0x04 /* the last packet from this side for this call */ -#define RXRPC_MORE_PACKETS 0x08 /* more packets to come */ +#define RXRPC_MORE_PACKETS 0x08 /* [DATA] More packets to come */ +#define RXRPC_EXTENDED_SACK 0x08 /* [ACK] Extended SACK table */ #define RXRPC_JUMBO_PACKET 0x20 /* [DATA] this is a jumbo packet */ #define RXRPC_SLOW_START_OK 0x20 /* [ACK] slow start supported */ @@ -124,7 +125,9 @@ struct rxrpc_ackpacket { #define RXRPC_ACK__INVALID 10 /* Representation of invalid ACK reason */ uint8_t nAcks; /* number of ACKs */ -#define RXRPC_MAXACKS 255 +#define RXRPC_MAXACKS 255 /* Normal maximum number of ACKs */ +#define RXRPC_EXTENDED_SACK_SIZE 256 /* Size of the extended SACK table */ +#define RXRPC_MAXACKS_EXTENDED 2048 /* Maximum number of ACKs in extended SACK table */ uint8_t acks[0]; /* list of ACK/NAKs */ #define RXRPC_ACK_TYPE_NACK 0
The RxRPC ACK packet supports selective ACK of up to 255 DATA packets. It contains a variable length array with one octet allocated for each DATA packet to be ACK'd. Each octet is either 0 or 1 depending on whether it is a negative or positive ACK. 7 bits in each octet are effectively unused and, further, there are three reserved octets following the ACK array that are all set to 0. To extend the ACK window up to 2048 ACKs, it is proposed[1]: (1) that the ACKs for DATA packets first+0...first+254 in the Rx window are in bit 0 of the octets in the array, ie. acks[0...254], pretty much as now; and (2) that if the ACK count is >=256, the first reserved byte after the ACK table is annexed to the ACK table as acks[255] and contains the ACK for packet first+255 in bit 0; and (3) that if the ACK count is >256, horizontal striping be employed such that the ACK for packet first+256 in the window is then in bit 1 of acks[0], first+257 is in bit 1 of acks[1], up to first+511 being in bit 1 of the borrowed reserved byte (ie. acks[255]). first+512 is then in bit 2 of acks[0], going all the way up to first+2048 being in bit 7 of acks[255]. If extended SACK is employed in an ACK packet, it should have EXTENDED-SACK (0x08) set in the RxRPC packet header. Alter rxrpc_input_ack() to sanity check the ACK count. Alter rxrpc_input_ack() to limit the number of bytes it extracts from the packet for the ack array to 256. Alter rxrpc_input_soft_acks() to handle an extended SACK table. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://gerrit.openafs.org/#/c/14693/3 [1] --- net/rxrpc/input.c | 21 ++++++++++++++------- net/rxrpc/protocol.h | 7 +++++-- 2 files changed, 19 insertions(+), 9 deletions(-)