| Message ID | 20230803100809.29864-1-hare@suse.de (mailing list archive) |
|---|---|
| State | Superseded |
| Delegated to | Netdev Maintainers |
| Series | net/tls: avoid TCP window full during ->read_sock() |
On Thu, 3 Aug 2023 12:08:09 +0200 Hannes Reinecke wrote:

> When flushing the backlog after decoding each record in ->read_sock()
> we may end up with really long records, causing a TCP window full as
> the TCP window would only be increased again after we process the
> record. So we should rather process the record first to allow the
> TCP window to be increased again before flushing the backlog.

```
> -		released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
> -						  decrypted, &flushed_at);
>  		skb = darg.skb;
> +		/* TLS 1.3 may have updated the length by more than overhead */
> +		rxm = strp_msg(skb);
> +		tlm = tls_msg(skb);
>  		decrypted += rxm->full_len;
>
>  		tls_rx_rec_done(ctx);
> @@ -2280,6 +2275,12 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
>  			goto read_sock_requeue;
>  		}
>  		copied += used;
> +		/*
> +		 * flush backlog after processing the TLS record, otherwise we might
> +		 * end up with really large records and triggering a TCP window full.
> +		 */
> +		released = tls_read_flush_backlog(sk, prot, decrypted - copied, decrypted,
> +						  copied, &flushed_at);
```

I'm surprised moving the flushing out makes a difference.
rx_list should generally hold at most 1 skb (16kB) unless something
is PEEKing the data.

Looking at it closer I think the problem may be the calling args to
tls_read_flush_backlog(). Since we don't know how much data the
reader wants we can't sensibly evaluate the first condition,
so how would it work if instead of this patch we did:

```diff
-		released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
+		released = tls_read_flush_backlog(sk, prot, INT_MAX, 0,
						  decrypted, &flushed_at);
```

That would give us a flush every 128k of data (or every record if
inq is shorter than 16kB).

side note - I still prefer 80 char max lines, please. It seems to
result in prettier code overall as it forces people to think more
about code structure.

```
>  		if (used < rxm->full_len) {
>  			rxm->offset += used;
>  			rxm->full_len -= used;
```
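The "first condition" referred to above is the early return on `len_left <= decrypted` in the flush helper. For readers following along, here is a lightly paraphrased sketch of `tls_read_flush_backlog()` as it appears in net/tls/tls_sw.c around this series (reconstructed from memory; verify the exact constants and layout against your tree):

```c
/* Paraphrased sketch of the helper in net/tls/tls_sw.c (approximate).
 * len_left:  how much data the reader still wants
 * decrypted: how much is already decrypted and queued for the reader
 * done:      total bytes handed to the reader so far
 */
static bool
tls_read_flush_backlog(struct sock *sk, struct tls_prot_info *prot,
		       size_t len_left, size_t decrypted, ssize_t done,
		       size_t *flushed_at)
{
	size_t max_rec;

	/* First condition: the reader already has everything it asked
	 * for, so there is no need to flush now. With len_left == INT_MAX
	 * and decrypted == 0 this can never short-circuit, which is the
	 * point of the suggestion above.
	 */
	if (len_left <= decrypted)
		return false;

	/* Otherwise flush every 128kB of delivered data, or whenever
	 * less than one maximum-size record (~16kB plus overhead) is
	 * left in the TCP receive queue.
	 */
	max_rec = prot->overhead_size - prot->tail_size + TLS_MAX_PAYLOAD_SIZE;
	if (done - *flushed_at < SZ_128K && tcp_inq(sk) > max_rec)
		return false;

	*flushed_at = done;
	return sk_flush_backlog(sk);
}
```

With `INT_MAX, 0` the flushing cadence is governed purely by the 128kB budget and the `tcp_inq()` check, matching the "every 128k of data, or every record if inq is short" behavior described in the reply.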
>> When flushing the backlog after decoding each record in ->read_sock()
>> we may end up with really long records, causing a TCP window full as
>> the TCP window would only be increased again after we process the
>> record. So we should rather process the record first to allow the
>> TCP window to be increased again before flushing the backlog.

```
>> -		released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
>> -						  decrypted, &flushed_at);
>>  		skb = darg.skb;
>> +		/* TLS 1.3 may have updated the length by more than overhead */
>> +		rxm = strp_msg(skb);
>> +		tlm = tls_msg(skb);
>>  		decrypted += rxm->full_len;
>>
>>  		tls_rx_rec_done(ctx);
>> @@ -2280,6 +2275,12 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
>>  			goto read_sock_requeue;
>>  		}
>>  		copied += used;
>> +		/*
>> +		 * flush backlog after processing the TLS record, otherwise we might
>> +		 * end up with really large records and triggering a TCP window full.
>> +		 */
>> +		released = tls_read_flush_backlog(sk, prot, decrypted - copied, decrypted,
>> +						  copied, &flushed_at);
```

> I'm surprised moving the flushing out makes a difference.
> rx_list should generally hold at most 1 skb (16kB) unless something
> is PEEKing the data.
>
> Looking at it closer I think the problem may be the calling args to
> tls_read_flush_backlog(). Since we don't know how much data the
> reader wants we can't sensibly evaluate the first condition,
> so how would it work if instead of this patch we did:

```
> -		released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
> +		released = tls_read_flush_backlog(sk, prot, INT_MAX, 0,
> 						  decrypted, &flushed_at);
```

> That would give us a flush every 128k of data (or every record if
> inq is shorter than 16kB).

What happens if the window is smaller than 128K? Isn't that what
Hannes is trying to solve for?

Hannes, do you have some absolute numbers for how the window behaves?
```diff
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 9c1f13541708..57db189b29b0 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2240,7 +2240,6 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 			tlm = tls_msg(skb);
 		} else {
 			struct tls_decrypt_arg darg;
-			int to_decrypt;
 
 			err = tls_rx_rec_wait(sk, NULL, true, released);
 			if (err <= 0)
@@ -2248,20 +2247,16 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 
 			memset(&darg.inargs, 0, sizeof(darg.inargs));
 
-			rxm = strp_msg(tls_strp_msg(ctx));
-			tlm = tls_msg(tls_strp_msg(ctx));
-
-			to_decrypt = rxm->full_len - prot->overhead_size;
-
 			err = tls_rx_one_record(sk, NULL, &darg);
 			if (err < 0) {
 				tls_err_abort(sk, -EBADMSG);
 				goto read_sock_end;
 			}
 
-			released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
-							  decrypted, &flushed_at);
 			skb = darg.skb;
+			/* TLS 1.3 may have updated the length by more than overhead */
+			rxm = strp_msg(skb);
+			tlm = tls_msg(skb);
 			decrypted += rxm->full_len;
 
 			tls_rx_rec_done(ctx);
@@ -2280,6 +2275,12 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 			goto read_sock_requeue;
 		}
 		copied += used;
+		/*
+		 * flush backlog after processing the TLS record, otherwise we might
+		 * end up with really large records and triggering a TCP window full.
+		 */
+		released = tls_read_flush_backlog(sk, prot, decrypted - copied, decrypted,
+						  copied, &flushed_at);
 		if (used < rxm->full_len) {
 			rxm->offset += used;
 			rxm->full_len -= used;
```
When flushing the backlog after decoding each record in ->read_sock()
we may end up with really long records, causing a TCP window full as
the TCP window would only be increased again after we process the
record. So we should rather process the record first to allow the
TCP window to be increased again before flushing the backlog.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 net/tls/tls_sw.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)
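For context on the interface this patch touches: `->read_sock()` is the kernel-internal read path driven by a caller-supplied actor, used by in-kernel consumers such as nvme-tcp over TLS. Below is a hypothetical sketch of such a consumer; all `demo_*` names are invented for illustration and are not part of this series:

```c
#include <linux/net.h>
#include <linux/minmax.h>
#include <net/sock.h>

/* Hypothetical consumer of ->read_sock() (demo_* names invented here).
 * The TLS code decrypts one record at a time and calls the actor with
 * chunks of decrypted payload; with this patch the backlog is flushed
 * only after the actor has consumed the record.
 */
static int demo_read_actor(read_descriptor_t *desc, struct sk_buff *skb,
			   unsigned int offset, size_t len)
{
	size_t want = min_t(size_t, len, desc->count);

	/* Process skb payload in [offset, offset + want) here. */

	desc->count -= want;
	return want;	/* returning less than len pauses the read loop */
}

static int demo_consume(struct socket *sock, size_t budget)
{
	read_descriptor_t desc = {
		.count = budget,	/* upper bound on bytes to accept */
	};

	return sock->ops->read_sock(sock->sk, &desc, demo_read_actor);
}
```

This also illustrates the point raised in review: how much data the actor will accept is only known after calling it, so the read_sock() path has no good value to pass as the flush helper's len_left argument.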