
net/tls: avoid TCP window full during ->read_sock()

Message ID 20230803100809.29864-1-hare@suse.de (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series net/tls: avoid TCP window full during ->read_sock()

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1328 this patch: 1328
netdev/cc_maintainers warning 3 maintainers not CCed: borisp@nvidia.com john.fastabend@gmail.com davem@davemloft.net
netdev/build_clang success Errors and warnings before: 1351 this patch: 1351
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1351 this patch: 1351
netdev/checkpatch warning WARNING: line length of 84 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: networking block comments don't use an empty /* line, use /* Comment...
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Hannes Reinecke Aug. 3, 2023, 10:08 a.m. UTC
When flushing the backlog after decoding each record in ->read_sock()
we may end up with really long records, causing a TCP window full
condition as the TCP window is only increased again after the record
has been processed. So we should rather process the record first,
allowing the TCP window to be increased again before flushing the
backlog.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 net/tls/tls_sw.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

Comments

Jakub Kicinski Aug. 5, 2023, 12:57 a.m. UTC | #1
On Thu,  3 Aug 2023 12:08:09 +0200 Hannes Reinecke wrote:
> When flushing the backlog after decoding each record in ->read_sock()
> we may end up with really long records, causing a TCP window full as
> the TCP window would only be increased again after we process the
> record. So we should rather process the record first to allow the
> TCP window to be increased again before flushing the backlog.

> -			released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
> -							  decrypted, &flushed_at);
>  			skb = darg.skb;
> +			/* TLS 1.3 may have updated the length by more than overhead */

> +			rxm = strp_msg(skb);
> +			tlm = tls_msg(skb);
>  			decrypted += rxm->full_len;
>  
>  			tls_rx_rec_done(ctx);
> @@ -2280,6 +2275,12 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
>  			goto read_sock_requeue;
>  		}
>  		copied += used;
> +		/*
> +		 * flush backlog after processing the TLS record, otherwise we might
> +		 * end up with really large records and triggering a TCP window full.
> +		 */
> +		released = tls_read_flush_backlog(sk, prot, decrypted - copied, decrypted,
> +						  copied, &flushed_at);

I'm surprised moving the flushing out makes a difference.
rx_list should generally hold at most 1 skb (16kB) unless something 
is PEEKing the data.

Looking at it closer I think the problem may be the calling args to
tls_read_flush_backlog(). Since we don't know how much data the
reader wants, we can't sensibly evaluate the first condition,
so how would it work if instead of this patch we did:

-			released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
+			released = tls_read_flush_backlog(sk, prot, INT_MAX, 0,
							  decrypted, &flushed_at);

That would give us a flush every 128k of data (or every record if
inq is shorter than 16kB).

side note - I still prefer 80 char max lines, please. It seems to result
in prettier code overall as it forces people to think more about code
structure.

>  		if (used < rxm->full_len) {
>  			rxm->offset += used;
>  			rxm->full_len -= used;
Sagi Grimberg Aug. 7, 2023, 7:08 a.m. UTC | #2
>> When flushing the backlog after decoding each record in ->read_sock()
>> we may end up with really long records, causing a TCP window full as
>> the TCP window would only be increased again after we process the
>> record. So we should rather process the record first to allow the
>> TCP window to be increased again before flushing the backlog.
> 
>> -			released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
>> -							  decrypted, &flushed_at);
>>   			skb = darg.skb;
>> +			/* TLS 1.3 may have updated the length by more than overhead */
> 
>> +			rxm = strp_msg(skb);
>> +			tlm = tls_msg(skb);
>>   			decrypted += rxm->full_len;
>>   
>>   			tls_rx_rec_done(ctx);
>> @@ -2280,6 +2275,12 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
>>   			goto read_sock_requeue;
>>   		}
>>   		copied += used;
>> +		/*
>> +		 * flush backlog after processing the TLS record, otherwise we might
>> +		 * end up with really large records and triggering a TCP window full.
>> +		 */
>> +		released = tls_read_flush_backlog(sk, prot, decrypted - copied, decrypted,
>> +						  copied, &flushed_at);
> 
> I'm surprised moving the flushing out makes a difference.
> rx_list should generally hold at most 1 skb (16kB) unless something
> is PEEKing the data.
> 
> Looking at it closer I think the problem may be calling args to
> tls_read_flush_backlog(). Since we don't know how much data
> reader wants we can't sensibly evaluate the first condition,
> so how would it work if instead of this patch we did:
> 
> -			released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
> +			released = tls_read_flush_backlog(sk, prot, INT_MAX, 0,
> 							  decrypted, &flushed_at);
> 
> That would give us a flush every 128k of data (or every record if
> inq is shorter than 16kB).

What happens if the window is smaller than 128K? Isn't that what
Hannes is trying to solve for?

Hannes, do you have some absolute numbers on how the window behaves?

Patch

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 9c1f13541708..57db189b29b0 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2240,7 +2240,6 @@  int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 			tlm = tls_msg(skb);
 		} else {
 			struct tls_decrypt_arg darg;
-			int to_decrypt;
 
 			err = tls_rx_rec_wait(sk, NULL, true, released);
 			if (err <= 0)
@@ -2248,20 +2247,16 @@  int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 
 			memset(&darg.inargs, 0, sizeof(darg.inargs));
 
-			rxm = strp_msg(tls_strp_msg(ctx));
-			tlm = tls_msg(tls_strp_msg(ctx));
-
-			to_decrypt = rxm->full_len - prot->overhead_size;
-
 			err = tls_rx_one_record(sk, NULL, &darg);
 			if (err < 0) {
 				tls_err_abort(sk, -EBADMSG);
 				goto read_sock_end;
 			}
 
-			released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
-							  decrypted, &flushed_at);
 			skb = darg.skb;
+			/* TLS 1.3 may have updated the length by more than overhead */
+			rxm = strp_msg(skb);
+			tlm = tls_msg(skb);
 			decrypted += rxm->full_len;
 
 			tls_rx_rec_done(ctx);
@@ -2280,6 +2275,12 @@  int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 			goto read_sock_requeue;
 		}
 		copied += used;
+		/* Flush backlog after processing the TLS record, otherwise we
+		 * might end up with really large records and trigger a TCP
+		 * window full.
+		 */
+		released = tls_read_flush_backlog(sk, prot, decrypted - copied,
+						  decrypted, copied, &flushed_at);
 		if (used < rxm->full_len) {
 			rxm->offset += used;
 			rxm->full_len -= used;