
net/tls: avoid TCP window full during ->read_sock()

Message ID 20230803100809.29864-1-hare@suse.de (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series net/tls: avoid TCP window full during ->read_sock()

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1328 this patch: 1328
netdev/cc_maintainers warning 3 maintainers not CCed: borisp@nvidia.com john.fastabend@gmail.com davem@davemloft.net
netdev/build_clang success Errors and warnings before: 1351 this patch: 1351
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1351 this patch: 1351
netdev/checkpatch warning WARNING: line length of 84 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: networking block comments don't use an empty /* line, use /* Comment...
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Hannes Reinecke Aug. 3, 2023, 10:08 a.m. UTC
When flushing the backlog after decoding each record in ->read_sock()
we may end up with really long records, causing a TCP window full
condition as the TCP window is only increased again after the record
has been processed. So we should rather process the record first,
allowing the TCP window to be increased again before flushing the
backlog.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 net/tls/tls_sw.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

Comments

Jakub Kicinski Aug. 5, 2023, 12:57 a.m. UTC | #1
On Thu,  3 Aug 2023 12:08:09 +0200 Hannes Reinecke wrote:
> When flushing the backlog after decoding each record in ->read_sock()
> we may end up with really long records, causing a TCP window full as
> the TCP window would only be increased again after we process the
> record. So we should rather process the record first to allow the
> TCP window to be increased again before flushing the backlog.

> -			released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
> -							  decrypted, &flushed_at);
>  			skb = darg.skb;
> +			/* TLS 1.3 may have updated the length by more than overhead */

> +			rxm = strp_msg(skb);
> +			tlm = tls_msg(skb);
>  			decrypted += rxm->full_len;
>  
>  			tls_rx_rec_done(ctx);
> @@ -2280,6 +2275,12 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
>  			goto read_sock_requeue;
>  		}
>  		copied += used;
> +		/*
> +		 * flush backlog after processing the TLS record, otherwise we might
> +		 * end up with really large records and triggering a TCP window full.
> +		 */
> +		released = tls_read_flush_backlog(sk, prot, decrypted - copied, decrypted,
> +						  copied, &flushed_at);

I'm surprised moving the flushing out makes a difference.
rx_list should generally hold at most 1 skb (16kB) unless something 
is PEEKing the data.

Looking at it closer I think the problem may be the calling args to
tls_read_flush_backlog(). Since we don't know how much data the
reader wants, we can't sensibly evaluate the first condition,
so how would it work if instead of this patch we did:

-			released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
+			released = tls_read_flush_backlog(sk, prot, INT_MAX, 0,
							  decrypted, &flushed_at);

That would give us a flush every 128k of data (or every record if
inq is shorter than 16kB).

side note - I still prefer 80 char max lines, please. It seems to result
in prettier code overall as it forces people to think more about code
structure.

>  		if (used < rxm->full_len) {
>  			rxm->offset += used;
>  			rxm->full_len -= used;
Sagi Grimberg Aug. 7, 2023, 7:08 a.m. UTC | #2
>> When flushing the backlog after decoding each record in ->read_sock()
>> we may end up with really long records, causing a TCP window full as
>> the TCP window would only be increased again after we process the
>> record. So we should rather process the record first to allow the
>> TCP window to be increased again before flushing the backlog.
> 
>> -			released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
>> -							  decrypted, &flushed_at);
>>   			skb = darg.skb;
>> +			/* TLS 1.3 may have updated the length by more than overhead */
> 
>> +			rxm = strp_msg(skb);
>> +			tlm = tls_msg(skb);
>>   			decrypted += rxm->full_len;
>>   
>>   			tls_rx_rec_done(ctx);
>> @@ -2280,6 +2275,12 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
>>   			goto read_sock_requeue;
>>   		}
>>   		copied += used;
>> +		/*
>> +		 * flush backlog after processing the TLS record, otherwise we might
>> +		 * end up with really large records and triggering a TCP window full.
>> +		 */
>> +		released = tls_read_flush_backlog(sk, prot, decrypted - copied, decrypted,
>> +						  copied, &flushed_at);
> 
> I'm surprised moving the flushing out makes a difference.
> rx_list should generally hold at most 1 skb (16kB) unless something
> is PEEKing the data.
> 
> Looking at it closer I think the problem may be calling args to
> tls_read_flush_backlog(). Since we don't know how much data
> reader wants we can't sensibly evaluate the first condition,
> so how would it work if instead of this patch we did:
> 
> -			released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
> +			released = tls_read_flush_backlog(sk, prot, INT_MAX, 0,
> 							  decrypted, &flushed_at);
> 
> That would give us a flush every 128k of data (or every record if
> inq is shorter than 16kB).

What happens if the window is smaller than 128K? Isn't that what
Hannes is trying to solve for?

Hannes, do you have some absolute numbers on how the window behaves?

Patch

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 9c1f13541708..57db189b29b0 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2240,7 +2240,6 @@  int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 			tlm = tls_msg(skb);
 		} else {
 			struct tls_decrypt_arg darg;
-			int to_decrypt;
 
 			err = tls_rx_rec_wait(sk, NULL, true, released);
 			if (err <= 0)
@@ -2248,20 +2247,16 @@  int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 
 			memset(&darg.inargs, 0, sizeof(darg.inargs));
 
-			rxm = strp_msg(tls_strp_msg(ctx));
-			tlm = tls_msg(tls_strp_msg(ctx));
-
-			to_decrypt = rxm->full_len - prot->overhead_size;
-
 			err = tls_rx_one_record(sk, NULL, &darg);
 			if (err < 0) {
 				tls_err_abort(sk, -EBADMSG);
 				goto read_sock_end;
 			}
 
-			released = tls_read_flush_backlog(sk, prot, rxm->full_len, to_decrypt,
-							  decrypted, &flushed_at);
 			skb = darg.skb;
+			/* TLS 1.3 may have updated the length by more than overhead */
+			rxm = strp_msg(skb);
+			tlm = tls_msg(skb);
 			decrypted += rxm->full_len;
 
 			tls_rx_rec_done(ctx);
@@ -2280,6 +2275,12 @@  int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 			goto read_sock_requeue;
 		}
 		copied += used;
+		/* Flush backlog after processing the TLS record, otherwise we
+		 * might end up with really large records and trigger a TCP
+		 * window full.
+		 */
+		released = tls_read_flush_backlog(sk, prot, decrypted - copied,
+						  decrypted, copied, &flushed_at);
 		if (used < rxm->full_len) {
 			rxm->offset += used;
 			rxm->full_len -= used;