diff mbox series

[net] tls: Skip tls_append_frag on zero copy size

Message ID 20220413134956.3258530-1-maximmi@nvidia.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] tls: Skip tls_append_frag on zero copy size | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Maxim Mikityanskiy April 13, 2022, 1:49 p.m. UTC
Calling tls_append_frag when max_open_record_len == record->len might
add an empty fragment to the TLS record if the call happens to be on the
page boundary. Normally tls_append_frag coalesces the zero-sized
fragment to the previous one, but not if it's on page boundary.

If a resync happens then, the mlx5 driver posts dump WQEs in
tx_post_resync_dump, and the empty fragment may become a data segment
with byte_count == 0, which will confuse the NIC and lead to a CQE
error.

This commit fixes the described issue by skipping tls_append_frag on
zero size to avoid adding empty fragments. The fix is not in the driver,
because an empty fragment is hardly the desired behavior.

Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 net/tls/tls_device.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Comments

Jakub Kicinski April 14, 2022, 10:28 a.m. UTC | #1
On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote:
> Calling tls_append_frag when max_open_record_len == record->len might
> add an empty fragment to the TLS record if the call happens to be on the
> page boundary. Normally tls_append_frag coalesces the zero-sized
> fragment to the previous one, but not if it's on page boundary.
> 
> If a resync happens then, the mlx5 driver posts dump WQEs in
> tx_post_resync_dump, and the empty fragment may become a data segment
> with byte_count == 0, which will confuse the NIC and lead to a CQE
> error.
> 
> This commit fixes the described issue by skipping tls_append_frag on
> zero size to avoid adding empty fragments. The fix is not in the driver,
> because an empty fragment is hardly the desired behavior.
> 
> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  net/tls/tls_device.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index 12f7b56771d9..af875ad4a822 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk,
>  		copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
>  		copy = min_t(size_t, copy, (max_open_record_len - record->len));
>  
> -		rc = tls_device_copy_data(page_address(pfrag->page) +
> -					  pfrag->offset, copy, msg_iter);
> -		if (rc)
> -			goto handle_error;
> -		tls_append_frag(record, pfrag, copy);
> +		if (copy) {
> +			rc = tls_device_copy_data(page_address(pfrag->page) +
> +						  pfrag->offset, copy, msg_iter);
> +			if (rc)
> +				goto handle_error;
> +			tls_append_frag(record, pfrag, copy);
> +		}

I appreciate you're likely trying to keep the fix minimal but Greg
always says "fix it right, worry about backports later".

I think we should skip more, we can reorder the mins and if 
min(size, rec space) == 0 then we can skip the allocation as well.
Maybe some application wants to do zero-length sends to flush the
MSG_MORE and would benefit that way?

>  		size -= copy;
>  		if (!size) {
Maxim Mikityanskiy April 18, 2022, 2:56 p.m. UTC | #2
On 2022-04-14 13:28, Jakub Kicinski wrote:
> On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote:
>> Calling tls_append_frag when max_open_record_len == record->len might
>> add an empty fragment to the TLS record if the call happens to be on the
>> page boundary. Normally tls_append_frag coalesces the zero-sized
>> fragment to the previous one, but not if it's on page boundary.
>>
>> If a resync happens then, the mlx5 driver posts dump WQEs in
>> tx_post_resync_dump, and the empty fragment may become a data segment
>> with byte_count == 0, which will confuse the NIC and lead to a CQE
>> error.
>>
>> This commit fixes the described issue by skipping tls_append_frag on
>> zero size to avoid adding empty fragments. The fix is not in the driver,
>> because an empty fragment is hardly the desired behavior.
>>
>> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
>> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>>   net/tls/tls_device.c | 12 +++++++-----
>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index 12f7b56771d9..af875ad4a822 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk,
>>   		copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
>>   		copy = min_t(size_t, copy, (max_open_record_len - record->len));
>>   
>> -		rc = tls_device_copy_data(page_address(pfrag->page) +
>> -					  pfrag->offset, copy, msg_iter);
>> -		if (rc)
>> -			goto handle_error;
>> -		tls_append_frag(record, pfrag, copy);
>> +		if (copy) {
>> +			rc = tls_device_copy_data(page_address(pfrag->page) +
>> +						  pfrag->offset, copy, msg_iter);
>> +			if (rc)
>> +				goto handle_error;
>> +			tls_append_frag(record, pfrag, copy);
>> +		}
> 
> I appreciate you're likely trying to keep the fix minimal but Greg
> always says "fix it right, worry about backports later".
> 
> I think we should skip more, we can reorder the mins and if
> min(size, rec space) == 0 then we can skip the allocation as well.

Sorry, I didn't get the idea. Could you elaborate?

Reordering the mins:

copy = min_t(size_t, size, max_open_record_len - record->len);
copy = min_t(size_t, copy, pfrag->size - pfrag->offset);

I assume by skipping the allocation you mean skipping 
tls_do_allocation(), right? Do you suggest to skip it if the result of 
the first min_t() is 0?

record->len used in the first min_t() comes from ctx->open_record, which 
either exists or is allocated by tls_do_allocation(). If we move the 
copy == 0 check above the tls_do_allocation() call, first we'll have to 
check whether ctx->open_record is NULL, which is currently checked by 
tls_do_allocation() itself.

If open_record is not NULL, there isn't much to skip in 
tls_do_allocation on copy == 0, the main part is already skipped, 
regardless of the value of copy. If open_record is NULL, we can't skip 
tls_do_allocation, and copy won't be 0 afterwards.

To compare, before (pseudocode):

tls_do_allocation {
     if (!ctx->open_record)
         ALLOCATE RECORD
         Now ctx->open_record is not NULL
     if (!sk_page_frag_refill(sk, pfrag))
         return -ENOMEM
}
handle errors from tls_do_allocation
copy = min(size, pfrag->size - pfrag->offset)
copy = min(copy, max_open_record_len - ctx->open_record->len)
if (copy)
     copy data and append frag

After:

if (ctx->open_record) {
     copy = min(size, max_open_record_len - ctx->open_record->len)
     if (copy) {
         // You want to put this part of tls_do_allocation under if (copy)?
         if (!sk_page_frag_refill(sk, pfrag))
             handle errors
         copy = min(copy, pfrag->size - pfrag->offset)
         if (copy)
             copy data and append frag
     }
} else {
     ALLOCATE RECORD
     if (!sk_page_frag_refill(sk, pfrag))
         handle errors
     // Have to do this after the allocation anyway.
     copy = min(size, max_open_record_len - ctx->open_record->len)
     copy = min(copy, pfrag->size - pfrag->offset)
     if (copy)
         copy data and append frag
}

Either I totally don't get what you suggested, or it doesn't make sense 
to me, because we have +1 branch in the common path when a record is 
open and copy is not 0, no changes when there is no record, and more 
repeating code hard to compress.

If I missed your idea, please explain in more details.

> Maybe some application wants to do zero-length sends to flush the
> MSG_MORE and would benefit that way?

If it's a zero-length send, it means that size is 0 initially, and 
max_open_record_len - ctx->open_record->len isn't 0 (otherwise the 
record would have been closed at a previous iteration). That doesn't 
sound related to swapping the mins and skipping tls_do_allocation on 
copy == 0.

Thanks,
Max

>>   		size -= copy;
>>   		if (!size) {
>
Maxim Mikityanskiy April 21, 2022, 9:47 a.m. UTC | #3
On 2022-04-18 17:56, Maxim Mikityanskiy wrote:
> On 2022-04-14 13:28, Jakub Kicinski wrote:
>> On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote:
>>> Calling tls_append_frag when max_open_record_len == record->len might
>>> add an empty fragment to the TLS record if the call happens to be on the
>>> page boundary. Normally tls_append_frag coalesces the zero-sized
>>> fragment to the previous one, but not if it's on page boundary.
>>>
>>> If a resync happens then, the mlx5 driver posts dump WQEs in
>>> tx_post_resync_dump, and the empty fragment may become a data segment
>>> with byte_count == 0, which will confuse the NIC and lead to a CQE
>>> error.
>>>
>>> This commit fixes the described issue by skipping tls_append_frag on
>>> zero size to avoid adding empty fragments. The fix is not in the driver,
>>> because an empty fragment is hardly the desired behavior.
>>>
>>> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
>>> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
>>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>>> ---
>>>   net/tls/tls_device.c | 12 +++++++-----
>>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>>> index 12f7b56771d9..af875ad4a822 100644
>>> --- a/net/tls/tls_device.c
>>> +++ b/net/tls/tls_device.c
>>> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk,
>>>           copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
>>>           copy = min_t(size_t, copy, (max_open_record_len - 
>>> record->len));
>>> -        rc = tls_device_copy_data(page_address(pfrag->page) +
>>> -                      pfrag->offset, copy, msg_iter);
>>> -        if (rc)
>>> -            goto handle_error;
>>> -        tls_append_frag(record, pfrag, copy);
>>> +        if (copy) {
>>> +            rc = tls_device_copy_data(page_address(pfrag->page) +
>>> +                          pfrag->offset, copy, msg_iter);
>>> +            if (rc)
>>> +                goto handle_error;
>>> +            tls_append_frag(record, pfrag, copy);
>>> +        }
>>
>> I appreciate you're likely trying to keep the fix minimal but Greg
>> always says "fix it right, worry about backports later".
>>
>> I think we should skip more, we can reorder the mins and if
>> min(size, rec space) == 0 then we can skip the allocation as well.
> 
> Sorry, I didn't get the idea. Could you elaborate?
> 
> Reordering the mins:
> 
> copy = min_t(size_t, size, max_open_record_len - record->len);
> copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
> 
> I assume by skipping the allocation you mean skipping 
> tls_do_allocation(), right? Do you suggest to skip it if the result of 
> the first min_t() is 0?
> 
> record->len used in the first min_t() comes from ctx->open_record, which 
> either exists or is allocated by tls_do_allocation(). If we move the 
> copy == 0 check above the tls_do_allocation() call, first we'll have to 
> check whether ctx->open_record is NULL, which is currently checked by 
> tls_do_allocation() itself.
> 
> If open_record is not NULL, there isn't much to skip in 
> tls_do_allocation on copy == 0, the main part is already skipped, 
> regardless of the value of copy. If open_record is NULL, we can't skip 
> tls_do_allocation, and copy won't be 0 afterwards.
> 
> To compare, before (pseudocode):
> 
> tls_do_allocation {
>      if (!ctx->open_record)
>          ALLOCATE RECORD
>          Now ctx->open_record is not NULL
>      if (!sk_page_frag_refill(sk, pfrag))
>          return -ENOMEM
> }
> handle errors from tls_do_allocation
> copy = min(size, pfrag->size - pfrag->offset)
> copy = min(copy, max_open_record_len - ctx->open_record->len)
> if (copy)
>      copy data and append frag
> 
> After:
> 
> if (ctx->open_record) {
>      copy = min(size, max_open_record_len - ctx->open_record->len)
>      if (copy) {
>          // You want to put this part of tls_do_allocation under if (copy)?
>          if (!sk_page_frag_refill(sk, pfrag))
>              handle errors
>          copy = min(copy, pfrag->size - pfrag->offset)
>          if (copy)
>              copy data and append frag
>      }
> } else {
>      ALLOCATE RECORD
>      if (!sk_page_frag_refill(sk, pfrag))
>          handle errors
>      // Have to do this after the allocation anyway.
>      copy = min(size, max_open_record_len - ctx->open_record->len)
>      copy = min(copy, pfrag->size - pfrag->offset)
>      if (copy)
>          copy data and append frag
> }
> 
> Either I totally don't get what you suggested, or it doesn't make sense 
> to me, because we have +1 branch in the common path when a record is 
> open and copy is not 0, no changes when there is no record, and more 
> repeating code hard to compress.
> 
> If I missed your idea, please explain in more details.

Jakub, is your comment still relevant after my response? If not, can the 
patch be merged?

>> Maybe some application wants to do zero-length sends to flush the
>> MSG_MORE and would benefit that way?
> 
> If it's a zero-length send, it means that size is 0 initially, and 
> max_open_record_len - ctx->open_record->len isn't 0 (otherwise the 
> record would have been closed at a previous iteration). That doesn't 
> sound related to swapping the mins and skipping tls_do_allocation on 
> copy == 0.
> 
> Thanks,
> Max
> 
>>>           size -= copy;
>>>           if (!size) {
>>
>
Jakub Kicinski April 22, 2022, 2:55 p.m. UTC | #4
On Thu, 21 Apr 2022 12:47:18 +0300 Maxim Mikityanskiy wrote:
> On 2022-04-18 17:56, Maxim Mikityanskiy wrote:
> > On 2022-04-14 13:28, Jakub Kicinski wrote:  
> >> I appreciate you're likely trying to keep the fix minimal but Greg
> >> always says "fix it right, worry about backports later".
> >>
> >> I think we should skip more, we can reorder the mins and if
> >> min(size, rec space) == 0 then we can skip the allocation as well.  
> > 
> > Sorry, I didn't get the idea. Could you elaborate?
> > 
> > Reordering the mins:
> > 
> > copy = min_t(size_t, size, max_open_record_len - record->len);
> > copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
> > 
> > I assume by skipping the allocation you mean skipping 
> > tls_do_allocation(), right? Do you suggest to skip it if the result of 
> > the first min_t() is 0?
> > 
> > record->len used in the first min_t() comes from ctx->open_record, which 
> > either exists or is allocated by tls_do_allocation(). If we move the 
> > copy == 0 check above the tls_do_allocation() call, first we'll have to 
> > check whether ctx->open_record is NULL, which is currently checked by 
> > tls_do_allocation() itself.
> > 
> > If open_record is not NULL, there isn't much to skip in 
> > tls_do_allocation on copy == 0, the main part is already skipped, 
> > regardless of the value of copy. If open_record is NULL, we can't skip 
> > tls_do_allocation, and copy won't be 0 afterwards.
> > 
> > To compare, before (pseudocode):
> > 
> > tls_do_allocation {
> >      if (!ctx->open_record)
> >          ALLOCATE RECORD
> >          Now ctx->open_record is not NULL
> >      if (!sk_page_frag_refill(sk, pfrag))
> >          return -ENOMEM
> > }
> > handle errors from tls_do_allocation
> > copy = min(size, pfrag->size - pfrag->offset)
> > copy = min(copy, max_open_record_len - ctx->open_record->len)
> > if (copy)
> >      copy data and append frag
> > 
> > After:
> > 
> > if (ctx->open_record) {
> >      copy = min(size, max_open_record_len - ctx->open_record->len)
> >      if (copy) {
> >          // You want to put this part of tls_do_allocation under if (copy)?
> >          if (!sk_page_frag_refill(sk, pfrag))
> >              handle errors
> >          copy = min(copy, pfrag->size - pfrag->offset)
> >          if (copy)
> >              copy data and append frag
> >      }
> > } else {
> >      ALLOCATE RECORD
> >      if (!sk_page_frag_refill(sk, pfrag))
> >          handle errors
> >      // Have to do this after the allocation anyway.
> >      copy = min(size, max_open_record_len - ctx->open_record->len)
> >      copy = min(copy, pfrag->size - pfrag->offset)
> >      if (copy)
> >          copy data and append frag
> > }
> > 
> > Either I totally don't get what you suggested, or it doesn't make sense 
> > to me, because we have +1 branch in the common path when a record is 
> > open and copy is not 0, no changes when there is no record, and more 
> > repeating code hard to compress.
> > 
> > If I missed your idea, please explain in more details.  
> 
> Jakub, is your comment still relevant after my response? If not, can the 
> patch be merged?

I'd prefer if you refactored the code so tls_push_data() looks more
natural. But the patch is correct so if you don't want to you can
repost.

Sorry for the delay.
Maxim Mikityanskiy April 26, 2022, 3:48 p.m. UTC | #5
On 2022-04-22 17:55, Jakub Kicinski wrote:
> On Thu, 21 Apr 2022 12:47:18 +0300 Maxim Mikityanskiy wrote:
>> On 2022-04-18 17:56, Maxim Mikityanskiy wrote:
>>> On 2022-04-14 13:28, Jakub Kicinski wrote:
>>>> I appreciate you're likely trying to keep the fix minimal but Greg
>>>> always says "fix it right, worry about backports later".
>>>>
>>>> I think we should skip more, we can reorder the mins and if
>>>> min(size, rec space) == 0 then we can skip the allocation as well.
>>>
>>> Sorry, I didn't get the idea. Could you elaborate?
>>>
>>> Reordering the mins:
>>>
>>> copy = min_t(size_t, size, max_open_record_len - record->len);
>>> copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
>>>
>>> I assume by skipping the allocation you mean skipping
>>> tls_do_allocation(), right? Do you suggest to skip it if the result of
>>> the first min_t() is 0?
>>>
>>> record->len used in the first min_t() comes from ctx->open_record, which
>>> either exists or is allocated by tls_do_allocation(). If we move the
>>> copy == 0 check above the tls_do_allocation() call, first we'll have to
>>> check whether ctx->open_record is NULL, which is currently checked by
>>> tls_do_allocation() itself.
>>>
>>> If open_record is not NULL, there isn't much to skip in
>>> tls_do_allocation on copy == 0, the main part is already skipped,
>>> regardless of the value of copy. If open_record is NULL, we can't skip
>>> tls_do_allocation, and copy won't be 0 afterwards.
>>>
>>> To compare, before (pseudocode):
>>>
>>> tls_do_allocation {
>>>       if (!ctx->open_record)
>>>           ALLOCATE RECORD
>>>           Now ctx->open_record is not NULL
>>>       if (!sk_page_frag_refill(sk, pfrag))
>>>           return -ENOMEM
>>> }
>>> handle errors from tls_do_allocation
>>> copy = min(size, pfrag->size - pfrag->offset)
>>> copy = min(copy, max_open_record_len - ctx->open_record->len)
>>> if (copy)
>>>       copy data and append frag
>>>
>>> After:
>>>
>>> if (ctx->open_record) {
>>>       copy = min(size, max_open_record_len - ctx->open_record->len)
>>>       if (copy) {
>>>           // You want to put this part of tls_do_allocation under if (copy)?
>>>           if (!sk_page_frag_refill(sk, pfrag))
>>>               handle errors
>>>           copy = min(copy, pfrag->size - pfrag->offset)
>>>           if (copy)
>>>               copy data and append frag
>>>       }
>>> } else {
>>>       ALLOCATE RECORD
>>>       if (!sk_page_frag_refill(sk, pfrag))
>>>           handle errors
>>>       // Have to do this after the allocation anyway.
>>>       copy = min(size, max_open_record_len - ctx->open_record->len)
>>>       copy = min(copy, pfrag->size - pfrag->offset)
>>>       if (copy)
>>>           copy data and append frag
>>> }
>>>
>>> Either I totally don't get what you suggested, or it doesn't make sense
>>> to me, because we have +1 branch in the common path when a record is
>>> open and copy is not 0, no changes when there is no record, and more
>>> repeating code hard to compress.
>>>
>>> If I missed your idea, please explain in more details.
>>
>> Jakub, is your comment still relevant after my response? If not, can the
>> patch be merged?
> 
> I'd prefer if you refactored the code so tls_push_data() looks more
> natural.

I would be happy to improve the code, but I honestly didn't understand 
your idea. My attempt to understand it only made the code worse.

> But the patch is correct so if you don't want to you can
> repost.

OK, I'm resubmitting as is, but in case you find time to elaborate on 
your refactoring idea, I'm still open to suggestions.

Thanks.

> Sorry for the delay.
diff mbox series

Patch

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 12f7b56771d9..af875ad4a822 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -483,11 +483,13 @@  static int tls_push_data(struct sock *sk,
 		copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
 		copy = min_t(size_t, copy, (max_open_record_len - record->len));
 
-		rc = tls_device_copy_data(page_address(pfrag->page) +
-					  pfrag->offset, copy, msg_iter);
-		if (rc)
-			goto handle_error;
-		tls_append_frag(record, pfrag, copy);
+		if (copy) {
+			rc = tls_device_copy_data(page_address(pfrag->page) +
+						  pfrag->offset, copy, msg_iter);
+			if (rc)
+				goto handle_error;
+			tls_append_frag(record, pfrag, copy);
+		}
 
 		size -= copy;
 		if (!size) {