Message ID | 20220413134956.3258530-1-maximmi@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] tls: Skip tls_append_frag on zero copy size | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Clearly marked for net |
netdev/fixes_present | success | Fixes tag present in non-next series |
netdev/subject_prefix | success | Link |
netdev/cover_letter | success | Single patches do not need cover letters |
netdev/patch_count | success | Link |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/cc_maintainers | success | CCed 9 of 9 maintainers |
netdev/build_clang | success | Errors and warnings before: 0 this patch: 0 |
netdev/module_param | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/verify_fixes | success | Fixes tag looks correct |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/checkpatch | warning | WARNING: line length of 81 exceeds 80 columns |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote: > Calling tls_append_frag when max_open_record_len == record->len might > add an empty fragment to the TLS record if the call happens to be on the > page boundary. Normally tls_append_frag coalesces the zero-sized > fragment to the previous one, but not if it's on page boundary. > > If a resync happens then, the mlx5 driver posts dump WQEs in > tx_post_resync_dump, and the empty fragment may become a data segment > with byte_count == 0, which will confuse the NIC and lead to a CQE > error. > > This commit fixes the described issue by skipping tls_append_frag on > zero size to avoid adding empty fragments. The fix is not in the driver, > because an empty fragment is hardly the desired behavior. > > Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") > Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> > Reviewed-by: Tariq Toukan <tariqt@nvidia.com> > --- > net/tls/tls_device.c | 12 +++++++----- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c > index 12f7b56771d9..af875ad4a822 100644 > --- a/net/tls/tls_device.c > +++ b/net/tls/tls_device.c > @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk, > copy = min_t(size_t, size, (pfrag->size - pfrag->offset)); > copy = min_t(size_t, copy, (max_open_record_len - record->len)); > > - rc = tls_device_copy_data(page_address(pfrag->page) + > - pfrag->offset, copy, msg_iter); > - if (rc) > - goto handle_error; > - tls_append_frag(record, pfrag, copy); > + if (copy) { > + rc = tls_device_copy_data(page_address(pfrag->page) + > + pfrag->offset, copy, msg_iter); > + if (rc) > + goto handle_error; > + tls_append_frag(record, pfrag, copy); > + } I appreciate you're likely trying to keep the fix minimal but Greg always says "fix it right, worry about backports later". I think we should skip more, we can reorder the mins and if min(size, rec space) == 0 then we can skip the allocation as well. Maybe some application wants to do zero-length sends to flush the MSG_MORE and would benefit that way? > size -= copy; > if (!size) {
On 2022-04-14 13:28, Jakub Kicinski wrote: > On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote: >> Calling tls_append_frag when max_open_record_len == record->len might >> add an empty fragment to the TLS record if the call happens to be on the >> page boundary. Normally tls_append_frag coalesces the zero-sized >> fragment to the previous one, but not if it's on page boundary. >> >> If a resync happens then, the mlx5 driver posts dump WQEs in >> tx_post_resync_dump, and the empty fragment may become a data segment >> with byte_count == 0, which will confuse the NIC and lead to a CQE >> error. >> >> This commit fixes the described issue by skipping tls_append_frag on >> zero size to avoid adding empty fragments. The fix is not in the driver, >> because an empty fragment is hardly the desired behavior. >> >> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") >> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> >> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> >> --- >> net/tls/tls_device.c | 12 +++++++----- >> 1 file changed, 7 insertions(+), 5 deletions(-) >> >> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c >> index 12f7b56771d9..af875ad4a822 100644 >> --- a/net/tls/tls_device.c >> +++ b/net/tls/tls_device.c >> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk, >> copy = min_t(size_t, size, (pfrag->size - pfrag->offset)); >> copy = min_t(size_t, copy, (max_open_record_len - record->len)); >> >> - rc = tls_device_copy_data(page_address(pfrag->page) + >> - pfrag->offset, copy, msg_iter); >> - if (rc) >> - goto handle_error; >> - tls_append_frag(record, pfrag, copy); >> + if (copy) { >> + rc = tls_device_copy_data(page_address(pfrag->page) + >> + pfrag->offset, copy, msg_iter); >> + if (rc) >> + goto handle_error; >> + tls_append_frag(record, pfrag, copy); >> + } > > I appreciate you're likely trying to keep the fix minimal but Greg > always says "fix it right, worry about backports later". > > I think we should skip more, we can reorder the mins and if > min(size, rec space) == 0 then we can skip the allocation as well. Sorry, I didn't get the idea. Could you elaborate? Reordering the mins: copy = min_t(size_t, size, max_open_record_len - record->len); copy = min_t(size_t, copy, pfrag->size - pfrag->offset); I assume by skipping the allocation you mean skipping tls_do_allocation(), right? Do you suggest to skip it if the result of the first min_t() is 0? record->len used in the first min_t() comes from ctx->open_record, which either exists or is allocated by tls_do_allocation(). If we move the copy == 0 check above the tls_do_allocation() call, first we'll have to check whether ctx->open_record is NULL, which is currently checked by tls_do_allocation() itself. If open_record is not NULL, there isn't much to skip in tls_do_allocation on copy == 0, the main part is already skipped, regardless of the value of copy. If open_record is NULL, we can't skip tls_do_allocation, and copy won't be 0 afterwards. To compare, before (pseudocode): tls_do_allocation { if (!ctx->open_record) ALLOCATE RECORD Now ctx->open_record is not NULL if (!sk_page_frag_refill(sk, pfrag)) return -ENOMEM } handle errors from tls_do_allocation copy = min(size, pfrag->size - pfrag->offset) copy = min(copy, max_open_record_len - ctx->open_record->len) if (copy) copy data and append frag After: if (ctx->open_record) { copy = min(size, max_open_record_len - ctx->open_record->len) if (copy) { // You want to put this part of tls_do_allocation under if (copy)? if (!sk_page_frag_refill(sk, pfrag)) handle errors copy = min(copy, pfrag->size - pfrag->offset) if (copy) copy data and append frag } } else { ALLOCATE RECORD if (!sk_page_frag_refill(sk, pfrag)) handle errors // Have to do this after the allocation anyway. copy = min(size, max_open_record_len - ctx->open_record->len) copy = min(copy, pfrag->size - pfrag->offset) if (copy) copy data and append frag } Either I totally don't get what you suggested, or it doesn't make sense to me, because we have +1 branch in the common path when a record is open and copy is not 0, no changes when there is no record, and more repeating code hard to compress. If I missed your idea, please explain in more details. > Maybe some application wants to do zero-length sends to flush the > MSG_MORE and would benefit that way? If it's a zero-length send, it means that size is 0 initially, and max_open_record_len - ctx->open_record->len isn't 0 (otherwise the record would have been closed at a previous iteration). That doesn't sound related to swapping the mins and skipping tls_do_allocation on copy == 0. Thanks, Max >> size -= copy; >> if (!size) { >
On 2022-04-18 17:56, Maxim Mikityanskiy wrote: > On 2022-04-14 13:28, Jakub Kicinski wrote: >> On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote: >>> Calling tls_append_frag when max_open_record_len == record->len might >>> add an empty fragment to the TLS record if the call happens to be on the >>> page boundary. Normally tls_append_frag coalesces the zero-sized >>> fragment to the previous one, but not if it's on page boundary. >>> >>> If a resync happens then, the mlx5 driver posts dump WQEs in >>> tx_post_resync_dump, and the empty fragment may become a data segment >>> with byte_count == 0, which will confuse the NIC and lead to a CQE >>> error. >>> >>> This commit fixes the described issue by skipping tls_append_frag on >>> zero size to avoid adding empty fragments. The fix is not in the driver, >>> because an empty fragment is hardly the desired behavior. >>> >>> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") >>> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> >>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> >>> --- >>> net/tls/tls_device.c | 12 +++++++----- >>> 1 file changed, 7 insertions(+), 5 deletions(-) >>> >>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c >>> index 12f7b56771d9..af875ad4a822 100644 >>> --- a/net/tls/tls_device.c >>> +++ b/net/tls/tls_device.c >>> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk, >>> copy = min_t(size_t, size, (pfrag->size - pfrag->offset)); >>> copy = min_t(size_t, copy, (max_open_record_len - >>> record->len)); >>> - rc = tls_device_copy_data(page_address(pfrag->page) + >>> - pfrag->offset, copy, msg_iter); >>> - if (rc) >>> - goto handle_error; >>> - tls_append_frag(record, pfrag, copy); >>> + if (copy) { >>> + rc = tls_device_copy_data(page_address(pfrag->page) + >>> + pfrag->offset, copy, msg_iter); >>> + if (rc) >>> + goto handle_error; >>> + tls_append_frag(record, pfrag, copy); >>> + } >> >> I appreciate you're likely trying to keep the fix minimal but Greg >> always says "fix it right, worry about backports later". >> >> I think we should skip more, we can reorder the mins and if >> min(size, rec space) == 0 then we can skip the allocation as well. > > Sorry, I didn't get the idea. Could you elaborate? > > Reordering the mins: > > copy = min_t(size_t, size, max_open_record_len - record->len); > copy = min_t(size_t, copy, pfrag->size - pfrag->offset); > > I assume by skipping the allocation you mean skipping > tls_do_allocation(), right? Do you suggest to skip it if the result of > the first min_t() is 0? > > record->len used in the first min_t() comes from ctx->open_record, which > either exists or is allocated by tls_do_allocation(). If we move the > copy == 0 check above the tls_do_allocation() call, first we'll have to > check whether ctx->open_record is NULL, which is currently checked by > tls_do_allocation() itself. > > If open_record is not NULL, there isn't much to skip in > tls_do_allocation on copy == 0, the main part is already skipped, > regardless of the value of copy. If open_record is NULL, we can't skip > tls_do_allocation, and copy won't be 0 afterwards. > > To compare, before (pseudocode): > > tls_do_allocation { > if (!ctx->open_record) > ALLOCATE RECORD > Now ctx->open_record is not NULL > if (!sk_page_frag_refill(sk, pfrag)) > return -ENOMEM > } > handle errors from tls_do_allocation > copy = min(size, pfrag->size - pfrag->offset) > copy = min(copy, max_open_record_len - ctx->open_record->len) > if (copy) > copy data and append frag > > After: > > if (ctx->open_record) { > copy = min(size, max_open_record_len - ctx->open_record->len) > if (copy) { > // You want to put this part of tls_do_allocation under if (copy)? > if (!sk_page_frag_refill(sk, pfrag)) > handle errors > copy = min(copy, pfrag->size - pfrag->offset) > if (copy) > copy data and append frag > } > } else { > ALLOCATE RECORD > if (!sk_page_frag_refill(sk, pfrag)) > handle errors > // Have to do this after the allocation anyway. > copy = min(size, max_open_record_len - ctx->open_record->len) > copy = min(copy, pfrag->size - pfrag->offset) > if (copy) > copy data and append frag > } > > Either I totally don't get what you suggested, or it doesn't make sense > to me, because we have +1 branch in the common path when a record is > open and copy is not 0, no changes when there is no record, and more > repeating code hard to compress. > > If I missed your idea, please explain in more details. Jakub, is your comment still relevant after my response? If not, can the patch be merged? >> Maybe some application wants to do zero-length sends to flush the >> MSG_MORE and would benefit that way? > > If it's a zero-length send, it means that size is 0 initially, and > max_open_record_len - ctx->open_record->len isn't 0 (otherwise the > record would have been closed at a previous iteration). That doesn't > sound related to swapping the mins and skipping tls_do_allocation on > copy == 0. > > Thanks, > Max > >>> size -= copy; >>> if (!size) { >> >
On Thu, 21 Apr 2022 12:47:18 +0300 Maxim Mikityanskiy wrote: > On 2022-04-18 17:56, Maxim Mikityanskiy wrote: > > On 2022-04-14 13:28, Jakub Kicinski wrote: > >> I appreciate you're likely trying to keep the fix minimal but Greg > >> always says "fix it right, worry about backports later". > >> > >> I think we should skip more, we can reorder the mins and if > >> min(size, rec space) == 0 then we can skip the allocation as well. > > > > Sorry, I didn't get the idea. Could you elaborate? > > > > Reordering the mins: > > > > copy = min_t(size_t, size, max_open_record_len - record->len); > > copy = min_t(size_t, copy, pfrag->size - pfrag->offset); > > > > I assume by skipping the allocation you mean skipping > > tls_do_allocation(), right? Do you suggest to skip it if the result of > > the first min_t() is 0? > > > > record->len used in the first min_t() comes from ctx->open_record, which > > either exists or is allocated by tls_do_allocation(). If we move the > > copy == 0 check above the tls_do_allocation() call, first we'll have to > > check whether ctx->open_record is NULL, which is currently checked by > > tls_do_allocation() itself. > > > > If open_record is not NULL, there isn't much to skip in > > tls_do_allocation on copy == 0, the main part is already skipped, > > regardless of the value of copy. If open_record is NULL, we can't skip > > tls_do_allocation, and copy won't be 0 afterwards. > > > > To compare, before (pseudocode): > > > > tls_do_allocation { > > if (!ctx->open_record) > > ALLOCATE RECORD > > Now ctx->open_record is not NULL > > if (!sk_page_frag_refill(sk, pfrag)) > > return -ENOMEM > > } > > handle errors from tls_do_allocation > > copy = min(size, pfrag->size - pfrag->offset) > > copy = min(copy, max_open_record_len - ctx->open_record->len) > > if (copy) > > copy data and append frag > > > > After: > > > > if (ctx->open_record) { > > copy = min(size, max_open_record_len - ctx->open_record->len) > > if (copy) { > > // You want to put this part of tls_do_allocation under if (copy)? > > if (!sk_page_frag_refill(sk, pfrag)) > > handle errors > > copy = min(copy, pfrag->size - pfrag->offset) > > if (copy) > > copy data and append frag > > } > > } else { > > ALLOCATE RECORD > > if (!sk_page_frag_refill(sk, pfrag)) > > handle errors > > // Have to do this after the allocation anyway. > > copy = min(size, max_open_record_len - ctx->open_record->len) > > copy = min(copy, pfrag->size - pfrag->offset) > > if (copy) > > copy data and append frag > > } > > > > Either I totally don't get what you suggested, or it doesn't make sense > > to me, because we have +1 branch in the common path when a record is > > open and copy is not 0, no changes when there is no record, and more > > repeating code hard to compress. > > > > If I missed your idea, please explain in more details. > > Jakub, is your comment still relevant after my response? If not, can the > patch be merged? I'd prefer if you refactored the code so tls_push_data() looks more natural. But the patch is correct so if you don't want to you can repost. Sorry for the delay.
On 2022-04-22 17:55, Jakub Kicinski wrote: > On Thu, 21 Apr 2022 12:47:18 +0300 Maxim Mikityanskiy wrote: >> On 2022-04-18 17:56, Maxim Mikityanskiy wrote: >>> On 2022-04-14 13:28, Jakub Kicinski wrote: >>>> I appreciate you're likely trying to keep the fix minimal but Greg >>>> always says "fix it right, worry about backports later". >>>> >>>> I think we should skip more, we can reorder the mins and if >>>> min(size, rec space) == 0 then we can skip the allocation as well. >>> >>> Sorry, I didn't get the idea. Could you elaborate? >>> >>> Reordering the mins: >>> >>> copy = min_t(size_t, size, max_open_record_len - record->len); >>> copy = min_t(size_t, copy, pfrag->size - pfrag->offset); >>> >>> I assume by skipping the allocation you mean skipping >>> tls_do_allocation(), right? Do you suggest to skip it if the result of >>> the first min_t() is 0? >>> >>> record->len used in the first min_t() comes from ctx->open_record, which >>> either exists or is allocated by tls_do_allocation(). If we move the >>> copy == 0 check above the tls_do_allocation() call, first we'll have to >>> check whether ctx->open_record is NULL, which is currently checked by >>> tls_do_allocation() itself. >>> >>> If open_record is not NULL, there isn't much to skip in >>> tls_do_allocation on copy == 0, the main part is already skipped, >>> regardless of the value of copy. If open_record is NULL, we can't skip >>> tls_do_allocation, and copy won't be 0 afterwards. >>> >>> To compare, before (pseudocode): >>> >>> tls_do_allocation { >>> if (!ctx->open_record) >>> ALLOCATE RECORD >>> Now ctx->open_record is not NULL >>> if (!sk_page_frag_refill(sk, pfrag)) >>> return -ENOMEM >>> } >>> handle errors from tls_do_allocation >>> copy = min(size, pfrag->size - pfrag->offset) >>> copy = min(copy, max_open_record_len - ctx->open_record->len) >>> if (copy) >>> copy data and append frag >>> >>> After: >>> >>> if (ctx->open_record) { >>> copy = min(size, max_open_record_len - ctx->open_record->len) >>> if (copy) { >>> // You want to put this part of tls_do_allocation under if (copy)? >>> if (!sk_page_frag_refill(sk, pfrag)) >>> handle errors >>> copy = min(copy, pfrag->size - pfrag->offset) >>> if (copy) >>> copy data and append frag >>> } >>> } else { >>> ALLOCATE RECORD >>> if (!sk_page_frag_refill(sk, pfrag)) >>> handle errors >>> // Have to do this after the allocation anyway. >>> copy = min(size, max_open_record_len - ctx->open_record->len) >>> copy = min(copy, pfrag->size - pfrag->offset) >>> if (copy) >>> copy data and append frag >>> } >>> >>> Either I totally don't get what you suggested, or it doesn't make sense >>> to me, because we have +1 branch in the common path when a record is >>> open and copy is not 0, no changes when there is no record, and more >>> repeating code hard to compress. >>> >>> If I missed your idea, please explain in more details. >> >> Jakub, is your comment still relevant after my response? If not, can the >> patch be merged? > > I'd prefer if you refactored the code so tls_push_data() looks more > natural. I would be happy to improve the code, but I honestly didn't understand your idea. My attempt to understand it only made the code worse. > But the patch is correct so if you don't want to you can > repost. OK, I'm resubmitting as is, but in case you find time to elaborate on your refactoring idea, I'm still open to suggestions. Thanks. > Sorry for the delay.
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index 12f7b56771d9..af875ad4a822 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk, copy = min_t(size_t, size, (pfrag->size - pfrag->offset)); copy = min_t(size_t, copy, (max_open_record_len - record->len)); - rc = tls_device_copy_data(page_address(pfrag->page) + - pfrag->offset, copy, msg_iter); - if (rc) - goto handle_error; - tls_append_frag(record, pfrag, copy); + if (copy) { + rc = tls_device_copy_data(page_address(pfrag->page) + + pfrag->offset, copy, msg_iter); + if (rc) + goto handle_error; + tls_append_frag(record, pfrag, copy); + } size -= copy; if (!size) {