Message ID | 20230529134430.492879-1-parav@nvidia.com (mailing list archive) |
---|---|
State | Accepted |
Commit | b1f2abcf817d82b54764d1474424649feda6fe1b |
Delegated to: | Netdev Maintainers |
Series | [net-next] net: Make gro complete function to return void |
On Mon, May 29, 2023 at 04:44:30PM +0300, Parav Pandit wrote:
> tcp_gro_complete() function only updates the skb fields related to GRO
> and it always returns zero. All the 3 drivers which are using it
> do not check for the return value either.
>
> Change it to return void instead which simplifies its callers as
> error handling becomes unnecessary.
>
> Signed-off-by: Parav Pandit <parav@nvidia.com>

Reviewed-by: Simon Horman <simon.horman@corigine.com>
On 5/29/23 7:44 AM, Parav Pandit wrote:
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index 45dda7889387..88f9b0081ee7 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -296,7 +296,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
>  	return pp;
>  }
>
> -int tcp_gro_complete(struct sk_buff *skb)
> +void tcp_gro_complete(struct sk_buff *skb)
>  {
>  	struct tcphdr *th = tcp_hdr(skb);
>
> @@ -311,8 +311,6 @@ int tcp_gro_complete(struct sk_buff *skb)
>
>  	if (skb->encapsulation)
>  		skb->inner_transport_header = skb->transport_header;
> -
> -	return 0;
>  }
>  EXPORT_SYMBOL(tcp_gro_complete);

tcp_gro_complete seems fairly trivial. Any reason not to make it an inline
and avoid another function call in the datapath?

> @@ -342,7 +340,8 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
>  	if (NAPI_GRO_CB(skb)->is_atomic)
>  		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_FIXEDID;
>
> -	return tcp_gro_complete(skb);
> +	tcp_gro_complete(skb);
> +	return 0;
>  }
> From: David Ahern <dsahern@kernel.org>
> Sent: Tuesday, May 30, 2023 11:26 AM
>
> On 5/29/23 7:44 AM, Parav Pandit wrote:
> > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index
> > 45dda7889387..88f9b0081ee7 100644
> > --- a/net/ipv4/tcp_offload.c
> > +++ b/net/ipv4/tcp_offload.c
> > @@ -296,7 +296,7 @@ struct sk_buff *tcp_gro_receive(struct list_head
> *head, struct sk_buff *skb)
> >  	return pp;
> >  }
> >
> > -int tcp_gro_complete(struct sk_buff *skb)
> > +void tcp_gro_complete(struct sk_buff *skb)
> >  {
> >  	struct tcphdr *th = tcp_hdr(skb);
> >
> > @@ -311,8 +311,6 @@ int tcp_gro_complete(struct sk_buff *skb)
> >
> >  	if (skb->encapsulation)
> >  		skb->inner_transport_header = skb->transport_header;
> > -
> > -	return 0;
> >  }
> >  EXPORT_SYMBOL(tcp_gro_complete);
>
> tcp_gro_complete seems fairly trivial. Any reason not to make it an inline and
> avoid another function call in the datapath?
>
Sounds good to me.
With inline it should mostly improve the perf, but I do not have any of the 3 adapters which are calling this API to show perf results.

Since it is a different change touching performance, I prefer to do a follow-up patch that the bnx owners can test.
Is that ok?
On Tue, May 30, 2023 at 5:25 PM David Ahern <dsahern@kernel.org> wrote:
>
> On 5/29/23 7:44 AM, Parav Pandit wrote:
> > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> > index 45dda7889387..88f9b0081ee7 100644
> > --- a/net/ipv4/tcp_offload.c
> > +++ b/net/ipv4/tcp_offload.c
> > @@ -296,7 +296,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
> >  	return pp;
> >  }
> >
> > -int tcp_gro_complete(struct sk_buff *skb)
> > +void tcp_gro_complete(struct sk_buff *skb)
> >  {
> >  	struct tcphdr *th = tcp_hdr(skb);
> >
> > @@ -311,8 +311,6 @@ int tcp_gro_complete(struct sk_buff *skb)
> >
> >  	if (skb->encapsulation)
> >  		skb->inner_transport_header = skb->transport_header;
> > -
> > -	return 0;
> >  }
> >  EXPORT_SYMBOL(tcp_gro_complete);
>
> tcp_gro_complete seems fairly trivial. Any reason not to make it an
> inline and avoid another function call in the datapath?

Probably, although it is a regular function call, not an indirect one.

In the grand total of driver rx napi + GRO cost, saving a few cycles
per GRO completed packet is quite small.
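For context on "a regular function call, not an indirect one": the protocol-level gro_complete hook is an indirect call, which the INDIRECT_CALL_*() wrappers try to devirtualize on retpoline builds, whereas tcp_gro_complete() is reached from tcp4_gro_complete()/tcp6_gro_complete() as an ordinary direct call. A simplified sketch of that call chain (not a literal copy of net/ipv4/af_inet.c; the function name is hypothetical):

/* Simplified dispatch; assumes <linux/indirect_call_wrapper.h> and the
 * usual offload declarations are in scope.
 */
static int inet_gro_complete_sketch(const struct net_offload *ops,
				    struct sk_buff *skb, int nhoff)
{
	/* Indirect call through the offload table, devirtualized to the
	 * two most likely targets when retpolines are enabled.
	 * tcp4_gro_complete() then calls tcp_gro_complete() directly,
	 * which is the call being discussed for inlining.
	 */
	return INDIRECT_CALL_2(ops->callbacks.gro_complete,
			       tcp4_gro_complete, udp4_gro_complete,
			       skb, nhoff + sizeof(struct iphdr));
}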
On 5/30/23 9:39 AM, Parav Pandit wrote:
>> From: David Ahern <dsahern@kernel.org>
>> Sent: Tuesday, May 30, 2023 11:26 AM
>>
>> On 5/29/23 7:44 AM, Parav Pandit wrote:
>>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index
>>> 45dda7889387..88f9b0081ee7 100644
>>> --- a/net/ipv4/tcp_offload.c
>>> +++ b/net/ipv4/tcp_offload.c
>>> @@ -296,7 +296,7 @@ struct sk_buff *tcp_gro_receive(struct list_head
>> *head, struct sk_buff *skb)
>>>  	return pp;
>>>  }
>>>
>>> -int tcp_gro_complete(struct sk_buff *skb)
>>> +void tcp_gro_complete(struct sk_buff *skb)
>>>  {
>>>  	struct tcphdr *th = tcp_hdr(skb);
>>>
>>> @@ -311,8 +311,6 @@ int tcp_gro_complete(struct sk_buff *skb)
>>>
>>>  	if (skb->encapsulation)
>>>  		skb->inner_transport_header = skb->transport_header;
>>> -
>>> -	return 0;
>>>  }
>>>  EXPORT_SYMBOL(tcp_gro_complete);
>>
>> tcp_gro_complete seems fairly trivial. Any reason not to make it an inline and
>> avoid another function call in the datapath?
>>
> Sounds good to me.
> With inline it should mostly improve the perf, but I do not have any of the 3 adapters which are calling this API to show perf results.
>
> Since it is a different change touching performance, I prefer to do a follow-up patch that the bnx owners can test.
> Is that ok?

sure.
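A rough sketch of what such a follow-up could look like, assuming the current body of tcp_gro_complete() moves unchanged from net/ipv4/tcp_offload.c into include/net/tcp.h (a real patch would also drop the EXPORT_SYMBOL() and out-of-line definition, and would need NAPI_GRO_CB() visible from the header); this is illustrative only, not the actual follow-up:

/* include/net/tcp.h -- sketch of the proposed static inline */
static inline void tcp_gro_complete(struct sk_buff *skb)
{
	struct tcphdr *th = tcp_hdr(skb);

	skb->csum_start = (unsigned char *)th - skb->head;
	skb->csum_offset = offsetof(struct tcphdr, check);
	skb->ip_summed = CHECKSUM_PARTIAL;

	skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;

	if (th->cwr)
		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;

	if (skb->encapsulation)
		skb->inner_transport_header = skb->transport_header;
}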
On Tue, 30 May 2023 17:48:22 +0200 Eric Dumazet wrote:
> > tcp_gro_complete seems fairly trivial. Any reason not to make it an
> > inline and avoid another function call in the datapath?
>
> Probably, although it is a regular function call, not an indirect one.
>
> In the grand total of driver rx napi + GRO cost, saving a few cycles
> per GRO completed packet is quite small.

IOW please make sure you include the performance analysis quantifying
the win, if you want to make this a static inline. Or let us know if
the patch is good as is, I'm keeping it in pw for now.
On 5/30/23 1:39 PM, Jakub Kicinski wrote:
> On Tue, 30 May 2023 17:48:22 +0200 Eric Dumazet wrote:
>>> tcp_gro_complete seems fairly trivial. Any reason not to make it an
>>> inline and avoid another function call in the datapath?
>>
>> Probably, although it is a regular function call, not an indirect one.
>>
>> In the grand total of driver rx napi + GRO cost, saving a few cycles
>> per GRO completed packet is quite small.
>
> IOW please make sure you include the performance analysis quantifying
> the win, if you want to make this a static inline. Or let us know if
> the patch is good as is, I'm keeping it in pw for now.

I am not suggesting holding up this patch; just constantly looking for
these little savings here and there to keep lowering the overhead.

100G, 1500 MTU, line rate is 8.3M pps so GRO wise that would be ~180k
fewer function calls.
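Back-of-envelope for that figure: at 100 Gb/s and 1500-byte frames the wire carries roughly 8.3M packets per second; GRO coalesces up to ~45 MSS-sized segments (64 KB / ~1448 bytes) into a single super-packet, and tcp_gro_complete() runs once per super-packet, so roughly 8.3M / 45 ≈ 180k calls per second.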
On Wed, May 31, 2023 at 12:36 AM David Ahern <dsahern@kernel.org> wrote:
>
> On 5/30/23 1:39 PM, Jakub Kicinski wrote:
> > On Tue, 30 May 2023 17:48:22 +0200 Eric Dumazet wrote:
> >>> tcp_gro_complete seems fairly trivial. Any reason not to make it an
> >>> inline and avoid another function call in the datapath?
> >>
> >> Probably, although it is a regular function call, not an indirect one.
> >>
> >> In the grand total of driver rx napi + GRO cost, saving a few cycles
> >> per GRO completed packet is quite small.
> >
> > IOW please make sure you include the performance analysis quantifying
> > the win, if you want to make this a static inline. Or let us know if
> > the patch is good as is, I'm keeping it in pw for now.
>
> I am not suggesting holding up this patch; just constantly looking for
> these little savings here and there to keep lowering the overhead.
>
> 100G, 1500 MTU, line rate is 8.3M pps so GRO wise that would be ~180k
> fewer function calls.

Here with 4K MTU, this is called 67k times per second.

An __skb_put() instead of skb_put() in a driver (e.g. mlx5e_build_linear_skb())
would have 45x more impact, and would still be noise.
Hello:

This patch was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Mon, 29 May 2023 16:44:30 +0300 you wrote:
> tcp_gro_complete() function only updates the skb fields related to GRO
> and it always returns zero. All the 3 drivers which are using it
> do not check for the return value either.
>
> Change it to return void instead which simplifies its callers as
> error handling becomes unnecessary.
>
> [...]

Here is the summary with links:
  - [net-next] net: Make gro complete function to return void
    https://git.kernel.org/netdev/net-next/c/b1f2abcf817d

You are awesome, thank you!
> From: Eric Dumazet <edumazet@google.com>
> Sent: Wednesday, May 31, 2023 12:39 AM
>
> On Wed, May 31, 2023 at 12:36 AM David Ahern <dsahern@kernel.org> wrote:
> >
> > On 5/30/23 1:39 PM, Jakub Kicinski wrote:
> > > On Tue, 30 May 2023 17:48:22 +0200 Eric Dumazet wrote:
> > >>> tcp_gro_complete seems fairly trivial. Any reason not to make it
> > >>> an inline and avoid another function call in the datapath?
> > >>
> > >> Probably, although it is a regular function call, not an indirect one.
> > >>
> > >> In the grand total of driver rx napi + GRO cost, saving a few
> > >> cycles per GRO completed packet is quite small.
> > >
> > > IOW please make sure you include the performance analysis
> > > quantifying the win, if you want to make this a static inline. Or
> > > let us know if the patch is good as is, I'm keeping it in pw for now.
> >
> > I am not suggesting holding up this patch; just constantly looking for
> > these little savings here and there to keep lowering the overhead.
> >
> > 100G, 1500 MTU, line rate is 8.3M pps so GRO wise that would be ~180k
> > fewer function calls.
>
> Here with 4K MTU, this is called 67k per second
>
> An __skb_put() instead of skb_put() in a driver (eg mlx5e_build_linear_skb())
> would have 45x more impact, and would still be noise.

Thanks, Eric, for the suggestion; we will evaluate using __skb_put() with Tariq.
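For reference, the difference Eric is pointing at: skb_put() re-checks the available tail room on every call (and panics via skb_over_panic() on overflow), while __skb_put() trusts the caller. A simplified RX-path sketch with hypothetical names, not the actual mlx5e_build_linear_skb() code:

/* Hypothetical linear-RX helper; va/frag_size/headroom/data_len come from
 * the driver's descriptor handling and are assumed already validated.
 */
static struct sk_buff *rx_build_linear_skb(void *va, u32 frag_size,
					   u16 headroom, u32 data_len)
{
	struct sk_buff *skb = build_skb(va, frag_size);

	if (unlikely(!skb))
		return NULL;

	skb_reserve(skb, headroom);

	/* The device already bounded data_len by the posted buffer size,
	 * so the overflow check inside skb_put() is redundant here;
	 * __skb_put() skips it and saves a few cycles per packet.
	 */
	__skb_put(skb, data_len);

	return skb;
}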
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0b755988e20c..14fa716cac50 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2041,7 +2041,7 @@ INDIRECT_CALLABLE_DECLARE(int tcp4_gro_complete(struct sk_buff *skb, int thoff))
 INDIRECT_CALLABLE_DECLARE(struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb));
 INDIRECT_CALLABLE_DECLARE(int tcp6_gro_complete(struct sk_buff *skb, int thoff));
 INDIRECT_CALLABLE_DECLARE(struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb));
-int tcp_gro_complete(struct sk_buff *skb);
+void tcp_gro_complete(struct sk_buff *skb);
 
 void __tcp_v4_send_check(struct sk_buff *skb, __be32 saddr, __be32 daddr);
 
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 45dda7889387..88f9b0081ee7 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -296,7 +296,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
 	return pp;
 }
 
-int tcp_gro_complete(struct sk_buff *skb)
+void tcp_gro_complete(struct sk_buff *skb)
 {
 	struct tcphdr *th = tcp_hdr(skb);
 
@@ -311,8 +311,6 @@ int tcp_gro_complete(struct sk_buff *skb)
 
 	if (skb->encapsulation)
 		skb->inner_transport_header = skb->transport_header;
-
-	return 0;
 }
 EXPORT_SYMBOL(tcp_gro_complete);
 
@@ -342,7 +340,8 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
 	if (NAPI_GRO_CB(skb)->is_atomic)
 		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_FIXEDID;
 
-	return tcp_gro_complete(skb);
+	tcp_gro_complete(skb);
+	return 0;
 }
 
 static const struct net_offload tcpv4_offload = {
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index 39db5a226855..bf0c957e4b5e 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -36,7 +36,8 @@ INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff)
 				  &iph->daddr, 0);
 	skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV6;
 
-	return tcp_gro_complete(skb);
+	tcp_gro_complete(skb);
+	return 0;
 }
 
 static struct sk_buff *tcp6_gso_segment(struct sk_buff *skb,
tcp_gro_complete() function only updates the skb fields related to GRO
and it always returns zero. All the 3 drivers which are using it
do not check for the return value either.

Change it to return void instead which simplifies its callers as
error handling becomes unnecessary.

Signed-off-by: Parav Pandit <parav@nvidia.com>
---
 include/net/tcp.h        | 2 +-
 net/ipv4/tcp_offload.c   | 7 +++----
 net/ipv6/tcpv6_offload.c | 3 ++-
 3 files changed, 6 insertions(+), 6 deletions(-)
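Since the existing driver users (the bnxt/bnx2x/qede hardware-GRO paths mentioned in the commit message) already ignore the return value, they compile unchanged after this patch. A simplified sketch of such a call site, hypothetical and not copied from any of the drivers:

/* Hypothetical driver completion path for a HW-aggregated TCP skb. */
static void drv_hw_gro_complete(struct sk_buff *skb)
{
	/* ... driver-specific fixups of the aggregated headers ... */

	/* The return value was never checked here, so the int -> void
	 * conversion needs no driver changes.
	 */
	tcp_gro_complete(skb);
}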