
[net-next,2/3] net/smc: Remove corked delayed work

Message ID 20220130180256.28303-3-tonylu@linux.alibaba.com (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Series net/smc: Improvements for TCP_CORK and sendfile()

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 28 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Tony Lu Jan. 30, 2022, 6:02 p.m. UTC
According to the manual pages for TCP_CORK [1] and MSG_MORE [2], these
two options have the same effect. Applications set them to tell the
kernel to hold back data, and the data is sent only once the socket or
the syscall no longer specifies the flag. In other words, there is no
need to send the data from a delayed work item, which queues a lot of work.

This removes the corked delayed work with SMC_TX_CORK_DELAY (250 ms) and
lets the application control how and when the data is sent. It improves
sendfile performance and throughput, and removes an unnecessary race for
lock_sock(). It also lifts the limit on sndbuf usage and tries to fill
the sndbuf before sending.

[1] https://linux.die.net/man/7/tcp
[2] https://man7.org/linux/man-pages/man2/send.2.html

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 net/smc/smc_tx.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)
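
For context, the application-side contract that the commit message relies on
can be sketched in user space as follows. This is a minimal illustration, not
part of the patch: send_all() and send_header_and_body() are hypothetical
helper names, and the only real API used is send(2) with the Linux-specific
MSG_MORE flag. The header is queued with MSG_MORE and the last piece is sent
without it, which is the point at which the kernel is free to push everything
out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: keep calling send() until all bytes are queued. */
static int send_all(int fd, const void *buf, size_t len, int flags)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = send(fd, p, len, flags);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Hypothetical helper: queue the header with MSG_MORE, then send the body
 * without the flag so nothing is left corked.
 */
static int send_header_and_body(int fd, const void *hdr, size_t hdr_len,
				const void *body, size_t body_len)
{
	assert(hdr_len > 0);	/* this sketch expects a non-empty header */

	/* Only ask the kernel to wait if more data really follows. */
	if (send_all(fd, hdr, hdr_len, body_len ? MSG_MORE : 0) < 0)
		return -1;
	return send_all(fd, body, body_len, 0);
}
```

Setting TCP_CORK with setsockopt() before the writes and clearing it
afterwards is the socket-wide counterpart described in tcp(7).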

Comments

Stefan Raspl Jan. 31, 2022, 7:40 p.m. UTC | #1
On 1/30/22 19:02, Tony Lu wrote:
> Based on the manual of TCP_CORK [1] and MSG_MORE [2], these two options
> have the same effect. Applications can set these options and informs the
> kernel to pend the data, and send them out only when the socket or
> syscall does not specify this flag. In other words, there's no need to
> send data out by a delayed work, which will queue a lot of work.
> 
> This removes corked delayed work with SMC_TX_CORK_DELAY (250ms), and the
> applications control how/when to send them out. It improves the
> performance for sendfile and throughput, and remove unnecessary race of
> lock_sock(). This also unlocks the limitation of sndbuf, and try to fill
> it up before sending.
> 
> [1] https://linux.die.net/man/7/tcp
> [2] https://man7.org/linux/man-pages/man2/send.2.html
> 
> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
> ---
>   net/smc/smc_tx.c | 15 ++++++---------
>   1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
> index 7b0b6e24582f..9cec62cae7cb 100644
> --- a/net/smc/smc_tx.c
> +++ b/net/smc/smc_tx.c
> @@ -31,7 +31,6 @@
>   #include "smc_tracepoint.h"
>   
>   #define SMC_TX_WORK_DELAY	0
> -#define SMC_TX_CORK_DELAY	(HZ >> 2)	/* 250 ms */
>   
>   /***************************** sndbuf producer *******************************/
>   
> @@ -237,15 +236,13 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
>   		if ((msg->msg_flags & MSG_OOB) && !send_remaining)
>   			conn->urg_tx_pend = true;
>   		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc)) &&
> -		    (atomic_read(&conn->sndbuf_space) >
> -						(conn->sndbuf_desc->len >> 1)))
> -			/* for a corked socket defer the RDMA writes if there
> -			 * is still sufficient sndbuf_space available
> +		    (atomic_read(&conn->sndbuf_space)))
> +			/* for a corked socket defer the RDMA writes if
> +			 * sndbuf_space is still available. The applications
> +			 * should known how/when to uncork it.
>   			 */
> -			queue_delayed_work(conn->lgr->tx_wq, &conn->tx_work,
> -					   SMC_TX_CORK_DELAY);
> -		else
> -			smc_tx_sndbuf_nonempty(conn);
> +			continue;

In case we just corked the final bytes in this call, wouldn't this 'continue' 
prevent us from accounting the Bytes that we just staged to be sent out later in 
the trace_smc_tx_sendmsg() call below?

> +		smc_tx_sndbuf_nonempty(conn);
>   
>   		trace_smc_tx_sendmsg(smc, copylen);
Tony Lu Feb. 11, 2022, 9:10 a.m. UTC | #2
On Mon, Jan 31, 2022 at 08:40:47PM +0100, Stefan Raspl wrote:
> On 1/30/22 19:02, Tony Lu wrote:
> > Based on the manual of TCP_CORK [1] and MSG_MORE [2], these two options
> > have the same effect. Applications can set these options and informs the
> > kernel to pend the data, and send them out only when the socket or
> > syscall does not specify this flag. In other words, there's no need to
> > send data out by a delayed work, which will queue a lot of work.
> > 
> > This removes corked delayed work with SMC_TX_CORK_DELAY (250ms), and the
> > applications control how/when to send them out. It improves the
> > performance for sendfile and throughput, and remove unnecessary race of
> > lock_sock(). This also unlocks the limitation of sndbuf, and try to fill
> > it up before sending.
> > 
> > [1] https://linux.die.net/man/7/tcp
> > [2] https://man7.org/linux/man-pages/man2/send.2.html
> > 
> > Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
> > ---
> >   net/smc/smc_tx.c | 15 ++++++---------
> >   1 file changed, 6 insertions(+), 9 deletions(-)
> > 
> > diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
> > index 7b0b6e24582f..9cec62cae7cb 100644
> > --- a/net/smc/smc_tx.c
> > +++ b/net/smc/smc_tx.c
> > @@ -31,7 +31,6 @@
> >   #include "smc_tracepoint.h"
> >   #define SMC_TX_WORK_DELAY	0
> > -#define SMC_TX_CORK_DELAY	(HZ >> 2)	/* 250 ms */
> >   /***************************** sndbuf producer *******************************/
> > @@ -237,15 +236,13 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
> >   		if ((msg->msg_flags & MSG_OOB) && !send_remaining)
> >   			conn->urg_tx_pend = true;
> >   		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc)) &&
> > -		    (atomic_read(&conn->sndbuf_space) >
> > -						(conn->sndbuf_desc->len >> 1)))
> > -			/* for a corked socket defer the RDMA writes if there
> > -			 * is still sufficient sndbuf_space available
> > +		    (atomic_read(&conn->sndbuf_space)))
> > +			/* for a corked socket defer the RDMA writes if
> > +			 * sndbuf_space is still available. The applications
> > +			 * should known how/when to uncork it.
> >   			 */
> > -			queue_delayed_work(conn->lgr->tx_wq, &conn->tx_work,
> > -					   SMC_TX_CORK_DELAY);
> > -		else
> > -			smc_tx_sndbuf_nonempty(conn);
> > +			continue;
> 
> In case we just corked the final bytes in this call, wouldn't this
> 'continue' prevent us from accounting the Bytes that we just staged to be
> sent out later in the trace_smc_tx_sendmsg() call below?
> 
> > +		smc_tx_sndbuf_nonempty(conn);
> >   		trace_smc_tx_sendmsg(smc, copylen);
> 

If the application sends the final bytes in this call, it should also
clear the MSG_MORE or TCP_CORK flag; the manuals [1] and [2] require
this. So it is safe to cork the data while the flag is set and to
continue with the next loop iteration until the application clears the
flag.

[1] https://linux.die.net/man/7/tcp
[2] https://man7.org/linux/man-pages/man2/send.2.html

Thank you,
Tony Lu
Stefan Raspl Feb. 14, 2022, 10:29 a.m. UTC | #3
On 2/11/22 10:10, Tony Lu wrote:
> On Mon, Jan 31, 2022 at 08:40:47PM +0100, Stefan Raspl wrote:
>> On 1/30/22 19:02, Tony Lu wrote:
>>> Based on the manual of TCP_CORK [1] and MSG_MORE [2], these two options
>>> have the same effect. Applications can set these options and informs the
>>> kernel to pend the data, and send them out only when the socket or
>>> syscall does not specify this flag. In other words, there's no need to
>>> send data out by a delayed work, which will queue a lot of work.
>>>
>>> This removes corked delayed work with SMC_TX_CORK_DELAY (250ms), and the
>>> applications control how/when to send them out. It improves the
>>> performance for sendfile and throughput, and remove unnecessary race of
>>> lock_sock(). This also unlocks the limitation of sndbuf, and try to fill
>>> it up before sending.
>>>
>>> [1] https://linux.die.net/man/7/tcp
>>> [2] https://man7.org/linux/man-pages/man2/send.2.html
>>>
>>> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
>>> ---
>>>    net/smc/smc_tx.c | 15 ++++++---------
>>>    1 file changed, 6 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
>>> index 7b0b6e24582f..9cec62cae7cb 100644
>>> --- a/net/smc/smc_tx.c
>>> +++ b/net/smc/smc_tx.c
>>> @@ -31,7 +31,6 @@
>>>    #include "smc_tracepoint.h"
>>>    #define SMC_TX_WORK_DELAY	0
>>> -#define SMC_TX_CORK_DELAY	(HZ >> 2)	/* 250 ms */
>>>    /***************************** sndbuf producer *******************************/
>>> @@ -237,15 +236,13 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
>>>    		if ((msg->msg_flags & MSG_OOB) && !send_remaining)
>>>    			conn->urg_tx_pend = true;
>>>    		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc)) &&
>>> -		    (atomic_read(&conn->sndbuf_space) >
>>> -						(conn->sndbuf_desc->len >> 1)))
>>> -			/* for a corked socket defer the RDMA writes if there
>>> -			 * is still sufficient sndbuf_space available
>>> +		    (atomic_read(&conn->sndbuf_space)))
>>> +			/* for a corked socket defer the RDMA writes if
>>> +			 * sndbuf_space is still available. The applications
>>> +			 * should known how/when to uncork it.
>>>    			 */
>>> -			queue_delayed_work(conn->lgr->tx_wq, &conn->tx_work,
>>> -					   SMC_TX_CORK_DELAY);
>>> -		else
>>> -			smc_tx_sndbuf_nonempty(conn);
>>> +			continue;
>>
>> In case we just corked the final bytes in this call, wouldn't this
>> 'continue' prevent us from accounting the Bytes that we just staged to be
>> sent out later in the trace_smc_tx_sendmsg() call below?
>>
>>> +		smc_tx_sndbuf_nonempty(conn);
>>>    		trace_smc_tx_sendmsg(smc, copylen);
>>
> 
> If the application send out the final bytes in this call, the
> application should also clear MSG_MORE or TCP_CORK flag, this action is
> required based on the manuals [1] and [2]. So it is safe to cork the data
> if flag is setted, and continue to the next loop until application
> clears the flag.

Yes, I understand. But trace_smc_tx_sendmsg(smc, copylen) should be called for 
each portion of data that we transmit, i.e. each time we run through this loop. 
That is because parameter copylen is reset during each iteration.
Now your patch adds a 'continue', which prevents that trace_smc_tc... call from 
being made. Which means the information that 'copylen' Bytes were transferred is 
lost forever, and the accounting of tx Bytes is off by 'copylen' Bytes, I believe!

Ciao,
Stefan
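
The accounting concern raised above can be reproduced with a small user-space
model of the loop. This is only a sketch under stated assumptions: struct
toy_conn, toy_trace(), toy_flush() and toy_sendmsg() are hypothetical
stand-ins for the connection, trace_smc_tx_sendmsg(),
smc_tx_sndbuf_nonempty() and smc_tx_sendmsg() respectively, and refilling the
buffer on flush is a simplification of waiting for the peer to consume data.
With skip_trace_when_corked set, the loop takes the same 'continue' shortcut
as the posted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in net/smc. */
struct toy_conn {
	size_t sndbuf_space;	/* free bytes left in the send buffer */
	size_t traced_bytes;	/* bytes reported to the tracepoint stand-in */
	size_t flushes;		/* number of calls to the flush stand-in */
};

/* Stand-in for trace_smc_tx_sendmsg(): account copylen bytes. */
static void toy_trace(struct toy_conn *c, size_t copylen)
{
	c->traced_bytes += copylen;
}

/* Stand-in for smc_tx_sndbuf_nonempty(): pretend everything was written
 * out and the whole buffer is free again (a simplification).
 */
static void toy_flush(struct toy_conn *c, size_t sndbuf_len)
{
	c->flushes++;
	c->sndbuf_space = sndbuf_len;
}

static size_t toy_sendmsg(struct toy_conn *c, size_t len, size_t sndbuf_len,
			  bool corked, bool skip_trace_when_corked)
{
	size_t sent = 0;

	assert(sndbuf_len > 0);
	while (sent < len) {
		size_t copylen;

		if (c->sndbuf_space == 0)
			toy_flush(c, sndbuf_len);	/* models waiting for space */

		/* copylen is recomputed on every iteration. */
		copylen = len - sent;
		if (copylen > c->sndbuf_space)
			copylen = c->sndbuf_space;
		c->sndbuf_space -= copylen;
		sent += copylen;

		if (corked && c->sndbuf_space > 0) {
			if (skip_trace_when_corked)
				continue;	/* shape of the posted hunk */
		} else {
			toy_flush(c, sndbuf_len);
		}
		toy_trace(c, copylen);
	}
	return sent;
}
```

With a 100-byte buffer and a corked 250-byte send, the 'continue' variant
traces 200 bytes and misses the final 50, while tracing on every iteration
reports all 250.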
Tony Lu Feb. 14, 2022, 12:10 p.m. UTC | #4
On Mon, Feb 14, 2022 at 11:29:10AM +0100, Stefan Raspl wrote:
> On 2/11/22 10:10, Tony Lu wrote:
> > On Mon, Jan 31, 2022 at 08:40:47PM +0100, Stefan Raspl wrote:
> > > On 1/30/22 19:02, Tony Lu wrote:
> > > > Based on the manual of TCP_CORK [1] and MSG_MORE [2], these two options
> > > > have the same effect. Applications can set these options and informs the
> > > > kernel to pend the data, and send them out only when the socket or
> > > > syscall does not specify this flag. In other words, there's no need to
> > > > send data out by a delayed work, which will queue a lot of work.
> > > > 
> > > > This removes corked delayed work with SMC_TX_CORK_DELAY (250ms), and the
> > > > applications control how/when to send them out. It improves the
> > > > performance for sendfile and throughput, and remove unnecessary race of
> > > > lock_sock(). This also unlocks the limitation of sndbuf, and try to fill
> > > > it up before sending.
> > > > 
> > > > [1] https://linux.die.net/man/7/tcp
> > > > [2] https://man7.org/linux/man-pages/man2/send.2.html
> > > > 
> > > > Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
> > > > ---
> > > >    net/smc/smc_tx.c | 15 ++++++---------
> > > >    1 file changed, 6 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
> > > > index 7b0b6e24582f..9cec62cae7cb 100644
> > > > --- a/net/smc/smc_tx.c
> > > > +++ b/net/smc/smc_tx.c
> > > > @@ -31,7 +31,6 @@
> > > >    #include "smc_tracepoint.h"
> > > >    #define SMC_TX_WORK_DELAY	0
> > > > -#define SMC_TX_CORK_DELAY	(HZ >> 2)	/* 250 ms */
> > > >    /***************************** sndbuf producer *******************************/
> > > > @@ -237,15 +236,13 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
> > > >    		if ((msg->msg_flags & MSG_OOB) && !send_remaining)
> > > >    			conn->urg_tx_pend = true;
> > > >    		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc)) &&
> > > > -		    (atomic_read(&conn->sndbuf_space) >
> > > > -						(conn->sndbuf_desc->len >> 1)))
> > > > -			/* for a corked socket defer the RDMA writes if there
> > > > -			 * is still sufficient sndbuf_space available
> > > > +		    (atomic_read(&conn->sndbuf_space)))
> > > > +			/* for a corked socket defer the RDMA writes if
> > > > +			 * sndbuf_space is still available. The applications
> > > > +			 * should known how/when to uncork it.
> > > >    			 */
> > > > -			queue_delayed_work(conn->lgr->tx_wq, &conn->tx_work,
> > > > -					   SMC_TX_CORK_DELAY);
> > > > -		else
> > > > -			smc_tx_sndbuf_nonempty(conn);
> > > > +			continue;
> > > 
> > > In case we just corked the final bytes in this call, wouldn't this
> > > 'continue' prevent us from accounting the Bytes that we just staged to be
> > > sent out later in the trace_smc_tx_sendmsg() call below?
> > > 
> > > > +		smc_tx_sndbuf_nonempty(conn);
> > > >    		trace_smc_tx_sendmsg(smc, copylen);
> > > 
> > 
> > If the application send out the final bytes in this call, the
> > application should also clear MSG_MORE or TCP_CORK flag, this action is
> > required based on the manuals [1] and [2]. So it is safe to cork the data
> > if flag is setted, and continue to the next loop until application
> > clears the flag.
> 
> Yes, I understand. But trace_smc_tx_sendmsg(smc, copylen) should be called
> for each portion of data that we transmit, i.e. each time we run through
> this loop. That is because parameter copylen is reset during each iteration.
> Now your patch adds a 'continue', which prevents that trace_smc_tc... call
> from being made. Which means the information that 'copylen' Bytes were
> transferred is lost forever, and the accounting of tx Bytes is off by
> 'copylen' Bytes, I believe!

This makes sense to me. It shouldn't be ignored if data was corked. I
will fix it in the next patch.

Thank you,
Tony Lu

Patch

diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 7b0b6e24582f..9cec62cae7cb 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -31,7 +31,6 @@ 
 #include "smc_tracepoint.h"
 
 #define SMC_TX_WORK_DELAY	0
-#define SMC_TX_CORK_DELAY	(HZ >> 2)	/* 250 ms */
 
 /***************************** sndbuf producer *******************************/
 
@@ -237,15 +236,13 @@  int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
 		if ((msg->msg_flags & MSG_OOB) && !send_remaining)
 			conn->urg_tx_pend = true;
 		if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc)) &&
-		    (atomic_read(&conn->sndbuf_space) >
-						(conn->sndbuf_desc->len >> 1)))
-			/* for a corked socket defer the RDMA writes if there
-			 * is still sufficient sndbuf_space available
+		    (atomic_read(&conn->sndbuf_space)))
+			/* for a corked socket defer the RDMA writes if
+			 * sndbuf_space is still available. The applications
+			 * should known how/when to uncork it.
 			 */
-			queue_delayed_work(conn->lgr->tx_wq, &conn->tx_work,
-					   SMC_TX_CORK_DELAY);
-		else
-			smc_tx_sndbuf_nonempty(conn);
+			continue;
+		smc_tx_sndbuf_nonempty(conn);
 
 		trace_smc_tx_sendmsg(smc, copylen);
 	} /* while (msg_data_left(msg)) */