diff mbox series

[RFC,net,v2,1/2] net/smc: Fix socket wait queue mismatch issue caused by fallback

Message ID 1636548651-44649-2-git-send-email-guwen@linux.alibaba.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series Two RFC patches for the same SMC socket wait queue mismatch issue | expand

Checks

Context Check Description
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Wen Gu Nov. 10, 2021, 12:50 p.m. UTC
There may be a mismatch of socket wait queue caused by SMC fallback.

When user applications used SMC to replace TCP, they might add
an eppoll_entry into smc socket->wq and expect to be notified if
epoll events like EPOLL_IN/EPOLL_OUT occurred on the smc socket.

However, once a fallback occurred, user applications start to use
clcsocket. So it is clcsocket->wq instead of smc socket->wq which
is woken up when there is data to be processed or sending space
available. Thus the eppoll_entry which was added into smc socket->wq
before fallback does not work anymore.

[if not fallback]
   application
        ^
      (poll)
        |
  smc socket->wq            clcsocket->wq
(has eppoll_entry)                .
        .                         .
        .                         .
smc socket->sk->sk_wq    clcsocket->sk->sk_wq
        ^                         ^
        | (data flow)             |(tcp handshake in rendezvous)
        |                         |
|-------------------------------------------|
|   sk_data_ready / sk_write_space ..       |
|-------------------------------------------|

[if fallback]
   application <------------------|
                        (can't poll anything)
                                  |
  smc socket->wq            clcsocket->wq
(has eppoll_entry)                .
        .                         .
        .                         .
smc socket->sk->sk_wq    clcsocket->sk->sk_wq
                                  ^
                                  |(data flow)
                                  |
|-------------------------------------------|
|   sk_data_ready / sk_write_space ..       |
|-------------------------------------------|

For example, in nginx/wrk benchmark, this issue will cause an
all-zeros test result:

server: nginx -g 'daemon off;'
client: smc_run wrk -c 1 -t 1 -d 5 http://11.200.15.93/index.html

  Running 5s test @ http://11.200.15.93/index.html
     1 threads and 1 connections
     Thread Stats   Avg      Stdev     Max   ± Stdev
     	Latency     0.00us    0.00us   0.00us    -nan%
	Req/Sec     0.00      0.00     0.00      -nan%
	0 requests in 5.00s, 0.00B read
     Requests/sec:      0.00
     Transfer/sec:       0.00B

The main idea of this patch is that since wait queue is switched
from smc socket->wq to clcsocket->wq after fallback, disabling the
eppoll_entry in smc socket->wq, maybe we can try to use clcsocket->wq
from the beginning.

So this patch set smc socket->sk->sk_wq to clcsocket->wq when smc
socket is created and reset clcsocket->sk->sk_wq to clcsocket->wq
once fallback occurrs, making clcsocket->wq the constant wait queue
that user applications use. Thus the user application can be
notified by the eppoll_entry in clcsocket->wq no matter whether
a fallback occurrs.

After this patch:

[if not fallback]
   application <------------------|
                                (poll)
                                  |
  smc socket->wq            clcsocket->wq
        .                 (has eppoll_entry)
            `   .                 .
                    `   . .   `
                  .   `    `   .
               `                  `
smc socket->sk->sk_wq    clcsocket->sk->sk_wq
        ^                         ^
        |(data flow)              |(tcp handshake in rendezvous)
        |                         |
|-------------------------------------------|
|   sk_data_ready / sk_write_space ..       |
|-------------------------------------------|

[if fallback]
   application <------------------|
                                (poll)
                                  |
  smc socket->wq            clcsocket->wq
        .                 (has eppoll_entry)
            `   .                 .
                    `   .         .
                           `   .  .
                                  `
smc socket->sk->sk_wq    clcsocket->sk->sk_wq
                                  ^
                                  |(data flow)
                                  |
|-------------------------------------------|
|   sk_data_ready / sk_write_space ..       |
|-------------------------------------------|

Link: https://lore.kernel.org/all/db9acf73-abef-209e-6ec2-8ada92e2cfbc@linux.ibm.com/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
---
v1 -> v2
- Add the complete description about the intention of initial patch.
---
 net/smc/af_smc.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)
diff mbox series

Patch

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 0cf7ed2..fe5a2cd 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -566,6 +566,20 @@  static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
 	smc->fallback_rsn = reason_code;
 	smc_stat_fallback(smc);
 	trace_smc_switch_to_fallback(smc, reason_code);
+
+	/* We swapped sk->sk_wq of clcsocket and smc socket in smc_create()
+	 * to prevent mismatch of wait queue caused by fallback.
+	 *
+	 * If fallback really occurred, user application starts to use clcsocket.
+	 * Accordingly, clcsocket->sk->sk_wq should be reset to clcsocket->wq
+	 * in order that user application still uses the same wait queue as what
+	 * was used before fallback.
+	 *
+	 * After fallback, the relation between socket->wq and socket->sk->sk_wq is:
+	 * - clcsocket->sk->sk_wq  -->  clcsocket->wq
+	 * - smcsocket->sk->sk_wq  -->  clcsocket->wq
+	 */
+	rcu_assign_pointer(smc->clcsock->sk->sk_wq, &smc->clcsock->wq);
 	if (smc->sk.sk_socket && smc->sk.sk_socket->file) {
 		smc->clcsock->file = smc->sk.sk_socket->file;
 		smc->clcsock->file->private_data = smc->clcsock;
@@ -2174,6 +2188,12 @@  static int smc_accept(struct socket *sock, struct socket *new_sock,
 	if (rc)
 		goto out;
 
+	/* The new accepted smc sock sets sk->sk_wq to clcsocket->wq like what
+	 * smc_create() did when the fallback does not occur.
+	 */
+	if (!smc_sk(nsk)->use_fallback)
+		rcu_assign_pointer(nsk->sk_wq, &smc_sk(nsk)->clcsock->wq);
+
 	if (lsmc->sockopt_defer_accept && !(flags & O_NONBLOCK)) {
 		/* wait till data arrives on the socket */
 		timeo = msecs_to_jiffies(lsmc->sockopt_defer_accept *
@@ -2310,8 +2330,15 @@  static __poll_t smc_poll(struct file *file, struct socket *sock,
 		mask = smc->clcsock->ops->poll(file, smc->clcsock, wait);
 		sk->sk_err = smc->clcsock->sk->sk_err;
 	} else {
-		if (sk->sk_state != SMC_CLOSED)
-			sock_poll_wait(file, sock, wait);
+		if (sk->sk_state != SMC_CLOSED) {
+			/* SMC always uses clcsocket->wq for the call to
+			 * sock_poll_wait(), which is the same wait queue
+			 * as what smc socket->sk->sk_wq points to before
+			 * fallback or what clcsocket->sk->sk_wq points to
+			 * after fallback.
+			 */
+			sock_poll_wait(file, smc->clcsock, wait);
+		}
 		if (sk->sk_err)
 			mask |= EPOLLERR;
 		if ((sk->sk_shutdown == SHUTDOWN_MASK) ||
@@ -2707,6 +2734,31 @@  static int smc_create(struct net *net, struct socket *sock, int protocol,
 	smc->sk.sk_sndbuf = max(smc->clcsock->sk->sk_sndbuf, SMC_BUF_MIN_SIZE);
 	smc->sk.sk_rcvbuf = max(smc->clcsock->sk->sk_rcvbuf, SMC_BUF_MIN_SIZE);
 
+	/* To ensure that user applications consistently use the same wait queue
+	 * before and after fallback, we set smc socket->sk->sk_wq to clcsocket->wq
+	 * here and reset clcsocket->sk->sk_wq to clcsocket->wq in
+	 * smc_switch_to_fallback() if fallback occurrs, making clcsocket->wq the
+	 * constant wait queue which user applications use.
+	 *
+	 * here:
+	 * - Set smc socket->sk->sk_wq to clcsocket->wq
+	 *   So that the sk_data_ready()/sk_write_space.. will wake up clcsocket->wq
+	 *   and user applications will be notified of epoll events if they added
+	 *   eppoll_entry into clcsocket->wq through smc_poll().
+	 *
+	 * - Set clcsocket->sk->sk_wq to smc socket->wq
+	 *   Since clcsocket->wq is occupied by smcsocket->sk->sk_wq, clcsocket->
+	 *   sk->sk_wq have to use another wait queue (smc socket->wq) during TCP
+	 *   handshake or CLC messages transmission in rendezvous.
+	 *
+	 * So the relation between socket->wq and socket->sk->sk_wq before fallback is:
+	 *
+	 * - smc socket->sk->sk_wq --> clcsocket->wq
+	 * - clcsocket->sk->sk_wq  --> smc socket->wq
+	 */
+	rcu_assign_pointer(sk->sk_wq, &smc->clcsock->wq);
+	rcu_assign_pointer(smc->clcsock->sk->sk_wq, &smc->sk.sk_socket->wq);
+
 out:
 	return rc;
 }