diff mbox series

[net-next,v5,2/5] net/smc: Limit backlog connections

Message ID c597e6c6d004e5b2a26a9535c8099d389214f273.1644323503.git.alibuda@linux.alibaba.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series net/smc: Optimizing performance in short-lived scenarios | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: line length of 85 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

D. Wythe Feb. 8, 2022, 12:53 p.m. UTC
From: "D. Wythe" <alibuda@linux.alibaba.com>

Current implementation does not handling backlog semantics, one
potential risk is that server will be flooded by infinite amount
connections, even if client was SMC-incapable.

This patch works to put a limit on backlog connections, referring to the
TCP implementation, we divides SMC connections into two categories:

1. Half SMC connection, which includes all TCP established while SMC not
connections.

2. Full SMC connection, which includes all SMC established connections.

For half SMC connection, since all half SMC connections starts with TCP
established, we can achieve our goal by put a limit before TCP
established. Refer to the implementation of TCP, this limits will based
on not only the half SMC connections but also the full connections,
which is also a constraint on full SMC connections.

For full SMC connections, although we know exactly where it starts, it's
quite hard to put a limit before it. The easiest way is to block wait
before receive SMC confirm CLC message, while it's under protection by
smc_server_lgr_pending, a global lock, which leads this limit to the
entire host instead of a single listen socket. Another way is to drop
the full connections, but considering the cast of SMC connections, we
prefer to keep full SMC connections.

Even so, the limits of full SMC connections still exists, see commits
about half SMC connection below.

After this patch, the limits of backend connection shows like:

For SMC:

1. Client with SMC-capability can makes 2 * backlog full SMC connections
   or 1 * backlog half SMC connections and 1 * backlog full SMC
   connections at most.

2. Client without SMC-capability can only makes 1 * backlog half TCP
   connections and 1 * backlog full TCP connections.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/af_smc.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 net/smc/smc.h    |  4 ++++
 2 files changed, 47 insertions(+)

Comments

Karsten Graul Feb. 8, 2022, 5:13 p.m. UTC | #1
On 08/02/2022 13:53, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> Current implementation does not handling backlog semantics, one
> potential risk is that server will be flooded by infinite amount
> connections, even if client was SMC-incapable.

In this patch you count the number of inflight SMC handshakes as pending and
check them against the defined max_backlog. I really like this improvement.

There is another queue in af_smc.c, the smc accept queue and any new client 
socket that completed the handshake process is enqueued there (in smc_accept_enqueue() )
and is waiting to get accepted by the user space application. To apply the correct
semantics here, I think the number of sockets waiting in the smc accept queue 
should also be counted as backlog connections, right? I see no limit for this queue
now. What do you think?
D. Wythe Feb. 9, 2022, 7:11 a.m. UTC | #2
There are indirectly limits on smc accept queue with following code.

+	if (sk_acceptq_is_full(&smc->sk)) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+		goto drop;
+	}


In fact, we treat the connections in smc accept queue as Full 
establisted connection. As I wrote in patch commits, there are 
trade-offs to this implemets.

Thanks.

在 2022/2/9 上午1:13, Karsten Graul 写道:
> On 08/02/2022 13:53, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> Current implementation does not handling backlog semantics, one
>> potential risk is that server will be flooded by infinite amount
>> connections, even if client was SMC-incapable.
> 
> In this patch you count the number of inflight SMC handshakes as pending and
> check them against the defined max_backlog. I really like this improvement.
> 
> There is another queue in af_smc.c, the smc accept queue and any new client
> socket that completed the handshake process is enqueued there (in smc_accept_enqueue() )
> and is waiting to get accepted by the user space application. To apply the correct
> semantics here, I think the number of sockets waiting in the smc accept queue
> should also be counted as backlog connections, right? I see no limit for this queue
> now. What do you think?
Karsten Graul Feb. 9, 2022, 7:56 a.m. UTC | #3
On 09/02/2022 08:11, D. Wythe wrote:
> 
> There are indirectly limits on smc accept queue with following code.
> 
> +    if (sk_acceptq_is_full(&smc->sk)) {
> +        NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
> +        goto drop;
> +    }
> 
> 
> In fact, we treat the connections in smc accept queue as Full establisted connection. As I wrote in patch commits, there are trade-offs to this implemets.
> 

Thanks for the clarification, I got your point. You refer to the call to sk_acceptq_added()
in smc_accept_enqueue().
diff mbox series

Patch

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 4969ac8..ebfce3d 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -73,6 +73,34 @@  static void smc_set_keepalive(struct sock *sk, int val)
 	smc->clcsock->sk->sk_prot->keepalive(smc->clcsock->sk, val);
 }
 
+static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
+					  struct request_sock *req,
+					  struct dst_entry *dst,
+					  struct request_sock *req_unhash,
+					  bool *own_req)
+{
+	struct smc_sock *smc;
+
+	smc = (struct smc_sock *)((uintptr_t)sk->sk_user_data & ~SK_USER_DATA_NOCOPY);
+
+	if (READ_ONCE(sk->sk_ack_backlog) + atomic_read(&smc->smc_pendings) >
+				sk->sk_max_ack_backlog)
+		goto drop;
+
+	if (sk_acceptq_is_full(&smc->sk)) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+		goto drop;
+	}
+
+	/* passthrough to origin syn recv sock fct */
+	return smc->ori_af_ops->syn_recv_sock(sk, skb, req, dst, req_unhash, own_req);
+
+drop:
+	dst_release(dst);
+	tcp_listendrop(sk);
+	return NULL;
+}
+
 static struct smc_hashinfo smc_v4_hashinfo = {
 	.lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock),
 };
@@ -1595,6 +1623,9 @@  static void smc_listen_out(struct smc_sock *new_smc)
 	struct smc_sock *lsmc = new_smc->listen_smc;
 	struct sock *newsmcsk = &new_smc->sk;
 
+	if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
+		atomic_dec(&lsmc->smc_pendings);
+
 	if (lsmc->sk.sk_state == SMC_LISTEN) {
 		lock_sock_nested(&lsmc->sk, SINGLE_DEPTH_NESTING);
 		smc_accept_enqueue(&lsmc->sk, newsmcsk);
@@ -2200,6 +2231,9 @@  static void smc_tcp_listen_work(struct work_struct *work)
 		if (!new_smc)
 			continue;
 
+		if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
+			atomic_inc(&lsmc->smc_pendings);
+
 		new_smc->listen_smc = lsmc;
 		new_smc->use_fallback = lsmc->use_fallback;
 		new_smc->fallback_rsn = lsmc->fallback_rsn;
@@ -2266,6 +2300,15 @@  static int smc_listen(struct socket *sock, int backlog)
 	smc->clcsock->sk->sk_data_ready = smc_clcsock_data_ready;
 	smc->clcsock->sk->sk_user_data =
 		(void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
+
+	/* save origin ops */
+	smc->ori_af_ops = inet_csk(smc->clcsock->sk)->icsk_af_ops;
+
+	smc->af_ops = *smc->ori_af_ops;
+	smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock;
+
+	inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
+
 	rc = kernel_listen(smc->clcsock, backlog);
 	if (rc) {
 		smc->clcsock->sk->sk_data_ready = smc->clcsk_data_ready;
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 37b2001..5e5e38d 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -252,6 +252,10 @@  struct smc_sock {				/* smc sock container */
 	bool			use_fallback;	/* fallback to tcp */
 	int			fallback_rsn;	/* reason for fallback */
 	u32			peer_diagnosis; /* decline reason from peer */
+	atomic_t                smc_pendings;   /* pending smc connections */
+	struct inet_connection_sock_af_ops		af_ops;
+	const struct inet_connection_sock_af_ops	*ori_af_ops;
+						/* origin af ops */
 	int			sockopt_defer_accept;
 						/* sockopt TCP_DEFER_ACCEPT
 						 * value