From patchwork Tue Feb 20 07:01:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. Wythe" X-Patchwork-Id: 13563477 Received: from out30-100.freemail.mail.aliyun.com (out30-100.freemail.mail.aliyun.com [115.124.30.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3F21B5A10A; Tue, 20 Feb 2024 07:01:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.100 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412519; cv=none; b=JzXhr7GmBHy+Vkk4o9iL+xYJhVOan+y8F2Bz5XZuPCxBAe7MxQH/3aSjPD5aPnkcCD31CxUJavE6/nhUfMazOdDFbR2oCsOcDNFYHDH2TzWvp43fv5kkzbCFa2kNGB9KHXnvy1mqTfrCA89PvISbaUsVjFATsgLnWDoQuIiB/OE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412519; c=relaxed/simple; bh=e/OJdyE4TpLB+D1pOPdH/Ffoqt1dUbUlmwdpuLPoDh0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=WKFSw9zRFaXO56xf7M7ofFGyio3a2Lx/kPDoIXyeztJZusaVpp60ShTTxaNKt0gVa9+T2YMnQ/42JERLmZbi68AQl0ARTJmS1+/xkz78uZVfBetUvykj+oiM3EmNRKDQ/dusvY9sEiEY45lYvn1qM4rgRcy5gKzfWpJB8zgkSZw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=lWawfqTm; arc=none smtp.client-ip=115.124.30.100 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="lWawfqTm" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; 
t=1708412510; h=From:To:Subject:Date:Message-Id; bh=nCulGWy5dVx4V7R5dThNutsqPdZq+WXBiutltng6hfY=; b=lWawfqTmxPanPZjYy3+pVzPlNrdl7vZ23P/zdk5jZkWxE1IBm6CIgCH9ovS8FFKQQG4r2ngoJRUgWvRW4//I3t2Hywt1ER6lUt/B3NZhD9Vbn13gyjS5eZEar7Sig9L4MFGNMugpfDTvEjLTTcpZkTIjUx2+mO9pc2Jzz7fhG2g= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R521e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046050;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W0vuXd4_1708412509; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W0vuXd4_1708412509) by smtp.aliyun-inc.com; Tue, 20 Feb 2024 15:01:50 +0800 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 01/20] net: export partial symbols in inet/inet6 proto_ops Date: Tue, 20 Feb 2024 15:01:26 +0800 Message-Id: <1708412505-34470-2-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" Export the following symbols: 1. inet_compat_ioctl 2. inet6_sendmsg 3. inet6_recvmsg Exporting these symbols allows other modules to access them directly. Currently, all symbols except the ones above are already exported, so there should be no obvious risk in exporting these as well. Signed-off-by: D. 
Wythe --- include/net/inet_common.h | 3 +++ net/ipv4/af_inet.c | 3 ++- net/ipv6/af_inet6.c | 2 ++ 3 files changed, 7 insertions(+), 1 deletion(-) diff --git a/include/net/inet_common.h b/include/net/inet_common.h index f50a644..1c2fcca 100644 --- a/include/net/inet_common.h +++ b/include/net/inet_common.h @@ -57,6 +57,9 @@ int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len, int inet_getname(struct socket *sock, struct sockaddr *uaddr, int peer); int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg); +#ifdef CONFIG_COMPAT +int inet_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg); +#endif int inet_ctl_sock_create(struct sock **sk, unsigned short family, unsigned short type, unsigned char protocol, struct net *net); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index ad27800..049d135 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1031,7 +1031,7 @@ static int inet_compat_routing_ioctl(struct sock *sk, unsigned int cmd, return ip_rt_ioctl(sock_net(sk), cmd, &rt); } -static int inet_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) +int inet_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { void __user *argp = compat_ptr(arg); struct sock *sk = sock->sk; @@ -1046,6 +1046,7 @@ static int inet_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned lon return sk->sk_prot->compat_ioctl(sk, cmd, arg); } } +EXPORT_SYMBOL_GPL(inet_compat_ioctl); #endif /* CONFIG_COMPAT */ const struct proto_ops inet_stream_ops = { diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 959bfd9..5a81f8b 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -660,6 +660,7 @@ int inet6_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, sk, msg, size); } +EXPORT_SYMBOL_GPL(inet6_sendmsg); INDIRECT_CALLABLE_DECLARE(int udpv6_recvmsg(struct sock *, struct msghdr *, size_t, int, int 
*)); @@ -682,6 +683,7 @@ int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, msg->msg_namelen = addr_len; return err; } +EXPORT_SYMBOL_GPL(inet6_recvmsg); const struct proto_ops inet6_stream_ops = { .family = PF_INET6, From patchwork Tue Feb 20 07:01:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. Wythe" X-Patchwork-Id: 13563475 Received: from out30-111.freemail.mail.aliyun.com (out30-111.freemail.mail.aliyun.com [115.124.30.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C8EA35A0EC; Tue, 20 Feb 2024 07:01:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.111 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412517; cv=none; b=OF2DkH9vnFu9nnz8yBlLgpF4s0jGYKtclZQFN6GYTk9uh6GxVoC7XhvLM5SRJUjFfVV/ej5tLhjO0xm20DJlFhMSFLeXhHsMAhgtL9W/ga55EdzxFx0BmLO9jH2BfacRCxDjY8/c8PYgjDSmyKBeXNbr+E57Qqq7i6HaThlTw0s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412517; c=relaxed/simple; bh=zztptK6/enp6zp3WltsgDWhkOi40dsLromLKXWt3ZE4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=OM9uNLmd1wliwrKuYthTw1wVZa51aS2WJNQYiordTtgHQHHlqaFHjtRy5LEdKrdSQ9ZuzqPohGOADMvl3zVnPOoxKQukRqXi8Lo9qGTXlhllc/g9AOzhD4XmsobW8kJm0gtgUVMe/baXtFmcMYSupDCqEREGcjj6q8L7m1WSTXY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=X8rUuBmV; arc=none smtp.client-ip=115.124.30.111 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="X8rUuBmV" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1708412511; h=From:To:Subject:Date:Message-Id; bh=z0rBls4/DVw7umnR4vhKntk5sCoGWjqNnpPfFi4kifc=; b=X8rUuBmVlmZGhTAMu4DGnWi0NgZQ2kYUxsiBbTAWrGDcb1luPPERgi6WTAaxDCzbcnemZ3fW6qLVuzMdv0TBPVIMO8DhrxgqUhiUcae50AqonRhQRUmqw1b3eFVBoNdbYR4bvJ6K2L7OH+G3ow1jIM4Xc4HT0TYFus5/lyjQ5Ho= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R211e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046056;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W0vuXdQ_1708412510; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W0vuXdQ_1708412510) by smtp.aliyun-inc.com; Tue, 20 Feb 2024 15:01:50 +0800 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 02/20] net/smc: read&write sock state via unified macros Date: Tue, 20 Feb 2024 15:01:27 +0800 Message-Id: <1708412505-34470-3-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" In order to merge SMC sock and TCP sock, the state field of sock will be shared by the TCP protocol stack and the SMC protocol stack. Unfortunately, the state values defined by SMC overlap with those defined by TCP. 
The easier way would be to redefine the states of the SMC sock; however, this would confuse some of the smc tools because they rely on the old state definitions. So we have to use a new field for the SMC implementation to read and write its state after merging the socks. In consideration of compatibility and subsequent extensibility, we decided to replace all related read and write operations with unified macros. This way, we only need to modify these macros to switch state access between different fields. This patch is pure code reorganization, replacing all read and write operations on the sock state with the unified macros. In theory, it should have no functional impact, and all subsequent reads or writes of the state should go through these macros. Signed-off-by: D. Wythe --- net/smc/af_smc.c | 195 ++++++++++++++++++++++++++-------------------------- net/smc/smc.h | 3 + net/smc/smc_close.c | 98 +++++++++++++------------- net/smc/smc_core.c | 24 +++---- net/smc/smc_diag.c | 2 +- net/smc/smc_rx.c | 16 ++--- net/smc/smc_tx.c | 2 +- 7 files changed, 171 insertions(+), 169 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 4b52b3b..bdb6dd7 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -278,16 +278,16 @@ static int __smc_release(struct smc_sock *smc) smc_sock_set_flag(sk, SOCK_DEAD); sk->sk_shutdown |= SHUTDOWN_MASK; } else { - if (sk->sk_state != SMC_CLOSED) { - if (sk->sk_state != SMC_LISTEN && - sk->sk_state != SMC_INIT) + if (smc_sk_state(sk) != SMC_CLOSED) { + if (smc_sk_state(sk) != SMC_LISTEN && + smc_sk_state(sk) != SMC_INIT) sock_put(sk); /* passive closing */ - if (sk->sk_state == SMC_LISTEN) { + if (smc_sk_state(sk) == SMC_LISTEN) { /* wake up clcsock accept */ rc = kernel_sock_shutdown(smc->clcsock, SHUT_RDWR); } - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); sk->sk_state_change(sk); } smc_restore_fallback_changes(smc); @@ -295,7 +295,7 @@ static int __smc_release(struct smc_sock *smc) sk->sk_prot->unhash(sk); - if (sk->sk_state == 
SMC_CLOSED) { + if (smc_sk_state(sk) == SMC_CLOSED) { if (smc->clcsock) { release_sock(sk); smc_clcsock_release(smc); @@ -320,7 +320,7 @@ static int smc_release(struct socket *sock) sock_hold(sk); /* sock_put below */ smc = smc_sk(sk); - old_state = sk->sk_state; + old_state = smc_sk_state(sk); /* cleanup for a dangling non-blocking connect */ if (smc->connect_nonblock && old_state == SMC_INIT) @@ -329,7 +329,7 @@ static int smc_release(struct socket *sock) if (cancel_work_sync(&smc->connect_work)) sock_put(&smc->sk); /* sock_hold in smc_connect for passive closing */ - if (sk->sk_state == SMC_LISTEN) + if (smc_sk_state(sk) == SMC_LISTEN) /* smc_close_non_accepted() is called and acquires * sock lock for child sockets again */ @@ -337,7 +337,7 @@ static int smc_release(struct socket *sock) else lock_sock(sk); - if (old_state == SMC_INIT && sk->sk_state == SMC_ACTIVE && + if (old_state == SMC_INIT && smc_sk_state(sk) == SMC_ACTIVE && !smc->use_fallback) smc_close_active_abort(smc); @@ -356,7 +356,7 @@ static int smc_release(struct socket *sock) static void smc_destruct(struct sock *sk) { - if (sk->sk_state != SMC_CLOSED) + if (smc_sk_state(sk) != SMC_CLOSED) return; if (!sock_flag(sk, SOCK_DEAD)) return; @@ -375,7 +375,7 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock, return NULL; sock_init_data(sock, sk); /* sets sk_refcnt to 1 */ - sk->sk_state = SMC_INIT; + smc_sk_set_state(sk, SMC_INIT); sk->sk_destruct = smc_destruct; sk->sk_protocol = protocol; WRITE_ONCE(sk->sk_sndbuf, 2 * READ_ONCE(net->smc.sysctl_wmem)); @@ -423,7 +423,7 @@ static int smc_bind(struct socket *sock, struct sockaddr *uaddr, /* Check if socket is already active */ rc = -EINVAL; - if (sk->sk_state != SMC_INIT || smc->connect_nonblock) + if (smc_sk_state(sk) != SMC_INIT || smc->connect_nonblock) goto out_rel; smc->clcsock->sk->sk_reuse = sk->sk_reuse; @@ -946,14 +946,14 @@ static int smc_connect_fallback(struct smc_sock *smc, int reason_code) rc = 
smc_switch_to_fallback(smc, reason_code); if (rc) { /* fallback fails */ this_cpu_inc(net->smc.smc_stats->clnt_hshake_err_cnt); - if (smc->sk.sk_state == SMC_INIT) + if (smc_sk_state(&smc->sk) == SMC_INIT) sock_put(&smc->sk); /* passive closing */ return rc; } smc_copy_sock_settings_to_clc(smc); smc->connect_nonblock = 0; - if (smc->sk.sk_state == SMC_INIT) - smc->sk.sk_state = SMC_ACTIVE; + if (smc_sk_state(&smc->sk) == SMC_INIT) + smc_sk_set_state(&smc->sk, SMC_ACTIVE); return 0; } @@ -966,7 +966,7 @@ static int smc_connect_decline_fallback(struct smc_sock *smc, int reason_code, if (reason_code < 0) { /* error, fallback is not possible */ this_cpu_inc(net->smc.smc_stats->clnt_hshake_err_cnt); - if (smc->sk.sk_state == SMC_INIT) + if (smc_sk_state(&smc->sk) == SMC_INIT) sock_put(&smc->sk); /* passive closing */ return reason_code; } @@ -974,7 +974,7 @@ static int smc_connect_decline_fallback(struct smc_sock *smc, int reason_code, rc = smc_clc_send_decline(smc, reason_code, version); if (rc < 0) { this_cpu_inc(net->smc.smc_stats->clnt_hshake_err_cnt); - if (smc->sk.sk_state == SMC_INIT) + if (smc_sk_state(&smc->sk) == SMC_INIT) sock_put(&smc->sk); /* passive closing */ return rc; } @@ -1357,8 +1357,8 @@ static int smc_connect_rdma(struct smc_sock *smc, smc_copy_sock_settings_to_clc(smc); smc->connect_nonblock = 0; - if (smc->sk.sk_state == SMC_INIT) - smc->sk.sk_state = SMC_ACTIVE; + if (smc_sk_state(&smc->sk) == SMC_INIT) + smc_sk_set_state(&smc->sk, SMC_ACTIVE); return 0; connect_abort: @@ -1452,8 +1452,8 @@ static int smc_connect_ism(struct smc_sock *smc, smc_copy_sock_settings_to_clc(smc); smc->connect_nonblock = 0; - if (smc->sk.sk_state == SMC_INIT) - smc->sk.sk_state = SMC_ACTIVE; + if (smc_sk_state(&smc->sk) == SMC_INIT) + smc_sk_set_state(&smc->sk, SMC_ACTIVE); return 0; connect_abort: @@ -1607,7 +1607,7 @@ static void smc_connect_work(struct work_struct *work) release_sock(smc->clcsock->sk); lock_sock(&smc->sk); if (rc != 0 || smc->sk.sk_err) { - 
smc->sk.sk_state = SMC_CLOSED; + smc_sk_set_state(&smc->sk, SMC_CLOSED); if (rc == -EPIPE || rc == -EAGAIN) smc->sk.sk_err = EPIPE; else if (rc == -ECONNREFUSED) @@ -1655,10 +1655,10 @@ static int smc_connect(struct socket *sock, struct sockaddr *addr, rc = -EINVAL; goto out; case SS_CONNECTED: - rc = sk->sk_state == SMC_ACTIVE ? -EISCONN : -EINVAL; + rc = smc_sk_state(sk) == SMC_ACTIVE ? -EISCONN : -EINVAL; goto out; case SS_CONNECTING: - if (sk->sk_state == SMC_ACTIVE) + if (smc_sk_state(sk) == SMC_ACTIVE) goto connected; break; case SS_UNCONNECTED: @@ -1666,7 +1666,7 @@ static int smc_connect(struct socket *sock, struct sockaddr *addr, break; } - switch (sk->sk_state) { + switch (smc_sk_state(sk)) { default: goto out; case SMC_CLOSED: @@ -1740,11 +1740,11 @@ static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc) lock_sock(lsk); if (rc < 0 && rc != -EAGAIN) lsk->sk_err = -rc; - if (rc < 0 || lsk->sk_state == SMC_CLOSED) { + if (rc < 0 || smc_sk_state(lsk) == SMC_CLOSED) { new_sk->sk_prot->unhash(new_sk); if (new_clcsock) sock_release(new_clcsock); - new_sk->sk_state = SMC_CLOSED; + smc_sk_set_state(new_sk, SMC_CLOSED); smc_sock_set_flag(new_sk, SOCK_DEAD); sock_put(new_sk); /* final */ *new_smc = NULL; @@ -1812,7 +1812,7 @@ struct sock *smc_accept_dequeue(struct sock *parent, new_sk = (struct sock *)isk; smc_accept_unlink(new_sk); - if (new_sk->sk_state == SMC_CLOSED) { + if (smc_sk_state(new_sk) == SMC_CLOSED) { new_sk->sk_prot->unhash(new_sk); if (isk->clcsock) { sock_release(isk->clcsock); @@ -1911,7 +1911,7 @@ static void smc_listen_out(struct smc_sock *new_smc) if (tcp_sk(new_smc->clcsock->sk)->syn_smc) atomic_dec(&lsmc->queued_smc_hs); - if (lsmc->sk.sk_state == SMC_LISTEN) { + if (smc_sk_state(&lsmc->sk) == SMC_LISTEN) { lock_sock_nested(&lsmc->sk, SINGLE_DEPTH_NESTING); smc_accept_enqueue(&lsmc->sk, newsmcsk); release_sock(&lsmc->sk); @@ -1929,8 +1929,8 @@ static void smc_listen_out_connected(struct smc_sock *new_smc) { struct 
sock *newsmcsk = &new_smc->sk; - if (newsmcsk->sk_state == SMC_INIT) - newsmcsk->sk_state = SMC_ACTIVE; + if (smc_sk_state(newsmcsk) == SMC_INIT) + smc_sk_set_state(newsmcsk, SMC_ACTIVE); smc_listen_out(new_smc); } @@ -1942,9 +1942,9 @@ static void smc_listen_out_err(struct smc_sock *new_smc) struct net *net = sock_net(newsmcsk); this_cpu_inc(net->smc.smc_stats->srv_hshake_err_cnt); - if (newsmcsk->sk_state == SMC_INIT) + if (smc_sk_state(newsmcsk) == SMC_INIT) sock_put(&new_smc->sk); /* passive closing */ - newsmcsk->sk_state = SMC_CLOSED; + smc_sk_set_state(newsmcsk, SMC_CLOSED); smc_listen_out(new_smc); } @@ -2432,7 +2432,7 @@ static void smc_listen_work(struct work_struct *work) u8 accept_version; int rc = 0; - if (new_smc->listen_smc->sk.sk_state != SMC_LISTEN) + if (smc_sk_state(&new_smc->listen_smc->sk) != SMC_LISTEN) return smc_listen_out_err(new_smc); if (new_smc->use_fallback) { @@ -2564,7 +2564,7 @@ static void smc_tcp_listen_work(struct work_struct *work) int rc = 0; lock_sock(lsk); - while (lsk->sk_state == SMC_LISTEN) { + while (smc_sk_state(lsk) == SMC_LISTEN) { rc = smc_clcsock_accept(lsmc, &new_smc); if (rc) /* clcsock accept queue empty or error */ goto out; @@ -2599,7 +2599,7 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock) if (!lsmc) goto out; lsmc->clcsk_data_ready(listen_clcsock); - if (lsmc->sk.sk_state == SMC_LISTEN) { + if (smc_sk_state(&lsmc->sk) == SMC_LISTEN) { sock_hold(&lsmc->sk); /* sock_put in smc_tcp_listen_work() */ if (!queue_work(smc_tcp_ls_wq, &lsmc->tcp_listen_work)) sock_put(&lsmc->sk); @@ -2618,12 +2618,12 @@ static int smc_listen(struct socket *sock, int backlog) lock_sock(sk); rc = -EINVAL; - if ((sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) || + if ((smc_sk_state(sk) != SMC_INIT && smc_sk_state(sk) != SMC_LISTEN) || smc->connect_nonblock || sock->state != SS_UNCONNECTED) goto out; rc = 0; - if (sk->sk_state == SMC_LISTEN) { + if (smc_sk_state(sk) == SMC_LISTEN) { sk->sk_max_ack_backlog = 
backlog; goto out; } @@ -2666,7 +2666,7 @@ static int smc_listen(struct socket *sock, int backlog) } sk->sk_max_ack_backlog = backlog; sk->sk_ack_backlog = 0; - sk->sk_state = SMC_LISTEN; + smc_sk_set_state(sk, SMC_LISTEN); out: release_sock(sk); @@ -2686,7 +2686,7 @@ static int smc_accept(struct socket *sock, struct socket *new_sock, sock_hold(sk); /* sock_put below */ lock_sock(sk); - if (lsmc->sk.sk_state != SMC_LISTEN) { + if (smc_sk_state(&lsmc->sk) != SMC_LISTEN) { rc = -EINVAL; release_sock(sk); goto out; @@ -2748,8 +2748,8 @@ static int smc_getname(struct socket *sock, struct sockaddr *addr, { struct smc_sock *smc; - if (peer && (sock->sk->sk_state != SMC_ACTIVE) && - (sock->sk->sk_state != SMC_APPCLOSEWAIT1)) + if (peer && (smc_sk_state(sock->sk) != SMC_ACTIVE) && + (smc_sk_state(sock->sk) != SMC_APPCLOSEWAIT1)) return -ENOTCONN; smc = smc_sk(sock->sk); @@ -2769,7 +2769,7 @@ static int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) /* SMC does not support connect with fastopen */ if (msg->msg_flags & MSG_FASTOPEN) { /* not connected yet, fallback */ - if (sk->sk_state == SMC_INIT && !smc->connect_nonblock) { + if (smc_sk_state(sk) == SMC_INIT && !smc->connect_nonblock) { rc = smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP); if (rc) goto out; @@ -2777,9 +2777,9 @@ static int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) rc = -EINVAL; goto out; } - } else if ((sk->sk_state != SMC_ACTIVE) && - (sk->sk_state != SMC_APPCLOSEWAIT1) && - (sk->sk_state != SMC_INIT)) { + } else if ((smc_sk_state(sk) != SMC_ACTIVE) && + (smc_sk_state(sk) != SMC_APPCLOSEWAIT1) && + (smc_sk_state(sk) != SMC_INIT)) { rc = -EPIPE; goto out; } @@ -2804,17 +2804,17 @@ static int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, smc = smc_sk(sk); lock_sock(sk); - if (sk->sk_state == SMC_CLOSED && (sk->sk_shutdown & RCV_SHUTDOWN)) { + if (smc_sk_state(sk) == SMC_CLOSED && (sk->sk_shutdown & RCV_SHUTDOWN)) { /* socket was connected 
before, no more data to read */ rc = 0; goto out; } - if ((sk->sk_state == SMC_INIT) || - (sk->sk_state == SMC_LISTEN) || - (sk->sk_state == SMC_CLOSED)) + if ((smc_sk_state(sk) == SMC_INIT) || + (smc_sk_state(sk) == SMC_LISTEN) || + (smc_sk_state(sk) == SMC_CLOSED)) goto out; - if (sk->sk_state == SMC_PEERFINCLOSEWAIT) { + if (smc_sk_state(sk) == SMC_PEERFINCLOSEWAIT) { rc = 0; goto out; } @@ -2861,14 +2861,14 @@ static __poll_t smc_poll(struct file *file, struct socket *sock, mask = smc->clcsock->ops->poll(file, smc->clcsock, wait); sk->sk_err = smc->clcsock->sk->sk_err; } else { - if (sk->sk_state != SMC_CLOSED) + if (smc_sk_state(sk) != SMC_CLOSED) sock_poll_wait(file, sock, wait); if (sk->sk_err) mask |= EPOLLERR; if ((sk->sk_shutdown == SHUTDOWN_MASK) || - (sk->sk_state == SMC_CLOSED)) + (smc_sk_state(sk) == SMC_CLOSED)) mask |= EPOLLHUP; - if (sk->sk_state == SMC_LISTEN) { + if (smc_sk_state(sk) == SMC_LISTEN) { /* woken up by sk_data_ready in smc_listen_work() */ mask |= smc_accept_poll(sk); } else if (smc->use_fallback) { /* as result of connect_work()*/ @@ -2876,7 +2876,7 @@ static __poll_t smc_poll(struct file *file, struct socket *sock, wait); sk->sk_err = smc->clcsock->sk->sk_err; } else { - if ((sk->sk_state != SMC_INIT && + if ((smc_sk_state(sk) != SMC_INIT && atomic_read(&smc->conn.sndbuf_space)) || sk->sk_shutdown & SEND_SHUTDOWN) { mask |= EPOLLOUT | EPOLLWRNORM; @@ -2888,7 +2888,7 @@ static __poll_t smc_poll(struct file *file, struct socket *sock, mask |= EPOLLIN | EPOLLRDNORM; if (sk->sk_shutdown & RCV_SHUTDOWN) mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP; - if (sk->sk_state == SMC_APPCLOSEWAIT1) + if (smc_sk_state(sk) == SMC_APPCLOSEWAIT1) mask |= EPOLLIN; if (smc->conn.urg_state == SMC_URG_VALID) mask |= EPOLLPRI; @@ -2915,29 +2915,29 @@ static int smc_shutdown(struct socket *sock, int how) lock_sock(sk); if (sock->state == SS_CONNECTING) { - if (sk->sk_state == SMC_ACTIVE) + if (smc_sk_state(sk) == SMC_ACTIVE) sock->state = SS_CONNECTED; - 
else if (sk->sk_state == SMC_PEERCLOSEWAIT1 || - sk->sk_state == SMC_PEERCLOSEWAIT2 || - sk->sk_state == SMC_APPCLOSEWAIT1 || - sk->sk_state == SMC_APPCLOSEWAIT2 || - sk->sk_state == SMC_APPFINCLOSEWAIT) + else if (smc_sk_state(sk) == SMC_PEERCLOSEWAIT1 || + smc_sk_state(sk) == SMC_PEERCLOSEWAIT2 || + smc_sk_state(sk) == SMC_APPCLOSEWAIT1 || + smc_sk_state(sk) == SMC_APPCLOSEWAIT2 || + smc_sk_state(sk) == SMC_APPFINCLOSEWAIT) sock->state = SS_DISCONNECTING; } rc = -ENOTCONN; - if ((sk->sk_state != SMC_ACTIVE) && - (sk->sk_state != SMC_PEERCLOSEWAIT1) && - (sk->sk_state != SMC_PEERCLOSEWAIT2) && - (sk->sk_state != SMC_APPCLOSEWAIT1) && - (sk->sk_state != SMC_APPCLOSEWAIT2) && - (sk->sk_state != SMC_APPFINCLOSEWAIT)) + if ((smc_sk_state(sk) != SMC_ACTIVE) && + (smc_sk_state(sk) != SMC_PEERCLOSEWAIT1) && + (smc_sk_state(sk) != SMC_PEERCLOSEWAIT2) && + (smc_sk_state(sk) != SMC_APPCLOSEWAIT1) && + (smc_sk_state(sk) != SMC_APPCLOSEWAIT2) && + (smc_sk_state(sk) != SMC_APPFINCLOSEWAIT)) goto out; if (smc->use_fallback) { rc = kernel_sock_shutdown(smc->clcsock, how); sk->sk_shutdown = smc->clcsock->sk->sk_shutdown; if (sk->sk_shutdown == SHUTDOWN_MASK) { - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); sk->sk_socket->state = SS_UNCONNECTED; sock_put(sk); } @@ -2945,10 +2945,10 @@ static int smc_shutdown(struct socket *sock, int how) } switch (how) { case SHUT_RDWR: /* shutdown in both directions */ - old_state = sk->sk_state; + old_state = smc_sk_state(sk); rc = smc_close_active(smc); if (old_state == SMC_ACTIVE && - sk->sk_state == SMC_PEERCLOSEWAIT1) + smc_sk_state(sk) == SMC_PEERCLOSEWAIT1) do_shutdown = false; break; case SHUT_WR: @@ -2964,7 +2964,7 @@ static int smc_shutdown(struct socket *sock, int how) /* map sock_shutdown_cmd constants to sk_shutdown value range */ sk->sk_shutdown |= how + 1; - if (sk->sk_state == SMC_CLOSED) + if (smc_sk_state(sk) == SMC_CLOSED) sock->state = SS_UNCONNECTED; else sock->state = SS_DISCONNECTING; @@ -3085,16 +3085,15 
@@ static int smc_setsockopt(struct socket *sock, int level, int optname, case TCP_FASTOPEN_KEY: case TCP_FASTOPEN_NO_COOKIE: /* option not supported by SMC */ - if (sk->sk_state == SMC_INIT && !smc->connect_nonblock) { + if (smc_sk_state(sk) == SMC_INIT && !smc->connect_nonblock) rc = smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP); - } else { + else rc = -EINVAL; - } break; case TCP_NODELAY: - if (sk->sk_state != SMC_INIT && - sk->sk_state != SMC_LISTEN && - sk->sk_state != SMC_CLOSED) { + if (smc_sk_state(sk) != SMC_INIT && + smc_sk_state(sk) != SMC_LISTEN && + smc_sk_state(sk) != SMC_CLOSED) { if (val) { SMC_STAT_INC(smc, ndly_cnt); smc_tx_pending(&smc->conn); @@ -3103,9 +3102,9 @@ static int smc_setsockopt(struct socket *sock, int level, int optname, } break; case TCP_CORK: - if (sk->sk_state != SMC_INIT && - sk->sk_state != SMC_LISTEN && - sk->sk_state != SMC_CLOSED) { + if (smc_sk_state(sk) != SMC_INIT && + smc_sk_state(sk) != SMC_LISTEN && + smc_sk_state(sk) != SMC_CLOSED) { if (!val) { SMC_STAT_INC(smc, cork_cnt); smc_tx_pending(&smc->conn); @@ -3173,24 +3172,24 @@ static int smc_ioctl(struct socket *sock, unsigned int cmd, } switch (cmd) { case SIOCINQ: /* same as FIONREAD */ - if (smc->sk.sk_state == SMC_LISTEN) { + if (smc_sk_state(&smc->sk) == SMC_LISTEN) { release_sock(&smc->sk); return -EINVAL; } - if (smc->sk.sk_state == SMC_INIT || - smc->sk.sk_state == SMC_CLOSED) + if (smc_sk_state(&smc->sk) == SMC_INIT || + smc_sk_state(&smc->sk) == SMC_CLOSED) answ = 0; else answ = atomic_read(&smc->conn.bytes_to_rcv); break; case SIOCOUTQ: /* output queue size (not send + not acked) */ - if (smc->sk.sk_state == SMC_LISTEN) { + if (smc_sk_state(&smc->sk) == SMC_LISTEN) { release_sock(&smc->sk); return -EINVAL; } - if (smc->sk.sk_state == SMC_INIT || - smc->sk.sk_state == SMC_CLOSED) + if (smc_sk_state(&smc->sk) == SMC_INIT || + smc_sk_state(&smc->sk) == SMC_CLOSED) answ = 0; else answ = smc->conn.sndbuf_desc->len - @@ -3198,23 +3197,23 @@ static int 
smc_ioctl(struct socket *sock, unsigned int cmd, break; case SIOCOUTQNSD: /* output queue size (not send only) */ - if (smc->sk.sk_state == SMC_LISTEN) { + if (smc_sk_state(&smc->sk) == SMC_LISTEN) { release_sock(&smc->sk); return -EINVAL; } - if (smc->sk.sk_state == SMC_INIT || - smc->sk.sk_state == SMC_CLOSED) + if (smc_sk_state(&smc->sk) == SMC_INIT || + smc_sk_state(&smc->sk) == SMC_CLOSED) answ = 0; else answ = smc_tx_prepared_sends(&smc->conn); break; case SIOCATMARK: - if (smc->sk.sk_state == SMC_LISTEN) { + if (smc_sk_state(&smc->sk) == SMC_LISTEN) { release_sock(&smc->sk); return -EINVAL; } - if (smc->sk.sk_state == SMC_INIT || - smc->sk.sk_state == SMC_CLOSED) { + if (smc_sk_state(&smc->sk) == SMC_INIT || + smc_sk_state(&smc->sk) == SMC_CLOSED) { answ = 0; } else { smc_curs_copy(&cons, &conn->local_tx_ctrl.cons, conn); @@ -3248,17 +3247,17 @@ static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos, smc = smc_sk(sk); lock_sock(sk); - if (sk->sk_state == SMC_CLOSED && (sk->sk_shutdown & RCV_SHUTDOWN)) { + if (smc_sk_state(sk) == SMC_CLOSED && (sk->sk_shutdown & RCV_SHUTDOWN)) { /* socket was connected before, no more data to read */ rc = 0; goto out; } - if (sk->sk_state == SMC_INIT || - sk->sk_state == SMC_LISTEN || - sk->sk_state == SMC_CLOSED) + if (smc_sk_state(sk) == SMC_INIT || + smc_sk_state(sk) == SMC_LISTEN || + smc_sk_state(sk) == SMC_CLOSED) goto out; - if (sk->sk_state == SMC_PEERFINCLOSEWAIT) { + if (smc_sk_state(sk) == SMC_PEERFINCLOSEWAIT) { rc = 0; goto out; } diff --git a/net/smc/smc.h b/net/smc/smc.h index 18c8b78..6b651b5 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -38,6 +38,9 @@ #define KERNEL_HAS_ATOMIC64 #endif +#define smc_sk_state(sk) ((sk)->sk_state) +#define smc_sk_set_state(sk, state) (smc_sk_state(sk) = (state)) + enum smc_state { /* possible states of an SMC socket */ SMC_ACTIVE = 1, SMC_INIT = 2, diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c index 10219f5..9210d1f 100644 --- a/net/smc/smc_close.c +++ 
b/net/smc/smc_close.c @@ -130,41 +130,41 @@ void smc_close_active_abort(struct smc_sock *smc) struct sock *sk = &smc->sk; bool release_clcsock = false; - if (sk->sk_state != SMC_INIT && smc->clcsock && smc->clcsock->sk) { + if (smc_sk_state(sk) != SMC_INIT && smc->clcsock && smc->clcsock->sk) { sk->sk_err = ECONNABORTED; if (smc->clcsock && smc->clcsock->sk) tcp_abort(smc->clcsock->sk, ECONNABORTED); } - switch (sk->sk_state) { + switch (smc_sk_state(sk)) { case SMC_ACTIVE: case SMC_APPCLOSEWAIT1: case SMC_APPCLOSEWAIT2: - sk->sk_state = SMC_PEERABORTWAIT; + smc_sk_set_state(sk, SMC_PEERABORTWAIT); smc_close_cancel_work(smc); - if (sk->sk_state != SMC_PEERABORTWAIT) + if (smc_sk_state(sk) != SMC_PEERABORTWAIT) break; - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); sock_put(sk); /* (postponed) passive closing */ break; case SMC_PEERCLOSEWAIT1: case SMC_PEERCLOSEWAIT2: case SMC_PEERFINCLOSEWAIT: - sk->sk_state = SMC_PEERABORTWAIT; + smc_sk_set_state(sk, SMC_PEERABORTWAIT); smc_close_cancel_work(smc); - if (sk->sk_state != SMC_PEERABORTWAIT) + if (smc_sk_state(sk) != SMC_PEERABORTWAIT) break; - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); smc_conn_free(&smc->conn); release_clcsock = true; sock_put(sk); /* passive closing */ break; case SMC_PROCESSABORT: case SMC_APPFINCLOSEWAIT: - sk->sk_state = SMC_PEERABORTWAIT; + smc_sk_set_state(sk, SMC_PEERABORTWAIT); smc_close_cancel_work(smc); - if (sk->sk_state != SMC_PEERABORTWAIT) + if (smc_sk_state(sk) != SMC_PEERABORTWAIT) break; - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); smc_conn_free(&smc->conn); release_clcsock = true; break; @@ -205,14 +205,14 @@ int smc_close_active(struct smc_sock *smc) 0 : sock_flag(sk, SOCK_LINGER) ? 
sk->sk_lingertime : SMC_MAX_STREAM_WAIT_TIMEOUT; - old_state = sk->sk_state; + old_state = smc_sk_state(sk); again: - switch (sk->sk_state) { + switch (smc_sk_state(sk)) { case SMC_INIT: - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); break; case SMC_LISTEN: - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); sk->sk_state_change(sk); /* wake up accept */ if (smc->clcsock && smc->clcsock->sk) { write_lock_bh(&smc->clcsock->sk->sk_callback_lock); @@ -232,10 +232,10 @@ int smc_close_active(struct smc_sock *smc) release_sock(sk); cancel_delayed_work_sync(&conn->tx_work); lock_sock(sk); - if (sk->sk_state == SMC_ACTIVE) { + if (smc_sk_state(sk) == SMC_ACTIVE) { /* send close request */ rc = smc_close_final(conn); - sk->sk_state = SMC_PEERCLOSEWAIT1; + smc_sk_set_state(sk, SMC_PEERCLOSEWAIT1); /* actively shutdown clcsock before peer close it, * prevent peer from entering TIME_WAIT state. @@ -257,7 +257,7 @@ int smc_close_active(struct smc_sock *smc) /* just shutdown wr done, send close request */ rc = smc_close_final(conn); } - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); break; case SMC_APPCLOSEWAIT1: case SMC_APPCLOSEWAIT2: @@ -266,18 +266,18 @@ int smc_close_active(struct smc_sock *smc) release_sock(sk); cancel_delayed_work_sync(&conn->tx_work); lock_sock(sk); - if (sk->sk_state != SMC_APPCLOSEWAIT1 && - sk->sk_state != SMC_APPCLOSEWAIT2) + if (smc_sk_state(sk) != SMC_APPCLOSEWAIT1 && + smc_sk_state(sk) != SMC_APPCLOSEWAIT2) goto again; /* confirm close from peer */ rc = smc_close_final(conn); if (smc_cdc_rxed_any_close(conn)) { /* peer has closed the socket already */ - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); sock_put(sk); /* postponed passive closing */ } else { /* peer has just issued a shutdown write */ - sk->sk_state = SMC_PEERFINCLOSEWAIT; + smc_sk_set_state(sk, SMC_PEERFINCLOSEWAIT); } break; case SMC_PEERCLOSEWAIT1: @@ -294,17 +294,17 @@ int smc_close_active(struct smc_sock *smc) break; 
case SMC_PROCESSABORT: rc = smc_close_abort(conn); - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); break; case SMC_PEERABORTWAIT: - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); break; case SMC_CLOSED: /* nothing to do, add tracing in future patch */ break; } - if (old_state != sk->sk_state) + if (old_state != smc_sk_state(sk)) sk->sk_state_change(sk); return rc; } @@ -315,33 +315,33 @@ static void smc_close_passive_abort_received(struct smc_sock *smc) &smc->conn.local_tx_ctrl.conn_state_flags; struct sock *sk = &smc->sk; - switch (sk->sk_state) { + switch (smc_sk_state(sk)) { case SMC_INIT: case SMC_ACTIVE: case SMC_APPCLOSEWAIT1: - sk->sk_state = SMC_PROCESSABORT; + smc_sk_set_state(sk, SMC_PROCESSABORT); sock_put(sk); /* passive closing */ break; case SMC_APPFINCLOSEWAIT: - sk->sk_state = SMC_PROCESSABORT; + smc_sk_set_state(sk, SMC_PROCESSABORT); break; case SMC_PEERCLOSEWAIT1: case SMC_PEERCLOSEWAIT2: if (txflags->peer_done_writing && !smc_close_sent_any_close(&smc->conn)) /* just shutdown, but not yet closed locally */ - sk->sk_state = SMC_PROCESSABORT; + smc_sk_set_state(sk, SMC_PROCESSABORT); else - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); sock_put(sk); /* passive closing */ break; case SMC_APPCLOSEWAIT2: case SMC_PEERFINCLOSEWAIT: - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); sock_put(sk); /* passive closing */ break; case SMC_PEERABORTWAIT: - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); break; case SMC_PROCESSABORT: /* nothing to do, add tracing in future patch */ @@ -365,7 +365,7 @@ static void smc_close_passive_work(struct work_struct *work) int old_state; lock_sock(sk); - old_state = sk->sk_state; + old_state = smc_sk_state(sk); rxflags = &conn->local_rx_ctrl.conn_state_flags; if (rxflags->peer_conn_abort) { @@ -377,19 +377,19 @@ static void smc_close_passive_work(struct work_struct *work) goto wakeup; } - switch (sk->sk_state) { + switch (smc_sk_state(sk)) { 
case SMC_INIT: - sk->sk_state = SMC_APPCLOSEWAIT1; + smc_sk_set_state(sk, SMC_APPCLOSEWAIT1); break; case SMC_ACTIVE: - sk->sk_state = SMC_APPCLOSEWAIT1; + smc_sk_set_state(sk, SMC_APPCLOSEWAIT1); /* postpone sock_put() for passive closing to cover * received SEND_SHUTDOWN as well */ break; case SMC_PEERCLOSEWAIT1: if (rxflags->peer_done_writing) - sk->sk_state = SMC_PEERCLOSEWAIT2; + smc_sk_set_state(sk, SMC_PEERCLOSEWAIT2); fallthrough; /* to check for closing */ case SMC_PEERCLOSEWAIT2: @@ -398,16 +398,16 @@ static void smc_close_passive_work(struct work_struct *work) if (sock_flag(sk, SOCK_DEAD) && smc_close_sent_any_close(conn)) { /* smc_release has already been called locally */ - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); } else { /* just shutdown, but not yet closed locally */ - sk->sk_state = SMC_APPFINCLOSEWAIT; + smc_sk_set_state(sk, SMC_APPFINCLOSEWAIT); } sock_put(sk); /* passive closing */ break; case SMC_PEERFINCLOSEWAIT: if (smc_cdc_rxed_any_close(conn)) { - sk->sk_state = SMC_CLOSED; + smc_sk_set_state(sk, SMC_CLOSED); sock_put(sk); /* passive closing */ } break; @@ -429,9 +429,9 @@ static void smc_close_passive_work(struct work_struct *work) sk->sk_data_ready(sk); /* wakeup blocked rcvbuf consumers */ sk->sk_write_space(sk); /* wakeup blocked sndbuf producers */ - if (old_state != sk->sk_state) { + if (old_state != smc_sk_state(sk)) { sk->sk_state_change(sk); - if ((sk->sk_state == SMC_CLOSED) && + if ((smc_sk_state(sk) == SMC_CLOSED) && (sock_flag(sk, SOCK_DEAD) || !sk->sk_socket)) { smc_conn_free(conn); if (smc->clcsock) @@ -456,19 +456,19 @@ int smc_close_shutdown_write(struct smc_sock *smc) 0 : sock_flag(sk, SOCK_LINGER) ? 
sk->sk_lingertime : SMC_MAX_STREAM_WAIT_TIMEOUT; - old_state = sk->sk_state; + old_state = smc_sk_state(sk); again: - switch (sk->sk_state) { + switch (smc_sk_state(sk)) { case SMC_ACTIVE: smc_close_stream_wait(smc, timeout); release_sock(sk); cancel_delayed_work_sync(&conn->tx_work); lock_sock(sk); - if (sk->sk_state != SMC_ACTIVE) + if (smc_sk_state(sk) != SMC_ACTIVE) goto again; /* send close wr request */ rc = smc_close_wr(conn); - sk->sk_state = SMC_PEERCLOSEWAIT1; + smc_sk_set_state(sk, SMC_PEERCLOSEWAIT1); break; case SMC_APPCLOSEWAIT1: /* passive close */ @@ -477,11 +477,11 @@ int smc_close_shutdown_write(struct smc_sock *smc) release_sock(sk); cancel_delayed_work_sync(&conn->tx_work); lock_sock(sk); - if (sk->sk_state != SMC_APPCLOSEWAIT1) + if (smc_sk_state(sk) != SMC_APPCLOSEWAIT1) goto again; /* confirm close from peer */ rc = smc_close_wr(conn); - sk->sk_state = SMC_APPCLOSEWAIT2; + smc_sk_set_state(sk, SMC_APPCLOSEWAIT2); break; case SMC_APPCLOSEWAIT2: case SMC_PEERFINCLOSEWAIT: @@ -494,7 +494,7 @@ int smc_close_shutdown_write(struct smc_sock *smc) break; } - if (old_state != sk->sk_state) + if (old_state != smc_sk_state(sk)) sk->sk_state_change(sk); return rc; } diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 9b84d58..b852c09 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -1010,8 +1010,8 @@ static int smc_switch_cursor(struct smc_sock *smc, struct smc_cdc_tx_pend *pend, /* recalculate, value is used by tx_rdma_writes() */ atomic_set(&smc->conn.peer_rmbe_space, smc_write_space(conn)); - if (smc->sk.sk_state != SMC_INIT && - smc->sk.sk_state != SMC_CLOSED) { + if (smc_sk_state(&smc->sk) != SMC_INIT && + smc_sk_state(&smc->sk) != SMC_CLOSED) { rc = smcr_cdc_msg_send_validation(conn, pend, wr_buf); if (!rc) { queue_delayed_work(conn->lgr->tx_wq, &conn->tx_work, 0); @@ -1072,17 +1072,17 @@ struct smc_link *smc_switch_conns(struct smc_link_group *lgr, continue; smc = container_of(conn, struct smc_sock, conn); /* conn->lnk not yet 
set in SMC_INIT state */ - if (smc->sk.sk_state == SMC_INIT) + if (smc_sk_state(&smc->sk) == SMC_INIT) continue; - if (smc->sk.sk_state == SMC_CLOSED || - smc->sk.sk_state == SMC_PEERCLOSEWAIT1 || - smc->sk.sk_state == SMC_PEERCLOSEWAIT2 || - smc->sk.sk_state == SMC_APPFINCLOSEWAIT || - smc->sk.sk_state == SMC_APPCLOSEWAIT1 || - smc->sk.sk_state == SMC_APPCLOSEWAIT2 || - smc->sk.sk_state == SMC_PEERFINCLOSEWAIT || - smc->sk.sk_state == SMC_PEERABORTWAIT || - smc->sk.sk_state == SMC_PROCESSABORT) { + if (smc_sk_state(&smc->sk) == SMC_CLOSED || + smc_sk_state(&smc->sk) == SMC_PEERCLOSEWAIT1 || + smc_sk_state(&smc->sk) == SMC_PEERCLOSEWAIT2 || + smc_sk_state(&smc->sk) == SMC_APPFINCLOSEWAIT || + smc_sk_state(&smc->sk) == SMC_APPCLOSEWAIT1 || + smc_sk_state(&smc->sk) == SMC_APPCLOSEWAIT2 || + smc_sk_state(&smc->sk) == SMC_PEERFINCLOSEWAIT || + smc_sk_state(&smc->sk) == SMC_PEERABORTWAIT || + smc_sk_state(&smc->sk) == SMC_PROCESSABORT) { spin_lock_bh(&conn->send_lock); smc_switch_link_and_count(conn, to_lnk); spin_unlock_bh(&conn->send_lock); diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c index 6fdb2d9..59a18ec 100644 --- a/net/smc/smc_diag.c +++ b/net/smc/smc_diag.c @@ -87,7 +87,7 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb, r = nlmsg_data(nlh); smc_diag_msg_common_fill(r, sk); - r->diag_state = sk->sk_state; + r->diag_state = smc_sk_state(sk); if (smc->use_fallback) r->diag_mode = SMC_DIAG_MODE_FALLBACK_TCP; else if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd) diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c index 9a2f363..32fd7db 100644 --- a/net/smc/smc_rx.c +++ b/net/smc/smc_rx.c @@ -44,7 +44,7 @@ static void smc_rx_wake_up(struct sock *sk) EPOLLRDNORM | EPOLLRDBAND); sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN); if ((sk->sk_shutdown == SHUTDOWN_MASK) || - (sk->sk_state == SMC_CLOSED)) + (smc_sk_state(sk) == SMC_CLOSED)) sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_HUP); rcu_read_unlock(); } @@ -119,9 +119,9 @@ static void 
smc_rx_pipe_buf_release(struct pipe_inode_info *pipe, struct smc_connection *conn; struct sock *sk = &smc->sk; - if (sk->sk_state == SMC_CLOSED || - sk->sk_state == SMC_PEERFINCLOSEWAIT || - sk->sk_state == SMC_APPFINCLOSEWAIT) + if (smc_sk_state(sk) == SMC_CLOSED || + smc_sk_state(sk) == SMC_PEERFINCLOSEWAIT || + smc_sk_state(sk) == SMC_APPFINCLOSEWAIT) goto out; conn = &smc->conn; lock_sock(sk); @@ -316,7 +316,7 @@ static int smc_rx_recv_urg(struct smc_sock *smc, struct msghdr *msg, int len, return rc ? -EFAULT : len; } - if (sk->sk_state == SMC_CLOSED || sk->sk_shutdown & RCV_SHUTDOWN) + if (smc_sk_state(sk) == SMC_CLOSED || sk->sk_shutdown & RCV_SHUTDOWN) return 0; return -EAGAIN; @@ -361,7 +361,7 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, return -EINVAL; /* future work for sk.sk_family == AF_SMC */ sk = &smc->sk; - if (sk->sk_state == SMC_LISTEN) + if (smc_sk_state(sk) == SMC_LISTEN) return -ENOTCONN; if (flags & MSG_OOB) return smc_rx_recv_urg(smc, msg, len, flags); @@ -398,7 +398,7 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, if (read_done) { if (sk->sk_err || - sk->sk_state == SMC_CLOSED || + smc_sk_state(sk) == SMC_CLOSED || !timeo || signal_pending(current)) break; @@ -407,7 +407,7 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, read_done = sock_error(sk); break; } - if (sk->sk_state == SMC_CLOSED) { + if (smc_sk_state(sk) == SMC_CLOSED) { if (!sock_flag(sk, SOCK_DONE)) { /* This occurs when user tries to read * from never connected socket. 
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c index 214ac3c..75b532d 100644 --- a/net/smc/smc_tx.c +++ b/net/smc/smc_tx.c @@ -198,7 +198,7 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len) goto out_err; } - if (sk->sk_state == SMC_INIT) + if (smc_sk_state(sk) == SMC_INIT) return -ENOTCONN; if (len > conn->sndbuf_desc->len) From patchwork Tue Feb 20 07:01:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. Wythe" X-Patchwork-Id: 13563515
From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 03/20] net/smc: refactor smc_setsockopt Date: Tue, 20 Feb 2024 15:01:28 +0800 Message-Id: <1708412505-34470-4-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org From: "D. Wythe" Refactor socket option processing in SMC: extract the common processing into smc_setsockopt_common() and the check for unsupported TCP options into smc_is_unsupport_tcp_sockopt(). Signed-off-by: D. 
Wythe --- net/smc/af_smc.c | 101 ++++++++++++++++++++++++++++++++++--------------------- 1 file changed, 63 insertions(+), 38 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index bdb6dd7..e87af68 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -3038,58 +3038,40 @@ static int __smc_setsockopt(struct socket *sock, int level, int optname, return rc; } -static int smc_setsockopt(struct socket *sock, int level, int optname, - sockptr_t optval, unsigned int optlen) +/* When an unsupported sockopt is found, + * SMC should try it best to fallback. If fallback is not possible, + * an error should be explicitly returned. + */ +static inline bool smc_is_unsupport_tcp_sockopt(int optname) +{ + switch (optname) { + case TCP_FASTOPEN: + case TCP_FASTOPEN_CONNECT: + case TCP_FASTOPEN_KEY: + case TCP_FASTOPEN_NO_COOKIE: + return true; + } + return false; +} + +static int smc_setsockopt_common(struct socket *sock, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct sock *sk = sock->sk; struct smc_sock *smc; - int val, rc; - - if (level == SOL_TCP && optname == TCP_ULP) - return -EOPNOTSUPP; - else if (level == SOL_SMC) - return __smc_setsockopt(sock, level, optname, optval, optlen); + int val, rc = 0; smc = smc_sk(sk); - /* generic setsockopts reaching us here always apply to the - * CLC socket - */ - mutex_lock(&smc->clcsock_release_lock); - if (!smc->clcsock) { - mutex_unlock(&smc->clcsock_release_lock); - return -EBADF; - } - if (unlikely(!smc->clcsock->ops->setsockopt)) - rc = -EOPNOTSUPP; - else - rc = smc->clcsock->ops->setsockopt(smc->clcsock, level, optname, - optval, optlen); - if (smc->clcsock->sk->sk_err) { - sk->sk_err = smc->clcsock->sk->sk_err; - sk_error_report(sk); - } - mutex_unlock(&smc->clcsock_release_lock); - if (optlen < sizeof(int)) return -EINVAL; if (copy_from_sockptr(&val, optval, sizeof(int))) return -EFAULT; lock_sock(sk); - if (rc || smc->use_fallback) + if (smc->use_fallback) goto out; switch (optname) { - 
case TCP_FASTOPEN: - case TCP_FASTOPEN_CONNECT: - case TCP_FASTOPEN_KEY: - case TCP_FASTOPEN_NO_COOKIE: - /* option not supported by SMC */ - if (smc_sk_state(sk) == SMC_INIT && !smc->connect_nonblock) - rc = smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP); - else - rc = -EINVAL; - break; case TCP_NODELAY: if (smc_sk_state(sk) != SMC_INIT && smc_sk_state(sk) != SMC_LISTEN && @@ -3116,6 +3098,13 @@ static int smc_setsockopt(struct socket *sock, int level, int optname, smc->sockopt_defer_accept = val; break; default: + if (smc_is_unsupport_tcp_sockopt(optname)) { + /* option not supported by SMC */ + if (smc_sk_state(sk) == SMC_INIT && !smc->connect_nonblock) + rc = smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP); + else + rc = -EINVAL; + } break; } out: @@ -3124,6 +3113,42 @@ static int smc_setsockopt(struct socket *sock, int level, int optname, return rc; } +static int smc_setsockopt(struct socket *sock, int level, int optname, + sockptr_t optval, unsigned int optlen) +{ + struct sock *sk = sock->sk; + struct smc_sock *smc; + int rc; + + if (level == SOL_TCP && optname == TCP_ULP) + return -EOPNOTSUPP; + else if (level == SOL_SMC) + return __smc_setsockopt(sock, level, optname, optval, optlen); + + smc = smc_sk(sk); + + /* generic setsockopts reaching us here always apply to the + * CLC socket + */ + mutex_lock(&smc->clcsock_release_lock); + if (!smc->clcsock) { + mutex_unlock(&smc->clcsock_release_lock); + return -EBADF; + } + if (unlikely(!smc->clcsock->ops->setsockopt)) + rc = -EOPNOTSUPP; + else + rc = smc->clcsock->ops->setsockopt(smc->clcsock, level, optname, + optval, optlen); + if (smc->clcsock->sk->sk_err) { + sk->sk_err = smc->clcsock->sk->sk_err; + sk_error_report(sk); + } + mutex_unlock(&smc->clcsock_release_lock); + + return rc ?: smc_setsockopt_common(sock, level, optname, optval, optlen); +} + static int smc_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen) { From patchwork Tue Feb 20 07:01:29 
2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. Wythe" X-Patchwork-Id: 13563478
From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 04/20] net/smc: refactor smc_accept_poll Date: Tue, 20 Feb 2024 15:01:29 +0800 Message-Id: <1708412505-34470-5-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org From: "D. Wythe" Refactor smc_accept_poll() by extracting a common helper, smc_accept_queue_empty(), that determines whether the accept queue is empty. Signed-off-by: D. 
Wythe --- net/smc/af_smc.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index e87af68..7966d06 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -2832,17 +2832,16 @@ static int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, return rc; } -static __poll_t smc_accept_poll(struct sock *parent) +static inline bool smc_accept_queue_empty(struct sock *sk) { - struct smc_sock *isk = smc_sk(parent); - __poll_t mask = 0; - - spin_lock(&isk->accept_q_lock); - if (!list_empty(&isk->accept_q)) - mask = EPOLLIN | EPOLLRDNORM; - spin_unlock(&isk->accept_q_lock); + return list_empty(&smc_sk(sk)->accept_q); +} - return mask; +static __poll_t smc_accept_poll(struct sock *parent) +{ + if (!smc_accept_queue_empty(parent)) + return EPOLLIN | EPOLLRDNORM; + return 0; } static __poll_t smc_poll(struct file *file, struct socket *sock, From patchwork Tue Feb 20 07:01:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. 
Wythe" X-Patchwork-Id: 13563476
From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 05/20] net/smc: try test to fallback when ulp set Date: Tue, 20 Feb 2024 15:01:30 +0800 Message-Id: <1708412505-34470-6-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org From: "D. Wythe" Currently, setting the TCP_ULP option immediately returns a failure (-EOPNOTSUPP). Instead, try to fall back to TCP first whenever possible, and return an error only if fallback is not possible. Signed-off-by: D. 
Wythe --- net/smc/af_smc.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 7966d06..b7c9f5c 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -3048,6 +3048,7 @@ static inline bool smc_is_unsupport_tcp_sockopt(int optname) case TCP_FASTOPEN_CONNECT: case TCP_FASTOPEN_KEY: case TCP_FASTOPEN_NO_COOKIE: + case TCP_ULP: return true; } return false; @@ -3119,9 +3120,7 @@ static int smc_setsockopt(struct socket *sock, int level, int optname, struct smc_sock *smc; int rc; - if (level == SOL_TCP && optname == TCP_ULP) - return -EOPNOTSUPP; - else if (level == SOL_SMC) + if (level == SOL_SMC) return __smc_setsockopt(sock, level, optname, optval, optlen); smc = smc_sk(sk); From patchwork Tue Feb 20 07:01:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. Wythe" X-Patchwork-Id: 13563488
From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 06/20] net/smc: fast return on unconcernd TCP options Date: Tue, 20 Feb 2024 15:01:31 +0800 Message-Id: <1708412505-34470-7-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org From: "D. Wythe" SMC does not need additional processing for every TCP option, so we can return immediately for options that require none. Note that options which are explicitly unsupported still require a fallback attempt and must not simply return. Signed-off-by: D. Wythe --- net/smc/af_smc.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index b7c9f5c..40cf0569 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -3054,6 +3054,28 @@ static inline bool smc_is_unsupport_tcp_sockopt(int optname) return false; } +/* Return true if smc might modify the semantics of + * the imcoming TCP options. Specifically, it includes + * unsupported TCP options. 
+ */ +static inline bool smc_need_override_tcp_sockopt(struct sock *sk, int optname) +{ + switch (optname) { + case TCP_NODELAY: + case TCP_CORK: + if (smc_sk_state(sk) == SMC_INIT || + smc_sk_state(sk) == SMC_LISTEN || + smc_sk_state(sk) == SMC_CLOSED) + return false; + fallthrough; + case TCP_DEFER_ACCEPT: + return true; + default: + break; + } + return smc_is_unsupport_tcp_sockopt(optname); +} + static int smc_setsockopt_common(struct socket *sock, int level, int optname, sockptr_t optval, unsigned int optlen) { @@ -3063,6 +3085,10 @@ static int smc_setsockopt_common(struct socket *sock, int level, int optname, smc = smc_sk(sk); + /* Fast path, just go away if no extra action needed */ + if (!smc_need_override_tcp_sockopt(sk, optname)) + return 0; + if (optlen < sizeof(int)) return -EINVAL; if (copy_from_sockptr(&val, optval, sizeof(int))) From patchwork Tue Feb 20 07:01:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. 
Wythe" X-Patchwork-Id: 13563482
From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 07/20] net/smc: refactor sock_flag/sock_set_flag Date: Tue, 20 Feb 2024 15:01:32 +0800 Message-Id: <1708412505-34470-8-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org From: "D. Wythe" Use a new unified macro, smc_sock_flag(), to access sock flags, so that the behavior of a specific flag can easily be changed without modifying the original sock_flag() function. Signed-off-by: D. 
Wythe --- net/smc/af_smc.c | 4 ++-- net/smc/smc.h | 2 ++ net/smc/smc_cdc.c | 2 +- net/smc/smc_close.c | 8 ++++---- net/smc/smc_rx.c | 8 ++++---- 5 files changed, 13 insertions(+), 11 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 40cf0569..66306b7 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -358,7 +358,7 @@ static void smc_destruct(struct sock *sk) { if (smc_sk_state(sk) != SMC_CLOSED) return; - if (!sock_flag(sk, SOCK_DEAD)) + if (!smc_sock_flag(sk, SOCK_DEAD)) return; } @@ -1623,7 +1623,7 @@ static void smc_connect_work(struct work_struct *work) smc->sk.sk_err = -rc; out: - if (!sock_flag(&smc->sk, SOCK_DEAD)) { + if (!smc_sock_flag(&smc->sk, SOCK_DEAD)) { if (smc->sk.sk_err) { smc->sk.sk_state_change(&smc->sk); } else { /* allow polling before and after fallback decision */ diff --git a/net/smc/smc.h b/net/smc/smc.h index 6b651b5..fce6a7a 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -388,4 +388,6 @@ static inline void smc_sock_set_flag(struct sock *sk, enum sock_flags flag) set_bit(flag, &sk->sk_flags); } +#define smc_sock_flag(sk, flag) sock_flag(sk, flag) + #endif /* __SMC_H */ diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c index 3c06625..7614545 100644 --- a/net/smc/smc_cdc.c +++ b/net/smc/smc_cdc.c @@ -285,7 +285,7 @@ static void smc_cdc_handle_urg_data_arrival(struct smc_sock *smc, /* new data included urgent business */ smc_curs_copy(&conn->urg_curs, &conn->local_rx_ctrl.prod, conn); conn->urg_state = SMC_URG_VALID; - if (!sock_flag(&smc->sk, SOCK_URGINLINE)) + if (!smc_sock_flag(&smc->sk, SOCK_URGINLINE)) /* we'll skip the urgent byte, so don't account for it */ (*diff_prod)--; base = (char *)conn->rmb_desc->cpu_addr + conn->rx_off; diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c index 9210d1f..8d9512e 100644 --- a/net/smc/smc_close.c +++ b/net/smc/smc_close.c @@ -202,7 +202,7 @@ int smc_close_active(struct smc_sock *smc) int rc1 = 0; timeout = current->flags & PF_EXITING ? 
- 0 : sock_flag(sk, SOCK_LINGER) ? + 0 : smc_sock_flag(sk, SOCK_LINGER) ? sk->sk_lingertime : SMC_MAX_STREAM_WAIT_TIMEOUT; old_state = smc_sk_state(sk); @@ -395,7 +395,7 @@ static void smc_close_passive_work(struct work_struct *work) case SMC_PEERCLOSEWAIT2: if (!smc_cdc_rxed_any_close(conn)) break; - if (sock_flag(sk, SOCK_DEAD) && + if (smc_sock_flag(sk, SOCK_DEAD) && smc_close_sent_any_close(conn)) { /* smc_release has already been called locally */ smc_sk_set_state(sk, SMC_CLOSED); @@ -432,7 +432,7 @@ static void smc_close_passive_work(struct work_struct *work) if (old_state != smc_sk_state(sk)) { sk->sk_state_change(sk); if ((smc_sk_state(sk) == SMC_CLOSED) && - (sock_flag(sk, SOCK_DEAD) || !sk->sk_socket)) { + (smc_sock_flag(sk, SOCK_DEAD) || !sk->sk_socket)) { smc_conn_free(conn); if (smc->clcsock) release_clcsock = true; @@ -453,7 +453,7 @@ int smc_close_shutdown_write(struct smc_sock *smc) int rc = 0; timeout = current->flags & PF_EXITING ? - 0 : sock_flag(sk, SOCK_LINGER) ? + 0 : smc_sock_flag(sk, SOCK_LINGER) ? 
sk->sk_lingertime : SMC_MAX_STREAM_WAIT_TIMEOUT; old_state = smc_sk_state(sk); diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c index 32fd7db..684caae 100644 --- a/net/smc/smc_rx.c +++ b/net/smc/smc_rx.c @@ -70,7 +70,7 @@ static int smc_rx_update_consumer(struct smc_sock *smc, if (conn->urg_state == SMC_URG_VALID || conn->urg_rx_skip_pend) { diff = smc_curs_comp(conn->rmb_desc->len, &cons, &conn->urg_curs); - if (sock_flag(sk, SOCK_URGINLINE)) { + if (smc_sock_flag(sk, SOCK_URGINLINE)) { if (diff == 0) { force = true; rc = 1; @@ -286,7 +286,7 @@ static int smc_rx_recv_urg(struct smc_sock *smc, struct msghdr *msg, int len, struct sock *sk = &smc->sk; int rc = 0; - if (sock_flag(sk, SOCK_URGINLINE) || + if (smc_sock_flag(sk, SOCK_URGINLINE) || !(conn->urg_state == SMC_URG_VALID) || conn->urg_state == SMC_URG_READ) return -EINVAL; @@ -408,7 +408,7 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, break; } if (smc_sk_state(sk) == SMC_CLOSED) { - if (!sock_flag(sk, SOCK_DONE)) { + if (!smc_sock_flag(sk, SOCK_DONE)) { /* This occurs when user tries to read * from never connected socket. */ @@ -449,7 +449,7 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, if (splbytes) smc_curs_add(conn->rmb_desc->len, &cons, splbytes); if (conn->urg_state == SMC_URG_VALID && - sock_flag(&smc->sk, SOCK_URGINLINE) && + smc_sock_flag(&smc->sk, SOCK_URGINLINE) && readable > 1) readable--; /* always stop at urgent Byte */ /* not more than what user space asked for */ From patchwork Tue Feb 20 07:01:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. 
Wythe" X-Patchwork-Id: 13563481 Received: from out30-98.freemail.mail.aliyun.com (out30-98.freemail.mail.aliyun.com [115.124.30.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D82EF5B5AC; Tue, 20 Feb 2024 07:02:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.98 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412525; cv=none; b=WM0+5nwL7wGq4fgAzWQuiIAWtHhXEbcUtYxmc1Aror0l5kNcvarEljA/xD/uw7yr0Kgz8PQ5LOg7uIElFmkTkU2RvuZgHLRjydlqH3K8PXiB5gYJyKn/pL0b+HEJ77/G+a2P0n3suqPVnyBFang60qWnLKqj/SYRSuDcj52bv78= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412525; c=relaxed/simple; bh=lUsgo7B7TuvCI1PsvB7d/R6A6ox50ArQ1Zv2S8EHM3w=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=AtjrpQ9XFc9gZIdily4VuNtfcUYtXRFD8y/dEPs3raDQUOHF5MCFIYK+mWmkr1PihJ3Zz99Mldb62kycNOU16NR49LcUzSgEiMGITrSeG6qoniCeDo5lGnuQqbTsN78Nc3hQO1ipk8qt5CvNhX8ERc74C5Ft0+RohV3zd6+h4No= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=CjZipc81; arc=none smtp.client-ip=115.124.30.98 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="CjZipc81" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1708412515; h=From:To:Subject:Date:Message-Id; bh=9lQeQhf1fFiHWvZuBmGc279ZVtyAMjo0RgNUKteShYc=; 
b=CjZipc81Zaf6OLyk8KVBBrIdVEDtirOi8+ZA5hDtDi1k0lHHggRU2nLHKCDEcCqyKjfYbawDG9z2x9z2SjJ/zoHxlc8gW7GuNGnaWJmanAkIRMkuu6W7Jg9lUMSYRY5AwRdj30jOwqeFbInO0pcRCFpdWpic1beuyOdNY0jZnGI= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R211e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046050;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W0vuXeh_1708412514; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W0vuXeh_1708412514) by smtp.aliyun-inc.com; Tue, 20 Feb 2024 15:01:54 +0800 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 08/20] net/smc: optimize mutex_fback_rsn from mutex to spinlock Date: Tue, 20 Feb 2024 15:01:33 +0800 Message-Id: <1708412505-34470-9-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" The region protected by mutex_fback_rsn is simple enough and has no potential blocking points. This change allows smc_stat_fallback() to be invoked in any context, typically in IRQ context. Signed-off-by: D. 
Wythe --- include/net/netns/smc.h | 2 +- net/smc/af_smc.c | 4 ++-- net/smc/smc_stats.c | 6 +++--- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/include/net/netns/smc.h b/include/net/netns/smc.h index fc752a5..99bde74 100644 --- a/include/net/netns/smc.h +++ b/include/net/netns/smc.h @@ -10,7 +10,7 @@ struct netns_smc { /* per cpu counters for SMC */ struct smc_stats __percpu *smc_stats; /* protect fback_rsn */ - struct mutex mutex_fback_rsn; + spinlock_t mutex_fback_rsn; struct smc_stats_rsn *fback_rsn; bool limit_smc_hs; /* constraint on handshake */ diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 66306b7..1381ac1 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -769,7 +769,7 @@ static void smc_stat_fallback(struct smc_sock *smc) { struct net *net = sock_net(&smc->sk); - mutex_lock(&net->smc.mutex_fback_rsn); + spin_lock_bh(&net->smc.mutex_fback_rsn); if (smc->listen_smc) { smc_stat_inc_fback_rsn_cnt(smc, net->smc.fback_rsn->srv); net->smc.fback_rsn->srv_fback_cnt++; @@ -777,7 +777,7 @@ static void smc_stat_fallback(struct smc_sock *smc) smc_stat_inc_fback_rsn_cnt(smc, net->smc.fback_rsn->clnt); net->smc.fback_rsn->clnt_fback_cnt++; } - mutex_unlock(&net->smc.mutex_fback_rsn); + spin_unlock_bh(&net->smc.mutex_fback_rsn); } /* must be called under rcu read lock */ diff --git a/net/smc/smc_stats.c b/net/smc/smc_stats.c index ca14c0f..64668e9 100644 --- a/net/smc/smc_stats.c +++ b/net/smc/smc_stats.c @@ -26,7 +26,7 @@ int smc_stats_init(struct net *net) net->smc.smc_stats = alloc_percpu(struct smc_stats); if (!net->smc.smc_stats) goto err_stats; - mutex_init(&net->smc.mutex_fback_rsn); + spin_lock_init(&net->smc.mutex_fback_rsn); return 0; err_stats: @@ -387,7 +387,7 @@ int smc_nl_get_fback_stats(struct sk_buff *skb, struct netlink_callback *cb) int snum = cb_ctx->pos[0]; bool is_srv = true; - mutex_lock(&net->smc.mutex_fback_rsn); + spin_lock_bh(&net->smc.mutex_fback_rsn); for (k = 0; k < SMC_MAX_FBACK_RSN_CNT; k++) { if (k < snum) 
continue; @@ -406,7 +406,7 @@ int smc_nl_get_fback_stats(struct sk_buff *skb, struct netlink_callback *cb) if (rc_clnt == -ENODATA && rc_srv == -ENODATA) break; } - mutex_unlock(&net->smc.mutex_fback_rsn); + spin_unlock_bh(&net->smc.mutex_fback_rsn); cb_ctx->pos[1] = skip_serv; cb_ctx->pos[0] = k; return skb->len; From patchwork Tue Feb 20 07:01:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. Wythe" X-Patchwork-Id: 13563479 Received: from out30-131.freemail.mail.aliyun.com (out30-131.freemail.mail.aliyun.com [115.124.30.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6441F5A4D3; Tue, 20 Feb 2024 07:01:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412520; cv=none; b=FqzC6Ih7n6n/t6wmXqGyal2dP7o7RM1uYAE/t8jKG4Ji/EH5Ht+qzkMuboGBg0lYKjtvx781nHRNRrKxAXEPGcPR51xqscGl7yKLw9M89AJf9kEPrYOpsXAKI3u+DV0x/PLCZYP9uI3huhVRkZH+mhUeLJhXBOKbYnth48aFa1w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412520; c=relaxed/simple; bh=3KZsEV1WJUZcqZISvCXQp8Hcw+dNN1oGioEr2lvo0AQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=PXLfXBE4qCWA1fHyjpAG907SeLKnFmVZG0hykA90MPwtRobUXuejwCMBVDV/FPinkgnoIXRnNeD0MCSN7BN+Dk/dmyARIm+KIWDyDFNiJY2QnAhrcfc5Uq6VCDiMaGgR0rzcmYhsqZLU0gw2yQvxF9XiveaNZ62WimMRH4x6E7c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=sH6MRfLT; arc=none smtp.client-ip=115.124.30.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com 
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="sH6MRfLT" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1708412516; h=From:To:Subject:Date:Message-Id; bh=3USlzh593tPYna5RCKRSdt/zJLdUUoPtKz7as3FKsT8=; b=sH6MRfLTAVgVa1eqVq/Cx1ZaNt8RadDEO61NIbb5tBrHLP//Smq2ay43/4ZL2/3orfbbnm/xh/xSZ1kFPBpPsdgRyzU3n9iABRiE2FoOJ67fwUjfSqOP9sqQoJ1/2sQKwDmU9vqVp02PSXV2KPH5mc/zmop2FW/+RtwGF9HV+EQ= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R191e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045170;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W0vuXez_1708412515; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W0vuXez_1708412515) by smtp.aliyun-inc.com; Tue, 20 Feb 2024 15:01:55 +0800 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 09/20] net/smc: refator smc_switch_to_fallback Date: Tue, 20 Feb 2024 15:01:34 +0800 Message-Id: <1708412505-34470-10-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. 
Wythe" Move the code that does not need the protection of clcsock_release_lock ahead of the lock. On the one hand, this reduces the size of the critical section; on the other hand, for the inet version of SMC, the code protected by the critical section is meaningless. This patch makes it possible to invoke smc_switch_to_fallback() in any context (IRQ, etc.) within the inet sock version of SMC. Signed-off-by: D. Wythe --- net/smc/af_smc.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 1381ac1..20abdda 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -909,16 +909,18 @@ static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code) { int rc = 0; + /* no need protected by clcsock_release_lock, move head */ + smc->use_fallback = true; + smc->fallback_rsn = reason_code; + smc_stat_fallback(smc); + trace_smc_switch_to_fallback(smc, reason_code); + mutex_lock(&smc->clcsock_release_lock); if (!smc->clcsock) { rc = -EBADF; goto out; } - smc->use_fallback = true; - smc->fallback_rsn = reason_code; - smc_stat_fallback(smc); - trace_smc_switch_to_fallback(smc, reason_code); if (smc->sk.sk_socket && smc->sk.sk_socket->file) { smc->clcsock->file = smc->sk.sk_socket->file; smc->clcsock->file->private_data = smc->clcsock; From patchwork Tue Feb 20 07:01:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. 
Wythe" X-Patchwork-Id: 13563483 Received: from out30-101.freemail.mail.aliyun.com (out30-101.freemail.mail.aliyun.com [115.124.30.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E71A65B5CD; Tue, 20 Feb 2024 07:02:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.101 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412527; cv=none; b=uo3tTeOjr1z80ayoJXMMIuR1+vYqf+YDeseA7mUlZKHhVj85ShGrnFQ8/QgZ+vTQRLsJRIJ6JRWb3TwuY7sPfUWNqrdcHP/CXQVbT0kMX+evho7I8UBLz08ngwj9d+E1xRoJyU/QMl82Y3ITyg9X8JwTpTrqB5kA/gJGt/IP/PU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412527; c=relaxed/simple; bh=1oITY5bmGcT4RsZmcB213lVYZ7JyMbqVj5Ni7/GeoqM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=OTcKQ9XRTCdo2fWv1GzBijUMEBXbMLvLbcmITnMi8vhletOL2OCI8Q8XsF5L83DGHJFHvK19QvjfFLiLSNCAe/civy5cOuzqkaqgPvqcByVnchwF+Lqg/paFflQyEaHSdfYV6vdPmdMCIvXpEwM8To+aV8E1nc7lgsVsWP6lUUM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=Bwk2sb7i; arc=none smtp.client-ip=115.124.30.101 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="Bwk2sb7i" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1708412517; h=From:To:Subject:Date:Message-Id; bh=L4lhr6Rrf/wRN1/5nDyGfaaPTD10iK9aCrz3lMKLFJ0=; 
b=Bwk2sb7iYokAqz5YS5neMN/Bf16OTHcYQcqm018TRzVQYXDPD9DKEMQmWKl34YGlJUU6ihVzg3GgUvEh60IrSaARXtsBLdYCSxn+ctM/qMH2LU7KydhhNC3W3Xc+YfZ9QWGNa4M1+GUMQY+QZp+hCAZmCdZvg+tyF3wh6EsUIn0= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R101e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045176;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W0vuXfK_1708412515; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W0vuXfK_1708412515) by smtp.aliyun-inc.com; Tue, 20 Feb 2024 15:01:56 +0800 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 10/20] net/smc: make initialization code in smc_listen independent Date: Tue, 20 Feb 2024 15:01:35 +0800 Message-Id: <1708412505-34470-11-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" This patch makes the initialization code in smc_listen() independent by moving it into smc_init_listen(), which will be used in the inet version of SMC. Logically, this patch has no side effects; it only refactors smc_listen(). Signed-off-by: D. 
Wythe --- net/smc/af_smc.c | 49 +++++++++++++++++++++++++++++-------------------- 1 file changed, 29 insertions(+), 20 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 20abdda..484e981 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -2610,6 +2610,34 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock) read_unlock_bh(&listen_clcsock->sk_callback_lock); } +static inline void smc_init_listen(struct smc_sock *smc) +{ + struct sock *clcsk; + + clcsk = smc_sock_is_inet_sock(&smc->sk) ? &smc->sk : smc->clcsock->sk; + + /* save original sk_data_ready function and establish + * smc-specific sk_data_ready function + */ + write_lock_bh(&clcsk->sk_callback_lock); + clcsk->sk_user_data = + (void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY); + smc_clcsock_replace_cb(&clcsk->sk_data_ready, + smc_clcsock_data_ready, &smc->clcsk_data_ready); + write_unlock_bh(&clcsk->sk_callback_lock); + + /* save original ops */ + smc->ori_af_ops = inet_csk(clcsk)->icsk_af_ops; + + smc->af_ops = *smc->ori_af_ops; + smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock; + + inet_csk(clcsk)->icsk_af_ops = &smc->af_ops; + + if (smc->limit_smc_hs) + tcp_sk(clcsk)->smc_hs_congested = smc_hs_congested; +} + static int smc_listen(struct socket *sock, int backlog) { struct sock *sk = sock->sk; @@ -2636,26 +2664,7 @@ static int smc_listen(struct socket *sock, int backlog) if (!smc->use_fallback) tcp_sk(smc->clcsock->sk)->syn_smc = 1; - /* save original sk_data_ready function and establish - * smc-specific sk_data_ready function - */ - write_lock_bh(&smc->clcsock->sk->sk_callback_lock); - smc->clcsock->sk->sk_user_data = - (void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY); - smc_clcsock_replace_cb(&smc->clcsock->sk->sk_data_ready, - smc_clcsock_data_ready, &smc->clcsk_data_ready); - write_unlock_bh(&smc->clcsock->sk->sk_callback_lock); - - /* save original ops */ - smc->ori_af_ops = inet_csk(smc->clcsock->sk)->icsk_af_ops; - - smc->af_ops = *smc->ori_af_ops; - 
smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock; - - inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops; - - if (smc->limit_smc_hs) - tcp_sk(smc->clcsock->sk)->smc_hs_congested = smc_hs_congested; + smc_init_listen(smc); rc = kernel_listen(smc->clcsock, backlog); if (rc) { From patchwork Tue Feb 20 07:01:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. Wythe" X-Patchwork-Id: 13563484 Received: from out30-101.freemail.mail.aliyun.com (out30-101.freemail.mail.aliyun.com [115.124.30.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D8BAC5B673; Tue, 20 Feb 2024 07:02:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.101 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412528; cv=none; b=JMflbdXhAB49F87X9q7qrw5HMKy/GYsy5F08a4wNwG4+zyy36HJ6brgGgiKGJdE1AGrdpsPMEaWCMV6uzzpr+jIZDi4DgCm36jtZXiCeGBp1i8TpyaO7kBKH+VTYAGr4K7961TBqyYUqWRI7QYYsrnfjMHmeBfcJT9rT8YMlM9s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412528; c=relaxed/simple; bh=AR2vZ2Pk14pFkuCI5pWdkzga/sx98D1kPbpMRsNxMFM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=O6OvgfPYQCoTGzT7cmX0J8jvID7Q6UiYkCPrYSygaeUAAJadCTb1HmH/mjAS83c1UcfcVbeTv6jMUs4XCZSk9q6zsNfGR3zxAJmLdvnpS057PXOrYz0Nm5Obys52PD15OAG1v4moMg0O7y+v98MWOPduV5ueFiQgM3L6xPrZlCA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=jBdmAsHB; arc=none smtp.client-ip=115.124.30.101 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; 
spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="jBdmAsHB" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1708412518; h=From:To:Subject:Date:Message-Id; bh=winjUbtGrzKRyXsXM3mIb5ppf4C3A0xLqD6cwkkdcKY=; b=jBdmAsHBRzbTIxScnrr3aZGpbMoOzt9spz4+sB4eQjwqhMnwWB9dC9IN3fh9JjQP0feuXFgmu8FZcmxNjTHH/FdJx4PpBl8wOW+FFhZCb5JXqSQJjfN6CAm0tbNuE39Ar5Ffv/c3J1vAyVaGjDbUGEdh+2GHS7mlU6wDMhtYlX0= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R131e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046060;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W0vuXfh_1708412516; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W0vuXfh_1708412516) by smtp.aliyun-inc.com; Tue, 20 Feb 2024 15:01:56 +0800 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 11/20] net/smc: make __smc_accept can return the new accepted sock Date: Tue, 20 Feb 2024 15:01:36 +0800 Message-Id: <1708412505-34470-12-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" This patch refactors smc_accept(): the extracted __smc_accept() makes it possible to obtain the accepted sock when a NULL clcsock is passed in. 
That way, the inet version of SMC can access the accepted sock without providing a fake clcsock. Signed-off-by: D. Wythe --- net/smc/af_smc.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 484e981..e0505d6 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -2684,17 +2684,16 @@ static int smc_listen(struct socket *sock, int backlog) return rc; } -static int smc_accept(struct socket *sock, struct socket *new_sock, - int flags, bool kern) +static struct sock *__smc_accept(struct sock *sk, struct socket *new_sock, + int flags, int *err, bool kern) { - struct sock *sk = sock->sk, *nsk; DECLARE_WAITQUEUE(wait, current); + struct sock *nsk = NULL; struct smc_sock *lsmc; long timeo; int rc = 0; lsmc = smc_sk(sk); - sock_hold(sk); /* sock_put below */ lock_sock(sk); if (smc_sk_state(&lsmc->sk) != SMC_LISTEN) { @@ -2750,8 +2749,21 @@ static int smc_accept(struct socket *sock, struct socket *new_sock, } out: - sock_put(sk); /* sock_hold above */ - return rc; + *err = rc; + return nsk; +} + +static int smc_accept(struct socket *sock, struct socket *new_sock, + int flags, bool kern) +{ + struct sock *sk = sock->sk; + int error; + + sock_hold(sk); + __smc_accept(sk, new_sock, flags, &error, kern); + sock_put(sk); + + return error; } static int smc_getname(struct socket *sock, struct sockaddr *addr, From patchwork Tue Feb 20 07:01:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. 
Wythe" X-Patchwork-Id: 13563480 Received: from out30-100.freemail.mail.aliyun.com (out30-100.freemail.mail.aliyun.com [115.124.30.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 603415B5B3; Tue, 20 Feb 2024 07:02:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.100 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412524; cv=none; b=NadZT70T/jajERfqQHCTtIwNLrDKfRYawDeDort/eCqyZX+W6Rx+gqu+9aJzvZc3Fm+ROS6ZPgjQ861l+molf+sEwJ3S4Ms9bBbr+Dtg6M1a6LRl3rRZraY1T/Hfbmu6c7K1riY+dR628mNcshPY6IMnbugy9gAZTcmnUk4XD8Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412524; c=relaxed/simple; bh=uynprvzz2AbAKrOJMac5dPZJ0gH2wTRRdmgU8YlBESE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=OruoMZinirEg37jIirERQdvY7Dn4jmdxP9EQfgaoNm2WxTAdqJspRk0V3QrmxHWsemZ+zhu9TJxONnXDchYrsi3wD5y1fZdEhbIOjvF+nxofr3nvjG9Me5HT5qWiVNwrStnqOHvGEtmkIT6wjO3sXtDVZ8LiY/XaqC1pTWAxAS4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=bYaTxERB; arc=none smtp.client-ip=115.124.30.100 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="bYaTxERB" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1708412519; h=From:To:Subject:Date:Message-Id; bh=sNwHuATEVbDgcOGggJ4aYilgn556khxa9K+HduugkGs=; 
b=bYaTxERBJQyS305dEjY09TeV9/aRz4AVTKlWD7UjrfhkhNxPTOfPAf/tgVYdNYC4WY3E2utZHAvnxu9llMbMYtqkh/QN7E6Y6rLB+4KBsm8yMYeyAridYxnpYSSgl0zVZNuH+yikBwz3qzZhCJYzyk0lQoR6r56iJ7iTw97/HC8= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R141e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046050;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W0vuXfw_1708412517; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W0vuXfw_1708412517) by smtp.aliyun-inc.com; Tue, 20 Feb 2024 15:01:57 +0800 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 12/20] net/smc: refatoring initialization of smc sock Date: Tue, 20 Feb 2024 15:01:37 +0800 Message-Id: <1708412505-34470-13-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" This patch extracts the common part of smc sock initialization: smc_sock_init() is used to initialize actively opened socks, and smc_sock_init_passive() to initialize passively opened socks. This is preparation for implementing the inet version of SMC. Signed-off-by: D. 
Wythe --- net/smc/af_smc.c | 58 +++++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 41 insertions(+), 17 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index e0505d6..97e3951 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -362,10 +362,48 @@ static void smc_destruct(struct sock *sk) return; } +static inline void smc_sock_init_common(struct sock *sk) +{ + struct smc_sock *smc = smc_sk(sk); + + smc_sk_set_state(sk, SMC_INIT); + INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work); + spin_lock_init(&smc->conn.send_lock); + mutex_init(&smc->clcsock_release_lock); +} + +static void smc_sock_init_passive(struct sock *par, struct sock *sk) +{ + struct smc_sock *parent = smc_sk(par); + struct sock *clcsk; + + smc_sock_init_common(sk); + smc_sk(sk)->listen_smc = parent; + + clcsk = smc_sock_is_inet_sock(sk) ? sk : smc_sk(sk)->clcsock->sk; + if (tcp_sk(clcsk)->syn_smc) + atomic_inc(&parent->queued_smc_hs); +} + +static void smc_sock_init(struct sock *sk, struct net *net) +{ + struct smc_sock *smc = smc_sk(sk); + + smc_sock_init_common(sk); + WRITE_ONCE(sk->sk_sndbuf, 2 * READ_ONCE(net->smc.sysctl_wmem)); + WRITE_ONCE(sk->sk_rcvbuf, 2* READ_ONCE(net->smc.sysctl_rmem)); + INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work); + INIT_WORK(&smc->connect_work, smc_connect_work); + INIT_LIST_HEAD(&smc->accept_q); + spin_lock_init(&smc->accept_q_lock); + smc_init_saved_callbacks(smc); + + sk->sk_destruct = smc_destruct; +} + static struct sock *smc_sock_alloc(struct net *net, struct socket *sock, int protocol) { - struct smc_sock *smc; struct proto *prot; struct sock *sk; @@ -375,21 +413,9 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock, return NULL; sock_init_data(sock, sk); /* sets sk_refcnt to 1 */ - smc_sk_set_state(sk, SMC_INIT); - sk->sk_destruct = smc_destruct; sk->sk_protocol = protocol; - WRITE_ONCE(sk->sk_sndbuf, 2 * READ_ONCE(net->smc.sysctl_wmem)); - WRITE_ONCE(sk->sk_rcvbuf, 2 * 
READ_ONCE(net->smc.sysctl_rmem)); - smc = smc_sk(sk); - INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work); - INIT_WORK(&smc->connect_work, smc_connect_work); - INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work); - INIT_LIST_HEAD(&smc->accept_q); - spin_lock_init(&smc->accept_q_lock); - spin_lock_init(&smc->conn.send_lock); + smc_sock_init(sk, net); sk->sk_prot->hash(sk); - mutex_init(&smc->clcsock_release_lock); - smc_init_saved_callbacks(smc); return sk; } @@ -2573,10 +2599,8 @@ static void smc_tcp_listen_work(struct work_struct *work) if (!new_smc) continue; - if (tcp_sk(new_smc->clcsock->sk)->syn_smc) - atomic_inc(&lsmc->queued_smc_hs); + smc_sock_init_passive(lsk, &new_smc->sk); - new_smc->listen_smc = lsmc; new_smc->use_fallback = lsmc->use_fallback; new_smc->fallback_rsn = lsmc->fallback_rsn; sock_hold(lsk); /* sock_put in smc_listen_work */ From patchwork Tue Feb 20 07:01:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. 
Wythe" X-Patchwork-Id: 13563486 Received: from out30-98.freemail.mail.aliyun.com (out30-98.freemail.mail.aliyun.com [115.124.30.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A7BD85A4D3; Tue, 20 Feb 2024 07:02:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.98 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412528; cv=none; b=AO4kL5vd0wqOeUIMqou8CANcoEFpk8ONd0Hug/mhRvEwCYcinX6B2V83cCPUMZuaSX6cJYe59wJTviQ/pk1Xvlf9nfQHYX/rNqvJ+OmIb0lmLcQRY2VS2dUWFjnnUvLd2DPiWK+1NNfeVM0qbcZ5HOA8Xf/DcR3l9hA8RqMj2nQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412528; c=relaxed/simple; bh=Uen6/3QA/kH5/krcRm82SDHTVPfbkLOOICuztBWUjv8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=BQkbu7LWHZj+mCeTLJU9+D6r0LDW0IhMO1TIfdiZhjB5lplaiyozZjBC9rB2EPM4NM+U4s0QEAN0L/aGM1KP9Z6p8uG9BJAt/xMRCLdW4V4VXYmIptNn+7J6sLr76hPhw6i/qoQgDuSupdwmom6M5D6QT7jxwGPB2dOAHdryyUw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=wJzZuHVc; arc=none smtp.client-ip=115.124.30.98 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="wJzZuHVc" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1708412518; h=From:To:Subject:Date:Message-Id; bh=pHoJolDSC2elBPjYqT6pbVM380kpEmMy7XkwC6XWJxQ=; 
b=wJzZuHVcvD1keusWFLwuLC+8GltiQP1/kcDljBHRgDWY6y6XvzsOE6JR9OXoxt9IgTlQSC+3u6w4+vrdsmQdWpSDEL4+GQU6ZFB650/OwKFSbI02Wlthxlzsr1ClJGVohZba/xlXs8Yd6atVYXCjAfcVj5h+tWvBNXFRMYmcMsQ= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R111e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046056;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W0vuXgI_1708412518; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W0vuXgI_1708412518) by smtp.aliyun-inc.com; Tue, 20 Feb 2024 15:01:58 +0800 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 13/20] net/smc: embedded tcp sock into smc sock Date: Tue, 20 Feb 2024 15:01:38 +0800 Message-Id: <1708412505-34470-14-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" For the inet version of SMC, one of the key goals is that an smc sock that has fallen back can be recognized as a tcp sock by net tools. It is therefore necessary to embed the tcp sock into the smc sock and make it the first member of the smc sock. Signed-off-by: D. 
Wythe --- net/smc/smc.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/net/smc/smc.h b/net/smc/smc.h index fce6a7a..932d61f 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -248,7 +248,11 @@ struct smc_connection { }; struct smc_sock { /* smc sock container */ - struct sock sk; + union { + struct tcp6_sock tp6sk; + struct tcp_sock tpsk; + struct sock sk; + }; struct socket *clcsock; /* internal tcp socket */ void (*clcsk_state_change)(struct sock *sk); /* original stat_change fct. */ @@ -388,6 +392,11 @@ static inline void smc_sock_set_flag(struct sock *sk, enum sock_flags flag) set_bit(flag, &sk->sk_flags); } +static __always_inline bool smc_sock_is_inet_sock(const struct sock *sk) +{ + return inet_test_bit(IS_ICSK, sk); +} + #define smc_sock_flag(sk, flag) sock_flag(sk, flag) #endif /* __SMC_H */ From patchwork Tue Feb 20 07:01:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. Wythe" X-Patchwork-Id: 13563492 Received: from out199-1.us.a.mail.aliyun.com (out199-1.us.a.mail.aliyun.com [47.90.199.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C26EC5A11A; Tue, 20 Feb 2024 07:02:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=47.90.199.1 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412539; cv=none; b=RZ63M9WDfVFtxUBlsHQq81Bf61krAsr5QkR7h0TEX9YBHjlQ031wm+nO6tINOk9bUQs39Q670KPQXtSMNB6DeMuIMbM6yuOzzlSeWk4Rr9/bRh61/0wgAJwkvLJhafaV3lo0Z8EsK4KPtO92ik3Fww3Y6OFO7BaABZ0YRtoI7sk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412539; c=relaxed/simple; bh=KHH6nqnlO3XtD52ILwZrK3g1QIawp8D8BY/Ro4dbtak=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; 
b=p8aSBpvfxqIWt5/QXYEk9RmXO3SyLnNed0ScLy5JmUj/fdiDLDg+UHFq3/B4AXSsIEVx/Ng6gr+3Tl5Bl6jwhJITD6M6A5T1Fj/mnAso1HwvG//SC9m2DG1yvlkBhBF3pULyzsVEZIrQjgTcgZnbxttd5T16ufItwRd7fIVwMTs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=XWPm01l4; arc=none smtp.client-ip=47.90.199.1 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="XWPm01l4" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1708412519; h=From:To:Subject:Date:Message-Id; bh=H78lH3tbMyek+0jhdG1eR22ohdujG/bHb3E38Wi9Cac=; b=XWPm01l4aLFFCwB+Vg4bOlijxRSMtRYHH1WstCdaH/A6bNSJBf5jVjAx9QBlQwl5Z8grg7ABjYAY+d7wLopgoKPw28JY91xkdR/FCQea6fj3VTGqS0fpxVIyb69cCAUeMeQOnlWlm2RH6v7fxvMQqGgjU6eDPgW8aPTH93R17+M= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R211e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045176;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W0vuXgV_1708412518; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W0vuXgV_1708412518) by smtp.aliyun-inc.com; Tue, 20 Feb 2024 15:01:59 +0800 From: "D. 
Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 14/20] net/smc: allow to access the state of inet smc sock Date: Tue, 20 Feb 2024 15:01:39 +0800 Message-Id: <1708412505-34470-15-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" In the inet version of smc, smc_sock and tcp_sock coexist, so the sk_state field would be accessed and modified by both protocols, which can cause obvious exceptions. Therefore, this patch modifies the helpers for reading and setting the smc state, using the icsk bit (IS_ICSK) to determine which field needs to be accessed or changed. Signed-off-by: D. Wythe --- net/smc/smc.h | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/net/smc/smc.h b/net/smc/smc.h index 932d61f..e54a30c 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -38,9 +38,6 @@ #define KERNEL_HAS_ATOMIC64 #endif -#define smc_sk_state(sk) ((sk)->sk_state) -#define smc_sk_set_state(sk, state) (smc_sk_state(sk) = (state)) - enum smc_state { /* possible states of an SMC socket */ SMC_ACTIVE = 1, SMC_INIT = 2, @@ -254,6 +251,7 @@ struct smc_sock { /* smc sock container */ struct sock sk; }; struct socket *clcsock; /* internal tcp socket */ + unsigned char smc_state; /* smc state used in smc via inet_sk */ void (*clcsk_state_change)(struct sock *sk); /* original stat_change fct. 
*/ void (*clcsk_data_ready)(struct sock *sk); @@ -397,6 +395,20 @@ static __always_inline bool smc_sock_is_inet_sock(const struct sock *sk) return inet_test_bit(IS_ICSK, sk); } +#define smc_sk_state(sk) ({ \ + struct sock *__sk = (sk); \ + smc_sock_is_inet_sock(__sk) ? \ + smc_sk(__sk)->smc_state : (__sk)->sk_state; \ +}) + +static __always_inline void smc_sk_set_state(struct sock *sk, unsigned char state) +{ + if (smc_sock_is_inet_sock(sk)) + smc_sk(sk)->smc_state = state; + else + sk->sk_state = state; +} + #define smc_sock_flag(sk, flag) sock_flag(sk, flag) #endif /* __SMC_H */ From patchwork Tue Feb 20 07:01:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. Wythe" X-Patchwork-Id: 13563516 Received: from out30-132.freemail.mail.aliyun.com (out30-132.freemail.mail.aliyun.com [115.124.30.132]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2F7ED5A0EC; Tue, 20 Feb 2024 07:07:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.132 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412845; cv=none; b=lDPQNalKcfux3Nnsehk4iCCq0dF6ZpQG+SKTVo2TAF0jS0HqzTcNw49AEJQPN1Fr/ey8FGbqAvYnq8y+Du5VZizaALTJ1xCx4cNGAzA/ButuOAr1PnocNNRQ66qm7jlyyz1bmQWMVQC0oADrdk5f3ReL1N4Nl3qB8SdkE9LnMp4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412845; c=relaxed/simple; bh=l28AfzidSu1pRTD2a6149c5GFZhgjtGOmiAePxjJGtU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=b3ygX31ctN8a/eE1HmKNNLvxdIWbIxoPZJoxJSwbxShPKIRcaz+JMkTr5vNdTJqx3uUlC3nvaBV3vBPLwxJeqC2AHj+nS/33I/wI9A12tekULLHzUobOUkmsfSMlLzqyBtXfyXLnGZYrwOk9GzJyCYLNvvj1ScL9GWN9424Atlw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass 
smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=c7qZXbm6; arc=none smtp.client-ip=115.124.30.132 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="c7qZXbm6" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1708412839; h=From:To:Subject:Date:Message-Id; bh=0gYY+oZKSPlQwfXwQJFl5/DHmqexYu6zfCNueOj8rac=; b=c7qZXbm6RngmxkP3CVZN8lmBK5B+s/Pt0qWCPWuQDoMW706zZS4EYCq/Kgjbhq4WfG5ggeYcq0VZYs15DNomPFNwKTRdXcoRCw4qWOkudZ1aoxnrVRmeEgnPR7HXD+ul3rayFUQIDgK57+9uaw8HnDU6bz0z2UoCUByoWn1xWEM= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R761e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046060;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W0vuXgk_1708412519; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W0vuXgk_1708412519) by smtp.aliyun-inc.com; Tue, 20 Feb 2024 15:01:59 +0800 From: "D. 
Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 15/20] net/smc: enable access of sock flags of inet smc sock Date: Tue, 20 Feb 2024 15:01:40 +0800 Message-Id: <1708412505-34470-16-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" Since smc_sock and tcp_sock coexist in the inet version of smc, the sock flags field is shared between tcp and smc. As with sk_state, we need an extra sock flags field for smc in the inet version, and we use the icsk bit (IS_ICSK) to determine which field needs to be accessed or changed. Signed-off-by: D. Wythe --- net/smc/smc.h | 34 +++++++++++++++++++++++++++++----- 1 file changed, 29 insertions(+), 5 deletions(-) diff --git a/net/smc/smc.h b/net/smc/smc.h index e54a30c..1675193 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -252,6 +252,7 @@ struct smc_sock { /* smc sock container */ }; struct socket *clcsock; /* internal tcp socket */ unsigned char smc_state; /* smc state used in smc via inet_sk */ + unsigned long smc_sk_flags; /* smc sock flags used for inet sock */ void (*clcsk_state_change)(struct sock *sk); /* original stat_change fct. 
*/ void (*clcsk_data_ready)(struct sock *sk); @@ -385,10 +386,6 @@ void smc_fill_gid_list(struct smc_link_group *lgr, int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct genl_info *info); int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct genl_info *info); -static inline void smc_sock_set_flag(struct sock *sk, enum sock_flags flag) -{ - set_bit(flag, &sk->sk_flags); -} static __always_inline bool smc_sock_is_inet_sock(const struct sock *sk) { @@ -409,6 +406,33 @@ static __always_inline void smc_sk_set_state(struct sock *sk, unsigned char stat sk->sk_state = state; } -#define smc_sock_flag(sk, flag) sock_flag(sk, flag) +static __always_inline bool smc_sock_flag(const struct sock *sk, enum sock_flags flag) +{ + if (smc_sock_is_inet_sock(sk)) { + switch (flag) { + case SOCK_DEAD: + case SOCK_DONE: + return test_bit(flag, &smc_sk(sk)->smc_sk_flags); + default: + break; + } + } + return sock_flag(sk, flag); +} + +static __always_inline void smc_sock_set_flag(struct sock *sk, enum sock_flags flag) +{ + if (smc_sock_is_inet_sock(sk)) { + switch (flag) { + case SOCK_DEAD: + case SOCK_DONE: + __set_bit(flag, &smc_sk(sk)->smc_sk_flags); + return; + default: + break; + } + } + set_bit(flag, &sk->sk_flags); +} #endif /* __SMC_H */ From patchwork Tue Feb 20 07:01:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. 
Wythe" X-Patchwork-Id: 13563490 Received: from out30-101.freemail.mail.aliyun.com (out30-101.freemail.mail.aliyun.com [115.124.30.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D7A575BAE4; Tue, 20 Feb 2024 07:02:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.101 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412531; cv=none; b=CbbyQFO0H2sFU7MgW5VsQ5RAcnqTLWiSHvnGoglgLbz3YLsCIV+AiClhm/d9lDwRrj1kbDgMlwlRentAUYt55ay1YqMea6OG6y7/3ShraEcJQa4+sas0Ko9asFSJVBwdA3QCAnaO2rMS4vPZQdpCQb7T1uLL72DymVUCLfoj/1A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412531; c=relaxed/simple; bh=B0jplbyeTnyGgXjKs16mHemTW9xr+/sOdoE/vhvo+Yk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=hAJIYIwSdcOgU6bvfEoAlEmR11RQ1fwS/k/OtqXH6rjbOs3izr2ao9FpFihB1trQgrL0Lq/rv/NV1Uj+JgJRj8O1V7JU6Tq90ow4ZMhzlUeJowdAaJr7In6RG2O89+L4F67r8eSDs46DhQMPLE+nRPpJGEUinvYxR/pMO9syDM8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=UlI9b52O; arc=none smtp.client-ip=115.124.30.101 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="UlI9b52O" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1708412520; h=From:To:Subject:Date:Message-Id; bh=0pvz0ikn7fOS4oqX6wWIzYrHZ9s8iUGQ3pRJhSRJWas=; 
b=UlI9b52ONfSlK3wUFQR53G16Jvl8kQGfEMePcSX0N+h3ZDeparH673PsOVgGxKo8TQMPfu0l9j6HpV+Wqdwok6BoDJoKOwkGXGgBM7yl6V2y1EeCEO6H6QHx/7mqQefccHbkhvrYU2Dd+SqFFgpmJVEsNqr5Tt2RZ/gVQpWTMBw= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R911e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046049;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W0vuXh3_1708412519; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W0vuXh3_1708412519) by smtp.aliyun-inc.com; Tue, 20 Feb 2024 15:02:00 +0800 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 16/20] net/smc: add inet proto definition for SMC Date: Tue, 20 Feb 2024 15:01:41 +0800 Message-Id: <1708412505-34470-17-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" To implement SMC on top of an INET sock, we need to be able to identify its real sock type, so we need a unique IPPROTO_XXX definition. Unlike IPPROTO_TCP and similar definitions, whose values are filled into the IP header and transmitted on the network, this value only needs to be unique in the code. That is, IPPROTO_SMC does not exist on the network; it is only used to distinguish the actual inet sock type in code, and it is still IPPROTO_TCP that is transmitted on the network. In theory, we just need to define IPPROTO_SMC as a value greater than 255 that is unique in the code. 
In this patch, we pick 263, following IPPROTO_MPTCP. Signed-off-by: D. Wythe --- include/uapi/linux/in.h | 2 ++ tools/include/uapi/linux/in.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h index e682ab6..7f4b449 100644 --- a/include/uapi/linux/in.h +++ b/include/uapi/linux/in.h @@ -83,6 +83,8 @@ enum { #define IPPROTO_RAW IPPROTO_RAW IPPROTO_MPTCP = 262, /* Multipath TCP connection */ #define IPPROTO_MPTCP IPPROTO_MPTCP + IPPROTO_SMC = 263, /* Shared Memory Communications */ +#define IPPROTO_SMC IPPROTO_SMC IPPROTO_MAX }; #endif diff --git a/tools/include/uapi/linux/in.h b/tools/include/uapi/linux/in.h index e682ab6..7f4b449 100644 --- a/tools/include/uapi/linux/in.h +++ b/tools/include/uapi/linux/in.h @@ -83,6 +83,8 @@ enum { #define IPPROTO_RAW IPPROTO_RAW IPPROTO_MPTCP = 262, /* Multipath TCP connection */ #define IPPROTO_MPTCP IPPROTO_MPTCP + IPPROTO_SMC = 263, /* Shared Memory Communications */ +#define IPPROTO_SMC IPPROTO_SMC IPPROTO_MAX }; #endif From patchwork Tue Feb 20 07:01:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. 
Wythe" X-Patchwork-Id: 13563487 Received: from out30-124.freemail.mail.aliyun.com (out30-124.freemail.mail.aliyun.com [115.124.30.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0D8F85B67D; Tue, 20 Feb 2024 07:02:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412528; cv=none; b=WWYnHnDIavZQY8dcfWMbVW4dxm9UhleI+FoNQ1LqQ7P1bhNmhs1V9Vk3RP2HT8k4VeREiRJMD3UiVAgCiSM81N/lJHSF8GYSzBhG4rjejXoGAci9D0jwrQLYaXhaL0doyGTxRS2Yu5RtGTkydNVXAyXQhkOm9b3MrbZJ+A21tk0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708412528; c=relaxed/simple; bh=SQNvMalijd45gfFGASD1aB5+WZz0gV238QYfgewPJO0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=POIcWdrBSrLT+tHtKCKUBsePUshsJ80VKhmk4WjALEO7NNH8A5qUY6pYKHwJMCvyRNNEG9iQvvoSuQ/FAHrKpGEPBNZmnu74mLEx9YkA0cZB45IMIFpbonnECcCLcwFBCHu0O/22+ddwJ+H7NK74NTbMrPDTTL2nsD9BGlQmUTg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=KK5umUty; arc=none smtp.client-ip=115.124.30.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="KK5umUty" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1708412521; h=From:To:Subject:Date:Message-Id; bh=oxlCRX75DTvhbMxdANvNarM4aptbjgB/GONB9m6f8rA=; 
b=KK5umUty/xbbuZA+mU0OmgHgsQFXWm87ufKiqKv3An589Cae9znHUR0TLe3/x1sAzm5LchRxCLN/P+n8yJS/PO8i5KkWouMrT7Scch4GH5a3MZ/xzqtiGysCB8cUaiEilaMo9Y9Pk/LNOOMNaOL/US3GFhGI+1NWDEbcOiCoCPU= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R111e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046050;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W0vuXhK_1708412520; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W0vuXhK_1708412520) by smtp.aliyun-inc.com; Tue, 20 Feb 2024 15:02:01 +0800 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 17/20] net/smc: add dummy implementation for inet smc sock Date: Tue, 20 Feb 2024 15:01:42 +0800 Message-Id: <1708412505-34470-18-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" This patch implements a dummy version of the inet smc sock and registers it with the inet protocols, which allows us to create an inet smc sock. Note that the ops are forked from the tcp ops. The vast majority of fields are consistent with TCP; those that cannot be consistent are mainly: 1. obj_size 2. twsk_prot and rsk_prot 3. functions that need to be overridden, which are explicitly set to NULL. Signed-off-by: D. 
Wythe --- net/smc/Makefile | 1 + net/smc/af_smc.c | 46 +++++++- net/smc/smc_inet.c | 315 +++++++++++++++++++++++++++++++++++++++++++++++++++++ net/smc/smc_inet.h | 86 +++++++++++++++ 4 files changed, 447 insertions(+), 1 deletion(-) create mode 100644 net/smc/smc_inet.c create mode 100644 net/smc/smc_inet.h diff --git a/net/smc/Makefile b/net/smc/Makefile index 875efcd..4f10c3b 100644 --- a/net/smc/Makefile +++ b/net/smc/Makefile @@ -5,4 +5,5 @@ obj-$(CONFIG_SMC_DIAG) += smc_diag.o smc-y := af_smc.o smc_pnet.o smc_ib.o smc_clc.o smc_core.o smc_wr.o smc_llc.o smc-y += smc_cdc.o smc_tx.o smc_rx.o smc_close.o smc_ism.o smc_netlink.o smc_stats.o smc-y += smc_tracepoint.o +smc-y += smc_inet.o smc-$(CONFIG_SYSCTL) += smc_sysctl.o diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 97e3951..390fe6c 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -36,6 +36,9 @@ #include #include +#include +#include +#include #include "smc_netns.h" #include "smc.h" @@ -53,6 +56,7 @@ #include "smc_stats.h" #include "smc_tracepoint.h" #include "smc_sysctl.h" +#include "smc_inet.h" static DEFINE_MUTEX(smc_server_lgr_pending); /* serialize link group * creation on server @@ -3658,9 +3662,36 @@ static int __init smc_init(void) goto out_ib; } + /* init smc inet sock related proto and proto_ops */ + rc = smc_inet_sock_init(); + if (!rc) { + /* register smc inet proto */ + rc = proto_register(&smc_inet_prot, 1); + if (rc) { + pr_err("%s: proto_register smc_inet_prot fails with %d\n", __func__, rc); + goto out_ulp; + } + /* no return value */ + inet_register_protosw(&smc_inet_protosw); +#if IS_ENABLED(CONFIG_IPV6) + /* register smc inet6 proto */ + rc = proto_register(&smc_inet6_prot, 1); + if (rc) { + pr_err("%s: proto_register smc_inet6_prot fails with %d\n", __func__, rc); + goto out_proto_register; + } + /* no return value */ + inet6_register_protosw(&smc_inet6_protosw); +#endif + } + static_branch_enable(&tcp_have_smc); return 0; - +out_proto_register: + 
inet_unregister_protosw(&smc_inet_protosw); + proto_unregister(&smc_inet_prot); +out_ulp: + tcp_unregister_ulp(&smc_ulp_ops); out_ib: smc_ib_unregister_client(); out_sock: @@ -3695,6 +3726,10 @@ static int __init smc_init(void) static void __exit smc_exit(void) { static_branch_disable(&tcp_have_smc); + inet_unregister_protosw(&smc_inet_protosw); +#if IS_ENABLED(CONFIG_IPV6) + inet6_unregister_protosw(&smc_inet6_protosw); +#endif tcp_unregister_ulp(&smc_ulp_ops); sock_unregister(PF_SMC); smc_core_exit(); @@ -3705,6 +3740,10 @@ static void __exit smc_exit(void) destroy_workqueue(smc_hs_wq); proto_unregister(&smc_proto6); proto_unregister(&smc_proto); + proto_unregister(&smc_inet_prot); +#if IS_ENABLED(CONFIG_IPV6) + proto_unregister(&smc_inet6_prot); +#endif smc_pnet_exit(); smc_nl_exit(); smc_clc_exit(); @@ -3720,5 +3759,10 @@ static void __exit smc_exit(void) MODULE_DESCRIPTION("smc socket address family"); MODULE_LICENSE("GPL"); MODULE_ALIAS_NETPROTO(PF_SMC); +/* It seems that this macro has different + * understanding of enum type(IPPROTO_SMC or SOCK_STREAM) + */ +MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_INET, 263, 1); +MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_INET6, 263, 1); MODULE_ALIAS_TCP_ULP("smc"); MODULE_ALIAS_GENL_FAMILY(SMC_GENL_FAMILY_NAME); diff --git a/net/smc/smc_inet.c b/net/smc/smc_inet.c new file mode 100644 index 00000000..d35b567 --- /dev/null +++ b/net/smc/smc_inet.c @@ -0,0 +1,315 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Shared Memory Communications over RDMA (SMC-R) and RoCE + * + * AF_SMC protocol family socket handler keeping the AF_INET sock address type + * applies to SOCK_STREAM sockets only + * offers an alternative communication option for TCP-protocol sockets + * applicable with RoCE-cards only + * + * Initial restrictions: + * - support for alternate links postponed + * + * Copyright IBM Corp. 
2016, 2018 + * + */ + +#include +#include + +#include "smc_inet.h" +#include "smc.h" + +static struct timewait_sock_ops smc_timewait_sock_ops = { + .twsk_obj_size = sizeof(struct tcp_timewait_sock), + .twsk_unique = tcp_twsk_unique, + .twsk_destructor = tcp_twsk_destructor, +}; + +static struct timewait_sock_ops smc6_timewait_sock_ops = { + .twsk_obj_size = sizeof(struct tcp6_timewait_sock), + .twsk_unique = tcp_twsk_unique, + .twsk_destructor = tcp_twsk_destructor, +}; + +struct proto smc_inet_prot = { + .name = "SMC", + .owner = THIS_MODULE, + .close = tcp_close, + .pre_connect = NULL, + .connect = tcp_v4_connect, + .disconnect = tcp_disconnect, + .accept = smc_inet_csk_accept, + .ioctl = tcp_ioctl, + .init = smc_inet_init_sock, + .destroy = tcp_v4_destroy_sock, + .shutdown = tcp_shutdown, + .setsockopt = tcp_setsockopt, + .getsockopt = tcp_getsockopt, + .keepalive = tcp_set_keepalive, + .recvmsg = tcp_recvmsg, + .sendmsg = tcp_sendmsg, + .backlog_rcv = tcp_v4_do_rcv, + .release_cb = smc_inet_sock_proto_release_cb, + .hash = inet_hash, + .unhash = inet_unhash, + .get_port = inet_csk_get_port, + .enter_memory_pressure = tcp_enter_memory_pressure, + .per_cpu_fw_alloc = &tcp_memory_per_cpu_fw_alloc, + .leave_memory_pressure = tcp_leave_memory_pressure, + .stream_memory_free = tcp_stream_memory_free, + .sockets_allocated = &tcp_sockets_allocated, + .orphan_count = &tcp_orphan_count, + .memory_allocated = &tcp_memory_allocated, + .memory_pressure = &tcp_memory_pressure, + .sysctl_mem = sysctl_tcp_mem, + .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_tcp_wmem), + .sysctl_rmem_offset = offsetof(struct net, ipv4.sysctl_tcp_rmem), + .max_header = MAX_TCP_HEADER, + .obj_size = sizeof(struct smc_sock), + .slab_flags = SLAB_TYPESAFE_BY_RCU, + .twsk_prot = &smc_timewait_sock_ops, + /* tcp_conn_request will use tcp_request_sock_ops */ + .rsk_prot = NULL, + .h.hashinfo = &tcp_hashinfo, + .no_autobind = true, + .diag_destroy = tcp_abort, +}; 
+EXPORT_SYMBOL_GPL(smc_inet_prot); + +const struct proto_ops smc_inet_stream_ops = { + .family = PF_INET, + .owner = THIS_MODULE, + .release = smc_inet_release, + .bind = inet_bind, + .connect = smc_inet_connect, + .socketpair = sock_no_socketpair, + .accept = inet_accept, + .getname = inet_getname, + .poll = smc_inet_poll, + .ioctl = smc_inet_ioctl, + .gettstamp = sock_gettstamp, + .listen = smc_inet_listen, + .shutdown = smc_inet_shutdown, + .setsockopt = smc_inet_setsockopt, + .getsockopt = smc_inet_getsockopt, + .sendmsg = smc_inet_sendmsg, + .recvmsg = smc_inet_recvmsg, +#ifdef CONFIG_MMU + .mmap = tcp_mmap, +#endif + .splice_read = smc_inet_splice_read, + .read_sock = tcp_read_sock, + .sendmsg_locked = tcp_sendmsg_locked, + .peek_len = tcp_peek_len, +#ifdef CONFIG_COMPAT + .compat_ioctl = inet_compat_ioctl, +#endif + .set_rcvlowat = tcp_set_rcvlowat, +}; + +struct inet_protosw smc_inet_protosw = { + .type = SOCK_STREAM, + .protocol = IPPROTO_SMC, + .prot = &smc_inet_prot, + .ops = &smc_inet_stream_ops, + .flags = INET_PROTOSW_ICSK, +}; + +#if IS_ENABLED(CONFIG_IPV6) +struct proto smc_inet6_prot = { + .name = "SMCv6", + .owner = THIS_MODULE, + .close = tcp_close, + .pre_connect = NULL, + .connect = NULL, + .disconnect = tcp_disconnect, + .accept = smc_inet_csk_accept, + .ioctl = tcp_ioctl, + .init = smc_inet_init_sock, + .destroy = NULL, + .shutdown = tcp_shutdown, + .setsockopt = tcp_setsockopt, + .getsockopt = tcp_getsockopt, + .keepalive = tcp_set_keepalive, + .recvmsg = tcp_recvmsg, + .sendmsg = tcp_sendmsg, + .backlog_rcv = NULL, + .release_cb = smc_inet_sock_proto_release_cb, + .hash = NULL, + .unhash = inet_unhash, + .get_port = inet_csk_get_port, + .enter_memory_pressure = tcp_enter_memory_pressure, + .per_cpu_fw_alloc = &tcp_memory_per_cpu_fw_alloc, + .leave_memory_pressure = tcp_leave_memory_pressure, + .stream_memory_free = tcp_stream_memory_free, + .sockets_allocated = &tcp_sockets_allocated, + .memory_allocated = &tcp_memory_allocated, + 
.memory_pressure = &tcp_memory_pressure, + .orphan_count = &tcp_orphan_count, + .sysctl_mem = sysctl_tcp_mem, + .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_tcp_wmem), + .sysctl_rmem_offset = offsetof(struct net, ipv4.sysctl_tcp_rmem), + .max_header = MAX_TCP_HEADER, + .obj_size = sizeof(struct smc_sock), + .ipv6_pinfo_offset = offsetof(struct tcp6_sock, inet6), + .slab_flags = SLAB_TYPESAFE_BY_RCU, + .twsk_prot = &smc6_timewait_sock_ops, + /* tcp_conn_request will use tcp_request_sock_ops */ + .rsk_prot = NULL, + .h.hashinfo = &tcp_hashinfo, + .no_autobind = true, + .diag_destroy = tcp_abort, +}; +EXPORT_SYMBOL_GPL(smc_inet6_prot); + +const struct proto_ops smc_inet6_stream_ops = { + .family = PF_INET6, + .owner = THIS_MODULE, + .release = smc_inet_release, + .bind = inet6_bind, + .connect = smc_inet_connect, /* ok */ + .socketpair = sock_no_socketpair, /* a do nothing */ + .accept = inet_accept, /* ok */ + .getname = inet6_getname, + .poll = smc_inet_poll, /* ok */ + .ioctl = smc_inet_ioctl, /* must change */ + .gettstamp = sock_gettstamp, + .listen = smc_inet_listen, /* ok */ + .shutdown = smc_inet_shutdown, /* ok */ + .setsockopt = smc_inet_setsockopt, /* ok */ + .getsockopt = smc_inet_getsockopt, /* ok */ + .sendmsg = smc_inet_sendmsg, /* retpoline's sake */ + .recvmsg = smc_inet_recvmsg, /* retpoline's sake */ +#ifdef CONFIG_MMU + .mmap = tcp_mmap, +#endif + .sendmsg_locked = tcp_sendmsg_locked, + .splice_read = smc_inet_splice_read, + .read_sock = tcp_read_sock, + .peek_len = tcp_peek_len, +#ifdef CONFIG_COMPAT + .compat_ioctl = inet6_compat_ioctl, +#endif + .set_rcvlowat = tcp_set_rcvlowat, +}; + +struct inet_protosw smc_inet6_protosw = { + .type = SOCK_STREAM, + .protocol = IPPROTO_SMC, + .prot = &smc_inet6_prot, + .ops = &smc_inet6_stream_ops, + .flags = INET_PROTOSW_ICSK, +}; +#endif + +int smc_inet_sock_init(void) +{ + struct proto *tcp_v4prot; +#if IS_ENABLED(CONFIG_IPV6) + struct proto *tcp_v6prot; +#endif + + tcp_v4prot = 
smc_inet_get_tcp_prot(PF_INET); + if (unlikely(!tcp_v4prot)) + return -EINVAL; + +#if IS_ENABLED(CONFIG_IPV6) + tcp_v6prot = smc_inet_get_tcp_prot(PF_INET6); + if (unlikely(!tcp_v6prot)) + return -EINVAL; +#endif + + /* INET sock has an issue here: twsk holds a reference to this module, + * so the SMC module may not be removable right after a test program ends. + * Eventually, twsk releases its reference to the module. + * This may affect some old test cases if they try to remove the module immediately after + * completing their test. + */ + + /* Complete the prot and proto_ops to + * ensure consistency with TCP. Some of the symbols needed here are not exported, + * so we have to assign them here at runtime. + */ + smc_inet_prot.pre_connect = tcp_v4prot->pre_connect; + +#if IS_ENABLED(CONFIG_IPV6) + smc_inet6_prot.pre_connect = tcp_v6prot->pre_connect; + smc_inet6_prot.connect = tcp_v6prot->connect; + smc_inet6_prot.destroy = tcp_v6prot->destroy; + smc_inet6_prot.backlog_rcv = tcp_v6prot->backlog_rcv; + smc_inet6_prot.hash = tcp_v6prot->hash; +#endif + return 0; +} + +int smc_inet_init_sock(struct sock *sk) { return 0; } + +void smc_inet_sock_proto_release_cb(struct sock *sk) {} + +int smc_inet_connect(struct socket *sock, struct sockaddr *addr, + int alen, int flags) +{ + return -EOPNOTSUPP; +} + +int smc_inet_setsockopt(struct socket *sock, int level, int optname, + sockptr_t optval, unsigned int optlen) +{ + return -EOPNOTSUPP; +} + +int smc_inet_getsockopt(struct socket *sock, int level, int optname, + char __user *optval, int __user *optlen) +{ + return -EOPNOTSUPP; +} + +int smc_inet_ioctl(struct socket *sock, unsigned int cmd, + unsigned long arg) +{ + return -EOPNOTSUPP; +} + +int smc_inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) +{ + return -EOPNOTSUPP; +} + +int smc_inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, + int flags) +{ + return -EOPNOTSUPP; +} + +ssize_t smc_inet_splice_read(struct
socket *sock, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags) +{ + return -EOPNOTSUPP; +} + +__poll_t smc_inet_poll(struct file *file, struct socket *sock, poll_table *wait) +{ + return 0; +} + +struct sock *smc_inet_csk_accept(struct sock *sk, int flags, int *err, bool kern) +{ + return NULL; +} + +int smc_inet_listen(struct socket *sock, int backlog) +{ + return -EOPNOTSUPP; +} + +int smc_inet_shutdown(struct socket *sock, int how) +{ + return -EOPNOTSUPP; +} + +int smc_inet_release(struct socket *sock) +{ + return -EOPNOTSUPP; +} diff --git a/net/smc/smc_inet.h b/net/smc/smc_inet.h new file mode 100644 index 00000000..68ecfa0 --- /dev/null +++ b/net/smc/smc_inet.h @@ -0,0 +1,86 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Shared Memory Communications over RDMA (SMC-R) and RoCE + * + * Definitions for the SMC module (socket related) + * + * Copyright IBM Corp. 2016 + * + */ + +#ifndef __SMC_INET +#define __SMC_INET + +#include +#include +#include +#include +/* MUST after net/tcp.h or warning */ +#include + +extern struct proto smc_inet_prot; +extern struct proto smc_inet6_prot; + +extern const struct proto_ops smc_inet_stream_ops; +extern const struct proto_ops smc_inet6_stream_ops; + +extern struct inet_protosw smc_inet_protosw; +extern struct inet_protosw smc_inet6_protosw; + +/* obtain TCP proto via sock family */ +static __always_inline struct proto *smc_inet_get_tcp_prot(int family) +{ + switch (family) { + case AF_INET: + return &tcp_prot; + case AF_INET6: + return &tcpv6_prot; + default: + pr_warn_once("smc: %s(unknown family %d)\n", __func__, family); + break; + } + return NULL; +} + +/* This function initializes the inet related structures. + * If initialization is successful, it returns 0; + * otherwise, it returns a non-zero value. 
+ */ +int smc_inet_sock_init(void); + +int smc_inet_init_sock(struct sock *sk); +void smc_inet_sock_proto_release_cb(struct sock *sk); + +int smc_inet_connect(struct socket *sock, struct sockaddr *addr, + int alen, int flags); + +int smc_inet_setsockopt(struct socket *sock, int level, int optname, + sockptr_t optval, unsigned int optlen); + +int smc_inet_getsockopt(struct socket *sock, int level, int optname, + char __user *optval, int __user *optlen); + +int smc_inet_ioctl(struct socket *sock, unsigned int cmd, + unsigned long arg); + +int smc_inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t len); + +int smc_inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, + int flags); + +ssize_t smc_inet_sendpage(struct socket *sock, struct page *page, + int offset, size_t size, int flags); + +ssize_t smc_inet_splice_read(struct socket *sock, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags); + +__poll_t smc_inet_poll(struct file *file, struct socket *sock, poll_table *wait); + +struct sock *smc_inet_csk_accept(struct sock *sk, int flags, int *err, bool kern); +int smc_inet_listen(struct socket *sock, int backlog); + +int smc_inet_shutdown(struct socket *sock, int how); +int smc_inet_release(struct socket *sock); + +#endif // __SMC_INET From patchwork Tue Feb 20 07:01:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. 
Wythe" X-Patchwork-Id: 13563491
From: "D. Wythe"
To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com
Subject: [RFC net-next 18/20] net/smc: add define and macro for smc_negotiation
Date: Tue, 20 Feb 2024 15:01:43 +0800
Message-Id: <1708412505-34470-19-git-send-email-alibuda@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com>
References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com>
X-Mailing-List: linux-rdma@vger.kernel.org

From: "D. Wythe"

smc_negotiation is a new way to describe the state of the SMC protocol. Note that it is only used by inet sock. It describes the following states of an SMC sock:

TBD: The TCP handshake has not completed yet.
PREPARE: TCP is established and the SMC handshake is in progress.
SMC: The SMC handshake is complete.
NO_SMC: The sock should act as plain TCP.

Before this patch, these conditions had to be derived from syn_smc, use_fallback and sk_state together. Keeping those fields synchronized needs care, and the syn_smc field cannot be freely modified at any time.
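The four states above share one word with two flag bits. As a rough userspace sketch (not the kernel code), the encoding can be pictured as a state in the high nibble and flags in the low nibble. The numeric values are taken from the enum this patch adds to net/smc/smc_inet.h; the `neg_*` names and the `neg_move()` helper are hypothetical and exist only for this illustration, and the sketch ignores the locking and READ_ONCE/WRITE_ONCE annotations the kernel helpers need.

```c
#include <assert.h>

/* Values copied from the enum this patch adds to net/smc/smc_inet.h. */
enum {
	NEG_TBD         = 0x10, /* TCP handshake not completed yet */
	NEG_NO_SMC      = 0x20, /* final: behave as plain TCP */
	NEG_SMC         = 0x40, /* final: SMC handshake complete */
	NEG_PREPARE_SMC = 0x80, /* TCP established, SMC handshake running */
	NEG_LISTEN_FLAG = 0x01,
	NEG_ABORT_FLAG  = 0x02,
};

#define NEG_STATE_MASK 0xf0u
#define NEG_FLAG_MASK  0x0fu

/* Replace the state nibble and keep the flags that are already set. */
static unsigned int neg_store(unsigned int word, unsigned int state)
{
	assert((state & NEG_FLAG_MASK) == 0); /* a state never carries flag bits */
	return state | (word & NEG_FLAG_MASK);
}

/* Return only the state nibble. */
static unsigned int neg_load(unsigned int word)
{
	return word & NEG_STATE_MASK;
}

/* OR new flag bits into the low nibble; the state is untouched. */
static unsigned int neg_set_flags(unsigned int word, unsigned int flags)
{
	return word | (flags & NEG_FLAG_MASK);
}

/* Return only the flag nibble. */
static unsigned int neg_flags(unsigned int word)
{
	return word & NEG_FLAG_MASK;
}

/* Hypothetical helper: change state only if the current state is 'from'.
 * A word already in NO_SMC or SMC is left alone when callers pass a
 * non-final 'from', which is how a final state stays final here.
 */
static unsigned int neg_move(unsigned int word, unsigned int from,
			     unsigned int to)
{
	if (neg_load(word) != from)
		return word;
	return neg_store(word, to);
}
```

A caller would start from `neg_store(0, NEG_TBD)` and then apply `neg_move()` for each step of the handshake. Keeping the flags in a separate nibble means a state change never clears LISTEN or ABORT.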
Based on these considerations, inet sock uses smc_negotiation to control the protocol state.

Signed-off-by: D. Wythe
---
 net/smc/smc.h | 1 +
 net/smc/smc_inet.h | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+)

diff --git a/net/smc/smc.h b/net/smc/smc.h index 1675193..538920f 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -252,6 +252,7 @@ struct smc_sock { /* smc sock container */ }; struct socket *clcsock; /* internal tcp socket */ unsigned char smc_state; /* smc state used in smc via inet_sk */ + unsigned int isck_smc_negotiation; unsigned long smc_sk_flags; /* smc sock flags used for inet sock */ void (*clcsk_state_change)(struct sock *sk); /* original stat_change fct. */ diff --git a/net/smc/smc_inet.h b/net/smc/smc_inet.h index 68ecfa0..1f182c0 100644 --- a/net/smc/smc_inet.h +++ b/net/smc/smc_inet.h @@ -18,6 +18,9 @@ /* MUST after net/tcp.h or warning */ #include +#include +#include "smc.h" + extern struct proto smc_inet_prot; extern struct proto smc_inet6_prot; @@ -27,6 +30,99 @@ extern struct inet_protosw smc_inet_protosw; extern struct inet_protosw smc_inet6_protosw; +enum smc_inet_sock_negotiation_state { + /* When creating an AF_SMC sock, the state field is initialized to 0 by default. + * This value exists only for logical compatibility with that situation + * and is never used. + */ + SMC_NEGOTIATION_COMPATIBLE_WITH_AF_SMC = 0, + + /* It is still undecided whether this connection is an SMC connection or not. + * This state always appears when actively opening an SMC connection, because it is unclear + * whether the server supports the SMC protocol and is willing to use SMC. + */ + SMC_NEGOTIATION_TBD = 0x10, + + /* This state indicates that this connection is definitely not an SMC connection + * and can never become an SMC connection again. A final + * state. + */ + SMC_NEGOTIATION_NO_SMC = 0x20, + + /* This state indicates that this connection is an SMC connection.
It can + * never become a non-SMC connection again. A final state. + */ + SMC_NEGOTIATION_SMC = 0x40, + + /* This state indicates that this connection is in the process of the SMC handshake. + * It is mainly used to eliminate the ambiguity of syn_smc: when syn_smc is 1, + * it may mean that the remote side supports SMC, or it may only mean that the local side + * supports SMC. + */ + SMC_NEGOTIATION_PREPARE_SMC = 0x80, + + /* flags */ + SMC_NEGOTIATION_LISTEN_FLAG = 0x01, + SMC_NEGOTIATION_ABORT_FLAG = 0x02, +}; + +static __always_inline void isck_smc_negotiation_store(struct smc_sock *smc, + enum smc_inet_sock_negotiation_state state) +{ + WRITE_ONCE(smc->isck_smc_negotiation, + state | (READ_ONCE(smc->isck_smc_negotiation) & 0x0f)); +} + +static __always_inline int isck_smc_negotiation_load(struct smc_sock *smc) +{ + return READ_ONCE(smc->isck_smc_negotiation) & 0xf0; +} + +static __always_inline void isck_smc_negotiation_set_flags(struct smc_sock *smc, int flags) +{ + smc->isck_smc_negotiation = (smc->isck_smc_negotiation | (flags & 0x0f)); +} + +static __always_inline int isck_smc_negotiation_get_flags(struct smc_sock *smc) +{ + return smc->isck_smc_negotiation & 0x0f; +} + +static __always_inline bool smc_inet_sock_check_fallback_fast(struct sock *sk) +{ + return !tcp_sk(sk)->syn_smc; +} + +static __always_inline bool smc_inet_sock_check_fallback(struct sock *sk) +{ + return isck_smc_negotiation_load(smc_sk(sk)) == SMC_NEGOTIATION_NO_SMC; +} + +static __always_inline bool smc_inet_sock_check_smc(struct sock *sk) +{ + if (smc_inet_sock_check_fallback_fast(sk)) + return false; + + return isck_smc_negotiation_load(smc_sk(sk)) == SMC_NEGOTIATION_SMC; +} + +static __always_inline bool smc_inet_sock_is_active_open(struct sock *sk) +{ + return !(isck_smc_negotiation_get_flags(smc_sk(sk)) & SMC_NEGOTIATION_LISTEN_FLAG); +} + +static inline void smc_inet_sock_abort(struct sock *sk) +{ + write_lock_bh(&sk->sk_callback_lock); + if
(isck_smc_negotiation_get_flags(smc_sk(sk)) & SMC_NEGOTIATION_ABORT_FLAG) { + write_unlock_bh(&sk->sk_callback_lock); + return; + } + isck_smc_negotiation_set_flags(smc_sk(sk), SMC_NEGOTIATION_ABORT_FLAG); + write_unlock_bh(&sk->sk_callback_lock); + sk->sk_error_report(sk); +} + /* obtain TCP proto via sock family */ static __always_inline struct proto *smc_inet_get_tcp_prot(int family) {

From patchwork Tue Feb 20 07:01:44 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "D. Wythe"
X-Patchwork-Id: 13563489
From: "D. Wythe"
To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com
Subject: [RFC net-next 19/20] net/smc: support smc inet with merge socks
Date: Tue, 20 Feb 2024 15:01:44 +0800
Message-Id: <1708412505-34470-20-git-send-email-alibuda@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com>
References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com>
X-Mailing-List: linux-rdma@vger.kernel.org

From: "D. Wythe"

This patch mainly implements how AF_INET works under one single sock.
Unlike the AF_SMC sock, the inet smc sock can fall back to TCP directly in IRQ context, with no need to queue work on a workqueue. The inet smc sock starts the listen work only when the peer has syn_smc set, which speeds up establishing connections that turn out not to support SMC.

The complex part is that we need to dynamically sort the reqsk queue in TCP and move the entries with syn_smc equal to 0 as far ahead as possible, which allows faster fallback connections. However, due to the timing of accept, we cannot always guarantee this ordering.

Signed-off-by: D. Wythe
---
 include/linux/tcp.h | 1 +
 net/smc/af_smc.c | 942 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 net/smc/smc.h | 7 +
 net/smc/smc_cdc.h | 8 +
 net/smc/smc_clc.h | 1 +
 net/smc/smc_close.c | 16 +-
 net/smc/smc_inet.c | 215 +++++++++---
 net/smc/smc_inet.h | 98 ++++++
 net/smc/smc_rx.c | 6 +-
 9 files changed, 1236 insertions(+), 58 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h index a1c47a6..8546ae9 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -479,6 +479,7 @@ struct tcp_sock { #if IS_ENABLED(CONFIG_SMC) bool (*smc_hs_congested)(const struct sock *sk); bool syn_smc; /* SYN includes SMC */ + bool is_smc; /* is this sock also a smc sock */ #endif #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 390fe6c..b66a199 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -72,6 +72,8 @@ static void smc_tcp_listen_work(struct work_struct *); static void smc_connect_work(struct work_struct *); +static int smc_inet_sock_do_handshake(struct sock *sk, bool sk_locked, bool sync); + int smc_nl_dump_hs_limitation(struct sk_buff *skb, struct netlink_callback *cb) { struct smc_nl_dmp_ctx *cb_ctx = smc_nl_dmp_ctx(cb); @@ -360,6 +362,9 @@ static int smc_release(struct socket *sock) static void smc_destruct(struct sock *sk) { + if (smc_sk(sk)->original_sk_destruct) + smc_sk(sk)->original_sk_destruct(sk); + if
(smc_sk_state(sk) != SMC_CLOSED) return; if (!smc_sock_flag(sk, SOCK_DEAD)) @@ -402,6 +407,9 @@ static void smc_sock_init(struct sock *sk, struct net *net) spin_lock_init(&smc->accept_q_lock); smc_init_saved_callbacks(smc); + /* already set (for inet sock), save the original */ + if (sk->sk_destruct) + smc->original_sk_destruct = sk->sk_destruct; sk->sk_destruct = smc_destruct; } @@ -518,6 +526,10 @@ static void smc_adjust_sock_bufsizes(struct sock *nsk, struct sock *osk, static void smc_copy_sock_settings(struct sock *nsk, struct sock *osk, unsigned long mask) { + /* no need for inet smc */ + if (smc_sock_is_inet_sock(nsk)) + return; + /* options we don't get control via setsockopt for */ nsk->sk_type = osk->sk_type; nsk->sk_sndtimeo = osk->sk_sndtimeo; @@ -945,6 +957,11 @@ static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code) smc_stat_fallback(smc); trace_smc_switch_to_fallback(smc, reason_code); + /* inet sock */ + if (smc_sock_is_inet_sock(&smc->sk)) + return 0; + + /* smc sock */ mutex_lock(&smc->clcsock_release_lock); if (!smc->clcsock) { rc = -EBADF; @@ -1712,12 +1729,28 @@ static int smc_connect(struct socket *sock, struct sockaddr *addr, break; } - smc_copy_sock_settings_to_clc(smc); - tcp_sk(smc->clcsock->sk)->syn_smc = 1; if (smc->connect_nonblock) { rc = -EALREADY; goto out; } + + smc_copy_sock_settings_to_clc(smc); + + if (smc_sock_is_inet_sock(sk)) { + write_lock_bh(&sk->sk_callback_lock); + if (smc_inet_sock_set_syn_smc_locked(sk, 1)) { + if (flags & O_NONBLOCK) + smc_clcsock_replace_cb(&sk->sk_state_change, + smc_inet_sock_state_change, + &smc->clcsk_state_change); + } else if (!tcp_sk(sk)->syn_smc && !smc->use_fallback) { + smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP); + } + write_unlock_bh(&sk->sk_callback_lock); + } else { + tcp_sk(smc->clcsock->sk)->syn_smc = 1; + } + rc = kernel_connect(smc->clcsock, addr, alen, flags); if (rc && rc != -EINPROGRESS) goto out; @@ -1726,6 +1759,56 @@ static int smc_connect(struct socket 
*sock, struct sockaddr *addr, sock->state = rc ? SS_CONNECTING : SS_CONNECTED; goto out; } + + /* for inet sock */ + if (smc_sock_is_inet_sock(sk)) { + if (flags & O_NONBLOCK) { + write_lock_bh(&sk->sk_callback_lock); + if (smc_inet_sock_check_smc(sk) || smc_inet_sock_check_fallback(sk)) { + rc = 0; + } else { + smc->connect_nonblock = 1; + rc = -EINPROGRESS; + } + write_unlock_bh(&sk->sk_callback_lock); + } else { + write_lock_bh(&sk->sk_callback_lock); +again: + switch (isck_smc_negotiation_load(smc)) { + case SMC_NEGOTIATION_TBD: + /* already abort */ + if (isck_smc_negotiation_get_flags(smc_sk(sk)) & + SMC_NEGOTIATION_ABORT_FLAG) { + rc = -ECONNABORTED; + break; + } + smc_inet_sock_move_state_locked(sk, SMC_NEGOTIATION_TBD, + SMC_NEGOTIATION_PREPARE_SMC); + write_unlock_bh(&sk->sk_callback_lock); +do_handshake: + rc = smc_inet_sock_do_handshake(sk, /* sk_locked */ true, + true); + write_lock_bh(&sk->sk_callback_lock); + break; + case SMC_NEGOTIATION_PREPARE_SMC: + write_unlock_bh(&sk->sk_callback_lock); + /* cancel success */ + if (cancel_work_sync(&smc->connect_work)) + goto do_handshake; + write_lock_bh(&sk->sk_callback_lock); + goto again; + case SMC_NEGOTIATION_NO_SMC: + case SMC_NEGOTIATION_SMC: + rc = 0; + break; + } + write_unlock_bh(&sk->sk_callback_lock); + if (!rc) + goto connected; + } + goto out; + } + sock_hold(&smc->sk); /* sock put in passive closing */ if (flags & O_NONBLOCK) { if (queue_work(smc_hs_wq, &smc->connect_work)) @@ -1816,7 +1899,8 @@ static void smc_accept_enqueue(struct sock *parent, struct sock *sk) spin_lock(&par->accept_q_lock); list_add_tail(&smc_sk(sk)->accept_q, &par->accept_q); spin_unlock(&par->accept_q_lock); - sk_acceptq_added(parent); + if (!smc_sock_is_inet_sock(sk)) + sk_acceptq_added(parent); } /* remove a socket from the accept queue of its parental listening socket */ @@ -1827,7 +1911,8 @@ static void smc_accept_unlink(struct sock *sk) spin_lock(&par->accept_q_lock); list_del_init(&smc_sk(sk)->accept_q); 
spin_unlock(&par->accept_q_lock); - sk_acceptq_removed(&smc_sk(sk)->listen_smc->sk); + if (!smc_sock_is_inet_sock(sk)) + sk_acceptq_removed(&smc_sk(sk)->listen_smc->sk); sock_put(sk); /* sock_hold in smc_accept_enqueue */ } @@ -1845,6 +1930,10 @@ struct sock *smc_accept_dequeue(struct sock *parent, smc_accept_unlink(new_sk); if (smc_sk_state(new_sk) == SMC_CLOSED) { + if (smc_sock_is_inet_sock(parent)) { + tcp_close(new_sk, 0); + continue; + } new_sk->sk_prot->unhash(new_sk); if (isk->clcsock) { sock_release(isk->clcsock); @@ -1873,13 +1962,25 @@ void smc_close_non_accepted(struct sock *sk) sock_hold(sk); /* sock_put below */ lock_sock(sk); - if (!sk->sk_lingertime) - /* wait for peer closing */ - WRITE_ONCE(sk->sk_lingertime, SMC_MAX_STREAM_WAIT_TIMEOUT); - __smc_release(smc); + if (smc_sock_is_inet_sock(sk)) { + if (!smc_inet_sock_check_fallback(sk)) + smc_close_active(smc); + smc_sock_set_flag(sk, SOCK_DEAD); + release_sock(sk); + tcp_close(sk, 0); + lock_sock(sk); + if (smc_sk_state(sk) == SMC_CLOSED) + smc_conn_free(&smc->conn); + } else { + if (!sk->sk_lingertime) + /* wait for peer closing */ + sk->sk_lingertime = SMC_MAX_STREAM_WAIT_TIMEOUT; + __smc_release(smc); + } release_sock(sk); sock_put(sk); /* sock_hold above */ - sock_put(sk); /* final sock_put */ + if (!smc_sock_is_inet_sock(sk)) + sock_put(sk); /* final sock_put */ } static int smcr_serv_conf_first_link(struct smc_sock *smc) @@ -1943,6 +2044,14 @@ static void smc_listen_out(struct smc_sock *new_smc) if (tcp_sk(new_smc->clcsock->sk)->syn_smc) atomic_dec(&lsmc->queued_smc_hs); + if (smc_sock_is_inet_sock(newsmcsk)) + smc_inet_sock_move_state(newsmcsk, + SMC_NEGOTIATION_PREPARE_SMC, + new_smc->use_fallback && + smc_sk_state(newsmcsk) == SMC_ACTIVE ? 
+ SMC_NEGOTIATION_NO_SMC : + SMC_NEGOTIATION_SMC); + if (smc_sk_state(&lsmc->sk) == SMC_LISTEN) { lock_sock_nested(&lsmc->sk, SINGLE_DEPTH_NESTING); smc_accept_enqueue(&lsmc->sk, newsmcsk); @@ -2855,7 +2964,7 @@ static int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, smc = smc_sk(sk); lock_sock(sk); - if (smc_sk_state(sk) == SMC_CLOSED && (sk->sk_shutdown & RCV_SHUTDOWN)) { + if (smc_sk_state(sk) == SMC_CLOSED && smc_has_rcv_shutdown(sk)) { /* socket was connected before, no more data to read */ rc = 0; goto out; @@ -2936,7 +3045,7 @@ static __poll_t smc_poll(struct file *file, struct socket *sock, } if (atomic_read(&smc->conn.bytes_to_rcv)) mask |= EPOLLIN | EPOLLRDNORM; - if (sk->sk_shutdown & RCV_SHUTDOWN) + if (smc_has_rcv_shutdown(sk)) mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP; if (smc_sk_state(sk) == SMC_APPCLOSEWAIT1) mask |= EPOLLIN; @@ -3347,7 +3456,7 @@ static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos, smc = smc_sk(sk); lock_sock(sk); - if (smc_sk_state(sk) == SMC_CLOSED && (sk->sk_shutdown & RCV_SHUTDOWN)) { + if (smc_sk_state(sk) == SMC_CLOSED && (smc_has_rcv_shutdown(sk))) { /* socket was connected before, no more data to read */ rc = 0; goto out; @@ -3573,6 +3682,815 @@ static void __net_exit smc_net_stat_exit(struct net *net) .exit = smc_net_stat_exit, }; +static inline struct request_sock *smc_inet_reqsk_get_safe_tail_0(struct sock *parent) +{ + struct request_sock_queue *queue = &inet_csk(parent)->icsk_accept_queue; + struct request_sock *req = queue->rskq_accept_head; + + if (req && smc_sk(req->sk)->ordered && tcp_sk(req->sk)->syn_smc == 0) + return smc_sk(parent)->tail_0; + + return NULL; +} + +static inline struct request_sock *smc_inet_reqsk_get_safe_tail_1(struct sock *parent) +{ + struct request_sock_queue *queue = &inet_csk(parent)->icsk_accept_queue; + struct request_sock *tail_0 = smc_inet_reqsk_get_safe_tail_0(parent); + struct request_sock *req; + + if (tail_0) + req = tail_0->dl_next; + else + req = 
queue->rskq_accept_head; + + if (req && smc_sk(req->sk)->ordered && tcp_sk(req->sk)->syn_smc) + return smc_sk(parent)->tail_1; + + return NULL; +} + +static inline void smc_reqsk_queue_remove_locked(struct request_sock_queue *queue) +{ + struct request_sock *req; + + req = queue->rskq_accept_head; + if (req) { + WRITE_ONCE(queue->rskq_accept_head, req->dl_next); + if (!queue->rskq_accept_head) + queue->rskq_accept_tail = NULL; + } +} + +static inline void smc_reqsk_queue_add_locked(struct request_sock_queue *queue, + struct request_sock *req) +{ + req->dl_next = NULL; + if (!queue->rskq_accept_head) + WRITE_ONCE(queue->rskq_accept_head, req); + else + queue->rskq_accept_tail->dl_next = req; + queue->rskq_accept_tail = req; +} + +static inline void smc_reqsk_queue_join_locked(struct request_sock_queue *to, + struct request_sock_queue *from) +{ + if (reqsk_queue_empty(from)) + return; + + if (reqsk_queue_empty(to)) { + to->rskq_accept_head = from->rskq_accept_head; + to->rskq_accept_tail = from->rskq_accept_tail; + } else { + to->rskq_accept_tail->dl_next = from->rskq_accept_head; + to->rskq_accept_tail = from->rskq_accept_tail; + } + + from->rskq_accept_head = NULL; + from->rskq_accept_tail = NULL; +} + +static inline void smc_reqsk_queue_cut_locked(struct request_sock_queue *queue, + struct request_sock *tail, + struct request_sock_queue *split) +{ + if (!tail) { + split->rskq_accept_tail = queue->rskq_accept_tail; + split->rskq_accept_head = queue->rskq_accept_head; + queue->rskq_accept_tail = NULL; + queue->rskq_accept_head = NULL; + return; + } + + if (tail == queue->rskq_accept_tail) { + split->rskq_accept_tail = NULL; + split->rskq_accept_head = NULL; + return; + } + + split->rskq_accept_head = tail->dl_next; + split->rskq_accept_tail = queue->rskq_accept_tail; + queue->rskq_accept_tail = tail; + tail->dl_next = NULL; +} + +static inline void __smc_inet_sock_sort_csk_queue(struct sock *parent, int *tcp_cnt, int *smc_cnt) +{ + struct request_sock_queue 
queue_smc, queue_free; + struct smc_sock *par = smc_sk(parent); + struct request_sock_queue *queue; + struct request_sock *req; + int cnt0, cnt1; + + queue = &inet_csk(parent)->icsk_accept_queue; + + spin_lock_bh(&queue->rskq_lock); + + par->tail_0 = smc_inet_reqsk_get_safe_tail_0(parent); + par->tail_1 = smc_inet_reqsk_get_safe_tail_1(parent); + + cnt0 = par->tail_0 ? smc_sk(par->tail_0->sk)->queued_cnt : 0; + cnt1 = par->tail_1 ? smc_sk(par->tail_1->sk)->queued_cnt : 0; + + smc_reqsk_queue_cut_locked(queue, par->tail_0, &queue_smc); + smc_reqsk_queue_cut_locked(&queue_smc, par->tail_1, &queue_free); + + /* scan all of queue_free and re-add each entry */ + while ((req = queue_free.rskq_accept_head)) { + smc_sk(req->sk)->ordered = 1; + smc_reqsk_queue_remove_locked(&queue_free); + /* Not the most time-efficient approach, but easier to understand */ + if (tcp_sk(req->sk)->syn_smc) { + smc_reqsk_queue_add_locked(&queue_smc, req); + cnt1++; + } else { + smc_reqsk_queue_add_locked(queue, req); + cnt0++; + } + } + /* update tail */ + par->tail_0 = queue->rskq_accept_tail; + par->tail_1 = queue_smc.rskq_accept_tail; + + /* join queue */ + smc_reqsk_queue_join_locked(queue, &queue_smc); + + if (par->tail_0) + smc_sk(par->tail_0->sk)->queued_cnt = cnt0; + + if (par->tail_1) + smc_sk(par->tail_1->sk)->queued_cnt = cnt1; + + *tcp_cnt = cnt0; + *smc_cnt = cnt1; + + spin_unlock_bh(&queue->rskq_lock); +} + +static int smc_inet_sock_sort_csk_queue(struct sock *parent) +{ + int smc_cnt, tcp_cnt; + int mask = 0; + + __smc_inet_sock_sort_csk_queue(parent, &tcp_cnt, &smc_cnt); + if (tcp_cnt) + mask |= SMC_REQSK_TCP; + if (smc_cnt) + mask |= SMC_REQSK_SMC; + + return mask; +} + +static void smc_inet_listen_work(struct work_struct *work) +{ + struct smc_sock *smc = container_of(work, struct smc_sock, + smc_listen_work); + struct sock *sk = &smc->sk; + + /* Initialize accompanying socket */ + smc_inet_sock_init_accompany_socket(sk); + + /* The current smc sock has not been accepted yet.
*/ + rcu_assign_pointer(sk->sk_wq, &smc_sk(sk)->accompany_socket.wq); + smc_listen_work(work); +} + +/* Wait for an incoming connection, avoid race conditions. This must be called + * with the socket locked. + */ +static int smc_inet_csk_wait_for_connect(struct sock *sk, long *timeo) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + DEFINE_WAIT(wait); + int err; + + lock_sock(sk); + + /* True wake-one mechanism for incoming connections: only + * one process gets woken up, not the 'whole herd'. + * Since we do not 'race & poll' for established sockets + * anymore, the common case will execute the loop only once. + * + * Subtle issue: "add_wait_queue_exclusive()" will be added + * after any current non-exclusive waiters, and we know that + * it will always _stay_ after any new non-exclusive waiters + * because all non-exclusive waiters are added at the + * beginning of the wait-queue. As such, it's ok to "drop" + * our exclusiveness temporarily when we get woken up without + * having to remove and re-insert us on the wait queue. 
+ */ + for (;;) { + prepare_to_wait_exclusive(sk_sleep(sk), &wait, + TASK_INTERRUPTIBLE); + release_sock(sk); + if (smc_accept_queue_empty(sk) && reqsk_queue_empty(&icsk->icsk_accept_queue)) + *timeo = schedule_timeout(*timeo); + sched_annotate_sleep(); + lock_sock(sk); + err = 0; + if (!reqsk_queue_empty(&icsk->icsk_accept_queue)) + break; + if (!smc_accept_queue_empty(sk)) + break; + err = -EINVAL; + if (sk->sk_state != TCP_LISTEN) + break; + err = sock_intr_errno(*timeo); + if (signal_pending(current)) + break; + err = -EAGAIN; + if (!*timeo) + break; + } + finish_wait(sk_sleep(sk), &wait); + release_sock(sk); + return err; +} + +struct sock *__smc_inet_csk_accept(struct sock *sk, int flags, int *err, bool kern, int next_state) +{ + struct sock *child; + int cur; + + child = inet_csk_accept(sk, flags | O_NONBLOCK, err, kern); + if (child) { + smc_sk(child)->listen_smc = smc_sk(sk); + + /* derive next_state from syn_smc if the caller did not specify one */ + if (next_state == SMC_NEGOTIATION_TBD) + next_state = tcp_sk(child)->syn_smc ?
SMC_NEGOTIATION_PREPARE_SMC : + SMC_NEGOTIATION_NO_SMC; + + cur = smc_inet_sock_move_state(child, SMC_NEGOTIATION_TBD, + next_state); + switch (cur) { + case SMC_NEGOTIATION_NO_SMC: + smc_sk_set_state(child, SMC_ACTIVE); + smc_switch_to_fallback(smc_sk(child), SMC_CLC_DECL_PEERNOSMC); + break; + case SMC_NEGOTIATION_PREPARE_SMC: + /* init as passive open smc sock */ + smc_sock_init_passive(sk, child); + break; + default: + break; + } + } + return child; +} + +struct sock *smc_inet_csk_accept(struct sock *sk, int flags, int *err, bool kern) +{ + struct sock *child; + long timeo; + + timeo = sock_rcvtimeo(sk, flags & O_NONBLOCK); + +again: + /* has smc sock */ + if (!smc_accept_queue_empty(sk)) { + child = __smc_accept(sk, NULL, flags | O_NONBLOCK, err, kern); + if (child) + return child; + } + + child = __smc_inet_csk_accept(sk, flags | O_NONBLOCK, err, kern, SMC_NEGOTIATION_TBD); + if (child) { + /* not smc sock */ + if (smc_inet_sock_check_fallback_fast(child)) + return child; + /* smc sock */ + smc_inet_sock_do_handshake(child, /* sk not locked */ false, /* sync */ false); + *err = -EAGAIN; + child = NULL; + } + + if (*err == -EAGAIN && timeo) { + *err = smc_inet_csk_wait_for_connect(sk, &timeo); + if (*err == 0) + goto again; + } + + return NULL; +} + +static void smc_inet_tcp_listen_work(struct work_struct *work) +{ + struct smc_sock *lsmc = container_of(work, struct smc_sock, + tcp_listen_work); + struct sock *lsk = &lsmc->sk; + struct sock *child; + int error = 0; + + while (smc_sk_state(lsk) == SMC_LISTEN && + (smc_inet_sock_sort_csk_queue(lsk) & SMC_REQSK_SMC)) { + child = __smc_inet_csk_accept(lsk, O_NONBLOCK, &error, 1, + SMC_NEGOTIATION_PREPARE_SMC); + if (!child || error) + break; + + /* run handshake for child + * If child is a fallback connection, run a sync handshake to eliminate + * the impact of queue_work(). 
+ */ + smc_inet_sock_do_handshake(child, /* sk not locked */ false, + !tcp_sk(child)->syn_smc); + } +} + +static void smc_inet_sock_data_ready(struct sock *sk) +{ + struct smc_sock *smc = smc_sk(sk); + int mask; + + if (inet_sk_state_load(sk) == TCP_LISTEN) { + mask = smc_inet_sock_sort_csk_queue(sk); + if (mask & SMC_REQSK_TCP || !smc_accept_queue_empty(sk)) + smc->clcsk_data_ready(sk); + if (mask & SMC_REQSK_SMC) + queue_work(smc_tcp_ls_wq, &smc->tcp_listen_work); + } else { + write_lock_bh(&sk->sk_callback_lock); + sk->sk_data_ready = smc->clcsk_data_ready; + write_unlock_bh(&sk->sk_callback_lock); + smc->clcsk_data_ready(sk); + } +} + +int smc_inet_listen(struct socket *sock, int backlog) +{ + struct sock *sk = sock->sk; + bool need_init = false; + struct smc_sock *smc; + + smc = smc_sk(sk); + + write_lock_bh(&sk->sk_callback_lock); + /* still wish to accept smc sock */ + if (isck_smc_negotiation_load(smc) == SMC_NEGOTIATION_TBD) { + need_init = tcp_sk(sk)->syn_smc = 1; + isck_smc_negotiation_set_flags(smc, SMC_NEGOTIATION_LISTEN_FLAG); + } + write_unlock_bh(&sk->sk_callback_lock); + + if (need_init) { + lock_sock(sk); + if (smc_sk_state(sk) == SMC_INIT) { + smc_init_listen(smc); + INIT_WORK(&smc->tcp_listen_work, smc_inet_tcp_listen_work); + smc_clcsock_replace_cb(&sk->sk_data_ready, smc_inet_sock_data_ready, + &smc->clcsk_data_ready); + smc_sk_set_state(sk, SMC_LISTEN); + } + release_sock(sk); + } + return inet_listen(sock, backlog); +} + +static int __smc_inet_connect_work_locked(struct smc_sock *smc) +{ + int rc; + + rc = __smc_connect(smc); + if (rc < 0) + smc->sk.sk_err = -rc; + + smc_inet_sock_move_state(&smc->sk, SMC_NEGOTIATION_PREPARE_SMC, + (smc->use_fallback && + smc_sk_state(&smc->sk) == SMC_ACTIVE) ? 
+ SMC_NEGOTIATION_NO_SMC : SMC_NEGOTIATION_SMC); + + if (!smc_sock_flag(&smc->sk, SOCK_DEAD)) { + if (smc->sk.sk_err) + smc->sk.sk_state_change(&smc->sk); + else + smc->sk.sk_write_space(&smc->sk); + } + + return rc; +} + +static void smc_inet_connect_work(struct work_struct *work) +{ + struct smc_sock *smc = container_of(work, struct smc_sock, + connect_work); + + sock_hold(&smc->sk); /* sock put below */ + lock_sock(&smc->sk); + __smc_inet_connect_work_locked(smc); + release_sock(&smc->sk); + sock_put(&smc->sk); /* sock hold above */ +} + +/* The caller MUST NOT access sk after smc_inet_sock_do_handshake + * is invoked unless a sock_hold() has been performed beforehand. + */ +static int smc_inet_sock_do_handshake(struct sock *sk, bool sk_locked, bool sync) +{ + struct smc_sock *smc = smc_sk(sk); + int rc = 0; + + if (smc_inet_sock_is_active_open(sk)) { + INIT_WORK(&smc->connect_work, smc_inet_connect_work); + if (!sync) { + queue_work(smc_hs_wq, &smc->connect_work); + return 0; + } + if (sk_locked) + return __smc_inet_connect_work_locked(smc); + lock_sock(sk); + rc = __smc_inet_connect_work_locked(smc); + release_sock(sk); + return rc; + } + + INIT_WORK(&smc->smc_listen_work, smc_inet_listen_work); + /* protect listen_smc during smc_inet_listen_work */ + sock_hold(&smc->listen_smc->sk); + + if (!sync) + queue_work(smc_hs_wq, &smc->smc_listen_work); + else + smc_inet_listen_work(&smc->smc_listen_work); + + /* listen work has no retval */ + return 0; +} + +void smc_inet_sock_state_change(struct sock *sk) +{ + struct smc_sock *smc = smc_sk(sk); + int cur; + + if (sk->sk_err || (1 << sk->sk_state) & (TCPF_CLOSE_WAIT | TCPF_ESTABLISHED)) { + write_lock_bh(&sk->sk_callback_lock); + + /* restore the original sk_state_change */ + sk->sk_state_change = smc->clcsk_state_change; + + /* caused by abort */ + if (isck_smc_negotiation_get_flags(smc_sk(sk)) & SMC_NEGOTIATION_ABORT_FLAG) + goto out_unlock; + + if (isck_smc_negotiation_load(smc) != SMC_NEGOTIATION_TBD) + goto out_unlock; + + cur =
smc_inet_sock_move_state_locked(sk, SMC_NEGOTIATION_TBD, + (tcp_sk(sk)->syn_smc && + !sk->sk_err) ? + SMC_NEGOTIATION_PREPARE_SMC : + SMC_NEGOTIATION_NO_SMC); + + if (cur == SMC_NEGOTIATION_PREPARE_SMC) { + smc_inet_sock_do_handshake(sk, /* not locked */ false, /* async */ false); + } else if (cur == SMC_NEGOTIATION_NO_SMC) { + smc->use_fallback = true; + smc->fallback_rsn = SMC_CLC_DECL_PEERNOSMC; + smc_stat_fallback(smc); + trace_smc_switch_to_fallback(smc, SMC_CLC_DECL_PEERNOSMC); + smc->connect_nonblock = 0; + smc_sk_set_state(&smc->sk, SMC_ACTIVE); + } +out_unlock: + write_unlock_bh(&sk->sk_callback_lock); + } + + smc->clcsk_state_change(sk); +} + +int smc_inet_init_sock(struct sock *sk) +{ + struct smc_sock *smc = smc_sk(sk); + int rc; + + tcp_sk(sk)->is_smc = 1; + + /* Call tcp init sock first */ + rc = smc_inet_get_tcp_prot(sk->sk_family)->init(sk); + if (rc) + return rc; + + /* init common smc sock */ + smc_sock_init(sk, sock_net(sk)); + + /* IPPROTO_SMC does not exist in network, we MUST + * reset it to IPPROTO_TCP before connect. + */ + sk->sk_protocol = IPPROTO_TCP; + + /* Initialize smc_sock state */ + smc_sk_set_state(sk, SMC_INIT); + + /* built link */ + smc_inet_sock_init_accompany_socket(sk); + + /* Initialize negotiation state, see more details in + * enum smc_inet_sock_negotiation_state. + */ + isck_smc_negotiation_store(smc, SMC_NEGOTIATION_TBD); + + return 0; +} + +void smc_inet_sock_proto_release_cb(struct sock *sk) +{ + tcp_release_cb(sk); + + /* smc_release_cb only works for socks who identified + * as SMC. Note listen sock will also return here. 
+ */ + if (!smc_inet_sock_check_smc(sk)) + return; + + smc_release_cb(sk); +} + +int smc_inet_connect(struct socket *sock, struct sockaddr *addr, + int alen, int flags) +{ + return smc_connect(sock, addr, alen, flags); +} + +int smc_inet_setsockopt(struct socket *sock, int level, int optname, + sockptr_t optval, unsigned int optlen) +{ + struct sock *sk = sock->sk; + struct smc_sock *smc; + bool fallback; + int rc; + + smc = smc_sk(sk); + fallback = smc_inet_sock_check_fallback(sk); + + if (level == SOL_SMC) + return __smc_setsockopt(sock, level, optname, optval, optlen); + + /* Always check whether the option is unsupported by SMC + * before setting it via sock_common_setsockopt(). + * Otherwise, if the option were set first and SMC could neither support it + * nor fall back, we would have to report an error to userspace + * even though sock_common_getsockopt() would show the option as set. + */ + if (!fallback && level == SOL_TCP && smc_is_unsupport_tcp_sockopt(optname)) { + /* fail if the option is unsupported and we cannot fall back */ + if (!smc_inet_sock_try_disable_smc(sk, SMC_NEGOTIATION_NOT_SUPPORT_FLAG)) + return -EOPNOTSUPP; + fallback = true; + smc_switch_to_fallback(smc_sk(sk), SMC_CLC_DECL_OPTUNSUPP); + } + + /* call original setsockopt */ + rc = sock_common_setsockopt(sock, level, optname, optval, optlen); + if (rc) + return rc; + + /* already fell back */ + if (fallback) + return 0; + + /* deliver to smc if needed */ + return smc_setsockopt_common(sock, level, optname, optval, optlen); +} + +int smc_inet_getsockopt(struct socket *sock, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level == SOL_SMC) + return __smc_getsockopt(sock, level, optname, optval, optlen); + + /* smc_getsockopt is just a wrapper around sock_common_getsockopt, + * so we don't need to reuse it.
+ */ + return sock_common_getsockopt(sock, level, optname, optval, optlen); +} + +int smc_inet_ioctl(struct socket *sock, unsigned int cmd, + unsigned long arg) +{ + struct sock *sk = sock->sk; + int rc; + + if (smc_inet_sock_check_fallback(sk)) +fallback: + return smc_call_inet_sock_ops(sk, inet_ioctl, inet6_ioctl, sock, cmd, arg); + + rc = smc_ioctl(sock, cmd, arg); + if (unlikely(smc_sk(sk)->use_fallback)) + goto fallback; + + return rc; +} + +int smc_inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) +{ + struct sock *sk = sock->sk; + struct smc_sock *smc; + int rc; + + smc = smc_sk(sk); + + /* A send before the connection is established is either fastopen or incorrect + * usage; in either case, we no longer need to replace it with SMC. + * If it is due to incorrect usage, then it is also an error for TCP. + * Users should correct that error themselves. + */ + if (smc_inet_sock_rectify_state(sk) == SMC_NEGOTIATION_NO_SMC) + goto no_smc; + + rc = smc_sendmsg(sock, msg, len); + if (likely(!smc->use_fallback)) + return rc; + + /* Fallback during smc_sendmsg */ +no_smc: + return smc_call_inet_sock_ops(sk, inet_sendmsg, inet6_sendmsg, sock, msg, len); +} + +int smc_inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, + int flags) +{ + struct sock *sk = sock->sk; + struct smc_sock *smc; + int rc; + + smc = smc_sk(sk); + + /* A recv before the connection is established is okay for TCP but not + * supported in SMC (see smc_recvmsg), so we should try our best to fall back + * if possible.
+ */ + if (smc_inet_sock_rectify_state(sk) == SMC_NEGOTIATION_NO_SMC) + goto no_smc; + + rc = smc_recvmsg(sock, msg, len, flags); + if (likely(!smc->use_fallback)) + return rc; + + /* Fallback during smc_recvmsg */ +no_smc: + return smc_call_inet_sock_ops(sk, inet_recvmsg, inet6_recvmsg, sock, msg, len, flags); +} + +ssize_t smc_inet_splice_read(struct socket *sock, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags) +{ + struct sock *sk = sock->sk; + struct smc_sock *smc; + int rc; + + smc = smc_sk(sk); + + if (smc_inet_sock_rectify_state(sk) == SMC_NEGOTIATION_NO_SMC) + goto no_smc; + + rc = smc_splice_read(sock, ppos, pipe, len, flags); + if (likely(!smc->use_fallback)) + return rc; + + /* Fallback during smc_splice_read */ +no_smc: + return tcp_splice_read(sock, ppos, pipe, len, flags); +} + +static inline __poll_t smc_inet_listen_poll(struct file *file, struct socket *sock, + poll_table *wait) +{ + __poll_t mask; + + mask = tcp_poll(file, sock, wait); + /* no tcp sock */ + if (!(smc_inet_sock_sort_csk_queue(sock->sk) & SMC_REQSK_TCP)) + mask &= ~(EPOLLIN | EPOLLRDNORM); + mask |= smc_accept_poll(sock->sk); + return mask; +} + +__poll_t smc_inet_poll(struct file *file, struct socket *sock, poll_table *wait) +{ + struct sock *sk = sock->sk; + __poll_t mask; + + if (smc_inet_sock_check_fallback_fast(sk)) +no_smc: + return tcp_poll(file, sock, wait); + + /* special case */ + if (inet_sk_state_load(sk) == TCP_LISTEN) + return smc_inet_listen_poll(file, sock, wait); + + mask = smc_poll(file, sock, wait); + if (smc_sk(sk)->use_fallback) + goto no_smc; + + return mask; +} + +int smc_inet_shutdown(struct socket *sock, int how) +{ + struct sock *sk = sock->sk; + struct smc_sock *smc; + int rc; + + smc = smc_sk(sk); + + /* All state changes of sock are handled by inet_shutdown, + * smc only needs to be responsible for + * executing the corresponding semantics. 
+ */ + rc = inet_shutdown(sock, how); + if (rc) + return rc; + + /* If shutdown happens during SMC_NEGOTIATION_TBD, we can force a + * fallback. + */ + if (smc_inet_sock_try_disable_smc(sk, SMC_NEGOTIATION_ABORT_FLAG)) + return 0; + + /* execute the corresponding SMC semantics if fallback is not possible */ + lock_sock(sk); + switch (how) { + case SHUT_RDWR: /* shutdown in both directions */ + rc = smc_close_active(smc); + break; + case SHUT_WR: + rc = smc_close_shutdown_write(smc); + break; + case SHUT_RD: + rc = 0; + /* nothing more to do because peer is not involved */ + break; + } + release_sock(sk); + return rc; +} + +int smc_inet_release(struct socket *sock) +{ + struct sock *sk = sock->sk; + struct smc_sock *smc; + int old_state, rc; + bool do_free = false; + + if (!sk) + return 0; + + smc = smc_sk(sk); + + old_state = smc_sk_state(sk); + + sock_hold(sk); /* sock put below */ + + smc_inet_sock_try_disable_smc(sk, SMC_NEGOTIATION_ABORT_FLAG); + + /* check fallback ? */ + if (smc_inet_sock_check_fallback(sk)) { + if (smc_sk_state(sk) == SMC_ACTIVE) + sock_put(sk); /* sock put for passive closing */ + smc_sock_set_flag(sk, SOCK_DEAD); + smc_sk_set_state(sk, SMC_CLOSED); + goto out; + } + + if (smc->connect_nonblock && cancel_work_sync(&smc->connect_work)) + sock_put(&smc->sk); /* sock_hold for passive closing */ + + if (smc_sk_state(sk) == SMC_LISTEN) + /* smc_close_non_accepted() is called and acquires + * sock lock for child sockets again + */ + lock_sock_nested(sk, SINGLE_DEPTH_NESTING); + else + lock_sock(sk); + + if (!smc->use_fallback) { + /* the return value of smc_close_active need not be returned to userspace */ + smc_close_active(smc); + do_free = true; + } else { + if (smc_sk_state(sk) == SMC_ACTIVE) + sock_put(sk); /* sock put for passive closing */ + smc_sk_set_state(sk, SMC_CLOSED); + } + smc_sock_set_flag(sk, SOCK_DEAD); + + release_sock(sk); +out: + /* release tcp sock */ + rc = smc_call_inet_sock_ops(sk, inet_release, inet6_release, sock); + + if (do_free) { +
lock_sock(sk); + if (smc_sk_state(sk) == SMC_CLOSED) + smc_conn_free(&smc->conn); + release_sock(sk); + } + sock_put(sk); /* sock hold above */ + return rc; +} + static int __init smc_init(void) { int rc; diff --git a/net/smc/smc.h b/net/smc/smc.h index 538920f..0507e98 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -251,9 +251,14 @@ struct smc_sock { /* smc sock container */ struct sock sk; }; struct socket *clcsock; /* internal tcp socket */ + struct socket accompany_socket; unsigned char smc_state; /* smc state used in smc via inet_sk */ unsigned int isck_smc_negotiation; unsigned long smc_sk_flags; /* smc sock flags used for inet sock */ + unsigned int queued_cnt; + struct request_sock *tail_0; + struct request_sock *tail_1; + struct request_sock *reqsk; void (*clcsk_state_change)(struct sock *sk); /* original stat_change fct. */ void (*clcsk_data_ready)(struct sock *sk); @@ -262,6 +267,7 @@ struct smc_sock { /* smc sock container */ /* original write_space fct. */ void (*clcsk_error_report)(struct sock *sk); /* original error_report fct. 
*/ + void (*original_sk_destruct)(struct sock *sk); struct smc_connection conn; /* smc connection */ struct smc_sock *listen_smc; /* listen parent */ struct work_struct connect_work; /* handle non-blocking connect*/ @@ -290,6 +296,7 @@ struct smc_sock { /* smc sock container */ /* non-blocking connect in * flight */ + u8 ordered : 1; struct mutex clcsock_release_lock; /* protects clcsock of a listen * socket diff --git a/net/smc/smc_cdc.h b/net/smc/smc_cdc.h index 696cc11..4b33947a 100644 --- a/net/smc/smc_cdc.h +++ b/net/smc/smc_cdc.h @@ -302,4 +302,12 @@ int smcr_cdc_msg_send_validation(struct smc_connection *conn, int smc_cdc_init(void) __init; void smcd_cdc_rx_init(struct smc_connection *conn); +static inline bool smc_has_rcv_shutdown(struct sock *sk) +{ + if (smc_sock_is_inet_sock(sk)) + return smc_cdc_rxed_any_close_or_senddone(&smc_sk(sk)->conn); + else + return sk->sk_shutdown & RCV_SHUTDOWN; +} + #endif /* SMC_CDC_H */ diff --git a/net/smc/smc_clc.h b/net/smc/smc_clc.h index 7cc7070..5bcd7a3 100644 --- a/net/smc/smc_clc.h +++ b/net/smc/smc_clc.h @@ -35,6 +35,7 @@ #define SMC_CLC_DECL_TIMEOUT_AL 0x02020000 /* timeout w4 QP add link */ #define SMC_CLC_DECL_CNFERR 0x03000000 /* configuration error */ #define SMC_CLC_DECL_PEERNOSMC 0x03010000 /* peer did not indicate SMC */ +#define SMC_CLC_DECL_ACTIVE 0x03010001 /* local active fallback */ #define SMC_CLC_DECL_IPSEC 0x03020000 /* IPsec usage */ #define SMC_CLC_DECL_NOSMCDEV 0x03030000 /* no SMC device found (R or D) */ #define SMC_CLC_DECL_NOSMCDDEV 0x03030001 /* no SMC-D device found */ diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c index 8d9512e..098b123 100644 --- a/net/smc/smc_close.c +++ b/net/smc/smc_close.c @@ -19,6 +19,7 @@ #include "smc_tx.h" #include "smc_cdc.h" #include "smc_close.h" +#include "smc_inet.h" /* release the clcsock that is assigned to the smc_sock */ void smc_clcsock_release(struct smc_sock *smc) @@ -27,6 +28,10 @@ void smc_clcsock_release(struct smc_sock *smc) if 
(smc->listen_smc && current_work() != &smc->smc_listen_work) cancel_work_sync(&smc->smc_listen_work); + + if (smc_sock_is_inet_sock(&smc->sk)) + return; + mutex_lock(&smc->clcsock_release_lock); if (smc->clcsock) { tcp = smc->clcsock; @@ -130,11 +135,16 @@ void smc_close_active_abort(struct smc_sock *smc) struct sock *sk = &smc->sk; bool release_clcsock = false; - if (smc_sk_state(sk) != SMC_INIT && smc->clcsock && smc->clcsock->sk) { - sk->sk_err = ECONNABORTED; - if (smc->clcsock && smc->clcsock->sk) + if (smc_sk_state(sk) != SMC_INIT) { + /* sock locked */ + if (smc_sock_is_inet_sock(sk)) { + smc_inet_sock_abort(sk); + } else if (smc->clcsock && smc->clcsock->sk) { + sk->sk_err = ECONNABORTED; tcp_abort(smc->clcsock->sk, ECONNABORTED); + } } + switch (smc_sk_state(sk)) { case SMC_ACTIVE: case SMC_APPCLOSEWAIT1: diff --git a/net/smc/smc_inet.c b/net/smc/smc_inet.c index d35b567..353f6a8 100644 --- a/net/smc/smc_inet.c +++ b/net/smc/smc_inet.c @@ -36,7 +36,7 @@ struct proto smc_inet_prot = { .name = "SMC", .owner = THIS_MODULE, .close = tcp_close, - .pre_connect = NULL, + .pre_connect = NULL, .connect = tcp_v4_connect, .disconnect = tcp_disconnect, .accept = smc_inet_csk_accept, @@ -121,7 +121,7 @@ struct proto smc_inet6_prot = { .name = "SMCv6", .owner = THIS_MODULE, .close = tcp_close, - .pre_connect = NULL, + .pre_connect = NULL, .connect = NULL, .disconnect = tcp_disconnect, .accept = smc_inet_csk_accept, @@ -145,6 +145,7 @@ struct proto smc_inet6_prot = { .stream_memory_free = tcp_stream_memory_free, .sockets_allocated = &tcp_sockets_allocated, .memory_allocated = &tcp_memory_allocated, + .per_cpu_fw_alloc = &tcp_memory_per_cpu_fw_alloc, .memory_pressure = &tcp_memory_pressure, .orphan_count = &tcp_orphan_count, .sysctl_mem = sysctl_tcp_mem, @@ -203,6 +204,54 @@ struct inet_protosw smc_inet6_protosw = { }; #endif +int smc_inet_sock_move_state(struct sock *sk, int except, int target) +{ + int rc; + + write_lock_bh(&sk->sk_callback_lock); + rc = 
smc_inet_sock_move_state_locked(sk, except, target); + write_unlock_bh(&sk->sk_callback_lock); + return rc; +} + +int smc_inet_sock_move_state_locked(struct sock *sk, int except, int target) +{ + struct smc_sock *smc = smc_sk(sk); + int cur; + + cur = isck_smc_negotiation_load(smc); + if (cur != except) + return cur; + + switch (cur) { + case SMC_NEGOTIATION_TBD: + switch (target) { + case SMC_NEGOTIATION_PREPARE_SMC: + case SMC_NEGOTIATION_NO_SMC: + isck_smc_negotiation_store(smc, target); + sock_hold(sk); /* sock hold for passive closing */ + return target; + default: + break; + } + break; + case SMC_NEGOTIATION_PREPARE_SMC: + switch (target) { + case SMC_NEGOTIATION_NO_SMC: + case SMC_NEGOTIATION_SMC: + isck_smc_negotiation_store(smc, target); + return target; + default: + break; + } + break; + default: + break; + } + + return cur; +} + int smc_inet_sock_init(void) { struct proto *tcp_v4prot; @@ -231,6 +280,7 @@ int smc_inet_sock_init(void) * ensure consistency with TCP. Some symbols here have not been exported, * so that we have to assign it here. */ + smc_inet_prot.pre_connect = tcp_v4prot->pre_connect; #if IS_ENABLED(CONFIG_IPV6) @@ -243,73 +293,158 @@ int smc_inet_sock_init(void) return 0; } -int smc_inet_init_sock(struct sock *sk) { return 0; } +static int smc_inet_clcsock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) +{ + struct sock *sk = sock->sk; + struct smc_sock *smc; -void smc_inet_sock_proto_release_cb(struct sock *sk) {} + smc = smc_sk(sock->sk); -int smc_inet_connect(struct socket *sock, struct sockaddr *addr, - int alen, int flags) -{ - return -EOPNOTSUPP; -} + if (current_work() == &smc->smc_listen_work) + return tcp_sendmsg(sk, msg, len); -int smc_inet_setsockopt(struct socket *sock, int level, int optname, - sockptr_t optval, unsigned int optlen) -{ - return -EOPNOTSUPP; -} + /* smc_inet_clcsock_sendmsg only works for smc handshaking + * fallback sendmsg should process by smc_inet_sendmsg. 
+ * see more details in smc_inet_sendmsg(). + */ + if (smc->use_fallback) + return -EOPNOTSUPP; + + /* It is difficult for us to determine whether the current sk is locked. + * Therefore, we rely on the implementation of connect_work(), which + * always runs with the sock locked. + */ + return tcp_sendmsg_locked(sk, msg, len); } -int smc_inet_ioctl(struct socket *sock, unsigned int cmd, - unsigned long arg) +int smc_sk_wait_tcp_data(struct sock *sk, long *timeo, const struct sk_buff *skb) { - return -EOPNOTSUPP; + DEFINE_WAIT_FUNC(wait, woken_wake_function); + int rc; + + lock_sock(sk); + add_wait_queue(sk_sleep(sk), &wait); + sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); + rc = sk_wait_event(sk, timeo, skb_peek_tail(&sk->sk_receive_queue) != skb || + isck_smc_negotiation_get_flags(smc_sk(sk)) & SMC_NEGOTIATION_ABORT_FLAG, + &wait); + sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); + remove_wait_queue(sk_sleep(sk), &wait); + release_sock(sk); + return rc; } -int smc_inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) +static int smc_inet_clcsock_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, + int flags) { - return -EOPNOTSUPP; + struct sock *sk = sock->sk; + struct smc_sock *smc; + int addr_len, err; + long timeo; + + smc = smc_sk(sock->sk); + + /* smc_inet_clcsock_recvmsg only works for smc handshaking + * fallback recvmsg should be processed by smc_inet_recvmsg.
+ */ + if (smc->use_fallback) + return -EOPNOTSUPP; + + if (likely(!(flags & MSG_ERRQUEUE))) + sock_rps_record_flow(sk); + + timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); + + /* Locked, see more details in smc_inet_clcsock_sendmsg() */ + if (current_work() != &smc->smc_listen_work) + release_sock(sock->sk); +again: + /* recv nonblock */ + err = tcp_recvmsg(sk, msg, len, flags | MSG_DONTWAIT, &addr_len); + if (err != -EAGAIN || !timeo) + goto out; + + smc_sk_wait_tcp_data(sk, &timeo, NULL); + if (isck_smc_negotiation_get_flags(smc_sk(sk)) & SMC_NEGOTIATION_ABORT_FLAG) { + err = -ECONNABORTED; + goto out; + } + goto again; +out: + if (current_work() != &smc->smc_listen_work) { + lock_sock(sock->sk); + /* since we release sock before, there might be state changed */ + if (err >= 0 && smc_sk_state(&smc->sk) != SMC_INIT) + err = -EPIPE; + } + if (err >= 0) + msg->msg_namelen = addr_len; + return err; } -int smc_inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, - int flags) +static ssize_t smc_inet_clcsock_splice_read(struct socket *sock, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags) { + /* fallback splice_read should process by smc_inet_splice_read. */ return -EOPNOTSUPP; } -ssize_t smc_inet_splice_read(struct socket *sock, loff_t *ppos, - struct pipe_inode_info *pipe, size_t len, - unsigned int flags) +static int smc_inet_clcsock_connect(struct socket *sock, struct sockaddr *addr, + int alen, int flags) { - return -EOPNOTSUPP; + /* smc_connect will lock the sock->sk */ + return __inet_stream_connect(sock, addr, alen, flags, 0); } -__poll_t smc_inet_poll(struct file *file, struct socket *sock, poll_table *wait) +static int smc_inet_clcsock_shutdown(struct socket *sock, int how) { + /* shutdown could call from smc_close_active, we should + * not fail it. 
+ */ return 0; } -struct sock *smc_inet_csk_accept(struct sock *sk, int flags, int *err, bool kern) +static int smc_inet_clcsock_release(struct socket *sock) { - return NULL; + /* shutdown could call from smc_close_active, we should + * not fail it. + */ + return 0; } -int smc_inet_listen(struct socket *sock, int backlog) +static int smc_inet_clcsock_getname(struct socket *sock, struct sockaddr *addr, + int peer) { - return -EOPNOTSUPP; -} + int rc; -int smc_inet_shutdown(struct socket *sock, int how) -{ - return -EOPNOTSUPP; + release_sock(sock->sk); + rc = sock->sk->sk_family == PF_INET ? inet_getname(sock, addr, peer) : +#if IS_ENABLED(CONFIG_IPV6) + inet6_getname(sock, addr, peer); +#else + -EINVAL; +#endif + lock_sock(sock->sk); + return rc; } -int smc_inet_release(struct socket *sock) +static __poll_t smc_inet_clcsock_poll(struct file *file, struct socket *sock, + poll_table *wait) { - return -EOPNOTSUPP; + return 0; } + +const struct proto_ops smc_inet_clcsock_ops = { + .family = PF_UNSPEC, + /* It is not a real ops, its lifecycle is bound to the SMC module. 
*/ + .owner = NULL, + .release = smc_inet_clcsock_release, + .getname = smc_inet_clcsock_getname, + .connect = smc_inet_clcsock_connect, + .shutdown = smc_inet_clcsock_shutdown, + .sendmsg = smc_inet_clcsock_sendmsg, + .recvmsg = smc_inet_clcsock_recvmsg, + .splice_read = smc_inet_clcsock_splice_read, + .poll = smc_inet_clcsock_poll, +}; diff --git a/net/smc/smc_inet.h b/net/smc/smc_inet.h index 1f182c0..a8c3c11 100644 --- a/net/smc/smc_inet.h +++ b/net/smc/smc_inet.h @@ -30,6 +30,10 @@ extern struct inet_protosw smc_inet_protosw; extern struct inet_protosw smc_inet6_protosw; +extern const struct proto_ops smc_inet_clcsock_ops; + +void smc_inet_sock_state_change(struct sock *sk); + enum smc_inet_sock_negotiation_state { /* When creating an AF_SMC sock, the state field will be initialized to 0 by default, * which is only for logical compatibility with that situation @@ -64,6 +68,7 @@ enum smc_inet_sock_negotiation_state { /* flags */ SMC_NEGOTIATION_LISTEN_FLAG = 0x01, SMC_NEGOTIATION_ABORT_FLAG = 0x02, + SMC_NEGOTIATION_NOT_SUPPORT_FLAG = 0x04, }; static __always_inline void isck_smc_negotiation_store(struct smc_sock *smc, @@ -123,6 +128,96 @@ static inline void smc_inet_sock_abort(struct sock *sk) sk->sk_error_report(sk); } +int smc_inet_sock_move_state(struct sock *sk, int except, int target); +int smc_inet_sock_move_state_locked(struct sock *sk, int except, int target); + +static inline int smc_inet_sock_set_syn_smc_locked(struct sock *sk, int value) +{ + int flags; + + /* not set syn smc */ + if (value == 0) { + if (smc_sk_state(sk) != SMC_LISTEN) { + smc_inet_sock_move_state_locked(sk, SMC_NEGOTIATION_TBD, + SMC_NEGOTIATION_NO_SMC); + smc_sk_set_state(sk, SMC_ACTIVE); + } + return 0; + } + /* set syn smc */ + flags = isck_smc_negotiation_get_flags(smc_sk(sk)); + if (isck_smc_negotiation_load(smc_sk(sk)) != SMC_NEGOTIATION_TBD) + return 0; + if (flags & SMC_NEGOTIATION_ABORT_FLAG) + return 0; + if (flags & SMC_NEGOTIATION_NOT_SUPPORT_FLAG) + return 0; + 
tcp_sk(sk)->syn_smc = 1; + return 1; +} + +static inline int smc_inet_sock_try_disable_smc(struct sock *sk, int flag) +{ + struct smc_sock *smc = smc_sk(sk); + int success = 0; + + write_lock_bh(&sk->sk_callback_lock); + switch (isck_smc_negotiation_load(smc)) { + case SMC_NEGOTIATION_TBD: + /* can not disable now */ + if (flag != SMC_NEGOTIATION_ABORT_FLAG && tcp_sk(sk)->syn_smc) + break; + isck_smc_negotiation_set_flags(smc_sk(sk), flag); + fallthrough; + case SMC_NEGOTIATION_NO_SMC: + success = 1; + default: + break; + } + write_unlock_bh(&sk->sk_callback_lock); + return success; +} + +static inline int smc_inet_sock_rectify_state(struct sock *sk) +{ + int cur = isck_smc_negotiation_load(smc_sk(sk)); + + switch (cur) { + case SMC_NEGOTIATION_TBD: + if (!smc_inet_sock_try_disable_smc(sk, SMC_NEGOTIATION_NOT_SUPPORT_FLAG)) + break; + fallthrough; + case SMC_NEGOTIATION_NO_SMC: + return SMC_NEGOTIATION_NO_SMC; + default: + break; + } + return cur; +} + +static __always_inline void smc_inet_sock_init_accompany_socket(struct sock *sk) +{ + struct smc_sock *smc = smc_sk(sk); + + smc->accompany_socket.sk = sk; + init_waitqueue_head(&smc->accompany_socket.wq.wait); + smc->accompany_socket.ops = &smc_inet_clcsock_ops; + smc->accompany_socket.state = SS_UNCONNECTED; + + smc->clcsock = &smc->accompany_socket; +} + +#if IS_ENABLED(CONFIG_IPV6) +#define smc_call_inet_sock_ops(sk, inet, inet6, ...) ({ \ + (sk)->sk_family == PF_INET ? inet(__VA_ARGS__) : \ + inet6(__VA_ARGS__); \ +}) +#else +#define smc_call_inet_sock_ops(sk, inet, inet6, ...) 
inet(__VA_ARGS__) +#endif +#define SMC_REQSK_SMC 0x01 +#define SMC_REQSK_TCP 0x02 + /* obtain TCP proto via sock family */ static __always_inline struct proto *smc_inet_get_tcp_prot(int family) { @@ -179,4 +274,7 @@ ssize_t smc_inet_splice_read(struct socket *sock, loff_t *ppos, int smc_inet_shutdown(struct socket *sock, int how); int smc_inet_release(struct socket *sock); +int smc_inet_sock_pre_connect(struct sock *sk, struct sockaddr *uaddr, + int addr_len); + #endif // __SMC_INET diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c index 684caae..cf9542b 100644 --- a/net/smc/smc_rx.c +++ b/net/smc/smc_rx.c @@ -269,7 +269,7 @@ int smc_rx_wait(struct smc_sock *smc, long *timeo, rc = sk_wait_event(sk, timeo, READ_ONCE(sk->sk_err) || cflags->peer_conn_abort || - READ_ONCE(sk->sk_shutdown) & RCV_SHUTDOWN || + smc_has_rcv_shutdown(sk) || conn->killed || fcrit(conn), &wait); @@ -316,7 +316,7 @@ static int smc_rx_recv_urg(struct smc_sock *smc, struct msghdr *msg, int len, return rc ? -EFAULT : len; } - if (smc_sk_state(sk) == SMC_CLOSED || sk->sk_shutdown & RCV_SHUTDOWN) + if (smc_sk_state(sk) == SMC_CLOSED || smc_has_rcv_shutdown(sk)) return 0; return -EAGAIN; @@ -387,7 +387,7 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, if (smc_rx_recvmsg_data_available(smc)) goto copy; - if (sk->sk_shutdown & RCV_SHUTDOWN) { + if (smc_has_rcv_shutdown(sk)) { /* smc_cdc_msg_recv_action() could have run after * above smc_rx_recvmsg_data_available() */ From patchwork Tue Feb 20 07:01:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. 
Wythe" X-Patchwork-Id: 13563485 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [RFC net-next 20/20] net/smc: support diag for smc inet mode Date: Tue, 20 Feb 2024 15:01:45 +0800 Message-Id: <1708412505-34470-21-git-send-email-alibuda@linux.alibaba.com> In-Reply-To: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> References: <1708412505-34470-1-git-send-email-alibuda@linux.alibaba.com> X-Mailing-List: linux-rdma@vger.kernel.org From: "D. 
Wythe" --- net/smc/smc_diag.c | 155 ++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 137 insertions(+), 18 deletions(-) diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c index 59a18ec..20532e1 100644 --- a/net/smc/smc_diag.c +++ b/net/smc/smc_diag.c @@ -22,9 +22,11 @@ #include "smc.h" #include "smc_core.h" #include "smc_ism.h" +#include "smc_inet.h" struct smc_diag_dump_ctx { int pos[2]; + int inet_pos[2]; }; static struct smc_diag_dump_ctx *smc_dump_context(struct netlink_callback *cb) @@ -35,24 +37,42 @@ static struct smc_diag_dump_ctx *smc_dump_context(struct netlink_callback *cb) static void smc_diag_msg_common_fill(struct smc_diag_msg *r, struct sock *sk) { struct smc_sock *smc = smc_sk(sk); + struct sock *clcsk; + bool is_v4, is_v6; + + if (smc_sock_is_inet_sock(sk)) + clcsk = sk; + else if (smc->clcsock) + clcsk = smc->clcsock->sk; + else + return; memset(r, 0, sizeof(*r)); r->diag_family = sk->sk_family; sock_diag_save_cookie(sk, r->id.idiag_cookie); - if (!smc->clcsock) - return; - r->id.idiag_sport = htons(smc->clcsock->sk->sk_num); - r->id.idiag_dport = smc->clcsock->sk->sk_dport; - r->id.idiag_if = smc->clcsock->sk->sk_bound_dev_if; - if (sk->sk_protocol == SMCPROTO_SMC) { - r->id.idiag_src[0] = smc->clcsock->sk->sk_rcv_saddr; - r->id.idiag_dst[0] = smc->clcsock->sk->sk_daddr; + + r->id.idiag_sport = htons(clcsk->sk_num); + r->id.idiag_dport = clcsk->sk_dport; + r->id.idiag_if = clcsk->sk_bound_dev_if; + + is_v4 = smc_sock_is_inet_sock(sk) ? clcsk->sk_family == AF_INET : + sk->sk_protocol == SMCPROTO_SMC; #if IS_ENABLED(CONFIG_IPV6) - } else if (sk->sk_protocol == SMCPROTO_SMC6) { - memcpy(&r->id.idiag_src, &smc->clcsock->sk->sk_v6_rcv_saddr, - sizeof(smc->clcsock->sk->sk_v6_rcv_saddr)); - memcpy(&r->id.idiag_dst, &smc->clcsock->sk->sk_v6_daddr, - sizeof(smc->clcsock->sk->sk_v6_daddr)); + is_v6 = smc_sock_is_inet_sock(sk) ? 
clcsk->sk_family == AF_INET6 : + sk->sk_protocol == SMCPROTO_SMC6; +#else + is_v6 = false; +#endif + + if (is_v4) { + r->id.idiag_src[0] = clcsk->sk_rcv_saddr; + r->id.idiag_dst[0] = clcsk->sk_daddr; +#if IS_ENABLED(CONFIG_IPV6) + } else if (is_v6) { + memcpy(&r->id.idiag_src, &clcsk->sk_v6_rcv_saddr, + sizeof(clcsk->sk_v6_rcv_saddr)); + memcpy(&r->id.idiag_dst, &clcsk->sk_v6_daddr, + sizeof(clcsk->sk_v6_daddr)); #endif } } @@ -72,7 +92,7 @@ static int smc_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb, static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb, struct netlink_callback *cb, const struct smc_diag_req *req, - struct nlattr *bc) + struct nlattr *bc, bool is_listen) { struct smc_sock *smc = smc_sk(sk); struct smc_diag_fallback fallback; @@ -88,6 +108,12 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb, r = nlmsg_data(nlh); smc_diag_msg_common_fill(r, sk); r->diag_state = smc_sk_state(sk); + + if (is_listen) + r->diag_state = SMC_LISTEN; + else + r->diag_state = smc_sk_state(sk); + if (smc->use_fallback) r->diag_mode = SMC_DIAG_MODE_FALLBACK_TCP; else if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd) @@ -193,6 +219,82 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb, return -EMSGSIZE; } +static int smc_diag_dump_inet_proto(struct inet_hashinfo *hashinfo, struct sk_buff *skb, + struct netlink_callback *cb, int p_type) +{ + struct smc_diag_dump_ctx *cb_ctx = smc_dump_context(cb); + struct net *net = sock_net(skb->sk); + int snum = cb_ctx->inet_pos[p_type]; + struct nlattr *bc = NULL; + int rc = 0, num = 0, i; + struct proto *target_proto; + struct sock *sk; + +#if IS_ENABLED(CONFIG_IPV6) + target_proto = (p_type == SMCPROTO_SMC6) ? 
&smc_inet6_prot : &smc_inet_prot; +#else + target_proto = &smc_inet_prot; +#endif + + for (i = 0; i <= hashinfo->lhash2_mask; i++) { + struct inet_listen_hashbucket *ilb; + struct hlist_nulls_node *node; + + ilb = &hashinfo->lhash2[i]; + spin_lock(&ilb->lock); + sk_nulls_for_each(sk, node, &ilb->nulls_head) { + if (!net_eq(sock_net(sk), net)) + continue; + if (sk->sk_prot != target_proto) + continue; + if (num < snum) + goto next_ls; + rc = __smc_diag_dump(sk, skb, cb, nlmsg_data(cb->nlh), bc, true); + if (rc < 0) { + spin_unlock(&ilb->lock); + goto out; + } +next_ls: + num++; + } + spin_unlock(&ilb->lock); + } + + for (i = 0; i <= hashinfo->ehash_mask; i++) { + struct inet_ehash_bucket *head = &hashinfo->ehash[i]; + spinlock_t *lock = inet_ehash_lockp(hashinfo, i); + struct hlist_nulls_node *node; + + if (hlist_nulls_empty(&head->chain)) + continue; + + spin_lock_bh(lock); + sk_nulls_for_each(sk, node, &head->chain) { + if (!net_eq(sock_net(sk), net)) + continue; + if (sk->sk_state == TCP_TIME_WAIT) + continue; + if (sk->sk_state == TCP_NEW_SYN_RECV) + continue; + if (sk->sk_prot != target_proto) + continue; + if (num < snum) + goto next; + rc = __smc_diag_dump(sk, skb, cb, nlmsg_data(cb->nlh), bc, false); + if (rc < 0) { + spin_unlock_bh(lock); + goto out; + } +next: + num++; + } + spin_unlock_bh(lock); + } +out: + cb_ctx->inet_pos[p_type] = num; + return rc; +} + static int smc_diag_dump_proto(struct proto *prot, struct sk_buff *skb, struct netlink_callback *cb, int p_type) { @@ -214,7 +316,7 @@ static int smc_diag_dump_proto(struct proto *prot, struct sk_buff *skb, continue; if (num < snum) goto next; - rc = __smc_diag_dump(sk, skb, cb, nlmsg_data(cb->nlh), bc); + rc = __smc_diag_dump(sk, skb, cb, nlmsg_data(cb->nlh), bc, false); if (rc < 0) goto out; next: @@ -232,8 +334,26 @@ static int smc_diag_dump(struct sk_buff *skb, struct netlink_callback *cb) int rc = 0; rc = smc_diag_dump_proto(&smc_proto, skb, cb, SMCPROTO_SMC); - if (!rc) - smc_diag_dump_proto(&smc_proto6, skb, 
cb, SMCPROTO_SMC6); + if (rc) + goto out; + +#if IS_ENABLED(CONFIG_IPV6) + rc = smc_diag_dump_proto(&smc_proto6, skb, cb, SMCPROTO_SMC6); + if (rc) + goto out; +#endif + + rc = smc_diag_dump_inet_proto(smc_inet_prot.h.hashinfo, skb, cb, SMCPROTO_SMC); + if (rc) + goto out; + +#if IS_ENABLED(CONFIG_IPV6) + rc = smc_diag_dump_inet_proto(smc_inet6_prot.h.hashinfo, skb, cb, SMCPROTO_SMC6); + if (rc) + goto out; +#endif + return 0; +out: return skb->len; } @@ -273,6 +393,5 @@ static void __exit smc_diag_exit(void) module_init(smc_diag_init); module_exit(smc_diag_exit); MODULE_LICENSE("GPL"); -MODULE_DESCRIPTION("SMC socket monitoring via SOCK_DIAG"); MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_NETLINK, NETLINK_SOCK_DIAG, 43 /* AF_SMC */); MODULE_ALIAS_GENL_FAMILY(SMCR_GENL_FAMILY_NAME);
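Note on the dump cursor: smc_diag_dump_inet_proto() resumes a partial dump by counting matching sockets in num and skipping those below the saved snum, then storing num back into inet_pos[p_type]. A minimal userspace sketch of that skip-and-resume pattern follows. It is an illustration only, not code from this series: struct dump_cursor, dump_resume(), the plain int array, and the fixed-capacity output buffer are hypothetical stand-ins for the netlink callback context, the hash walk, and the skb.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one slot of smc_diag_dump_ctx::inet_pos[]. */
struct dump_cursor {
	int pos;	/* number of entries emitted by earlier passes */
};

/*
 * Copy entries into out[], resuming after the entries already emitted.
 * Returns how many entries were written in this pass and saves the index
 * of the first entry not yet emitted, so the next call continues there.
 */
static int dump_resume(const int *entries, int n_entries,
		       struct dump_cursor *cur, int *out, int out_cap)
{
	int snum, num = 0, written = 0, i;

	assert(entries != NULL && cur != NULL && out != NULL);
	snum = cur->pos;

	for (i = 0; i < n_entries; i++) {
		if (num < snum)
			goto next;	/* emitted in an earlier pass, skip */
		if (written == out_cap)
			break;		/* buffer full: do not count this entry */
		out[written++] = entries[i];
next:
		num++;
	}
	cur->pos = num;
	return written;
}
```

The point of the sketch is that an entry which does not fit is not counted, so the next pass starts with it. This relies on the walk order being stable between passes; the hash tables walked in the patch can change between netlink dump calls, so the real dump may skip or repeat sockets under churn.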