Message ID | 1699442703-25015-1-git-send-email-alibuda@linux.alibaba.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | [RFC,net-next] net/smc: Introduce IPPROTO_SMC for smc | expand |
On Wed, Nov 08, 2023 at 07:25:03PM +0800, D. Wythe wrote: >From: "D. Wythe" <alibuda@linux.alibaba.com> > >This patch attempts to initiate a discussion on creating smc socket >via AF_INET, similar to the following code snippet: > >/* create v4 smc sock */ >v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC); > >/* create v6 smc sock */ >v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC); > >As we all know, the way we currently create an SMC socket as >follows. > >/* create v4 smc sock */ >v4 = socket(AF_SMC, SOCK_STREAM, SMCPROTO_SMC); > >/* create v6 smc sock */ >v6 = socket(AF_SMC, SOCK_STREAM, SMCPROTO_SMC6); > >Note: This is not to suggest removing the SMC path, but rather to propose >adding a new path (inet path). > >There are several reasons why we believe it is much better than AF_SMC: > >Semantics: > >SMC extends the TCP protocol and switches it's data path to RDMA path if >RDMA link is ready. Otherwise, SMC should always try its best to degrade to >TCP. From this perspective, SMC is a protocol derived from TCP and can also >fallback to TCP, It should be considered as part of the same protocol >family as TCP (AF_INET and AF_INET6). > >Compatibility & Scalability: > >Due to the presence of fallback, we needs to handle it very carefully to >keep the consistent with the TCP sockets. SMC has done a lot of work to >ensure that, but still, there are quite a few issues left, such as: > >1. The "ss" command cannot display the process name and ID associated with >the fallback socket. > >2. The linger option is ineffective when user try’s to close the fallback >socket. > >3. Some eBPF attach points related to INET_SOCK are ineffective under >fallback socket, such as BPF_CGROUP_INET_SOCK_RELEASE. > >4. SO_PEEK_OFF is a un-supported sock option for fallback sockets, while >it’s of course supported for tcp sockets. > >Of course, we can fix each issue one by one, but it is not a fundamental >solution. Any changes on the inet path may require re-synchronization, >including bug fixes, security fixes, tracing, new features and more. For >example, there is a commit which we think is very valueable: > >commit 0dd061a6a115 ("bpf: Add update_socket_protocol hook") > >This commit allows users to modify dynamically the protocol before socket >created through eBPF programs, which provides a more flexible approach >than smc_run (LP_PRELOAD). It does not require the process restart >and allows for controlling replacement at the connection level, whereas >smc_run operates at the process level. > >However, to benefit from it under the SMC path requires additional >code submission while nothing changes requires to do under inet path. > >I'm not saying that these issues cannot be fixed under smc path, however, >the solution for these issues often involves duplicating work that already >done on inet path. Thats to say, if we can be under the inet path, we can >easily reuse the existing infrastructure. > >Performance: > >In order to ensure consistency between fallback sockets and TCP sockets, >SMC creates an additional TCP socket. This introduces additional overhead >of approximately 15%-20% for the establishment and destruction of fallback >sockets. In fact, for the users we have contacted who have shown interest >in SMC, ensuring consistency in performance between fallback and TCP has >always been their top priority. Since no one can guarantee the >availability of RDMA links, support for SMC on both sides, or if the >user's environment is 100% suitable for SMC. Fallback is the only way to >address those issues, but the additional performance overhead is >unacceptable, as fallback cannot provide the benefits of RDMA and only >brings burden right now. > >In inet path, we can embed TCP sock into SMC sock, when fallback occurs, >the socket behaves exactly like a TCP socket. In our POC, the performance >of fallback socket under inet path is almost indistinguishable from of >tcp socket, with less than 1% loss. Additionally, and more importantly, >it has full feature compatibility with TCP socket. > >Of course, it is also possible under smc path, but in that way, it >would require a significant amount of work to ensure compatibility with >tcp sockets, which most of them has already been done in inet path. >And still, any changes in inet path may require re-synchronization. > >I also noticed that there have been some discussions on this issue before. > >Link: https://lore.kernel.org/stable/4a873ea1-ba83-1506-9172-e955d5f9ae16@redhat.com/ > >And I saw some supportive opinions here, maybe it is time to continue >discussing this matter now. > >Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> >--- > include/uapi/linux/in.h | 2 ++ > 1 file changed, 2 insertions(+) > >diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h >index e682ab6..0c6322b 100644 >--- a/include/uapi/linux/in.h >+++ b/include/uapi/linux/in.h >@@ -83,6 +83,8 @@ enum { > #define IPPROTO_RAW IPPROTO_RAW > IPPROTO_MPTCP = 262, /* Multipath TCP connection */ > #define IPPROTO_MPTCP IPPROTO_MPTCP >+ IPPROTO_SMC = 263, /* Shared Memory Communications */ >+#define IPPROTO_SMC IPPROTO_SMC I think adding a new IPPROTO_SMC protocol is good, but we need to make sure this won't break AF_SMC. Best regards, Dust > IPPROTO_MAX > }; > #endif >-- >1.8.3.1
On 2023/11/8 19:25, D. Wythe wrote: > From: "D. Wythe" <alibuda@linux.alibaba.com> > > This patch attempts to initiate a discussion on creating smc socket > via AF_INET, similar to the following code snippet: > > /* create v4 smc sock */ > v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC); > > /* create v6 smc sock */ > v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC); > > As we all know, the way we currently create an SMC socket as > follows. > > /* create v4 smc sock */ > v4 = socket(AF_SMC, SOCK_STREAM, SMCPROTO_SMC); > > /* create v6 smc sock */ > v6 = socket(AF_SMC, SOCK_STREAM, SMCPROTO_SMC6); > > Note: This is not to suggest removing the SMC path, but rather to propose > adding a new path (inet path). > > There are several reasons why we believe it is much better than AF_SMC: > > Semantics: > > SMC extends the TCP protocol and switches it's data path to RDMA path if > RDMA link is ready. Otherwise, SMC should always try its best to degrade to > TCP. From this perspective, SMC is a protocol derived from TCP and can also > fallback to TCP, It should be considered as part of the same protocol > family as TCP (AF_INET and AF_INET6). > > Compatibility & Scalability: > > Due to the presence of fallback, we needs to handle it very carefully to > keep the consistent with the TCP sockets. SMC has done a lot of work to > ensure that, but still, there are quite a few issues left, such as: > > 1. The "ss" command cannot display the process name and ID associated with > the fallback socket. > > 2. The linger option is ineffective when user try’s to close the fallback > socket. > > 3. Some eBPF attach points related to INET_SOCK are ineffective under > fallback socket, such as BPF_CGROUP_INET_SOCK_RELEASE. > > 4. SO_PEEK_OFF is a un-supported sock option for fallback sockets, while > it’s of course supported for tcp sockets. > > Of course, we can fix each issue one by one, but it is not a fundamental > solution. Any changes on the inet path may require re-synchronization, > including bug fixes, security fixes, tracing, new features and more. For > example, there is a commit which we think is very valueable: > > commit 0dd061a6a115 ("bpf: Add update_socket_protocol hook") > > This commit allows users to modify dynamically the protocol before socket > created through eBPF programs, which provides a more flexible approach > than smc_run (LP_PRELOAD). It does not require the process restart > and allows for controlling replacement at the connection level, whereas > smc_run operates at the process level. > > However, to benefit from it under the SMC path requires additional > code submission while nothing changes requires to do under inet path. > > I'm not saying that these issues cannot be fixed under smc path, however, > the solution for these issues often involves duplicating work that already > done on inet path. Thats to say, if we can be under the inet path, we can > easily reuse the existing infrastructure. > > Performance: > > In order to ensure consistency between fallback sockets and TCP sockets, > SMC creates an additional TCP socket. This introduces additional overhead > of approximately 15%-20% for the establishment and destruction of fallback > sockets. In fact, for the users we have contacted who have shown interest > in SMC, ensuring consistency in performance between fallback and TCP has > always been their top priority. Since no one can guarantee the > availability of RDMA links, support for SMC on both sides, or if the > user's environment is 100% suitable for SMC. Fallback is the only way to > address those issues, but the additional performance overhead is > unacceptable, as fallback cannot provide the benefits of RDMA and only > brings burden right now. > > In inet path, we can embed TCP sock into SMC sock, when fallback occurs, > the socket behaves exactly like a TCP socket. In our POC, the performance > of fallback socket under inet path is almost indistinguishable from of > tcp socket, with less than 1% loss. Additionally, and more importantly, > it has full feature compatibility with TCP socket. > > Of course, it is also possible under smc path, but in that way, it > would require a significant amount of work to ensure compatibility with > tcp sockets, which most of them has already been done in inet path. > And still, any changes in inet path may require re-synchronization. > > I also noticed that there have been some discussions on this issue before. > > Link: https://lore.kernel.org/stable/4a873ea1-ba83-1506-9172-e955d5f9ae16@redhat.com/ > > And I saw some supportive opinions here, maybe it is time to continue > discussing this matter now. > I see the reasons. Since the introduction of IPPROTO_SMC could mean many works to current SMC code. Could you give us a rough idea about what are you going to do in the implementation? And if the AF_INET+IPPROTO_SMC coexists with current AF_SMC, which one should be chose in different situation? Thanks, Wen Gu > Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> > --- > include/uapi/linux/in.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h > index e682ab6..0c6322b 100644 > --- a/include/uapi/linux/in.h > +++ b/include/uapi/linux/in.h > @@ -83,6 +83,8 @@ enum { > #define IPPROTO_RAW IPPROTO_RAW > IPPROTO_MPTCP = 262, /* Multipath TCP connection */ > #define IPPROTO_MPTCP IPPROTO_MPTCP > + IPPROTO_SMC = 263, /* Shared Memory Communications */ > +#define IPPROTO_SMC IPPROTO_SMC > IPPROTO_MAX > }; > #endif
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h index e682ab6..0c6322b 100644 --- a/include/uapi/linux/in.h +++ b/include/uapi/linux/in.h @@ -83,6 +83,8 @@ enum { #define IPPROTO_RAW IPPROTO_RAW IPPROTO_MPTCP = 262, /* Multipath TCP connection */ #define IPPROTO_MPTCP IPPROTO_MPTCP + IPPROTO_SMC = 263, /* Shared Memory Communications */ +#define IPPROTO_SMC IPPROTO_SMC IPPROTO_MAX }; #endif