Message ID | 1717061440-59937-1-git-send-email-alibuda@linux.alibaba.com (mailing list archive) |
---|---|
Series | Introduce IPPROTO_SMC |
On 5/30/24 5:30 PM, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> This patch allows to create smc socket via AF_INET,
> similar to the following code,
>
> /* create v4 smc sock */
> v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);
>
> /* create v6 smc sock */
> v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);

Everyone is welcome to try out the eBPF-based version of smc_run during
testing. I have added a separate command called smc_run.bpf; it is
equivalent to the normal smc_run but uses IPPROTO_SMC via eBPF.

You can obtain the code and more info from:
https://github.com/D-Wythe/smc-tools

Usage:

smc_run.bpf
An eBPF implementation of smc_run based on IPPROTO_SMC:

1. Supports transparent replacement based on a command (just like
   smc_run).
2. Supports transparent replacement based on pid configuration, and
   supports inheritance of this capability between parent and child
   processes.
3. Supports transparent replacement based on per-netns configuration.

smc_run.bpf COMMAND

1. Equivalent to smc_run but with IPPROTO_SMC via eBPF.

smc_run.bpf -p pid

1. Adds the process with the target pid to the map. Afterward, all
   socket() calls of the process and its descendant processes will be
   replaced from IPPROTO_TCP to IPPROTO_SMC.
2. The mapping is automatically deleted when the process exits.
3. Specifically, COMMAND mode actually works like the following:

    smc_run.bpf -p $$
    COMMAND
    exit

smc_run.bpf -n 1

1. Makes all socket() calls in the current netns be replaced from
   IPPROTO_TCP to IPPROTO_SMC.
2. Turn it off with smc_run.bpf -n 0.
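The replacement rule described in the usage text above (IPPROTO_TCP becomes IPPROTO_SMC for socket() calls) can be sketched as a plain C function. This is only an illustration and not the actual eBPF program from smc-tools: the helper name smc_rewrite_protocol() is hypothetical, the fallback value 256 for IPPROTO_SMC is an assumption for headers that do not define it yet, and treating protocol 0 like IPPROTO_TCP for stream sockets is also an assumption.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Assumption: older headers lack IPPROTO_SMC; 256 is the value assumed here. */
#ifndef IPPROTO_SMC
#define IPPROTO_SMC 256
#endif

/*
 * Hypothetical helper: given the arguments of a socket() call, return the
 * protocol that a transparent TCP -> SMC replacement would use.
 */
static int smc_rewrite_protocol(int family, int type, int protocol)
{
    /* Strip the SOCK_NONBLOCK / SOCK_CLOEXEC flag bits from the type. */
    int base_type = type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC);

    /* Only AF_INET and AF_INET6 sockets are candidates. */
    if (family != AF_INET && family != AF_INET6)
        return protocol;
    /* Only stream sockets can be replaced. */
    if (base_type != SOCK_STREAM)
        return protocol;
    /* Assumption: protocol 0 (the stream default) is treated like TCP. */
    if (protocol != IPPROTO_TCP && protocol != 0)
        return protocol;

    return IPPROTO_SMC;
}
```

For example, smc_rewrite_protocol(AF_INET, SOCK_STREAM, IPPROTO_TCP) yields IPPROTO_SMC, while a UDP or AF_UNIX socket keeps its original protocol.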
On 30.05.24 12:14, D. Wythe wrote:
> On 5/30/24 5:30 PM, D. Wythe wrote:
>> ---8<---
>
> Welcome everyone to try out the eBPF based version of smc_run during
> testing, I have added a separate command called smc_run.bpf,
> it was equivalent to normal smc_run but with IPPROTO_SMC via eBPF.
>
> ---8<---

Hi D. Wythe,

Thank you for the info and description! The code generally looks good
to me; there are just some details I still need to check. I'd also like
to give smc_run.bpf a try, and will maybe let you know next week whether
it works for me.

Thanks,
Wenjia
On 5/31/24 4:06 PM, Wenjia Zhang wrote:
> On 30.05.24 12:14, D. Wythe wrote:
>> ---8<---
>
> Hi D. Wythe,
>
> Thank you for the info and description! The code generally looks good
> to me, just still some details I need to check again. And I'd like to
> give smc_run.bpf a try, and maybe let you know if it works for me next
> week.
>
> Thanks,
> Wenjia

Hi Wenjia,

That's fine with us. If there are any issues with using smc_run.bpf,
please let me know.

Best wishes,
D. Wythe
On Thu, 2024-05-30 at 18:14 +0800, D. Wythe wrote:
> On 5/30/24 5:30 PM, D. Wythe wrote:
> > ---8<---
>
> Welcome everyone to try out the eBPF based version of smc_run during
> testing, I have added a separate command called smc_run.bpf,
> it was equivalent to normal smc_run but with IPPROTO_SMC via eBPF.
>
> ---8<---

Hi D. Wythe,

I gave this series plus your smc_run.bpf and SMC_LO based SMC-D a test
run on my Ryzen 3900X workstation and I have to say I'm quite
impressed. I first tried the SMC_LO feature as merged in v6.10-rc1 with
the classic LD_PRELOAD based smc_run and iperf3, and qperf …
tcp_bw/tcp_lat, both with normal localhost and between docker
containers. For this to work I of course had to initially set my UEID,
as x86_64, unlike s390x, doesn't get an SEID set. I used the following
script for this.

    #!/usr/bin/sh
    # Upper-case the machine ID and keep its first 27 characters.
    machine_id_upper=$(cat /etc/machine-id | tr '[:lower:]' '[:upper:]')
    machine_id_suffix=$(echo "$machine_id_upper" | head -c 27)
    # Prefix with "MID-" and register the result as UEID.
    ueid="MID-$machine_id_suffix"
    smcd ueid add "$ueid"

The performance is pretty impressive:
* iperf3 with 12 parallel connections (matching core count) results in
  ~152 Gbit/s on normal loopback and ~312 Gbit/s with SMC_LO.
* qperf … tcp_bw (single thread) results in ~46 Gbit/s on normal
  loopback and ~58 Gbit/s with SMC_LO.
* qperf … tcp_lat latency test results in 5-9 us with normal loopback
  and around 3-4 us with SMC_LO.

Then I applied this series on top of v6.10-rc1 and tried it with your
smc_run.bpf. The performance is of course in line with the above, but
thanks to being able to enable SMC on a per-netns basis I was able to
try a few more things. First I tried just enabling it in my default
netns and verified that after restarting sshd new ssh connections to
localhost used SMC-D through SMC_LO. Then I started Chrome and
confirmed that its TCP connections also registered with SMC and
successfully fell back to TCP mode. I had no trouble with normal
browsing, though I guess especially Google stuff often uses HTTP/3 so
isn't affected. Still nice to see I didn't get breakage.

Secondly I tried smc_run.bpf with docker containers using the following
trick:

    docker inspect --format '{{.State.Pid}}' <my_container_name>
    34651
    nsenter -t 34651 -n smc_run.bpf -n 1

Sadly this only works for commands started in the container after
loading the BPF. So I wonder if you know of a good way to either
automatically execute smc_run.bpf on container start or maybe use it on
the docker daemon such that all namespaces created by docker get the
IPPROTO_SMC override. I'd then definitely consider using SMC-D with
SMC_LO between my home lab containers even if just for bragging rights
;-)

Feel free to add for the IPPROTO_SMC series:

Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>

Thanks,
Niklas
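The UEID derivation done by the shell script in the test report above can also be sketched in C, for readers who want to see the string handling spelled out. This is a minimal sketch only: the helper name make_ueid() is hypothetical, it does not call smcd, and unlike `head -c 27` it drops a trailing newline when the input is shorter than 27 characters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the shell script: upper-case the machine
 * ID, keep at most its first 27 characters, and prefix the result with
 * "MID-". Returns 0 on success, -1 if the output buffer is too small.
 */
static int make_ueid(const char *machine_id, char *out, size_t out_len)
{
    static const char prefix[] = "MID-";
    const size_t prefix_len = sizeof(prefix) - 1;
    /* Ignore everything from the first newline on. */
    size_t n = strcspn(machine_id, "\n");

    if (n > 27)
        n = 27;
    if (out_len < prefix_len + n + 1)
        return -1;

    memcpy(out, prefix, prefix_len);
    for (size_t i = 0; i < n; i++)
        out[prefix_len + i] = (char)toupper((unsigned char)machine_id[i]);
    out[prefix_len + n] = '\0';
    return 0;
}
```

The resulting string would then be passed to `smcd ueid add`, exactly as the script does.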
On 6/3/24 3:48 PM, Niklas Schnelle wrote:
> On Thu, 2024-05-30 at 18:14 +0800, D. Wythe wrote:
>> ---8<---
>
> ---8<---
>
> Secondly I tried smc_run.bpf with docker containers using the following
> trick:
>
> docker inspect --format '{{.State.Pid}}' <my_container_name>
> 34651
> nsenter -t 34651 -n smc_run.bpf -n 1
>
> Sadly this only works for commands started in the container after
> loading the BPF. So I wonder if you know of a good way to either
> automatically execute smc_run.bpf on container start or maybe use it on
> the docker daemon such that all namespaces created by docker get the
> IPPROTO_SMC override. I'd then definitely consider using SMC-D with
> SMC_LO between my home lab containers even if just for bragging rights
> ;-)
>
> Feel free to add for the IPPROTO_SMC series:
>
> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
>
> Thanks,
> Niklas

Hi Niklas,

Thanks very much for your testing.

Regarding your question, have you tried starting the container using
'smc_run.bpf docker'?

smc_run.bpf allows the replacement capability to be inherited by
descendant processes, which might meet your needs. However, note that
the scope would then no longer be limited to a netns.

If you don't want to replace the docker command and would like to keep
the per-netns scope, there are indeed some tricky ways. For example, we
could check the current process name when a new netns is created to
decide whether to add it to the eBPF map, but I don't think it is
appropriate to include this in smc_run.bpf.

Best wishes,
D. Wythe
On Mon, 2024-06-03 at 23:07 +0800, D. Wythe wrote:
> On 6/3/24 3:48 PM, Niklas Schnelle wrote:
> > On Thu, 2024-05-30 at 18:14 +0800, D. Wythe wrote:
> > > ---8<---
> > ---8<---
>
> Hi Niklas ,
>
> Thanks very much for your testing.
>
> Regarding your question, have you ever tried starting the container
> using 'smc_run.bpf docker' ?
>
> The smc_run.bpf allows the capability for replacement to be inherited by
> descendant processes. This might meet your needs.
> However, it should be noted that this scope would no longer be limited
> to netns.
>
> If you don't want to replace the docker command and would like to keep
> per netns, there are indeed some tricky ways, for example,
> we could check current process name when creating new netns to decide if
> we should add it to the ebpf-map,
> but I think it's not appropriate to include this in smc_run.bpf.
>
> Best wishes,
> D. Wythe

I'll have to try this. I'm guessing that for docker the smc_run would
have to be added for the docker daemon and not the individual docker
commands. For podman on the other hand the individual command might
work as there is no central daemon. And as you said BPF should allow us
to add other policies in the future.

Thanks,
Niklas
From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch allows creating an smc socket via AF_INET,
similar to the following code:

    /* create v4 smc sock */
    v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);

    /* create v6 smc sock */
    v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);

There are several reasons why we believe it is appropriate here:

1. smc sockets actually use IPv4 (AF_INET) or IPv6 (AF_INET6)
   addresses. There is no AF_SMC address at all.

2. Creating smc sockets in the AF_INET(6) path allows us to reuse the
   infrastructure of the AF_INET(6) path, such as the common eBPF
   hooks. Otherwise, smc would have to implement it again in the AF_SMC
   path. For example:
   1. Replace IPPROTO_TCP with IPPROTO_SMC in the socket() syscall
      initiated by the user, without the use of LD_PRELOAD.
   2. Select whether immediate fallback is required based on the peer's
      port/ip before connect().

A very significant result is that we can now use eBPF to implement
smc_run instead of LD_PRELOAD, which is completely ineffective for
statically linked binaries.

Another potential value is that we are attempting to optimize the
performance of fallback socks, where merging socks is an important
part, and it relies on the creation of SMC sockets under the AF_INET
path. (More information:
https://lore.kernel.org/netdev/1699442703-25015-1-git-send-email-alibuda@linux.alibaba.com/T/)

v2 -> v1:

- Code formatting, mainly including alignment and annotation repair.
- Move inet_smc proto ops to inet_smc.c, avoiding af_smc.c becoming too
  bulky.
- Fix the issue where refactoring affects the initialization order.
- Fix compile warning (unused out_inet_prot) when CONFIG_IPV6 is not
  set.

v3 -> v2:

- Add Alibaba's copyright information to the new files.

v4 -> v3:

- Fix some spelling errors.
- Align function naming style, changing smc_sock_init() to
  smc_sk_init().
- Reverse the order of the conditional checks on clcsock to make the
  code more intuitive.

v5 -> v4:

- Fix some spelling errors.
- Add the comment "/* CONFIG_IPV6 */" after the final #endif directive.
- Rename inet_smc.h and inet_smc.c to smc_inet.h and smc_inet.c.
- Encapsulate the initialization and destruction of inet_smc in
  inet_smc.c, rather than implementing it directly in af_smc.c.
- Remove useless header files in smc_inet.h.
- Make smc_inet_prot_xxx and smc_inet_sock_init() static, since they
  are only used in smc_inet.c.

D. Wythe (3):
  net/smc: refactoring initialization of smc sock
  net/smc: expose smc proto operations
  net/smc: Introduce IPPROTO_SMC

 include/uapi/linux/in.h |   2 +
 net/smc/Makefile        |   2 +-
 net/smc/af_smc.c        | 162 +++++++++++++++++++++++++--------------------
 net/smc/smc.h           |  38 +++++++++++
 net/smc/smc_inet.c      | 170 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/smc/smc_inet.h      |  22 +++++++
 6 files changed, 325 insertions(+), 71 deletions(-)
 create mode 100644 net/smc/smc_inet.c
 create mode 100644 net/smc/smc_inet.h
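As an illustration of the socket() calls shown in the cover letter, the following is a minimal sketch of an application that asks for an SMC socket and falls back to plain TCP when the kernel does not accept the protocol. The helper name open_smc_or_tcp() is hypothetical, the fallback value 256 for IPPROTO_SMC is an assumption for headers that do not define it, and the set of errno values treated as "protocol unsupported" is likewise an assumption rather than something stated in the series.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumption: older headers lack IPPROTO_SMC; 256 is the value assumed here. */
#ifndef IPPROTO_SMC
#define IPPROTO_SMC 256
#endif

/*
 * Hypothetical helper: try to open an SMC stream socket for the given
 * address family (AF_INET or AF_INET6); fall back to plain TCP when the
 * kernel rejects the protocol. *is_smc is set to 1 if an SMC socket was
 * created and to 0 otherwise. Returns the file descriptor, or -1 on error.
 */
static int open_smc_or_tcp(int family, int *is_smc)
{
    int fd = socket(family, SOCK_STREAM, IPPROTO_SMC);

    if (fd >= 0) {
        *is_smc = 1;
        return fd;
    }

    *is_smc = 0;
    /* Assumption: an unsupported protocol shows up as one of these errors. */
    if (errno != EPROTONOSUPPORT && errno != EINVAL)
        return -1;

    return socket(family, SOCK_STREAM, IPPROTO_TCP);
}
```

A caller would use the returned descriptor with bind(), connect() and so on exactly as with a TCP socket, and close() it when done.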