Message ID | 20250307-vsock-trans-signal-race-v1-1-3aca3f771fbd@rbox.co (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] vsock/bpf: Handle EINTR connect() racing against sockmap update | expand |
> Signal delivered during connect() may result in a disconnect of an already > TCP_ESTABLISHED socket. Problem is that such established socket might have > been placed in a sockmap before the connection was closed. We end up with a > SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to > reassign (unconnected) vsock's transport to NULL, breaks the sockmap > contract. As manifested by WARN_ON_ONCE. Note that Luigi is currently working on a (vsock test suit) test[1] for a related bug, which could be neatly adapted to test this bug as well. [1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/
On Fri, Mar 07, 2025 at 10:27:50AM +0100, Michal Luczaj wrote: >Signal delivered during connect() may result in a disconnect of an already >TCP_ESTABLISHED socket. Problem is that such established socket might have >been placed in a sockmap before the connection was closed. We end up with a >SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to >reassign (unconnected) vsock's transport to NULL, breaks the sockmap >contract. As manifested by WARN_ON_ONCE. > >Ensure the socket does not stay in sockmap. > >WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0 >CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G W 6.14.0-rc4+ > sock_recvmsg+0x1b2/0x220 > __sys_recvfrom+0x190/0x270 > __x64_sys_recvfrom+0xdc/0x1b0 > do_syscall_64+0x93/0x1b0 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > >Fixes: 634f1a7110b4 ("vsock: support sockmap") >Signed-off-by: Michal Luczaj <mhal@rbox.co> >--- > net/vmw_vsock/af_vsock.c | 10 +++++++++- > net/vmw_vsock/vsock_bpf.c | 1 + > 2 files changed, 10 insertions(+), 1 deletion(-) I can't see this patch on the virtualization ML, are you using get_maintainer.pl? $ ./scripts/get_maintainer.pl -f net/vmw_vsock/af_vsock.c Stefano Garzarella <sgarzare@redhat.com> (maintainer:VM SOCKETS (AF_VSOCK)) "David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING [GENERAL]) Eric Dumazet <edumazet@google.com> (maintainer:NETWORKING [GENERAL]) Jakub Kicinski <kuba@kernel.org> (maintainer:NETWORKING [GENERAL]) Paolo Abeni <pabeni@redhat.com> (maintainer:NETWORKING [GENERAL]) Simon Horman <horms@kernel.org> (reviewer:NETWORKING [GENERAL]) virtualization@lists.linux.dev (open list:VM SOCKETS (AF_VSOCK)) netdev@vger.kernel.org (open list:VM SOCKETS (AF_VSOCK)) linux-kernel@vger.kernel.org (open list) BTW the patch LGTM, thanks for the fix! Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> > >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c >index 7742a9ae0131310bba197830a241541b2cde6123..e5a6d1d413634f414370595c02bcd77664780d8e 100644 >--- a/net/vmw_vsock/af_vsock.c >+++ b/net/vmw_vsock/af_vsock.c >@@ -1581,7 +1581,15 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr, > > if (signal_pending(current)) { > err = sock_intr_errno(timeout); >- sk->sk_state = sk->sk_state == TCP_ESTABLISHED ? TCP_CLOSING : TCP_CLOSE; >+ if (sk->sk_state == TCP_ESTABLISHED) { >+ /* Might have raced with a sockmap update. */ >+ if (sk->sk_prot->unhash) >+ sk->sk_prot->unhash(sk); >+ >+ sk->sk_state = TCP_CLOSING; >+ } else { >+ sk->sk_state = TCP_CLOSE; >+ } > sock->state = SS_UNCONNECTED; > vsock_transport_cancel_pkt(vsk); > vsock_remove_connected(vsk); >diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c >index 07b96d56f3a577af71021b1b8132743554996c4f..c68fdaf09046b68254dac3ea70ffbe73dfa45cef 100644 >--- a/net/vmw_vsock/vsock_bpf.c >+++ b/net/vmw_vsock/vsock_bpf.c >@@ -127,6 +127,7 @@ static void vsock_bpf_rebuild_protos(struct proto *prot, const struct proto *bas > { > *prot = *base; > prot->close = sock_map_close; >+ prot->unhash = sock_map_unhash; > prot->recvmsg = vsock_bpf_recvmsg; > prot->sock_is_readable = sk_msg_is_readable; > } > >--- >base-commit: b1455a45afcf789f98032ec93c16fea0facdec93 >change-id: 20250305-vsock-trans-signal-race-d62f7718d099 > >Best regards, >-- >Michal Luczaj <mhal@rbox.co> >
On Fri, Mar 07, 2025 at 10:58:55AM +0100, Michal Luczaj wrote: >> Signal delivered during connect() may result in a disconnect of an already >> TCP_ESTABLISHED socket. Problem is that such established socket might have >> been placed in a sockmap before the connection was closed. We end up with a >> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to >> reassign (unconnected) vsock's transport to NULL, breaks the sockmap >> contract. As manifested by WARN_ON_ONCE. > >Note that Luigi is currently working on a (vsock test suit) test[1] for a >related bug, which could be neatly adapted to test this bug as well. Can you work with Luigi to include the changes in that series? Thanks Stefano > >[1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/ >
On 3/7/25 15:33, Stefano Garzarella wrote: > On Fri, Mar 07, 2025 at 10:27:50AM +0100, Michal Luczaj wrote: >> Signal delivered during connect() may result in a disconnect of an already >> TCP_ESTABLISHED socket. Problem is that such established socket might have >> been placed in a sockmap before the connection was closed. We end up with a >> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to >> reassign (unconnected) vsock's transport to NULL, breaks the sockmap >> contract. As manifested by WARN_ON_ONCE. >> >> Ensure the socket does not stay in sockmap. >> >> WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0 >> CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G W 6.14.0-rc4+ >> sock_recvmsg+0x1b2/0x220 >> __sys_recvfrom+0x190/0x270 >> __x64_sys_recvfrom+0xdc/0x1b0 >> do_syscall_64+0x93/0x1b0 >> entry_SYSCALL_64_after_hwframe+0x76/0x7e >> >> Fixes: 634f1a7110b4 ("vsock: support sockmap") >> Signed-off-by: Michal Luczaj <mhal@rbox.co> >> --- >> net/vmw_vsock/af_vsock.c | 10 +++++++++- >> net/vmw_vsock/vsock_bpf.c | 1 + >> 2 files changed, 10 insertions(+), 1 deletion(-) > > I can't see this patch on the virtualization ML, are you using > get_maintainer.pl? My bad, sorry. In fact, what's the acceptable strategy for bouncing addresses? > BTW the patch LGTM, thanks for the fix! > > Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Thanks! One question for BPF maintainers: sock_map_unhash() does _not_ call `sk_psock_stop(psock)` nor `cancel_delayed_work_sync(&psock->work)`. Is this intended?
On 3/7/25 15:35, Stefano Garzarella wrote: > On Fri, Mar 07, 2025 at 10:58:55AM +0100, Michal Luczaj wrote: >>> Signal delivered during connect() may result in a disconnect of an already >>> TCP_ESTABLISHED socket. Problem is that such established socket might have >>> been placed in a sockmap before the connection was closed. We end up with a >>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to >>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap >>> contract. As manifested by WARN_ON_ONCE. >> >> Note that Luigi is currently working on a (vsock test suit) test[1] for a >> related bug, which could be neatly adapted to test this bug as well. >> [1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/ > > Can you work with Luigi to include the changes in that series? I was just going to wait for Luigi to finish his work (no rush, really) and then try to parametrize it. That is unless BPF/sockmap maintainers decide this thread's thing is a sockmap thing and should be in tools/testing/selftests/bpf. Below is a repro. If I'm not mistaken, it's basically what Luigi wrote, just sprinkled with map_update_elem() and recv(). #include <stdio.h> #include <stdint.h> #include <stdlib.h> #include <unistd.h> #include <errno.h> #include <pthread.h> #include <sys/wait.h> #include <sys/socket.h> #include <sys/syscall.h> #include <linux/bpf.h> #include <linux/vm_sockets.h> static void die(const char *msg) { perror(msg); exit(-1); } static int sockmap_create(void) { union bpf_attr attr = { .map_type = BPF_MAP_TYPE_SOCKMAP, .key_size = sizeof(int), .value_size = sizeof(int), .max_entries = 1 }; int map; map = syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr)); if (map < 0) die("map_create"); return map; } static void map_update_elem(int fd, int key, int value) { union bpf_attr attr = { .map_fd = fd, .key = (uint64_t)&key, .value = (uint64_t)&value, .flags = BPF_ANY }; (void)syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)); } static void sighandler(int sig) { /* nop */ } static void *racer(void *c) { int map = sockmap_create(); for (;;) { map_update_elem(map, 0, *(int *)c); if (kill(0, SIGUSR1) < 0) die("kill"); } } int main(void) { struct sockaddr_vm addr = { .svm_family = AF_VSOCK, .svm_cid = VMADDR_CID_LOCAL, .svm_port = VMADDR_PORT_ANY }; socklen_t alen = sizeof(addr); struct sockaddr_vm bad_addr; pthread_t thread; int s, c; s = socket(AF_VSOCK, SOCK_SEQPACKET, 0); if (s < 0) die("socket s"); if (bind(s, (struct sockaddr *)&addr, alen)) die("bind"); if (listen(s, -1)) die("listen"); if (getsockname(s, (struct sockaddr *)&addr, &alen)) die("getsockname"); bad_addr = addr; bad_addr.svm_cid = 0x42424242; /* non-existing */ if (signal(SIGUSR1, sighandler) == SIG_ERR) die("signal"); if (pthread_create(&thread, 0, racer, &c)) die("pthread_create"); for (;;) { c = socket(AF_VSOCK, SOCK_SEQPACKET, 0); if (c < 0) die("socket c"); if (!connect(c, (struct sockaddr *)&addr, alen) || errno != EINTR) goto outro; if (!connect(c, (struct sockaddr *)&bad_addr, alen) || errno != ESOCKTNOSUPPORT) goto outro; (void)recv(c, &(char){0}, 1, MSG_DONTWAIT); outro: close(accept(s, NULL, NULL)); close(c); } return 0; }
On 3/7/25 10:27, Michal Luczaj wrote: > Signal delivered during connect() may result in a disconnect of an already > TCP_ESTABLISHED socket. Problem is that such established socket might have > been placed in a sockmap before the connection was closed. We end up with a > SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to > reassign (unconnected) vsock's transport to NULL, breaks the sockmap > contract. As manifested by WARN_ON_ONCE. > > Ensure the socket does not stay in sockmap. > > WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0 > CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G W 6.14.0-rc4+ > sock_recvmsg+0x1b2/0x220 > __sys_recvfrom+0x190/0x270 > __x64_sys_recvfrom+0xdc/0x1b0 > do_syscall_64+0x93/0x1b0 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > Fixes: 634f1a7110b4 ("vsock: support sockmap") > Signed-off-by: Michal Luczaj <mhal@rbox.co> This fix is insufficient; warning can be triggered another way. Apologies. maintainer-netdev.rst says author can do that, so: pw-bot: cr
On Fri, Mar 07, 2025 at 05:01:11PM +0100, Michal Luczaj wrote: >On 3/7/25 15:35, Stefano Garzarella wrote: >> On Fri, Mar 07, 2025 at 10:58:55AM +0100, Michal Luczaj wrote: >>>> Signal delivered during connect() may result in a disconnect of an already >>>> TCP_ESTABLISHED socket. Problem is that such established socket might have >>>> been placed in a sockmap before the connection was closed. We end up with a >>>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to >>>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap >>>> contract. As manifested by WARN_ON_ONCE. >>> >>> Note that Luigi is currently working on a (vsock test suit) test[1] for a >>> related bug, which could be neatly adapted to test this bug as well. >>> [1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/ >> >> Can you work with Luigi to include the changes in that series? > >I was just going to wait for Luigi to finish his work (no rush, really) and >then try to parametrize it. Sure, this is fine, thanks for that! Stefano > >That is unless BPF/sockmap maintainers decide this thread's thing is a >sockmap thing and should be in tools/testing/selftests/bpf. > >Below is a repro. If I'm not mistaken, it's basically what Luigi wrote, >just sprinkled with map_update_elem() and recv(). > >#include <stdio.h> >#include <stdint.h> >#include <stdlib.h> >#include <unistd.h> >#include <errno.h> >#include <pthread.h> >#include <sys/wait.h> >#include <sys/socket.h> >#include <sys/syscall.h> >#include <linux/bpf.h> >#include <linux/vm_sockets.h> > >static void die(const char *msg) >{ > perror(msg); > exit(-1); >} > >static int sockmap_create(void) >{ > union bpf_attr attr = { > .map_type = BPF_MAP_TYPE_SOCKMAP, > .key_size = sizeof(int), > .value_size = sizeof(int), > .max_entries = 1 > }; > int map; > > map = syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr)); > if (map < 0) > die("map_create"); > > return map; >} > >static void map_update_elem(int fd, int key, int value) >{ > union bpf_attr attr = { > .map_fd = fd, > .key = (uint64_t)&key, > .value = (uint64_t)&value, > .flags = BPF_ANY > }; > > (void)syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)); >} > >static void sighandler(int sig) >{ > /* nop */ >} > >static void *racer(void *c) >{ > int map = sockmap_create(); > > for (;;) { > map_update_elem(map, 0, *(int *)c); > if (kill(0, SIGUSR1) < 0) > die("kill"); > } >} > >int main(void) >{ > struct sockaddr_vm addr = { > .svm_family = AF_VSOCK, > .svm_cid = VMADDR_CID_LOCAL, > .svm_port = VMADDR_PORT_ANY > }; > socklen_t alen = sizeof(addr); > struct sockaddr_vm bad_addr; > pthread_t thread; > int s, c; > > s = socket(AF_VSOCK, SOCK_SEQPACKET, 0); > if (s < 0) > die("socket s"); > > if (bind(s, (struct sockaddr *)&addr, alen)) > die("bind"); > > if (listen(s, -1)) > die("listen"); > > if (getsockname(s, (struct sockaddr *)&addr, &alen)) > die("getsockname"); > > bad_addr = addr; > bad_addr.svm_cid = 0x42424242; /* non-existing */ > > if (signal(SIGUSR1, sighandler) == SIG_ERR) > die("signal"); > > if (pthread_create(&thread, 0, racer, &c)) > die("pthread_create"); > > for (;;) { > c = socket(AF_VSOCK, SOCK_SEQPACKET, 0); > if (c < 0) > die("socket c"); > > if (!connect(c, (struct sockaddr *)&addr, alen) || > errno != EINTR) > goto outro; > > if (!connect(c, (struct sockaddr *)&bad_addr, alen) || > errno != ESOCKTNOSUPPORT) > goto outro; > > (void)recv(c, &(char){0}, 1, MSG_DONTWAIT); >outro: > close(accept(s, NULL, NULL)); > close(c); > } > > return 0; >} >
On Fri, Mar 07, 2025 at 05:00:08PM +0100, Michal Luczaj wrote: >On 3/7/25 15:33, Stefano Garzarella wrote: >> On Fri, Mar 07, 2025 at 10:27:50AM +0100, Michal Luczaj wrote: >>> Signal delivered during connect() may result in a disconnect of an already >>> TCP_ESTABLISHED socket. Problem is that such established socket might have >>> been placed in a sockmap before the connection was closed. We end up with a >>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to >>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap >>> contract. As manifested by WARN_ON_ONCE. >>> >>> Ensure the socket does not stay in sockmap. >>> >>> WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0 >>> CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G W 6.14.0-rc4+ >>> sock_recvmsg+0x1b2/0x220 >>> __sys_recvfrom+0x190/0x270 >>> __x64_sys_recvfrom+0xdc/0x1b0 >>> do_syscall_64+0x93/0x1b0 >>> entry_SYSCALL_64_after_hwframe+0x76/0x7e >>> >>> Fixes: 634f1a7110b4 ("vsock: support sockmap") >>> Signed-off-by: Michal Luczaj <mhal@rbox.co> >>> --- >>> net/vmw_vsock/af_vsock.c | 10 +++++++++- >>> net/vmw_vsock/vsock_bpf.c | 1 + >>> 2 files changed, 10 insertions(+), 1 deletion(-) >> >> I can't see this patch on the virtualization ML, are you using >> get_maintainer.pl? > >My bad, sorry. In fact, what's the acceptable strategy for bouncing addresses? I usually use --nogit so I put in CC pretty much just what's in MAINTAINERS (there I hope there are no bouncing addresses). Thanks, Stefano > >> BTW the patch LGTM, thanks for the fix! >> >> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> > >Thanks! > >One question for BPF maintainers: sock_map_unhash() does _not_ call >`sk_psock_stop(psock)` nor `cancel_delayed_work_sync(&psock->work)`. Is >this intended? >
On Mon, Mar 10, 2025 at 12:42:28AM +0100, Michal Luczaj wrote: >On 3/7/25 10:27, Michal Luczaj wrote: >> Signal delivered during connect() may result in a disconnect of an already >> TCP_ESTABLISHED socket. Problem is that such established socket might have >> been placed in a sockmap before the connection was closed. We end up with a >> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to >> reassign (unconnected) vsock's transport to NULL, breaks the sockmap >> contract. As manifested by WARN_ON_ONCE. >> >> Ensure the socket does not stay in sockmap. >> >> WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0 >> CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G W 6.14.0-rc4+ >> sock_recvmsg+0x1b2/0x220 >> __sys_recvfrom+0x190/0x270 >> __x64_sys_recvfrom+0xdc/0x1b0 >> do_syscall_64+0x93/0x1b0 >> entry_SYSCALL_64_after_hwframe+0x76/0x7e >> >> Fixes: 634f1a7110b4 ("vsock: support sockmap") >> Signed-off-by: Michal Luczaj <mhal@rbox.co> > >This fix is insufficient; warning can be triggered another way. Apologies. No need to apologize, you are doing a great job to improve vsock with bpf! Thanks, Stefano > >maintainer-netdev.rst says author can do that, so: >pw-bot: cr >
Hi Michal, On Fri, Mar 07, 2025 at 05:01:11PM +0100, Michal Luczaj wrote: >On 3/7/25 15:35, Stefano Garzarella wrote: >> On Fri, Mar 07, 2025 at 10:58:55AM +0100, Michal Luczaj wrote: >>>> Signal delivered during connect() may result in a disconnect of an already >>>> TCP_ESTABLISHED socket. Problem is that such established socket might have >>>> been placed in a sockmap before the connection was closed. We end up with a >>>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to >>>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap >>>> contract. As manifested by WARN_ON_ONCE. >>> >>> Note that Luigi is currently working on a (vsock test suit) test[1] for a >>> related bug, which could be neatly adapted to test this bug as well. >>> [1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/ >> >> Can you work with Luigi to include the changes in that series? > >I was just going to wait for Luigi to finish his work (no rush, really) and >then try to parametrize it. > Here[1] I pushed the v2 of the series, it addresses Stefano's comments. I use b4 to send the patches, so one commit looks "strange". It is used by b4 and it contains the cover letter. It would be nice to send both tests together, so whenever your patch is ready, feel free to open me a PR on github or send the series directly in the ML :) Cheers, Luigi [1]https://github.com/luigix25/linux/tree/test_vsock_v2
On 2025-03-10 16:00:09, Stefano Garzarella wrote: > On Mon, Mar 10, 2025 at 12:42:28AM +0100, Michal Luczaj wrote: > > On 3/7/25 10:27, Michal Luczaj wrote: > > > Signal delivered during connect() may result in a disconnect of an already > > > TCP_ESTABLISHED socket. Problem is that such established socket might have > > > been placed in a sockmap before the connection was closed. We end up with a > > > SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to > > > reassign (unconnected) vsock's transport to NULL, breaks the sockmap > > > contract. As manifested by WARN_ON_ONCE. > > > > > > Ensure the socket does not stay in sockmap. > > > > > > WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0 > > > CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G W 6.14.0-rc4+ > > > sock_recvmsg+0x1b2/0x220 > > > __sys_recvfrom+0x190/0x270 > > > __x64_sys_recvfrom+0xdc/0x1b0 > > > do_syscall_64+0x93/0x1b0 > > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > > > Fixes: 634f1a7110b4 ("vsock: support sockmap") > > > Signed-off-by: Michal Luczaj <mhal@rbox.co> > > > > This fix is insufficient; warning can be triggered another way. Apologies. > > No need to apologize, you are doing a great job to improve vsock with bpf! +1 thanks for working on it! I was out Monday but will catch up with patches as well. > > Thanks, > Stefano > > > > > maintainer-netdev.rst says author can do that, so: > > pw-bot: cr > > > >
On 2025-03-07 17:01:11, Michal Luczaj wrote: > On 3/7/25 15:35, Stefano Garzarella wrote: > > On Fri, Mar 07, 2025 at 10:58:55AM +0100, Michal Luczaj wrote: > >>> Signal delivered during connect() may result in a disconnect of an already > >>> TCP_ESTABLISHED socket. Problem is that such established socket might have > >>> been placed in a sockmap before the connection was closed. We end up with a > >>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to > >>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap > >>> contract. As manifested by WARN_ON_ONCE. > >> > >> Note that Luigi is currently working on a (vsock test suit) test[1] for a > >> related bug, which could be neatly adapted to test this bug as well. > >> [1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/ > > > > Can you work with Luigi to include the changes in that series? > > I was just going to wait for Luigi to finish his work (no rush, really) and > then try to parametrize it. > > That is unless BPF/sockmap maintainers decide this thread's thing is a > sockmap thing and should be in tools/testing/selftests/bpf. I think it makes sense to pull into selftests/bpf then it would get run from BPF CI so we can ensure BPF changes will keep this working correctly. I guess the trick would be to see how long to run racer to get the warning but also not hang the CI. If you run it for 5 seconds or so does it trip? Or are you running this for minutes? If it takes too long to run it could be put into test_sockmap which has longer running things already. We also have some longer TCP compliance tests that would be good to start running in public somehow so might think a bit more how the infra for testing is going in sockmap side. Thanks! > > Below is a repro. If I'm not mistaken, it's basically what Luigi wrote, > just sprinkled with map_update_elem() and recv(). > > #include <stdio.h> > #include <stdint.h> > #include <stdlib.h> > #include <unistd.h> > #include <errno.h> > #include <pthread.h> > #include <sys/wait.h> > #include <sys/socket.h> > #include <sys/syscall.h> > #include <linux/bpf.h> > #include <linux/vm_sockets.h> > > static void die(const char *msg) > { > perror(msg); > exit(-1); > } > > static int sockmap_create(void) > { > union bpf_attr attr = { > .map_type = BPF_MAP_TYPE_SOCKMAP, > .key_size = sizeof(int), > .value_size = sizeof(int), > .max_entries = 1 > }; > int map; > > map = syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr)); > if (map < 0) > die("map_create"); > > return map; > } > > static void map_update_elem(int fd, int key, int value) > { > union bpf_attr attr = { > .map_fd = fd, > .key = (uint64_t)&key, > .value = (uint64_t)&value, > .flags = BPF_ANY > }; > > (void)syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)); > } > > static void sighandler(int sig) > { > /* nop */ > } > > static void *racer(void *c) > { > int map = sockmap_create(); > > for (;;) { > map_update_elem(map, 0, *(int *)c); > if (kill(0, SIGUSR1) < 0) > die("kill"); > } > } > > int main(void) > { > struct sockaddr_vm addr = { > .svm_family = AF_VSOCK, > .svm_cid = VMADDR_CID_LOCAL, > .svm_port = VMADDR_PORT_ANY > }; > socklen_t alen = sizeof(addr); > struct sockaddr_vm bad_addr; > pthread_t thread; > int s, c; > > s = socket(AF_VSOCK, SOCK_SEQPACKET, 0); > if (s < 0) > die("socket s"); This would likely be a good test for all protocol types to stress test the update/connect/close flow. If we can land it in selftests/bpf I would be happy to make it run for TCP and others. It might be worth looking over ./tools/testing/selftests/bpf/test_sockmap.c and see if any tests from there should run for AF_VSOCK as well. > > if (bind(s, (struct sockaddr *)&addr, alen)) > die("bind"); > > if (listen(s, -1)) > die("listen"); > > if (getsockname(s, (struct sockaddr *)&addr, &alen)) > die("getsockname"); > > bad_addr = addr; > bad_addr.svm_cid = 0x42424242; /* non-existing */ > > if (signal(SIGUSR1, sighandler) == SIG_ERR) > die("signal"); > > if (pthread_create(&thread, 0, racer, &c)) > die("pthread_create"); > > for (;;) { > c = socket(AF_VSOCK, SOCK_SEQPACKET, 0); > if (c < 0) > die("socket c"); > > if (!connect(c, (struct sockaddr *)&addr, alen) || > errno != EINTR) > goto outro; > > if (!connect(c, (struct sockaddr *)&bad_addr, alen) || > errno != ESOCKTNOSUPPORT) > goto outro; > > (void)recv(c, &(char){0}, 1, MSG_DONTWAIT); > outro: > close(accept(s, NULL, NULL)); > close(c); > } > > return 0; > } > >
On 2025-03-07 10:27:50, Michal Luczaj wrote: > Signal delivered during connect() may result in a disconnect of an already > TCP_ESTABLISHED socket. Problem is that such established socket might have > been placed in a sockmap before the connection was closed. We end up with a > SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to > reassign (unconnected) vsock's transport to NULL, breaks the sockmap > contract. As manifested by WARN_ON_ONCE. > > Ensure the socket does not stay in sockmap. > > WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0 > CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G W 6.14.0-rc4+ > sock_recvmsg+0x1b2/0x220 > __sys_recvfrom+0x190/0x270 > __x64_sys_recvfrom+0xdc/0x1b0 > do_syscall_64+0x93/0x1b0 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > Fixes: 634f1a7110b4 ("vsock: support sockmap") > Signed-off-by: Michal Luczaj <mhal@rbox.co> Hi Michal, Unhashing the socket to stop any references from sockmap side if the sock is being put into CLOSING state makes sense to me. Was there another v2 somewhere? I didn't see it in my inbox or I missed it. I think you mentioned more fixes were needed. Thanks! John
On 3/11/25 14:49, Luigi Leonardi wrote: > Hi Michal, > > On Fri, Mar 07, 2025 at 05:01:11PM +0100, Michal Luczaj wrote: >> On 3/7/25 15:35, Stefano Garzarella wrote: >>> On Fri, Mar 07, 2025 at 10:58:55AM +0100, Michal Luczaj wrote: >>>>> Signal delivered during connect() may result in a disconnect of an already >>>>> TCP_ESTABLISHED socket. Problem is that such established socket might have >>>>> been placed in a sockmap before the connection was closed. We end up with a >>>>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to >>>>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap >>>>> contract. As manifested by WARN_ON_ONCE. >>>> >>>> Note that Luigi is currently working on a (vsock test suit) test[1] for a >>>> related bug, which could be neatly adapted to test this bug as well. >>>> [1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/ >>> >>> Can you work with Luigi to include the changes in that series? >> >> I was just going to wait for Luigi to finish his work (no rush, really) and >> then try to parametrize it. >> > > Here[1] I pushed the v2 of the series, it addresses Stefano's comments. > I use b4 to send the patches, so one commit looks "strange". It is used > by b4 and it contains the cover letter. > [1]https://github.com/luigix25/linux/tree/test_vsock_v2 > > It would be nice to send both tests together, so whenever your patch is > ready, feel free to open me a PR on github or send the series directly > in the ML :) I've noticed you've already sent it to ML and I agree it's better this way. Perhaps my wording was unclear: by "wait for you to finish" I've meant "wait for you to get your work merged". Sorry for confusion, Michal
On 3/11/25 17:23, John Fastabend wrote: > On 2025-03-07 10:27:50, Michal Luczaj wrote: >> Signal delivered during connect() may result in a disconnect of an already >> TCP_ESTABLISHED socket. Problem is that such established socket might have >> been placed in a sockmap before the connection was closed. We end up with a >> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to >> reassign (unconnected) vsock's transport to NULL, breaks the sockmap >> contract. As manifested by WARN_ON_ONCE. >> >> Ensure the socket does not stay in sockmap. >> >> WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0 >> CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G W 6.14.0-rc4+ >> sock_recvmsg+0x1b2/0x220 >> __sys_recvfrom+0x190/0x270 >> __x64_sys_recvfrom+0xdc/0x1b0 >> do_syscall_64+0x93/0x1b0 >> entry_SYSCALL_64_after_hwframe+0x76/0x7e >> >> Fixes: 634f1a7110b4 ("vsock: support sockmap") >> Signed-off-by: Michal Luczaj <mhal@rbox.co> > > Hi Michal, > > Unhashing the socket to stop any references from sockmap side if the > sock is being put into CLOSING state makes sense to me. Was there > another v2 somewhere? I didn't see it in my inbox or I missed it. > I think you mentioned more fixes were needed. Great, thanks for checking. I was worried I might be missing some subtleties of sock_map_unhash() not calling `sk_psock_stop(psock)` nor `cancel_delayed_work_sync(&psock->work)`. Especially since user still has socket descriptor open and can play with such "unhashed" socket. I've just sent v2: https://lore.kernel.org/netdev/20250314-vsock-trans-signal-race-v2-0-421a41f60f42@rbox.co/ Repro is adapted to sockmap_basic. And to answer your question from another thread: test triggers warning in a second. Currently timeout is 2s. I'm not sure how useful it may be for other families, but let me know if you'd rather have it somehow more generic. Thanks, Michal
Hi Michal, On Fri, Mar 14, 2025 at 04:22:20PM +0100, Michal Luczaj wrote: >Perhaps my wording was unclear: by "wait for you to finish" I've meant >"wait for you to get your work merged". > >Sorry for confusion, >Michal > Don't worry :) feel free to leave comments or test that patch! Cheers, Luigi
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 7742a9ae0131310bba197830a241541b2cde6123..e5a6d1d413634f414370595c02bcd77664780d8e 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1581,7 +1581,15 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr, if (signal_pending(current)) { err = sock_intr_errno(timeout); - sk->sk_state = sk->sk_state == TCP_ESTABLISHED ? TCP_CLOSING : TCP_CLOSE; + if (sk->sk_state == TCP_ESTABLISHED) { + /* Might have raced with a sockmap update. */ + if (sk->sk_prot->unhash) + sk->sk_prot->unhash(sk); + + sk->sk_state = TCP_CLOSING; + } else { + sk->sk_state = TCP_CLOSE; + } sock->state = SS_UNCONNECTED; vsock_transport_cancel_pkt(vsk); vsock_remove_connected(vsk); diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c index 07b96d56f3a577af71021b1b8132743554996c4f..c68fdaf09046b68254dac3ea70ffbe73dfa45cef 100644 --- a/net/vmw_vsock/vsock_bpf.c +++ b/net/vmw_vsock/vsock_bpf.c @@ -127,6 +127,7 @@ static void vsock_bpf_rebuild_protos(struct proto *prot, const struct proto *bas { *prot = *base; prot->close = sock_map_close; + prot->unhash = sock_map_unhash; prot->recvmsg = vsock_bpf_recvmsg; prot->sock_is_readable = sk_msg_is_readable; }
Signal delivered during connect() may result in a disconnect of an already TCP_ESTABLISHED socket. Problem is that such established socket might have been placed in a sockmap before the connection was closed. We end up with a SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to reassign (unconnected) vsock's transport to NULL, breaks the sockmap contract. As manifested by WARN_ON_ONCE. Ensure the socket does not stay in sockmap. WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0 CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G W 6.14.0-rc4+ sock_recvmsg+0x1b2/0x220 __sys_recvfrom+0x190/0x270 __x64_sys_recvfrom+0xdc/0x1b0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> --- net/vmw_vsock/af_vsock.c | 10 +++++++++- net/vmw_vsock/vsock_bpf.c | 1 + 2 files changed, 10 insertions(+), 1 deletion(-) --- base-commit: b1455a45afcf789f98032ec93c16fea0facdec93 change-id: 20250305-vsock-trans-signal-race-d62f7718d099 Best regards,