diff mbox series

[net] vsock/bpf: Handle EINTR connect() racing against sockmap update

Message ID 20250307-vsock-trans-signal-race-v1-1-3aca3f771fbd@rbox.co (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net] vsock/bpf: Handle EINTR connect() racing against sockmap update | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: virtualization@lists.linux.dev
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 23 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-03-07--12-00 (tests: 894)

Commit Message

Michal Luczaj March 7, 2025, 9:27 a.m. UTC
Signal delivered during connect() may result in a disconnect of an already
TCP_ESTABLISHED socket. Problem is that such established socket might have
been placed in a sockmap before the connection was closed. We end up with a
SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
reassign (unconnected) vsock's transport to NULL, breaks the sockmap
contract. As manifested by WARN_ON_ONCE.

Ensure the socket does not stay in sockmap.

WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0
CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G        W          6.14.0-rc4+
 sock_recvmsg+0x1b2/0x220
 __sys_recvfrom+0x190/0x270
 __x64_sys_recvfrom+0xdc/0x1b0
 do_syscall_64+0x93/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 634f1a7110b4 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 net/vmw_vsock/af_vsock.c  | 10 +++++++++-
 net/vmw_vsock/vsock_bpf.c |  1 +
 2 files changed, 10 insertions(+), 1 deletion(-)


---
base-commit: b1455a45afcf789f98032ec93c16fea0facdec93
change-id: 20250305-vsock-trans-signal-race-d62f7718d099

Best regards,

Comments

Michal Luczaj March 7, 2025, 9:58 a.m. UTC | #1
> Signal delivered during connect() may result in a disconnect of an already
> TCP_ESTABLISHED socket. Problem is that such established socket might have
> been placed in a sockmap before the connection was closed. We end up with a
> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
> reassign (unconnected) vsock's transport to NULL, breaks the sockmap
> contract. As manifested by WARN_ON_ONCE.

Note that Luigi is currently working on a (vsock test suit) test[1] for a
related bug, which could be neatly adapted to test this bug as well.

[1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/
Stefano Garzarella March 7, 2025, 2:33 p.m. UTC | #2
On Fri, Mar 07, 2025 at 10:27:50AM +0100, Michal Luczaj wrote:
>Signal delivered during connect() may result in a disconnect of an already
>TCP_ESTABLISHED socket. Problem is that such established socket might have
>been placed in a sockmap before the connection was closed. We end up with a
>SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
>reassign (unconnected) vsock's transport to NULL, breaks the sockmap
>contract. As manifested by WARN_ON_ONCE.
>
>Ensure the socket does not stay in sockmap.
>
>WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0
>CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G        W          6.14.0-rc4+
> sock_recvmsg+0x1b2/0x220
> __sys_recvfrom+0x190/0x270
> __x64_sys_recvfrom+0xdc/0x1b0
> do_syscall_64+0x93/0x1b0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
>Fixes: 634f1a7110b4 ("vsock: support sockmap")
>Signed-off-by: Michal Luczaj <mhal@rbox.co>
>---
> net/vmw_vsock/af_vsock.c  | 10 +++++++++-
> net/vmw_vsock/vsock_bpf.c |  1 +
> 2 files changed, 10 insertions(+), 1 deletion(-)

I can't see this patch on the virtualization ML, are you using 
get_maintainer.pl?

$ ./scripts/get_maintainer.pl -f net/vmw_vsock/af_vsock.c
Stefano Garzarella <sgarzare@redhat.com> (maintainer:VM SOCKETS (AF_VSOCK))
"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING [GENERAL])
Eric Dumazet <edumazet@google.com> (maintainer:NETWORKING [GENERAL])
Jakub Kicinski <kuba@kernel.org> (maintainer:NETWORKING [GENERAL])
Paolo Abeni <pabeni@redhat.com> (maintainer:NETWORKING [GENERAL])
Simon Horman <horms@kernel.org> (reviewer:NETWORKING [GENERAL])
virtualization@lists.linux.dev (open list:VM SOCKETS (AF_VSOCK))
netdev@vger.kernel.org (open list:VM SOCKETS (AF_VSOCK))
linux-kernel@vger.kernel.org (open list)

BTW the patch LGTM, thanks for the fix!

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 7742a9ae0131310bba197830a241541b2cde6123..e5a6d1d413634f414370595c02bcd77664780d8e 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -1581,7 +1581,15 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
>
> 		if (signal_pending(current)) {
> 			err = sock_intr_errno(timeout);
>-			sk->sk_state = sk->sk_state == TCP_ESTABLISHED ? TCP_CLOSING : TCP_CLOSE;
>+			if (sk->sk_state == TCP_ESTABLISHED) {
>+				/* Might have raced with a sockmap update. */
>+				if (sk->sk_prot->unhash)
>+					sk->sk_prot->unhash(sk);
>+
>+				sk->sk_state = TCP_CLOSING;
>+			} else {
>+				sk->sk_state = TCP_CLOSE;
>+			}
> 			sock->state = SS_UNCONNECTED;
> 			vsock_transport_cancel_pkt(vsk);
> 			vsock_remove_connected(vsk);
>diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
>index 07b96d56f3a577af71021b1b8132743554996c4f..c68fdaf09046b68254dac3ea70ffbe73dfa45cef 100644
>--- a/net/vmw_vsock/vsock_bpf.c
>+++ b/net/vmw_vsock/vsock_bpf.c
>@@ -127,6 +127,7 @@ static void vsock_bpf_rebuild_protos(struct proto *prot, const struct proto *bas
> {
> 	*prot        = *base;
> 	prot->close  = sock_map_close;
>+	prot->unhash = sock_map_unhash;
> 	prot->recvmsg = vsock_bpf_recvmsg;
> 	prot->sock_is_readable = sk_msg_is_readable;
> }
>
>---
>base-commit: b1455a45afcf789f98032ec93c16fea0facdec93
>change-id: 20250305-vsock-trans-signal-race-d62f7718d099
>
>Best regards,
>-- 
>Michal Luczaj <mhal@rbox.co>
>
Stefano Garzarella March 7, 2025, 2:35 p.m. UTC | #3
On Fri, Mar 07, 2025 at 10:58:55AM +0100, Michal Luczaj wrote:
>> Signal delivered during connect() may result in a disconnect of an already
>> TCP_ESTABLISHED socket. Problem is that such established socket might have
>> been placed in a sockmap before the connection was closed. We end up with a
>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap
>> contract. As manifested by WARN_ON_ONCE.
>
>Note that Luigi is currently working on a (vsock test suit) test[1] for a
>related bug, which could be neatly adapted to test this bug as well.

Can you work with Luigi to include the changes in that series?

Thanks
Stefano

>
>[1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/
>
Michal Luczaj March 7, 2025, 4 p.m. UTC | #4
On 3/7/25 15:33, Stefano Garzarella wrote:
> On Fri, Mar 07, 2025 at 10:27:50AM +0100, Michal Luczaj wrote:
>> Signal delivered during connect() may result in a disconnect of an already
>> TCP_ESTABLISHED socket. Problem is that such established socket might have
>> been placed in a sockmap before the connection was closed. We end up with a
>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap
>> contract. As manifested by WARN_ON_ONCE.
>>
>> Ensure the socket does not stay in sockmap.
>>
>> WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0
>> CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G        W          6.14.0-rc4+
>> sock_recvmsg+0x1b2/0x220
>> __sys_recvfrom+0x190/0x270
>> __x64_sys_recvfrom+0xdc/0x1b0
>> do_syscall_64+0x93/0x1b0
>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> Fixes: 634f1a7110b4 ("vsock: support sockmap")
>> Signed-off-by: Michal Luczaj <mhal@rbox.co>
>> ---
>> net/vmw_vsock/af_vsock.c  | 10 +++++++++-
>> net/vmw_vsock/vsock_bpf.c |  1 +
>> 2 files changed, 10 insertions(+), 1 deletion(-)
> 
> I can't see this patch on the virtualization ML, are you using 
> get_maintainer.pl?

My bad, sorry. In fact, what's the acceptable strategy for bouncing addresses?

> BTW the patch LGTM, thanks for the fix!
> 
> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

Thanks!

One question for BPF maintainers: sock_map_unhash() does _not_ call
`sk_psock_stop(psock)` nor `cancel_delayed_work_sync(&psock->work)`. Is
this intended?
Michal Luczaj March 7, 2025, 4:01 p.m. UTC | #5
On 3/7/25 15:35, Stefano Garzarella wrote:
> On Fri, Mar 07, 2025 at 10:58:55AM +0100, Michal Luczaj wrote:
>>> Signal delivered during connect() may result in a disconnect of an already
>>> TCP_ESTABLISHED socket. Problem is that such established socket might have
>>> been placed in a sockmap before the connection was closed. We end up with a
>>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
>>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap
>>> contract. As manifested by WARN_ON_ONCE.
>>
>> Note that Luigi is currently working on a (vsock test suit) test[1] for a
>> related bug, which could be neatly adapted to test this bug as well.
>> [1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/
> 
> Can you work with Luigi to include the changes in that series?

I was just going to wait for Luigi to finish his work (no rush, really) and
then try to parametrize it.

That is unless BPF/sockmap maintainers decide this thread's thing is a
sockmap thing and should be in tools/testing/selftests/bpf.

Below is a repro. If I'm not mistaken, it's basically what Luigi wrote,
just sprinkled with map_update_elem() and recv().

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <pthread.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <linux/bpf.h>
#include <linux/vm_sockets.h>

static void die(const char *msg)
{
	perror(msg);
	exit(-1);
}

static int sockmap_create(void)
{
	union bpf_attr attr = {
		.map_type = BPF_MAP_TYPE_SOCKMAP,
		.key_size = sizeof(int),
		.value_size = sizeof(int),
		.max_entries = 1
	};
	int map;

	map = syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
	if (map < 0)
		die("map_create");

	return map;
}

static void map_update_elem(int fd, int key, int value)
{
	union bpf_attr attr = {
		.map_fd = fd,
		.key = (uint64_t)&key,
		.value = (uint64_t)&value,
		.flags = BPF_ANY
	};

	(void)syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}

static void sighandler(int sig)
{
	/* nop */
}

static void *racer(void *c)
{
	int map = sockmap_create();

	for (;;) {
		map_update_elem(map, 0, *(int *)c);
 		if (kill(0, SIGUSR1) < 0)
 			die("kill");
 	}
}

int main(void)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid = VMADDR_CID_LOCAL,
		.svm_port = VMADDR_PORT_ANY
	};
	socklen_t alen = sizeof(addr);
	struct sockaddr_vm bad_addr;
	pthread_t thread;
	int s, c;

	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
	if (s < 0)
		die("socket s");

	if (bind(s, (struct sockaddr *)&addr, alen))
		die("bind");

	if (listen(s, -1))
		die("listen");

	if (getsockname(s, (struct sockaddr *)&addr, &alen))
		die("getsockname");

	bad_addr = addr;
	bad_addr.svm_cid = 0x42424242; /* non-existing */

	if (signal(SIGUSR1, sighandler) == SIG_ERR)
		die("signal");

	if (pthread_create(&thread, 0, racer, &c))
		die("pthread_create");

	for (;;) {
		c = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
		if (c < 0)
			die("socket c");

		if (!connect(c, (struct sockaddr *)&addr, alen) ||
		    errno != EINTR)
			goto outro;

		if (!connect(c, (struct sockaddr *)&bad_addr, alen) ||
		    errno != ESOCKTNOSUPPORT)
			goto outro;

		(void)recv(c, &(char){0}, 1, MSG_DONTWAIT);
outro:
		close(accept(s, NULL, NULL));
		close(c);
	}

	return 0;
}
Michal Luczaj March 9, 2025, 11:42 p.m. UTC | #6
On 3/7/25 10:27, Michal Luczaj wrote:
> Signal delivered during connect() may result in a disconnect of an already
> TCP_ESTABLISHED socket. Problem is that such established socket might have
> been placed in a sockmap before the connection was closed. We end up with a
> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
> reassign (unconnected) vsock's transport to NULL, breaks the sockmap
> contract. As manifested by WARN_ON_ONCE.
> 
> Ensure the socket does not stay in sockmap.
> 
> WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0
> CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G        W          6.14.0-rc4+
>  sock_recvmsg+0x1b2/0x220
>  __sys_recvfrom+0x190/0x270
>  __x64_sys_recvfrom+0xdc/0x1b0
>  do_syscall_64+0x93/0x1b0
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> Fixes: 634f1a7110b4 ("vsock: support sockmap")
> Signed-off-by: Michal Luczaj <mhal@rbox.co>

This fix is insufficient; warning can be triggered another way. Apologies.

maintainer-netdev.rst says author can do that, so:
pw-bot: cr
Stefano Garzarella March 10, 2025, 2:52 p.m. UTC | #7
On Fri, Mar 07, 2025 at 05:01:11PM +0100, Michal Luczaj wrote:
>On 3/7/25 15:35, Stefano Garzarella wrote:
>> On Fri, Mar 07, 2025 at 10:58:55AM +0100, Michal Luczaj wrote:
>>>> Signal delivered during connect() may result in a disconnect of an already
>>>> TCP_ESTABLISHED socket. Problem is that such established socket might have
>>>> been placed in a sockmap before the connection was closed. We end up with a
>>>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
>>>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap
>>>> contract. As manifested by WARN_ON_ONCE.
>>>
>>> Note that Luigi is currently working on a (vsock test suit) test[1] for a
>>> related bug, which could be neatly adapted to test this bug as well.
>>> [1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/
>>
>> Can you work with Luigi to include the changes in that series?
>
>I was just going to wait for Luigi to finish his work (no rush, really) and
>then try to parametrize it.

Sure, this is fine, thanks for that!

Stefano

>
>That is unless BPF/sockmap maintainers decide this thread's thing is a
>sockmap thing and should be in tools/testing/selftests/bpf.
>
>Below is a repro. If I'm not mistaken, it's basically what Luigi wrote,
>just sprinkled with map_update_elem() and recv().
>
>#include <stdio.h>
>#include <stdint.h>
>#include <stdlib.h>
>#include <unistd.h>
>#include <errno.h>
>#include <pthread.h>
>#include <sys/wait.h>
>#include <sys/socket.h>
>#include <sys/syscall.h>
>#include <linux/bpf.h>
>#include <linux/vm_sockets.h>
>
>static void die(const char *msg)
>{
>	perror(msg);
>	exit(-1);
>}
>
>static int sockmap_create(void)
>{
>	union bpf_attr attr = {
>		.map_type = BPF_MAP_TYPE_SOCKMAP,
>		.key_size = sizeof(int),
>		.value_size = sizeof(int),
>		.max_entries = 1
>	};
>	int map;
>
>	map = syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
>	if (map < 0)
>		die("map_create");
>
>	return map;
>}
>
>static void map_update_elem(int fd, int key, int value)
>{
>	union bpf_attr attr = {
>		.map_fd = fd,
>		.key = (uint64_t)&key,
>		.value = (uint64_t)&value,
>		.flags = BPF_ANY
>	};
>
>	(void)syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
>}
>
>static void sighandler(int sig)
>{
>	/* nop */
>}
>
>static void *racer(void *c)
>{
>	int map = sockmap_create();
>
>	for (;;) {
>		map_update_elem(map, 0, *(int *)c);
> 		if (kill(0, SIGUSR1) < 0)
> 			die("kill");
> 	}
>}
>
>int main(void)
>{
>	struct sockaddr_vm addr = {
>		.svm_family = AF_VSOCK,
>		.svm_cid = VMADDR_CID_LOCAL,
>		.svm_port = VMADDR_PORT_ANY
>	};
>	socklen_t alen = sizeof(addr);
>	struct sockaddr_vm bad_addr;
>	pthread_t thread;
>	int s, c;
>
>	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
>	if (s < 0)
>		die("socket s");
>
>	if (bind(s, (struct sockaddr *)&addr, alen))
>		die("bind");
>
>	if (listen(s, -1))
>		die("listen");
>
>	if (getsockname(s, (struct sockaddr *)&addr, &alen))
>		die("getsockname");
>
>	bad_addr = addr;
>	bad_addr.svm_cid = 0x42424242; /* non-existing */
>
>	if (signal(SIGUSR1, sighandler) == SIG_ERR)
>		die("signal");
>
>	if (pthread_create(&thread, 0, racer, &c))
>		die("pthread_create");
>
>	for (;;) {
>		c = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
>		if (c < 0)
>			die("socket c");
>
>		if (!connect(c, (struct sockaddr *)&addr, alen) ||
>		    errno != EINTR)
>			goto outro;
>
>		if (!connect(c, (struct sockaddr *)&bad_addr, alen) ||
>		    errno != ESOCKTNOSUPPORT)
>			goto outro;
>
>		(void)recv(c, &(char){0}, 1, MSG_DONTWAIT);
>outro:
>		close(accept(s, NULL, NULL));
>		close(c);
>	}
>
>	return 0;
>}
>
Stefano Garzarella March 10, 2025, 2:57 p.m. UTC | #8
On Fri, Mar 07, 2025 at 05:00:08PM +0100, Michal Luczaj wrote:
>On 3/7/25 15:33, Stefano Garzarella wrote:
>> On Fri, Mar 07, 2025 at 10:27:50AM +0100, Michal Luczaj wrote:
>>> Signal delivered during connect() may result in a disconnect of an already
>>> TCP_ESTABLISHED socket. Problem is that such established socket might have
>>> been placed in a sockmap before the connection was closed. We end up with a
>>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
>>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap
>>> contract. As manifested by WARN_ON_ONCE.
>>>
>>> Ensure the socket does not stay in sockmap.
>>>
>>> WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0
>>> CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G        W          6.14.0-rc4+
>>> sock_recvmsg+0x1b2/0x220
>>> __sys_recvfrom+0x190/0x270
>>> __x64_sys_recvfrom+0xdc/0x1b0
>>> do_syscall_64+0x93/0x1b0
>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> Fixes: 634f1a7110b4 ("vsock: support sockmap")
>>> Signed-off-by: Michal Luczaj <mhal@rbox.co>
>>> ---
>>> net/vmw_vsock/af_vsock.c  | 10 +++++++++-
>>> net/vmw_vsock/vsock_bpf.c |  1 +
>>> 2 files changed, 10 insertions(+), 1 deletion(-)
>>
>> I can't see this patch on the virtualization ML, are you using
>> get_maintainer.pl?
>
>My bad, sorry. In fact, what's the acceptable strategy for bouncing addresses?

I usually use --nogit so I put in CC pretty much just what's in 
MAINTAINERS (there I hope there are no bouncing addresses).

Thanks,
Stefano

>
>> BTW the patch LGTM, thanks for the fix!
>>
>> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>
>Thanks!
>
>One question for BPF maintainers: sock_map_unhash() does _not_ call
>`sk_psock_stop(psock)` nor `cancel_delayed_work_sync(&psock->work)`. Is
>this intended?
>
Stefano Garzarella March 10, 2025, 3 p.m. UTC | #9
On Mon, Mar 10, 2025 at 12:42:28AM +0100, Michal Luczaj wrote:
>On 3/7/25 10:27, Michal Luczaj wrote:
>> Signal delivered during connect() may result in a disconnect of an already
>> TCP_ESTABLISHED socket. Problem is that such established socket might have
>> been placed in a sockmap before the connection was closed. We end up with a
>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap
>> contract. As manifested by WARN_ON_ONCE.
>>
>> Ensure the socket does not stay in sockmap.
>>
>> WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0
>> CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G        W          6.14.0-rc4+
>>  sock_recvmsg+0x1b2/0x220
>>  __sys_recvfrom+0x190/0x270
>>  __x64_sys_recvfrom+0xdc/0x1b0
>>  do_syscall_64+0x93/0x1b0
>>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> Fixes: 634f1a7110b4 ("vsock: support sockmap")
>> Signed-off-by: Michal Luczaj <mhal@rbox.co>
>
>This fix is insufficient; warning can be triggered another way. Apologies.

No need to apologize, you are doing a great job to improve vsock with 
bpf!

Thanks,
Stefano

>
>maintainer-netdev.rst says author can do that, so:
>pw-bot: cr
>
Luigi Leonardi March 11, 2025, 1:49 p.m. UTC | #10
Hi Michal,

On Fri, Mar 07, 2025 at 05:01:11PM +0100, Michal Luczaj wrote:
>On 3/7/25 15:35, Stefano Garzarella wrote:
>> On Fri, Mar 07, 2025 at 10:58:55AM +0100, Michal Luczaj wrote:
>>>> Signal delivered during connect() may result in a disconnect of an already
>>>> TCP_ESTABLISHED socket. Problem is that such established socket might have
>>>> been placed in a sockmap before the connection was closed. We end up with a
>>>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
>>>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap
>>>> contract. As manifested by WARN_ON_ONCE.
>>>
>>> Note that Luigi is currently working on a (vsock test suit) test[1] for a
>>> related bug, which could be neatly adapted to test this bug as well.
>>> [1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/
>>
>> Can you work with Luigi to include the changes in that series?
>
>I was just going to wait for Luigi to finish his work (no rush, really) and
>then try to parametrize it.
>

Here[1] I pushed the v2 of the series, it addresses Stefano's comments.
I use b4 to send the patches, so one commit looks "strange". It is used 
by b4 and it contains the cover letter.

It would be nice to send both tests together, so whenever your patch is 
ready, feel free to open me a PR on github or send the series directly 
in the ML :)

Cheers,
Luigi

[1]https://github.com/luigix25/linux/tree/test_vsock_v2
John Fastabend March 11, 2025, 3:38 p.m. UTC | #11
On 2025-03-10 16:00:09, Stefano Garzarella wrote:
> On Mon, Mar 10, 2025 at 12:42:28AM +0100, Michal Luczaj wrote:
> > On 3/7/25 10:27, Michal Luczaj wrote:
> > > Signal delivered during connect() may result in a disconnect of an already
> > > TCP_ESTABLISHED socket. Problem is that such established socket might have
> > > been placed in a sockmap before the connection was closed. We end up with a
> > > SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
> > > reassign (unconnected) vsock's transport to NULL, breaks the sockmap
> > > contract. As manifested by WARN_ON_ONCE.
> > > 
> > > Ensure the socket does not stay in sockmap.
> > > 
> > > WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0
> > > CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G        W          6.14.0-rc4+
> > >  sock_recvmsg+0x1b2/0x220
> > >  __sys_recvfrom+0x190/0x270
> > >  __x64_sys_recvfrom+0xdc/0x1b0
> > >  do_syscall_64+0x93/0x1b0
> > >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > 
> > > Fixes: 634f1a7110b4 ("vsock: support sockmap")
> > > Signed-off-by: Michal Luczaj <mhal@rbox.co>
> > 
> > This fix is insufficient; warning can be triggered another way. Apologies.
> 
> No need to apologize, you are doing a great job to improve vsock with bpf!

+1 thanks for working on it! I was out Monday but will catch up with
patches as well.

> 
> Thanks,
> Stefano
> 
> > 
> > maintainer-netdev.rst says author can do that, so:
> > pw-bot: cr
> > 
> 
>
John Fastabend March 11, 2025, 3:56 p.m. UTC | #12
On 2025-03-07 17:01:11, Michal Luczaj wrote:
> On 3/7/25 15:35, Stefano Garzarella wrote:
> > On Fri, Mar 07, 2025 at 10:58:55AM +0100, Michal Luczaj wrote:
> >>> Signal delivered during connect() may result in a disconnect of an already
> >>> TCP_ESTABLISHED socket. Problem is that such established socket might have
> >>> been placed in a sockmap before the connection was closed. We end up with a
> >>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
> >>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap
> >>> contract. As manifested by WARN_ON_ONCE.
> >>
> >> Note that Luigi is currently working on a (vsock test suit) test[1] for a
> >> related bug, which could be neatly adapted to test this bug as well.
> >> [1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/
> > 
> > Can you work with Luigi to include the changes in that series?
> 
> I was just going to wait for Luigi to finish his work (no rush, really) and
> then try to parametrize it.
> 
> That is unless BPF/sockmap maintainers decide this thread's thing is a
> sockmap thing and should be in tools/testing/selftests/bpf.

I think it makes sense to pull into selftests/bpf then it would get run
from BPF CI so we can ensure BPF changes will keep this working
correctly.

I guess the trick would be to see how long to run racer to get the
warning but also not hang the CI. If you run it for 5 seconds or so
does it trip? Or are you running this for minutes?

If it takes too long to run it could be put into test_sockmap which
has longer running things already. We also have some longer TCP
compliance tests that would be good to start running in public somehow
so might think a bit more how the infra for testing is going in
sockmap side.

Thanks!

> 
> Below is a repro. If I'm not mistaken, it's basically what Luigi wrote,
> just sprinkled with map_update_elem() and recv().
> 
> #include <stdio.h>
> #include <stdint.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <errno.h>
> #include <pthread.h>
> #include <sys/wait.h>
> #include <sys/socket.h>
> #include <sys/syscall.h>
> #include <linux/bpf.h>
> #include <linux/vm_sockets.h>
> 
> static void die(const char *msg)
> {
> 	perror(msg);
> 	exit(-1);
> }
> 
> static int sockmap_create(void)
> {
> 	union bpf_attr attr = {
> 		.map_type = BPF_MAP_TYPE_SOCKMAP,
> 		.key_size = sizeof(int),
> 		.value_size = sizeof(int),
> 		.max_entries = 1
> 	};
> 	int map;
> 
> 	map = syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
> 	if (map < 0)
> 		die("map_create");
> 
> 	return map;
> }
> 
> static void map_update_elem(int fd, int key, int value)
> {
> 	union bpf_attr attr = {
> 		.map_fd = fd,
> 		.key = (uint64_t)&key,
> 		.value = (uint64_t)&value,
> 		.flags = BPF_ANY
> 	};
> 
> 	(void)syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
> }
> 
> static void sighandler(int sig)
> {
> 	/* nop */
> }
> 
> static void *racer(void *c)
> {
> 	int map = sockmap_create();
> 
> 	for (;;) {
> 		map_update_elem(map, 0, *(int *)c);
>  		if (kill(0, SIGUSR1) < 0)
>  			die("kill");
>  	}
> }
> 
> int main(void)
> {
> 	struct sockaddr_vm addr = {
> 		.svm_family = AF_VSOCK,
> 		.svm_cid = VMADDR_CID_LOCAL,
> 		.svm_port = VMADDR_PORT_ANY
> 	};
> 	socklen_t alen = sizeof(addr);
> 	struct sockaddr_vm bad_addr;
> 	pthread_t thread;
> 	int s, c;
> 
> 	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
> 	if (s < 0)
> 		die("socket s");

This would likely be a good test for all protocol types to stress test
the update/connect/close flow. If we can land it in selftests/bpf I
would be happy to make it run for TCP and others.

It might be worth looking over ./tools/testing/selftests/bpf/test_sockmap.c
and see if any tests from there should run for AF_VSOCK as well.

> 
> 	if (bind(s, (struct sockaddr *)&addr, alen))
> 		die("bind");
> 
> 	if (listen(s, -1))
> 		die("listen");
> 
> 	if (getsockname(s, (struct sockaddr *)&addr, &alen))
> 		die("getsockname");
> 
> 	bad_addr = addr;
> 	bad_addr.svm_cid = 0x42424242; /* non-existing */
> 
> 	if (signal(SIGUSR1, sighandler) == SIG_ERR)
> 		die("signal");
> 
> 	if (pthread_create(&thread, 0, racer, &c))
> 		die("pthread_create");
> 
> 	for (;;) {
> 		c = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
> 		if (c < 0)
> 			die("socket c");
> 
> 		if (!connect(c, (struct sockaddr *)&addr, alen) ||
> 		    errno != EINTR)
> 			goto outro;
> 
> 		if (!connect(c, (struct sockaddr *)&bad_addr, alen) ||
> 		    errno != ESOCKTNOSUPPORT)
> 			goto outro;
> 
> 		(void)recv(c, &(char){0}, 1, MSG_DONTWAIT);
> outro:
> 		close(accept(s, NULL, NULL));
> 		close(c);
> 	}
> 
> 	return 0;
> }
> 
>
John Fastabend March 11, 2025, 4:23 p.m. UTC | #13
On 2025-03-07 10:27:50, Michal Luczaj wrote:
> Signal delivered during connect() may result in a disconnect of an already
> TCP_ESTABLISHED socket. Problem is that such established socket might have
> been placed in a sockmap before the connection was closed. We end up with a
> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
> reassign (unconnected) vsock's transport to NULL, breaks the sockmap
> contract. As manifested by WARN_ON_ONCE.
> 
> Ensure the socket does not stay in sockmap.
> 
> WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0
> CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G        W          6.14.0-rc4+
>  sock_recvmsg+0x1b2/0x220
>  __sys_recvfrom+0x190/0x270
>  __x64_sys_recvfrom+0xdc/0x1b0
>  do_syscall_64+0x93/0x1b0
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> Fixes: 634f1a7110b4 ("vsock: support sockmap")
> Signed-off-by: Michal Luczaj <mhal@rbox.co>

Hi Michal,

Unhashing the socket to stop any references from sockmap side if the
sock is being put into CLOSING state makes sense to me. Was there
another v2 somewhere? I didn't see it in my inbox or I missed it.
I think you mentioned more fixes were needed.

Thanks!
John
Michal Luczaj March 14, 2025, 3:22 p.m. UTC | #14
On 3/11/25 14:49, Luigi Leonardi wrote:
> Hi Michal,
> 
> On Fri, Mar 07, 2025 at 05:01:11PM +0100, Michal Luczaj wrote:
>> On 3/7/25 15:35, Stefano Garzarella wrote:
>>> On Fri, Mar 07, 2025 at 10:58:55AM +0100, Michal Luczaj wrote:
>>>>> Signal delivered during connect() may result in a disconnect of an already
>>>>> TCP_ESTABLISHED socket. Problem is that such established socket might have
>>>>> been placed in a sockmap before the connection was closed. We end up with a
>>>>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
>>>>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap
>>>>> contract. As manifested by WARN_ON_ONCE.
>>>>
>>>> Note that Luigi is currently working on a (vsock test suit) test[1] for a
>>>> related bug, which could be neatly adapted to test this bug as well.
>>>> [1]: https://lore.kernel.org/netdev/20250306-test_vsock-v1-0-0320b5accf92@redhat.com/
>>>
>>> Can you work with Luigi to include the changes in that series?
>>
>> I was just going to wait for Luigi to finish his work (no rush, really) and
>> then try to parametrize it.
>>
> 
> Here[1] I pushed the v2 of the series, it addresses Stefano's comments.
> I use b4 to send the patches, so one commit looks "strange". It is used 
> by b4 and it contains the cover letter.
> [1]https://github.com/luigix25/linux/tree/test_vsock_v2
> 
> It would be nice to send both tests together, so whenever your patch is 
> ready, feel free to open me a PR on github or send the series directly 
> in the ML :)

I've noticed you've already sent it to ML and I agree it's better this way.
Perhaps my wording was unclear: by "wait for you to finish" I've meant
"wait for you to get your work merged".

Sorry for confusion,
Michal
Michal Luczaj March 14, 2025, 3:29 p.m. UTC | #15
On 3/11/25 17:23, John Fastabend wrote:
> On 2025-03-07 10:27:50, Michal Luczaj wrote:
>> Signal delivered during connect() may result in a disconnect of an already
>> TCP_ESTABLISHED socket. Problem is that such established socket might have
>> been placed in a sockmap before the connection was closed. We end up with a
>> SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to
>> reassign (unconnected) vsock's transport to NULL, breaks the sockmap
>> contract. As manifested by WARN_ON_ONCE.
>>
>> Ensure the socket does not stay in sockmap.
>>
>> WARNING: CPU: 10 PID: 1310 at net/vmw_vsock/vsock_bpf.c:90 vsock_bpf_recvmsg+0xb4b/0xdf0
>> CPU: 10 UID: 0 PID: 1310 Comm: a.out Tainted: G        W          6.14.0-rc4+
>>  sock_recvmsg+0x1b2/0x220
>>  __sys_recvfrom+0x190/0x270
>>  __x64_sys_recvfrom+0xdc/0x1b0
>>  do_syscall_64+0x93/0x1b0
>>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> Fixes: 634f1a7110b4 ("vsock: support sockmap")
>> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> 
> Hi Michal,
> 
> Unhashing the socket to stop any references from sockmap side if the
> sock is being put into CLOSING state makes sense to me. Was there
> another v2 somewhere? I didn't see it in my inbox or I missed it.
> I think you mentioned more fixes were needed.

Great, thanks for checking. I was worried I might be missing some
subtleties of sock_map_unhash() not calling `sk_psock_stop(psock)` nor
`cancel_delayed_work_sync(&psock->work)`. Especially since user still has
socket descriptor open and can play with such "unhashed" socket.

I've just sent v2: https://lore.kernel.org/netdev/20250314-vsock-trans-signal-race-v2-0-421a41f60f42@rbox.co/

Repro is adapted to sockmap_basic. And to answer your question from
another thread: test triggers warning in a second. Currently timeout is 2s.
I'm not sure how useful it may be for other families, but let me know if
you'd rather have it somehow more generic.

Thanks,
Michal
Luigi Leonardi March 18, 2025, 8:42 a.m. UTC | #16
Hi Michal,

On Fri, Mar 14, 2025 at 04:22:20PM +0100, Michal Luczaj wrote:
>Perhaps my wording was unclear: by "wait for you to finish" I've meant
>"wait for you to get your work merged".
>
>Sorry for confusion,
>Michal
>

Don't worry :) feel free to leave comments or test that patch!

Cheers,
Luigi
diff mbox series

Patch

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 7742a9ae0131310bba197830a241541b2cde6123..e5a6d1d413634f414370595c02bcd77664780d8e 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1581,7 +1581,15 @@  static int vsock_connect(struct socket *sock, struct sockaddr *addr,
 
 		if (signal_pending(current)) {
 			err = sock_intr_errno(timeout);
-			sk->sk_state = sk->sk_state == TCP_ESTABLISHED ? TCP_CLOSING : TCP_CLOSE;
+			if (sk->sk_state == TCP_ESTABLISHED) {
+				/* Might have raced with a sockmap update. */
+				if (sk->sk_prot->unhash)
+					sk->sk_prot->unhash(sk);
+
+				sk->sk_state = TCP_CLOSING;
+			} else {
+				sk->sk_state = TCP_CLOSE;
+			}
 			sock->state = SS_UNCONNECTED;
 			vsock_transport_cancel_pkt(vsk);
 			vsock_remove_connected(vsk);
diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
index 07b96d56f3a577af71021b1b8132743554996c4f..c68fdaf09046b68254dac3ea70ffbe73dfa45cef 100644
--- a/net/vmw_vsock/vsock_bpf.c
+++ b/net/vmw_vsock/vsock_bpf.c
@@ -127,6 +127,7 @@  static void vsock_bpf_rebuild_protos(struct proto *prot, const struct proto *bas
 {
 	*prot        = *base;
 	prot->close  = sock_map_close;
+	prot->unhash = sock_map_unhash;
 	prot->recvmsg = vsock_bpf_recvmsg;
 	prot->sock_is_readable = sk_msg_is_readable;
 }