[bpf,v2] xsk: fix xsk_diag use-after-free error during socket cleanup

Message ID: 20230831100119.17408-1-magnus.karlsson@gmail.com
State: Accepted
Commit: 3e019d8a05a38abb5c85d4f1e85fda964610aa14
Delegated to: BPF
Series: [bpf,v2] xsk: fix xsk_diag use-after-free error during socket cleanup

Checks

Context                         Check    Description
bpf/vmtest-bpf-PR               success  PR summary
bpf/vmtest-bpf-VM_Test-0        success  Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-5        success  Logs for set-matrix
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for bpf
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 1330 this patch: 1330
netdev/cc_maintainers           warning  6 maintainers not CCed: kuba@kernel.org hawk@kernel.org john.fastabend@gmail.com davem@davemloft.net pabeni@redhat.com edumazet@google.com
netdev/build_clang              success  Errors and warnings before: 1353 this patch: 1353
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1353 this patch: 1353
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
bpf/vmtest-bpf-VM_Test-1        success  Logs for build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-3        success  Logs for build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-4        success  Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-2        success  Logs for build for s390x with gcc
bpf/vmtest-bpf-VM_Test-9        success  Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-8        success  Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-6        success  Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-12       fail     Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-13       fail     Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-14       fail     Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-17       fail     Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-16       fail     Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-18       success  Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-19       success  Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-21       success  Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-20       success  Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-22       success  Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24       success  Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-23       success  Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-26       success  Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-27       success  Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-28       success  Logs for veristat
bpf/vmtest-bpf-VM_Test-10       fail     Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-25       success  Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-15       fail     Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-11       fail     Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-7        success  Logs for test_maps on s390x with gcc

Commit Message

Magnus Karlsson Aug. 31, 2023, 10:01 a.m. UTC
From: Magnus Karlsson <magnus.karlsson@intel.com>

Fix a use-after-free error that is possible if the xsk_diag interface
is used after the socket has been unbound from the device. This can
happen either due to the socket being closed or the device
disappearing. In the early days of AF_XDP, the way we tested that a
socket was not bound to a device was to simply check if the netdevice
pointer in the xsk socket structure was NULL. Later, a better system
was introduced by having an explicit state variable in the xsk socket
struct. For example, the state of a socket that is on the way to being
closed and has been unbound from the device is XSK_UNBOUND.

The commit in the Fixes tag below removed the old way of signalling
that a socket is unbound, namely setting dev to NULL, in the belief
that all code using the old way had been converted. That was
unfortunately not true: the xsk diagnostics code was still using the
old way and thus does not work as intended when a socket is going
down. Fix this by introducing a test against the state variable. If
the socket is in the state XSK_UNBOUND, simply abort the diagnostics
netlink operation.

Fixes: 18b1ab7aa76b ("xsk: Fix race at socket teardown")
Reported-and-tested-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
v1 -> v2:
  * Added READ_ONCE for the state variable [Magnus]
  * Improved commit message [Maciej]

 net/xdp/xsk_diag.c | 3 +++
 1 file changed, 3 insertions(+)


base-commit: 7d35eb1a184a3f0759ad9e9cde4669b5c55b2063
--
2.42.0

Comments

Fijalkowski, Maciej Aug. 31, 2023, 10:39 a.m. UTC | #1
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

Fijalkowski, Maciej Aug. 31, 2023, 10:55 a.m. UTC | #2
On Thu, Aug 31, 2023 at 12:39:21PM +0200, Maciej Fijalkowski wrote:
> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

FWIW also tested that issue is no longer triggered on my local system:
Tested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>


patchwork-bot+netdevbpf@kernel.org Aug. 31, 2023, 11:30 a.m. UTC | #3
Hello:

This patch was applied to bpf/bpf.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Thu, 31 Aug 2023 12:01:17 +0200 you wrote:
> From: Magnus Karlsson <magnus.karlsson@intel.com>
> 
> Fix a use-after-free error that is possible if the xsk_diag interface
> is used after the socket has been unbound from the device. This can
> happen either due to the socket being closed or the device
> disappearing. In the early days of AF_XDP, the way we tested that a
> socket was not bound to a device was to simply check if the netdevice
> pointer in the xsk socket structure was NULL. Later, a better system
> was introduced by having an explicit state variable in the xsk socket
> struct. For example, the state of a socket that is on the way to being
> closed and has been unbound from the device is XSK_UNBOUND.
> 
> [...]

Here is the summary with links:
  - [bpf,v2] xsk: fix xsk_diag use-after-free error during socket cleanup
    https://git.kernel.org/bpf/bpf/c/3e019d8a05a3

You are awesome, thank you!

Patch

diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index c014217f5fa7..22b36c8143cf 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -111,6 +111,9 @@  static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb,
 	sock_diag_save_cookie(sk, msg->xdiag_cookie);

 	mutex_lock(&xs->mutex);
+	if (READ_ONCE(xs->state) == XSK_UNBOUND)
+		goto out_nlmsg_trim;
+
 	if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb))
 		goto out_nlmsg_trim;
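
The guard added in the hunk above can be illustrated with a small user-space sketch. This is only a model under stated assumptions, not kernel code: `struct xsk_model`, `struct netdev_model`, `bind_model()`, `unbind_model()`, `diag_fill_model()` and `READ_ONCE_MODEL()` are hypothetical names invented for this example, a pthread mutex stands in for `xs->mutex`, and the model frees the device immediately on unbind, which simplifies the real device lifetime. It shows why the read path has to test the state variable rather than the `dev` pointer once unbinding no longer resets that pointer to NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
enum xsk_state_model { XSK_MODEL_BOUND, XSK_MODEL_UNBOUND };

struct netdev_model {
	int ifindex;
};

struct xsk_model {
	pthread_mutex_t mutex;
	enum xsk_state_model state;
	struct netdev_model *dev; /* deliberately NOT reset to NULL on unbind */
};

/* Rough approximation of READ_ONCE(): force a single volatile load.
 * __typeof__ is a GCC/Clang extension. */
#define READ_ONCE_MODEL(x) (*(const volatile __typeof__(x) *)&(x))

/* Bind the model socket to a freshly allocated device. Returns 0 or -1. */
static int bind_model(struct xsk_model *xs, int ifindex)
{
	struct netdev_model *dev = malloc(sizeof(*dev));

	if (!dev)
		return -1;
	dev->ifindex = ifindex;
	pthread_mutex_init(&xs->mutex, NULL);
	xs->dev = dev;
	xs->state = XSK_MODEL_BOUND;
	return 0;
}

/* Unbind: flip the state and release the device. xs->dev is left dangling,
 * so a NULL test on it can no longer detect the unbound case. */
static void unbind_model(struct xsk_model *xs)
{
	pthread_mutex_lock(&xs->mutex);
	xs->state = XSK_MODEL_UNBOUND;
	free(xs->dev); /* simplification: the model frees the device at once */
	pthread_mutex_unlock(&xs->mutex);
}

/* Diagnostics read path. Returns 0 and stores the ifindex on success,
 * or -1 without touching xs->dev if the socket is already unbound. */
static int diag_fill_model(struct xsk_model *xs, int *ifindex_out)
{
	int err = -1;

	assert(xs && ifindex_out);
	pthread_mutex_lock(&xs->mutex);
	if (READ_ONCE_MODEL(xs->state) == XSK_MODEL_UNBOUND)
		goto out_unlock; /* bail out before dereferencing xs->dev */

	*ifindex_out = xs->dev->ifindex;
	err = 0;
out_unlock:
	pthread_mutex_unlock(&xs->mutex);
	return err;
}
```

In this model, calling `diag_fill_model()` after `bind_model()` succeeds, while calling it after `unbind_model()` returns -1 and leaves the output untouched. Without the state check, the second call would read through the dangling `dev` pointer, which is the class of bug the patch addresses.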