[bpf] xsk: fix xsk_diag use-after-free error during socket cleanup

Message ID 20230830151704.14855-1-magnus.karlsson@gmail.com (mailing list archive)
State Superseded
Delegated to: BPF
Series [bpf] xsk: fix xsk_diag use-after-free error during socket cleanup

Checks

Context                         Check    Description
bpf/vmtest-bpf-PR               fail     PR summary
bpf/vmtest-bpf-VM_Test-2        success  Logs for build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-0        success  Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain_full }}
bpf/vmtest-bpf-VM_Test-1        success  Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-3        fail     Logs for build for s390x with gcc
bpf/vmtest-bpf-VM_Test-5        success  Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-7        success  Logs for veristat
bpf/vmtest-bpf-VM_Test-6        success  Logs for set-matrix
bpf/vmtest-bpf-VM_Test-4        success  Logs for build for x86_64 with gcc
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for bpf
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 1330 this patch: 1330
netdev/cc_maintainers           warning  6 maintainers not CCed: kuba@kernel.org hawk@kernel.org john.fastabend@gmail.com davem@davemloft.net pabeni@redhat.com edumazet@google.com
netdev/build_clang              success  Errors and warnings before: 1353 this patch: 1353
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1353 this patch: 1353
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Magnus Karlsson Aug. 30, 2023, 3:17 p.m. UTC
From: Magnus Karlsson <magnus.karlsson@intel.com>

Fix a use-after-free error that is possible if the xsk_diag interface
is used at the same time as the socket is being closed. In the early
days of AF_XDP, the way we tested that a socket was not bound or being
closed was to simply check if the netdevice pointer in the xsk socket
structure was NULL. Later, a better system was introduced by having an
explicit state variable in the xsk socket struct. For example, the
state of a socket that is going down is XSK_UNBOUND.

The commit in the Fixes tag below deleted the old way of signalling
that a socket is going down, which was setting dev to NULL. This was
done in the belief that all code using the old way had been removed.
That was unfortunately not true: the xsk diagnostics code was still
using the old way and thus does not work as intended when a socket is
going down. Fix this by introducing a test against the state variable.
If the socket is going down, simply abort the diagnostics netlink
operation.

Fixes: 18b1ab7aa76b ("xsk: Fix race at socket teardown")
Reported-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk_diag.c | 3 +++
 1 file changed, 3 insertions(+)


base-commit: 35d2b7ffffc1d9b3dc6c761010aa3338da49165b
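
To make the reasoning in the commit message concrete, here is a minimal
userspace sketch of the check the patch adds. It is not the kernel code:
the model_* names, the MODEL_READY and MODEL_BOUND states, and the
-1 return value are assumptions made for illustration. Only the idea of
testing the state before touching the device pointer comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; names only loosely mirror the patch. */
enum model_state { MODEL_READY, MODEL_BOUND, MODEL_UNBOUND };

struct model_dev {
	int ifindex;
};

struct model_sock {
	enum model_state state;
	struct model_dev *dev;	/* may dangle once state is MODEL_UNBOUND */
};

/*
 * Teardown as described in the commit message: the state changes,
 * but dev is no longer cleared to NULL.
 */
static void model_unbind(struct model_sock *xs)
{
	xs->state = MODEL_UNBOUND;
	/* xs->dev is intentionally left untouched */
}

/* Returns 0 and stores the ifindex, or -1 if the dump is aborted. */
static int model_diag_fill(const struct model_sock *xs, int *ifindex)
{
	assert(xs != NULL && ifindex != NULL);

	/* The added test: bail out before anything looks at xs->dev. */
	if (xs->state == MODEL_UNBOUND)
		return -1;	/* stands in for "goto out_nlmsg_trim" */

	*ifindex = xs->dev ? xs->dev->ifindex : 0;
	return 0;
}
```

In this model, a dump of a bound socket reads the ifindex through dev,
while a dump after model_unbind() returns early and never dereferences
the pointer, even though the pointer itself is still non-NULL.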

Comments

Maciej Fijalkowski Aug. 30, 2023, 5:02 p.m. UTC | #1
On Wed, Aug 30, 2023 at 05:17:03PM +0200, Magnus Karlsson wrote:
> From: Magnus Karlsson <magnus.karlsson@intel.com>
> 
> Fix a use-after-free error that is possible if the xsk_diag interface
> is used at the same time as the socket is being closed. In the early

I thought our understanding was: the socket is alive and we use the diag
interface against it, but the netdev that the socket is bound to is being
torn down.

Since xs->dev was freed but not set to NULL, xsk_diag_put_info() uses this
pointer to retrieve the ifindex.

> days of AF_XDP, the way we tested that a socket was not bound or being
> closed was to simply check if the netdevice pointer in the xsk socket
> structure was NULL. Later, a better system was introduced by having an
> explicit state variable in the xsk socket struct. For example, the
> state of a socket that is going down is XSK_UNBOUND.
> 
> The commit in the Fixes tag below deleted the old way of signalling
> that a socket is going down, setting dev to NULL. This in the belief
> that all code using the old way had been exterminated. That was
> unfortunately not true as the xsk diagnostics code was still using the
> old way and thus does not work as intended when a socket is going
> down. Fix this by introducing a test against the state variable. If

Again, I believe it was not the socket going down but rather the netdev?

> the socket is going down, simply abort the diagnostic's netlink
> operation.
> 
> Fixes: 18b1ab7aa76b ("xsk: Fix race at socket teardown")
> Reported-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com

Nit: I see syzbot wanted you to include:
Reported-and-tested-by: syzbot+822d13...@syzkaller.appspotmail.com

> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>  net/xdp/xsk_diag.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
> index c014217f5fa7..da3100bfa1c5 100644
> --- a/net/xdp/xsk_diag.c
> +++ b/net/xdp/xsk_diag.c
> @@ -111,6 +111,9 @@ static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb,
>  	sock_diag_save_cookie(sk, msg->xdiag_cookie);
>  
>  	mutex_lock(&xs->mutex);
> +	if (xs->state == XSK_UNBOUND)
> +		goto out_nlmsg_trim;

With the above I feel like we can get rid of xs->dev test in
xsk_diag_put_info(), no?

> +
>  	if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb))
>  		goto out_nlmsg_trim;
>  
> 
> base-commit: 35d2b7ffffc1d9b3dc6c761010aa3338da49165b
> -- 
> 2.42.0
>
Magnus Karlsson Aug. 30, 2023, 6:58 p.m. UTC | #2
On Wed, 30 Aug 2023 at 19:03, Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Wed, Aug 30, 2023 at 05:17:03PM +0200, Magnus Karlsson wrote:
> > From: Magnus Karlsson <magnus.karlsson@intel.com>
> >
> > Fix a use-after-free error that is possible if the xsk_diag interface
> > is used at the same time as the socket is being closed. In the early
>
> I thought our understanding is: socket is alive, we use diag interface
> against it but netdev that we bound socket to is being torn down.

If the socket was not going down at the same time, we would still have
a reference to the netdevice and it would not disappear. So the socket
needs to be going down for this to happen.

> since xs->dev was freed but not NULLed, xsk_diag_put_info() uses this ptr
> to retrieve ifindex.
>
> > days of AF_XDP, the way we tested that a socket was not bound or being
> > closed was to simply check if the netdevice pointer in the xsk socket
> > structure was NULL. Later, a better system was introduced by having an
> > explicit state variable in the xsk socket struct. For example, the
> > state of a socket that is going down is XSK_UNBOUND.
> >
> > The commit in the Fixes tag below deleted the old way of signalling
> > that a socket is going down, setting dev to NULL. This in the belief
> > that all code using the old way had been exterminated. That was
> > unfortunately not true as the xsk diagnostics code was still using the
> > old way and thus does not work as intended when a socket is going
> > down. Fix this by introducing a test against the state variable. If
>
> Again, I believe it was not the socket going down but rather the netdev?
>
> > the socket is going down, simply abort the diagnostic's netlink
> > operation.
> >
> > Fixes: 18b1ab7aa76b ("xsk: Fix race at socket teardown")
> > Reported-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com
>
> Nit: I see syzbot wanted you to include:
> Reported-and-tested-by: syzbot+822d13...@syzkaller.appspotmail.com
>
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > ---
> >  net/xdp/xsk_diag.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
> > index c014217f5fa7..da3100bfa1c5 100644
> > --- a/net/xdp/xsk_diag.c
> > +++ b/net/xdp/xsk_diag.c
> > @@ -111,6 +111,9 @@ static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb,
> >       sock_diag_save_cookie(sk, msg->xdiag_cookie);
> >
> >       mutex_lock(&xs->mutex);
> > +     if (xs->state == XSK_UNBOUND)
> > +             goto out_nlmsg_trim;
>
> With the above I feel like we can get rid of xs->dev test in
> xsk_diag_put_info(), no?

It has to stay since the socket does not get a reference to the device
until it is bound. It is fine to use the xsk_diag interface on an
unbound socket to query its state.

> > +
> >       if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb))
> >               goto out_nlmsg_trim;
> >
> >
> > base-commit: 35d2b7ffffc1d9b3dc6c761010aa3338da49165b
> > --
> > 2.42.0
> >
Maciej Fijalkowski Aug. 30, 2023, 9:57 p.m. UTC | #3
On Wed, Aug 30, 2023 at 08:58:09PM +0200, Magnus Karlsson wrote:
> On Wed, 30 Aug 2023 at 19:03, Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
> >
> > On Wed, Aug 30, 2023 at 05:17:03PM +0200, Magnus Karlsson wrote:
> > > From: Magnus Karlsson <magnus.karlsson@intel.com>
> > >
> > > Fix a use-after-free error that is possible if the xsk_diag interface
> > > is used at the same time as the socket is being closed. In the early
> >
> > I thought our understanding is: socket is alive, we use diag interface
> > against it but netdev that we bound socket to is being torn down.
> 
> If the socket was not going down at the same time, we would still have
> a reference to the netdevice and it would not disappear. So the socket
> needs to be going down for this to happen.

No, I am able to trigger this now on my local system with KASAN turned on
via:

window 0:
sudo ./xdpsock -i enp24s0f0np0 -r -z -q 17

window 1:
watch -n 0.1 "ss --xdp -e"

window 2:
sudo rmmod ice

We hold the device via dev_get_by_index() in xsk_bind(), but dev_put() is
called from xsk_unbind_dev(), which can happen either from xsk_release() or
from xsk_notifier(); our case is the latter.

I don't currently see how ss gets the ifname, but after rmmod'ing ice I am
getting something bogus there:

Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
0      0               if18:q17              *     ino:18691 sk:2001
        rx(entries:2048)
        umem(id:0,size:16777216,num_pages:4096,chunk_size:4096,headroom:0,ifindex:0,qid:17,zc:1,refs:1)
        fr(entries:4096)
        cr(entries:2048)
        stats(rx dropped:0,rx invalid:0,rx queue full:0,rx fill ring empty:0,tx invalid:0,tx ring empty:0)

'if18' instead of 'enp24s0f0np0'. With your patch we bail out early so we
wouldn't have that problem AFAICT.

> >
> > > days of AF_XDP, the way we tested that a socket was not bound or being
> > > closed was to simply check if the netdevice pointer in the xsk socket
> > > structure was NULL. Later, a better system was introduced by having an
> > > explicit state variable in the xsk socket struct. For example, the
> > > state of a socket that is going down is XSK_UNBOUND.
> > >
> > > The commit in the Fixes tag below deleted the old way of signalling
> > > that a socket is going down, setting dev to NULL. This in the belief
> > > that all code using the old way had been exterminated. That was
> > > unfortunately not true as the xsk diagnostics code was still using the
> > > old way and thus does not work as intended when a socket is going
> > > down. Fix this by introducing a test against the state variable. If
> >
> > Again, I believe it was not the socket going down but rather the netdev?
> >
> > > the socket is going down, simply abort the diagnostic's netlink
> > > operation.
> > >
> > > Fixes: 18b1ab7aa76b ("xsk: Fix race at socket teardown")
> > > Reported-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com
> >
> > Nit: I see syzbot wanted you to include:
> > Reported-and-tested-by: syzbot+822d13...@syzkaller.appspotmail.com
> >
> > > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > > ---
> > >  net/xdp/xsk_diag.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
> > > index c014217f5fa7..da3100bfa1c5 100644
> > > --- a/net/xdp/xsk_diag.c
> > > +++ b/net/xdp/xsk_diag.c
> > > @@ -111,6 +111,9 @@ static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb,
> > >       sock_diag_save_cookie(sk, msg->xdiag_cookie);
> > >
> > >       mutex_lock(&xs->mutex);
> > > +     if (xs->state == XSK_UNBOUND)
> > > +             goto out_nlmsg_trim;
> >
> > With the above I feel like we can get rid of xs->dev test in
> > xsk_diag_put_info(), no?
> 
> It has to stay since the socket does not get a reference to the device
> until it is bound. It is fine to use the xsk_diag interface on an
> unbound socket to query its state.

Yes good point here.

> 
> > > +
> > >       if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb))
> > >               goto out_nlmsg_trim;
> > >
> > >
> > > base-commit: 35d2b7ffffc1d9b3dc6c761010aa3338da49165b
> > > --
> > > 2.42.0
> > >
Magnus Karlsson Aug. 31, 2023, 5:15 a.m. UTC | #4
On Wed, 30 Aug 2023 at 17:17, Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
>
> From: Magnus Karlsson <magnus.karlsson@intel.com>
>
> Fix a use-after-free error that is possible if the xsk_diag interface
> is used at the same time as the socket is being closed. In the early
> days of AF_XDP, the way we tested that a socket was not bound or being
> closed was to simply check if the netdevice pointer in the xsk socket
> structure was NULL. Later, a better system was introduced by having an
> explicit state variable in the xsk socket struct. For example, the
> state of a socket that is going down is XSK_UNBOUND.
>
> The commit in the Fixes tag below deleted the old way of signalling
> that a socket is going down, setting dev to NULL. This in the belief
> that all code using the old way had been exterminated. That was
> unfortunately not true as the xsk diagnostics code was still using the
> old way and thus does not work as intended when a socket is going
> down. Fix this by introducing a test against the state variable. If
> the socket is going down, simply abort the diagnostic's netlink
> operation.
>
> Fixes: 18b1ab7aa76b ("xsk: Fix race at socket teardown")
> Reported-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>  net/xdp/xsk_diag.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
> index c014217f5fa7..da3100bfa1c5 100644
> --- a/net/xdp/xsk_diag.c
> +++ b/net/xdp/xsk_diag.c
> @@ -111,6 +111,9 @@ static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb,
>         sock_diag_save_cookie(sk, msg->xdiag_cookie);
>
>         mutex_lock(&xs->mutex);
> +       if (xs->state == XSK_UNBOUND)

Sorry, but I have to spin a v2. There should be a READ_ONCE() here of the state.

> +               goto out_nlmsg_trim;
> +
>         if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb))
>                 goto out_nlmsg_trim;
>
>
> base-commit: 35d2b7ffffc1d9b3dc6c761010aa3338da49165b
> --
> 2.42.0
>
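
For readers unfamiliar with READ_ONCE(), the following userspace sketch
shows the shape of the check with such an annotation. The SKETCH_* macros
are simplified stand-ins built on a volatile cast (they rely on the
GCC/Clang __typeof__ extension) and are not the kernel's definitions. The
sketch_* names are hypothetical, and that v2 will look like this is an
assumption, since the thread only says the read of the state should use
READ_ONCE().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins: force a single, untorn-looking access via volatile. */
#define SKETCH_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, val) \
	(*(volatile __typeof__(x) *)&(x) = (val))

enum sketch_state { SKETCH_READY, SKETCH_BOUND, SKETCH_UNBOUND };

struct sketch_sock {
	enum sketch_state state;
};

/* Returns nonzero if a diag dump should be aborted for this socket. */
static int sketch_should_abort(struct sketch_sock *xs)
{
	assert(xs != NULL);
	/* Read the state exactly once, then compare the loaded value. */
	return SKETCH_READ_ONCE(xs->state) == SKETCH_UNBOUND;
}
```

The annotation mainly documents that the field can change concurrently
and stops the compiler from re-reading or caching it; it is not a
substitute for the locking already present around the check.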
Magnus Karlsson Aug. 31, 2023, 5:21 a.m. UTC | #5
On Wed, 30 Aug 2023 at 23:57, Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Wed, Aug 30, 2023 at 08:58:09PM +0200, Magnus Karlsson wrote:
> > On Wed, 30 Aug 2023 at 19:03, Maciej Fijalkowski
> > <maciej.fijalkowski@intel.com> wrote:
> > >
> > > On Wed, Aug 30, 2023 at 05:17:03PM +0200, Magnus Karlsson wrote:
> > > > From: Magnus Karlsson <magnus.karlsson@intel.com>
> > > >
> > > > Fix a use-after-free error that is possible if the xsk_diag interface
> > > > is used at the same time as the socket is being closed. In the early
> > >
> > > I thought our understanding is: socket is alive, we use diag interface
> > > against it but netdev that we bound socket to is being torn down.
> >
> > If the socket was not going down at the same time, we would still have
> > a reference to the netdevice and it would not disappear. So the socket
> > needs to be going down for this to happen.
>
> No, I am able to trigger this now on my local system with KASAN turned on
> via:
>
> window 0:
> sudo ./xdpsock -i enp24s0f0np0 -r -z -q 17
>
> window 1:
> watch -n 0.1 "ss --xdp -e"
>
> window 2:
> sudo rmmod ice
>
> we hold the device via dev_get_by_index() in xsk_bind() but dev_put() is
> called from xsk_unbind_dev() which can happen either from xsk_release() or
> xsk_notifier(), our case refers to the latter.

Nice reproducer! My definition of "going down" is probably not clear.
In both the cases above, the state is set to XSK_UNBOUND and the
reference to the device is dropped, i.e. the socket is on its path to
oblivion. In any case, I will send a v2 to fix the missing READ_ONCE()
and I will try to make this "going down" clearer in the commit
message.

> I don't see currently how ss gets the ifname but after rmmoding ice I am
> getting something bogus over there:
>
> Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
> 0      0               if18:q17              *     ino:18691 sk:2001
>         rx(entries:2048)
>         umem(id:0,size:16777216,num_pages:4096,chunk_size:4096,headroom:0,ifindex:0,qid:17,zc:1,refs:1)
>         fr(entries:4096)
>         cr(entries:2048)
>         stats(rx dropped:0,rx invalid:0,rx queue full:0,rx fill ring empty:0,tx invalid:0,tx ring empty:0)
>
> 'if18' instead of 'enp24s0f0np0'. With your patch we bail out early so we
> wouldn't have that problem AFAICT.

"if18"? Interesting. Good thing we get rid of this with the patch.

> > >
> > > > days of AF_XDP, the way we tested that a socket was not bound or being
> > > > closed was to simply check if the netdevice pointer in the xsk socket
> > > > structure was NULL. Later, a better system was introduced by having an
> > > > explicit state variable in the xsk socket struct. For example, the
> > > > state of a socket that is going down is XSK_UNBOUND.
> > > >
> > > > The commit in the Fixes tag below deleted the old way of signalling
> > > > that a socket is going down, setting dev to NULL. This in the belief
> > > > that all code using the old way had been exterminated. That was
> > > > unfortunately not true as the xsk diagnostics code was still using the
> > > > old way and thus does not work as intended when a socket is going
> > > > down. Fix this by introducing a test against the state variable. If
> > >
> > > Again, I believe it was not the socket going down but rather the netdev?
> > >
> > > > the socket is going down, simply abort the diagnostic's netlink
> > > > operation.
> > > >
> > > > Fixes: 18b1ab7aa76b ("xsk: Fix race at socket teardown")
> > > > Reported-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com
> > >
> > > Nit: I see syzbot wanted you to include:
> > > Reported-and-tested-by: syzbot+822d13...@syzkaller.appspotmail.com
> > >
> > > > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > > > ---
> > > >  net/xdp/xsk_diag.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
> > > > index c014217f5fa7..da3100bfa1c5 100644
> > > > --- a/net/xdp/xsk_diag.c
> > > > +++ b/net/xdp/xsk_diag.c
> > > > @@ -111,6 +111,9 @@ static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb,
> > > >       sock_diag_save_cookie(sk, msg->xdiag_cookie);
> > > >
> > > >       mutex_lock(&xs->mutex);
> > > > +     if (xs->state == XSK_UNBOUND)
> > > > +             goto out_nlmsg_trim;
> > >
> > > With the above I feel like we can get rid of xs->dev test in
> > > xsk_diag_put_info(), no?
> >
> > It has to stay since the socket does not get a reference to the device
> > until it is bound. It is fine to use the xsk_diag interface on an
> > unbound socket to query its state.
>
> Yes good point here.
>
> >
> > > > +
> > > >       if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb))
> > > >               goto out_nlmsg_trim;
> > > >
> > > >
> > > > base-commit: 35d2b7ffffc1d9b3dc6c761010aa3338da49165b
> > > > --
> > > > 2.42.0
> > > >
Patch

diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index c014217f5fa7..da3100bfa1c5 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -111,6 +111,9 @@  static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb,
 	sock_diag_save_cookie(sk, msg->xdiag_cookie);
 
 	mutex_lock(&xs->mutex);
+	if (xs->state == XSK_UNBOUND)
+		goto out_nlmsg_trim;
+
 	if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb))
 		goto out_nlmsg_trim;