Message ID | 20240516103049.1132040-1-mhal@rbox.co (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS | expand |
From: Michal Luczaj <mhal@rbox.co> Date: Thu, 16 May 2024 12:20:48 +0200 > GC attempts to explicitly drop oob_skb before purging the hit list. > > The problem is with embryos: kfree_skb(u->oob_skb) is never called on an > embryo socket, as those sockets are not directly stacked by the SCC walk. > > The python script below [0] sends a listener's fd to its embryo as OOB > data. While GC does collect the embryo's queue, it fails to drop the OOB > skb's refcount. The skb which was in embryo's receive queue stays as > unix_sk(sk)->oob_skb and keeps the listener's refcount [1]. > > Tell GC to dispose embryo's oob_skb. > > [0]: > from array import array > from socket import * > > addr = '\x00unix-oob' > lis = socket(AF_UNIX, SOCK_STREAM) > lis.bind(addr) > lis.listen(1) > > s = socket(AF_UNIX, SOCK_STREAM) > s.connect(addr) > scm = (SOL_SOCKET, SCM_RIGHTS, array('i', [lis.fileno()])) > s.sendmsg([b'x'], [scm], MSG_OOB) > lis.close() > > [1] > $ grep unix-oob /proc/net/unix > $ ./unix-oob.py > $ grep unix-oob /proc/net/unix > 0000000000000000: 00000002 00000000 00000000 0001 02 0 @unix-oob > 0000000000000000: 00000002 00000000 00010000 0001 01 6072 @unix-oob > > Fixes: 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") > Signed-off-by: Michal Luczaj <mhal@rbox.co> > --- > net/unix/garbage.c | 25 ++++++++++++++++--------- > 1 file changed, 16 insertions(+), 9 deletions(-) > > diff --git a/net/unix/garbage.c b/net/unix/garbage.c > index 1f8b8cdfcdc8..8e9431955ab7 100644 > --- a/net/unix/garbage.c > +++ b/net/unix/garbage.c > @@ -342,6 +342,18 @@ enum unix_recv_queue_lock_class { > U_RECVQ_LOCK_EMBRYO, > }; > > +static void unix_collect_queue(struct unix_sock *u, struct sk_buff_head *hitlist) > +{ > + skb_queue_splice_init(&u->sk.sk_receive_queue, hitlist); > + > +#if IS_ENABLED(CONFIG_AF_UNIX_OOB) > + if (u->oob_skb) { > + WARN_ON_ONCE(skb_unref(u->oob_skb)); > + u->oob_skb = NULL; > + } > +#endif > +} > + > static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *hitlist) > { > struct unix_vertex *vertex; > @@ -365,18 +377,11 @@ static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *hitlist > > /* listener -> embryo order, the inversion never happens. */ > spin_lock_nested(&embryo_queue->lock, U_RECVQ_LOCK_EMBRYO); > - skb_queue_splice_init(embryo_queue, hitlist); > + unix_collect_queue(unix_sk(skb->sk), hitlist); > spin_unlock(&embryo_queue->lock); > } > } else { > - skb_queue_splice_init(queue, hitlist); > - > -#if IS_ENABLED(CONFIG_AF_UNIX_OOB) > - if (u->oob_skb) { > - kfree_skb(u->oob_skb); > - u->oob_skb = NULL; > - } > -#endif > + unix_collect_queue(u, hitlist); > } > > spin_unlock(&queue->lock); > @@ -583,6 +588,8 @@ static void __unix_gc(struct work_struct *work) > skb_queue_walk(&hitlist, skb) { > if (UNIXCB(skb).fp) > UNIXCB(skb).fp->dead = true; > + > + WARN_ON_ONCE(refcount_read(&skb->users) != 1); Given we will refactor OOB with no additional refcount, this will not make sense. Rather, I'd add a test case in a selftest to catch the future regression. And I noticed that I actually tried to catch this in tools/testing/selftests/net/af_unix/scm_rights.c , and what is missing is... :S ---8<--- >From f6f47b3cdd22e7c4ed48bf1de089babd09c606e0 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Thu, 16 May 2024 12:10:45 +0000 Subject: [PATCH] selftest: af_unix: Make SCM_RIGHTS into OOB data. scm_rights.c covers various test cases for inflight file descriptors and garbage collector for AF_UNIX sockets. Currently, SCM_RIGHTS messages are sent with 3-bytes string, and it's not good for MSG_OOB cases, as SCM_RIGTS cmsg goes with the first 2-bytes, which is non-OOB data. Let's send SCM_RIGHTS messages with 1-byte character to pack SCM_RIGHTS into OOB data. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> diff --git a/tools/testing/selftests/net/af_unix/scm_rights.c b/tools/testing/selftests/net/af_unix/scm_rights.c index bab606c9f1eb..2bfed46e0b19 100644 --- a/tools/testing/selftests/net/af_unix/scm_rights.c +++ b/tools/testing/selftests/net/af_unix/scm_rights.c @@ -197,8 +197,8 @@ void __send_fd(struct __test_metadata *_metadata, const FIXTURE_VARIANT(scm_rights) *variant, int inflight, int receiver) { -#define MSG "nop" -#define MSGLEN 3 +#define MSG "x" +#define MSGLEN 1 struct { struct cmsghdr cmsghdr; int fd[2]; ---8<--- With this, we can catch the case properly, # RUN scm_rights.stream_listener_oob.self_ref ... # scm_rights.c:119:self_ref:Expected 0 (0) == ret (6) # self_ref: Test terminated by assertion # FAIL scm_rights.stream_listener_oob.self_ref not ok 13 scm_rights.stream_listener_oob.self_ref # RUN scm_rights.stream_listener_oob.triangle ... # scm_rights.c:119:triangle:Expected 0 (0) == ret (18) # triangle: Test terminated by assertion # FAIL scm_rights.stream_listener_oob.triangle not ok 14 scm_rights.stream_listener_oob.triangle # RUN scm_rights.stream_listener_oob.cross_edge ... # scm_rights.c:119:cross_edge:Expected 0 (0) == ret (24) # cross_edge: Test terminated by assertion # FAIL scm_rights.stream_listener_oob.cross_edge not ok 15 scm_rights.stream_listener_oob.cross_edge # FAILED: 12 / 15 tests passed. # Totals: pass:12 fail:3 xfail:0 xpass:0 skip:0 error:0 And with your patch, all good ! # RUN scm_rights.stream_listener_oob.self_ref ... # OK scm_rights.stream_listener_oob.self_ref ok 13 scm_rights.stream_listener_oob.self_ref # RUN scm_rights.stream_listener_oob.triangle ... # OK scm_rights.stream_listener_oob.triangle ok 14 scm_rights.stream_listener_oob.triangle # RUN scm_rights.stream_listener_oob.cross_edge ... # OK scm_rights.stream_listener_oob.cross_edge ok 15 scm_rights.stream_listener_oob.cross_edge # PASSED: 15 / 15 tests passed. # Totals: pass:15 fail:0 xfail:0 xpass:0 skip:0 error:0 Could you remove the WARN_ON_ONCE() and repost with my patch above ? Thanks!
On 5/16/24 14:31, Kuniyuki Iwashima wrote: > From: Michal Luczaj <mhal@rbox.co> > Date: Thu, 16 May 2024 12:20:48 +0200 >> ... >> @@ -583,6 +588,8 @@ static void __unix_gc(struct work_struct *work) >> skb_queue_walk(&hitlist, skb) { >> if (UNIXCB(skb).fp) >> UNIXCB(skb).fp->dead = true; >> + >> + WARN_ON_ONCE(refcount_read(&skb->users) != 1); > > Given we will refactor OOB with no additional refcount, this will not > make sense. Rather, I'd add a test case in a selftest to catch the > future regression. Sure, I get it. > And I noticed that I actually tried to catch this in > > tools/testing/selftests/net/af_unix/scm_rights.c > > , and what is missing is... :S > ... > Could you remove the WARN_ON_ONCE() and repost with my patch > above ? Done: https://lore.kernel.org/all/20240516145457.1206847-1-mhal@rbox.co/ thanks, Michal
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 1f8b8cdfcdc8..8e9431955ab7 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -342,6 +342,18 @@ enum unix_recv_queue_lock_class { U_RECVQ_LOCK_EMBRYO, }; +static void unix_collect_queue(struct unix_sock *u, struct sk_buff_head *hitlist) +{ + skb_queue_splice_init(&u->sk.sk_receive_queue, hitlist); + +#if IS_ENABLED(CONFIG_AF_UNIX_OOB) + if (u->oob_skb) { + WARN_ON_ONCE(skb_unref(u->oob_skb)); + u->oob_skb = NULL; + } +#endif +} + static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *hitlist) { struct unix_vertex *vertex; @@ -365,18 +377,11 @@ static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *hitlist /* listener -> embryo order, the inversion never happens. */ spin_lock_nested(&embryo_queue->lock, U_RECVQ_LOCK_EMBRYO); - skb_queue_splice_init(embryo_queue, hitlist); + unix_collect_queue(unix_sk(skb->sk), hitlist); spin_unlock(&embryo_queue->lock); } } else { - skb_queue_splice_init(queue, hitlist); - -#if IS_ENABLED(CONFIG_AF_UNIX_OOB) - if (u->oob_skb) { - kfree_skb(u->oob_skb); - u->oob_skb = NULL; - } -#endif + unix_collect_queue(u, hitlist); } spin_unlock(&queue->lock); @@ -583,6 +588,8 @@ static void __unix_gc(struct work_struct *work) skb_queue_walk(&hitlist, skb) { if (UNIXCB(skb).fp) UNIXCB(skb).fp->dead = true; + + WARN_ON_ONCE(refcount_read(&skb->users) != 1); } __skb_queue_purge(&hitlist);
GC attempts to explicitly drop oob_skb before purging the hit list. The problem is with embryos: kfree_skb(u->oob_skb) is never called on an embryo socket, as those sockets are not directly stacked by the SCC walk. The python script below [0] sends a listener's fd to its embryo as OOB data. While GC does collect the embryo's queue, it fails to drop the OOB skb's refcount. The skb which was in embryo's receive queue stays as unix_sk(sk)->oob_skb and keeps the listener's refcount [1]. Tell GC to dispose embryo's oob_skb. [0]: from array import array from socket import * addr = '\x00unix-oob' lis = socket(AF_UNIX, SOCK_STREAM) lis.bind(addr) lis.listen(1) s = socket(AF_UNIX, SOCK_STREAM) s.connect(addr) scm = (SOL_SOCKET, SCM_RIGHTS, array('i', [lis.fileno()])) s.sendmsg([b'x'], [scm], MSG_OOB) lis.close() [1] $ grep unix-oob /proc/net/unix $ ./unix-oob.py $ grep unix-oob /proc/net/unix 0000000000000000: 00000002 00000000 00000000 0001 02 0 @unix-oob 0000000000000000: 00000002 00000000 00010000 0001 01 6072 @unix-oob Fixes: 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") Signed-off-by: Michal Luczaj <mhal@rbox.co> --- net/unix/garbage.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-)