Message ID | 20240515003204.43153-2-kuniyu@amazon.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | af_unix: Fix memleak and null-ptr-deref around MSG_OOB and GC. | expand |
On 5/15/24 02:32, Kuniyuki Iwashima wrote: > ... > The python script below [0] sends a listener's fd to its embryo as OOB > data. Then, GC does not iterates the embryo from the listener to drop > the OOB skb's refcount, and the skb in embryo's receive queue keeps the > listener's refcount. As a result, the listener is leaked and the warning > [1] is hit. > ... Sorry, this does not convey what I wrote. And I think your edit is incorrect. GC starts from the in-flight listener and *does* iterate the embryo; see scan_children() where scan_inflight() is called for all the embryos. The skb in embryo's RQ *does not* keep the listener's refcount; skb from RQ ends up in the hit list and is purged. It is embryo's oob_skb that holds the refcount; see how __unix_gc() goes over gc_candidates attempting to kfree_skb(u->oob_skb), notice that `u` here is a listener, not an embryo. I understand you're "in rush for the merge window", but would it be okay if I ask you not to edit my commit messages so heavily? Thanks, Michal
From: Michal Luczaj <mhal@rbox.co> Date: Wed, 15 May 2024 11:34:51 +0200 > On 5/15/24 02:32, Kuniyuki Iwashima wrote: > > ... > > The python script below [0] sends a listener's fd to its embryo as OOB > > data. Then, GC does not iterates the embryo from the listener to drop > > the OOB skb's refcount, and the skb in embryo's receive queue keeps the > > listener's refcount. As a result, the listener is leaked and the warning > > [1] is hit. > > ... > > Sorry, this does not convey what I wrote. And I think your edit is > incorrect. > > GC starts from the in-flight listener and *does* iterate the embryo; see > scan_children() where scan_inflight() is called for all the embryos. I meant the current code does not call skb_unref() for embryos's OOB skb because it's done _after_ scan_inflight(), not in scan_inflight(). > The skb in embryo's RQ *does not* keep the listener's refcount; skb from RQ > ends up in the hit list and is purged. unix_sk(sk)->oob_skb is a pointer to skb in recvq. Perhaps I should have written "the skb which was in embryo's receive queue stays as unix_sk(sk)->oob_skb and keeps the listener's refcount". > It is embryo's oob_skb that holds the refcount; see how __unix_gc() goes > over gc_candidates attempting to kfree_skb(u->oob_skb), notice that `u` > here is a listener, not an embryo. > > I understand you're "in rush for the merge window", but would it be okay if > I ask you not to edit my commit messages so heavily? I noticed the new gc code was merged in Linus' tree. It's still not synced with net.git, but I guess it will be done soon and your patch will not apply on net.git. Then, I cannot include your patch as a series, so please feel free to send it to each stable tree. Thanks
On 5/15/24 15:35, Kuniyuki Iwashima wrote: > From: Michal Luczaj <mhal@rbox.co> > Date: Wed, 15 May 2024 11:34:51 +0200 >> On 5/15/24 02:32, Kuniyuki Iwashima wrote: >>> ... >>> The python script below [0] sends a listener's fd to its embryo as OOB >>> data. Then, GC does not iterates the embryo from the listener to drop >>> the OOB skb's refcount, and the skb in embryo's receive queue keeps the >>> listener's refcount. As a result, the listener is leaked and the warning >>> [1] is hit. >>> ... >> >> Sorry, this does not convey what I wrote. And I think your edit is >> incorrect. >> >> GC starts from the in-flight listener and *does* iterate the embryo; see >> scan_children() where scan_inflight() is called for all the embryos. > > I meant the current code does not call skb_unref() for embryos's OOB skb > because it's done _after_ scan_inflight(), not in scan_inflight(). Right, I think I see what you mean. >> The skb in embryo's RQ *does not* keep the listener's refcount; skb from RQ >> ends up in the hit list and is purged. > > unix_sk(sk)->oob_skb is a pointer to skb in recvq. Perhaps I should > have written "the skb which was in embryo's receive queue stays as > unix_sk(sk)->oob_skb and keeps the listener's refcount". I wholeheartedly concur with you! >> It is embryo's oob_skb that holds the refcount; see how __unix_gc() goes >> over gc_candidates attempting to kfree_skb(u->oob_skb), notice that `u` >> here is a listener, not an embryo. >> >> I understand you're "in rush for the merge window", but would it be okay if >> I ask you not to edit my commit messages so heavily? > > I noticed the new gc code was merged in Linus' tree. It's still not > synced with net.git, but I guess it will be done soon and your patch > will not apply on net.git. Then, I cannot include your patch as a > series, so please feel free to send it to each stable tree. All right, no problem. Does it mean you'll be posting PATCH 2/2 ("af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock") to stable(s)? Moving on to the New GC: Python test from this patch shows that the New GC is memleaking in pretty much the same fashion. $ grep splat /proc/net/unix $ ./unix-oob-splat.py $ rm unix-oob-splat $ ./unix-oob-splat.py $ grep splat /proc/net/unix 0000000000000000: 00000002 00000000 00000000 0001 02 0 unix-oob-splat 0000000000000000: 00000002 00000000 00000000 0001 02 0 unix-oob-splat 0000000000000000: 00000002 00000000 00010000 0001 01 6643 unix-oob-splat 0000000000000000: 00000002 00000000 00010000 0001 01 2920 unix-oob-splat I've posted a patch: https://lore.kernel.org/netdev/20240516103049.1132040-1-mhal@rbox.co/ I tried to align with your version of the commit message, but feel free to chime in. Also, I took the liberty to introduce a small sanity check. Would you prefer if I dropped this hunk or possibly made it a separate patch? Thanks!
From: Michal Luczaj <mhal@rbox.co> Date: Thu, 16 May 2024 12:33:35 +0200 > On 5/15/24 15:35, Kuniyuki Iwashima wrote: > > From: Michal Luczaj <mhal@rbox.co> > > Date: Wed, 15 May 2024 11:34:51 +0200 > >> On 5/15/24 02:32, Kuniyuki Iwashima wrote: > >>> ... > >>> The python script below [0] sends a listener's fd to its embryo as OOB > >>> data. Then, GC does not iterates the embryo from the listener to drop > >>> the OOB skb's refcount, and the skb in embryo's receive queue keeps the > >>> listener's refcount. As a result, the listener is leaked and the warning > >>> [1] is hit. > >>> ... > >> > >> Sorry, this does not convey what I wrote. And I think your edit is > >> incorrect. > >> > >> GC starts from the in-flight listener and *does* iterate the embryo; see > >> scan_children() where scan_inflight() is called for all the embryos. > > > > I meant the current code does not call skb_unref() for embryos's OOB skb > > because it's done _after_ scan_inflight(), not in scan_inflight(). > > Right, I think I see what you mean. > > >> The skb in embryo's RQ *does not* keep the listener's refcount; skb from RQ > >> ends up in the hit list and is purged. > > > > unix_sk(sk)->oob_skb is a pointer to skb in recvq. Perhaps I should > > have written "the skb which was in embryo's receive queue stays as > > unix_sk(sk)->oob_skb and keeps the listener's refcount". > > I wholeheartedly concur with you! > > >> It is embryo's oob_skb that holds the refcount; see how __unix_gc() goes > >> over gc_candidates attempting to kfree_skb(u->oob_skb), notice that `u` > >> here is a listener, not an embryo. > >> > >> I understand you're "in rush for the merge window", but would it be okay if > >> I ask you not to edit my commit messages so heavily? > > > > I noticed the new gc code was merged in Linus' tree. It's still not > > synced with net.git, but I guess it will be done soon and your patch > > will not apply on net.git. Then, I cannot include your patch as a > > series, so please feel free to send it to each stable tree. > > All right, no problem. Does it mean you'll be posting PATCH 2/2 ("af_unix: > Update unix_sk(sk)->oob_skb under sk_receive_queue lock") to stable(s)? I'll post patch 2/2 to net.git and it will be sent to stable later by netdev maintainers. Then, with your patch, the issue is completely fixed for the old gc. > > Moving on to the New GC: Python test from this patch shows that the New GC > is memleaking in pretty much the same fashion. Good catch! > > $ grep splat /proc/net/unix > $ ./unix-oob-splat.py > $ rm unix-oob-splat > $ ./unix-oob-splat.py > $ grep splat /proc/net/unix > 0000000000000000: 00000002 00000000 00000000 0001 02 0 unix-oob-splat > 0000000000000000: 00000002 00000000 00000000 0001 02 0 unix-oob-splat > 0000000000000000: 00000002 00000000 00010000 0001 01 6643 unix-oob-splat > 0000000000000000: 00000002 00000000 00010000 0001 01 2920 unix-oob-splat > > I've posted a patch: > https://lore.kernel.org/netdev/20240516103049.1132040-1-mhal@rbox.co/ > > I tried to align with your version of the commit message, but feel free to > chime in. Also, I took the liberty to introduce a small sanity check. Would > you prefer if I dropped this hunk or possibly made it a separate patch? Will comment on the patch thread. Thanks!
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 0104be9d4704..beecd0bfbf48 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -170,10 +170,11 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), /* Process the descriptors of this socket */ int nfd = UNIXCB(skb).fp->count; struct file **fp = UNIXCB(skb).fp->fp; + struct unix_sock *u; while (nfd--) { /* Get the socket the fd matches if it indeed does so */ - struct unix_sock *u = unix_get_socket(*fp++); + u = unix_get_socket(*fp++); /* Ignore non-candidates, they could have been added * to the queues after starting the garbage collection @@ -187,6 +188,14 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), if (hit && hitlist != NULL) { __skb_unlink(skb, &x->sk_receive_queue); __skb_queue_tail(hitlist, skb); + +#if IS_ENABLED(CONFIG_AF_UNIX_OOB) + u = unix_sk(x); + if (u->oob_skb == skb) { + WARN_ON_ONCE(skb_unref(u->oob_skb)); + u->oob_skb = NULL; + } +#endif } } } @@ -338,17 +347,9 @@ static void __unix_gc(struct work_struct *work) * which are creating the cycle(s). */ skb_queue_head_init(&hitlist); - list_for_each_entry(u, &gc_candidates, link) { + list_for_each_entry(u, &gc_candidates, link) scan_children(&u->sk, inc_inflight, &hitlist); -#if IS_ENABLED(CONFIG_AF_UNIX_OOB) - if (u->oob_skb) { - kfree_skb(u->oob_skb); - u->oob_skb = NULL; - } -#endif - } - /* not_cycle_list contains those sockets which do not make up a * cycle. Restore these to the inflight list. */