Message ID | 20221129025249.463833-1-yin31149@gmail.com (mailing list archive) |
---|---|
State | Deferred |
Delegated to: | Netdev Maintainers |
Series | [v3] net: sched: fix memory leak in tcindex_set_parms |
On Tue, 2022-11-29 at 10:52 +0800, Hawkins Jiawei wrote:
> Syzkaller reports a memory leak as follows:
> ====================================
> BUG: memory leak
> unreferenced object 0xffff88810c287f00 (size 256):
>   comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
>     [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
>     [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
>     [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
>     [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
>     [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
>     [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
>     [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
>     [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
>     [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
>     [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
>     [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
>     [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
>     [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
>     [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
>     [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
>     [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
>     [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
>     [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
>     [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
>     [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
>     [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>     [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>     [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> ====================================
>
> Kernel uses tcindex_change() to change an existing
> filter's properties. During the process of changing,
> kernel uses tcindex_alloc_perfect_hash() to newly
> allocate filter results, uses tcindex_filter_result_init()
> to clear the old filter result.
>
> Yet the problem is that, kernel clears the old
> filter result, without destroying its tcf_exts structure,
> which triggers the above memory leak.
>
> Considering that there already exists a tc_filter_wq workqueue
> to destroy the old tcindex_data by tcindex_partial_destroy_work()
> at the end of tcindex_set_parms(), this patch solves this memory
> leak bug by removing this old filter result clearing part,
> and delegating it to the tc_filter_wq workqueue.
>
> [Thanks to the suggestion from Jakub Kicinski, Cong Wang, Paolo Abeni
> and Dmitry Vyukov]
>
> Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
> Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
> Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
> Cc: Cong Wang <cong.wang@bytedance.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>

The patch looks correct to me, but we are very late in this release
cycle, and I fear there is a chance of introducing some regression. The
issue addressed here has been present for quite some time, so I suggest
postponing this fix to the beginning of the next release cycle.

Please repost this patch after 6.1 is released, thanks! (And feel
free to add my Acked-by).

Paolo
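The leak pattern described in the commit message can be illustrated with a small userspace model. This is only a sketch, not kernel code: `model_exts`, `model_result`, `live_allocs` and every function below are hypothetical stand-ins. It assumes the init routine wipes the structure and then allocates a fresh extension array, which is what the `kcalloc` frame inside `tcf_exts_init` in the backtrace suggests. The `model_result_reinit()` variant is shown only for contrast; it is not what the patch does (the patch removes the clearing call and relies on deferred destruction).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct tcf_exts: owns one heap allocation. */
struct model_exts {
	void **actions;
};

/* Hypothetical stand-in for struct tcindex_filter_result. */
struct model_result {
	struct model_exts exts;
	int classid;
};

static int live_allocs;   /* outstanding allocations in this model */

static int model_exts_init(struct model_exts *e)
{
	e->actions = calloc(32, sizeof(void *));
	if (!e->actions)
		return -1;
	live_allocs++;
	return 0;
}

static void model_exts_destroy(struct model_exts *e)
{
	if (e->actions) {
		free(e->actions);
		e->actions = NULL;
		live_allocs--;
	}
}

/* Leaky shape: wipe the result, then allocate afresh. Whatever
 * r->exts.actions pointed to before the memset becomes unreachable. */
static int model_result_init_leaky(struct model_result *r)
{
	memset(r, 0, sizeof(*r));
	return model_exts_init(&r->exts);
}

/* Contrast only: release the old allocation before re-initialising. */
static int model_result_reinit(struct model_result *r)
{
	model_exts_destroy(&r->exts);
	memset(r, 0, sizeof(*r));
	return model_exts_init(&r->exts);
}
```

Calling `model_result_init_leaky()` on a result that already owns an array leaves `live_allocs` one higher than the number of reachable arrays, which is the userspace analogue of the "unreferenced object" kmemleak reports.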
On Thu, 1 Dec 2022 at 18:24, Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2022-11-29 at 10:52 +0800, Hawkins Jiawei wrote:
> > Syzkaller reports a memory leak as follows:
> > [...]
> > Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
>
> The patch looks correct to me, but we are very late in this release
> cycle, and I fear there is a chance of introducing some regression. The
> issue addressed here has been present for quite some time, so I suggest
> postponing this fix to the beginning of the next release cycle.
>
> Please repost this patch after 6.1 is released, thanks! (And feel
> free to add my Acked-by).

Thanks for your review. I will retest this patch after 6.1 is released,
and repost it if it still works fine.

>
> Paolo
>
On Tue, Nov 29, 2022 at 10:52:49AM +0800, Hawkins Jiawei wrote:
> Kernel uses tcindex_change() to change an existing
> filter's properties. During the process of changing,
> kernel uses tcindex_alloc_perfect_hash() to newly
> allocate filter results, uses tcindex_filter_result_init()
> to clear the old filter result.
>
> Yet the problem is that, kernel clears the old
> filter result, without destroying its tcf_exts structure,
> which triggers the above memory leak.
>
> Considering that there already exists a tc_filter_wq workqueue
> to destroy the old tcindex_data by tcindex_partial_destroy_work()
> at the end of tcindex_set_parms(), this patch solves this memory
> leak bug by removing this old filter result clearing part,
> and delegating it to the tc_filter_wq workqueue.

Hmm?? tcindex_partial_destroy_work() destroys 'oldp', which is
different from 'old_r'. I mean, you seem to be assuming that struct
tcindex_filter_result always comes from struct tcindex_data, which is
not true. Check the following tcindex_lookup(), which can retrieve a
tcindex_filter_result from struct tcindex_filter.

static struct tcindex_filter_result *tcindex_lookup(struct tcindex_data *p,
						    u16 key)
{
	if (p->perfect) {
		struct tcindex_filter_result *f = p->perfect + key;

		return tcindex_filter_is_set(f) ? f : NULL;
	} else if (p->h) {
		struct tcindex_filter __rcu **fp;
		struct tcindex_filter *f;

		fp = &p->h[key % p->hash];
		for (f = rcu_dereference_bh_rtnl(*fp);
		     f;
		     fp = &f->next, f = rcu_dereference_bh_rtnl(*fp))
			if (f->key == key)
				return &f->result;
	}

	return NULL;
}

> diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
> index 1c9eeb98d826..3f4e7a6cdd96 100644
> --- a/net/sched/cls_tcindex.c
> +++ b/net/sched/cls_tcindex.c
> @@ -478,14 +478,6 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
>  		tcf_bind_filter(tp, &cr, base);
>  	}
>
> -	if (old_r && old_r != r) {
> -		err = tcindex_filter_result_init(old_r, cp, net);
> -		if (err < 0) {
> -			kfree(f);
> -			goto errout_alloc;
> -		}
> -	}
> -

Even if your above analysis is correct, 'old_r' now becomes unused
(set but not used), so I think you should get a compiler warning.

Thanks.
On Sun, 4 Dec 2022 at 04:19, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Tue, Nov 29, 2022 at 10:52:49AM +0800, Hawkins Jiawei wrote:
> > [...]
> > Considering that there already exists a tc_filter_wq workqueue
> > to destroy the old tcindex_data by tcindex_partial_destroy_work()
> > at the end of tcindex_set_parms(), this patch solves this memory
> > leak bug by removing this old filter result clearing part,
> > and delegating it to the tc_filter_wq workqueue.
>
> Hmm?? tcindex_partial_destroy_work() destroys 'oldp', which is
> different from 'old_r'. I mean, you seem to be assuming that struct
> tcindex_filter_result always comes from struct tcindex_data, which is
> not true. Check the following tcindex_lookup(), which can retrieve a
> tcindex_filter_result from struct tcindex_filter.
>
> [...]

Oh, thanks for correcting me! You are right, I wrongly assumed that
struct tcindex_filter_result always comes from the `perfect` field of
struct tcindex_data.

But after reviewing tcindex_set_parms(), I think this patch can still
fix the problem. The code can reach the part deleted by this patch only
when the `tcindex_filter_result` comes from `struct tcindex_data`.

To be more specific, the simplified logic of the original
tcindex_set_parms() is as below:

static int
tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
		  u32 handle, struct tcindex_data *p,
		  struct tcindex_filter_result *r, struct nlattr **tb,
		  struct nlattr *est, u32 flags, struct netlink_ext_ack *extack)
{
	...
	if (p->perfect) {
		int i;

		if (tcindex_alloc_perfect_hash(net, cp) < 0)
			goto errout;
		cp->alloc_hash = cp->hash;
		for (i = 0; i < min(cp->hash, p->hash); i++)
			cp->perfect[i].res = p->perfect[i].res;
		balloc = 1;
	}
	cp->h = p->h;

	...

	if (cp->perfect)
		r = cp->perfect + handle;
	else
		r = tcindex_lookup(cp, handle) ? : &new_filter_result;

	if (old_r && old_r != r) {
		err = tcindex_filter_result_init(old_r, cp, net);
		if (err < 0) {
			kfree(f);
			goto errout_alloc;
		}
	}
	...
}

- cp's h field is directly copied from p's h field

- if `old_r` is retrieved from struct tcindex_filter, in other words,
  from p's h field, then `r` should get the same value
  from `tcindex_lookup(cp, handle)`.

- so `old_r == r` is true, and the code will never use
  tcindex_filter_result_init() to clear old_r in such a case.

So I think this patch can still fix this memory leak caused by
tcindex_filter_result_init(), but maybe I need to improve my
commit message.

Please correct me if I am wrong.

> > [...]
> > -	if (old_r && old_r != r) {
> > -		err = tcindex_filter_result_init(old_r, cp, net);
> > -		if (err < 0) {
> > -			kfree(f);
> > -			goto errout_alloc;
> > -		}
> > -	}
> > -
>
> Even if your above analysis is correct, 'old_r' now becomes unused
> (set but not used), so I think you should get a compiler warning.

Oh, it actually didn't trigger any compiler warning, because `old_r`
is still used in one place, as below:

static int
tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
		  u32 handle, struct tcindex_data *p,
		  struct tcindex_filter_result *r, struct nlattr **tb,
		  struct nlattr *est, u32 flags, struct netlink_ext_ack *extack)
{
	struct tcindex_filter_result new_filter_result, *old_r = r;
	...
	err = tcindex_filter_result_init(&new_filter_result, cp, net);
	if (err < 0)
		goto errout_alloc;
	if (old_r)
		cr = r->res;
	...
}

But `old_r` and `r` have the same value here, so we can just
replace `old_r` with `r` here and delete `old_r`, as you suggested.

Thanks for your suggestion!

>
> Thanks.
On Mon, Dec 05, 2022 at 11:19:56PM +0800, Hawkins Jiawei wrote:
> To be more specific, the simplified logic of the original
> tcindex_set_parms() is as below:
>
> static int
> tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
> 		  u32 handle, struct tcindex_data *p,
> 		  struct tcindex_filter_result *r, struct nlattr **tb,
> 		  struct nlattr *est, u32 flags, struct netlink_ext_ack *extack)
> {
> 	...
> 	if (p->perfect) {
> 		int i;
>
> 		if (tcindex_alloc_perfect_hash(net, cp) < 0)
> 			goto errout;
> 		cp->alloc_hash = cp->hash;
> 		for (i = 0; i < min(cp->hash, p->hash); i++)
> 			cp->perfect[i].res = p->perfect[i].res;
> 		balloc = 1;
> 	}
> 	cp->h = p->h;
>
> 	...
>
> 	if (cp->perfect)
> 		r = cp->perfect + handle;

We can reach here if p->perfect is non-NULL.

> 	else
> 		r = tcindex_lookup(cp, handle) ? : &new_filter_result;
>
> 	if (old_r && old_r != r) {
> 		err = tcindex_filter_result_init(old_r, cp, net);
> 		if (err < 0) {
> 			kfree(f);
> 			goto errout_alloc;
> 		}
> 	}
> 	...
> }
>
> - cp's h field is directly copied from p's h field
>
> - if `old_r` is retrieved from struct tcindex_filter, in other words,
>   from p's h field, then `r` should get the same value
>   from `tcindex_lookup(cp, handle)`.

See above, 'r' can be 'cp->perfect + handle' which is newly allocated,
hence different from 'old_r'.

>
> - so `old_r == r` is true, and the code will never use
>   tcindex_filter_result_init() to clear old_r in such a case.

Not always.

>
> So I think this patch can still fix this memory leak caused by
> tcindex_filter_result_init(), but maybe I need to improve my
> commit message.
>

I think your patch may introduce other memory leaks and 'old_r' may
be left as obsoleted too.

Thanks.
On Sun, 11 Dec 2022 at 05:29, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Mon, Dec 05, 2022 at 11:19:56PM +0800, Hawkins Jiawei wrote:
> > [...]
> > 	if (cp->perfect)
> > 		r = cp->perfect + handle;
>
> We can reach here if p->perfect is non-NULL.
>
> > [...]
> > - if `old_r` is retrieved from struct tcindex_filter, in other words,
> >   from p's h field, then `r` should get the same value
> >   from `tcindex_lookup(cp, handle)`.
>
> See above, 'r' can be 'cp->perfect + handle' which is newly allocated,
> hence different from 'old_r'.

But if `r` is `cp->perfect + handle`, this means `cp->perfect` is not
NULL. So `p->perfect` should not be NULL either, which means `old_r`
should be `p->perfect + handle`, according to tcindex_lookup(). This
contradicts the assumption that `old_r` is retrieved from p's h field.

> >
> > - so `old_r == r` is true, and the code will never use
> >   tcindex_filter_result_init() to clear old_r in such a case.
>
> Not always.
>
> >
> > So I think this patch can still fix this memory leak caused by
> > tcindex_filter_result_init(), but maybe I need to improve my
> > commit message.
> >
>
> I think your patch may introduce other memory leaks and 'old_r' may
> be left as obsoleted too.

I still think this patch should not introduce any memory leaks.

* If `old_r` is not NULL, it can come from only two sources according
  to tcindex_lookup(): `old_r` is retrieved from `p->perfect`, or
  `old_r` is retrieved from `p->h`. And if `old_r` is retrieved from
  `p->h`, this means `p->perfect` is NULL.

* If `old_r` is retrieved from `p->perfect`, the kernel uses
  tcindex_alloc_perfect_hash() to newly allocate the filter results,
  and `r` should be `cp->perfect + handle`, which is newly allocated.
  So `r != old_r` in this situation, but the kernel will clear `old_r`
  on the tc_filter_wq workqueue in tcindex_partial_destroy_work(), by
  destroying p->perfect. So here the kernel doesn't need
  tcindex_filter_result_init() to clear the old filter result, and
  there is no memory leak.

* If `old_r` is retrieved from `p->h`, then `p->perfect` is NULL, as
  discussed above. Considering that `cp->h` is directly copied from
  `p->h`, `r` should get the same value as `old_r` from
  tcindex_lookup(). So `r == old_r`, and the code skips the part where
  the kernel uses tcindex_filter_result_init() to clear the old filter
  result. Removing this part of the code should therefore have no
  effect in this situation.

It seems that whether `old_r` is retrieved from `p->h` or `p->perfect`,
it is okay to directly delete the part where the kernel uses
tcindex_filter_result_init() to clear the old filter result, without
introducing any memory leak. At the same time, this fixes the memory
leak caused by tcindex_filter_result_init().

As for `old_r` being left as obsoleted, do you mean that `old_r`
becomes unused (set but not used)? I think we can remove `old_r`
directly.

>
> Thanks.
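The two-case argument above can be checked against a small userspace model of the simplified excerpt quoted in the thread. This is a sketch under stated assumptions, not kernel code: all `m_*` names are hypothetical, RCU and locking are ignored, and only the branches shown in the excerpt are modelled, so it cannot settle whether the real tcindex_set_parms() has other paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures named in the thread. */
struct m_result {
	int set;                  /* models tcindex_filter_is_set() */
};

struct m_filter {                 /* hash-chain node that embeds a result */
	unsigned int key;
	struct m_result result;
	struct m_filter *next;
};

struct m_data {
	struct m_result *perfect; /* array of `hash` results, or NULL */
	struct m_filter **h;      /* array of `hash` chain heads, or NULL */
	unsigned int hash;
};

/* Same two-source shape as the quoted tcindex_lookup(); the caller is
 * assumed to pass key < hash in the perfect case. */
static struct m_result *m_lookup(const struct m_data *p, unsigned int key)
{
	if (p->perfect) {
		struct m_result *f = p->perfect + key;

		return f->set ? f : NULL;
	} else if (p->h) {
		struct m_filter *f;

		for (f = p->h[key % p->hash]; f; f = f->next)
			if (f->key == key)
				return &f->result;
	}
	return NULL;
}

/* Build the copy `cp` from `p` as in the simplified excerpt and pick `r`.
 * Returns NULL only if the model's own allocation fails. */
static struct m_result *m_pick_r(const struct m_data *p, struct m_data *cp,
				 unsigned int handle, struct m_result *fallback)
{
	struct m_result *found;

	cp->hash = p->hash;
	cp->perfect = NULL;
	if (p->perfect) {
		/* models tcindex_alloc_perfect_hash(): a brand-new array */
		cp->perfect = calloc(cp->hash, sizeof(*cp->perfect));
		if (!cp->perfect)
			return NULL;
	}
	cp->h = p->h;             /* chains are shared, not copied */

	if (cp->perfect)
		return cp->perfect + handle;
	found = m_lookup(cp, handle);
	return found ? found : fallback;
}
```

In this model, a result found through `p->h` yields `r == old_r`, while a result found in `p->perfect` yields an `r` inside the newly allocated array, so `r != old_r`. The second case is the one in which the old array has to be released by some other path.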
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 1c9eeb98d826..3f4e7a6cdd96 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -478,14 +478,6 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 		tcf_bind_filter(tp, &cr, base);
 	}
 
-	if (old_r && old_r != r) {
-		err = tcindex_filter_result_init(old_r, cp, net);
-		if (err < 0) {
-			kfree(f);
-			goto errout_alloc;
-		}
-	}
-
 	oldp = p;
 	r->res = cr;
 	tcf_exts_change(&r->exts, &e);
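The patch relies on the old table being torn down later rather than cleared in place. A minimal userspace sketch of that idea follows. It assumes, as the thread states, that the deferred work destroys `p->perfect`, and it further assumes that doing so releases each entry's extension data. All `d_*` names are hypothetical, and the workqueue itself is not modelled; the teardown is simply called directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct d_result {
	void **actions;           /* per-entry allocation, like an exts array */
};

struct d_table {
	struct d_result *perfect;
	unsigned int hash;
};

static int d_live;                /* outstanding allocations in this model */

/* Models the deferred teardown: release every entry, then the table. */
static void d_table_destroy(struct d_table *t)
{
	unsigned int i;

	for (i = 0; i < t->hash; i++) {
		if (t->perfect[i].actions) {
			free(t->perfect[i].actions);
			d_live--;
		}
	}
	free(t->perfect);
	free(t);
	d_live -= 2;
}

/* Allocate a table whose entries each own one allocation. */
static struct d_table *d_table_alloc(unsigned int hash)
{
	struct d_table *t = calloc(1, sizeof(*t));
	unsigned int i;

	if (!t)
		return NULL;
	t->perfect = calloc(hash, sizeof(*t->perfect));
	if (!t->perfect) {
		free(t);
		return NULL;
	}
	t->hash = hash;
	d_live += 2;
	for (i = 0; i < hash; i++) {
		t->perfect[i].actions = calloc(4, sizeof(void *));
		if (!t->perfect[i].actions) {
			d_table_destroy(t);
			return NULL;
		}
		d_live++;
	}
	return t;
}
```

Because the teardown walks every entry, no entry needs to be wiped individually before the table is retired. Wiping one in place first would only hide its allocation from the later walk.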