Message ID | 20241110172836.331319-1-alexandre.ferrieux@orange.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 73af53d82076bbe184d9ece9e14b0dc8599e6055 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v7] net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes. | expand |
On 10/11/2024 14:28, Alexandre Ferrieux wrote: > To generate hnode handles (in gen_new_htid()), u32 uses IDR and > encodes the returned small integer into a structured 32-bit > word. Unfortunately, at disposal time, the needed decoding > is not done. As a result, idr_remove() fails, and the IDR > fills up. Since its size is 2048, the following script ends up > with "Filter already exists": > > tc filter add dev myve $FILTER1 > tc filter add dev myve $FILTER2 > for i in {1..2048} > do > echo $i > tc filter del dev myve $FILTER2 > tc filter add dev myve $FILTER2 > done > > This patch adds the missing decoding logic for handles that > deserve it. > > Fixes: e7614370d6f0 ("net_sched: use idr to allocate u32 filter handles") > Reviewed-by: Eric Dumazet <edumazet@google.com> > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> > Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com> Tested-by: Victor Nogueira <victor@mojatatu.com>
Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Sun, 10 Nov 2024 18:28:36 +0100 you wrote: > To generate hnode handles (in gen_new_htid()), u32 uses IDR and > encodes the returned small integer into a structured 32-bit > word. Unfortunately, at disposal time, the needed decoding > is not done. As a result, idr_remove() fails, and the IDR > fills up. Since its size is 2048, the following script ends up > with "Filter already exists": > > [...] Here is the summary with links: - [net,v7] net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes. https://git.kernel.org/netdev/net/c/73af53d82076 You are awesome, thank you!
Hi, In the recent fix of u32's IDR leaks, one side remark is that the problem went unnoticed for 7 years due to the NULL result from idr_remove() being ignored at this call site. Now, a cursory grep over the whole Linux tree shows 306 out of 386 call sites (excluding those hidden in macros, if any) don't bother to extract the value returned by idr_remove(). Indeed, a failed IDR removal is "mostly harmless" since IDs are not pointers so the mismatch is detectable (and is detected, returning NULL). However, in racy situations you may end up killing an innocent fresh entry, which may really break things a bit later. And in all cases, a true bug is the root cause. So, unless we have reasons to think cls_u32 was the only place where two ID encodings might lend themselves to confusion, I'm wondering if it wouldn't make sense to chase the issue more systematically: - either with WARN_ON[_ONCE](idr_remove()==NULL) on each call site individually (a year-long endeavor implying tens of maintainers) - or with WARN_ON[_ONCE] just before returning NULL within idr_remove() itself, or even radix_tree_delete_item(). Opinions ?
On Thu, Nov 14, 2024 at 07:24:27PM +0100, Alexandre Ferrieux wrote: > Hi, > > In the recent fix of u32's IDR leaks, one side remark is that the problem went > unnoticed for 7 years due to the NULL result from idr_remove() being ignored at > this call site. I'd blame the lack of self test coverage. :) > > Now, a cursory grep over the whole Linux tree shows 306 out of 386 call sites > (excluding those hidden in macros, if any) don't bother to extract the value > returned by idr_remove(). > > Indeed, a failed IDR removal is "mostly harmless" since IDs are not pointers so > the mismatch is detectable (and is detected, returning NULL). However, in racy > situations you may end up killing an innocent fresh entry, which may really > break things a bit later. And in all cases, a true bug is the root cause. > > So, unless we have reasons to think cls_u32 was the only place where two ID > encodings might lend themselves to confusion, I'm wondering if it wouldn't make > sense to chase the issue more systematically: > > - either with WARN_ON[_ONCE](idr_remove()==NULL) on each call site individually > (a year-long endeavor implying tens of maintainers) > > - or with WARN_ON[_ONCE] just before returning NULL within idr_remove() itself, > or even radix_tree_delete_item(). > > Opinions ? Yeah, or simply WARN_ON uncleaned IDR in idr_destroy(), which is a more common pattern. Thanks.
On Mon, Nov 18, 2024 at 07:51:47PM -0800, Cong Wang wrote: > On Thu, Nov 14, 2024 at 07:24:27PM +0100, Alexandre Ferrieux wrote: > > Hi, > > > > In the recent fix of u32's IDR leaks, one side remark is that the problem went > > unnoticed for 7 years due to the NULL result from idr_remove() being ignored at > > this call site. > > I'd blame the lack of self test coverage. :) > > > > > Now, a cursory grep over the whole Linux tree shows 306 out of 386 call sites > > (excluding those hidden in macros, if any) don't bother to extract the value > > returned by idr_remove(). > > > > Indeed, a failed IDR removal is "mostly harmless" since IDs are not pointers so > > the mismatch is detectable (and is detected, returning NULL). However, in racy > > situations you may end up killing an innocent fresh entry, which may really > > break things a bit later. And in all cases, a true bug is the root cause. > > > > So, unless we have reasons to think cls_u32 was the only place where two ID > > encodings might lend themselves to confusion, I'm wondering if it wouldn't make > > sense to chase the issue more systematically: > > > > - either with WARN_ON[_ONCE](idr_remove()==NULL) on each call site individually > > (a year-long endeavor implying tens of maintainers) > > > > - or with WARN_ON[_ONCE] just before returning NULL within idr_remove() itself, > > or even radix_tree_delete_item(). > > > > Opinions ? > > Yeah, or simply WARN_ON uncleaned IDR in idr_destroy(), which is a more > common pattern. Something like this (or, of course, move it to caller to reduce the noise): diff --git a/lib/radix-tree.c b/lib/radix-tree.c index 976b9bd02a1b..20cc690ffb32 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -1559,6 +1559,8 @@ void __rcu **idr_get_free(struct radix_tree_root *root, void idr_destroy(struct idr *idr) { struct radix_tree_node *node = rcu_dereference_raw(idr->idr_rt.xa_head); + + WARN_ON(!idr_is_empty(idr)); if (radix_tree_is_internal_node(node)) radix_tree_free_nodes(node); idr->idr_rt.xa_head = NULL;
On 19/11/2024 04:51, Cong Wang wrote: > On Thu, Nov 14, 2024 at 07:24:27PM +0100, Alexandre Ferrieux wrote: >> Hi, >> >> In the recent fix of u32's IDR leaks, one side remark is that the problem went >> unnoticed for 7 years due to the NULL result from idr_remove() being ignored at >> this call site. >> [...] >> So, unless we have reasons to think cls_u32 was the only place where two ID >> encodings might lend themselves to confusion, I'm wondering if it wouldn't make >> sense to chase the issue more systematically: >> >> - either with WARN_ON[_ONCE](idr_remove()==NULL) on each call site individually >> (a year-long endeavor implying tens of maintainers) >> >> - or with WARN_ON[_ONCE] just before returning NULL within idr_remove() itself, >> or even radix_tree_delete_item(). >> >> Opinions ? > > Yeah, or simply WARN_ON uncleaned IDR in idr_destroy(), which is a more > common pattern. No, in the general case, idr_destroy() only happens at the end of life of an IDR set. Some structures in the kernel have a long lifetime, which means possibly splipping out of fuzzers' scrutiny. As an illustration, in cls_u32 itself, in the 2048-delete-add loop I use in the tdc test committed with the fix, idr_destroy(&tp_c->handle_idr) is called only at the "cleanup" step, when deleting the interface. You can only imagine, in the hundreds of other uses of IDR, the "miss rate" that would follow from targeting idr_destroy() instead of idr_remove().
On Tue, Nov 19, 2024 at 07:46:32AM +0100, Alexandre Ferrieux wrote: > On 19/11/2024 04:51, Cong Wang wrote: > > On Thu, Nov 14, 2024 at 07:24:27PM +0100, Alexandre Ferrieux wrote: > >> Hi, > >> > >> In the recent fix of u32's IDR leaks, one side remark is that the problem went > >> unnoticed for 7 years due to the NULL result from idr_remove() being ignored at > >> this call site. > >> [...] > >> So, unless we have reasons to think cls_u32 was the only place where two ID > >> encodings might lend themselves to confusion, I'm wondering if it wouldn't make > >> sense to chase the issue more systematically: > >> > >> - either with WARN_ON[_ONCE](idr_remove()==NULL) on each call site individually > >> (a year-long endeavor implying tens of maintainers) > >> > >> - or with WARN_ON[_ONCE] just before returning NULL within idr_remove() itself, > >> or even radix_tree_delete_item(). > >> > >> Opinions ? > > > > Yeah, or simply WARN_ON uncleaned IDR in idr_destroy(), which is a more > > common pattern. > > No, in the general case, idr_destroy() only happens at the end of life of an IDR > set. Some structures in the kernel have a long lifetime, which means possibly > splipping out of fuzzers' scrutiny. Sure, move it to where you believe is appropriate. It is a very common pattern we detect resource leakage when destroying, for a quick example, in inet_sock_destruct() we detect skb accounting leaks: 153 WARN_ON_ONCE(atomic_read(&sk->sk_rmem_alloc)); 154 WARN_ON_ONCE(refcount_read(&sk->sk_wmem_alloc)); 155 WARN_ON_ONCE(sk->sk_wmem_queued); 156 WARN_ON_ONCE(sk_forward_alloc_get(sk)); Another example of IDR leakage detection can be found in drivers/gpu/drm/vmwgfx/ttm_object.c: 447 void ttm_object_device_release(struct ttm_object_device **p_tdev) 448 { 449 struct ttm_object_device *tdev = *p_tdev; 450 451 *p_tdev = NULL; 452 453 WARN_ON_ONCE(!idr_is_empty(&tdev->idr)); 454 idr_destroy(&tdev->idr); 455 456 kfree(tdev); 457 } > > As an illustration, in cls_u32 itself, in the 2048-delete-add loop I use in the > tdc test committed with the fix, idr_destroy(&tp_c->handle_idr) is called only > at the "cleanup" step, when deleting the interface. > > You can only imagine, in the hundreds of other uses of IDR, the "miss rate" that > would follow from targeting idr_destroy() instead of idr_remove(). > I am not saying it is suitable for this specific case, I am just saying it is a common pattern for you to consier, that's all. Thanks.
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 9412d88a99bc..d3a03c57545b 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -92,6 +92,16 @@ struct tc_u_common { long knodes; }; +static u32 handle2id(u32 h) +{ + return ((h & 0x80000000) ? ((h >> 20) & 0x7FF) : h); +} + +static u32 id2handle(u32 id) +{ + return (id | 0x800U) << 20; +} + static inline unsigned int u32_hash_fold(__be32 key, const struct tc_u32_sel *sel, u8 fshift) @@ -310,7 +320,7 @@ static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr) int id = idr_alloc_cyclic(&tp_c->handle_idr, ptr, 1, 0x7FF, GFP_KERNEL); if (id < 0) return 0; - return (id | 0x800U) << 20; + return id2handle(id); } static struct hlist_head *tc_u_common_hash; @@ -360,7 +370,7 @@ static int u32_init(struct tcf_proto *tp) return -ENOBUFS; refcount_set(&root_ht->refcnt, 1); - root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x80000000; + root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : id2handle(0); root_ht->prio = tp->prio; root_ht->is_root = true; idr_init(&root_ht->handle_idr); @@ -612,7 +622,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht, if (phn == ht) { u32_clear_hw_hnode(tp, ht, extack); idr_destroy(&ht->handle_idr); - idr_remove(&tp_c->handle_idr, ht->handle); + idr_remove(&tp_c->handle_idr, handle2id(ht->handle)); RCU_INIT_POINTER(*hn, ht->next); kfree_rcu(ht, rcu); return 0; @@ -989,7 +999,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, err = u32_replace_hw_hnode(tp, ht, userflags, extack); if (err) { - idr_remove(&tp_c->handle_idr, handle); + idr_remove(&tp_c->handle_idr, handle2id(handle)); kfree(ht); return err; }