diff mbox series

[net,v7] net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.

Message ID 20241110172836.331319-1-alexandre.ferrieux@orange.com (mailing list archive)
State Accepted
Commit 73af53d82076bbe184d9ece9e14b0dc8599e6055
Delegated to: Netdev Maintainers
Headers show
Series [net,v7] net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes. | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 2 maintainers not CCed: kuba@kernel.org pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning WARNING: From:/Signed-off-by: email address mismatch: 'From: Alexandre Ferrieux <alexandre.ferrieux@gmail.com>' != 'Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-11-11--21-00 (tests: 787)

Commit Message

Alexandre Ferrieux Nov. 10, 2024, 5:28 p.m. UTC
To generate hnode handles (in gen_new_htid()), u32 uses IDR and
encodes the returned small integer into a structured 32-bit
word. Unfortunately, at disposal time, the needed decoding
is not done. As a result, idr_remove() fails, and the IDR
fills up. Since its size is 2048, the following script ends up
with "Filter already exists":

  tc filter add dev myve $FILTER1
  tc filter add dev myve $FILTER2
  for i in {1..2048}
  do
    echo $i
    tc filter del dev myve $FILTER2
    tc filter add dev myve $FILTER2
  done

This patch adds the missing decoding logic for handles that
deserve it.

Fixes: e7614370d6f0 ("net_sched: use idr to allocate u32 filter handles")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
---
v7: static inline -> static
v6: big speedup of the tdc test with batch tc
v5: fix title - again
v4: add tdc test
v3: prepend title with subsystem ident
v2: use u32 type in handle encoder/decoder

 net/sched/cls_u32.c                           | 18 ++++++++++----
 .../tc-testing/tc-tests/filters/u32.json      | 24 +++++++++++++++++++
 2 files changed, 38 insertions(+), 4 deletions(-)

Comments

Victor Nogueira Nov. 10, 2024, 6:14 p.m. UTC | #1
On 10/11/2024 14:28, Alexandre Ferrieux wrote:
> To generate hnode handles (in gen_new_htid()), u32 uses IDR and
> encodes the returned small integer into a structured 32-bit
> word. Unfortunately, at disposal time, the needed decoding
> is not done. As a result, idr_remove() fails, and the IDR
> fills up. Since its size is 2048, the following script ends up
> with "Filter already exists":
> 
>    tc filter add dev myve $FILTER1
>    tc filter add dev myve $FILTER2
>    for i in {1..2048}
>    do
>      echo $i
>      tc filter del dev myve $FILTER2
>      tc filter add dev myve $FILTER2
>    done
> 
> This patch adds the missing decoding logic for handles that
> deserve it.
> 
> Fixes: e7614370d6f0 ("net_sched: use idr to allocate u32 filter handles")
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>

Tested-by: Victor Nogueira <victor@mojatatu.com>
patchwork-bot+netdevbpf@kernel.org Nov. 13, 2024, 5 a.m. UTC | #2
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sun, 10 Nov 2024 18:28:36 +0100 you wrote:
> To generate hnode handles (in gen_new_htid()), u32 uses IDR and
> encodes the returned small integer into a structured 32-bit
> word. Unfortunately, at disposal time, the needed decoding
> is not done. As a result, idr_remove() fails, and the IDR
> fills up. Since its size is 2048, the following script ends up
> with "Filter already exists":
> 
> [...]

Here is the summary with links:
  - [net,v7] net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.
    https://git.kernel.org/netdev/net/c/73af53d82076

You are awesome, thank you!
Alexandre Ferrieux Nov. 14, 2024, 6:24 p.m. UTC | #3
Hi,

In the recent fix of u32's IDR leaks, one side remark is that the problem went
unnoticed for 7 years due to the NULL result from idr_remove() being ignored at
this call site.

Now, a cursory grep over the whole Linux tree shows 306 out of 386 call sites
(excluding those hidden in macros, if any) don't bother to extract the value
returned by idr_remove().

Indeed, a failed IDR removal is "mostly harmless" since IDs are not pointers so
the mismatch is detectable (and is detected, returning NULL). However, in racy
situations you may end up killing an innocent fresh entry, which may really
break things a bit later. And in all cases, a true bug is the root cause.

So, unless we have reasons to think cls_u32 was the only place where two ID
encodings might lend themselves to confusion, I'm wondering if it wouldn't make
sense to chase the issue more systematically:

 - either with WARN_ON[_ONCE](idr_remove()==NULL) on each call site individually
(a year-long endeavor implying tens of maintainers)

 - or with WARN_ON[_ONCE] just before returning NULL within idr_remove() itself,
or even radix_tree_delete_item().

Opinions ?
Cong Wang Nov. 19, 2024, 3:51 a.m. UTC | #4
On Thu, Nov 14, 2024 at 07:24:27PM +0100, Alexandre Ferrieux wrote:
> Hi,
> 
> In the recent fix of u32's IDR leaks, one side remark is that the problem went
> unnoticed for 7 years due to the NULL result from idr_remove() being ignored at
> this call site.

I'd blame the lack of self test coverage. :)

> 
> Now, a cursory grep over the whole Linux tree shows 306 out of 386 call sites
> (excluding those hidden in macros, if any) don't bother to extract the value
> returned by idr_remove().
> 
> Indeed, a failed IDR removal is "mostly harmless" since IDs are not pointers so
> the mismatch is detectable (and is detected, returning NULL). However, in racy
> situations you may end up killing an innocent fresh entry, which may really
> break things a bit later. And in all cases, a true bug is the root cause.
> 
> So, unless we have reasons to think cls_u32 was the only place where two ID
> encodings might lend themselves to confusion, I'm wondering if it wouldn't make
> sense to chase the issue more systematically:
> 
>  - either with WARN_ON[_ONCE](idr_remove()==NULL) on each call site individually
> (a year-long endeavor implying tens of maintainers)
> 
>  - or with WARN_ON[_ONCE] just before returning NULL within idr_remove() itself,
> or even radix_tree_delete_item().
> 
> Opinions ?

Yeah, or simply WARN_ON uncleaned IDR in idr_destroy(), which is a more
common pattern.

Thanks.
Cong Wang Nov. 19, 2024, 3:57 a.m. UTC | #5
On Mon, Nov 18, 2024 at 07:51:47PM -0800, Cong Wang wrote:
> On Thu, Nov 14, 2024 at 07:24:27PM +0100, Alexandre Ferrieux wrote:
> > Hi,
> > 
> > In the recent fix of u32's IDR leaks, one side remark is that the problem went
> > unnoticed for 7 years due to the NULL result from idr_remove() being ignored at
> > this call site.
> 
> I'd blame the lack of self test coverage. :)
> 
> > 
> > Now, a cursory grep over the whole Linux tree shows 306 out of 386 call sites
> > (excluding those hidden in macros, if any) don't bother to extract the value
> > returned by idr_remove().
> > 
> > Indeed, a failed IDR removal is "mostly harmless" since IDs are not pointers so
> > the mismatch is detectable (and is detected, returning NULL). However, in racy
> > situations you may end up killing an innocent fresh entry, which may really
> > break things a bit later. And in all cases, a true bug is the root cause.
> > 
> > So, unless we have reasons to think cls_u32 was the only place where two ID
> > encodings might lend themselves to confusion, I'm wondering if it wouldn't make
> > sense to chase the issue more systematically:
> > 
> >  - either with WARN_ON[_ONCE](idr_remove()==NULL) on each call site individually
> > (a year-long endeavor implying tens of maintainers)
> > 
> >  - or with WARN_ON[_ONCE] just before returning NULL within idr_remove() itself,
> > or even radix_tree_delete_item().
> > 
> > Opinions ?
> 
> Yeah, or simply WARN_ON uncleaned IDR in idr_destroy(), which is a more
> common pattern.

Something like this (or, of course, move it to caller to reduce the noise):

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 976b9bd02a1b..20cc690ffb32 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -1559,6 +1559,8 @@ void __rcu **idr_get_free(struct radix_tree_root *root,
 void idr_destroy(struct idr *idr)
 {
        struct radix_tree_node *node = rcu_dereference_raw(idr->idr_rt.xa_head);
+
+       WARN_ON(!idr_is_empty(idr));
        if (radix_tree_is_internal_node(node))
                radix_tree_free_nodes(node);
        idr->idr_rt.xa_head = NULL;
Alexandre Ferrieux Nov. 19, 2024, 6:46 a.m. UTC | #6
On 19/11/2024 04:51, Cong Wang wrote:
> On Thu, Nov 14, 2024 at 07:24:27PM +0100, Alexandre Ferrieux wrote:
>> Hi,
>> 
>> In the recent fix of u32's IDR leaks, one side remark is that the problem went
>> unnoticed for 7 years due to the NULL result from idr_remove() being ignored at
>> this call site.
>> [...]
>> So, unless we have reasons to think cls_u32 was the only place where two ID
>> encodings might lend themselves to confusion, I'm wondering if it wouldn't make
>> sense to chase the issue more systematically:
>> 
>>  - either with WARN_ON[_ONCE](idr_remove()==NULL) on each call site individually
>> (a year-long endeavor implying tens of maintainers)
>> 
>>  - or with WARN_ON[_ONCE] just before returning NULL within idr_remove() itself,
>> or even radix_tree_delete_item().
>> 
>> Opinions ?
> 
> Yeah, or simply WARN_ON uncleaned IDR in idr_destroy(), which is a more
> common pattern.

No, in the general case, idr_destroy() only happens at the end of life of an IDR
set. Some structures in the kernel have a long lifetime, which means possibly
splipping out of fuzzers' scrutiny.

As an illustration, in cls_u32 itself, in the 2048-delete-add loop I use in the
tdc test committed with the fix, idr_destroy(&tp_c->handle_idr) is called only
at the "cleanup" step, when deleting the interface.

You can only imagine, in the hundreds of other uses of IDR, the "miss rate" that
would follow from targeting idr_destroy() instead of idr_remove().
Cong Wang Nov. 22, 2024, 9:32 p.m. UTC | #7
On Tue, Nov 19, 2024 at 07:46:32AM +0100, Alexandre Ferrieux wrote:
> On 19/11/2024 04:51, Cong Wang wrote:
> > On Thu, Nov 14, 2024 at 07:24:27PM +0100, Alexandre Ferrieux wrote:
> >> Hi,
> >> 
> >> In the recent fix of u32's IDR leaks, one side remark is that the problem went
> >> unnoticed for 7 years due to the NULL result from idr_remove() being ignored at
> >> this call site.
> >> [...]
> >> So, unless we have reasons to think cls_u32 was the only place where two ID
> >> encodings might lend themselves to confusion, I'm wondering if it wouldn't make
> >> sense to chase the issue more systematically:
> >> 
> >>  - either with WARN_ON[_ONCE](idr_remove()==NULL) on each call site individually
> >> (a year-long endeavor implying tens of maintainers)
> >> 
> >>  - or with WARN_ON[_ONCE] just before returning NULL within idr_remove() itself,
> >> or even radix_tree_delete_item().
> >> 
> >> Opinions ?
> > 
> > Yeah, or simply WARN_ON uncleaned IDR in idr_destroy(), which is a more
> > common pattern.
> 
> No, in the general case, idr_destroy() only happens at the end of life of an IDR
> set. Some structures in the kernel have a long lifetime, which means possibly
> splipping out of fuzzers' scrutiny.

Sure, move it to where you believe is appropriate.

It is a very common pattern we detect resource leakage when destroying,
for a quick example, in inet_sock_destruct() we detect skb accounting
leaks:

 153         WARN_ON_ONCE(atomic_read(&sk->sk_rmem_alloc));
 154         WARN_ON_ONCE(refcount_read(&sk->sk_wmem_alloc));
 155         WARN_ON_ONCE(sk->sk_wmem_queued);
 156         WARN_ON_ONCE(sk_forward_alloc_get(sk));


Another example of IDR leakage detection can be found in
drivers/gpu/drm/vmwgfx/ttm_object.c:

447 void ttm_object_device_release(struct ttm_object_device **p_tdev)
448 {
449         struct ttm_object_device *tdev = *p_tdev;
450 
451         *p_tdev = NULL;
452 
453         WARN_ON_ONCE(!idr_is_empty(&tdev->idr));
454         idr_destroy(&tdev->idr);
455 
456         kfree(tdev);
457 }


> 
> As an illustration, in cls_u32 itself, in the 2048-delete-add loop I use in the
> tdc test committed with the fix, idr_destroy(&tp_c->handle_idr) is called only
> at the "cleanup" step, when deleting the interface.
> 
> You can only imagine, in the hundreds of other uses of IDR, the "miss rate" that
> would follow from targeting idr_destroy() instead of idr_remove().
> 

I am not saying it is suitable for this specific case, I am just saying it is a
common pattern for you to consier, that's all.

Thanks.
diff mbox series

Patch

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 9412d88a99bc..d3a03c57545b 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -92,6 +92,16 @@  struct tc_u_common {
 	long			knodes;
 };
 
+static u32 handle2id(u32 h)
+{
+	return ((h & 0x80000000) ? ((h >> 20) & 0x7FF) : h);
+}
+
+static u32 id2handle(u32 id)
+{
+	return (id | 0x800U) << 20;
+}
+
 static inline unsigned int u32_hash_fold(__be32 key,
 					 const struct tc_u32_sel *sel,
 					 u8 fshift)
@@ -310,7 +320,7 @@  static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr)
 	int id = idr_alloc_cyclic(&tp_c->handle_idr, ptr, 1, 0x7FF, GFP_KERNEL);
 	if (id < 0)
 		return 0;
-	return (id | 0x800U) << 20;
+	return id2handle(id);
 }
 
 static struct hlist_head *tc_u_common_hash;
@@ -360,7 +370,7 @@  static int u32_init(struct tcf_proto *tp)
 		return -ENOBUFS;
 
 	refcount_set(&root_ht->refcnt, 1);
-	root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x80000000;
+	root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : id2handle(0);
 	root_ht->prio = tp->prio;
 	root_ht->is_root = true;
 	idr_init(&root_ht->handle_idr);
@@ -612,7 +622,7 @@  static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
 		if (phn == ht) {
 			u32_clear_hw_hnode(tp, ht, extack);
 			idr_destroy(&ht->handle_idr);
-			idr_remove(&tp_c->handle_idr, ht->handle);
+			idr_remove(&tp_c->handle_idr, handle2id(ht->handle));
 			RCU_INIT_POINTER(*hn, ht->next);
 			kfree_rcu(ht, rcu);
 			return 0;
@@ -989,7 +999,7 @@  static int u32_change(struct net *net, struct sk_buff *in_skb,
 
 		err = u32_replace_hw_hnode(tp, ht, userflags, extack);
 		if (err) {
-			idr_remove(&tp_c->handle_idr, handle);
+			idr_remove(&tp_c->handle_idr, handle2id(handle));
 			kfree(ht);
 			return err;
 		}