diff mbox series

[v2,1/2] net: Fix skb_segment when splitting gso_size mangled skb having linear-headed frag_list whose head_frag=true

Message ID 20240626065555.35460-2-dracodingfly@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [v2,1/2] net: Fix skb_segment when splitting gso_size mangled skb having linear-headed frag_list whose head_frag=true | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 842 this patch: 842
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 849 this patch: 849
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 849 this patch: 849
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 54 this patch: 54
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-18 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-42 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 fail Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc

Commit Message

Fred Li June 26, 2024, 6:55 a.m. UTC
When using calico bpf based NAT, hits a kernel BUG_ON at function
skb_segment(), line 4560. Performing NAT translation when accessing
a Service IP across subnets, the calico will encap vxlan and calls the
bpf_skb_adjust_room to decrease the gso_size, and then call bpf_redirect
send packets out.

4438 struct sk_buff *skb_segment(struct sk_buff *head_skb,
4439                             netdev_features_t features)
4440 {
4441     struct sk_buff *segs = NULL;
4442     struct sk_buff *tail = NULL;
...
4558         if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
4559             (skb_headlen(list_skb) == len || sg)) {
4560                 BUG_ON(skb_headlen(list_skb) > len);
4561
4562                 nskb = skb_clone(list_skb, GFP_ATOMIC);
4563                 if (unlikely(!nskb))
4564                     goto err;

call stack:
...
   [exception RIP: skb_segment+3016]
    RIP: ffffffffb97df2a8  RSP: ffffa3f2cce08728  RFLAGS: 00010293
    RAX: 000000000000007d  RBX: 00000000fffff7b3  RCX: 0000000000000011
    RDX: 0000000000000000  RSI: ffff895ea32c76c0  RDI: 00000000000008c1
    RBP: ffffa3f2cce087f8   R8: 000000000000088f   R9: 0000000000000011
    R10: 000000000000090c  R11: ffff895e47e68000  R12: ffff895eb2022f00
    R13: 000000000000004b  R14: ffff895ecdaf2000  R15: ffff895eb2023f00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffffa3f2cce08720] skb_segment at ffffffffb97ded63
...

The skb has the following properties:
    doffset = 66
    list_skb = skb_shinfo(skb)->frag_list
    list_skb->head_frag = true
    skb->len = 2441 && skb->data_len = 2250
    skb_shinfo(skb)->nr_frags = 17
    skb_shinfo(skb)->gso_size = 75
    skb_shinfo(skb)->frags[0...16].bv_len = 125
    list_skb->len = 125
    list_skb->data_len = 0

When slicing the frag_list skb, there three cases:
1. Only *non* head_frag
    sg will be false, only when skb_headlen(list_skb)==len is satisfied,
    it will enter the branch at line 4560, and there will be no crash.
2. Mixed head_frag
    sg will be false, Only when skb_headlen(list_skb)==len is satisfied,
    it will enter the branch at line 4560, and there will be no crash.
3. Only frag_list with head_frag=true
    sg is true, three cases below:
    (1) skb_headlen(list_skb)==len is satisfied, it will enter the branch
       at line 4560, and there will be no crash.
    (2) skb_headlen(list_skb)<len is satisfied, it will enter the branch
       at line 4560, and there will be no crash.
    (3) skb_headlen(list_skb)>len is satisfied, it will be crash.

Applying this patch, three cases will be:
1. Only *non* head_frag
    sg will be false. No difference with before.
2. Mixed head_frag
    sg will be false. No difference with before.
3. Only frag_list with head_frag=true
    sg is true, there also three cases:
    (1) skb_headlen(list_skb)==len is satisfied, no difference with before.
    (2) skb_headlen(list_skb)<len is satisfied, will be revert to copying
        in this case.
    (3) skb_headlen(list_skb)>len is satisfied, will be revert to copying
        in this case.

Since commit 13acc94eff122("net: permit skb_segment on head_frag frag_list
skb"), it is allowed to segment the head_frag frag_list skb.

Signed-off-by: Fred Li <dracodingfly@gmail.com>
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Paolo Abeni July 2, 2024, 9:23 a.m. UTC | #1
On Wed, 2024-06-26 at 14:55 +0800, Fred Li wrote:
> When using calico bpf based NAT, hits a kernel BUG_ON at function
> skb_segment(), line 4560. Performing NAT translation when accessing
> a Service IP across subnets, the calico will encap vxlan and calls the
> bpf_skb_adjust_room to decrease the gso_size, and then call bpf_redirect
> send packets out.
> 
> 4438 struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 4439                             netdev_features_t features)
> 4440 {
> 4441     struct sk_buff *segs = NULL;
> 4442     struct sk_buff *tail = NULL;
> ...
> 4558         if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
> 4559             (skb_headlen(list_skb) == len || sg)) {
> 4560                 BUG_ON(skb_headlen(list_skb) > len);
> 4561
> 4562                 nskb = skb_clone(list_skb, GFP_ATOMIC);
> 4563                 if (unlikely(!nskb))
> 4564                     goto err;
> 
> call stack:
> ...
>    [exception RIP: skb_segment+3016]
>     RIP: ffffffffb97df2a8  RSP: ffffa3f2cce08728  RFLAGS: 00010293
>     RAX: 000000000000007d  RBX: 00000000fffff7b3  RCX: 0000000000000011
>     RDX: 0000000000000000  RSI: ffff895ea32c76c0  RDI: 00000000000008c1
>     RBP: ffffa3f2cce087f8   R8: 000000000000088f   R9: 0000000000000011
>     R10: 000000000000090c  R11: ffff895e47e68000  R12: ffff895eb2022f00
>     R13: 000000000000004b  R14: ffff895ecdaf2000  R15: ffff895eb2023f00
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #9 [ffffa3f2cce08720] skb_segment at ffffffffb97ded63
> ...
> 
> The skb has the following properties:
>     doffset = 66
>     list_skb = skb_shinfo(skb)->frag_list
>     list_skb->head_frag = true
>     skb->len = 2441 && skb->data_len = 2250
>     skb_shinfo(skb)->nr_frags = 17
>     skb_shinfo(skb)->gso_size = 75
>     skb_shinfo(skb)->frags[0...16].bv_len = 125
>     list_skb->len = 125
>     list_skb->data_len = 0
> 
> When slicing the frag_list skb, there three cases:
> 1. Only *non* head_frag
>     sg will be false, only when skb_headlen(list_skb)==len is satisfied,
>     it will enter the branch at line 4560, and there will be no crash.
> 2. Mixed head_frag
>     sg will be false, Only when skb_headlen(list_skb)==len is satisfied,
>     it will enter the branch at line 4560, and there will be no crash.
> 3. Only frag_list with head_frag=true
>     sg is true, three cases below:
>     (1) skb_headlen(list_skb)==len is satisfied, it will enter the branch
>        at line 4560, and there will be no crash.
>     (2) skb_headlen(list_skb)<len is satisfied, it will enter the branch
>        at line 4560, and there will be no crash.
>     (3) skb_headlen(list_skb)>len is satisfied, it will be crash.
> 
> Applying this patch, three cases will be:
> 1. Only *non* head_frag
>     sg will be false. No difference with before.
> 2. Mixed head_frag
>     sg will be false. No difference with before.
> 3. Only frag_list with head_frag=true
>     sg is true, there also three cases:
>     (1) skb_headlen(list_skb)==len is satisfied, no difference with before.
>     (2) skb_headlen(list_skb)<len is satisfied, will be revert to copying
>         in this case.
>     (3) skb_headlen(list_skb)>len is satisfied, will be revert to copying
>         in this case.
> 
> Since commit 13acc94eff122("net: permit skb_segment on head_frag frag_list
> skb"), it is allowed to segment the head_frag frag_list skb.
> 
> Signed-off-by: Fred Li <dracodingfly@gmail.com>
> ---
>  net/core/skbuff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f0a9ef1aeaa2..b1dab1b071fc 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4556,7 +4556,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>  		hsize = skb_headlen(head_skb) - offset;
>  
>  		if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
> -		    (skb_headlen(list_skb) == len || sg)) {
> +		    (skb_headlen(list_skb) == len)) {
>  			BUG_ON(skb_headlen(list_skb) > len);
>  
>  			nskb = skb_clone(list_skb, GFP_ATOMIC);

I must admit I more than a bit lost in the many turns of skb_segment(),
but the above does not look like the correct solution, as it will make
the later BUG_ON() unreachable/meaningless.

Do I read correctly that when the BUG_ON() triggers:

list_skb->len is 125
len is 75
list_skb->frag_head is true

It looks like skb_segment() is becoming even and ever more complex to
cope with unexpected skb layouts, only possibly when skb_segment() is
called by some BPF helpers.

I think it should be up to the helper itself to adjust the skb and/or
the device features to respect skb_segment() constraints, instead of
making the common code increasingly not maintainable.

Thanks,

Paolo
Fred Li July 3, 2024, 12:21 p.m. UTC | #2
> I must admit I more than a bit lost in the many turns of skb_segment(),
> but the above does not look like the correct solution, as it will make
> the later BUG_ON() unreachable/meaningless.

Sorry, the subsequent BUG_ON maybe should be removed in this patch, because
for skb_headlen(list_skb) > len, it will continue splitting as commit 13acc94eff122 
(net: permit skb_segment on head_frag frag_list skb) does.

> 
> Do I read correctly that when the BUG_ON() triggers:
> 
> list_skb->len is 125
> len is 75
> list_skb->frag_head is true
>

yes, it's correct.
list_skb->len is 125
gso_size is 75, also the len in the BUG_ON conditon
list_skb->head_frag is true
 
> It looks like skb_segment() is becoming even and ever more complex to
> cope with unexpected skb layouts, only possibly when skb_segment() is
> called by some BPF helpers.
> 
> Thanks,
> 
> Paolo

I'll wait for more suggestions from others.

Thanks

Fred Li
Willem de Bruijn July 6, 2024, 2:26 p.m. UTC | #3
Fred Li wrote:
> > I must admit I more than a bit lost in the many turns of skb_segment(),
> > but the above does not look like the correct solution, as it will make
> > the later BUG_ON() unreachable/meaningless.
> 
> Sorry, the subsequent BUG_ON maybe should be removed in this patch, because
> for skb_headlen(list_skb) > len, it will continue splitting as commit 13acc94eff122 
> (net: permit skb_segment on head_frag frag_list skb) does.
> 
> > 
> > Do I read correctly that when the BUG_ON() triggers:
> > 
> > list_skb->len is 125
> > len is 75
> > list_skb->frag_head is true
> >
> 
> yes, it's correct.
> list_skb->len is 125
> gso_size is 75, also the len in the BUG_ON conditon
> list_skb->head_frag is true
>  
> > It looks like skb_segment() is becoming even and ever more complex to
> > cope with unexpected skb layouts, only possibly when skb_segment() is
> > called by some BPF helpers.
> > 
> > Thanks,
> > 
> > Paolo
> 
> I'll wait for more suggestions from others.

In general, agreed with Paolo. Segmentation is getting ever more
complex and the code hard to follow.

Maybe at some point we'll have to bite the bullet and seriously
refactor it. Or at least parts, such as the frag_list handling case.
But that kills any odds of backporting fixes, so not if we can avoid
it.

In particular, changing gso_size on skbs with frag_list is fragile.

Instead of adding another special case, how about just linearizing
sbks after BPF calls adjust_room (at least if called without
BPF_F_ADJ_ROOM_FIXED_GSO) if they have frag_list.
diff mbox series

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f0a9ef1aeaa2..b1dab1b071fc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4556,7 +4556,7 @@  struct sk_buff *skb_segment(struct sk_buff *head_skb,
 		hsize = skb_headlen(head_skb) - offset;
 
 		if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
-		    (skb_headlen(list_skb) == len || sg)) {
+		    (skb_headlen(list_skb) == len)) {
 			BUG_ON(skb_headlen(list_skb) > len);
 
 			nskb = skb_clone(list_skb, GFP_ATOMIC);