diff mbox series

[bpf-next] bpf: fix issue that packet only contains l2 is dropped

Message ID 20221015092448.117563-1-shaozhengchao@huawei.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series [bpf-next] bpf: fix issue that packet only contains l2 is dropped | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 35 this patch: 35
netdev/cc_maintainers success CCed 18 of 18 maintainers
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 35 this patch: 35
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 18 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-4 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-5 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-1 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6 fail Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs_no_alu32 on s390x with gcc

Commit Message

shaozhengchao Oct. 15, 2022, 9:24 a.m. UTC
As [0] see, bpf_prog_test_run_skb() should allow user space to forward
14-bytes packet via BPF_PROG_RUN instead of dropping packet directly.
So fix it.

0: https://github.com/cilium/ebpf/commit/a38fb6b5a46ab3b5639ea4d421232a10013596c0

Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
---
 net/bpf/test_run.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Stanislav Fomichev Oct. 17, 2022, 4:36 p.m. UTC | #1
On Sat, Oct 15, 2022 at 2:16 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
>
> As [0] see, bpf_prog_test_run_skb() should allow user space to forward
> 14-bytes packet via BPF_PROG_RUN instead of dropping packet directly.
> So fix it.
>
> 0: https://github.com/cilium/ebpf/commit/a38fb6b5a46ab3b5639ea4d421232a10013596c0
>
> Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> ---
>  net/bpf/test_run.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 13d578ce2a09..aa1b49f19ca3 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -979,9 +979,6 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
>  {
>         struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
>
> -       if (!skb->len)
> -               return -EINVAL;
> -
>         if (!__skb)
>                 return 0;
>
> @@ -1102,6 +1099,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>         if (IS_ERR(data))
>                 return PTR_ERR(data);
>
> +       if (size == ETH_HLEN)
> +               is_l2 = true;
> +

Don't think this will work? That is_l2 is there to expose proper l2/l3
skb for specific hooks; we can't suddenly start exposing l2 headers to
the hooks that don't expect it.
Does it make sense to start with a small reproducer that triggers the
issue first? We can have a couple of cases for
len=0/ETH_HLEN-1/ETH_HLEN+1 and trigger them from the bpf program that
redirects to different devices (to trigger dev_is_mac_header_xmit).





>         ctx = bpf_ctx_init(kattr, sizeof(struct __sk_buff));
>         if (IS_ERR(ctx)) {
>                 kfree(data);
> --
> 2.17.1
>
shaozhengchao Oct. 20, 2022, 1:46 a.m. UTC | #2
On 2022/10/18 0:36, Stanislav Fomichev wrote:
> On Sat, Oct 15, 2022 at 2:16 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
>>
>> As [0] see, bpf_prog_test_run_skb() should allow user space to forward
>> 14-bytes packet via BPF_PROG_RUN instead of dropping packet directly.
>> So fix it.
>>
>> 0: https://github.com/cilium/ebpf/commit/a38fb6b5a46ab3b5639ea4d421232a10013596c0
>>
>> Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
>> ---
>>   net/bpf/test_run.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
>> index 13d578ce2a09..aa1b49f19ca3 100644
>> --- a/net/bpf/test_run.c
>> +++ b/net/bpf/test_run.c
>> @@ -979,9 +979,6 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
>>   {
>>          struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
>>
>> -       if (!skb->len)
>> -               return -EINVAL;
>> -
>>          if (!__skb)
>>                  return 0;
>>
>> @@ -1102,6 +1099,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>>          if (IS_ERR(data))
>>                  return PTR_ERR(data);
>>
>> +       if (size == ETH_HLEN)
>> +               is_l2 = true;
>> +
> 
> Don't think this will work? That is_l2 is there to expose proper l2/l3
> skb for specific hooks; we can't suddenly start exposing l2 headers to
> the hooks that don't expect it.
> Does it make sense to start with a small reproducer that triggers the
> issue first? We can have a couple of cases for
> len=0/ETH_HLEN-1/ETH_HLEN+1 and trigger them from the bpf program that
> redirects to different devices (to trigger dev_is_mac_header_xmit).
> 
> 
Hi Stanislav:
	Thank you for your review. Is_l2 is the flag of a specific
hook. Therefore, do you mean that if skb->len is equal to 0, just
add the length back?
> 
> 
> 
>>          ctx = bpf_ctx_init(kattr, sizeof(struct __sk_buff));
>>          if (IS_ERR(ctx)) {
>>                  kfree(data);
>> --
>> 2.17.1
>>
Stanislav Fomichev Oct. 20, 2022, 5:45 p.m. UTC | #3
On Wed, Oct 19, 2022 at 6:47 PM shaozhengchao <shaozhengchao@huawei.com> wrote:
>
>
>
> On 2022/10/18 0:36, Stanislav Fomichev wrote:
> > On Sat, Oct 15, 2022 at 2:16 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
> >>
> >> As [0] see, bpf_prog_test_run_skb() should allow user space to forward
> >> 14-bytes packet via BPF_PROG_RUN instead of dropping packet directly.
> >> So fix it.
> >>
> >> 0: https://github.com/cilium/ebpf/commit/a38fb6b5a46ab3b5639ea4d421232a10013596c0
> >>
> >> Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
> >> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> >> ---
> >>   net/bpf/test_run.c | 6 +++---
> >>   1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> >> index 13d578ce2a09..aa1b49f19ca3 100644
> >> --- a/net/bpf/test_run.c
> >> +++ b/net/bpf/test_run.c
> >> @@ -979,9 +979,6 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
> >>   {
> >>          struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
> >>
> >> -       if (!skb->len)
> >> -               return -EINVAL;
> >> -
> >>          if (!__skb)
> >>                  return 0;
> >>
> >> @@ -1102,6 +1099,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
> >>          if (IS_ERR(data))
> >>                  return PTR_ERR(data);
> >>
> >> +       if (size == ETH_HLEN)
> >> +               is_l2 = true;
> >> +
> >
> > Don't think this will work? That is_l2 is there to expose proper l2/l3
> > skb for specific hooks; we can't suddenly start exposing l2 headers to
> > the hooks that don't expect it.
> > Does it make sense to start with a small reproducer that triggers the
> > issue first? We can have a couple of cases for
> > len=0/ETH_HLEN-1/ETH_HLEN+1 and trigger them from the bpf program that
> > redirects to different devices (to trigger dev_is_mac_header_xmit).
> >
> >
> Hi Stanislav:
>         Thank you for your review. Is_l2 is the flag of a specific
> hook. Therefore, do you mean that if skb->len is equal to 0, just
> add the length back?

Not sure I understand your question. All I'm saying is - you can't
flip that flag arbitrarily. This flag depends on the attach point that
you're running the prog against. Some attach points expect packets
with l2, some expect packets without l2.

What about starting with a small reproducer? Does it make sense to
create a small selftest that adds net namespace + fq_codel +
bpf_prog_test run and do redirect ingress/egress with len
0/1...tcphdr? Because I'm not sure I 100% understand whether it's only
len=0 that's problematic or some other combination as well?

> >
> >
> >
> >>          ctx = bpf_ctx_init(kattr, sizeof(struct __sk_buff));
> >>          if (IS_ERR(ctx)) {
> >>                  kfree(data);
> >> --
> >> 2.17.1
> >>
shaozhengchao Oct. 21, 2022, 7:25 a.m. UTC | #4
On 2022/10/21 1:45, Stanislav Fomichev wrote:
> On Wed, Oct 19, 2022 at 6:47 PM shaozhengchao <shaozhengchao@huawei.com> wrote:
>>
>>
>>
>> On 2022/10/18 0:36, Stanislav Fomichev wrote:
>>> On Sat, Oct 15, 2022 at 2:16 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
>>>>
>>>> As [0] see, bpf_prog_test_run_skb() should allow user space to forward
>>>> 14-bytes packet via BPF_PROG_RUN instead of dropping packet directly.
>>>> So fix it.
>>>>
>>>> 0: https://github.com/cilium/ebpf/commit/a38fb6b5a46ab3b5639ea4d421232a10013596c0
>>>>
>>>> Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
>>>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
>>>> ---
>>>>    net/bpf/test_run.c | 6 +++---
>>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
>>>> index 13d578ce2a09..aa1b49f19ca3 100644
>>>> --- a/net/bpf/test_run.c
>>>> +++ b/net/bpf/test_run.c
>>>> @@ -979,9 +979,6 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
>>>>    {
>>>>           struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
>>>>
>>>> -       if (!skb->len)
>>>> -               return -EINVAL;
>>>> -
>>>>           if (!__skb)
>>>>                   return 0;
>>>>
>>>> @@ -1102,6 +1099,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>>>>           if (IS_ERR(data))
>>>>                   return PTR_ERR(data);
>>>>
>>>> +       if (size == ETH_HLEN)
>>>> +               is_l2 = true;
>>>> +
>>>
>>> Don't think this will work? That is_l2 is there to expose proper l2/l3
>>> skb for specific hooks; we can't suddenly start exposing l2 headers to
>>> the hooks that don't expect it.
>>> Does it make sense to start with a small reproducer that triggers the
>>> issue first? We can have a couple of cases for
>>> len=0/ETH_HLEN-1/ETH_HLEN+1 and trigger them from the bpf program that
>>> redirects to different devices (to trigger dev_is_mac_header_xmit).
>>>
>>>
>> Hi Stanislav:
>>          Thank you for your review. Is_l2 is the flag of a specific
>> hook. Therefore, do you mean that if skb->len is equal to 0, just
>> add the length back?
> 
> Not sure I understand your question. All I'm saying is - you can't
> flip that flag arbitrarily. This flag depends on the attach point that
> you're running the prog against. Some attach points expect packets
> with l2, some expect packets without l2.
> 
> What about starting with a small reproducer? Does it make sense to
> create a small selftest that adds net namespace + fq_codel +
> bpf_prog_test run and do redirect ingress/egress with len
> 0/1...tcphdr? Because I'm not sure I 100% understand whether it's only
> len=0 that's problematic or some other combination as well?
> 
yes, only skb->len = 0 will cause null-ptr-deref issue.
The following is the process of triggering the problem:
enqueue a skb:
fq_codel_enqueue()
	...
	idx = fq_codel_classify()        --->if idx != 0
	flow = &q->flows[idx];
	flow_queue_add(flow, skb);       --->add skb to flow[idex]
	q->backlogs[idx] += qdisc_pkt_len(skb); --->backlogs = 0
	...
	fq_codel_drop()                  --->set sch->limit = 0, always
drop packets
		...
		idx = i                  --->becuase backlogs in every
flows is 0, so idx = 0
		...
		flow = &q->flows[idx];   --->get idx=0 flow
		...
		dequeue_head()
			skb = flow->head; --->flow->head = NULL
			flow->head = skb->next; --->cause null-ptr-deref
So, if skb->len !=0,fq_codel_drop() could get the correct idx, and
then skb!=NULL, it will be OK.
Maybe, I will fix it in fq_codel.

But, as I know, skb->len = 0 is just invalid packet. I prefer to add the
length back, like bellow:
	if (is_l2 || !skb->len)
		__skb_push(skb, hh_len);
is it OK?
>>>
>>>
>>>
>>>>           ctx = bpf_ctx_init(kattr, sizeof(struct __sk_buff));
>>>>           if (IS_ERR(ctx)) {
>>>>                   kfree(data);
>>>> --
>>>> 2.17.1
>>>>
Stanislav Fomichev Oct. 21, 2022, 6:16 p.m. UTC | #5
On Fri, Oct 21, 2022 at 12:25 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
>
>
>
> On 2022/10/21 1:45, Stanislav Fomichev wrote:
> > On Wed, Oct 19, 2022 at 6:47 PM shaozhengchao <shaozhengchao@huawei.com> wrote:
> >>
> >>
> >>
> >> On 2022/10/18 0:36, Stanislav Fomichev wrote:
> >>> On Sat, Oct 15, 2022 at 2:16 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
> >>>>
> >>>> As [0] see, bpf_prog_test_run_skb() should allow user space to forward
> >>>> 14-bytes packet via BPF_PROG_RUN instead of dropping packet directly.
> >>>> So fix it.
> >>>>
> >>>> 0: https://github.com/cilium/ebpf/commit/a38fb6b5a46ab3b5639ea4d421232a10013596c0
> >>>>
> >>>> Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
> >>>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> >>>> ---
> >>>>    net/bpf/test_run.c | 6 +++---
> >>>>    1 file changed, 3 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> >>>> index 13d578ce2a09..aa1b49f19ca3 100644
> >>>> --- a/net/bpf/test_run.c
> >>>> +++ b/net/bpf/test_run.c
> >>>> @@ -979,9 +979,6 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
> >>>>    {
> >>>>           struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
> >>>>
> >>>> -       if (!skb->len)
> >>>> -               return -EINVAL;
> >>>> -
> >>>>           if (!__skb)
> >>>>                   return 0;
> >>>>
> >>>> @@ -1102,6 +1099,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
> >>>>           if (IS_ERR(data))
> >>>>                   return PTR_ERR(data);
> >>>>
> >>>> +       if (size == ETH_HLEN)
> >>>> +               is_l2 = true;
> >>>> +
> >>>
> >>> Don't think this will work? That is_l2 is there to expose proper l2/l3
> >>> skb for specific hooks; we can't suddenly start exposing l2 headers to
> >>> the hooks that don't expect it.
> >>> Does it make sense to start with a small reproducer that triggers the
> >>> issue first? We can have a couple of cases for
> >>> len=0/ETH_HLEN-1/ETH_HLEN+1 and trigger them from the bpf program that
> >>> redirects to different devices (to trigger dev_is_mac_header_xmit).
> >>>
> >>>
> >> Hi Stanislav:
> >>          Thank you for your review. Is_l2 is the flag of a specific
> >> hook. Therefore, do you mean that if skb->len is equal to 0, just
> >> add the length back?
> >
> > Not sure I understand your question. All I'm saying is - you can't
> > flip that flag arbitrarily. This flag depends on the attach point that
> > you're running the prog against. Some attach points expect packets
> > with l2, some expect packets without l2.
> >
> > What about starting with a small reproducer? Does it make sense to
> > create a small selftest that adds net namespace + fq_codel +
> > bpf_prog_test run and do redirect ingress/egress with len
> > 0/1...tcphdr? Because I'm not sure I 100% understand whether it's only
> > len=0 that's problematic or some other combination as well?
> >
> yes, only skb->len = 0 will cause null-ptr-deref issue.
> The following is the process of triggering the problem:
> enqueue a skb:
> fq_codel_enqueue()
>         ...
>         idx = fq_codel_classify()        --->if idx != 0
>         flow = &q->flows[idx];
>         flow_queue_add(flow, skb);       --->add skb to flow[idex]
>         q->backlogs[idx] += qdisc_pkt_len(skb); --->backlogs = 0
>         ...
>         fq_codel_drop()                  --->set sch->limit = 0, always
> drop packets
>                 ...
>                 idx = i                  --->becuase backlogs in every
> flows is 0, so idx = 0
>                 ...
>                 flow = &q->flows[idx];   --->get idx=0 flow
>                 ...
>                 dequeue_head()
>                         skb = flow->head; --->flow->head = NULL
>                         flow->head = skb->next; --->cause null-ptr-deref
> So, if skb->len !=0,fq_codel_drop() could get the correct idx, and
> then skb!=NULL, it will be OK.
> Maybe, I will fix it in fq_codel.

I think the consensus here is that the stack, in general, doesn't
expect the packets like this. So there are probably more broken things
besides fq_codel. Thus, it's better if we remove the ability to
generate them from the bpf side instead of fixing the individual users
like fq_codel.

> But, as I know, skb->len = 0 is just invalid packet. I prefer to add the
> length back, like bellow:
>         if (is_l2 || !skb->len)
>                 __skb_push(skb, hh_len);
> is it OK?

Probably not?

Looking at the original syzkaller report, prog_type is
BPF_PROG_TYPE_LWT_XMIT which does expect a packet without l2 header.
Can we do something like:

if (!is_l2 && !skb->len) {
  // append some dummy byte to the skb ?
}


}

> >>>
> >>>
> >>>
> >>>>           ctx = bpf_ctx_init(kattr, sizeof(struct __sk_buff));
> >>>>           if (IS_ERR(ctx)) {
> >>>>                   kfree(data);
> >>>> --
> >>>> 2.17.1
> >>>>
shaozhengchao Oct. 22, 2022, 11:36 a.m. UTC | #6
On 2022/10/22 2:16, Stanislav Fomichev wrote:
> On Fri, Oct 21, 2022 at 12:25 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
>>
>>
>>
>> On 2022/10/21 1:45, Stanislav Fomichev wrote:
>>> On Wed, Oct 19, 2022 at 6:47 PM shaozhengchao <shaozhengchao@huawei.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2022/10/18 0:36, Stanislav Fomichev wrote:
>>>>> On Sat, Oct 15, 2022 at 2:16 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
>>>>>>
>>>>>> As [0] see, bpf_prog_test_run_skb() should allow user space to forward
>>>>>> 14-bytes packet via BPF_PROG_RUN instead of dropping packet directly.
>>>>>> So fix it.
>>>>>>
>>>>>> 0: https://github.com/cilium/ebpf/commit/a38fb6b5a46ab3b5639ea4d421232a10013596c0
>>>>>>
>>>>>> Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
>>>>>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
>>>>>> ---
>>>>>>     net/bpf/test_run.c | 6 +++---
>>>>>>     1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
>>>>>> index 13d578ce2a09..aa1b49f19ca3 100644
>>>>>> --- a/net/bpf/test_run.c
>>>>>> +++ b/net/bpf/test_run.c
>>>>>> @@ -979,9 +979,6 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
>>>>>>     {
>>>>>>            struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
>>>>>>
>>>>>> -       if (!skb->len)
>>>>>> -               return -EINVAL;
>>>>>> -
>>>>>>            if (!__skb)
>>>>>>                    return 0;
>>>>>>
>>>>>> @@ -1102,6 +1099,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>>>>>>            if (IS_ERR(data))
>>>>>>                    return PTR_ERR(data);
>>>>>>
>>>>>> +       if (size == ETH_HLEN)
>>>>>> +               is_l2 = true;
>>>>>> +
>>>>>
>>>>> Don't think this will work? That is_l2 is there to expose proper l2/l3
>>>>> skb for specific hooks; we can't suddenly start exposing l2 headers to
>>>>> the hooks that don't expect it.
>>>>> Does it make sense to start with a small reproducer that triggers the
>>>>> issue first? We can have a couple of cases for
>>>>> len=0/ETH_HLEN-1/ETH_HLEN+1 and trigger them from the bpf program that
>>>>> redirects to different devices (to trigger dev_is_mac_header_xmit).
>>>>>
>>>>>
>>>> Hi Stanislav:
>>>>           Thank you for your review. Is_l2 is the flag of a specific
>>>> hook. Therefore, do you mean that if skb->len is equal to 0, just
>>>> add the length back?
>>>
>>> Not sure I understand your question. All I'm saying is - you can't
>>> flip that flag arbitrarily. This flag depends on the attach point that
>>> you're running the prog against. Some attach points expect packets
>>> with l2, some expect packets without l2.
>>>
>>> What about starting with a small reproducer? Does it make sense to
>>> create a small selftest that adds net namespace + fq_codel +
>>> bpf_prog_test run and do redirect ingress/egress with len
>>> 0/1...tcphdr? Because I'm not sure I 100% understand whether it's only
>>> len=0 that's problematic or some other combination as well?
>>>
>> yes, only skb->len = 0 will cause null-ptr-deref issue.
>> The following is the process of triggering the problem:
>> enqueue a skb:
>> fq_codel_enqueue()
>>          ...
>>          idx = fq_codel_classify()        --->if idx != 0
>>          flow = &q->flows[idx];
>>          flow_queue_add(flow, skb);       --->add skb to flow[idex]
>>          q->backlogs[idx] += qdisc_pkt_len(skb); --->backlogs = 0
>>          ...
>>          fq_codel_drop()                  --->set sch->limit = 0, always
>> drop packets
>>                  ...
>>                  idx = i                  --->becuase backlogs in every
>> flows is 0, so idx = 0
>>                  ...
>>                  flow = &q->flows[idx];   --->get idx=0 flow
>>                  ...
>>                  dequeue_head()
>>                          skb = flow->head; --->flow->head = NULL
>>                          flow->head = skb->next; --->cause null-ptr-deref
>> So, if skb->len !=0,fq_codel_drop() could get the correct idx, and
>> then skb!=NULL, it will be OK.
>> Maybe, I will fix it in fq_codel.
> 
> I think the consensus here is that the stack, in general, doesn't
> expect the packets like this. So there are probably more broken things
> besides fq_codel. Thus, it's better if we remove the ability to
> generate them from the bpf side instead of fixing the individual users
> like fq_codel.
> 
>> But, as I know, skb->len = 0 is just invalid packet. I prefer to add the
>> length back, like bellow:
>>          if (is_l2 || !skb->len)
>>                  __skb_push(skb, hh_len);
>> is it OK?
> 
> Probably not?
> 
> Looking at the original syzkaller report, prog_type is
> BPF_PROG_TYPE_LWT_XMIT which does expect a packet without l2 header.
> Can we do something like:
> 
> if (!is_l2 && !skb->len) {
>    // append some dummy byte to the skb ?
> }
> 
> 
I pad one byte, and test OK.
if (!is_l2 && !skb->len)
     __skb_push(skb, 1);

Does it look OK to you?
> }
> 
>>>>>
>>>>>
>>>>>
>>>>>>            ctx = bpf_ctx_init(kattr, sizeof(struct __sk_buff));
>>>>>>            if (IS_ERR(ctx)) {
>>>>>>                    kfree(data);
>>>>>> --
>>>>>> 2.17.1
>>>>>>
Stanislav Fomichev Oct. 24, 2022, 5:13 p.m. UTC | #7
On Sat, Oct 22, 2022 at 4:36 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
>
>
>
> On 2022/10/22 2:16, Stanislav Fomichev wrote:
> > On Fri, Oct 21, 2022 at 12:25 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
> >>
> >>
> >>
> >> On 2022/10/21 1:45, Stanislav Fomichev wrote:
> >>> On Wed, Oct 19, 2022 at 6:47 PM shaozhengchao <shaozhengchao@huawei.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2022/10/18 0:36, Stanislav Fomichev wrote:
> >>>>> On Sat, Oct 15, 2022 at 2:16 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
> >>>>>>
> >>>>>> As [0] see, bpf_prog_test_run_skb() should allow user space to forward
> >>>>>> 14-bytes packet via BPF_PROG_RUN instead of dropping packet directly.
> >>>>>> So fix it.
> >>>>>>
> >>>>>> 0: https://github.com/cilium/ebpf/commit/a38fb6b5a46ab3b5639ea4d421232a10013596c0
> >>>>>>
> >>>>>> Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
> >>>>>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> >>>>>> ---
> >>>>>>     net/bpf/test_run.c | 6 +++---
> >>>>>>     1 file changed, 3 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> >>>>>> index 13d578ce2a09..aa1b49f19ca3 100644
> >>>>>> --- a/net/bpf/test_run.c
> >>>>>> +++ b/net/bpf/test_run.c
> >>>>>> @@ -979,9 +979,6 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
> >>>>>>     {
> >>>>>>            struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
> >>>>>>
> >>>>>> -       if (!skb->len)
> >>>>>> -               return -EINVAL;
> >>>>>> -
> >>>>>>            if (!__skb)
> >>>>>>                    return 0;
> >>>>>>
> >>>>>> @@ -1102,6 +1099,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
> >>>>>>            if (IS_ERR(data))
> >>>>>>                    return PTR_ERR(data);
> >>>>>>
> >>>>>> +       if (size == ETH_HLEN)
> >>>>>> +               is_l2 = true;
> >>>>>> +
> >>>>>
> >>>>> Don't think this will work? That is_l2 is there to expose proper l2/l3
> >>>>> skb for specific hooks; we can't suddenly start exposing l2 headers to
> >>>>> the hooks that don't expect it.
> >>>>> Does it make sense to start with a small reproducer that triggers the
> >>>>> issue first? We can have a couple of cases for
> >>>>> len=0/ETH_HLEN-1/ETH_HLEN+1 and trigger them from the bpf program that
> >>>>> redirects to different devices (to trigger dev_is_mac_header_xmit).
> >>>>>
> >>>>>
> >>>> Hi Stanislav:
> >>>>           Thank you for your review. Is_l2 is the flag of a specific
> >>>> hook. Therefore, do you mean that if skb->len is equal to 0, just
> >>>> add the length back?
> >>>
> >>> Not sure I understand your question. All I'm saying is - you can't
> >>> flip that flag arbitrarily. This flag depends on the attach point that
> >>> you're running the prog against. Some attach points expect packets
> >>> with l2, some expect packets without l2.
> >>>
> >>> What about starting with a small reproducer? Does it make sense to
> >>> create a small selftest that adds net namespace + fq_codel +
> >>> bpf_prog_test run and do redirect ingress/egress with len
> >>> 0/1...tcphdr? Because I'm not sure I 100% understand whether it's only
> >>> len=0 that's problematic or some other combination as well?
> >>>
> >> yes, only skb->len = 0 will cause null-ptr-deref issue.
> >> The following is the process of triggering the problem:
> >> enqueue a skb:
> >> fq_codel_enqueue()
> >>          ...
> >>          idx = fq_codel_classify()        --->if idx != 0
> >>          flow = &q->flows[idx];
> >>          flow_queue_add(flow, skb);       --->add skb to flow[idex]
> >>          q->backlogs[idx] += qdisc_pkt_len(skb); --->backlogs = 0
> >>          ...
> >>          fq_codel_drop()                  --->set sch->limit = 0, always
> >> drop packets
> >>                  ...
> >>                  idx = i                  --->becuase backlogs in every
> >> flows is 0, so idx = 0
> >>                  ...
> >>                  flow = &q->flows[idx];   --->get idx=0 flow
> >>                  ...
> >>                  dequeue_head()
> >>                          skb = flow->head; --->flow->head = NULL
> >>                          flow->head = skb->next; --->cause null-ptr-deref
> >> So, if skb->len !=0,fq_codel_drop() could get the correct idx, and
> >> then skb!=NULL, it will be OK.
> >> Maybe, I will fix it in fq_codel.
> >
> > I think the consensus here is that the stack, in general, doesn't
> > expect the packets like this. So there are probably more broken things
> > besides fq_codel. Thus, it's better if we remove the ability to
> > generate them from the bpf side instead of fixing the individual users
> > like fq_codel.
> >
> >> But, as I know, skb->len = 0 is just invalid packet. I prefer to add the
> >> length back, like bellow:
> >>          if (is_l2 || !skb->len)
> >>                  __skb_push(skb, hh_len);
> >> is it OK?
> >
> > Probably not?
> >
> > Looking at the original syzkaller report, prog_type is
> > BPF_PROG_TYPE_LWT_XMIT which does expect a packet without l2 header.
> > Can we do something like:
> >
> > if (!is_l2 && !skb->len) {
> >    // append some dummy byte to the skb ?
> > }
> >
> >
> I pad one byte, and test OK.
> if (!is_l2 && !skb->len)
>      __skb_push(skb, 1);
>
> Does it look OK to you?

Nope, this will eat a byte out of the l2 header. We need to skb_put
and make sure we allocate enough to make that skb_put succeed.

But stepping back a bit: it feels like it's all unnecessary? The only
valid use-case of this is probing for the BPF_PROG_TEST_RUN as cilium
does. This is mostly about testing, so fixing it in the users seems
fair? No real production code is expected to generate these zero-len
packets. Or are we concerned that this will leak into stable kernels?

I feel like we are trying to add more complexity here for no apparent reason.
shaozhengchao Oct. 27, 2022, 11:58 a.m. UTC | #8
On 2022/10/25 1:13, Stanislav Fomichev wrote:
> On Sat, Oct 22, 2022 at 4:36 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
>>
>>
>>
>> On 2022/10/22 2:16, Stanislav Fomichev wrote:
>>> On Fri, Oct 21, 2022 at 12:25 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2022/10/21 1:45, Stanislav Fomichev wrote:
>>>>> On Wed, Oct 19, 2022 at 6:47 PM shaozhengchao <shaozhengchao@huawei.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2022/10/18 0:36, Stanislav Fomichev wrote:
>>>>>>> On Sat, Oct 15, 2022 at 2:16 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
>>>>>>>>
>>>>>>>> As [0] see, bpf_prog_test_run_skb() should allow user space to forward
>>>>>>>> 14-bytes packet via BPF_PROG_RUN instead of dropping packet directly.
>>>>>>>> So fix it.
>>>>>>>>
>>>>>>>> 0: https://github.com/cilium/ebpf/commit/a38fb6b5a46ab3b5639ea4d421232a10013596c0
>>>>>>>>
>>>>>>>> Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
>>>>>>>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
>>>>>>>> ---
>>>>>>>>      net/bpf/test_run.c | 6 +++---
>>>>>>>>      1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
>>>>>>>> index 13d578ce2a09..aa1b49f19ca3 100644
>>>>>>>> --- a/net/bpf/test_run.c
>>>>>>>> +++ b/net/bpf/test_run.c
>>>>>>>> @@ -979,9 +979,6 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
>>>>>>>>      {
>>>>>>>>             struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
>>>>>>>>
>>>>>>>> -       if (!skb->len)
>>>>>>>> -               return -EINVAL;
>>>>>>>> -
>>>>>>>>             if (!__skb)
>>>>>>>>                     return 0;
>>>>>>>>
>>>>>>>> @@ -1102,6 +1099,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>>>>>>>>             if (IS_ERR(data))
>>>>>>>>                     return PTR_ERR(data);
>>>>>>>>
>>>>>>>> +       if (size == ETH_HLEN)
>>>>>>>> +               is_l2 = true;
>>>>>>>> +
>>>>>>>
>>>>>>> Don't think this will work? That is_l2 is there to expose proper l2/l3
>>>>>>> skb for specific hooks; we can't suddenly start exposing l2 headers to
>>>>>>> the hooks that don't expect it.
>>>>>>> Does it make sense to start with a small reproducer that triggers the
>>>>>>> issue first? We can have a couple of cases for
>>>>>>> len=0/ETH_HLEN-1/ETH_HLEN+1 and trigger them from the bpf program that
>>>>>>> redirects to different devices (to trigger dev_is_mac_header_xmit).
>>>>>>>
>>>>>>>
>>>>>> Hi Stanislav:
>>>>>>            Thank you for your review. Is_l2 is the flag of a specific
>>>>>> hook. Therefore, do you mean that if skb->len is equal to 0, just
>>>>>> add the length back?
>>>>>
>>>>> Not sure I understand your question. All I'm saying is - you can't
>>>>> flip that flag arbitrarily. This flag depends on the attach point that
>>>>> you're running the prog against. Some attach points expect packets
>>>>> with l2, some expect packets without l2.
>>>>>
>>>>> What about starting with a small reproducer? Does it make sense to
>>>>> create a small selftest that adds net namespace + fq_codel +
>>>>> bpf_prog_test run and do redirect ingress/egress with len
>>>>> 0/1...tcphdr? Because I'm not sure I 100% understand whether it's only
>>>>> len=0 that's problematic or some other combination as well?
>>>>>
>>>> yes, only skb->len = 0 will cause null-ptr-deref issue.
>>>> The following is the process of triggering the problem:
>>>> enqueue a skb:
>>>> fq_codel_enqueue()
>>>>           ...
>>>>           idx = fq_codel_classify()        --->if idx != 0
>>>>           flow = &q->flows[idx];
>>>>           flow_queue_add(flow, skb);       --->add skb to flow[idex]
>>>>           q->backlogs[idx] += qdisc_pkt_len(skb); --->backlogs = 0
>>>>           ...
>>>>           fq_codel_drop()                  --->set sch->limit = 0, always
>>>> drop packets
>>>>                   ...
>>>>                   idx = i                  --->becuase backlogs in every
>>>> flows is 0, so idx = 0
>>>>                   ...
>>>>                   flow = &q->flows[idx];   --->get idx=0 flow
>>>>                   ...
>>>>                   dequeue_head()
>>>>                           skb = flow->head; --->flow->head = NULL
>>>>                           flow->head = skb->next; --->cause null-ptr-deref
>>>> So, if skb->len !=0,fq_codel_drop() could get the correct idx, and
>>>> then skb!=NULL, it will be OK.
>>>> Maybe, I will fix it in fq_codel.
>>>
>>> I think the consensus here is that the stack, in general, doesn't
>>> expect the packets like this. So there are probably more broken things
>>> besides fq_codel. Thus, it's better if we remove the ability to
>>> generate them from the bpf side instead of fixing the individual users
>>> like fq_codel.
>>>
>>>> But, as I know, skb->len = 0 is just invalid packet. I prefer to add the
>>>> length back, like bellow:
>>>>           if (is_l2 || !skb->len)
>>>>                   __skb_push(skb, hh_len);
>>>> is it OK?
>>>
>>> Probably not?
>>>
>>> Looking at the original syzkaller report, prog_type is
>>> BPF_PROG_TYPE_LWT_XMIT which does expect a packet without l2 header.
>>> Can we do something like:
>>>
>>> if (!is_l2 && !skb->len) {
>>>     // append some dummy byte to the skb ?
>>> }
>>>
>>>
>> I pad one byte, and test OK.
>> if (!is_l2 && !skb->len)
>>       __skb_push(skb, 1);
>>
>> Does it look OK to you?
> 
> Nope, this will eat a byte out of the l2 header. We need to skb_put
> and make sure we allocate enough to make that skb_put succeed.
> 
> But stepping back a bit: it feels like it's all unnecessary? The only
> valid use-case of this is probing for the BPF_PROG_TEST_RUN as cilium
> does. This is mostly about testing, so fixing it in the users seems
> fair? No real production code is expected to generate these zero-len
> packets. Or are we concerned that this will leak into stable kernels?
> 
> I feel like we are trying to add more complexity here for no apparent reason.
> 
I agree with you. users should make sure the correct skb len and
configurations are passed into kernel. Incorrect configurations should
be discarded to ensure kernel stability.

Lorenz, Can you modify the user-mode test code?

Zhengchao Shao
Stanislav Fomichev Oct. 27, 2022, 4:37 p.m. UTC | #9
On Thu, Oct 27, 2022 at 4:58 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
>
>
>
> On 2022/10/25 1:13, Stanislav Fomichev wrote:
> > On Sat, Oct 22, 2022 at 4:36 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
> >>
> >>
> >>
> >> On 2022/10/22 2:16, Stanislav Fomichev wrote:
> >>> On Fri, Oct 21, 2022 at 12:25 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2022/10/21 1:45, Stanislav Fomichev wrote:
> >>>>> On Wed, Oct 19, 2022 at 6:47 PM shaozhengchao <shaozhengchao@huawei.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2022/10/18 0:36, Stanislav Fomichev wrote:
> >>>>>>> On Sat, Oct 15, 2022 at 2:16 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
> >>>>>>>>
> >>>>>>>> As [0] see, bpf_prog_test_run_skb() should allow user space to forward
> >>>>>>>> 14-bytes packet via BPF_PROG_RUN instead of dropping packet directly.
> >>>>>>>> So fix it.
> >>>>>>>>
> >>>>>>>> 0: https://github.com/cilium/ebpf/commit/a38fb6b5a46ab3b5639ea4d421232a10013596c0
> >>>>>>>>
> >>>>>>>> Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
> >>>>>>>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> >>>>>>>> ---
> >>>>>>>>      net/bpf/test_run.c | 6 +++---
> >>>>>>>>      1 file changed, 3 insertions(+), 3 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> >>>>>>>> index 13d578ce2a09..aa1b49f19ca3 100644
> >>>>>>>> --- a/net/bpf/test_run.c
> >>>>>>>> +++ b/net/bpf/test_run.c
> >>>>>>>> @@ -979,9 +979,6 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
> >>>>>>>>      {
> >>>>>>>>             struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
> >>>>>>>>
> >>>>>>>> -       if (!skb->len)
> >>>>>>>> -               return -EINVAL;
> >>>>>>>> -
> >>>>>>>>             if (!__skb)
> >>>>>>>>                     return 0;
> >>>>>>>>
> >>>>>>>> @@ -1102,6 +1099,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
> >>>>>>>>             if (IS_ERR(data))
> >>>>>>>>                     return PTR_ERR(data);
> >>>>>>>>
> >>>>>>>> +       if (size == ETH_HLEN)
> >>>>>>>> +               is_l2 = true;
> >>>>>>>> +
> >>>>>>>
> >>>>>>> Don't think this will work? That is_l2 is there to expose proper l2/l3
> >>>>>>> skb for specific hooks; we can't suddenly start exposing l2 headers to
> >>>>>>> the hooks that don't expect it.
> >>>>>>> Does it make sense to start with a small reproducer that triggers the
> >>>>>>> issue first? We can have a couple of cases for
> >>>>>>> len=0/ETH_HLEN-1/ETH_HLEN+1 and trigger them from the bpf program that
> >>>>>>> redirects to different devices (to trigger dev_is_mac_header_xmit).
> >>>>>>>
> >>>>>>>
> >>>>>> Hi Stanislav:
> >>>>>>            Thank you for your review. Is_l2 is the flag of a specific
> >>>>>> hook. Therefore, do you mean that if skb->len is equal to 0, just
> >>>>>> add the length back?
> >>>>>
> >>>>> Not sure I understand your question. All I'm saying is - you can't
> >>>>> flip that flag arbitrarily. This flag depends on the attach point that
> >>>>> you're running the prog against. Some attach points expect packets
> >>>>> with l2, some expect packets without l2.
> >>>>>
> >>>>> What about starting with a small reproducer? Does it make sense to
> >>>>> create a small selftest that adds net namespace + fq_codel +
> >>>>> bpf_prog_test run and do redirect ingress/egress with len
> >>>>> 0/1...tcphdr? Because I'm not sure I 100% understand whether it's only
> >>>>> len=0 that's problematic or some other combination as well?
> >>>>>
> >>>> yes, only skb->len = 0 will cause null-ptr-deref issue.
> >>>> The following is the process of triggering the problem:
> >>>> enqueue a skb:
> >>>> fq_codel_enqueue()
> >>>>           ...
> >>>>           idx = fq_codel_classify()        --->if idx != 0
> >>>>           flow = &q->flows[idx];
> >>>>           flow_queue_add(flow, skb);       --->add skb to flow[idex]
> >>>>           q->backlogs[idx] += qdisc_pkt_len(skb); --->backlogs = 0
> >>>>           ...
> >>>>           fq_codel_drop()                  --->set sch->limit = 0, always
> >>>> drop packets
> >>>>                   ...
> >>>>                   idx = i                  --->becuase backlogs in every
> >>>> flows is 0, so idx = 0
> >>>>                   ...
> >>>>                   flow = &q->flows[idx];   --->get idx=0 flow
> >>>>                   ...
> >>>>                   dequeue_head()
> >>>>                           skb = flow->head; --->flow->head = NULL
> >>>>                           flow->head = skb->next; --->cause null-ptr-deref
> >>>> So, if skb->len !=0,fq_codel_drop() could get the correct idx, and
> >>>> then skb!=NULL, it will be OK.
> >>>> Maybe, I will fix it in fq_codel.
> >>>
> >>> I think the consensus here is that the stack, in general, doesn't
> >>> expect the packets like this. So there are probably more broken things
> >>> besides fq_codel. Thus, it's better if we remove the ability to
> >>> generate them from the bpf side instead of fixing the individual users
> >>> like fq_codel.
> >>>
> >>>> But, as I know, skb->len = 0 is just invalid packet. I prefer to add the
> >>>> length back, like bellow:
> >>>>           if (is_l2 || !skb->len)
> >>>>                   __skb_push(skb, hh_len);
> >>>> is it OK?
> >>>
> >>> Probably not?
> >>>
> >>> Looking at the original syzkaller report, prog_type is
> >>> BPF_PROG_TYPE_LWT_XMIT which does expect a packet without l2 header.
> >>> Can we do something like:
> >>>
> >>> if (!is_l2 && !skb->len) {
> >>>     // append some dummy byte to the skb ?
> >>> }
> >>>
> >>>
> >> I pad one byte, and test OK.
> >> if (!is_l2 && !skb->len)
> >>       __skb_push(skb, 1);
> >>
> >> Does it look OK to you?
> >
> > Nope, this will eat a byte out of the l2 header. We need to skb_put
> > and make sure we allocate enough to make that skb_put succeed.
> >
> > But stepping back a bit: it feels like it's all unnecessary? The only
> > valid use-case of this is probing for the BPF_PROG_TEST_RUN as cilium
> > does. This is mostly about testing, so fixing it in the users seems
> > fair? No real production code is expected to generate these zero-len
> > packets. Or are we concerned that this will leak into stable kernels?
> >
> > I feel like we are trying to add more complexity here for no apparent reason.
> >
> I agree with you. users should make sure the correct skb len and
> configurations are passed into kernel. Incorrect configurations should
> be discarded to ensure kernel stability.
>
> Lorenz, Can you modify the user-mode test code?

Lorenz already fixed it for Cilium. I think the discussion here is
around other potential users out there.
Let's wait for them to appear if it is indeed a problem?
shaozhengchao Oct. 28, 2022, 1:23 a.m. UTC | #10
On 2022/10/28 0:37, Stanislav Fomichev wrote:
> On Thu, Oct 27, 2022 at 4:58 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
>>
>>
>>
>> On 2022/10/25 1:13, Stanislav Fomichev wrote:
>>> On Sat, Oct 22, 2022 at 4:36 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2022/10/22 2:16, Stanislav Fomichev wrote:
>>>>> On Fri, Oct 21, 2022 at 12:25 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2022/10/21 1:45, Stanislav Fomichev wrote:
>>>>>>> On Wed, Oct 19, 2022 at 6:47 PM shaozhengchao <shaozhengchao@huawei.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2022/10/18 0:36, Stanislav Fomichev wrote:
>>>>>>>>> On Sat, Oct 15, 2022 at 2:16 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
>>>>>>>>>>
>>>>>>>>>> As [0] see, bpf_prog_test_run_skb() should allow user space to forward
>>>>>>>>>> 14-bytes packet via BPF_PROG_RUN instead of dropping packet directly.
>>>>>>>>>> So fix it.
>>>>>>>>>>
>>>>>>>>>> 0: https://github.com/cilium/ebpf/commit/a38fb6b5a46ab3b5639ea4d421232a10013596c0
>>>>>>>>>>
>>>>>>>>>> Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
>>>>>>>>>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
>>>>>>>>>> ---
>>>>>>>>>>       net/bpf/test_run.c | 6 +++---
>>>>>>>>>>       1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
>>>>>>>>>> index 13d578ce2a09..aa1b49f19ca3 100644
>>>>>>>>>> --- a/net/bpf/test_run.c
>>>>>>>>>> +++ b/net/bpf/test_run.c
>>>>>>>>>> @@ -979,9 +979,6 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
>>>>>>>>>>       {
>>>>>>>>>>              struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
>>>>>>>>>>
>>>>>>>>>> -       if (!skb->len)
>>>>>>>>>> -               return -EINVAL;
>>>>>>>>>> -
>>>>>>>>>>              if (!__skb)
>>>>>>>>>>                      return 0;
>>>>>>>>>>
>>>>>>>>>> @@ -1102,6 +1099,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>>>>>>>>>>              if (IS_ERR(data))
>>>>>>>>>>                      return PTR_ERR(data);
>>>>>>>>>>
>>>>>>>>>> +       if (size == ETH_HLEN)
>>>>>>>>>> +               is_l2 = true;
>>>>>>>>>> +
>>>>>>>>>
>>>>>>>>> Don't think this will work? That is_l2 is there to expose proper l2/l3
>>>>>>>>> skb for specific hooks; we can't suddenly start exposing l2 headers to
>>>>>>>>> the hooks that don't expect it.
>>>>>>>>> Does it make sense to start with a small reproducer that triggers the
>>>>>>>>> issue first? We can have a couple of cases for
>>>>>>>>> len=0/ETH_HLEN-1/ETH_HLEN+1 and trigger them from the bpf program that
>>>>>>>>> redirects to different devices (to trigger dev_is_mac_header_xmit).
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Hi Stanislav:
>>>>>>>>             Thank you for your review. Is_l2 is the flag of a specific
>>>>>>>> hook. Therefore, do you mean that if skb->len is equal to 0, just
>>>>>>>> add the length back?
>>>>>>>
>>>>>>> Not sure I understand your question. All I'm saying is - you can't
>>>>>>> flip that flag arbitrarily. This flag depends on the attach point that
>>>>>>> you're running the prog against. Some attach points expect packets
>>>>>>> with l2, some expect packets without l2.
>>>>>>>
>>>>>>> What about starting with a small reproducer? Does it make sense to
>>>>>>> create a small selftest that adds net namespace + fq_codel +
>>>>>>> bpf_prog_test run and do redirect ingress/egress with len
>>>>>>> 0/1...tcphdr? Because I'm not sure I 100% understand whether it's only
>>>>>>> len=0 that's problematic or some other combination as well?
>>>>>>>
>>>>>> yes, only skb->len = 0 will cause null-ptr-deref issue.
>>>>>> The following is the process of triggering the problem:
>>>>>> enqueue a skb:
>>>>>> fq_codel_enqueue()
>>>>>>            ...
>>>>>>            idx = fq_codel_classify()        --->if idx != 0
>>>>>>            flow = &q->flows[idx];
>>>>>>            flow_queue_add(flow, skb);       --->add skb to flow[idex]
>>>>>>            q->backlogs[idx] += qdisc_pkt_len(skb); --->backlogs = 0
>>>>>>            ...
>>>>>>            fq_codel_drop()                  --->set sch->limit = 0, always
>>>>>> drop packets
>>>>>>                    ...
>>>>>>                    idx = i                  --->becuase backlogs in every
>>>>>> flows is 0, so idx = 0
>>>>>>                    ...
>>>>>>                    flow = &q->flows[idx];   --->get idx=0 flow
>>>>>>                    ...
>>>>>>                    dequeue_head()
>>>>>>                            skb = flow->head; --->flow->head = NULL
>>>>>>                            flow->head = skb->next; --->cause null-ptr-deref
>>>>>> So, if skb->len !=0,fq_codel_drop() could get the correct idx, and
>>>>>> then skb!=NULL, it will be OK.
>>>>>> Maybe, I will fix it in fq_codel.
>>>>>
>>>>> I think the consensus here is that the stack, in general, doesn't
>>>>> expect the packets like this. So there are probably more broken things
>>>>> besides fq_codel. Thus, it's better if we remove the ability to
>>>>> generate them from the bpf side instead of fixing the individual users
>>>>> like fq_codel.
>>>>>
>>>>>> But, as I know, skb->len = 0 is just invalid packet. I prefer to add the
>>>>>> length back, like bellow:
>>>>>>            if (is_l2 || !skb->len)
>>>>>>                    __skb_push(skb, hh_len);
>>>>>> is it OK?
>>>>>
>>>>> Probably not?
>>>>>
>>>>> Looking at the original syzkaller report, prog_type is
>>>>> BPF_PROG_TYPE_LWT_XMIT which does expect a packet without l2 header.
>>>>> Can we do something like:
>>>>>
>>>>> if (!is_l2 && !skb->len) {
>>>>>      // append some dummy byte to the skb ?
>>>>> }
>>>>>
>>>>>
>>>> I pad one byte, and test OK.
>>>> if (!is_l2 && !skb->len)
>>>>        __skb_push(skb, 1);
>>>>
>>>> Does it look OK to you?
>>>
>>> Nope, this will eat a byte out of the l2 header. We need to skb_put
>>> and make sure we allocate enough to make that skb_put succeed.
>>>
>>> But stepping back a bit: it feels like it's all unnecessary? The only
>>> valid use-case of this is probing for the BPF_PROG_TEST_RUN as cilium
>>> does. This is mostly about testing, so fixing it in the users seems
>>> fair? No real production code is expected to generate these zero-len
>>> packets. Or are we concerned that this will leak into stable kernels?
>>>
>>> I feel like we are trying to add more complexity here for no apparent reason.
>>>
>> I agree with you. users should make sure the correct skb len and
>> configurations are passed into kernel. Incorrect configurations should
>> be discarded to ensure kernel stability.
>>
>> Lorenz, Can you modify the user-mode test code?
> 
> Lorenz already fixed it for Cilium. I think the discussion here is
> around other potential users out there.
> Let's wait for them to appear if it is indeed a problem?

Currently, Lorenz only does the workaround. Maybe Lorenz still expects 
to change the value back to 14 bytes?
diff mbox series

Patch

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 13d578ce2a09..aa1b49f19ca3 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -979,9 +979,6 @@  static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
 {
 	struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
 
-	if (!skb->len)
-		return -EINVAL;
-
 	if (!__skb)
 		return 0;
 
@@ -1102,6 +1099,9 @@  int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
+	if (size == ETH_HLEN)
+		is_l2 = true;
+
 	ctx = bpf_ctx_init(kattr, sizeof(struct __sk_buff));
 	if (IS_ERR(ctx)) {
 		kfree(data);