[RFC,bpf-next,1/2] bpf: Add bpf_current_capable kfunc

Message ID	20230814143341.3767-2-laoar.shao@gmail.com (mailing list archive)
State	RFC
Delegated to:	BPF
Headers	show Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 197FBBA34 for <bpf@vger.kernel.org>; Mon, 14 Aug 2023 14:33:57 +0000 (UTC) From: Yafang Shao <laoar.shao@gmail.com> To: ast@kernel.org, daniel@iogearbox.net, john.fastabend@gmail.com, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yonghong.song@linux.dev, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Cc: bpf@vger.kernel.org, Yafang Shao <laoar.shao@gmail.com> Subject: [RFC PATCH bpf-next 1/2] bpf: Add bpf_current_capable kfunc Date: Mon, 14 Aug 2023 14:33:40 +0000 Message-Id: <20230814143341.3767-2-laoar.shao@gmail.com> In-Reply-To: <20230814143341.3767-1-laoar.shao@gmail.com> References: <20230814143341.3767-1-laoar.shao@gmail.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	bpf: Add a new kfunc bpf_current_capable \| expand [RFC,bpf-next,0/2] bpf: Add a new kfunc bpf_current_capable [RFC,bpf-next,1/2] bpf: Add bpf_current_capable kfunc [RFC,bpf-next,2/2] selftests/bpf: Add selftest for bpf_current_capable

Context	Check	Description
bpf/vmtest-bpf-next-VM_Test-1	success	Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2	success	Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3	success	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-4	success	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5	success	Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6	success	Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-20	success	Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21	success	Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25	success	Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-27	success	Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28	success	Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29	success	Logs for veristat
netdev/series_format	success	Posting correctly formatted
netdev/tree_selection	success	Clearly marked for bpf-next, async
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	fail	Errors and warnings before: 1370 this patch: 1371
netdev/cc_maintainers	success	CCed 12 of 12 maintainers
netdev/build_clang	success	Errors and warnings before: 1353 this patch: 1353
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	fail	Errors and warnings before: 1393 this patch: 1394
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 18 lines checked
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-7	success	Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9	success	Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10	success	Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-11	success	Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-13	success	Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14	success	Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15	success	Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-16	success	Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17	success	Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18	success	Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19	success	Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-22	success	Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23	success	Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24	success	Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-26	success	Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-PR	success	PR summary
bpf/vmtest-bpf-next-VM_Test-8	success	Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12	success	Logs for test_progs on s390x with gcc

Yafang Shao Aug. 14, 2023, 2:33 p.m. UTC

Add a new bpf_current_capable kfunc to check whether the current task
has a specific capability. In our use case, we will use it in a lsm bpf
program to help identify if the user operation is permitted.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/helpers.c | 6 ++++++
 1 file changed, 6 insertions(+)

Yonghong Song Aug. 15, 2023, 12:28 a.m. UTC | #1

On 8/14/23 7:33 AM, Yafang Shao wrote:
> Add a new bpf_current_capable kfunc to check whether the current task
> has a specific capability. In our use case, we will use it in a lsm bpf
> program to help identify if the user operation is permitted.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>   kernel/bpf/helpers.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index eb91cae..bbee7ea 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
>   	rcu_read_unlock();
>   }
>   
> +__bpf_kfunc bool bpf_current_capable(int cap)
> +{
> +	return has_capability(current, cap);
> +}

Since you are testing against 'current' capabilities, I assume
that the context should be process. Otherwise, you are testing
against random task which does not make much sense.

Since you are testing against 'current' cap, and if the capability
for that task is stable, you do not need this kfunc.
You can test cap in user space and pass it into the bpf program.

But if the cap for your process may change in the middle of
run, then you indeed need bpf prog to test capability in real time.
Is this your use case and could you describe in in more detail?

> +
>   __diag_pop();
>   
>   BTF_SET8_START(generic_btf_ids)
> @@ -2456,6 +2461,7 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
>   BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
>   #endif
>   BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_current_capable, KF_RCU)
>   BTF_SET8_END(generic_btf_ids)
>   
>   static const struct btf_kfunc_id_set generic_kfunc_set = {

Yafang Shao Aug. 15, 2023, 2:45 a.m. UTC | #2

On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 8/14/23 7:33 AM, Yafang Shao wrote:
> > Add a new bpf_current_capable kfunc to check whether the current task
> > has a specific capability. In our use case, we will use it in a lsm bpf
> > program to help identify if the user operation is permitted.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >   kernel/bpf/helpers.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index eb91cae..bbee7ea 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
> >       rcu_read_unlock();
> >   }
> >
> > +__bpf_kfunc bool bpf_current_capable(int cap)
> > +{
> > +     return has_capability(current, cap);
> > +}
>
> Since you are testing against 'current' capabilities, I assume
> that the context should be process. Otherwise, you are testing
> against random task which does not make much sense.

It is in the process context.

>
> Since you are testing against 'current' cap, and if the capability
> for that task is stable, you do not need this kfunc.
> You can test cap in user space and pass it into the bpf program.
>
> But if the cap for your process may change in the middle of
> run, then you indeed need bpf prog to test capability in real time.
> Is this your use case and could you describe in in more detail?

After we convert the capability of our networking bpf program from
CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we
encountered the "pointer comparison prohibited" error, because
allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if
we enable the CAP_PERFMON for the networking bpf program, it can run
tracing bpf prog, perf_event bpf prog and etc, that is not expected by
us.

Hence we are planning to use a lsm bpf program to disallow it from
running other bpf programs. In our lsm bpf program we will check the
capability of processes, if the process has cap_net_admin, cap_bpf and
cap_perfmon but don't have cap_sys_admin we will refuse it to run
tracing and perf_event bpf program. While if a process has  cap_bpf
and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf
program which wants to run trace bpf, we will allow it.

We can't use lsm_cgroup because it is supported on cgroup2 only, while
we're still using cgroup1.

Another possible solution is enable allow_ptr_leaks for cap_net_admin
as well, but after I checked the commit which introduces the cap_bpf
and cap_perfmon [1], I think we wouldn't like to do it.

[1]. https://lore.kernel.org/bpf/20200513230355.7858-3-alexei.starovoitov@gmail.com/

Yonghong Song Aug. 15, 2023, 3:40 a.m. UTC | #3

On 8/14/23 7:45 PM, Yafang Shao wrote:
> On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>
>>
>> On 8/14/23 7:33 AM, Yafang Shao wrote:
>>> Add a new bpf_current_capable kfunc to check whether the current task
>>> has a specific capability. In our use case, we will use it in a lsm bpf
>>> program to help identify if the user operation is permitted.
>>>
>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>>> ---
>>>    kernel/bpf/helpers.c | 6 ++++++
>>>    1 file changed, 6 insertions(+)
>>>
>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>> index eb91cae..bbee7ea 100644
>>> --- a/kernel/bpf/helpers.c
>>> +++ b/kernel/bpf/helpers.c
>>> @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
>>>        rcu_read_unlock();
>>>    }
>>>
>>> +__bpf_kfunc bool bpf_current_capable(int cap)
>>> +{
>>> +     return has_capability(current, cap);
>>> +}
>>
>> Since you are testing against 'current' capabilities, I assume
>> that the context should be process. Otherwise, you are testing
>> against random task which does not make much sense.
> 
> It is in the process context.
> 
>>
>> Since you are testing against 'current' cap, and if the capability
>> for that task is stable, you do not need this kfunc.
>> You can test cap in user space and pass it into the bpf program.
>>
>> But if the cap for your process may change in the middle of
>> run, then you indeed need bpf prog to test capability in real time.
>> Is this your use case and could you describe in in more detail?
> 
> After we convert the capability of our networking bpf program from
> CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we
> encountered the "pointer comparison prohibited" error, because

Could you give a reproducible test case for this verifier failure
with CAP_BPF+CAP_NET_ADMIN capability? Is this due to packet pointer
or something else? Maybe verifier needs to be improved in these
cases without violating the promise of allow_ptr_leaks?

> allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if
> we enable the CAP_PERFMON for the networking bpf program, it can run
> tracing bpf prog, perf_event bpf prog and etc, that is not expected by
> us.
> 
> Hence we are planning to use a lsm bpf program to disallow it from
> running other bpf programs. In our lsm bpf program we will check the
> capability of processes, if the process has cap_net_admin, cap_bpf and
> cap_perfmon but don't have cap_sys_admin we will refuse it to run
> tracing and perf_event bpf program. While if a process has  cap_bpf
> and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf
> program which wants to run trace bpf, we will allow it.
> 
> We can't use lsm_cgroup because it is supported on cgroup2 only, while
> we're still using cgroup1.
> 
> Another possible solution is enable allow_ptr_leaks for cap_net_admin
> as well, but after I checked the commit which introduces the cap_bpf
> and cap_perfmon [1], I think we wouldn't like to do it.
> 
> [1]. https://lore.kernel.org/bpf/20200513230355.7858-3-alexei.starovoitov@gmail.com/

Yafang Shao Aug. 15, 2023, 5:49 a.m. UTC | #4

On Tue, Aug 15, 2023 at 11:40 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 8/14/23 7:45 PM, Yafang Shao wrote:
> > On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>
> >>
> >>
> >> On 8/14/23 7:33 AM, Yafang Shao wrote:
> >>> Add a new bpf_current_capable kfunc to check whether the current task
> >>> has a specific capability. In our use case, we will use it in a lsm bpf
> >>> program to help identify if the user operation is permitted.
> >>>
> >>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> >>> ---
> >>>    kernel/bpf/helpers.c | 6 ++++++
> >>>    1 file changed, 6 insertions(+)
> >>>
> >>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >>> index eb91cae..bbee7ea 100644
> >>> --- a/kernel/bpf/helpers.c
> >>> +++ b/kernel/bpf/helpers.c
> >>> @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
> >>>        rcu_read_unlock();
> >>>    }
> >>>
> >>> +__bpf_kfunc bool bpf_current_capable(int cap)
> >>> +{
> >>> +     return has_capability(current, cap);
> >>> +}
> >>
> >> Since you are testing against 'current' capabilities, I assume
> >> that the context should be process. Otherwise, you are testing
> >> against random task which does not make much sense.
> >
> > It is in the process context.
> >
> >>
> >> Since you are testing against 'current' cap, and if the capability
> >> for that task is stable, you do not need this kfunc.
> >> You can test cap in user space and pass it into the bpf program.
> >>
> >> But if the cap for your process may change in the middle of
> >> run, then you indeed need bpf prog to test capability in real time.
> >> Is this your use case and could you describe in in more detail?
> >
> > After we convert the capability of our networking bpf program from
> > CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we
> > encountered the "pointer comparison prohibited" error, because
>
> Could you give a reproducible test case for this verifier failure
> with CAP_BPF+CAP_NET_ADMIN capability? Is this due to packet pointer
> or something else? Maybe verifier needs to be improved in these
> cases without violating the promise of allow_ptr_leaks?

Here it is.

SEC("cls-ingress")
int ingress(struct __sk_buff *skb)
{
        struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr);

        if ((long)(iph + 1) > (long)skb->data_end)
                return TC_ACT_STOLEN;
        return TC_ACT_OK;
}

In this bpf prog, it will compare the pointer iph with skb->data_end,
and thus it will fail the verifier.

Yonghong Song Aug. 15, 2023, 3:19 p.m. UTC | #5

On 8/14/23 10:49 PM, Yafang Shao wrote:
> On Tue, Aug 15, 2023 at 11:40 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>
>>
>> On 8/14/23 7:45 PM, Yafang Shao wrote:
>>> On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>>
>>>>
>>>>
>>>> On 8/14/23 7:33 AM, Yafang Shao wrote:
>>>>> Add a new bpf_current_capable kfunc to check whether the current task
>>>>> has a specific capability. In our use case, we will use it in a lsm bpf
>>>>> program to help identify if the user operation is permitted.
>>>>>
>>>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>>>>> ---
>>>>>     kernel/bpf/helpers.c | 6 ++++++
>>>>>     1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>>>> index eb91cae..bbee7ea 100644
>>>>> --- a/kernel/bpf/helpers.c
>>>>> +++ b/kernel/bpf/helpers.c
>>>>> @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
>>>>>         rcu_read_unlock();
>>>>>     }
>>>>>
>>>>> +__bpf_kfunc bool bpf_current_capable(int cap)
>>>>> +{
>>>>> +     return has_capability(current, cap);
>>>>> +}
>>>>
>>>> Since you are testing against 'current' capabilities, I assume
>>>> that the context should be process. Otherwise, you are testing
>>>> against random task which does not make much sense.
>>>
>>> It is in the process context.
>>>
>>>>
>>>> Since you are testing against 'current' cap, and if the capability
>>>> for that task is stable, you do not need this kfunc.
>>>> You can test cap in user space and pass it into the bpf program.
>>>>
>>>> But if the cap for your process may change in the middle of
>>>> run, then you indeed need bpf prog to test capability in real time.
>>>> Is this your use case and could you describe in in more detail?
>>>
>>> After we convert the capability of our networking bpf program from
>>> CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we
>>> encountered the "pointer comparison prohibited" error, because
>>
>> Could you give a reproducible test case for this verifier failure
>> with CAP_BPF+CAP_NET_ADMIN capability? Is this due to packet pointer
>> or something else? Maybe verifier needs to be improved in these
>> cases without violating the promise of allow_ptr_leaks?
> 
> Here it is.
> 
> SEC("cls-ingress")
> int ingress(struct __sk_buff *skb)
> {
>          struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr);
> 
>          if ((long)(iph + 1) > (long)skb->data_end)
>                  return TC_ACT_STOLEN;
>          return TC_ACT_OK;
> }
> 
> In this bpf prog, it will compare the pointer iph with skb->data_end,
> and thus it will fail the verifier.

Thanks. In this particular case, I think comparing packet ptr and 
data_end should not be considered as ptr_leaks. Probably Alexei
and Daniel can comment on this too.

Alexei Starovoitov Aug. 17, 2023, 1:53 a.m. UTC | #6

On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> >
> >
> >
> > On 8/14/23 7:33 AM, Yafang Shao wrote:
> > > Add a new bpf_current_capable kfunc to check whether the current task
> > > has a specific capability. In our use case, we will use it in a lsm bpf
> > > program to help identify if the user operation is permitted.
> > >
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > ---
> > >   kernel/bpf/helpers.c | 6 ++++++
> > >   1 file changed, 6 insertions(+)
> > >
> > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > index eb91cae..bbee7ea 100644
> > > --- a/kernel/bpf/helpers.c
> > > +++ b/kernel/bpf/helpers.c
> > > @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
> > >       rcu_read_unlock();
> > >   }
> > >
> > > +__bpf_kfunc bool bpf_current_capable(int cap)
> > > +{
> > > +     return has_capability(current, cap);
> > > +}
> >
> > Since you are testing against 'current' capabilities, I assume
> > that the context should be process. Otherwise, you are testing
> > against random task which does not make much sense.
>
> It is in the process context.
>
> >
> > Since you are testing against 'current' cap, and if the capability
> > for that task is stable, you do not need this kfunc.
> > You can test cap in user space and pass it into the bpf program.
> >
> > But if the cap for your process may change in the middle of
> > run, then you indeed need bpf prog to test capability in real time.
> > Is this your use case and could you describe in in more detail?
>
> After we convert the capability of our networking bpf program from
> CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we
> encountered the "pointer comparison prohibited" error, because
> allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if
> we enable the CAP_PERFMON for the networking bpf program, it can run
> tracing bpf prog, perf_event bpf prog and etc, that is not expected by
> us.
>
> Hence we are planning to use a lsm bpf program to disallow it from
> running other bpf programs. In our lsm bpf program we will check the
> capability of processes, if the process has cap_net_admin, cap_bpf and
> cap_perfmon but don't have cap_sys_admin we will refuse it to run
> tracing and perf_event bpf program. While if a process has  cap_bpf
> and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf
> program which wants to run trace bpf, we will allow it.
>
> We can't use lsm_cgroup because it is supported on cgroup2 only, while
> we're still using cgroup1.
>
> Another possible solution is enable allow_ptr_leaks for cap_net_admin
> as well, but after I checked the commit which introduces the cap_bpf
> and cap_perfmon [1], I think we wouldn't like to do it.

Sorry. None of these options are acceptable.

The idea of introducing a bpf_current_capable() kfunc just to work
around a deficiency in the verifier is not sound.

Next time instead of sending patch please describe the issue first.
You should have started with an email that:
        struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr);

        if ((long)(iph + 1) > (long)skb->data_end)
needs cap_perfmon.

Yafang Shao Aug. 17, 2023, 2:30 a.m. UTC | #7

On Thu, Aug 17, 2023 at 9:54 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> > >
> > >
> > >
> > > On 8/14/23 7:33 AM, Yafang Shao wrote:
> > > > Add a new bpf_current_capable kfunc to check whether the current task
> > > > has a specific capability. In our use case, we will use it in a lsm bpf
> > > > program to help identify if the user operation is permitted.
> > > >
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > ---
> > > >   kernel/bpf/helpers.c | 6 ++++++
> > > >   1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > index eb91cae..bbee7ea 100644
> > > > --- a/kernel/bpf/helpers.c
> > > > +++ b/kernel/bpf/helpers.c
> > > > @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
> > > >       rcu_read_unlock();
> > > >   }
> > > >
> > > > +__bpf_kfunc bool bpf_current_capable(int cap)
> > > > +{
> > > > +     return has_capability(current, cap);
> > > > +}
> > >
> > > Since you are testing against 'current' capabilities, I assume
> > > that the context should be process. Otherwise, you are testing
> > > against random task which does not make much sense.
> >
> > It is in the process context.
> >
> > >
> > > Since you are testing against 'current' cap, and if the capability
> > > for that task is stable, you do not need this kfunc.
> > > You can test cap in user space and pass it into the bpf program.
> > >
> > > But if the cap for your process may change in the middle of
> > > run, then you indeed need bpf prog to test capability in real time.
> > > Is this your use case and could you describe in in more detail?
> >
> > After we convert the capability of our networking bpf program from
> > CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we
> > encountered the "pointer comparison prohibited" error, because
> > allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if
> > we enable the CAP_PERFMON for the networking bpf program, it can run
> > tracing bpf prog, perf_event bpf prog and etc, that is not expected by
> > us.
> >
> > Hence we are planning to use a lsm bpf program to disallow it from
> > running other bpf programs. In our lsm bpf program we will check the
> > capability of processes, if the process has cap_net_admin, cap_bpf and
> > cap_perfmon but don't have cap_sys_admin we will refuse it to run
> > tracing and perf_event bpf program. While if a process has  cap_bpf
> > and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf
> > program which wants to run trace bpf, we will allow it.
> >
> > We can't use lsm_cgroup because it is supported on cgroup2 only, while
> > we're still using cgroup1.
> >
> > Another possible solution is enable allow_ptr_leaks for cap_net_admin
> > as well, but after I checked the commit which introduces the cap_bpf
> > and cap_perfmon [1], I think we wouldn't like to do it.
>
> Sorry. None of these options are acceptable.
>
> The idea of introducing a bpf_current_capable() kfunc just to work
> around a deficiency in the verifier is not sound.

So what should we do then?
Just enabling the cap_perfmon for it? That does not sound as well ...

>
> Next time instead of sending patch please describe the issue first.
> You should have started with an email that:
>         struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr);
>
>         if ((long)(iph + 1) > (long)skb->data_end)
> needs cap_perfmon.

Got it.  Thanks for your explanation.

Alexei Starovoitov Aug. 17, 2023, 3:30 a.m. UTC | #8

On Wed, Aug 16, 2023 at 7:31 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Thu, Aug 17, 2023 at 9:54 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> > > >
> > > >
> > > >
> > > > On 8/14/23 7:33 AM, Yafang Shao wrote:
> > > > > Add a new bpf_current_capable kfunc to check whether the current task
> > > > > has a specific capability. In our use case, we will use it in a lsm bpf
> > > > > program to help identify if the user operation is permitted.
> > > > >
> > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > > ---
> > > > >   kernel/bpf/helpers.c | 6 ++++++
> > > > >   1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > > index eb91cae..bbee7ea 100644
> > > > > --- a/kernel/bpf/helpers.c
> > > > > +++ b/kernel/bpf/helpers.c
> > > > > @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
> > > > >       rcu_read_unlock();
> > > > >   }
> > > > >
> > > > > +__bpf_kfunc bool bpf_current_capable(int cap)
> > > > > +{
> > > > > +     return has_capability(current, cap);
> > > > > +}
> > > >
> > > > Since you are testing against 'current' capabilities, I assume
> > > > that the context should be process. Otherwise, you are testing
> > > > against random task which does not make much sense.
> > >
> > > It is in the process context.
> > >
> > > >
> > > > Since you are testing against 'current' cap, and if the capability
> > > > for that task is stable, you do not need this kfunc.
> > > > You can test cap in user space and pass it into the bpf program.
> > > >
> > > > But if the cap for your process may change in the middle of
> > > > run, then you indeed need bpf prog to test capability in real time.
> > > > Is this your use case and could you describe in in more detail?
> > >
> > > After we convert the capability of our networking bpf program from
> > > CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we
> > > encountered the "pointer comparison prohibited" error, because
> > > allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if
> > > we enable the CAP_PERFMON for the networking bpf program, it can run
> > > tracing bpf prog, perf_event bpf prog and etc, that is not expected by
> > > us.
> > >
> > > Hence we are planning to use a lsm bpf program to disallow it from
> > > running other bpf programs. In our lsm bpf program we will check the
> > > capability of processes, if the process has cap_net_admin, cap_bpf and
> > > cap_perfmon but don't have cap_sys_admin we will refuse it to run
> > > tracing and perf_event bpf program. While if a process has  cap_bpf
> > > and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf
> > > program which wants to run trace bpf, we will allow it.
> > >
> > > We can't use lsm_cgroup because it is supported on cgroup2 only, while
> > > we're still using cgroup1.
> > >
> > > Another possible solution is enable allow_ptr_leaks for cap_net_admin
> > > as well, but after I checked the commit which introduces the cap_bpf
> > > and cap_perfmon [1], I think we wouldn't like to do it.
> >
> > Sorry. None of these options are acceptable.
> >
> > The idea of introducing a bpf_current_capable() kfunc just to work
> > around a deficiency in the verifier is not sound.
>
> So what should we do then?
> Just enabling the cap_perfmon for it? That does not sound as well ...

Yonghong already pointed out upthread that
comparison of two packet pointers is not a pointer leak.
See this code:
        } else if (!try_match_pkt_pointers(insn, dst_reg, &regs[insn->src_reg],
                                           this_branch, other_branch) &&
                   is_pointer_value(env, insn->dst_reg)) {
                verbose(env, "R%d pointer comparison prohibited\n",
                        insn->dst_reg);
                return -EACCES;
        }

It's not clear why it doesn't address your case.
That's what we should be discussing and debugging instead
of arguing about bpf_current_capable and similar "solutions".

> >
> > Next time instead of sending patch please describe the issue first.
> > You should have started with an email that:
> >         struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr);
> >
> >         if ((long)(iph + 1) > (long)skb->data_end)
> > needs cap_perfmon.
>
> Got it.  Thanks for your explanation.
>
> --
> Regards
> Yafang

Yafang Shao Aug. 17, 2023, 7:09 a.m. UTC | #9

On Thu, Aug 17, 2023 at 11:31 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Aug 16, 2023 at 7:31 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Thu, Aug 17, 2023 at 9:54 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 8/14/23 7:33 AM, Yafang Shao wrote:
> > > > > > Add a new bpf_current_capable kfunc to check whether the current task
> > > > > > has a specific capability. In our use case, we will use it in a lsm bpf
> > > > > > program to help identify if the user operation is permitted.
> > > > > >
> > > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > > > ---
> > > > > >   kernel/bpf/helpers.c | 6 ++++++
> > > > > >   1 file changed, 6 insertions(+)
> > > > > >
> > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > > > index eb91cae..bbee7ea 100644
> > > > > > --- a/kernel/bpf/helpers.c
> > > > > > +++ b/kernel/bpf/helpers.c
> > > > > > @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
> > > > > >       rcu_read_unlock();
> > > > > >   }
> > > > > >
> > > > > > +__bpf_kfunc bool bpf_current_capable(int cap)
> > > > > > +{
> > > > > > +     return has_capability(current, cap);
> > > > > > +}
> > > > >
> > > > > Since you are testing against 'current' capabilities, I assume
> > > > > that the context should be process. Otherwise, you are testing
> > > > > against random task which does not make much sense.
> > > >
> > > > It is in the process context.
> > > >
> > > > >
> > > > > Since you are testing against 'current' cap, and if the capability
> > > > > for that task is stable, you do not need this kfunc.
> > > > > You can test cap in user space and pass it into the bpf program.
> > > > >
> > > > > But if the cap for your process may change in the middle of
> > > > > run, then you indeed need bpf prog to test capability in real time.
> > > > > Is this your use case and could you describe in in more detail?
> > > >
> > > > After we convert the capability of our networking bpf program from
> > > > CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we
> > > > encountered the "pointer comparison prohibited" error, because
> > > > allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if
> > > > we enable the CAP_PERFMON for the networking bpf program, it can run
> > > > tracing bpf prog, perf_event bpf prog and etc, that is not expected by
> > > > us.
> > > >
> > > > Hence we are planning to use a lsm bpf program to disallow it from
> > > > running other bpf programs. In our lsm bpf program we will check the
> > > > capability of processes, if the process has cap_net_admin, cap_bpf and
> > > > cap_perfmon but don't have cap_sys_admin we will refuse it to run
> > > > tracing and perf_event bpf program. While if a process has  cap_bpf
> > > > and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf
> > > > program which wants to run trace bpf, we will allow it.
> > > >
> > > > We can't use lsm_cgroup because it is supported on cgroup2 only, while
> > > > we're still using cgroup1.
> > > >
> > > > Another possible solution is enable allow_ptr_leaks for cap_net_admin
> > > > as well, but after I checked the commit which introduces the cap_bpf
> > > > and cap_perfmon [1], I think we wouldn't like to do it.
> > >
> > > Sorry. None of these options are acceptable.
> > >
> > > The idea of introducing a bpf_current_capable() kfunc just to work
> > > around a deficiency in the verifier is not sound.
> >
> > So what should we do then?
> > Just enabling the cap_perfmon for it? That does not sound as well ...
>
> Yonghong already pointed out upthread that
> comparison of two packet pointers is not a pointer leak.
> See this code:
>         } else if (!try_match_pkt_pointers(insn, dst_reg, &regs[insn->src_reg],
>                                            this_branch, other_branch) &&
>                    is_pointer_value(env, insn->dst_reg)) {
>                 verbose(env, "R%d pointer comparison prohibited\n",
>                         insn->dst_reg);
>                 return -EACCES;
>         }
>
> It's not clear why it doesn't address your case.

It can address the issue.
It seems we should do the code change below.

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0b9da95..c66dc61 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13819,6 +13819,18 @@ static int check_cond_jmp_op(struct
bpf_verifier_env *env,
                return -EINVAL;
        }

+       other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx,
+                                 false);
+       if (!other_branch)
+               return -EFAULT;
+
+       /* check src2 operand */
+       err = check_reg_arg(env, insn->dst_reg, SRC_OP);
+       if (err)
+               return err;
+
+       dst_reg = &regs[insn->dst_reg];
+
        if (BPF_SRC(insn->code) == BPF_X) {
                if (insn->imm != 0) {
                        verbose(env, "BPF_JMP/JMP32 uses reserved fields\n");
@@ -13830,7 +13842,9 @@ static int check_cond_jmp_op(struct
bpf_verifier_env *env,
                if (err)
                        return err;

-               if (is_pointer_value(env, insn->src_reg)) {
+               if (!try_match_pkt_pointers(insn, dst_reg, &regs[insn->src_reg],
+                                          this_branch, other_branch) &&
+                   is_pointer_value(env, insn->src_reg)) {
                        verbose(env, "R%d pointer comparison prohibited\n",
                                insn->src_reg);
                        return -EACCES;
@@ -13843,12 +13857,6 @@ static int check_cond_jmp_op(struct
bpf_verifier_env *env,
                }
        }

-       /* check src2 operand */
-       err = check_reg_arg(env, insn->dst_reg, SRC_OP);
-       if (err)
-               return err;
-
-       dst_reg = &regs[insn->dst_reg];
        is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;

        if (BPF_SRC(insn->code) == BPF_K) {
@@ -13920,10 +13928,6 @@ static int check_cond_jmp_op(struct
bpf_verifier_env *env,
                return 0;
        }

-       other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx,
-                                 false);
-       if (!other_branch)
-               return -EFAULT;
        other_branch_regs = other_branch->frame[other_branch->curframe]->regs;

        /* detect if we are comparing against a constant value so we can adjust

Daniel Borkmann Aug. 17, 2023, 3:30 p.m. UTC | #10

On 8/17/23 9:09 AM, Yafang Shao wrote:
> On Thu, Aug 17, 2023 at 11:31 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Wed, Aug 16, 2023 at 7:31 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>>> On Thu, Aug 17, 2023 at 9:54 AM Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>> On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>>>>> On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>>>> On 8/14/23 7:33 AM, Yafang Shao wrote:
>>>>>>> Add a new bpf_current_capable kfunc to check whether the current task
>>>>>>> has a specific capability. In our use case, we will use it in a lsm bpf
>>>>>>> program to help identify if the user operation is permitted.
>>>>>>>
>>>>>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>>>>>>> ---
>>>>>>>    kernel/bpf/helpers.c | 6 ++++++
>>>>>>>    1 file changed, 6 insertions(+)
>>>>>>>
>>>>>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>>>>>> index eb91cae..bbee7ea 100644
>>>>>>> --- a/kernel/bpf/helpers.c
>>>>>>> +++ b/kernel/bpf/helpers.c
>>>>>>> @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
>>>>>>>        rcu_read_unlock();
>>>>>>>    }
>>>>>>>
>>>>>>> +__bpf_kfunc bool bpf_current_capable(int cap)
>>>>>>> +{
>>>>>>> +     return has_capability(current, cap);
>>>>>>> +}
>>>>>>
>>>>>> Since you are testing against 'current' capabilities, I assume
>>>>>> that the context should be process. Otherwise, you are testing
>>>>>> against random task which does not make much sense.
>>>>>
>>>>> It is in the process context.
>>>>>
>>>>>> Since you are testing against 'current' cap, and if the capability
>>>>>> for that task is stable, you do not need this kfunc.
>>>>>> You can test cap in user space and pass it into the bpf program.
>>>>>>
>>>>>> But if the cap for your process may change in the middle of
>>>>>> run, then you indeed need bpf prog to test capability in real time.
>>>>>> Is this your use case and could you describe in in more detail?
>>>>>
>>>>> After we convert the capability of our networking bpf program from
>>>>> CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we
>>>>> encountered the "pointer comparison prohibited" error, because
>>>>> allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if
>>>>> we enable the CAP_PERFMON for the networking bpf program, it can run
>>>>> tracing bpf prog, perf_event bpf prog and etc, that is not expected by
>>>>> us.
>>>>>
>>>>> Hence we are planning to use a lsm bpf program to disallow it from
>>>>> running other bpf programs. In our lsm bpf program we will check the
>>>>> capability of processes, if the process has cap_net_admin, cap_bpf and
>>>>> cap_perfmon but don't have cap_sys_admin we will refuse it to run
>>>>> tracing and perf_event bpf program. While if a process has  cap_bpf
>>>>> and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf
>>>>> program which wants to run trace bpf, we will allow it.
>>>>>
>>>>> We can't use lsm_cgroup because it is supported on cgroup2 only, while
>>>>> we're still using cgroup1.
>>>>>
>>>>> Another possible solution is enable allow_ptr_leaks for cap_net_admin
>>>>> as well, but after I checked the commit which introduces the cap_bpf
>>>>> and cap_perfmon [1], I think we wouldn't like to do it.
>>>>
>>>> Sorry. None of these options are acceptable.
>>>>
>>>> The idea of introducing a bpf_current_capable() kfunc just to work
>>>> around a deficiency in the verifier is not sound.
>>>
>>> So what should we do then?
>>> Just enabling the cap_perfmon for it? That does not sound as well ...
>>
>> Yonghong already pointed out upthread that
>> comparison of two packet pointers is not a pointer leak.
>> See this code:
>>          } else if (!try_match_pkt_pointers(insn, dst_reg, &regs[insn->src_reg],
>>                                             this_branch, other_branch) &&
>>                     is_pointer_value(env, insn->dst_reg)) {
>>                  verbose(env, "R%d pointer comparison prohibited\n",
>>                          insn->dst_reg);
>>                  return -EACCES;
>>          }
>>
>> It's not clear why it doesn't address your case.
> 
> It can address the issue.
> It seems we should do the code change below.

I presume in your actual program you also access the IP header (otherwise it
would be quite useless), and as you mentioned above, you would like this to be
accessible for just CAP_BPF + CAP_NET_ADMIN. The CAP_PERFMON restriction is
there for a reason, that is, to be on same cap level as tracing programs as you
could craft Spectre v1 style access. It's not a deficiency.

Thanks,
Daniel

Alexei Starovoitov Aug. 17, 2023, 5:45 p.m. UTC | #11

On Thu, Aug 17, 2023 at 8:30 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/17/23 9:09 AM, Yafang Shao wrote:
> > On Thu, Aug 17, 2023 at 11:31 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >> On Wed, Aug 16, 2023 at 7:31 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >>> On Thu, Aug 17, 2023 at 9:54 AM Alexei Starovoitov
> >>> <alexei.starovoitov@gmail.com> wrote:
> >>>> On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >>>>> On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>>>>> On 8/14/23 7:33 AM, Yafang Shao wrote:
> >>>>>>> Add a new bpf_current_capable kfunc to check whether the current task
> >>>>>>> has a specific capability. In our use case, we will use it in a lsm bpf
> >>>>>>> program to help identify if the user operation is permitted.
> >>>>>>>
> >>>>>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> >>>>>>> ---
> >>>>>>>    kernel/bpf/helpers.c | 6 ++++++
> >>>>>>>    1 file changed, 6 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >>>>>>> index eb91cae..bbee7ea 100644
> >>>>>>> --- a/kernel/bpf/helpers.c
> >>>>>>> +++ b/kernel/bpf/helpers.c
> >>>>>>> @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
> >>>>>>>        rcu_read_unlock();
> >>>>>>>    }
> >>>>>>>
> >>>>>>> +__bpf_kfunc bool bpf_current_capable(int cap)
> >>>>>>> +{
> >>>>>>> +     return has_capability(current, cap);
> >>>>>>> +}
> >>>>>>
> >>>>>> Since you are testing against 'current' capabilities, I assume
> >>>>>> that the context should be process. Otherwise, you are testing
> >>>>>> against random task which does not make much sense.
> >>>>>
> >>>>> It is in the process context.
> >>>>>
> >>>>>> Since you are testing against 'current' cap, and if the capability
> >>>>>> for that task is stable, you do not need this kfunc.
> >>>>>> You can test cap in user space and pass it into the bpf program.
> >>>>>>
> >>>>>> But if the cap for your process may change in the middle of
> >>>>>> run, then you indeed need bpf prog to test capability in real time.
> >>>>>> Is this your use case and could you describe in in more detail?
> >>>>>
> >>>>> After we convert the capability of our networking bpf program from
> >>>>> CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we
> >>>>> encountered the "pointer comparison prohibited" error, because
> >>>>> allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if
> >>>>> we enable the CAP_PERFMON for the networking bpf program, it can run
> >>>>> tracing bpf prog, perf_event bpf prog and etc, that is not expected by
> >>>>> us.
> >>>>>
> >>>>> Hence we are planning to use a lsm bpf program to disallow it from
> >>>>> running other bpf programs. In our lsm bpf program we will check the
> >>>>> capability of processes, if the process has cap_net_admin, cap_bpf and
> >>>>> cap_perfmon but don't have cap_sys_admin we will refuse it to run
> >>>>> tracing and perf_event bpf program. While if a process has  cap_bpf
> >>>>> and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf
> >>>>> program which wants to run trace bpf, we will allow it.
> >>>>>
> >>>>> We can't use lsm_cgroup because it is supported on cgroup2 only, while
> >>>>> we're still using cgroup1.
> >>>>>
> >>>>> Another possible solution is enable allow_ptr_leaks for cap_net_admin
> >>>>> as well, but after I checked the commit which introduces the cap_bpf
> >>>>> and cap_perfmon [1], I think we wouldn't like to do it.
> >>>>
> >>>> Sorry. None of these options are acceptable.
> >>>>
> >>>> The idea of introducing a bpf_current_capable() kfunc just to work
> >>>> around a deficiency in the verifier is not sound.
> >>>
> >>> So what should we do then?
> >>> Just enabling the cap_perfmon for it? That does not sound as well ...
> >>
> >> Yonghong already pointed out upthread that
> >> comparison of two packet pointers is not a pointer leak.
> >> See this code:
> >>          } else if (!try_match_pkt_pointers(insn, dst_reg, &regs[insn->src_reg],
> >>                                             this_branch, other_branch) &&
> >>                     is_pointer_value(env, insn->dst_reg)) {
> >>                  verbose(env, "R%d pointer comparison prohibited\n",
> >>                          insn->dst_reg);
> >>                  return -EACCES;
> >>          }
> >>
> >> It's not clear why it doesn't address your case.
> >
> > It can address the issue.
> > It seems we should do the code change below.
>
> I presume in your actual program you also access the IP header (otherwise it
> would be quite useless), and as you mentioned above, you would like this to be
> accessible for just CAP_BPF + CAP_NET_ADMIN. The CAP_PERFMON restriction is
> there for a reason, that is, to be on same cap level as tracing programs as you
> could craft Spectre v1 style access. It's not a deficiency.

Probably not. To mount v1 style attack the load of data_end needs to
be attacker controlled which isn't the case here.

Alexei Starovoitov Aug. 17, 2023, 5:48 p.m. UTC | #12

On Thu, Aug 17, 2023 at 12:10 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > Yonghong already pointed out upthread that
> > comparison of two packet pointers is not a pointer leak.
> > See this code:
> >         } else if (!try_match_pkt_pointers(insn, dst_reg, &regs[insn->src_reg],
> >                                            this_branch, other_branch) &&
> >                    is_pointer_value(env, insn->dst_reg)) {
> >                 verbose(env, "R%d pointer comparison prohibited\n",
> >                         insn->dst_reg);
> >                 return -EACCES;
> >         }
> >
> > It's not clear why it doesn't address your case.
>
> It can address the issue.
> It seems we should do the code change below.
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 0b9da95..c66dc61 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -13819,6 +13819,18 @@ static int check_cond_jmp_op(struct
> bpf_verifier_env *env,
>                 return -EINVAL;
>         }
>
> +       other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx,
> +                                 false);

Yeah. something like that.
except we must do push_stack() only after is_branch_taken() didn't succeed.

Yafang Shao Aug. 21, 2023, 5:56 a.m. UTC | #13

On Thu, Aug 17, 2023 at 11:30 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/17/23 9:09 AM, Yafang Shao wrote:
> > On Thu, Aug 17, 2023 at 11:31 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >> On Wed, Aug 16, 2023 at 7:31 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >>> On Thu, Aug 17, 2023 at 9:54 AM Alexei Starovoitov
> >>> <alexei.starovoitov@gmail.com> wrote:
> >>>> On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >>>>> On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>>>>> On 8/14/23 7:33 AM, Yafang Shao wrote:
> >>>>>>> Add a new bpf_current_capable kfunc to check whether the current task
> >>>>>>> has a specific capability. In our use case, we will use it in a lsm bpf
> >>>>>>> program to help identify if the user operation is permitted.
> >>>>>>>
> >>>>>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> >>>>>>> ---
> >>>>>>>    kernel/bpf/helpers.c | 6 ++++++
> >>>>>>>    1 file changed, 6 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >>>>>>> index eb91cae..bbee7ea 100644
> >>>>>>> --- a/kernel/bpf/helpers.c
> >>>>>>> +++ b/kernel/bpf/helpers.c
> >>>>>>> @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
> >>>>>>>        rcu_read_unlock();
> >>>>>>>    }
> >>>>>>>
> >>>>>>> +__bpf_kfunc bool bpf_current_capable(int cap)
> >>>>>>> +{
> >>>>>>> +     return has_capability(current, cap);
> >>>>>>> +}
> >>>>>>
> >>>>>> Since you are testing against 'current' capabilities, I assume
> >>>>>> that the context should be process. Otherwise, you are testing
> >>>>>> against random task which does not make much sense.
> >>>>>
> >>>>> It is in the process context.
> >>>>>
> >>>>>> Since you are testing against 'current' cap, and if the capability
> >>>>>> for that task is stable, you do not need this kfunc.
> >>>>>> You can test cap in user space and pass it into the bpf program.
> >>>>>>
> >>>>>> But if the cap for your process may change in the middle of
> >>>>>> run, then you indeed need bpf prog to test capability in real time.
> >>>>>> Is this your use case and could you describe in in more detail?
> >>>>>
> >>>>> After we convert the capability of our networking bpf program from
> >>>>> CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we
> >>>>> encountered the "pointer comparison prohibited" error, because
> >>>>> allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if
> >>>>> we enable the CAP_PERFMON for the networking bpf program, it can run
> >>>>> tracing bpf prog, perf_event bpf prog and etc, that is not expected by
> >>>>> us.
> >>>>>
> >>>>> Hence we are planning to use a lsm bpf program to disallow it from
> >>>>> running other bpf programs. In our lsm bpf program we will check the
> >>>>> capability of processes, if the process has cap_net_admin, cap_bpf and
> >>>>> cap_perfmon but don't have cap_sys_admin we will refuse it to run
> >>>>> tracing and perf_event bpf program. While if a process has  cap_bpf
> >>>>> and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf
> >>>>> program which wants to run trace bpf, we will allow it.
> >>>>>
> >>>>> We can't use lsm_cgroup because it is supported on cgroup2 only, while
> >>>>> we're still using cgroup1.
> >>>>>
> >>>>> Another possible solution is enable allow_ptr_leaks for cap_net_admin
> >>>>> as well, but after I checked the commit which introduces the cap_bpf
> >>>>> and cap_perfmon [1], I think we wouldn't like to do it.
> >>>>
> >>>> Sorry. None of these options are acceptable.
> >>>>
> >>>> The idea of introducing a bpf_current_capable() kfunc just to work
> >>>> around a deficiency in the verifier is not sound.
> >>>
> >>> So what should we do then?
> >>> Just enabling the cap_perfmon for it? That does not sound as well ...
> >>
> >> Yonghong already pointed out upthread that
> >> comparison of two packet pointers is not a pointer leak.
> >> See this code:
> >>          } else if (!try_match_pkt_pointers(insn, dst_reg, &regs[insn->src_reg],
> >>                                             this_branch, other_branch) &&
> >>                     is_pointer_value(env, insn->dst_reg)) {
> >>                  verbose(env, "R%d pointer comparison prohibited\n",
> >>                          insn->dst_reg);
> >>                  return -EACCES;
> >>          }
> >>
> >> It's not clear why it doesn't address your case.
> >
> > It can address the issue.
> > It seems we should do the code change below.
>
> I presume in your actual program you also access the IP header (otherwise it
> would be quite useless), and as you mentioned above, you would like this to be
> accessible for just CAP_BPF + CAP_NET_ADMIN. The CAP_PERFMON restriction is
> there for a reason, that is, to be on same cap level as tracing programs as you
> could craft Spectre v1 style access. It's not a deficiency.

Hi Daniel,

We also hit an error caused by bpf_bypass_spec_v1 in our
networking-bpf with cap_net_admin+cap_bpf.

Here is a simple reproducer,

#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/bpf.h>

#include <bpf/bpf_helpers.h>

struct flow_stats_v {
    __u32 egress_packets;
    __u32 ingress_packets;
};

/* ELF map definition */
struct bpf_elf_map {
    __u32 type;
    __u32 size_key;
    __u32 size_value;
    __u32 max_elem;
    __u32 flags;
    __u32 id;
    __u32 pinning;
    __u32 inner_id;
    __u32 inner_idx;
};

struct bpf_elf_map SEC("maps") flow_stats = {
    .type = BPF_MAP_TYPE_HASH,
    .size_key = sizeof(__u32),
    .size_value = sizeof(struct flow_stats_v),
    .max_elem = 32,
    .flags = BPF_F_NO_PREALLOC,
    .pinning = 0,
};

struct bpf_elf_map SEC("maps") classid = {
    .type = BPF_MAP_TYPE_HASH,
    .size_key = sizeof(__be32),
    .size_value = sizeof(__be32),
    .max_elem = 32,
    .flags = BPF_F_NO_PREALLOC,
    .pinning = 0,
};

struct flow {
    __be32 local_classid;
    __be32 local_addr;
    __u8 dir;
};

static void flow_stat(struct __sk_buff *skb, struct flow *flow)
{
        struct flow_stats_v *value;
        __u32 key = 0; // This value is not relavant

        value = bpf_map_lookup_elem(&flow_stats, &key);
        if (!value)
                return;

        if (flow->dir == 0)
                __sync_fetch_and_add(&value->egress_packets, 1);
        else
                __sync_fetch_and_add(&value->ingress_packets, 1);
}

int handle_packet(struct __sk_buff *skb, int dir)
{
        void *data_end = (void *)(long)skb->data_end;
        void *data = (void *)(long)skb->data;
        struct flow flow = {};
        __be32 *id;

        flow.dir = dir;
        flow.local_addr = 1; // This value is not relavant

        id = bpf_map_lookup_elem(&classid, &flow.local_addr);
        if (id && *id)
                flow.local_classid = *id;

        flow_stat(skb, &flow);
        return TC_ACT_OK;
}

SEC("cls-ingress")
int ingress(struct __sk_buff *skb)
{
        return handle_packet(skb, 1);
}

char _license[] SEC("license") = "GPL";


The error log of the bpf verifier as follows,

23: (71) r3 = *(u8 *)(r10 -8)         ;
R3_w=scalar(umax=255,var_off=(0x0; 0xff)) R10=fp0 fp-8=mmmm???m
24: (b7) r2 = 1                       ; R2_w=1
25: (55) if r3 != 0x0 goto pc+1 27:
R0=map_value(off=0,ks=4,vs=8,imm=0) R1_w=1 R2_w=1
R3_w=scalar(umax=255,var_off=(0x0; 0xff)) R10=fp0 fp-8=mmmm???m fp-16=
27: (67) r2 <<= 2                     ; R2_w=4
28: (0f) r0 += r2
R0 tried to add from different maps, paths or scalars, pointer
arithmetic with it prohibited for !root
processed 31 insns (limit 1000000000) max_states_per_insn 0
total_states 3 peak_states 3 mark_read 1

It can be fixed by the code change below in the kernel,

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b6b60cd..0e200a9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11750,7 +11750,7 @@ static int update_alu_sanitation_state(struct
bpf_insn_aux_data *aux,
        /* If we arrived here from different branches with different
         * state or limits to sanitize, then this won't work.
         */
-       if (aux->alu_state &&
+       if (aux->alu_state && aux->alu_limit &&
            (aux->alu_state != alu_state ||
             aux->alu_limit != alu_limit))
                return REASON_PATHS;

But I'm not sure if it is a proper fix.

We can workaround the issue by modifying our bpf program as follows,

@@ -77,7 +77,21 @@ int handle_packet(struct __sk_buff *skb, int dir)
        if (id && *id)
                flow.local_classid = *id;

-       flow_stat(skb, &flow);
+       {
+       struct flow_stats_v *value;
+       __u32 key = 0; // This value is not relavant
+
+       value = bpf_map_lookup_elem(&flow_stats, &key);
+       if (!value)
+               return TC_ACT_OK;
+
+       if (flow.dir == 0)
+               __sync_fetch_and_add(&value->egress_packets, 1);
+       else
+               __sync_fetch_and_add(&value->ingress_packets, 1);
+
+       }
+       // flow_stat(skb, &flow);
        return TC_ACT_OK;
 }

But I don't have a clear idea why.
It seems to me that the bpf verifier under cap_net_admin+cap_bpf is
fragile.  I'm wondering if it is possible to add a sysctl to bypass
the perfmon check :

--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2091,22 +2091,22 @@ static inline void
bpf_map_dec_elem_count(struct bpf_map *map)

 static inline bool bpf_allow_ptr_leaks(void)
 {
-       return perfmon_capable();
+       return perfmon_capable() || syscal_bypass_perfmon;
 }

 static inline bool bpf_allow_uninit_stack(void)
 {
-       return perfmon_capable();
+       return perfmon_capable() || syscal_bypass_perfmon;
 }

 static inline bool bpf_bypass_spec_v1(void)
 {
-       return perfmon_capable();
+       return perfmon_capable() || syscal_bypass_perfmon;
 }

 static inline bool bpf_bypass_spec_v4(void)
 {
-       return perfmon_capable();
+       return perfmon_capable() || syscal_bypass_perfmon;
 }

 int bpf_map_new_fd(struct bpf_map *map, int flags);

Do you have any suggestions?

[RFC,bpf-next,1/2] bpf: Add bpf_current_capable kfunc

Checks

Commit Message

Comments

Patch