Message ID | 20230814143341.3767-2-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | BPF |
Headers | show |
Series | bpf: Add a new kfunc bpf_current_capable | expand |
On 8/14/23 7:33 AM, Yafang Shao wrote: > Add a new bpf_current_capable kfunc to check whether the current task > has a specific capability. In our use case, we will use it in a lsm bpf > program to help identify if the user operation is permitted. > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > --- > kernel/bpf/helpers.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > index eb91cae..bbee7ea 100644 > --- a/kernel/bpf/helpers.c > +++ b/kernel/bpf/helpers.c > @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) > rcu_read_unlock(); > } > > +__bpf_kfunc bool bpf_current_capable(int cap) > +{ > + return has_capability(current, cap); > +} Since you are testing against 'current' capabilities, I assume that the context should be process. Otherwise, you are testing against random task which does not make much sense. Since you are testing against 'current' cap, and if the capability for that task is stable, you do not need this kfunc. You can test cap in user space and pass it into the bpf program. But if the cap for your process may change in the middle of run, then you indeed need bpf prog to test capability in real time. Is this your use case and could you describe in in more detail? > + > __diag_pop(); > > BTF_SET8_START(generic_btf_ids) > @@ -2456,6 +2461,7 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) > BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU) > #endif > BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL) > +BTF_ID_FLAGS(func, bpf_current_capable, KF_RCU) > BTF_SET8_END(generic_btf_ids) > > static const struct btf_kfunc_id_set generic_kfunc_set = {
On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote: > > > > On 8/14/23 7:33 AM, Yafang Shao wrote: > > Add a new bpf_current_capable kfunc to check whether the current task > > has a specific capability. In our use case, we will use it in a lsm bpf > > program to help identify if the user operation is permitted. > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > --- > > kernel/bpf/helpers.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > index eb91cae..bbee7ea 100644 > > --- a/kernel/bpf/helpers.c > > +++ b/kernel/bpf/helpers.c > > @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) > > rcu_read_unlock(); > > } > > > > +__bpf_kfunc bool bpf_current_capable(int cap) > > +{ > > + return has_capability(current, cap); > > +} > > Since you are testing against 'current' capabilities, I assume > that the context should be process. Otherwise, you are testing > against random task which does not make much sense. It is in the process context. > > Since you are testing against 'current' cap, and if the capability > for that task is stable, you do not need this kfunc. > You can test cap in user space and pass it into the bpf program. > > But if the cap for your process may change in the middle of > run, then you indeed need bpf prog to test capability in real time. > Is this your use case and could you describe in in more detail? After we convert the capability of our networking bpf program from CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we encountered the "pointer comparison prohibited" error, because allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if we enable the CAP_PERFMON for the networking bpf program, it can run tracing bpf prog, perf_event bpf prog and etc, that is not expected by us. Hence we are planning to use a lsm bpf program to disallow it from running other bpf programs. In our lsm bpf program we will check the capability of processes, if the process has cap_net_admin, cap_bpf and cap_perfmon but don't have cap_sys_admin we will refuse it to run tracing and perf_event bpf program. While if a process has cap_bpf and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf program which wants to run trace bpf, we will allow it. We can't use lsm_cgroup because it is supported on cgroup2 only, while we're still using cgroup1. Another possible solution is enable allow_ptr_leaks for cap_net_admin as well, but after I checked the commit which introduces the cap_bpf and cap_perfmon [1], I think we wouldn't like to do it. [1]. https://lore.kernel.org/bpf/20200513230355.7858-3-alexei.starovoitov@gmail.com/
On 8/14/23 7:45 PM, Yafang Shao wrote: > On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote: >> >> >> >> On 8/14/23 7:33 AM, Yafang Shao wrote: >>> Add a new bpf_current_capable kfunc to check whether the current task >>> has a specific capability. In our use case, we will use it in a lsm bpf >>> program to help identify if the user operation is permitted. >>> >>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> >>> --- >>> kernel/bpf/helpers.c | 6 ++++++ >>> 1 file changed, 6 insertions(+) >>> >>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c >>> index eb91cae..bbee7ea 100644 >>> --- a/kernel/bpf/helpers.c >>> +++ b/kernel/bpf/helpers.c >>> @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) >>> rcu_read_unlock(); >>> } >>> >>> +__bpf_kfunc bool bpf_current_capable(int cap) >>> +{ >>> + return has_capability(current, cap); >>> +} >> >> Since you are testing against 'current' capabilities, I assume >> that the context should be process. Otherwise, you are testing >> against random task which does not make much sense. > > It is in the process context. > >> >> Since you are testing against 'current' cap, and if the capability >> for that task is stable, you do not need this kfunc. >> You can test cap in user space and pass it into the bpf program. >> >> But if the cap for your process may change in the middle of >> run, then you indeed need bpf prog to test capability in real time. >> Is this your use case and could you describe in in more detail? > > After we convert the capability of our networking bpf program from > CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we > encountered the "pointer comparison prohibited" error, because Could you give a reproducible test case for this verifier failure with CAP_BPF+CAP_NET_ADMIN capability? Is this due to packet pointer or something else? Maybe verifier needs to be improved in these cases without violating the promise of allow_ptr_leaks? > allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if > we enable the CAP_PERFMON for the networking bpf program, it can run > tracing bpf prog, perf_event bpf prog and etc, that is not expected by > us. > > Hence we are planning to use a lsm bpf program to disallow it from > running other bpf programs. In our lsm bpf program we will check the > capability of processes, if the process has cap_net_admin, cap_bpf and > cap_perfmon but don't have cap_sys_admin we will refuse it to run > tracing and perf_event bpf program. While if a process has cap_bpf > and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf > program which wants to run trace bpf, we will allow it. > > We can't use lsm_cgroup because it is supported on cgroup2 only, while > we're still using cgroup1. > > Another possible solution is enable allow_ptr_leaks for cap_net_admin > as well, but after I checked the commit which introduces the cap_bpf > and cap_perfmon [1], I think we wouldn't like to do it. > > [1]. https://lore.kernel.org/bpf/20200513230355.7858-3-alexei.starovoitov@gmail.com/
On Tue, Aug 15, 2023 at 11:40 AM Yonghong Song <yonghong.song@linux.dev> wrote: > > > > On 8/14/23 7:45 PM, Yafang Shao wrote: > > On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote: > >> > >> > >> > >> On 8/14/23 7:33 AM, Yafang Shao wrote: > >>> Add a new bpf_current_capable kfunc to check whether the current task > >>> has a specific capability. In our use case, we will use it in a lsm bpf > >>> program to help identify if the user operation is permitted. > >>> > >>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > >>> --- > >>> kernel/bpf/helpers.c | 6 ++++++ > >>> 1 file changed, 6 insertions(+) > >>> > >>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > >>> index eb91cae..bbee7ea 100644 > >>> --- a/kernel/bpf/helpers.c > >>> +++ b/kernel/bpf/helpers.c > >>> @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) > >>> rcu_read_unlock(); > >>> } > >>> > >>> +__bpf_kfunc bool bpf_current_capable(int cap) > >>> +{ > >>> + return has_capability(current, cap); > >>> +} > >> > >> Since you are testing against 'current' capabilities, I assume > >> that the context should be process. Otherwise, you are testing > >> against random task which does not make much sense. > > > > It is in the process context. > > > >> > >> Since you are testing against 'current' cap, and if the capability > >> for that task is stable, you do not need this kfunc. > >> You can test cap in user space and pass it into the bpf program. > >> > >> But if the cap for your process may change in the middle of > >> run, then you indeed need bpf prog to test capability in real time. > >> Is this your use case and could you describe in in more detail? > > > > After we convert the capability of our networking bpf program from > > CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we > > encountered the "pointer comparison prohibited" error, because > > Could you give a reproducible test case for this verifier failure > with CAP_BPF+CAP_NET_ADMIN capability? Is this due to packet pointer > or something else? Maybe verifier needs to be improved in these > cases without violating the promise of allow_ptr_leaks? Here it is. SEC("cls-ingress") int ingress(struct __sk_buff *skb) { struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr); if ((long)(iph + 1) > (long)skb->data_end) return TC_ACT_STOLEN; return TC_ACT_OK; } In this bpf prog, it will compare the pointer iph with skb->data_end, and thus it will fail the verifier.
On 8/14/23 10:49 PM, Yafang Shao wrote: > On Tue, Aug 15, 2023 at 11:40 AM Yonghong Song <yonghong.song@linux.dev> wrote: >> >> >> >> On 8/14/23 7:45 PM, Yafang Shao wrote: >>> On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote: >>>> >>>> >>>> >>>> On 8/14/23 7:33 AM, Yafang Shao wrote: >>>>> Add a new bpf_current_capable kfunc to check whether the current task >>>>> has a specific capability. In our use case, we will use it in a lsm bpf >>>>> program to help identify if the user operation is permitted. >>>>> >>>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> >>>>> --- >>>>> kernel/bpf/helpers.c | 6 ++++++ >>>>> 1 file changed, 6 insertions(+) >>>>> >>>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c >>>>> index eb91cae..bbee7ea 100644 >>>>> --- a/kernel/bpf/helpers.c >>>>> +++ b/kernel/bpf/helpers.c >>>>> @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) >>>>> rcu_read_unlock(); >>>>> } >>>>> >>>>> +__bpf_kfunc bool bpf_current_capable(int cap) >>>>> +{ >>>>> + return has_capability(current, cap); >>>>> +} >>>> >>>> Since you are testing against 'current' capabilities, I assume >>>> that the context should be process. Otherwise, you are testing >>>> against random task which does not make much sense. >>> >>> It is in the process context. >>> >>>> >>>> Since you are testing against 'current' cap, and if the capability >>>> for that task is stable, you do not need this kfunc. >>>> You can test cap in user space and pass it into the bpf program. >>>> >>>> But if the cap for your process may change in the middle of >>>> run, then you indeed need bpf prog to test capability in real time. >>>> Is this your use case and could you describe in in more detail? >>> >>> After we convert the capability of our networking bpf program from >>> CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we >>> encountered the "pointer comparison prohibited" error, because >> >> Could you give a reproducible test case for this verifier failure >> with CAP_BPF+CAP_NET_ADMIN capability? Is this due to packet pointer >> or something else? Maybe verifier needs to be improved in these >> cases without violating the promise of allow_ptr_leaks? > > Here it is. > > SEC("cls-ingress") > int ingress(struct __sk_buff *skb) > { > struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr); > > if ((long)(iph + 1) > (long)skb->data_end) > return TC_ACT_STOLEN; > return TC_ACT_OK; > } > > In this bpf prog, it will compare the pointer iph with skb->data_end, > and thus it will fail the verifier. Thanks. In this particular case, I think comparing packet ptr and data_end should not be considered as ptr_leaks. Probably Alexei and Daniel can comment on this too.
On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote: > > > > > > > > On 8/14/23 7:33 AM, Yafang Shao wrote: > > > Add a new bpf_current_capable kfunc to check whether the current task > > > has a specific capability. In our use case, we will use it in a lsm bpf > > > program to help identify if the user operation is permitted. > > > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > > --- > > > kernel/bpf/helpers.c | 6 ++++++ > > > 1 file changed, 6 insertions(+) > > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > > index eb91cae..bbee7ea 100644 > > > --- a/kernel/bpf/helpers.c > > > +++ b/kernel/bpf/helpers.c > > > @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) > > > rcu_read_unlock(); > > > } > > > > > > +__bpf_kfunc bool bpf_current_capable(int cap) > > > +{ > > > + return has_capability(current, cap); > > > +} > > > > Since you are testing against 'current' capabilities, I assume > > that the context should be process. Otherwise, you are testing > > against random task which does not make much sense. > > It is in the process context. > > > > > Since you are testing against 'current' cap, and if the capability > > for that task is stable, you do not need this kfunc. > > You can test cap in user space and pass it into the bpf program. > > > > But if the cap for your process may change in the middle of > > run, then you indeed need bpf prog to test capability in real time. > > Is this your use case and could you describe in in more detail? > > After we convert the capability of our networking bpf program from > CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we > encountered the "pointer comparison prohibited" error, because > allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if > we enable the CAP_PERFMON for the networking bpf program, it can run > tracing bpf prog, perf_event bpf prog and etc, that is not expected by > us. > > Hence we are planning to use a lsm bpf program to disallow it from > running other bpf programs. In our lsm bpf program we will check the > capability of processes, if the process has cap_net_admin, cap_bpf and > cap_perfmon but don't have cap_sys_admin we will refuse it to run > tracing and perf_event bpf program. While if a process has cap_bpf > and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf > program which wants to run trace bpf, we will allow it. > > We can't use lsm_cgroup because it is supported on cgroup2 only, while > we're still using cgroup1. > > Another possible solution is enable allow_ptr_leaks for cap_net_admin > as well, but after I checked the commit which introduces the cap_bpf > and cap_perfmon [1], I think we wouldn't like to do it. Sorry. None of these options are acceptable. The idea of introducing a bpf_current_capable() kfunc just to work around a deficiency in the verifier is not sound. Next time instead of sending patch please describe the issue first. You should have started with an email that: struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr); if ((long)(iph + 1) > (long)skb->data_end) needs cap_perfmon.
On Thu, Aug 17, 2023 at 9:54 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote: > > > > > > > > > > > > On 8/14/23 7:33 AM, Yafang Shao wrote: > > > > Add a new bpf_current_capable kfunc to check whether the current task > > > > has a specific capability. In our use case, we will use it in a lsm bpf > > > > program to help identify if the user operation is permitted. > > > > > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > > > --- > > > > kernel/bpf/helpers.c | 6 ++++++ > > > > 1 file changed, 6 insertions(+) > > > > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > > > index eb91cae..bbee7ea 100644 > > > > --- a/kernel/bpf/helpers.c > > > > +++ b/kernel/bpf/helpers.c > > > > @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) > > > > rcu_read_unlock(); > > > > } > > > > > > > > +__bpf_kfunc bool bpf_current_capable(int cap) > > > > +{ > > > > + return has_capability(current, cap); > > > > +} > > > > > > Since you are testing against 'current' capabilities, I assume > > > that the context should be process. Otherwise, you are testing > > > against random task which does not make much sense. > > > > It is in the process context. > > > > > > > > Since you are testing against 'current' cap, and if the capability > > > for that task is stable, you do not need this kfunc. > > > You can test cap in user space and pass it into the bpf program. > > > > > > But if the cap for your process may change in the middle of > > > run, then you indeed need bpf prog to test capability in real time. > > > Is this your use case and could you describe in in more detail? > > > > After we convert the capability of our networking bpf program from > > CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we > > encountered the "pointer comparison prohibited" error, because > > allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if > > we enable the CAP_PERFMON for the networking bpf program, it can run > > tracing bpf prog, perf_event bpf prog and etc, that is not expected by > > us. > > > > Hence we are planning to use a lsm bpf program to disallow it from > > running other bpf programs. In our lsm bpf program we will check the > > capability of processes, if the process has cap_net_admin, cap_bpf and > > cap_perfmon but don't have cap_sys_admin we will refuse it to run > > tracing and perf_event bpf program. While if a process has cap_bpf > > and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf > > program which wants to run trace bpf, we will allow it. > > > > We can't use lsm_cgroup because it is supported on cgroup2 only, while > > we're still using cgroup1. > > > > Another possible solution is enable allow_ptr_leaks for cap_net_admin > > as well, but after I checked the commit which introduces the cap_bpf > > and cap_perfmon [1], I think we wouldn't like to do it. > > Sorry. None of these options are acceptable. > > The idea of introducing a bpf_current_capable() kfunc just to work > around a deficiency in the verifier is not sound. So what should we do then? Just enabling the cap_perfmon for it? That does not sound as well ... > > Next time instead of sending patch please describe the issue first. > You should have started with an email that: > struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr); > > if ((long)(iph + 1) > (long)skb->data_end) > needs cap_perfmon. Got it. Thanks for your explanation.
On Wed, Aug 16, 2023 at 7:31 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Thu, Aug 17, 2023 at 9:54 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote: > > > > > > > > > > > > > > > > On 8/14/23 7:33 AM, Yafang Shao wrote: > > > > > Add a new bpf_current_capable kfunc to check whether the current task > > > > > has a specific capability. In our use case, we will use it in a lsm bpf > > > > > program to help identify if the user operation is permitted. > > > > > > > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > > > > --- > > > > > kernel/bpf/helpers.c | 6 ++++++ > > > > > 1 file changed, 6 insertions(+) > > > > > > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > > > > index eb91cae..bbee7ea 100644 > > > > > --- a/kernel/bpf/helpers.c > > > > > +++ b/kernel/bpf/helpers.c > > > > > @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) > > > > > rcu_read_unlock(); > > > > > } > > > > > > > > > > +__bpf_kfunc bool bpf_current_capable(int cap) > > > > > +{ > > > > > + return has_capability(current, cap); > > > > > +} > > > > > > > > Since you are testing against 'current' capabilities, I assume > > > > that the context should be process. Otherwise, you are testing > > > > against random task which does not make much sense. > > > > > > It is in the process context. > > > > > > > > > > > Since you are testing against 'current' cap, and if the capability > > > > for that task is stable, you do not need this kfunc. > > > > You can test cap in user space and pass it into the bpf program. > > > > > > > > But if the cap for your process may change in the middle of > > > > run, then you indeed need bpf prog to test capability in real time. > > > > Is this your use case and could you describe in in more detail? > > > > > > After we convert the capability of our networking bpf program from > > > CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we > > > encountered the "pointer comparison prohibited" error, because > > > allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if > > > we enable the CAP_PERFMON for the networking bpf program, it can run > > > tracing bpf prog, perf_event bpf prog and etc, that is not expected by > > > us. > > > > > > Hence we are planning to use a lsm bpf program to disallow it from > > > running other bpf programs. In our lsm bpf program we will check the > > > capability of processes, if the process has cap_net_admin, cap_bpf and > > > cap_perfmon but don't have cap_sys_admin we will refuse it to run > > > tracing and perf_event bpf program. While if a process has cap_bpf > > > and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf > > > program which wants to run trace bpf, we will allow it. > > > > > > We can't use lsm_cgroup because it is supported on cgroup2 only, while > > > we're still using cgroup1. > > > > > > Another possible solution is enable allow_ptr_leaks for cap_net_admin > > > as well, but after I checked the commit which introduces the cap_bpf > > > and cap_perfmon [1], I think we wouldn't like to do it. > > > > Sorry. None of these options are acceptable. > > > > The idea of introducing a bpf_current_capable() kfunc just to work > > around a deficiency in the verifier is not sound. > > So what should we do then? > Just enabling the cap_perfmon for it? That does not sound as well ... Yonghong already pointed out upthread that comparison of two packet pointers is not a pointer leak. See this code: } else if (!try_match_pkt_pointers(insn, dst_reg, ®s[insn->src_reg], this_branch, other_branch) && is_pointer_value(env, insn->dst_reg)) { verbose(env, "R%d pointer comparison prohibited\n", insn->dst_reg); return -EACCES; } It's not clear why it doesn't address your case. That's what we should be discussing and debugging instead of arguing about bpf_current_capable and similar "solutions". > > > > Next time instead of sending patch please describe the issue first. > > You should have started with an email that: > > struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr); > > > > if ((long)(iph + 1) > (long)skb->data_end) > > needs cap_perfmon. > > Got it. Thanks for your explanation. > > -- > Regards > Yafang
On Thu, Aug 17, 2023 at 11:31 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Wed, Aug 16, 2023 at 7:31 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > On Thu, Aug 17, 2023 at 9:54 AM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > On 8/14/23 7:33 AM, Yafang Shao wrote: > > > > > > Add a new bpf_current_capable kfunc to check whether the current task > > > > > > has a specific capability. In our use case, we will use it in a lsm bpf > > > > > > program to help identify if the user operation is permitted. > > > > > > > > > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > > > > > --- > > > > > > kernel/bpf/helpers.c | 6 ++++++ > > > > > > 1 file changed, 6 insertions(+) > > > > > > > > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > > > > > index eb91cae..bbee7ea 100644 > > > > > > --- a/kernel/bpf/helpers.c > > > > > > +++ b/kernel/bpf/helpers.c > > > > > > @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) > > > > > > rcu_read_unlock(); > > > > > > } > > > > > > > > > > > > +__bpf_kfunc bool bpf_current_capable(int cap) > > > > > > +{ > > > > > > + return has_capability(current, cap); > > > > > > +} > > > > > > > > > > Since you are testing against 'current' capabilities, I assume > > > > > that the context should be process. Otherwise, you are testing > > > > > against random task which does not make much sense. > > > > > > > > It is in the process context. > > > > > > > > > > > > > > Since you are testing against 'current' cap, and if the capability > > > > > for that task is stable, you do not need this kfunc. > > > > > You can test cap in user space and pass it into the bpf program. > > > > > > > > > > But if the cap for your process may change in the middle of > > > > > run, then you indeed need bpf prog to test capability in real time. > > > > > Is this your use case and could you describe in in more detail? > > > > > > > > After we convert the capability of our networking bpf program from > > > > CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we > > > > encountered the "pointer comparison prohibited" error, because > > > > allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if > > > > we enable the CAP_PERFMON for the networking bpf program, it can run > > > > tracing bpf prog, perf_event bpf prog and etc, that is not expected by > > > > us. > > > > > > > > Hence we are planning to use a lsm bpf program to disallow it from > > > > running other bpf programs. In our lsm bpf program we will check the > > > > capability of processes, if the process has cap_net_admin, cap_bpf and > > > > cap_perfmon but don't have cap_sys_admin we will refuse it to run > > > > tracing and perf_event bpf program. While if a process has cap_bpf > > > > and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf > > > > program which wants to run trace bpf, we will allow it. > > > > > > > > We can't use lsm_cgroup because it is supported on cgroup2 only, while > > > > we're still using cgroup1. > > > > > > > > Another possible solution is enable allow_ptr_leaks for cap_net_admin > > > > as well, but after I checked the commit which introduces the cap_bpf > > > > and cap_perfmon [1], I think we wouldn't like to do it. > > > > > > Sorry. None of these options are acceptable. > > > > > > The idea of introducing a bpf_current_capable() kfunc just to work > > > around a deficiency in the verifier is not sound. > > > > So what should we do then? > > Just enabling the cap_perfmon for it? That does not sound as well ... > > Yonghong already pointed out upthread that > comparison of two packet pointers is not a pointer leak. > See this code: > } else if (!try_match_pkt_pointers(insn, dst_reg, ®s[insn->src_reg], > this_branch, other_branch) && > is_pointer_value(env, insn->dst_reg)) { > verbose(env, "R%d pointer comparison prohibited\n", > insn->dst_reg); > return -EACCES; > } > > It's not clear why it doesn't address your case. It can address the issue. It seems we should do the code change below. diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 0b9da95..c66dc61 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -13819,6 +13819,18 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env, return -EINVAL; } + other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx, + false); + if (!other_branch) + return -EFAULT; + + /* check src2 operand */ + err = check_reg_arg(env, insn->dst_reg, SRC_OP); + if (err) + return err; + + dst_reg = ®s[insn->dst_reg]; + if (BPF_SRC(insn->code) == BPF_X) { if (insn->imm != 0) { verbose(env, "BPF_JMP/JMP32 uses reserved fields\n"); @@ -13830,7 +13842,9 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env, if (err) return err; - if (is_pointer_value(env, insn->src_reg)) { + if (!try_match_pkt_pointers(insn, dst_reg, ®s[insn->src_reg], + this_branch, other_branch) && + is_pointer_value(env, insn->src_reg)) { verbose(env, "R%d pointer comparison prohibited\n", insn->src_reg); return -EACCES; @@ -13843,12 +13857,6 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env, } } - /* check src2 operand */ - err = check_reg_arg(env, insn->dst_reg, SRC_OP); - if (err) - return err; - - dst_reg = ®s[insn->dst_reg]; is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32; if (BPF_SRC(insn->code) == BPF_K) { @@ -13920,10 +13928,6 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env, return 0; } - other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx, - false); - if (!other_branch) - return -EFAULT; other_branch_regs = other_branch->frame[other_branch->curframe]->regs; /* detect if we are comparing against a constant value so we can adjust
On 8/17/23 9:09 AM, Yafang Shao wrote: > On Thu, Aug 17, 2023 at 11:31 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: >> On Wed, Aug 16, 2023 at 7:31 PM Yafang Shao <laoar.shao@gmail.com> wrote: >>> On Thu, Aug 17, 2023 at 9:54 AM Alexei Starovoitov >>> <alexei.starovoitov@gmail.com> wrote: >>>> On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote: >>>>> On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote: >>>>>> On 8/14/23 7:33 AM, Yafang Shao wrote: >>>>>>> Add a new bpf_current_capable kfunc to check whether the current task >>>>>>> has a specific capability. In our use case, we will use it in a lsm bpf >>>>>>> program to help identify if the user operation is permitted. >>>>>>> >>>>>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> >>>>>>> --- >>>>>>> kernel/bpf/helpers.c | 6 ++++++ >>>>>>> 1 file changed, 6 insertions(+) >>>>>>> >>>>>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c >>>>>>> index eb91cae..bbee7ea 100644 >>>>>>> --- a/kernel/bpf/helpers.c >>>>>>> +++ b/kernel/bpf/helpers.c >>>>>>> @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) >>>>>>> rcu_read_unlock(); >>>>>>> } >>>>>>> >>>>>>> +__bpf_kfunc bool bpf_current_capable(int cap) >>>>>>> +{ >>>>>>> + return has_capability(current, cap); >>>>>>> +} >>>>>> >>>>>> Since you are testing against 'current' capabilities, I assume >>>>>> that the context should be process. Otherwise, you are testing >>>>>> against random task which does not make much sense. >>>>> >>>>> It is in the process context. >>>>> >>>>>> Since you are testing against 'current' cap, and if the capability >>>>>> for that task is stable, you do not need this kfunc. >>>>>> You can test cap in user space and pass it into the bpf program. >>>>>> >>>>>> But if the cap for your process may change in the middle of >>>>>> run, then you indeed need bpf prog to test capability in real time. >>>>>> Is this your use case and could you describe in in more detail? >>>>> >>>>> After we convert the capability of our networking bpf program from >>>>> CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we >>>>> encountered the "pointer comparison prohibited" error, because >>>>> allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if >>>>> we enable the CAP_PERFMON for the networking bpf program, it can run >>>>> tracing bpf prog, perf_event bpf prog and etc, that is not expected by >>>>> us. >>>>> >>>>> Hence we are planning to use a lsm bpf program to disallow it from >>>>> running other bpf programs. In our lsm bpf program we will check the >>>>> capability of processes, if the process has cap_net_admin, cap_bpf and >>>>> cap_perfmon but don't have cap_sys_admin we will refuse it to run >>>>> tracing and perf_event bpf program. While if a process has cap_bpf >>>>> and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf >>>>> program which wants to run trace bpf, we will allow it. >>>>> >>>>> We can't use lsm_cgroup because it is supported on cgroup2 only, while >>>>> we're still using cgroup1. >>>>> >>>>> Another possible solution is enable allow_ptr_leaks for cap_net_admin >>>>> as well, but after I checked the commit which introduces the cap_bpf >>>>> and cap_perfmon [1], I think we wouldn't like to do it. >>>> >>>> Sorry. None of these options are acceptable. >>>> >>>> The idea of introducing a bpf_current_capable() kfunc just to work >>>> around a deficiency in the verifier is not sound. >>> >>> So what should we do then? >>> Just enabling the cap_perfmon for it? That does not sound as well ... >> >> Yonghong already pointed out upthread that >> comparison of two packet pointers is not a pointer leak. >> See this code: >> } else if (!try_match_pkt_pointers(insn, dst_reg, ®s[insn->src_reg], >> this_branch, other_branch) && >> is_pointer_value(env, insn->dst_reg)) { >> verbose(env, "R%d pointer comparison prohibited\n", >> insn->dst_reg); >> return -EACCES; >> } >> >> It's not clear why it doesn't address your case. > > It can address the issue. > It seems we should do the code change below. I presume in your actual program you also access the IP header (otherwise it would be quite useless), and as you mentioned above, you would like this to be accessible for just CAP_BPF + CAP_NET_ADMIN. The CAP_PERFMON restriction is there for a reason, that is, to be on same cap level as tracing programs as you could craft Spectre v1 style access. It's not a deficiency. Thanks, Daniel
On Thu, Aug 17, 2023 at 8:30 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 8/17/23 9:09 AM, Yafang Shao wrote: > > On Thu, Aug 17, 2023 at 11:31 AM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > >> On Wed, Aug 16, 2023 at 7:31 PM Yafang Shao <laoar.shao@gmail.com> wrote: > >>> On Thu, Aug 17, 2023 at 9:54 AM Alexei Starovoitov > >>> <alexei.starovoitov@gmail.com> wrote: > >>>> On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote: > >>>>> On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote: > >>>>>> On 8/14/23 7:33 AM, Yafang Shao wrote: > >>>>>>> Add a new bpf_current_capable kfunc to check whether the current task > >>>>>>> has a specific capability. In our use case, we will use it in a lsm bpf > >>>>>>> program to help identify if the user operation is permitted. > >>>>>>> > >>>>>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > >>>>>>> --- > >>>>>>> kernel/bpf/helpers.c | 6 ++++++ > >>>>>>> 1 file changed, 6 insertions(+) > >>>>>>> > >>>>>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > >>>>>>> index eb91cae..bbee7ea 100644 > >>>>>>> --- a/kernel/bpf/helpers.c > >>>>>>> +++ b/kernel/bpf/helpers.c > >>>>>>> @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) > >>>>>>> rcu_read_unlock(); > >>>>>>> } > >>>>>>> > >>>>>>> +__bpf_kfunc bool bpf_current_capable(int cap) > >>>>>>> +{ > >>>>>>> + return has_capability(current, cap); > >>>>>>> +} > >>>>>> > >>>>>> Since you are testing against 'current' capabilities, I assume > >>>>>> that the context should be process. Otherwise, you are testing > >>>>>> against random task which does not make much sense. > >>>>> > >>>>> It is in the process context. > >>>>> > >>>>>> Since you are testing against 'current' cap, and if the capability > >>>>>> for that task is stable, you do not need this kfunc. > >>>>>> You can test cap in user space and pass it into the bpf program. > >>>>>> > >>>>>> But if the cap for your process may change in the middle of > >>>>>> run, then you indeed need bpf prog to test capability in real time. > >>>>>> Is this your use case and could you describe in in more detail? > >>>>> > >>>>> After we convert the capability of our networking bpf program from > >>>>> CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we > >>>>> encountered the "pointer comparison prohibited" error, because > >>>>> allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if > >>>>> we enable the CAP_PERFMON for the networking bpf program, it can run > >>>>> tracing bpf prog, perf_event bpf prog and etc, that is not expected by > >>>>> us. > >>>>> > >>>>> Hence we are planning to use a lsm bpf program to disallow it from > >>>>> running other bpf programs. In our lsm bpf program we will check the > >>>>> capability of processes, if the process has cap_net_admin, cap_bpf and > >>>>> cap_perfmon but don't have cap_sys_admin we will refuse it to run > >>>>> tracing and perf_event bpf program. While if a process has cap_bpf > >>>>> and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf > >>>>> program which wants to run trace bpf, we will allow it. > >>>>> > >>>>> We can't use lsm_cgroup because it is supported on cgroup2 only, while > >>>>> we're still using cgroup1. > >>>>> > >>>>> Another possible solution is enable allow_ptr_leaks for cap_net_admin > >>>>> as well, but after I checked the commit which introduces the cap_bpf > >>>>> and cap_perfmon [1], I think we wouldn't like to do it. > >>>> > >>>> Sorry. None of these options are acceptable. > >>>> > >>>> The idea of introducing a bpf_current_capable() kfunc just to work > >>>> around a deficiency in the verifier is not sound. > >>> > >>> So what should we do then? > >>> Just enabling the cap_perfmon for it? That does not sound as well ... > >> > >> Yonghong already pointed out upthread that > >> comparison of two packet pointers is not a pointer leak. > >> See this code: > >> } else if (!try_match_pkt_pointers(insn, dst_reg, ®s[insn->src_reg], > >> this_branch, other_branch) && > >> is_pointer_value(env, insn->dst_reg)) { > >> verbose(env, "R%d pointer comparison prohibited\n", > >> insn->dst_reg); > >> return -EACCES; > >> } > >> > >> It's not clear why it doesn't address your case. > > > > It can address the issue. > > It seems we should do the code change below. > > I presume in your actual program you also access the IP header (otherwise it > would be quite useless), and as you mentioned above, you would like this to be > accessible for just CAP_BPF + CAP_NET_ADMIN. The CAP_PERFMON restriction is > there for a reason, that is, to be on same cap level as tracing programs as you > could craft Spectre v1 style access. It's not a deficiency. Probably not. To mount v1 style attack the load of data_end needs to be attacker controlled which isn't the case here.
On Thu, Aug 17, 2023 at 12:10 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > Yonghong already pointed out upthread that > > comparison of two packet pointers is not a pointer leak. > > See this code: > > } else if (!try_match_pkt_pointers(insn, dst_reg, ®s[insn->src_reg], > > this_branch, other_branch) && > > is_pointer_value(env, insn->dst_reg)) { > > verbose(env, "R%d pointer comparison prohibited\n", > > insn->dst_reg); > > return -EACCES; > > } > > > > It's not clear why it doesn't address your case. > > It can address the issue. > It seems we should do the code change below. > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 0b9da95..c66dc61 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -13819,6 +13819,18 @@ static int check_cond_jmp_op(struct > bpf_verifier_env *env, > return -EINVAL; > } > > + other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx, > + false); Yeah. something like that. except we must do push_stack() only after is_branch_taken() didn't succeed.
On Thu, Aug 17, 2023 at 11:30 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 8/17/23 9:09 AM, Yafang Shao wrote: > > On Thu, Aug 17, 2023 at 11:31 AM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > >> On Wed, Aug 16, 2023 at 7:31 PM Yafang Shao <laoar.shao@gmail.com> wrote: > >>> On Thu, Aug 17, 2023 at 9:54 AM Alexei Starovoitov > >>> <alexei.starovoitov@gmail.com> wrote: > >>>> On Mon, Aug 14, 2023 at 7:46 PM Yafang Shao <laoar.shao@gmail.com> wrote: > >>>>> On Tue, Aug 15, 2023 at 8:28 AM Yonghong Song <yonghong.song@linux.dev> wrote: > >>>>>> On 8/14/23 7:33 AM, Yafang Shao wrote: > >>>>>>> Add a new bpf_current_capable kfunc to check whether the current task > >>>>>>> has a specific capability. In our use case, we will use it in a lsm bpf > >>>>>>> program to help identify if the user operation is permitted. > >>>>>>> > >>>>>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > >>>>>>> --- > >>>>>>> kernel/bpf/helpers.c | 6 ++++++ > >>>>>>> 1 file changed, 6 insertions(+) > >>>>>>> > >>>>>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > >>>>>>> index eb91cae..bbee7ea 100644 > >>>>>>> --- a/kernel/bpf/helpers.c > >>>>>>> +++ b/kernel/bpf/helpers.c > >>>>>>> @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) > >>>>>>> rcu_read_unlock(); > >>>>>>> } > >>>>>>> > >>>>>>> +__bpf_kfunc bool bpf_current_capable(int cap) > >>>>>>> +{ > >>>>>>> + return has_capability(current, cap); > >>>>>>> +} > >>>>>> > >>>>>> Since you are testing against 'current' capabilities, I assume > >>>>>> that the context should be process. Otherwise, you are testing > >>>>>> against random task which does not make much sense. > >>>>> > >>>>> It is in the process context. > >>>>> > >>>>>> Since you are testing against 'current' cap, and if the capability > >>>>>> for that task is stable, you do not need this kfunc. > >>>>>> You can test cap in user space and pass it into the bpf program. > >>>>>> > >>>>>> But if the cap for your process may change in the middle of > >>>>>> run, then you indeed need bpf prog to test capability in real time. > >>>>>> Is this your use case and could you describe in in more detail? > >>>>> > >>>>> After we convert the capability of our networking bpf program from > >>>>> CAP_SYS_ADMIN to CAP_BPF+CAP_NET_ADMIN to enhance the security, we > >>>>> encountered the "pointer comparison prohibited" error, because > >>>>> allow_ptr_leaks is enabled only when CAP_PERFMON is set. However, if > >>>>> we enable the CAP_PERFMON for the networking bpf program, it can run > >>>>> tracing bpf prog, perf_event bpf prog and etc, that is not expected by > >>>>> us. > >>>>> > >>>>> Hence we are planning to use a lsm bpf program to disallow it from > >>>>> running other bpf programs. In our lsm bpf program we will check the > >>>>> capability of processes, if the process has cap_net_admin, cap_bpf and > >>>>> cap_perfmon but don't have cap_sys_admin we will refuse it to run > >>>>> tracing and perf_event bpf program. While if a process has cap_bpf > >>>>> and cap_perfmon but doesn't have cap_net_admin, that said it is a bpf > >>>>> program which wants to run trace bpf, we will allow it. > >>>>> > >>>>> We can't use lsm_cgroup because it is supported on cgroup2 only, while > >>>>> we're still using cgroup1. > >>>>> > >>>>> Another possible solution is enable allow_ptr_leaks for cap_net_admin > >>>>> as well, but after I checked the commit which introduces the cap_bpf > >>>>> and cap_perfmon [1], I think we wouldn't like to do it. > >>>> > >>>> Sorry. None of these options are acceptable. > >>>> > >>>> The idea of introducing a bpf_current_capable() kfunc just to work > >>>> around a deficiency in the verifier is not sound. > >>> > >>> So what should we do then? > >>> Just enabling the cap_perfmon for it? That does not sound as well ... > >> > >> Yonghong already pointed out upthread that > >> comparison of two packet pointers is not a pointer leak. > >> See this code: > >> } else if (!try_match_pkt_pointers(insn, dst_reg, ®s[insn->src_reg], > >> this_branch, other_branch) && > >> is_pointer_value(env, insn->dst_reg)) { > >> verbose(env, "R%d pointer comparison prohibited\n", > >> insn->dst_reg); > >> return -EACCES; > >> } > >> > >> It's not clear why it doesn't address your case. > > > > It can address the issue. > > It seems we should do the code change below. > > I presume in your actual program you also access the IP header (otherwise it > would be quite useless), and as you mentioned above, you would like this to be > accessible for just CAP_BPF + CAP_NET_ADMIN. The CAP_PERFMON restriction is > there for a reason, that is, to be on same cap level as tracing programs as you > could craft Spectre v1 style access. It's not a deficiency. Hi Daniel, We also hit an error caused by bpf_bypass_spec_v1 in our networking-bpf with cap_net_admin+cap_bpf. Here is a simple reproducer, #include <linux/pkt_cls.h> #include <linux/if_ether.h> #include <linux/ip.h> #include <linux/tcp.h> #include <linux/bpf.h> #include <bpf/bpf_helpers.h> struct flow_stats_v { __u32 egress_packets; __u32 ingress_packets; }; /* ELF map definition */ struct bpf_elf_map { __u32 type; __u32 size_key; __u32 size_value; __u32 max_elem; __u32 flags; __u32 id; __u32 pinning; __u32 inner_id; __u32 inner_idx; }; struct bpf_elf_map SEC("maps") flow_stats = { .type = BPF_MAP_TYPE_HASH, .size_key = sizeof(__u32), .size_value = sizeof(struct flow_stats_v), .max_elem = 32, .flags = BPF_F_NO_PREALLOC, .pinning = 0, }; struct bpf_elf_map SEC("maps") classid = { .type = BPF_MAP_TYPE_HASH, .size_key = sizeof(__be32), .size_value = sizeof(__be32), .max_elem = 32, .flags = BPF_F_NO_PREALLOC, .pinning = 0, }; struct flow { __be32 local_classid; __be32 local_addr; __u8 dir; }; static void flow_stat(struct __sk_buff *skb, struct flow *flow) { struct flow_stats_v *value; __u32 key = 0; // This value is not relavant value = bpf_map_lookup_elem(&flow_stats, &key); if (!value) return; if (flow->dir == 0) __sync_fetch_and_add(&value->egress_packets, 1); else __sync_fetch_and_add(&value->ingress_packets, 1); } int handle_packet(struct __sk_buff *skb, int dir) { void *data_end = (void *)(long)skb->data_end; void *data = (void *)(long)skb->data; struct flow flow = {}; __be32 *id; flow.dir = dir; flow.local_addr = 1; // This value is not relavant id = bpf_map_lookup_elem(&classid, &flow.local_addr); if (id && *id) flow.local_classid = *id; flow_stat(skb, &flow); return TC_ACT_OK; } SEC("cls-ingress") int ingress(struct __sk_buff *skb) { return handle_packet(skb, 1); } char _license[] SEC("license") = "GPL"; The error log of the bpf verifier as follows, 23: (71) r3 = *(u8 *)(r10 -8) ; R3_w=scalar(umax=255,var_off=(0x0; 0xff)) R10=fp0 fp-8=mmmm???m 24: (b7) r2 = 1 ; R2_w=1 25: (55) if r3 != 0x0 goto pc+1 27: R0=map_value(off=0,ks=4,vs=8,imm=0) R1_w=1 R2_w=1 R3_w=scalar(umax=255,var_off=(0x0; 0xff)) R10=fp0 fp-8=mmmm???m fp-16= 27: (67) r2 <<= 2 ; R2_w=4 28: (0f) r0 += r2 R0 tried to add from different maps, paths or scalars, pointer arithmetic with it prohibited for !root processed 31 insns (limit 1000000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 1 It can be fixed by the code change below in the kernel, diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index b6b60cd..0e200a9 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -11750,7 +11750,7 @@ static int update_alu_sanitation_state(struct bpf_insn_aux_data *aux, /* If we arrived here from different branches with different * state or limits to sanitize, then this won't work. */ - if (aux->alu_state && + if (aux->alu_state && aux->alu_limit && (aux->alu_state != alu_state || aux->alu_limit != alu_limit)) return REASON_PATHS; But I'm not sure if it is a proper fix. We can workaround the issue by modifying our bpf program as follows, @@ -77,7 +77,21 @@ int handle_packet(struct __sk_buff *skb, int dir) if (id && *id) flow.local_classid = *id; - flow_stat(skb, &flow); + { + struct flow_stats_v *value; + __u32 key = 0; // This value is not relavant + + value = bpf_map_lookup_elem(&flow_stats, &key); + if (!value) + return TC_ACT_OK; + + if (flow.dir == 0) + __sync_fetch_and_add(&value->egress_packets, 1); + else + __sync_fetch_and_add(&value->ingress_packets, 1); + + } + // flow_stat(skb, &flow); return TC_ACT_OK; } But I don't have a clear idea why. It seems to me that the bpf verifier under cap_net_admin+cap_bpf is fragile. I'm wondering if it is possible to add a sysctl to bypass the perfmon check : --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2091,22 +2091,22 @@ static inline void bpf_map_dec_elem_count(struct bpf_map *map) static inline bool bpf_allow_ptr_leaks(void) { - return perfmon_capable(); + return perfmon_capable() || syscal_bypass_perfmon; } static inline bool bpf_allow_uninit_stack(void) { - return perfmon_capable(); + return perfmon_capable() || syscal_bypass_perfmon; } static inline bool bpf_bypass_spec_v1(void) { - return perfmon_capable(); + return perfmon_capable() || syscal_bypass_perfmon; } static inline bool bpf_bypass_spec_v4(void) { - return perfmon_capable(); + return perfmon_capable() || syscal_bypass_perfmon; } int bpf_map_new_fd(struct bpf_map *map, int flags); Do you have any suggestions?
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index eb91cae..bbee7ea 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2429,6 +2429,11 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) rcu_read_unlock(); } +__bpf_kfunc bool bpf_current_capable(int cap) +{ + return has_capability(current, cap); +} + __diag_pop(); BTF_SET8_START(generic_btf_ids) @@ -2456,6 +2461,7 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU) #endif BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_current_capable, KF_RCU) BTF_SET8_END(generic_btf_ids) static const struct btf_kfunc_id_set generic_kfunc_set = {
Add a new bpf_current_capable kfunc to check whether the current task has a specific capability. In our use case, we will use it in a lsm bpf program to help identify if the user operation is permitted. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> --- kernel/bpf/helpers.c | 6 ++++++ 1 file changed, 6 insertions(+)