Message ID | 20210226112322.144927-2-bjorn.topel@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Series | Optimize bpf_redirect_map()/xdp_do_redirect() |
Björn Töpel <bjorn.topel@gmail.com> writes:

> From: Björn Töpel <bjorn.topel@intel.com>
>
> Currently the bpf_redirect_map() implementation dispatches to the
> correct map-lookup function via a switch-statement. To avoid the
> dispatching, this change adds bpf_redirect_map() as a map
> operation. Each map provides its bpf_redirect_map() version, and
> correct function is automatically selected by the BPF verifier.
>
> A nice side-effect of the code movement is that the map lookup
> functions are now local to the map implementation files, which removes
> one additional function call.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

Nice! I agree that this is a much nicer approach! :)

(That last paragraph above is why I asked if you updated the performance
numbers in the cover letter; removing an additional function call should
affect those, right?)

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>

On 2021-02-26 12:37, Toke Høiland-Jørgensen wrote:
> Björn Töpel <bjorn.topel@gmail.com> writes:
>
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> Currently the bpf_redirect_map() implementation dispatches to the
>> correct map-lookup function via a switch-statement. To avoid the
>> dispatching, this change adds bpf_redirect_map() as a map
>> operation. Each map provides its bpf_redirect_map() version, and
>> correct function is automatically selected by the BPF verifier.
>>
>> A nice side-effect of the code movement is that the map lookup
>> functions are now local to the map implementation files, which removes
>> one additional function call.
>>
>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>
> Nice! I agree that this is a much nicer approach! :)
>
> (That last paragraph above is why I asked if you updated the performance
> numbers in the cover letter; removing an additional function call should
> affect those, right?)
>

Yeah, it should. Let me spend some more time benchmarking on the DEVMAP
scenario.

@Jesper Do you have a CPUMAP benchmark that you can point me to? I just
did functional testing for CPUMAP

> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
>

Thank you!
Björn

On 2021-02-26 12:40, Björn Töpel wrote:
> On 2021-02-26 12:37, Toke Høiland-Jørgensen wrote:

[...]

>>
>> (That last paragraph above is why I asked if you updated the performance
>> numbers in the cover letter; removing an additional function call should
>> affect those, right?)
>>
>
> Yeah, it should. Let me spend some more time benchmarking on the DEVMAP
> scenario.
>

I did a re-measure using samples/xdp_redirect_map.

The setup is 64B packets blasted to an i40e. As a baseline,

  # xdp_rxq_info --dev ens801f1 --action XDP_DROP

gives 24.8 Mpps.


Now, xdp_redirect_map. Same NIC, two ports, receive from port A,
redirect to port B:

baseline:    14.3 Mpps
this series: 15.4 Mpps

which is almost 8%!


Björn

Björn Töpel <bjorn.topel@intel.com> writes:

> On 2021-02-26 12:40, Björn Töpel wrote:
>> On 2021-02-26 12:37, Toke Høiland-Jørgensen wrote:
>
> [...]
>
>>>
>>> (That last paragraph above is why I asked if you updated the performance
>>> numbers in the cover letter; removing an additional function call should
>>> affect those, right?)
>>>
>>
>> Yeah, it should. Let me spend some more time benchmarking on the DEVMAP
>> scenario.
>>
>
> I did a re-measure using samples/xdp_redirect_map.
>
> The setup is 64B packets blasted to an i40e. As a baseline,
>
>   # xdp_rxq_info --dev ens801f1 --action XDP_DROP
>
> gives 24.8 Mpps.
>
>
> Now, xdp_redirect_map. Same NIC, two ports, receive from port A,
> redirect to port B:
>
> baseline:    14.3 Mpps
> this series: 15.4 Mpps
>
> which is almost 8%!

Or 5 ns difference:

10**9/(14.3*10**6) - 10**9/(15.4*10**6)
4.995004995005004

Nice :)

-Toke

On Fri, 26 Feb 2021 12:40:33 +0100
Björn Töpel <bjorn.topel@intel.com> wrote:

> @Jesper Do you have a CPUMAP benchmark that you can point me to? I just
> did functional testing for CPUMAP

I usually just use the xdp_redirect_cpu samples/bpf program.

Your optimization will help the RX enqueue side, but the bottleneck for
CPUMAP is the remote CPU dequeue. You should still be able to see that
RX-side performance improve, and that should be enough (even though
packets are dropped before reaching the remote CPU). I'm not going to ask
you to test scale out to more CPUs.

On Fri, 26 Feb 2021 13:26:22 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Björn Töpel <bjorn.topel@intel.com> writes:
>
> > On 2021-02-26 12:40, Björn Töpel wrote:
> >> On 2021-02-26 12:37, Toke Høiland-Jørgensen wrote:
> >
> > [...]
> >
> >>>
> >>> (That last paragraph above is why I asked if you updated the performance
> >>> numbers in the cover letter; removing an additional function call should
> >>> affect those, right?)
> >>>
> >>
> >> Yeah, it should. Let me spend some more time benchmarking on the DEVMAP
> >> scenario.
> >>
> >
> > I did a re-measure using samples/xdp_redirect_map.
> >
> > The setup is 64B packets blasted to an i40e. As a baseline,
> >
> >   # xdp_rxq_info --dev ens801f1 --action XDP_DROP
> >
> > gives 24.8 Mpps.
> >
> >
> > Now, xdp_redirect_map. Same NIC, two ports, receive from port A,
> > redirect to port B:
> >
> > baseline:    14.3 Mpps
> > this series: 15.4 Mpps
> >
> > which is almost 8%!
>
> Or 5 ns difference:
>
> 10**9/(14.3*10**6) - 10**9/(15.4*10**6)
> 4.995004995005004
>
> Nice :)

Yes, this is a very significant improvement at this zoom-in
benchmarking level :-)

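For reference, the conversion from packet rate to per-packet time used in the exchange above can be reproduced with a small C sketch. The function names are hypothetical and exist only for this illustration; the two rates are the ones quoted in the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Time spent per packet, in nanoseconds, at a rate given in Mpps. */
static double ns_per_packet(double mpps)
{
	return 1e9 / (mpps * 1e6);
}

/* Throughput gain of after_mpps over before_mpps, in percent. */
static double gain_percent(double before_mpps, double after_mpps)
{
	return (after_mpps / before_mpps - 1.0) * 100.0;
}

/* Print the per-packet saving for the two rates quoted in the thread. */
static void print_redirect_saving(void)
{
	const double baseline = 14.3;	/* Mpps, xdp_redirect_map before the series */
	const double series = 15.4;	/* Mpps, xdp_redirect_map with the series */

	printf("saved per packet: %.3f ns (gain %.1f%%)\n",
	       ns_per_packet(baseline) - ns_per_packet(series),
	       gain_percent(baseline, series));
}
```

Calling print_redirect_saving() from a main() prints the saving; the point of the sketch is that a roughly 8% throughput gain at these rates corresponds to about 5 ns per packet.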
On Fri, 26 Feb 2021 12:37:40 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Björn Töpel <bjorn.topel@gmail.com> writes:
>
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > Currently the bpf_redirect_map() implementation dispatches to the
> > correct map-lookup function via a switch-statement. To avoid the
> > dispatching, this change adds bpf_redirect_map() as a map
> > operation. Each map provides its bpf_redirect_map() version, and
> > correct function is automatically selected by the BPF verifier.
> >
> > A nice side-effect of the code movement is that the map lookup
> > functions are now local to the map implementation files, which removes
> > one additional function call.
> >
> > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>
> Nice! I agree that this is a much nicer approach! :)

I agree :-)

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

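The difference between the two dispatch styles described in the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: every toy_* name is hypothetical, the return values are arbitrary markers, and the sketch still makes an indirect call through the ops table, whereas the patch has the verifier patch the call site to the map's own function.

```c
#include <assert.h>
#include <stddef.h>

struct toy_map;

/* Hypothetical per-map operations table, loosely modelled on bpf_map_ops. */
struct toy_map_ops {
	int (*redirect)(struct toy_map *map, unsigned int key);
};

enum toy_map_type { TOY_DEVMAP, TOY_CPUMAP };

struct toy_map {
	enum toy_map_type type;
	const struct toy_map_ops *ops;
};

/* Per-map implementations; the offsets only make the results distinguishable. */
static int toy_dev_redirect(struct toy_map *map, unsigned int key)
{
	(void)map;
	return 100 + (int)key;
}

static int toy_cpu_redirect(struct toy_map *map, unsigned int key)
{
	(void)map;
	return 200 + (int)key;
}

static const struct toy_map_ops toy_dev_ops = { .redirect = toy_dev_redirect };
static const struct toy_map_ops toy_cpu_ops = { .redirect = toy_cpu_redirect };

/* Old style: one central helper that switches on the map type per call. */
static int redirect_via_switch(struct toy_map *map, unsigned int key)
{
	switch (map->type) {
	case TOY_DEVMAP:
		return toy_dev_redirect(map, key);
	case TOY_CPUMAP:
		return toy_cpu_redirect(map, key);
	default:
		return -1;
	}
}

/* New style: the map carries its own redirect operation. */
static int redirect_via_ops(struct toy_map *map, unsigned int key)
{
	return map->ops->redirect(map, key);
}
```

Both helpers return the same value for the same map; the second one needs no central knowledge of the map types, which is what lets the lookup functions become local to each map implementation file.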
Hi Björn,
I love your patch! Perhaps something to improve:
[auto build test WARNING on 9c8f21e6f8856a96634e542a58ef3abf27486801]
url: https://github.com/0day-ci/linux/commits/Bj-rn-T-pel/Optimize-bpf_redirect_map-xdp_do_redirect/20210226-192840
base: 9c8f21e6f8856a96634e542a58ef3abf27486801
config: mips-randconfig-r026-20210226 (attached as .config)
compiler: mips64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/1f7606274f17503baf1c0908dad3462981840749
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Bj-rn-T-pel/Optimize-bpf_redirect_map-xdp_do_redirect/20210226-192840
git checkout 1f7606274f17503baf1c0908dad3462981840749
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=mips
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
In file included from include/linux/bpf_verifier.h:9,
from kernel/bpf/verifier.c:12:
kernel/bpf/verifier.c: In function 'jit_subprogs':
include/linux/filter.h:363:4: warning: cast between incompatible function types from 'unsigned int (*)(const void *, const struct bpf_insn *)' to 'u64 (*)(u64, u64, u64, u64, u64)' {aka 'long long unsigned int (*)(long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int)'} [-Wcast-function-type]
363 | ((u64 (*)(u64, u64, u64, u64, u64))(x))
| ^
kernel/bpf/verifier.c:11421:16: note: in expansion of macro 'BPF_CAST_CALL'
11421 | insn->imm = BPF_CAST_CALL(func[subprog]->bpf_func) -
| ^~~~~~~~~~~~~
kernel/bpf/verifier.c: In function 'fixup_bpf_calls':
include/linux/filter.h:363:4: warning: cast between incompatible function types from 'void * (* const)(struct bpf_map *, void *)' to 'u64 (*)(u64, u64, u64, u64, u64)' {aka 'long long unsigned int (*)(long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int)'} [-Wcast-function-type]
363 | ((u64 (*)(u64, u64, u64, u64, u64))(x))
| ^
kernel/bpf/verifier.c:11814:17: note: in expansion of macro 'BPF_CAST_CALL'
11814 | insn->imm = BPF_CAST_CALL(ops->map_lookup_elem) -
| ^~~~~~~~~~~~~
include/linux/filter.h:363:4: warning: cast between incompatible function types from 'int (* const)(struct bpf_map *, void *, void *, u64)' {aka 'int (* const)(struct bpf_map *, void *, void *, long long unsigned int)'} to 'u64 (*)(u64, u64, u64, u64, u64)' {aka 'long long unsigned int (*)(long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int)'} [-Wcast-function-type]
363 | ((u64 (*)(u64, u64, u64, u64, u64))(x))
| ^
kernel/bpf/verifier.c:11818:17: note: in expansion of macro 'BPF_CAST_CALL'
11818 | insn->imm = BPF_CAST_CALL(ops->map_update_elem) -
| ^~~~~~~~~~~~~
include/linux/filter.h:363:4: warning: cast between incompatible function types from 'int (* const)(struct bpf_map *, void *)' to 'u64 (*)(u64, u64, u64, u64, u64)' {aka 'long long unsigned int (*)(long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int)'} [-Wcast-function-type]
363 | ((u64 (*)(u64, u64, u64, u64, u64))(x))
| ^
kernel/bpf/verifier.c:11822:17: note: in expansion of macro 'BPF_CAST_CALL'
11822 | insn->imm = BPF_CAST_CALL(ops->map_delete_elem) -
| ^~~~~~~~~~~~~
include/linux/filter.h:363:4: warning: cast between incompatible function types from 'int (* const)(struct bpf_map *, void *, u64)' {aka 'int (* const)(struct bpf_map *, void *, long long unsigned int)'} to 'u64 (*)(u64, u64, u64, u64, u64)' {aka 'long long unsigned int (*)(long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int)'} [-Wcast-function-type]
363 | ((u64 (*)(u64, u64, u64, u64, u64))(x))
| ^
kernel/bpf/verifier.c:11826:17: note: in expansion of macro 'BPF_CAST_CALL'
11826 | insn->imm = BPF_CAST_CALL(ops->map_push_elem) -
| ^~~~~~~~~~~~~
include/linux/filter.h:363:4: warning: cast between incompatible function types from 'int (* const)(struct bpf_map *, void *)' to 'u64 (*)(u64, u64, u64, u64, u64)' {aka 'long long unsigned int (*)(long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int)'} [-Wcast-function-type]
363 | ((u64 (*)(u64, u64, u64, u64, u64))(x))
| ^
kernel/bpf/verifier.c:11830:17: note: in expansion of macro 'BPF_CAST_CALL'
11830 | insn->imm = BPF_CAST_CALL(ops->map_pop_elem) -
| ^~~~~~~~~~~~~
include/linux/filter.h:363:4: warning: cast between incompatible function types from 'int (* const)(struct bpf_map *, void *)' to 'u64 (*)(u64, u64, u64, u64, u64)' {aka 'long long unsigned int (*)(long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int)'} [-Wcast-function-type]
363 | ((u64 (*)(u64, u64, u64, u64, u64))(x))
| ^
kernel/bpf/verifier.c:11834:17: note: in expansion of macro 'BPF_CAST_CALL'
11834 | insn->imm = BPF_CAST_CALL(ops->map_peek_elem) -
| ^~~~~~~~~~~~~
>> include/linux/filter.h:363:4: warning: cast between incompatible function types from 'int (* const)(struct bpf_map *, u32, u64)' {aka 'int (* const)(struct bpf_map *, unsigned int, long long unsigned int)'} to 'u64 (*)(u64, u64, u64, u64, u64)' {aka 'long long unsigned int (*)(long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int, long long unsigned int)'} [-Wcast-function-type]
363 | ((u64 (*)(u64, u64, u64, u64, u64))(x))
| ^
kernel/bpf/verifier.c:11838:17: note: in expansion of macro 'BPF_CAST_CALL'
11838 | insn->imm = BPF_CAST_CALL(ops->xdp_redirect_map) - __bpf_call_base;
| ^~~~~~~~~~~~~
vim +363 include/linux/filter.h
f8f6d679aaa78b Daniel Borkmann 2014-05-29 361
09772d92cd5ad9 Daniel Borkmann 2018-06-02 362 #define BPF_CAST_CALL(x) \
09772d92cd5ad9 Daniel Borkmann 2018-06-02 @363 ((u64 (*)(u64, u64, u64, u64, u64))(x))
09772d92cd5ad9 Daniel Borkmann 2018-06-02 364
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On 2/26/21 12:23 PM, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Currently the bpf_redirect_map() implementation dispatches to the
> correct map-lookup function via a switch-statement. To avoid the
> dispatching, this change adds bpf_redirect_map() as a map
> operation. Each map provides its bpf_redirect_map() version, and
> correct function is automatically selected by the BPF verifier.
>
> A nice side-effect of the code movement is that the map lookup
> functions are now local to the map implementation files, which removes
> one additional function call.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>   include/linux/bpf.h    | 26 ++++++--------------------
>   include/linux/filter.h | 27 +++++++++++++++++++++++++++
>   include/net/xdp_sock.h | 19 -------------------
>   kernel/bpf/cpumap.c    |  8 +++++++-
>   kernel/bpf/devmap.c    | 16 ++++++++++++++--
>   kernel/bpf/verifier.c  | 11 +++++++++--
>   net/core/filter.c      | 39 +--------------------------------------
>   net/xdp/xskmap.c       | 18 ++++++++++++++++++
>   8 files changed, 82 insertions(+), 82 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index cccaef1088ea..a44ba904ca37 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -117,6 +117,9 @@ struct bpf_map_ops {
>   					   void *owner, u32 size);
>   	struct bpf_local_storage __rcu ** (*map_owner_storage_ptr)(void *owner);
>
> +	/* XDP helpers.*/
> +	int (*xdp_redirect_map)(struct bpf_map *map, u32 ifindex, u64 flags);
> +
>   	/* map_meta_equal must be implemented for maps that can be
>   	 * used as an inner map.  It is a runtime check to ensure
>   	 * an inner map can be inserted to an outer map.

[...]

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 1dda9d81f12c..96705a49225e 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5409,7 +5409,8 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
>   	    func_id != BPF_FUNC_map_delete_elem &&
>   	    func_id != BPF_FUNC_map_push_elem &&
>   	    func_id != BPF_FUNC_map_pop_elem &&
> -	    func_id != BPF_FUNC_map_peek_elem)
> +	    func_id != BPF_FUNC_map_peek_elem &&
> +	    func_id != BPF_FUNC_redirect_map)
>   		return 0;
>
>   	if (map == NULL) {
> @@ -11762,7 +11763,8 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
>   		     insn->imm == BPF_FUNC_map_delete_elem ||
>   		     insn->imm == BPF_FUNC_map_push_elem ||
>   		     insn->imm == BPF_FUNC_map_pop_elem ||
> -		     insn->imm == BPF_FUNC_map_peek_elem)) {
> +		     insn->imm == BPF_FUNC_map_peek_elem ||
> +		     insn->imm == BPF_FUNC_redirect_map)) {
>   			aux = &env->insn_aux_data[i + delta];
>   			if (bpf_map_ptr_poisoned(aux))
>   				goto patch_call_imm;
> @@ -11804,6 +11806,8 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
>   				     (int (*)(struct bpf_map *map, void *value))NULL));
>   			BUILD_BUG_ON(!__same_type(ops->map_peek_elem,
>   				     (int (*)(struct bpf_map *map, void *value))NULL));
> +			BUILD_BUG_ON(!__same_type(ops->xdp_redirect_map,
> +				     (int (*)(struct bpf_map *map, u32 ifindex, u64 flags))NULL));
>   patch_map_ops_generic:
>   			switch (insn->imm) {
>   			case BPF_FUNC_map_lookup_elem:
> @@ -11830,6 +11834,9 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
>   				insn->imm = BPF_CAST_CALL(ops->map_peek_elem) -
>   					    __bpf_call_base;
>   				continue;
> +			case BPF_FUNC_redirect_map:
> +				insn->imm = BPF_CAST_CALL(ops->xdp_redirect_map) - __bpf_call_base;

Small nit: I would name the generic callback ops->map_redirect so that this is in line with
the general naming convention for the map ops. Otherwise this looks much better, thx!

> +				continue;
>   			}
>
>   			goto patch_call_imm;

On 2021-02-26 22:48, Daniel Borkmann wrote:
> On 2/26/21 12:23 PM, Björn Töpel wrote:
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> Currently the bpf_redirect_map() implementation dispatches to the
>> correct map-lookup function via a switch-statement. To avoid the
>> dispatching, this change adds bpf_redirect_map() as a map
>> operation. Each map provides its bpf_redirect_map() version, and
>> correct function is automatically selected by the BPF verifier.
>>
>> A nice side-effect of the code movement is that the map lookup
>> functions are now local to the map implementation files, which removes
>> one additional function call.
>>
>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>> ---
>>   include/linux/bpf.h    | 26 ++++++--------------------
>>   include/linux/filter.h | 27 +++++++++++++++++++++++++++
>>   include/net/xdp_sock.h | 19 -------------------
>>   kernel/bpf/cpumap.c    |  8 +++++++-
>>   kernel/bpf/devmap.c    | 16 ++++++++++++++--
>>   kernel/bpf/verifier.c  | 11 +++++++++--
>>   net/core/filter.c      | 39 +--------------------------------------
>>   net/xdp/xskmap.c       | 18 ++++++++++++++++++
>>   8 files changed, 82 insertions(+), 82 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index cccaef1088ea..a44ba904ca37 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -117,6 +117,9 @@ struct bpf_map_ops {
>>                          void *owner, u32 size);
>>       struct bpf_local_storage __rcu ** (*map_owner_storage_ptr)(void
>> *owner);
>> +    /* XDP helpers.*/
>> +    int (*xdp_redirect_map)(struct bpf_map *map, u32 ifindex, u64
>> flags);
>> +
>>       /* map_meta_equal must be implemented for maps that can be
>>        * used as an inner map.  It is a runtime check to ensure
>>        * an inner map can be inserted to an outer map.
> [...]
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 1dda9d81f12c..96705a49225e 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -5409,7 +5409,8 @@ record_func_map(struct bpf_verifier_env *env,
>> struct bpf_call_arg_meta *meta,
>>           func_id != BPF_FUNC_map_delete_elem &&
>>           func_id != BPF_FUNC_map_push_elem &&
>>           func_id != BPF_FUNC_map_pop_elem &&
>> -        func_id != BPF_FUNC_map_peek_elem)
>> +        func_id != BPF_FUNC_map_peek_elem &&
>> +        func_id != BPF_FUNC_redirect_map)
>>           return 0;
>>       if (map == NULL) {
>> @@ -11762,7 +11763,8 @@ static int fixup_bpf_calls(struct
>> bpf_verifier_env *env)
>>                insn->imm == BPF_FUNC_map_delete_elem ||
>>                insn->imm == BPF_FUNC_map_push_elem ||
>>                insn->imm == BPF_FUNC_map_pop_elem ||
>> -             insn->imm == BPF_FUNC_map_peek_elem)) {
>> +             insn->imm == BPF_FUNC_map_peek_elem ||
>> +             insn->imm == BPF_FUNC_redirect_map)) {
>>               aux = &env->insn_aux_data[i + delta];
>>               if (bpf_map_ptr_poisoned(aux))
>>                   goto patch_call_imm;
>> @@ -11804,6 +11806,8 @@ static int fixup_bpf_calls(struct
>> bpf_verifier_env *env)
>>                        (int (*)(struct bpf_map *map, void *value))NULL));
>>               BUILD_BUG_ON(!__same_type(ops->map_peek_elem,
>>                        (int (*)(struct bpf_map *map, void *value))NULL));
>> +            BUILD_BUG_ON(!__same_type(ops->xdp_redirect_map,
>> +                     (int (*)(struct bpf_map *map, u32 ifindex, u64
>> flags))NULL));
>>   patch_map_ops_generic:
>>               switch (insn->imm) {
>>               case BPF_FUNC_map_lookup_elem:
>> @@ -11830,6 +11834,9 @@ static int fixup_bpf_calls(struct
>> bpf_verifier_env *env)
>>                   insn->imm = BPF_CAST_CALL(ops->map_peek_elem) -
>>                           __bpf_call_base;
>>                   continue;
>> +            case BPF_FUNC_redirect_map:
>> +                insn->imm = BPF_CAST_CALL(ops->xdp_redirect_map) -
>> __bpf_call_base;
>
> Small nit: I would name the generic callback ops->map_redirect so that
> this is in line with
> the general naming convention for the map ops. Otherwise this looks much
> better, thx!
>

I'll respin! Thanks for the input!

I'll ignore the BPF_CAST_CALL W=1 warnings ([-Wcast-function-type]), or
do you have any thoughts on that? I don't think it's a good idea to
silence that warning for the whole verifier.c


Björn

>> +                continue;
>>               }
>>               goto patch_call_imm;

On 2/27/21 10:04 AM, Björn Töpel wrote:
> On 2021-02-26 22:48, Daniel Borkmann wrote:
>> On 2/26/21 12:23 PM, Björn Töpel wrote:
>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>
>>> Currently the bpf_redirect_map() implementation dispatches to the
>>> correct map-lookup function via a switch-statement. To avoid the
>>> dispatching, this change adds bpf_redirect_map() as a map
>>> operation. Each map provides its bpf_redirect_map() version, and
>>> correct function is automatically selected by the BPF verifier.
>>>
>>> A nice side-effect of the code movement is that the map lookup
>>> functions are now local to the map implementation files, which removes
>>> one additional function call.
>>>
>>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>>> ---
>>>   include/linux/bpf.h    | 26 ++++++--------------------
>>>   include/linux/filter.h | 27 +++++++++++++++++++++++++++
>>>   include/net/xdp_sock.h | 19 -------------------
>>>   kernel/bpf/cpumap.c    |  8 +++++++-
>>>   kernel/bpf/devmap.c    | 16 ++++++++++++++--
>>>   kernel/bpf/verifier.c  | 11 +++++++++--
>>>   net/core/filter.c      | 39 +--------------------------------------
>>>   net/xdp/xskmap.c       | 18 ++++++++++++++++++
>>>   8 files changed, 82 insertions(+), 82 deletions(-)
>>>
>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>> index cccaef1088ea..a44ba904ca37 100644
>>> --- a/include/linux/bpf.h
>>> +++ b/include/linux/bpf.h
>>> @@ -117,6 +117,9 @@ struct bpf_map_ops {
>>>                          void *owner, u32 size);
>>>       struct bpf_local_storage __rcu ** (*map_owner_storage_ptr)(void *owner);
>>> +    /* XDP helpers.*/
>>> +    int (*xdp_redirect_map)(struct bpf_map *map, u32 ifindex, u64 flags);
>>> +
>>>       /* map_meta_equal must be implemented for maps that can be
>>>        * used as an inner map.  It is a runtime check to ensure
>>>        * an inner map can be inserted to an outer map.
>> [...]
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 1dda9d81f12c..96705a49225e 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -5409,7 +5409,8 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
>>>           func_id != BPF_FUNC_map_delete_elem &&
>>>           func_id != BPF_FUNC_map_push_elem &&
>>>           func_id != BPF_FUNC_map_pop_elem &&
>>> -        func_id != BPF_FUNC_map_peek_elem)
>>> +        func_id != BPF_FUNC_map_peek_elem &&
>>> +        func_id != BPF_FUNC_redirect_map)
>>>           return 0;
>>>       if (map == NULL) {
>>> @@ -11762,7 +11763,8 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
>>>                insn->imm == BPF_FUNC_map_delete_elem ||
>>>                insn->imm == BPF_FUNC_map_push_elem ||
>>>                insn->imm == BPF_FUNC_map_pop_elem ||
>>> -             insn->imm == BPF_FUNC_map_peek_elem)) {
>>> +             insn->imm == BPF_FUNC_map_peek_elem ||
>>> +             insn->imm == BPF_FUNC_redirect_map)) {
>>>               aux = &env->insn_aux_data[i + delta];
>>>               if (bpf_map_ptr_poisoned(aux))
>>>                   goto patch_call_imm;
>>> @@ -11804,6 +11806,8 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
>>>                        (int (*)(struct bpf_map *map, void *value))NULL));
>>>               BUILD_BUG_ON(!__same_type(ops->map_peek_elem,
>>>                        (int (*)(struct bpf_map *map, void *value))NULL));
>>> +            BUILD_BUG_ON(!__same_type(ops->xdp_redirect_map,
>>> +                     (int (*)(struct bpf_map *map, u32 ifindex, u64 flags))NULL));
>>>   patch_map_ops_generic:
>>>               switch (insn->imm) {
>>>               case BPF_FUNC_map_lookup_elem:
>>> @@ -11830,6 +11834,9 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
>>>                   insn->imm = BPF_CAST_CALL(ops->map_peek_elem) -
>>>                           __bpf_call_base;
>>>                   continue;
>>> +            case BPF_FUNC_redirect_map:
>>> +                insn->imm = BPF_CAST_CALL(ops->xdp_redirect_map) - __bpf_call_base;
>>
>> Small nit: I would name the generic callback ops->map_redirect so that this is in line with
>> the general naming convention for the map ops. Otherwise this looks much better, thx!
>>
>
> I'll respin! Thanks for the input!
>
> I'll ignore the BPF_CAST_CALL W=1 warnings ([-Wcast-function-type]), or
> do you have any thoughts on that? I don't think it's a good idea to
> silence that warning for the whole verifier.c

Makes sense, yes, given they are neither new nor critical for the existing ones either.

Thanks,
Daniel

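As background for the [-Wcast-function-type] discussion, the following userspace sketch shows the kind of cast the robot report points at. It is an illustration only: TOY_CAST_CALL and the toy_* functions are hypothetical stand-ins for BPF_CAST_CALL and a map callback, and the claim that GCC enables this warning via -Wextra is stated from memory of the GCC documentation rather than from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Generic five-argument signature, same shape as the BPF_CAST_CALL target. */
typedef uint64_t (*generic_call_t)(uint64_t, uint64_t, uint64_t,
				   uint64_t, uint64_t);

/* Hypothetical stand-in for the macro at include/linux/filter.h:363. */
#define TOY_CAST_CALL(x) ((generic_call_t)(x))

/* A callback with a narrower signature, like the xdp_redirect_map op. */
static int toy_redirect(void *map, uint32_t ifindex, uint64_t flags)
{
	(void)map;
	return (int)(ifindex + flags);
}

/*
 * Erase the function type and restore it before calling. Building with
 * gcc -Wcast-function-type is expected to warn on both casts; the call
 * itself is well defined because it goes through the original type.
 */
static int call_through_round_trip(void *map, uint32_t ifindex, uint64_t flags)
{
	generic_call_t erased = TOY_CAST_CALL(toy_redirect);
	int (*typed)(void *, uint32_t, uint64_t) =
		(int (*)(void *, uint32_t, uint64_t))erased;

	return typed(map, ifindex, flags);
}
```

The sketch never calls through the erased type, since that would be undefined behaviour in portable C; the kernel relies on its own calling-convention guarantees there, which is outside what this example covers.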
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index cccaef1088ea..a44ba904ca37 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -117,6 +117,9 @@ struct bpf_map_ops { void *owner, u32 size); struct bpf_local_storage __rcu ** (*map_owner_storage_ptr)(void *owner); + /* XDP helpers.*/ + int (*xdp_redirect_map)(struct bpf_map *map, u32 ifindex, u64 flags); + /* map_meta_equal must be implemented for maps that can be * used as an inner map. It is a runtime check to ensure * an inner map can be inserted to an outer map. @@ -1429,9 +1432,9 @@ struct btf *bpf_get_btf_vmlinux(void); /* Map specifics */ struct xdp_buff; struct sk_buff; +struct bpf_dtab_netdev; +struct bpf_cpu_map_entry; -struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key); -struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key); void __dev_flush(void); int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp, struct net_device *dev_rx); @@ -1441,7 +1444,6 @@ int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb, struct bpf_prog *xdp_prog); bool dev_map_can_have_prog(struct bpf_map *map); -struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key); void __cpu_map_flush(void); int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp, struct net_device *dev_rx); @@ -1568,17 +1570,6 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags) return -EOPNOTSUPP; } -static inline struct net_device *__dev_map_lookup_elem(struct bpf_map *map, - u32 key) -{ - return NULL; -} - -static inline struct net_device *__dev_map_hash_lookup_elem(struct bpf_map *map, - u32 key) -{ - return NULL; -} static inline bool dev_map_can_have_prog(struct bpf_map *map) { return false; @@ -1590,6 +1581,7 @@ static inline void __dev_flush(void) struct xdp_buff; struct bpf_dtab_netdev; +struct bpf_cpu_map_entry; static inline int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff 
*xdp, @@ -1614,12 +1606,6 @@ static inline int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, return 0; } -static inline -struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key) -{ - return NULL; -} - static inline void __cpu_map_flush(void) { } diff --git a/include/linux/filter.h b/include/linux/filter.h index 3b00fc906ccd..008691fd3b58 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1472,4 +1472,31 @@ static inline bool bpf_sk_lookup_run_v6(struct net *net, int protocol, } #endif /* IS_ENABLED(CONFIG_IPV6) */ +static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u32 ifindex, u64 flags, + void *lookup_elem(struct bpf_map *map, u32 key)) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + + /* Lower bits of the flags are used as return code on lookup failure */ + if (unlikely(flags > XDP_TX)) + return XDP_ABORTED; + + ri->tgt_value = lookup_elem(map, ifindex); + if (unlikely(!ri->tgt_value)) { + /* If the lookup fails we want to clear out the state in the + * redirect_info struct completely, so that if an eBPF program + * performs multiple lookups, the last one always takes + * precedence. 
+ */ + WRITE_ONCE(ri->map, NULL); + return flags; + } + + ri->flags = flags; + ri->tgt_index = ifindex; + WRITE_ONCE(ri->map, map); + + return XDP_REDIRECT; +} + #endif /* __LINUX_FILTER_H__ */ diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index cc17bc957548..9c0722c6d7ac 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -80,19 +80,6 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp); void __xsk_map_flush(void); -static inline struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map, - u32 key) -{ - struct xsk_map *m = container_of(map, struct xsk_map, map); - struct xdp_sock *xs; - - if (key >= map->max_entries) - return NULL; - - xs = READ_ONCE(m->xsk_map[key]); - return xs; -} - #else static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) @@ -109,12 +96,6 @@ static inline void __xsk_map_flush(void) { } -static inline struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map, - u32 key) -{ - return NULL; -} - #endif /* CONFIG_XDP_SOCKETS */ #endif /* _LINUX_XDP_SOCK_H */ diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index 5d1469de6921..85a2d33fd46b 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -563,7 +563,7 @@ static void cpu_map_free(struct bpf_map *map) kfree(cmap); } -struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key) +static void *__cpu_map_lookup_elem(struct bpf_map *map, u32 key) { struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map); struct bpf_cpu_map_entry *rcpu; @@ -600,6 +600,11 @@ static int cpu_map_get_next_key(struct bpf_map *map, void *key, void *next_key) return 0; } +static int cpu_map_xdp_redirect_map(struct bpf_map *map, u32 ifindex, u64 flags) +{ + return __bpf_xdp_redirect_map(map, ifindex, flags, __cpu_map_lookup_elem); +} + static int cpu_map_btf_id; const struct bpf_map_ops cpu_map_ops = { .map_meta_equal = bpf_map_meta_equal, @@ 
-612,6 +617,7 @@ const struct bpf_map_ops cpu_map_ops = {
 	.map_check_btf = map_check_no_btf,
 	.map_btf_name = "bpf_cpu_map",
 	.map_btf_id = &cpu_map_btf_id,
+	.xdp_redirect_map = cpu_map_xdp_redirect_map,
 };
 
 static void bq_flush_to_queue(struct xdp_bulk_queue *bq)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 85d9d1b72a33..adf9a2517f80 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -258,7 +258,7 @@ static int dev_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 	return 0;
 }
 
-struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key)
+static void *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	struct hlist_head *head = dev_map_index_hash(dtab, key);
@@ -392,7 +392,7 @@ void __dev_flush(void)
  * update happens in parallel here a dev_put wont happen until after reading the
  * ifindex.
  */
-struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
+static void *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	struct bpf_dtab_netdev *obj;
@@ -735,6 +735,16 @@ static int dev_map_hash_update_elem(struct bpf_map *map, void *key, void *value,
 					 map, key, value, map_flags);
 }
 
+static int dev_map_xdp_redirect_map(struct bpf_map *map, u32 ifindex, u64 flags)
+{
+	return __bpf_xdp_redirect_map(map, ifindex, flags, __dev_map_lookup_elem);
+}
+
+static int dev_hash_map_xdp_redirect_map(struct bpf_map *map, u32 ifindex, u64 flags)
+{
+	return __bpf_xdp_redirect_map(map, ifindex, flags, __dev_map_hash_lookup_elem);
+}
+
 static int dev_map_btf_id;
 const struct bpf_map_ops dev_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
@@ -747,6 +757,7 @@ const struct bpf_map_ops dev_map_ops = {
 	.map_check_btf = map_check_no_btf,
 	.map_btf_name = "bpf_dtab",
 	.map_btf_id = &dev_map_btf_id,
+	.xdp_redirect_map = dev_map_xdp_redirect_map,
 };
 
 static int dev_map_hash_map_btf_id;
@@ -761,6 +772,7 @@ const struct bpf_map_ops dev_map_hash_ops = {
 	.map_check_btf = map_check_no_btf,
 	.map_btf_name = "bpf_dtab",
 	.map_btf_id = &dev_map_hash_map_btf_id,
+	.xdp_redirect_map = dev_hash_map_xdp_redirect_map,
 };
 
 static void dev_map_hash_remove_netdev(struct bpf_dtab *dtab,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1dda9d81f12c..96705a49225e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5409,7 +5409,8 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 	    func_id != BPF_FUNC_map_delete_elem &&
 	    func_id != BPF_FUNC_map_push_elem &&
 	    func_id != BPF_FUNC_map_pop_elem &&
-	    func_id != BPF_FUNC_map_peek_elem)
+	    func_id != BPF_FUNC_map_peek_elem &&
+	    func_id != BPF_FUNC_redirect_map)
 		return 0;
 
 	if (map == NULL) {
@@ -11762,7 +11763,8 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 		     insn->imm == BPF_FUNC_map_delete_elem ||
 		     insn->imm == BPF_FUNC_map_push_elem ||
 		     insn->imm == BPF_FUNC_map_pop_elem ||
-		     insn->imm == BPF_FUNC_map_peek_elem)) {
+		     insn->imm == BPF_FUNC_map_peek_elem ||
+		     insn->imm == BPF_FUNC_redirect_map)) {
 			aux = &env->insn_aux_data[i + delta];
 			if (bpf_map_ptr_poisoned(aux))
 				goto patch_call_imm;
@@ -11804,6 +11806,8 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 				     (int (*)(struct bpf_map *map, void *value))NULL));
 			BUILD_BUG_ON(!__same_type(ops->map_peek_elem,
 				     (int (*)(struct bpf_map *map, void *value))NULL));
+			BUILD_BUG_ON(!__same_type(ops->xdp_redirect_map,
+				     (int (*)(struct bpf_map *map, u32 ifindex, u64 flags))NULL));
 patch_map_ops_generic:
 			switch (insn->imm) {
 			case BPF_FUNC_map_lookup_elem:
@@ -11830,6 +11834,9 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 				insn->imm = BPF_CAST_CALL(ops->map_peek_elem) -
 					    __bpf_call_base;
 				continue;
+			case BPF_FUNC_redirect_map:
+				insn->imm = BPF_CAST_CALL(ops->xdp_redirect_map) - __bpf_call_base;
+				continue;
 			}
 
 			goto patch_call_imm;
diff --git a/net/core/filter.c b/net/core/filter.c
index adfdad234674..fdf7401f43fd 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3944,22 +3944,6 @@ void xdp_do_flush(void)
 }
 EXPORT_SYMBOL_GPL(xdp_do_flush);
 
-static inline void *__xdp_map_lookup_elem(struct bpf_map *map, u32 index)
-{
-	switch (map->map_type) {
-	case BPF_MAP_TYPE_DEVMAP:
-		return __dev_map_lookup_elem(map, index);
-	case BPF_MAP_TYPE_DEVMAP_HASH:
-		return __dev_map_hash_lookup_elem(map, index);
-	case BPF_MAP_TYPE_CPUMAP:
-		return __cpu_map_lookup_elem(map, index);
-	case BPF_MAP_TYPE_XSKMAP:
-		return __xsk_map_lookup_elem(map, index);
-	default:
-		return NULL;
-	}
-}
-
 void bpf_clear_redirect_map(struct bpf_map *map)
 {
 	struct bpf_redirect_info *ri;
@@ -4113,28 +4097,7 @@ static const struct bpf_func_proto bpf_xdp_redirect_proto = {
 BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex,
 	   u64, flags)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
-
-	/* Lower bits of the flags are used as return code on lookup failure */
-	if (unlikely(flags > XDP_TX))
-		return XDP_ABORTED;
-
-	ri->tgt_value = __xdp_map_lookup_elem(map, ifindex);
-	if (unlikely(!ri->tgt_value)) {
-		/* If the lookup fails we want to clear out the state in the
-		 * redirect_info struct completely, so that if an eBPF program
-		 * performs multiple lookups, the last one always takes
-		 * precedence.
-		 */
-		WRITE_ONCE(ri->map, NULL);
-		return flags;
-	}
-
-	ri->flags = flags;
-	ri->tgt_index = ifindex;
-	WRITE_ONCE(ri->map, map);
-
-	return XDP_REDIRECT;
+	return map->ops->xdp_redirect_map(map, ifindex, flags);
 }
 
 static const struct bpf_func_proto bpf_xdp_redirect_map_proto = {
diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
index 113fd9017203..92f4023d3ae2 100644
--- a/net/xdp/xskmap.c
+++ b/net/xdp/xskmap.c
@@ -125,6 +125,18 @@ static int xsk_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
 	return insn - insn_buf;
 }
 
+static void *__xsk_map_lookup_elem(struct bpf_map *map, u32 key)
+{
+	struct xsk_map *m = container_of(map, struct xsk_map, map);
+	struct xdp_sock *xs;
+
+	if (key >= map->max_entries)
+		return NULL;
+
+	xs = READ_ONCE(m->xsk_map[key]);
+	return xs;
+}
+
 static void *xsk_map_lookup_elem(struct bpf_map *map, void *key)
 {
 	WARN_ON_ONCE(!rcu_read_lock_held());
@@ -215,6 +227,11 @@ static int xsk_map_delete_elem(struct bpf_map *map, void *key)
 	return 0;
 }
 
+static int xsk_map_xdp_redirect_map(struct bpf_map *map, u32 ifindex, u64 flags)
+{
+	return __bpf_xdp_redirect_map(map, ifindex, flags, __xsk_map_lookup_elem);
+}
+
 void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
 			     struct xdp_sock **map_entry)
 {
@@ -247,4 +264,5 @@ const struct bpf_map_ops xsk_map_ops = {
 	.map_check_btf = map_check_no_btf,
 	.map_btf_name = "xsk_map",
 	.map_btf_id = &xsk_map_btf_id,
+	.xdp_redirect_map = xsk_map_xdp_redirect_map,
 };
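
For readers following the thread, the dispatch pattern this patch moves to can be sketched in plain userspace C. This is only an illustration, not the kernel code: `struct map`, `struct map_ops`, `redirect_common()`, `array_lookup()` and the global `ri` are hypothetical stand-ins, and the body of `redirect_common()` is assumed to mirror the logic removed from `bpf_xdp_redirect_map()` in net/core/filter.c above, since the real `__bpf_xdp_redirect_map()` helper is defined elsewhere in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copies of the XDP action codes (same order as the uapi enum). */
enum { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

struct map;

/* Hypothetical ops table: each map type supplies its own redirect hook. */
struct map_ops {
	int (*redirect)(struct map *map, uint32_t index, uint64_t flags);
};

struct map {
	const struct map_ops *ops;
	uint32_t max_entries;
	void **slots;
};

/* Hypothetical stand-in for the per-CPU bpf_redirect_info state. */
struct redirect_info {
	void *tgt_value;
	uint32_t tgt_index;
	uint64_t flags;
	struct map *map;
};
static struct redirect_info ri;

/*
 * Shared body, parameterized by the lookup function. When the callback is
 * a compile-time constant in the caller's file, the compiler can inline it,
 * which is the "one less function call" effect the commit message mentions.
 */
static inline int redirect_common(struct map *map, uint32_t index, uint64_t flags,
				  void *(*lookup)(struct map *map, uint32_t key))
{
	/* Lower bits of flags double as the return code on lookup failure. */
	if (flags > XDP_TX)
		return XDP_ABORTED;

	ri.tgt_value = lookup(map, index);
	if (!ri.tgt_value) {
		ri.map = NULL;	/* clear state so the last lookup wins */
		return (int)flags;
	}

	ri.flags = flags;
	ri.tgt_index = index;
	ri.map = map;
	return XDP_REDIRECT;
}

/* A map-local lookup, analogous in shape to __xsk_map_lookup_elem(). */
static void *array_lookup(struct map *map, uint32_t key)
{
	if (key >= map->max_entries)
		return NULL;
	return map->slots[key];
}

static int array_redirect(struct map *map, uint32_t index, uint64_t flags)
{
	return redirect_common(map, index, flags, array_lookup);
}

static const struct map_ops array_ops = { .redirect = array_redirect };

/* Helper entry point: a single indirect call replaces the old switch. */
static int redirect_map(struct map *map, uint32_t index, uint64_t flags)
{
	return map->ops->redirect(map, index, flags);
}
```

In the kernel the verifier goes one step further than this sketch: `fixup_bpf_calls()` rewrites the call instruction to target `ops->xdp_redirect_map` directly, so the indirect call through `map->ops` in the helper body is presumably only a fallback.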