Message ID | f850bb7e20950736d9175c61d7e0691098e06182.1660592020.git.dxu@dxuuu.xyz (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | Support direct writes to nf_conn:mark | expand |
Daniel Xu <dxu@dxuuu.xyz> writes: > Support direct writes to nf_conn:mark from TC and XDP prog types. This > is useful when applications want to store per-connection metadata. This > is also particularly useful for applications that run both bpf and > iptables/nftables because the latter can trivially access this metadata. > > One example use case would be if a bpf prog is responsible for advanced > packet classification and iptables/nftables is later used for routing > due to pre-existing/legacy code. > > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Didn't we agree the last time around that all field access should be using helper kfuncs instead of allowing direct writes to struct nf_conn? -Toke
Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > Support direct writes to nf_conn:mark from TC and XDP prog types. This > > is useful when applications want to store per-connection metadata. This > > is also particularly useful for applications that run both bpf and > > iptables/nftables because the latter can trivially access this metadata. > > > > One example use case would be if a bpf prog is responsible for advanced > > packet classification and iptables/nftables is later used for routing > > due to pre-existing/legacy code. > > > > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> > > Didn't we agree the last time around that all field access should be > using helper kfuncs instead of allowing direct writes to struct nf_conn? I don't see why ct->mark needs special handling. It might be possible we need to change accesses on nf/tc side to use READ/WRITE_ONCE though.
Hi Toke, On Mon, Aug 15, 2022, at 4:25 PM, Toke Høiland-Jørgensen wrote: > Daniel Xu <dxu@dxuuu.xyz> writes: > >> Support direct writes to nf_conn:mark from TC and XDP prog types. This >> is useful when applications want to store per-connection metadata. This >> is also particularly useful for applications that run both bpf and >> iptables/nftables because the latter can trivially access this metadata. >> >> One example use case would be if a bpf prog is responsible for advanced >> packet classification and iptables/nftables is later used for routing >> due to pre-existing/legacy code. >> >> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> > > Didn't we agree the last time around that all field access should be > using helper kfuncs instead of allowing direct writes to struct nf_conn? Sorry, I was not aware of those discussions. Do you have a link handy? I received the suggestion to enable direct writes here: https://lore.kernel.org/bpf/CAP01T74aWUW-iyPCV_VfASO6YqfAZmnkYQMN2B4L8ngMMgnAcw@mail.gmail.com/ . Thanks, Daniel
On Mon, Aug 15, 2022 at 3:40 PM Florian Westphal <fw@strlen.de> wrote: > > Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > Support direct writes to nf_conn:mark from TC and XDP prog types. This > > > is useful when applications want to store per-connection metadata. This > > > is also particularly useful for applications that run both bpf and > > > iptables/nftables because the latter can trivially access this metadata. > > > > > > One example use case would be if a bpf prog is responsible for advanced > > > packet classification and iptables/nftables is later used for routing > > > due to pre-existing/legacy code. > > > > > > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> > > > > Didn't we agree the last time around that all field access should be > > using helper kfuncs instead of allowing direct writes to struct nf_conn? > > I don't see why ct->mark needs special handling. > > It might be possible we need to change accesses on nf/tc side to use > READ/WRITE_ONCE though. +1 I don't think we need to have a hard rule. If fields is safe to access directly than it's faster to let bpf prog read/write it. There are no backward compat concerns. If conntrack side decides to make that field special we can disallow direct writes in the same kernel version. These accesses, just like kfuncs, are unstable.
Hi Daniel, Thank you for the patch! Yet something to improve: [auto build test ERROR on bpf-next/master] url: https://github.com/intel-lab-lkp/linux/commits/Daniel-Xu/Support-direct-writes-to-nf_conn-mark/20220816-060429 base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master config: um-x86_64_defconfig (https://download.01.org/0day-ci/archive/20220816/202208160931.5FG8tZ8G-lkp@intel.com/config) compiler: gcc-11 (Debian 11.3.0-5) 11.3.0 reproduce (this is a W=1 build): # https://github.com/intel-lab-lkp/linux/commit/c7b21d163eb9c61514dd86baf4281deb4d4387bb git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Daniel-Xu/Support-direct-writes-to-nf_conn-mark/20220816-060429 git checkout c7b21d163eb9c61514dd86baf4281deb4d4387bb # save the config file mkdir build_dir && cp config build_dir/.config make W=1 O=build_dir ARCH=um SUBARCH=x86_64 SHELL=/bin/bash If you fix the issue, kindly add following tag where applicable Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): net/core/filter.c: In function 'tc_cls_act_btf_struct_access': >> net/core/filter.c:8723:24: error: implicit declaration of function 'btf_struct_access' [-Werror=implicit-function-declaration] 8723 | return btf_struct_access(log, btf, t, off, size, atype, next_btf_id, | ^~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors vim +/btf_struct_access +8723 net/core/filter.c 8714 8715 static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log, 8716 const struct btf *btf, 8717 const struct btf_type *t, int off, 8718 int size, enum bpf_access_type atype, 8719 u32 *next_btf_id, 8720 enum bpf_type_flag *flag) 8721 { 8722 if (atype == BPF_READ) > 8723 return btf_struct_access(log, btf, t, off, size, atype, next_btf_id, 8724 flag); 8725 8726 return nf_conntrack_btf_struct_access(log, btf, t, off, size, atype, 8727 next_btf_id, flag); 8728 } 8729
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Mon, Aug 15, 2022 at 3:40 PM Florian Westphal <fw@strlen.de> wrote: >> >> Toke Høiland-Jørgensen <toke@kernel.org> wrote: >> > > Support direct writes to nf_conn:mark from TC and XDP prog types. This >> > > is useful when applications want to store per-connection metadata. This >> > > is also particularly useful for applications that run both bpf and >> > > iptables/nftables because the latter can trivially access this metadata. >> > > >> > > One example use case would be if a bpf prog is responsible for advanced >> > > packet classification and iptables/nftables is later used for routing >> > > due to pre-existing/legacy code. >> > > >> > > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> >> > >> > Didn't we agree the last time around that all field access should be >> > using helper kfuncs instead of allowing direct writes to struct nf_conn? >> >> I don't see why ct->mark needs special handling. >> >> It might be possible we need to change accesses on nf/tc side to use >> READ/WRITE_ONCE though. > > +1 > I don't think we need to have a hard rule. > If fields is safe to access directly than it's faster > to let bpf prog read/write it. > There are no backward compat concerns. If conntrack side decides > to make that field special we can disallow direct writes in > the same kernel version. Right, I was under the impression we wanted all fields to be wrapper by helpers so that the struct owner could change their semantics without affecting users (and solve the performance issue by figuring out a generic way to inline those helpers). I guess there could also be an API consistency argument for doing this. However, I don't have a strong opinion on this, so if y'all prefer keeping these as direct field writes, that's OK with me. > These accesses, just like kfuncs, are unstable. Well, it will be interesting to see how that plays out the first time an application relying on one of these breaks on a kernel upgrade :) -Toke
Hi Florian, On Mon, Aug 15, 2022, at 4:40 PM, Florian Westphal wrote: > Toke Høiland-Jørgensen <toke@kernel.org> wrote: >> > Support direct writes to nf_conn:mark from TC and XDP prog types. This >> > is useful when applications want to store per-connection metadata. This >> > is also particularly useful for applications that run both bpf and >> > iptables/nftables because the latter can trivially access this metadata. >> > >> > One example use case would be if a bpf prog is responsible for advanced >> > packet classification and iptables/nftables is later used for routing >> > due to pre-existing/legacy code. >> > >> > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> >> >> Didn't we agree the last time around that all field access should be >> using helper kfuncs instead of allowing direct writes to struct nf_conn? > > I don't see why ct->mark needs special handling. > > It might be possible we need to change accesses on nf/tc side to use > READ/WRITE_ONCE though. I reviewed some of the LKMM literature and I would concur that READ/WRITE_ONCE() is necessary. Especially after this patchset. However, it's unclear to me if this is a latent issue. IOW: is reading ct->mark protected by a lock? I only briefly looked but it doesn't seem like it. I'll do some more digging. In the meantime, I'll send out a v2 on this patchset and I'll plan on sending out a followup patchset for adding READ/WRITE_ONCE() to ct->mark accesses. Thanks, Daniel
Daniel Xu <dxu@dxuuu.xyz> wrote: > On Mon, Aug 15, 2022, at 4:40 PM, Florian Westphal wrote: > > Toke Høiland-Jørgensen <toke@kernel.org> wrote: > >> > Support direct writes to nf_conn:mark from TC and XDP prog types. This > >> > is useful when applications want to store per-connection metadata. This > >> > is also particularly useful for applications that run both bpf and > >> > iptables/nftables because the latter can trivially access this metadata. > >> > > >> > One example use case would be if a bpf prog is responsible for advanced > >> > packet classification and iptables/nftables is later used for routing > >> > due to pre-existing/legacy code. > >> > > >> > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> > >> > >> Didn't we agree the last time around that all field access should be > >> using helper kfuncs instead of allowing direct writes to struct nf_conn? > > > > I don't see why ct->mark needs special handling. > > > > It might be possible we need to change accesses on nf/tc side to use > > READ/WRITE_ONCE though. > > I reviewed some of the LKMM literature and I would concur that > READ/WRITE_ONCE() is necessary. Especially after this patchset. > > However, it's unclear to me if this is a latent issue. IOW: is reading > ct->mark protected by a lock? I only briefly looked but it doesn't > seem like it. No, its not protected by a lock. READ/WRITE_ONCE is unrelated to your patchset, this is a pre-existing "bug".
On Wed, Aug 17, 2022, at 12:34 PM, Florian Westphal wrote: > Daniel Xu <dxu@dxuuu.xyz> wrote: >> On Mon, Aug 15, 2022, at 4:40 PM, Florian Westphal wrote: >> > Toke Høiland-Jørgensen <toke@kernel.org> wrote: >> >> > Support direct writes to nf_conn:mark from TC and XDP prog types. This >> >> > is useful when applications want to store per-connection metadata. This >> >> > is also particularly useful for applications that run both bpf and >> >> > iptables/nftables because the latter can trivially access this metadata. >> >> > >> >> > One example use case would be if a bpf prog is responsible for advanced >> >> > packet classification and iptables/nftables is later used for routing >> >> > due to pre-existing/legacy code. >> >> > >> >> > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> >> >> >> >> Didn't we agree the last time around that all field access should be >> >> using helper kfuncs instead of allowing direct writes to struct nf_conn? >> > >> > I don't see why ct->mark needs special handling. >> > >> > It might be possible we need to change accesses on nf/tc side to use >> > READ/WRITE_ONCE though. >> >> I reviewed some of the LKMM literature and I would concur that >> READ/WRITE_ONCE() is necessary. Especially after this patchset. >> >> However, it's unclear to me if this is a latent issue. IOW: is reading >> ct->mark protected by a lock? I only briefly looked but it doesn't >> seem like it. > > No, its not protected by a lock. READ/WRITE_ONCE is unrelated to your > patchset, this is a pre-existing "bug". Thanks for confirming. Since it's pre-existing I will send out a followup patchset then. Thanks, Daniel
Hi Daniel, Thank you for the patch! Yet something to improve: [auto build test ERROR on bpf-next/master] url: https://github.com/intel-lab-lkp/linux/commits/Daniel-Xu/Support-direct-writes-to-nf_conn-mark/20220816-060429 base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master config: arm-versatile_defconfig (https://download.01.org/0day-ci/archive/20220819/202208190318.HygywK17-lkp@intel.com/config) compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project aed5e3bea138ce581d682158eb61c27b3cfdd6ec) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install arm cross compiling tool for clang build # apt-get install binutils-arm-linux-gnueabi # https://github.com/intel-lab-lkp/linux/commit/c7b21d163eb9c61514dd86baf4281deb4d4387bb git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Daniel-Xu/Support-direct-writes-to-nf_conn-mark/20220816-060429 git checkout c7b21d163eb9c61514dd86baf4281deb4d4387bb # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash If you fix the issue, kindly add following tag where applicable Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): >> net/core/filter.c:8723:10: error: call to undeclared function 'btf_struct_access'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] return btf_struct_access(log, btf, t, off, size, atype, next_btf_id, ^ net/core/filter.c:8797:10: error: call to undeclared function 'btf_struct_access'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] return btf_struct_access(log, btf, t, off, size, atype, next_btf_id, ^ 2 errors generated. vim +/btf_struct_access +8723 net/core/filter.c 8714 8715 static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log, 8716 const struct btf *btf, 8717 const struct btf_type *t, int off, 8718 int size, enum bpf_access_type atype, 8719 u32 *next_btf_id, 8720 enum bpf_type_flag *flag) 8721 { 8722 if (atype == BPF_READ) > 8723 return btf_struct_access(log, btf, t, off, size, atype, next_btf_id, 8724 flag); 8725 8726 return nf_conntrack_btf_struct_access(log, btf, t, off, size, atype, 8727 next_btf_id, flag); 8728 } 8729
diff --git a/include/net/netfilter/nf_conntrack_bpf.h b/include/net/netfilter/nf_conntrack_bpf.h index a473b56842c5..0f584c2bd475 100644 --- a/include/net/netfilter/nf_conntrack_bpf.h +++ b/include/net/netfilter/nf_conntrack_bpf.h @@ -3,6 +3,7 @@ #ifndef _NF_CONNTRACK_BPF_H #define _NF_CONNTRACK_BPF_H +#include <linux/bpf.h> #include <linux/btf.h> #include <linux/kconfig.h> @@ -10,6 +11,12 @@ (IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES)) extern int register_nf_conntrack_bpf(void); +extern int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log, + const struct btf *btf, + const struct btf_type *t, int off, + int size, enum bpf_access_type atype, + u32 *next_btf_id, + enum bpf_type_flag *flag); #else @@ -18,6 +25,17 @@ static inline int register_nf_conntrack_bpf(void) return 0; } +static inline int +nf_conntrack_btf_struct_access(struct bpf_verifier_log *log, + const struct btf *btf, + const struct btf_type *t, int off, + int size, enum bpf_access_type atype, + u32 *next_btf_id, + enum bpf_type_flag *flag) +{ + return -EACCES; +} + #endif #endif /* _NF_CONNTRACK_BPF_H */ diff --git a/net/core/filter.c b/net/core/filter.c index 5669248aff25..d7b768fe9de7 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -18,6 +18,7 @@ */ #include <linux/atomic.h> +#include <linux/bpf_verifier.h> #include <linux/module.h> #include <linux/types.h> #include <linux/mm.h> @@ -55,6 +56,7 @@ #include <net/sock_reuseport.h> #include <net/busy_poll.h> #include <net/tcp.h> +#include <net/netfilter/nf_conntrack_bpf.h> #include <net/xfrm.h> #include <net/udp.h> #include <linux/bpf_trace.h> @@ -8710,6 +8712,21 @@ static bool tc_cls_act_is_valid_access(int off, int size, return bpf_skb_is_valid_access(off, size, type, prog, info); } +static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log, + const struct btf *btf, + const struct btf_type *t, int off, + int size, enum bpf_access_type atype, + u32 *next_btf_id, + enum bpf_type_flag *flag) +{ + if (atype == BPF_READ) + return btf_struct_access(log, btf, t, off, size, atype, next_btf_id, + flag); + + return nf_conntrack_btf_struct_access(log, btf, t, off, size, atype, + next_btf_id, flag); +} + static bool __is_valid_xdp_access(int off, int size) { if (off < 0 || off >= sizeof(struct xdp_md)) @@ -8769,6 +8786,21 @@ void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog, } EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action); +static int xdp_btf_struct_access(struct bpf_verifier_log *log, + const struct btf *btf, + const struct btf_type *t, int off, + int size, enum bpf_access_type atype, + u32 *next_btf_id, + enum bpf_type_flag *flag) +{ + if (atype == BPF_READ) + return btf_struct_access(log, btf, t, off, size, atype, next_btf_id, + flag); + + return nf_conntrack_btf_struct_access(log, btf, t, off, size, atype, + next_btf_id, flag); +} + static bool sock_addr_is_valid_access(int off, int size, enum bpf_access_type type, const struct bpf_prog *prog, @@ -10663,6 +10695,7 @@ const struct bpf_verifier_ops tc_cls_act_verifier_ops = { .convert_ctx_access = tc_cls_act_convert_ctx_access, .gen_prologue = tc_cls_act_prologue, .gen_ld_abs = bpf_gen_ld_abs, + .btf_struct_access = tc_cls_act_btf_struct_access, }; const struct bpf_prog_ops tc_cls_act_prog_ops = { @@ -10674,6 +10707,7 @@ const struct bpf_verifier_ops xdp_verifier_ops = { .is_valid_access = xdp_is_valid_access, .convert_ctx_access = xdp_convert_ctx_access, .gen_prologue = bpf_noop_prologue, + .btf_struct_access = xdp_btf_struct_access, }; const struct bpf_prog_ops xdp_prog_ops = { diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c index 1cd87b28c9b0..8010cc542d17 100644 --- a/net/netfilter/nf_conntrack_bpf.c +++ b/net/netfilter/nf_conntrack_bpf.c @@ -6,6 +6,7 @@ * are exposed through to BPF programs is explicitly unstable. */ +#include <linux/bpf_verifier.h> #include <linux/bpf.h> #include <linux/btf.h> #include <linux/types.h> @@ -15,6 +16,8 @@ #include <net/netfilter/nf_conntrack_bpf.h> #include <net/netfilter/nf_conntrack_core.h> +static const struct btf_type *nf_conn_type; + /* bpf_ct_opts - Options for CT lookup helpers * * Members: @@ -184,6 +187,53 @@ static struct nf_conn *__bpf_nf_ct_lookup(struct net *net, return ct; } +/* Check writes into `struct nf_conn` */ +int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log, + const struct btf *btf, + const struct btf_type *t, int off, + int size, enum bpf_access_type atype, + u32 *next_btf_id, + enum bpf_type_flag *flag) +{ + const struct btf_type *nct = READ_ONCE(nf_conn_type); + s32 type_id; + size_t end; + + if (!nct) { + type_id = btf_find_by_name_kind(btf, "nf_conn", BTF_KIND_STRUCT); + if (type_id < 0) + return -EINVAL; + + nct = btf_type_by_id(btf, type_id); + WRITE_ONCE(nf_conn_type, nct); + } + + if (t != nct) { + bpf_log(log, "only read is supported\n"); + return -EACCES; + } + + switch (off) { +#if defined(CONFIG_NF_CONNTRACK_MARK) + case offsetof(struct nf_conn, mark): + end = offsetofend(struct nf_conn, mark); + break; +#endif + default: + bpf_log(log, "no write support to nf_conn at off %d\n", off); + return -EACCES; + } + + if (off + size > end) { + bpf_log(log, + "write access at off %d with size %d beyond the member of nf_conn ended at %zu\n", + off, size, end); + return -EACCES; + } + + return NOT_INIT; +} + __diag_push(); __diag_ignore_all("-Wmissing-prototypes", "Global functions as their definitions will be in nf_conntrack BTF");
Support direct writes to nf_conn:mark from TC and XDP prog types. This is useful when applications want to store per-connection metadata. This is also particularly useful for applications that run both bpf and iptables/nftables because the latter can trivially access this metadata. One example use case would be if a bpf prog is responsible for advanced packet classification and iptables/nftables is later used for routing due to pre-existing/legacy code. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> --- include/net/netfilter/nf_conntrack_bpf.h | 18 +++++++++ net/core/filter.c | 34 ++++++++++++++++ net/netfilter/nf_conntrack_bpf.c | 50 ++++++++++++++++++++++++ 3 files changed, 102 insertions(+)