Message ID | 1625471347-21730-1-git-send-email-paulb@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] skbuff: Release nfct refcount on napi stolen or re-used skbs | expand |
Hi! On 5.7.2021 10.49, Paul Blakey wrote: > When multiple SKBs are merged to a new skb under napi GRO, > or SKB is re-used by napi, if nfct was set for them in the > driver, it will not be released while freeing their stolen > head state or on re-use. > > Release nfct on napi's stolen or re-used SKBs, and > in gro_list_prepare, check conntrack metadata diff. > > Fixes: 5c6b94604744 ("net/mlx5e: CT: Handle misses after executing CT action") > Reviewed-by: Roi Dayan <roid@nvidia.com> > Signed-off-by: Paul Blakey <paulb@nvidia.com> > --- > Changelog: > v1->v2: > Check for different flows based on CT and chain metadata in gro_list_prepare > > net/core/dev.c | 13 +++++++++++++ > net/core/skbuff.c | 1 + > 2 files changed, 14 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 439faadab0c2..bf62cb2ec6da 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -5981,6 +5981,18 @@ static void gro_list_prepare(const struct list_head *head, > diffs = memcmp(skb_mac_header(p), > skb_mac_header(skb), > maclen); > + > + diffs |= skb_get_nfct(p) ^ skb_get_nfct(skb); > + > + if (!diffs) { > + struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT); > + struct tc_skb_ext *p_ext = skb_ext_find(p, TC_SKB_EXT); > + > + diffs |= (!!p_ext) ^ (!!skb_ext); > + if (!diffs && unlikely(skb_ext)) > + diffs |= p_ext->chain ^ skb_ext->chain; > + } > + > NAPI_GRO_CB(p)->same_flow = !diffs; > } > } > @@ -6243,6 +6255,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb) > skb_shinfo(skb)->gso_type = 0; > skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)); > skb_ext_reset(skb); > + nf_reset_ct(skb); > > napi->skb = skb; > } > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index bbc3b4b62032..239eb2fba255 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -939,6 +939,7 @@ void __kfree_skb_defer(struct sk_buff *skb) > > void napi_skb_free_stolen_head(struct sk_buff *skb) > { > + nf_conntrack_put(skb_nfct(skb)); Should this be nf_reset_ct() to clear the skb->_nfct and not leave a uncounted reference? > skb_dst_drop(skb); > skb_ext_put(skb); > napi_skb_cache_put(skb);
On Mon, 2021-07-05 at 10:49 +0300, Paul Blakey wrote: > When multiple SKBs are merged to a new skb under napi GRO, > or SKB is re-used by napi, if nfct was set for them in the > driver, it will not be released while freeing their stolen > head state or on re-use. > > Release nfct on napi's stolen or re-used SKBs, and > in gro_list_prepare, check conntrack metadata diff. > > Fixes: 5c6b94604744 ("net/mlx5e: CT: Handle misses after executing CT action") > Reviewed-by: Roi Dayan <roid@nvidia.com> > Signed-off-by: Paul Blakey <paulb@nvidia.com> > --- > Changelog: > v1->v2: > Check for different flows based on CT and chain metadata in gro_list_prepare > > net/core/dev.c | 13 +++++++++++++ > net/core/skbuff.c | 1 + > 2 files changed, 14 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 439faadab0c2..bf62cb2ec6da 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -5981,6 +5981,18 @@ static void gro_list_prepare(const struct list_head *head, > diffs = memcmp(skb_mac_header(p), > skb_mac_header(skb), > maclen); > + > + diffs |= skb_get_nfct(p) ^ skb_get_nfct(skb); > + > + if (!diffs) { > + struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT); > + struct tc_skb_ext *p_ext = skb_ext_find(p, TC_SKB_EXT); > + > + diffs |= (!!p_ext) ^ (!!skb_ext); > + if (!diffs && unlikely(skb_ext)) > + diffs |= p_ext->chain ^ skb_ext->chain; > + } I'm wondering... if 2 skbs are merged, and have the same L2/L3/L4 headers - except len and csum - can they have different dst/TC_EXT? @Eric: I'm sorry for the very dumb and late question. You reported v1 of this patch would make "GRO slow as hell", could you please elaborate a bit more? I thought most skbs (with no ct attached) would see little difference??? Cheers, Paolo
Hi Paul, Thank you for the patch! Yet something to improve: [auto build test ERROR on net/master] url: https://github.com/0day-ci/linux/commits/Paul-Blakey/skbuff-Release-nfct-refcount-on-napi-stolen-or-re-used-skbs/20210705-155140 base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 6ff63a150b5556012589ae59efac1b5eeb7d32c3 config: powerpc-buildonly-randconfig-r003-20210705 (attached as .config) compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 3f9bf9f42a9043e20c6d2a74dd4f47a90a7e2b41) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install powerpc cross compiling tool for clang build # apt-get install binutils-powerpc-linux-gnu # https://github.com/0day-ci/linux/commit/92ce7d888d93e976782be040ca8bff871e7153cf git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Paul-Blakey/skbuff-Release-nfct-refcount-on-napi-stolen-or-re-used-skbs/20210705-155140 git checkout 92ce7d888d93e976782be040ca8bff871e7153cf # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): __do_insb ^ arch/powerpc/include/asm/io.h:556:56: note: expanded from macro '__do_insb' #define __do_insb(p, b, n) readsb((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/core/skbuff.c:41: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:11: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:236:1: note: expanded from here __do_insw ^ arch/powerpc/include/asm/io.h:557:56: note: expanded from macro '__do_insw' #define __do_insw(p, b, n) readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/core/skbuff.c:41: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:11: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:238:1: note: expanded from here __do_insl ^ arch/powerpc/include/asm/io.h:558:56: note: expanded from macro '__do_insl' #define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/core/skbuff.c:41: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:11: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:3:1: note: expanded from here __do_outsb ^ arch/powerpc/include/asm/io.h:559:58: note: expanded from macro '__do_outsb' #define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/core/skbuff.c:41: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:11: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:5:1: note: expanded from here __do_outsw ^ arch/powerpc/include/asm/io.h:560:58: note: expanded from macro '__do_outsw' #define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/core/skbuff.c:41: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:11: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:7:1: note: expanded from here __do_outsl ^ arch/powerpc/include/asm/io.h:561:58: note: expanded from macro '__do_outsl' #define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ >> net/core/skbuff.c:946:2: error: implicit declaration of function 'nf_conntrack_put' [-Werror,-Wimplicit-function-declaration] nf_conntrack_put(skb_nfct(skb)); ^ 7 warnings and 1 error generated. -- __do_insb ^ arch/powerpc/include/asm/io.h:556:56: note: expanded from macro '__do_insb' #define __do_insb(p, b, n) readsb((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/core/dev.c:88: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:11: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:71:1: note: expanded from here __do_insw ^ arch/powerpc/include/asm/io.h:557:56: note: expanded from macro '__do_insw' #define __do_insw(p, b, n) readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/core/dev.c:88: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:11: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:73:1: note: expanded from here __do_insl ^ arch/powerpc/include/asm/io.h:558:56: note: expanded from macro '__do_insl' #define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/core/dev.c:88: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:11: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:75:1: note: expanded from here __do_outsb ^ arch/powerpc/include/asm/io.h:559:58: note: expanded from macro '__do_outsb' #define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/core/dev.c:88: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:11: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:77:1: note: expanded from here __do_outsw ^ arch/powerpc/include/asm/io.h:560:58: note: expanded from macro '__do_outsw' #define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/core/dev.c:88: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:11: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:79:1: note: expanded from here __do_outsl ^ arch/powerpc/include/asm/io.h:561:58: note: expanded from macro '__do_outsl' #define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ >> net/core/dev.c:6015:33: error: implicit declaration of function 'skb_ext_find' [-Werror,-Wimplicit-function-declaration] struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT); ^ >> net/core/dev.c:6015:51: error: use of undeclared identifier 'TC_SKB_EXT' struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT); ^ net/core/dev.c:6016:47: error: use of undeclared identifier 'TC_SKB_EXT' struct tc_skb_ext *p_ext = skb_ext_find(p, TC_SKB_EXT); ^ >> net/core/dev.c:6020:19: error: incomplete definition of type 'struct tc_skb_ext' diffs |= p_ext->chain ^ skb_ext->chain; ~~~~~^ net/core/dev.c:6015:11: note: forward declaration of 'struct tc_skb_ext' struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT); ^ net/core/dev.c:6020:36: error: incomplete definition of type 'struct tc_skb_ext' diffs |= p_ext->chain ^ skb_ext->chain; ~~~~~~~^ net/core/dev.c:6015:11: note: forward declaration of 'struct tc_skb_ext' struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT); ^ 7 warnings and 5 errors generated. vim +/nf_conntrack_put +946 net/core/skbuff.c 943 944 void napi_skb_free_stolen_head(struct sk_buff *skb) 945 { > 946 nf_conntrack_put(skb_nfct(skb)); 947 skb_dst_drop(skb); 948 skb_ext_put(skb); 949 napi_skb_cache_put(skb); 950 } 951 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Hi Paul, Thank you for the patch! Yet something to improve: [auto build test ERROR on net/master] url: https://github.com/0day-ci/linux/commits/Paul-Blakey/skbuff-Release-nfct-refcount-on-napi-stolen-or-re-used-skbs/20210705-155140 base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 6ff63a150b5556012589ae59efac1b5eeb7d32c3 config: um-x86_64_defconfig (attached as .config) compiler: gcc-9 (Debian 9.3.0-22) 9.3.0 reproduce (this is a W=1 build): # https://github.com/0day-ci/linux/commit/92ce7d888d93e976782be040ca8bff871e7153cf git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Paul-Blakey/skbuff-Release-nfct-refcount-on-napi-stolen-or-re-used-skbs/20210705-155140 git checkout 92ce7d888d93e976782be040ca8bff871e7153cf # save the attached .config to linux build tree make W=1 ARCH=um SUBARCH=x86_64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): net/core/skbuff.c: In function 'napi_skb_free_stolen_head': >> net/core/skbuff.c:946:2: error: implicit declaration of function 'nf_conntrack_put' [-Werror=implicit-function-declaration] 946 | nf_conntrack_put(skb_nfct(skb)); | ^~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors -- net/core/dev.c: In function 'gro_list_prepare': >> net/core/dev.c:6015:33: error: implicit declaration of function 'skb_ext_find'; did you mean 'skb_ext_copy'? [-Werror=implicit-function-declaration] 6015 | struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT); | ^~~~~~~~~~~~ | skb_ext_copy >> net/core/dev.c:6015:51: error: 'TC_SKB_EXT' undeclared (first use in this function) 6015 | struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT); | ^~~~~~~~~~ net/core/dev.c:6015:51: note: each undeclared identifier is reported only once for each function it appears in >> net/core/dev.c:6020:19: error: dereferencing pointer to incomplete type 'struct tc_skb_ext' 6020 | diffs |= p_ext->chain ^ skb_ext->chain; | ^~ cc1: some warnings being treated as errors vim +/nf_conntrack_put +946 net/core/skbuff.c 943 944 void napi_skb_free_stolen_head(struct sk_buff *skb) 945 { > 946 nf_conntrack_put(skb_nfct(skb)); 947 skb_dst_drop(skb); 948 skb_ext_put(skb); 949 napi_skb_cache_put(skb); 950 } 951 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Mon, 5 Jul 2021, Mika Penttilä wrote: > Hi! > > On 5.7.2021 10.49, Paul Blakey wrote: > > When multiple SKBs are merged to a new skb under napi GRO, > > or SKB is re-used by napi, if nfct was set for them in the > > driver, it will not be released while freeing their stolen > > head state or on re-use. > > > > Release nfct on napi's stolen or re-used SKBs, and > > in gro_list_prepare, check conntrack metadata diff. > > > > Fixes: 5c6b94604744 ("net/mlx5e: CT: Handle misses after executing CT > > action") > > Reviewed-by: Roi Dayan <roid@nvidia.com> > > Signed-off-by: Paul Blakey <paulb@nvidia.com> > > --- > > Changelog: > > v1->v2: > > Check for different flows based on CT and chain metadata in > > gro_list_prepare > > > > net/core/dev.c | 13 +++++++++++++ > > net/core/skbuff.c | 1 + > > 2 files changed, 14 insertions(+) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 439faadab0c2..bf62cb2ec6da 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -5981,6 +5981,18 @@ static void gro_list_prepare(const struct list_head > > *head, > > diffs = memcmp(skb_mac_header(p), > > skb_mac_header(skb), > > maclen); > > + > > + diffs |= skb_get_nfct(p) ^ skb_get_nfct(skb); > > + > > + if (!diffs) { > > + struct tc_skb_ext *skb_ext = skb_ext_find(skb, > > TC_SKB_EXT); > > + struct tc_skb_ext *p_ext = skb_ext_find(p, > > TC_SKB_EXT); > > + > > + diffs |= (!!p_ext) ^ (!!skb_ext); > > + if (!diffs && unlikely(skb_ext)) > > + diffs |= p_ext->chain ^ skb_ext->chain; > > + } > > + > > NAPI_GRO_CB(p)->same_flow = !diffs; > > } > > } > > @@ -6243,6 +6255,7 @@ static void napi_reuse_skb(struct napi_struct *napi, > > struct sk_buff *skb) > > skb_shinfo(skb)->gso_type = 0; > > skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)); > > skb_ext_reset(skb); > > + nf_reset_ct(skb); > > > > napi->skb = skb; > > } > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index bbc3b4b62032..239eb2fba255 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -939,6 +939,7 @@ void __kfree_skb_defer(struct sk_buff *skb) > > > > void napi_skb_free_stolen_head(struct sk_buff *skb) > > { > > + nf_conntrack_put(skb_nfct(skb)); > > Should this be nf_reset_ct() to clear the skb->_nfct and not leave a > uncounted reference? Yes it should, thanks! Sending revision. > > > skb_dst_drop(skb); > > skb_ext_put(skb); > > napi_skb_cache_put(skb); > >
On Mon, 5 Jul 2021, Paolo Abeni wrote: > On Mon, 2021-07-05 at 10:49 +0300, Paul Blakey wrote: > > When multiple SKBs are merged to a new skb under napi GRO, > > or SKB is re-used by napi, if nfct was set for them in the > > driver, it will not be released while freeing their stolen > > head state or on re-use. > > > > Release nfct on napi's stolen or re-used SKBs, and > > in gro_list_prepare, check conntrack metadata diff. > > > > Fixes: 5c6b94604744 ("net/mlx5e: CT: Handle misses after executing CT action") > > Reviewed-by: Roi Dayan <roid@nvidia.com> > > Signed-off-by: Paul Blakey <paulb@nvidia.com> > > --- > > Changelog: > > v1->v2: > > Check for different flows based on CT and chain metadata in gro_list_prepare > > > > net/core/dev.c | 13 +++++++++++++ > > net/core/skbuff.c | 1 + > > 2 files changed, 14 insertions(+) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 439faadab0c2..bf62cb2ec6da 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -5981,6 +5981,18 @@ static void gro_list_prepare(const struct list_head *head, > > diffs = memcmp(skb_mac_header(p), > > skb_mac_header(skb), > > maclen); > > + > > + diffs |= skb_get_nfct(p) ^ skb_get_nfct(skb); > > + > > + if (!diffs) { > > + struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT); > > + struct tc_skb_ext *p_ext = skb_ext_find(p, TC_SKB_EXT); > > + > > + diffs |= (!!p_ext) ^ (!!skb_ext); > > + if (!diffs && unlikely(skb_ext)) > > + diffs |= p_ext->chain ^ skb_ext->chain; > > + } > > I'm wondering... if 2 skbs are merged, and have the same L2/L3/L4 > headers - except len and csum - can they have different dst/TC_EXT? Yes and same tunnel header metadata... so even tunnels are the same. So probably not, I had trouble thinking of when it can happen as well. But user might have some weird tc policy where it will happen, especially with header rewrite. To test this, I ran two tcp streams that do some hops in hardware (tc goto chain), and for one stream of the two, I used tc pedit rules to rewrite an ip/mac so the two stream will be the same 5 tuple (macs, ips, ports) just on different chains and on different connection tracking zones. For the last hop where both streams where the same, I used tc flower skip_hw flag so it will miss to software and we get here with same same_flow = false. This might be too much for email (so I won't bother formatting it :)) but here is the setup I used: echo "add arp rules" tc_filter add dev $REP ingress protocol arp prio 888 flower skip_hw $tc_verbose \ action mirred egress mirror dev $VETH_REP1 pipe \ action mirred egress redirect dev $VETH_REP2 tc_filter add dev $VETH_REP2 ingress protocol arp prio 888 flower skip_hw $tc_verbose \ action mirred egress redirect dev $REP tc_filter add dev $VETH_REP1 ingress protocol arp prio 888 flower skip_hw $tc_verbose \ action mirred egress redirect dev $REP echo "add ct rules" flag="" #ORIG: #chain 0, REP->VETH[12], send to diff chains based on mac, #for zone 4 first do hw rewrite of fake ip so we have same tuple tc_filter add dev $REP ingress protocol ip chain 0 prio 1 flower ip_proto $proto dst_ip $ip_remote1 $tc_verbose $flag \ dst_mac $remote_mac ct_state -trk \ action ct zone 3 action goto chain 3 #chain 2, continuation of header rewrite, send to chain 4 & zone 4 tc_filter add dev $REP ingress protocol ip chain 2 prio 1 flower ip_proto $proto $tc_verbose \ action ct zone 4 action goto chain 4 #chain 3, REP->VETH, zone 3 tc_filter add dev $REP ingress protocol ip chain 3 prio 1 flower ip_proto $proto $tc_verbose \ ct_state +trk+new ct_zone 3 \ action ct zone 3 commit \ action mirred egress redirect dev $VETH_REP1 tc_filter add dev $REP ingress protocol ip chain 3 prio 1 flower ip_proto $proto $tc_verbose \ ct_state +trk+est ct_zone 3 \ action mirred egress redirect dev $VETH_REP1 #others... tc_filter add dev $REP ingress protocol ip chain 3 prio 5 flower ip_proto $proto skip_hw $tc_verbose \ ct_zone 3 \ action drop tc_filter add dev $REP ingress protocol ip chain 3 prio 6 flower ip_proto $proto skip_hw $tc_verbose \ action drop #chain 4, REP->VETH2, zone 4, +new/+est, rewrite back mac/ip and fwd tc_filter add dev $REP ingress protocol ip chain 4 prio 5 flower ip_proto $proto skip_hw $tc_verbose \ ct_zone 4 \ action drop tc_filter add dev $REP ingress protocol ip chain 4 prio 6 flower ip_proto $proto skip_hw $tc_verbose \ action drop #catch wrong packets tc_filter add dev $REP ingress protocol ip chain 4 prio 2 flower ip_proto $proto $tc_verbose skip_hw \ ct_zone 3 \ action drop tc_filter add dev $REP ingress protocol ip chain 3 prio 2 flower ip_proto $proto $tc_verbose skip_hw \ ct_zone 4 \ action drop #REPLY: #chain 0, VETH->REP, send to zone 3, then chain 6 for fwd tc_filter add dev $VETH_REP1 ingress protocol ip chain 0 prio 1 flower ip_proto $proto $tc_verbose $flag \ ct_state -trk \ action ct zone 3 action goto chain 6 #chain 6, VETH->REP, zone 3, +est, fwd tc_filter add dev $VETH_REP1 ingress protocol ip chain 6 prio 1 flower ip_proto $proto $tc_verbose \ ct_state +trk+est \ action mirred egress redirect dev $REP #chain 0, VETH->REP, send to zone 4, fake ip for ct, then chain 6 for fwd #chain 6, VETH2->REP, zone 3, +est, revert fake ip and fwd to dev port1=6000 port2=7000 echo port1 $port1 port2 $port2 tc_filter add dev $REP ingress protocol ip chain 0 prio 1 flower ip_proto $proto dst_ip $ip_remote2 $tc_verbose $flag \ dst_mac $remote_mac2 ct_state -trk src_port $port2 \ action pedit ex \ munge eth dst set $remote_mac \ munge ip dst set $ip_remote1 \ munge $proto sport set $port1 \ pipe \ action csum $proto ip pipe \ action action goto chain 2 tc_filter add dev $REP ingress protocol ip chain 0 prio 2 flower ip_proto $proto dst_ip $ip_remote2 $tc_verbose $flag \ dst_mac $remote_mac2 \ action mirred egress redirect dev $VETH_REP2 tc_filter add dev $REP ingress protocol ip chain 4 prio 1 flower ip_proto $proto $tc_verbose \ ct_state +trk+new ct_zone 4 src_port $port1 \ action ct zone 4 commit pipe \ action pedit ex \ munge eth dst set $remote_mac2 \ munge ip dst set $ip_remote2 \ munge $proto sport set $port2 \ pipe \ action csum $proto ip pipe \ action mirred egress redirect dev $VETH_REP2 tc_filter add dev $REP ingress protocol ip chain 4 prio 1 flower ip_proto $proto $tc_verbose \ ct_state +trk+est ct_zone 4 src_port $port1 \ action pedit ex \ munge eth dst set $remote_mac2 \ munge ip dst set $ip_remote2 \ munge $proto sport set $port2 \ pipe \ action csum $proto ip pipe \ action mirred egress redirect dev $VETH_REP2 tc_filter add dev $VETH_REP2 ingress protocol ip chain 0 prio 1 flower ip_proto $proto $tc_verbose $flag \ ct_state -trk dst_port $port2 \ action pedit ex \ munge ip src set $ip_remote1 \ munge $proto dport set $port1 \ pipe \ action csum $proto ip pipe \ action ct zone 4 action goto chain 6 tc_filter add dev $VETH_REP2 ingress protocol ip chain 0 prio 2 flower ip_proto $proto $tc_verbose $flag \ action mirred egress redirect dev $REP tc_filter add dev $VETH_REP2 ingress protocol ip chain 6 prio 1 flower ip_proto $proto $tc_verbose \ ct_state +trk+est dst_port $port1 \ action pedit ex \ munge ip src set $ip_remote2 \ munge $proto dport set $port2 \ pipe \ action csum $proto ip pipe \ action mirred egress redirect dev $REP > > @Eric: I'm sorry for the very dumb and late question. You reported v1 > of this patch would make "GRO slow as hell", could you please elaborate > a bit more? I thought most skbs (with no ct attached) would see little > difference??? > > Cheers, > > Paolo > > >
diff --git a/net/core/dev.c b/net/core/dev.c index 439faadab0c2..bf62cb2ec6da 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5981,6 +5981,18 @@ static void gro_list_prepare(const struct list_head *head, diffs = memcmp(skb_mac_header(p), skb_mac_header(skb), maclen); + + diffs |= skb_get_nfct(p) ^ skb_get_nfct(skb); + + if (!diffs) { + struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT); + struct tc_skb_ext *p_ext = skb_ext_find(p, TC_SKB_EXT); + + diffs |= (!!p_ext) ^ (!!skb_ext); + if (!diffs && unlikely(skb_ext)) + diffs |= p_ext->chain ^ skb_ext->chain; + } + NAPI_GRO_CB(p)->same_flow = !diffs; } } @@ -6243,6 +6255,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb) skb_shinfo(skb)->gso_type = 0; skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)); skb_ext_reset(skb); + nf_reset_ct(skb); napi->skb = skb; } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index bbc3b4b62032..239eb2fba255 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -939,6 +939,7 @@ void __kfree_skb_defer(struct sk_buff *skb) void napi_skb_free_stolen_head(struct sk_buff *skb) { + nf_conntrack_put(skb_nfct(skb)); skb_dst_drop(skb); skb_ext_put(skb); napi_skb_cache_put(skb);