| Message ID | 167475990764.1934330.11960904198087757911.stgit@localhost.localdomain (mailing list archive) |
|---|---|
| State | Accepted |
| Commit | 7d2c89b325874a35564db5630a459966afab04cc |
| Delegated to | Netdev Maintainers |
| Series | [net] skb: Do not mix page pool and page referenced frags in GRO |
Alexander Duyck <alexander.duyck@gmail.com> writes:

> From: Alexander Duyck <alexanderduyck@fb.com>
>
> GSO should not merge page pool recycled frames with standard reference
> counted frames. Traditionally this didn't occur, at least not often.
> However as we start looking at adding support for wireless adapters there
> becomes the potential to mix the two due to A-MSDU repartitioning frames in
> the receive path. There are possibly other places where this may have
> occurred however I suspect they must be few and far between as we have not
> seen this issue until now.
>
> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> Reported-by: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>

I know I'm pattern matching a bit crudely here, but we recently had
another report where doing a get_page() on skb->head didn't seem to be
enough; any chance they might be related?

See: https://lore.kernel.org/r/Y9BfknDG0LXmruDu@JNXK7M3

-Toke
On Thu, Jan 26, 2023 at 11:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexander Duyck <alexander.duyck@gmail.com> writes:
>
> > From: Alexander Duyck <alexanderduyck@fb.com>
> >
> > GSO should not merge page pool recycled frames with standard reference
> > counted frames. [...]
>
> I know I'm pattern matching a bit crudely here, but we recently had
> another report where doing a get_page() on skb->head didn't seem to be
> enough; any chance they might be related?
>
> See: https://lore.kernel.org/r/Y9BfknDG0LXmruDu@JNXK7M3

Looking at it I wouldn't think so. Doing get_page() on these frames is
fine. In the case you reference it looks like get_page() is being
called on a slab allocated skb head. So somehow a slab allocated head
is leaking through.

What is causing the issue here is that after get_page() is being called
and the fragments are moved into a non-pp_recycle skb, they are then
picked out and merged back into a pp_recycle skb. As a result what is
happening is that we are seeing a reference count leak from
pp_frag_count and into refcount.

This is the quick-n-dirty fix. I am debating if we want to expand this
so that we could support the case where the donor frame is pp_recycle
but the recipient is a standard reference counted frame. Fixing that
would essentially consist of having to add logic to take the reference
on all donor frags, making certain that nr_frags on the donor skb isn't
altered, and then lastly making sure that all cases use the
NAPI_GRO_FREE path to drop the page pool counts.
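A rough sketch of what the "take a reference on all donor frags" idea described above might look like; the helper name, placement, and error handling are invented for illustration and are not part of the posted fix:

```c
/* Hypothetical sketch only: let a page-pool (donor) skb contribute its
 * frags to a non-page-pool (recipient) skb by giving the recipient plain
 * page references, while the donor keeps its pp_frag_count references.
 */
static int gro_adopt_pp_frags(struct sk_buff *recipient, struct sk_buff *donor)
{
	struct skb_shared_info *to = skb_shinfo(recipient);
	struct skb_shared_info *from = skb_shinfo(donor);
	int i;

	if (to->nr_frags + from->nr_frags > MAX_SKB_FRAGS)
		return -E2BIG;

	for (i = 0; i < from->nr_frags; i++) {
		/* Take a normal page reference for the recipient... */
		skb_frag_ref(donor, i);
		to->frags[to->nr_frags++] = from->frags[i];
	}

	/* ...and leave donor->nr_frags untouched so that the NAPI_GRO_FREE
	 * path still drops the donor's page-pool frag counts as usual.
	 */
	return 0;
}
```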
Alexander Duyck <alexander.duyck@gmail.com> writes:

> On Thu, Jan 26, 2023 at 11:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexander Duyck <alexander.duyck@gmail.com> writes:
>>
>> > From: Alexander Duyck <alexanderduyck@fb.com>
>> > [...]
>>
>> I know I'm pattern matching a bit crudely here, but we recently had
>> another report where doing a get_page() on skb->head didn't seem to be
>> enough; any chance they might be related?
>>
>> See: https://lore.kernel.org/r/Y9BfknDG0LXmruDu@JNXK7M3
>
> Looking at it I wouldn't think so. Doing get_page() on these frames is
> fine. In the case you reference it looks like get_page() is being
> called on a slab allocated skb head. So somehow a slab allocated head
> is leaking through.

Alright, thanks for taking a look! :)

-Toke
On Thu, 26 Jan 2023 11:06:59 -0800 Alexander Duyck wrote:
> From: Alexander Duyck <alexanderduyck@fb.com>
>
> GSO should not merge page pool recycled frames with standard reference
> counted frames. [...]
>
> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> Reported-by: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>

Exciting investigation!

Felix, out of curiosity - the impact of losing GRO on performance is
not significant enough to care? We could possibly try to switch to
using the frag list if we can't merge into frags safely.

> diff --git a/net/core/gro.c b/net/core/gro.c
> [...]
> +	if (p->pp_recycle != skb->pp_recycle)
> +		return -ETOOMANYREFS;
> [...]
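A minimal sketch of the frag-list fallback Jakub floats here (not the applied patch): instead of refusing to aggregate on a pp_recycle mismatch, skip the frag-splicing branches and let skb_gro_receive()'s existing merge path chain the new skb onto the held skb's frag_list. The placement note in the comment is an assumption about where such a check would have to live:

```c
	/* Sketch only: the check would have to sit after headlen/offset are
	 * computed, just before the "if (headlen <= offset)" frag-splice
	 * branch, since the existing merge: path relies on those values.
	 * Chaining via frag_list keeps the two memory models in separate
	 * skbs instead of mixing their frags.
	 */
	if (p->pp_recycle != skb->pp_recycle)
		goto merge;	/* existing frag_list append path */
```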
Thanks Alexander!

On Fri, 27 Jan 2023 at 01:13, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 26 Jan 2023 11:06:59 -0800 Alexander Duyck wrote:
> > From: Alexander Duyck <alexanderduyck@fb.com>
> >
> > GSO should not merge page pool recycled frames with standard reference
> > counted frames. [...]
>
> Exciting investigation!
> Felix, out of curiosity - the impact of losing GRO on performance is
> not significant enough to care? We could possibly try to switch to
> using the frag list if we can't merge into frags safely.
>
> > [...]

Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
On 27.01.23 00:13, Jakub Kicinski wrote:
> On Thu, 26 Jan 2023 11:06:59 -0800 Alexander Duyck wrote:
>> From: Alexander Duyck <alexanderduyck@fb.com>
>>
>> GSO should not merge page pool recycled frames with standard reference
>> counted frames. [...]
>
> Exciting investigation!
> Felix, out of curiosity - the impact of losing GRO on performance is
> not significant enough to care? We could possibly try to switch to
> using the frag list if we can't merge into frags safely.

Since this only affects combining page_pool and non-page_pool packets,
the performance loss should be negligible.

- Felix
On 2023/1/27 3:06, Alexander Duyck wrote:
> From: Alexander Duyck <alexanderduyck@fb.com>
>
> GSO should not merge page pool recycled frames with standard reference
> counted frames. [...]
>
> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> Reported-by: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
> ---
>  net/core/gro.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/net/core/gro.c b/net/core/gro.c
> index 506f83d715f8..4bac7ea6e025 100644
> --- a/net/core/gro.c
> +++ b/net/core/gro.c
> @@ -162,6 +162,15 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>  	struct sk_buff *lp;
>  	int segs;
>
> +	/* Do not splice page pool based packets w/ non-page pool
> +	 * packets. This can result in reference count issues as page
> +	 * pool pages will not decrement the reference count and will
> +	 * instead be immediately returned to the pool or have frag
> +	 * count decremented.
> +	 */
> +	if (p->pp_recycle != skb->pp_recycle)
> +		return -ETOOMANYREFS;

If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
to 1 in gro_list_prepare() seems to be making more sense so that the above
case has the same handling as skb_has_frag_list() handling?
https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503

As it seems to avoid some unnecessary operation according to comment
in tcp4_gro_receive():
https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322

Also if A-MSDU is a normal case for wireless adapters and we want the
performance back for the above case, maybe the driver can set
skb->pp_recycle and update page->pp_frag_count instead of the page
refcount, if A-MSDU (or A-MSDU decap performed by the driver) can track
whether the page is from a page pool. In that case we may turn the above
check into a WARN_ON() to catch any other corner case.

> +
>  	/* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
>  	gro_max_size = READ_ONCE(p->dev->gro_max_size);
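A minimal sketch of the alternative being suggested (assuming the v6.2 gro_list_prepare() layout; not a posted patch), recording the mismatch while walking the GRO list, before any per-protocol header work is done on the new skb:

```c
	list_for_each_entry(p, head, list) {
		unsigned long diffs;

		NAPI_GRO_CB(p)->flush = 0;

		/* ... existing hash / dev / vlan / metadata checks,
		 * which initialize and accumulate into diffs ...
		 */

		/* Sketch: a pp_recycle mismatch can never be merged, so
		 * treat it as a flow difference (or set
		 * NAPI_GRO_CB(p)->flush = 1, the way the frag_list case
		 * is handled) and skip the per-protocol work entirely.
		 */
		diffs |= p->pp_recycle ^ skb->pp_recycle;

		NAPI_GRO_CB(p)->same_flow = !diffs;
	}
```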
On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> to 1 in gro_list_prepare() seems to be making more sense so that the above
> case has the same handling as skb_has_frag_list() handling?
> https://elixir.bootlin.com/linux/v6.2-rc4/source/net/core/gro.c#L503
>
> As it seems to avoid some unnecessary operation according to comment
> in tcp4_gro_receive():
> https://elixir.bootlin.com/linux/v6.2-rc4/source/net/ipv4/tcp_offload.c#L322

The frag_list case can be determined with just the input skb.
For pp_recycle we need to compare the input skb's pp_recycle with
the pp_recycle of the skb already held by GRO.

I'll hold off with applying a bit longer tho, in case Eric
wants to chime in with an ack or opinion.
On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > If we are not allowing gro for the above case, setting NAPI_GRO_CB(p)->flush
> > to 1 in gro_list_prepare() seems to be making more sense [...]
>
> The frag_list case can be determined with just the input skb.
> For pp_recycle we need to compare input skb's pp_recycle with
> the pp_recycle of the skb already held by GRO.
>
> I'll hold off with applying a bit longer tho, in case Eric
> wants to chime in with an ack or opinion.

We can say that we are adding in the fast path an expensive check
about an unlikely condition.

GRO is by far the most expensive component in our stack.

I would at least make the extra checks conditional to CONFIG_PAGE_POOL=y.

Ideally all accesses to skb->pp_recycle should be done via a helper [1].

I hope that we will switch later to a per-page marker, instead of a
per-skb one (a la https://www.spinics.net/lists/netdev/msg874099.html).

[1] Problem is that CONFIG_PAGE_POOL=y is now forced because of
net/bpf/test_run.c. So... testing skb->pp_recycle seems needed for the
time being.

Reviewed-by: Eric Dumazet <edumazet@google.com>

My tentative patch was something like:

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4c8492401a101f1d6d43079fc70962210389763c..a53b176738b10f3b69b38c487e0c280f44990b6f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -918,8 +918,10 @@ struct sk_buff {
 				fclone:2,
 				peeked:1,
 				head_frag:1,
-				pfmemalloc:1,
-				pp_recycle:1; /* page_pool recycle indicator */
+#ifdef CONFIG_PAGE_POOL
+				pp_recycle:1, /* page_pool recycle indicator */
+#endif
+				pfmemalloc:1;
 #ifdef CONFIG_SKB_EXTENSIONS
 	__u8			active_extensions;
 #endif
@@ -3388,6 +3390,15 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
 	put_page(page);
 }

+static inline bool skb_pp_recycle(const struct sk_buff *skb)
+{
+#ifdef CONFIG_PAGE_POOL
+	return skb->pp_recycle;
+#else
+	return false;
+#endif
+}
+
 /**
  * skb_frag_unref - release a reference on a paged fragment of an skb.
  * @skb: the buffer
@@ -3400,7 +3411,7 @@ static inline void skb_frag_unref(struct sk_buff *skb, int f)
 	struct skb_shared_info *shinfo = skb_shinfo(skb);

 	if (!skb_zcopy_managed(skb))
-		__skb_frag_unref(&shinfo->frags[f], skb->pp_recycle);
+		__skb_frag_unref(&shinfo->frags[f], skb_pp_recycle(skb));
 }

 /**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 180df58e85c72eaa16f5cb56b56d181a379b8921..7a2783a2c9608eec728a0adacea4619ab1c62791 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -801,19 +801,13 @@ static void skb_clone_fraglist(struct sk_buff *skb)
 		skb_get(list);
 }

-static bool skb_pp_recycle(struct sk_buff *skb, void *data)
-{
-	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
-		return false;
-	return page_pool_return_skb_page(virt_to_page(data));
-}
-
 static void skb_free_head(struct sk_buff *skb)
 {
 	unsigned char *head = skb->head;

 	if (skb->head_frag) {
-		if (skb_pp_recycle(skb, head))
+		if (skb_pp_recycle(skb) &&
+		    page_pool_return_skb_page(virt_to_page(head)))
 			return;
 		skb_free_frag(head);
 	} else {
@@ -840,7 +834,7 @@ static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason)
 	}

 	for (i = 0; i < shinfo->nr_frags; i++)
-		__skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);
+		__skb_frag_unref(&shinfo->frags[i], skb_pp_recycle(skb));

 free_head:
 	if (shinfo->frag_list)
@@ -857,7 +851,10 @@ static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason)
 	 * Eventually the last SKB will have the recycling bit set and it's
 	 * dataref set to 0, which will trigger the recycling
 	 */
+#ifdef CONFIG_PAGE_POOL
 	skb->pp_recycle = 0;
+#endif
+	return;
 }

 /*
@@ -1292,7 +1289,9 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
 	n->nohdr = 0;
 	n->peeked = 0;
 	C(pfmemalloc);
+#ifdef CONFIG_PAGE_POOL
 	C(pp_recycle);
+#endif
 	n->destructor = NULL;
 	C(tail);
 	C(end);
@@ -3859,7 +3858,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
 		fragto = &skb_shinfo(tgt)->frags[merge];

 		skb_frag_size_add(fragto, skb_frag_size(fragfrom));
-		__skb_frag_unref(fragfrom, skb->pp_recycle);
+		__skb_frag_unref(fragfrom, skb_pp_recycle(skb));
 	}

 	/* Reposition in the original skb */
@@ -5529,7 +5528,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	 * references for cloned SKBs at the moment that would result in
 	 * inconsistent reference counts.
 	 */
-	if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
+	if (skb_pp_recycle(to) != (skb_pp_recycle(from) && !skb_cloned(from)))
 		return false;

 	if (len <= skb_tailroom(to)) {
On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > [...]
>
> The frag_list case can be determined with just the input skb.
> For pp_recycle we need to compare input skb's pp_recycle with
> the pp_recycle of the skb already held by GRO.
>
> I'll hold off with applying a bit longer tho, in case Eric
> wants to chime in with an ack or opinion.

Doing the test only in the final step (once all headers have been
verified) seems less costly for the vast majority of the cases, where
the driver cooks skbs with a consistent pp_recycle bit?

So Alex's patch seems less expensive to me than adding the check very
early.
Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 26 Jan 2023 11:06:59 -0800 you wrote:
> From: Alexander Duyck <alexanderduyck@fb.com>
>
> GSO should not merge page pool recycled frames with standard reference
> counted frames. Traditionally this didn't occur, at least not often.
> However as we start looking at adding support for wireless adapters there
> becomes the potential to mix the two due to A-MSDU repartitioning frames in
> the receive path. There are possibly other places where this may have
> occurred however I suspect they must be few and far between as we have not
> seen this issue until now.
>
> [...]

Here is the summary with links:
  - [net] skb: Do not mix page pool and page referenced frags in GRO
    https://git.kernel.org/netdev/net/c/7d2c89b32587

You are awesome, thank you!
On Fri, Jan 27, 2023 at 11:16 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > > [...]
> >
> > The frag_list case can be determined with just the input skb.
> > For pp_recycle we need to compare input skb's pp_recycle with
> > the pp_recycle of the skb already held by GRO.
> >
> > I'll hold off with applying a bit longer tho, in case Eric
> > wants to chime in with an ack or opinion.
>
> Doing the test only in the final step (once all headers have been
> verified) seems less costly for the vast majority of the cases the
> driver cooks skbs with a consistent pp_recycle bit?
>
> So Alex's patch seems less expensive to me than adding the check very early.

That was the general idea. Basically there is no need to look into this
until we are looking at merging the skb, and it is very unlikely that we
will see a mix of page pool and non-page pool skbs.

I considered this check to be something equivalent to discovering there
is no space in the skb to store the frags, so that is one of the reasons
why I had picked the spot that I did.
On Sat, 2023-01-28 at 08:08 +0100, Eric Dumazet wrote:
> On Sat, Jan 28, 2023 at 6:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sat, 28 Jan 2023 10:37:47 +0800 Yunsheng Lin wrote:
> > > [...]
> >
> > I'll hold off with applying a bit longer tho, in case Eric
> > wants to chime in with an ack or opinion.
>
> We can say that we are adding in the fast path an expensive check
> about an unlikely condition.
>
> GRO is by far the most expensive component in our stack.

Slightly related to the above: currently the GRO engine performs the
skb metadata check for every packet. My understanding is that even with
XDP enabled and ebpf running on the given packet, the skb should
usually have meta_len == 0.

What about setting 'skb->slow_gro' together with meta_len and moving
the skb_metadata_differs() check under the slow_gro guard?

Cheers,

Paolo
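A sketch of what that idea might look like (assuming the v6.2 layout of skb_metadata_set() and of the slow_gro-guarded block in gro_list_prepare(); not a posted patch):

```c
/* 1) Mark the skb as "slow" whenever XDP metadata is attached. */
static inline void skb_metadata_set(struct sk_buff *skb, u8 meta_len)
{
	skb_shinfo(skb)->meta_len = meta_len;
	if (meta_len)
		skb->slow_gro = 1;
}

/* 2) In gro_list_prepare(), move the metadata comparison under the
 * existing slow_gro-guarded block so the common case skips the
 * pointer chase into the shared info entirely.
 */
	if (!diffs && unlikely(skb->slow_gro | p->slow_gro)) {
		diffs |= skb_metadata_differs(p, skb);
		/* ... existing sk / dst / conntrack comparisons ... */
	}
```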
On Mon, Jan 30, 2023 at 12:50 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> [...]
>
> Slightly related to the above: currently the GRO engine performs the
> skb metadata check for every packet. My understanding is that even with
> XDP enabled and ebpf running on the given packet, the skb should
> usually have meta_len == 0.
>
> What about setting 'skb->slow_gro' together with meta_len and moving
> the skb_metadata_differs() check under the slow_gro guard?

Makes sense to me, especially since we have to do a pointer chase to
get the metadata length out of the shared info.

Looking at the code, one thing I was wondering about is if we should be
flagging frames where one is slow_gro and one is not as having a diff,
and just skipping the checks, since we know the slow_gro checks are
expensive and, if they differ based on that flag, odds are one will have
a field present that the other doesn't.
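A minimal sketch of that variant (not a posted patch; assumes the same gro_list_prepare() context as above):

```c
	/* Sketch: if only one of the two skbs is marked slow_gro, they are
	 * bound to differ in one of the slow-path fields anyway, so count
	 * the mismatch as a diff and only run the expensive comparisons
	 * when both are marked.
	 */
	diffs |= p->slow_gro ^ skb->slow_gro;
	if (!diffs && unlikely(p->slow_gro)) {
		/* ... existing metadata / sk / dst / conntrack checks ... */
	}
```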
On 27/01/2023 00.13, Jakub Kicinski wrote:
> On Thu, 26 Jan 2023 11:06:59 -0800 Alexander Duyck wrote:
>> From: Alexander Duyck <alexanderduyck@fb.com>
>>
>> GSO should not merge page pool recycled frames with standard reference
>> counted frames. [...]
>
> Exciting investigation!
> Felix, out of curiosity - the impact of losing GRO on performance is
> not significant enough to care? We could possibly try to switch to
> using the frag list if we can't merge into frags safely.

Using the frag list sounds scary, because we recently learned that
kfree_skb_list requires all SKBs on the list to have the same refcnt
(else walking the list can lead to other bugs).

--Jesper
diff --git a/net/core/gro.c b/net/core/gro.c
index 506f83d715f8..4bac7ea6e025 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -162,6 +162,15 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 	struct sk_buff *lp;
 	int segs;

+	/* Do not splice page pool based packets w/ non-page pool
+	 * packets. This can result in reference count issues as page
+	 * pool pages will not decrement the reference count and will
+	 * instead be immediately returned to the pool or have frag
+	 * count decremented.
+	 */
+	if (p->pp_recycle != skb->pp_recycle)
+		return -ETOOMANYREFS;
+
 	/* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
 	gro_max_size = READ_ONCE(p->dev->gro_max_size);
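For context on what the -ETOOMANYREFS return actually does: callers of skb_gro_receive() only test for a non-zero return, so a pp_recycle mismatch simply means "do not aggregate" and the packet is delivered as its own segment; no error propagates to the driver. Roughly, abridged from tcp_gro_receive() as a sketch (not verbatim):

```c
	/* A non-zero return from skb_gro_receive() just marks the candidate
	 * as its own segment and lets the GRO engine flush it normally.
	 */
	if (flush || skb_gro_receive(p, skb)) {
		mss = 1;
		goto out_check_final;
	}
```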