Message ID | 168563651438.3436004.17735707525651776648.stgit@firesoul (mailing list archive) |
---|---|
State | Accepted |
Commit | 411486626e5779bd85439282985ff3fc25a3f6d2 |
Delegated to: | BPF |
Series | [bpf-next,V2] bpf/xdp: optimize bpf_xdp_pointer to avoid reading sinfo |
> Currently we observed a significant performance degradation in
> samples/bpf xdp1 and xdp2, due XDP multibuffer "xdp.frags" handling,
> [...]
> -	if (offset < size) /* linear area */
> +	if (likely((offset < size))) /* linear area */

nit: you can drop a round bracket here. Other than that:

Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Jesper Dangaard Brouer <brouer@redhat.com> writes:

> Currently we observed a significant performance degradation in
> [...]
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Thanks for fixing this!

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
On Thu, Jun 1, 2023 at 1:34 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> > [...]
> > -	if (offset < size) /* linear area */
> > +	if (likely((offset < size))) /* linear area */
>
> nit: you can drop a round bracket here. Other than that:

Fixed while applying. Thanks everyone.

> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Hello:

This patch was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Thu, 01 Jun 2023 18:21:54 +0200 you wrote:
> Currently we observed a significant performance degradation in
> samples/bpf xdp1 and xdp2, due XDP multibuffer "xdp.frags" handling,
> added in commit 772251742262 ("samples/bpf: fixup some tools to be able
> to support xdp multibuffer").
>
> This patch reduce the overhead by avoiding to read/load shared_info
> (sinfo) memory area, when XDP packet don't have any frags. This improves
> performance because sinfo is located in another cacheline.
>
> [...]

Here is the summary with links:
  - [bpf-next,V2] bpf/xdp: optimize bpf_xdp_pointer to avoid reading sinfo
    https://git.kernel.org/bpf/bpf-next/c/411486626e57

You are awesome, thank you!
diff --git a/net/core/filter.c b/net/core/filter.c
index 968139f4a1ac..961db5bd2f94 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3948,20 +3948,21 @@ void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off,
 
 void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len)
 {
-	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
 	u32 size = xdp->data_end - xdp->data;
+	struct skb_shared_info *sinfo;
 	void *addr = xdp->data;
 	int i;
 
 	if (unlikely(offset > 0xffff || len > 0xffff))
 		return ERR_PTR(-EFAULT);
 
-	if (offset + len > xdp_get_buff_len(xdp))
+	if (unlikely(offset + len > xdp_get_buff_len(xdp)))
 		return ERR_PTR(-EINVAL);
 
-	if (offset < size) /* linear area */
+	if (likely((offset < size))) /* linear area */
 		goto out;
 
+	sinfo = xdp_get_shared_info_from_buff(xdp);
 	offset -= size;
 	for (i = 0; i < sinfo->nr_frags; i++) { /* paged area */
 		u32 frag_size = skb_frag_size(&sinfo->frags[i]);
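To make the effect of the reordering concrete, here is a minimal user-space model of the patched control flow. It is a sketch, not kernel code: every `mock_*` name, the `sinfo_reads` counter and the `err` out-parameter are hypothetical (the kernel returns `ERR_PTR()` values instead), and the NULL return for a range that straddles an area boundary reflects our reading of the upstream function rather than anything stated in this patch. The counter shows that a frag-less buffer never touches the shared-info area.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Branch hints in the style of the kernel's likely()/unlikely(). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

#define MOCK_MAX_FRAGS 4

/* Hypothetical stand-ins for skb_frag_t, skb_shared_info and xdp_buff. */
struct mock_frag {
	unsigned char *addr;
	uint32_t size;
};

struct mock_shared_info {
	unsigned int nr_frags;
	uint32_t frags_size;              /* total bytes held in frags */
	struct mock_frag frags[MOCK_MAX_FRAGS];
};

struct mock_xdp_buff {
	unsigned char *data;
	unsigned char *data_end;
	int has_frags;                    /* models the "has frags" flag bit */
	struct mock_shared_info *sinfo;   /* models the cold tail area */
};

/* Counts every access to the (cold) shared-info area. */
static unsigned int sinfo_reads;

static struct mock_shared_info *mock_get_sinfo(struct mock_xdp_buff *xdp)
{
	sinfo_reads++;
	return xdp->sinfo;
}

static uint32_t mock_buff_len(struct mock_xdp_buff *xdp)
{
	uint32_t len = (uint32_t)(xdp->data_end - xdp->data);

	if (likely(!xdp->has_frags))
		return len;               /* flag check only, sinfo untouched */
	return len + mock_get_sinfo(xdp)->frags_size;
}

/* Returns a pointer to [offset, offset + len) when that range lies in one
 * contiguous area, NULL when it straddles an area boundary. On bad input it
 * returns NULL and sets *err (the kernel uses ERR_PTR() instead). */
static void *mock_xdp_pointer(struct mock_xdp_buff *xdp, uint32_t offset,
			      uint32_t len, int *err)
{
	uint32_t size = (uint32_t)(xdp->data_end - xdp->data);
	struct mock_shared_info *sinfo;   /* no longer loaded up front */
	unsigned char *addr = xdp->data;
	unsigned int i;

	*err = 0;
	if (unlikely(offset > 0xffff || len > 0xffff)) {
		*err = -EFAULT;
		return NULL;
	}
	if (unlikely(offset + len > mock_buff_len(xdp))) {
		*err = -EINVAL;
		return NULL;
	}
	if (likely(offset < size))        /* linear area: hot path */
		goto out;

	sinfo = mock_get_sinfo(xdp);      /* only reached for frag offsets */
	offset -= size;
	for (i = 0; i < sinfo->nr_frags; i++) {   /* paged area */
		uint32_t frag_size = sinfo->frags[i].size;

		if (offset < frag_size) {
			addr = sinfo->frags[i].addr;
			size = frag_size;
			break;
		}
		offset -= frag_size;
	}
	/* The length check above guarantees some frag holds the offset. */
	assert(i < sinfo->nr_frags);
out:
	return offset + len <= size ? addr + offset : NULL;
}
```

Driven from a small `main()`, a lookup at offset 10 of a frag-less buffer returns `data + 10` without ever calling `mock_get_sinfo()`, whereas an offset inside a frag reads the shared info twice: once for the length check and once for the frag walk.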
We observed a significant performance degradation in
samples/bpf xdp1 and xdp2, due to XDP multibuffer "xdp.frags" handling
added in commit 772251742262 ("samples/bpf: fixup some tools to be able
to support xdp multibuffer").

This patch reduces the overhead by avoiding reading/loading the shared_info
(sinfo) memory area when the XDP packet doesn't have any frags. This improves
performance because sinfo is located in another cacheline.

Function bpf_xdp_pointer() is used by the BPF helpers bpf_xdp_load_bytes()
and bpf_xdp_store_bytes(). As a help to reviewers: xdp_get_buff_len() can
potentially access sinfo, but it uses the xdp_buff_has_frags() flag-bit check
to avoid accessing sinfo in the no-frags case.

The likely/unlikely instrumentation lays out the asm code such that sinfo
access isn't interleaved with the no-frags case (checked on GCC 12.2.1-4).
The generated asm code is more compact towards the no-frags case.

The BPF kfunc bpf_dynptr_slice() also uses bpf_xdp_pointer(), so it
should benefit from this change as well.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/filter.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
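The reviewer note about xdp_get_buff_len() rests on two facts: the "has frags" flag lives in the xdp_buff itself, while the shared info sits at the tail of the frame, which, as we understand the kernel layout, is just below `data_hard_start + frame_sz`. The sketch below models that in user space. All `mock_*` names, the flag value, the 4096-byte frame and the assumed 64-byte cacheline are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical constants: HAS_FRAGS models the xdp_buff frags flag bit, and
 * a 64-byte cacheline is assumed for aligning the tail area. */
#define MOCK_FLAG_HAS_FRAGS 0x1u
#define MOCK_FRAME_SZ       4096u
#define MOCK_CACHE_ALIGN(x) (((x) + 63u) & ~(size_t)63u)

struct mock_shared_info {
	uint32_t nr_frags;
	uint32_t xdp_frags_size;
};

struct mock_xdp_buff {
	unsigned char *data_hard_start;
	unsigned char *data;
	unsigned char *data_end;
	uint32_t frame_sz;
	uint32_t flags;
};

/* Counts every access to the shared-info area. */
static unsigned int sinfo_touches;

/* The shared info lives at the tail of the frame, far from the headers. */
static struct mock_shared_info *mock_get_sinfo(struct mock_xdp_buff *xdp)
{
	sinfo_touches++;
	return (struct mock_shared_info *)(xdp->data_hard_start + xdp->frame_sz -
		MOCK_CACHE_ALIGN(sizeof(struct mock_shared_info)));
}

/* Length check that only reads a flag bit in the common no-frags case. */
static uint32_t mock_get_buff_len(struct mock_xdp_buff *xdp)
{
	uint32_t len = (uint32_t)(xdp->data_end - xdp->data);

	if (likely(!(xdp->flags & MOCK_FLAG_HAS_FRAGS)))
		return len;
	return len + mock_get_sinfo(xdp)->xdp_frags_size;
}

/* Hypothetical setup helper: zeroed frame, packet starts after headroom. */
static struct mock_xdp_buff mock_xdp_setup(uint32_t headroom, uint32_t len)
{
	unsigned char *frame = calloc(1, MOCK_FRAME_SZ);
	struct mock_xdp_buff xdp;

	assert(frame != NULL);
	xdp.data_hard_start = frame;
	xdp.data = frame + headroom;
	xdp.data_end = frame + headroom + len;
	xdp.frame_sz = MOCK_FRAME_SZ;
	xdp.flags = 0;
	return xdp;
}
```

With the flag clear, mock_get_buff_len() returns the linear length without touching the tail of the frame; only once the flag is set does it reach into the shared info, which in this layout is thousands of bytes (and therefore several cachelines) away from the packet data.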