| Message ID | 20250107152940.26530-6-aleksander.lobakin@intel.com |
|---|---|
| State | Superseded |
| Delegated to | Netdev Maintainers |
| Series | bpf: cpumap: enable GRO for XDP_PASS frames |
On 1/7/25 4:29 PM, Alexander Lobakin wrote:
> Add a function to get an array of skbs from the NAPI percpu cache.
> It's supposed to be a drop-in replacement for
> kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC) and
> xdp_alloc_skb_bulk(GFP_ATOMIC). The difference (apart from the
> requirement to call it only from the BH) is that it tries to use
> as many NAPI cache entries for skbs as possible, and allocate new
> ones only if needed.
>
> The logic is as follows:
>
> * there is enough skbs in the cache: decache them and return to the
>   caller;
> * not enough: try refilling the cache first. If there is now enough
>   skbs, return;
> * still not enough: try allocating skbs directly to the output array
>   with %GFP_ZERO, maybe we'll be able to get some. If there's now
>   enough, return;
> * still not enough: return as many as we were able to obtain.
>
> Most of times, if called from the NAPI polling loop, the first one will
> be true, sometimes (rarely) the second one. The third and the fourth --
> only under heavy memory pressure.
> It can save significant amounts of CPU cycles if there are GRO cycles
> and/or Tx completion cycles (anything that descends to
> napi_skb_cache_put()) happening on this CPU.
>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> Tested-by: Daniel Xu <dxu@dxuuu.xyz>
> ---
>  include/linux/skbuff.h |  1 +
>  net/core/skbuff.c      | 62 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 63 insertions(+)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index bb2b751d274a..1c089c7c14e1 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1315,6 +1315,7 @@ struct sk_buff *build_skb_around(struct sk_buff *skb,
>  				 void *data, unsigned int frag_size);
>  void skb_attempt_defer_free(struct sk_buff *skb);
>
> +u32 napi_skb_cache_get_bulk(void **skbs, u32 n);
>  struct sk_buff *napi_build_skb(void *data, unsigned int frag_size);
>  struct sk_buff *slab_build_skb(void *data);
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a441613a1e6c..42eb31dcc9ce 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -367,6 +367,68 @@ static struct sk_buff *napi_skb_cache_get(void)
>  	return skb;
>  }
>
> +/**
> + * napi_skb_cache_get_bulk - obtain a number of zeroed skb heads from the cache
> + * @skbs: pointer to an at least @n-sized array to fill with skb pointers
> + * @n: number of entries to provide
> + *
> + * Tries to obtain @n &sk_buff entries from the NAPI percpu cache and writes
> + * the pointers into the provided array @skbs. If there are less entries
> + * available, tries to replenish the cache and bulk-allocates the diff from
> + * the MM layer if needed.
> + * The heads are being zeroed with either memset() or %__GFP_ZERO, so they are
> + * ready for {,__}build_skb_around() and don't have any data buffers attached.
> + * Must be called *only* from the BH context.
> + *
> + * Return: number of successfully allocated skbs (@n if no actual allocation
> + * needed or kmem_cache_alloc_bulk() didn't fail).
> + */
> +u32 napi_skb_cache_get_bulk(void **skbs, u32 n)
> +{
> +	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
> +	u32 bulk, total = n;
> +
> +	local_lock_nested_bh(&napi_alloc_cache.bh_lock);
> +
> +	if (nc->skb_count >= n)
> +		goto get;

I (mis?)understood from the commit message this condition should be
likely, too?!?

> +	/* No enough cached skbs. Try refilling the cache first */
> +	bulk = min(NAPI_SKB_CACHE_SIZE - nc->skb_count, NAPI_SKB_CACHE_BULK);
> +	nc->skb_count += kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
> +					       GFP_ATOMIC | __GFP_NOWARN, bulk,
> +					       &nc->skb_cache[nc->skb_count]);
> +	if (likely(nc->skb_count >= n))
> +		goto get;
> +
> +	/* Still not enough. Bulk-allocate the missing part directly, zeroed */
> +	n -= kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
> +				   GFP_ATOMIC | __GFP_ZERO | __GFP_NOWARN,
> +				   n - nc->skb_count, &skbs[nc->skb_count]);

You should probably cap 'n' to NAPI_SKB_CACHE_SIZE. Also what about
latency spikes when n == 48 (should be the maximum possible with such
limit) here?

/P
From: Paolo Abeni <pabeni@redhat.com>
Date: Thu, 9 Jan 2025 14:16:22 +0100

> On 1/7/25 4:29 PM, Alexander Lobakin wrote:
>> Add a function to get an array of skbs from the NAPI percpu cache.
>> It's supposed to be a drop-in replacement for
>> kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC) and
>> xdp_alloc_skb_bulk(GFP_ATOMIC). The difference (apart from the
>> requirement to call it only from the BH) is that it tries to use
>> as many NAPI cache entries for skbs as possible, and allocate new
>> ones only if needed.

[...]

>> +u32 napi_skb_cache_get_bulk(void **skbs, u32 n)
>> +{
>> +	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
>> +	u32 bulk, total = n;
>> +
>> +	local_lock_nested_bh(&napi_alloc_cache.bh_lock);
>> +
>> +	if (nc->skb_count >= n)
>> +		goto get;
>
> I (mis?)understood from the commit message this condition should be
> likely, too?!?

It depends. I didn't want to make this unlikely(), as it will happen
sometimes anyway, while the two unlikely() below can happen only when
the system is low on memory.

>
>> +	/* No enough cached skbs. Try refilling the cache first */
>> +	bulk = min(NAPI_SKB_CACHE_SIZE - nc->skb_count, NAPI_SKB_CACHE_BULK);
>> +	nc->skb_count += kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
>> +					       GFP_ATOMIC | __GFP_NOWARN, bulk,
>> +					       &nc->skb_cache[nc->skb_count]);
>> +	if (likely(nc->skb_count >= n))
>> +		goto get;
>> +
>> +	/* Still not enough. Bulk-allocate the missing part directly, zeroed */
>> +	n -= kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
>> +				   GFP_ATOMIC | __GFP_ZERO | __GFP_NOWARN,
>> +				   n - nc->skb_count, &skbs[nc->skb_count]);
>
> You should probably cap 'n' to NAPI_SKB_CACHE_SIZE. Also what about
> latency spikes when n == 48 (should be the maximum possible with such
> limit) here?

The current users never allocate more than 8 skbs in one bulk. Anyway,
the current approach wants to be a drop-in for
kmem_cache_alloc_bulk(skbuff_cache), which doesn't cap anything.
Note that this last branch allocates to @skbs directly, not to the
percpu NAPI cache.

>
> /P

Thanks,
Olek
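The calling convention Olek describes (a drop-in for a bulk kmem_cache
allocation, called from BH context, with batches of at most 8 skbs) is
easier to see with a caller in front of you. Below is a rough sketch of
how a cpumap-style consumer might turn a batch of xdp_frames into skbs
with the new helper; it is not code from this series.
napi_skb_cache_get_bulk() comes from this patch, and
__xdp_build_skb_from_frame(), xdp_return_frame() and napi_gro_receive()
are existing kernel APIs, while the function itself, its name and its
parameters are made up for illustration.

	#include <linux/netdevice.h>
	#include <linux/skbuff.h>
	#include <net/xdp.h>

	/* Illustrative sketch only -- not part of the series. Assumes BH
	 * context and a batch of at most 8 frames, matching the in-tree
	 * callers mentioned above.
	 */
	static void example_frames_to_skbs(struct xdp_frame **frames, u32 n,
					   struct net_device *dev,
					   struct napi_struct *napi)
	{
		void *skbs[8];
		u32 got, i;

		/* May return fewer than n heads under memory pressure */
		got = napi_skb_cache_get_bulk(skbs, n);

		for (i = 0; i < got; i++) {
			struct sk_buff *skb;

			/* Heads are already zeroed, ready for build_skb_around() */
			skb = __xdp_build_skb_from_frame(frames[i], skbs[i], dev);
			if (!skb) {
				xdp_return_frame(frames[i]);
				continue;
			}

			napi_gro_receive(napi, skb);
		}

		/* Frames we could not get heads for have to be dropped */
		for (i = got; i < n; i++)
			xdp_return_frame(frames[i]);
	}

Because the helper returns a count rather than failing outright, a caller
under memory pressure only has to drop (or otherwise recycle) the tail of
the batch it could not obtain heads for.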
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bb2b751d274a..1c089c7c14e1 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1315,6 +1315,7 @@ struct sk_buff *build_skb_around(struct sk_buff *skb,
 				 void *data, unsigned int frag_size);
 void skb_attempt_defer_free(struct sk_buff *skb);
 
+u32 napi_skb_cache_get_bulk(void **skbs, u32 n);
 struct sk_buff *napi_build_skb(void *data, unsigned int frag_size);
 struct sk_buff *slab_build_skb(void *data);
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a441613a1e6c..42eb31dcc9ce 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -367,6 +367,68 @@ static struct sk_buff *napi_skb_cache_get(void)
 	return skb;
 }
 
+/**
+ * napi_skb_cache_get_bulk - obtain a number of zeroed skb heads from the cache
+ * @skbs: pointer to an at least @n-sized array to fill with skb pointers
+ * @n: number of entries to provide
+ *
+ * Tries to obtain @n &sk_buff entries from the NAPI percpu cache and writes
+ * the pointers into the provided array @skbs. If there are less entries
+ * available, tries to replenish the cache and bulk-allocates the diff from
+ * the MM layer if needed.
+ * The heads are being zeroed with either memset() or %__GFP_ZERO, so they are
+ * ready for {,__}build_skb_around() and don't have any data buffers attached.
+ * Must be called *only* from the BH context.
+ *
+ * Return: number of successfully allocated skbs (@n if no actual allocation
+ * needed or kmem_cache_alloc_bulk() didn't fail).
+ */
+u32 napi_skb_cache_get_bulk(void **skbs, u32 n)
+{
+	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	u32 bulk, total = n;
+
+	local_lock_nested_bh(&napi_alloc_cache.bh_lock);
+
+	if (nc->skb_count >= n)
+		goto get;
+
+	/* No enough cached skbs. Try refilling the cache first */
+	bulk = min(NAPI_SKB_CACHE_SIZE - nc->skb_count, NAPI_SKB_CACHE_BULK);
+	nc->skb_count += kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
+					       GFP_ATOMIC | __GFP_NOWARN, bulk,
+					       &nc->skb_cache[nc->skb_count]);
+	if (likely(nc->skb_count >= n))
+		goto get;
+
+	/* Still not enough. Bulk-allocate the missing part directly, zeroed */
+	n -= kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
+				   GFP_ATOMIC | __GFP_ZERO | __GFP_NOWARN,
+				   n - nc->skb_count, &skbs[nc->skb_count]);
+	if (likely(nc->skb_count >= n))
+		goto get;
+
+	/* kmem_cache didn't allocate the number we need, limit the output */
+	total -= n - nc->skb_count;
+	n = nc->skb_count;
+
+get:
+	for (u32 base = nc->skb_count - n, i = 0; i < n; i++) {
+		u32 cache_size = kmem_cache_size(net_hotdata.skbuff_cache);
+
+		skbs[i] = nc->skb_cache[base + i];
+
+		kasan_mempool_unpoison_object(skbs[i], cache_size);
+		memset(skbs[i], 0, offsetof(struct sk_buff, tail));
+	}
+
+	nc->skb_count -= n;
+	local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
+
+	return total;
+}
+EXPORT_SYMBOL_GPL(napi_skb_cache_get_bulk);
+
 static inline void __finalize_skb_around(struct sk_buff *skb, void *data,
 					 unsigned int size)
 {
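To make the three-stage logic above concrete, here is one worked pass
through napi_skb_cache_get_bulk(), assuming the NAPI_SKB_CACHE_SIZE of 64
and NAPI_SKB_CACHE_BULK of 16 currently defined in net/core/skbuff.c:

* skb_count == 3, n == 8: the fast path is skipped, since 3 < 8;
* refill: bulk = min(64 - 3, 16) == 16; if kmem_cache_alloc_bulk() delivers
  all 16 heads, skb_count becomes 19 >= 8 and the code jumps to get;
* at get: base = 19 - 8 == 11, so skb_cache[11..18] are unpoisoned, zeroed
  up to offsetof(struct sk_buff, tail) and written into skbs[0..7];
* skb_count drops back to 11 and the function returns total == 8, i.e. the
  whole request was served from the cache plus a single refill, without
  touching the direct __GFP_ZERO path.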