Message ID | 20230308065936.1550103-13-martin.lau@linux.dev (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage |
On Tue, Mar 07, 2023 at 10:59:31PM -0800, Martin KaFai Lau wrote:
> From: Martin KaFai Lau <martin.lau@kernel.org>
>
> This patch adds a few bpf mem allocator functions which will
> be used in the bpf_local_storage in a later patch.
>
> bpf_mem_cache_alloc_flags(..., gfp_t flags) is added. When the
> flags == GFP_KERNEL, it will fallback to __alloc(..., GFP_KERNEL).
> bpf_local_storage knows its running context is sleepable (GFP_KERNEL)
> and provides a better guarantee on memory allocation.
>
> bpf_local_storage has some uncommon cases that its selem
> cannot be reused immediately. It handles its own
> rcu_head and goes through a rcu_trace gp and then free it.
> bpf_mem_cache_raw_free() is added for direct free purpose
> without leaking the LLIST_NODE_SZ internal knowledge.
> During free time, the 'struct bpf_mem_alloc *ma' is no longer
> available. However, the caller should know if it is
> percpu memory or not and it can call different raw_free functions.
> bpf_local_storage does not support percpu value, so only
> the non-percpu 'bpf_mem_cache_raw_free()' is added in
> this patch.
>
> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
> ---
>  include/linux/bpf_mem_alloc.h |  2 ++
>  kernel/bpf/memalloc.c         | 42 +++++++++++++++++++++++++++--------
>  2 files changed, 35 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
> index a7104af61ab4..3929be5743f4 100644
> --- a/include/linux/bpf_mem_alloc.h
> +++ b/include/linux/bpf_mem_alloc.h
> @@ -31,5 +31,7 @@ void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr);
>  /* kmem_cache_alloc/free equivalent: */
>  void *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma);
>  void bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr);
> +void bpf_mem_cache_raw_free(void *ptr);
> +void *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags);
>
>  #endif /* _BPF_MEM_ALLOC_H */
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 5fcdacbb8439..2b78eed27c9c 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -121,15 +121,8 @@ static struct llist_node notrace *__llist_del_first(struct llist_head *head)
>  	return entry;
>  }
>
> -static void *__alloc(struct bpf_mem_cache *c, int node)
> +static void *__alloc(struct bpf_mem_cache *c, int node, gfp_t flags)
>  {
> -	/* Allocate, but don't deplete atomic reserves that typical
> -	 * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
> -	 * will allocate from the current numa node which is what we
> -	 * want here.
> -	 */
> -	gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
> -
>  	if (c->percpu_size) {
>  		void **obj = kmalloc_node(c->percpu_size, flags, node);
>  		void *pptr = __alloc_percpu_gfp(c->unit_size, 8, flags);
> @@ -185,7 +178,12 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
>  		 */
>  		obj = __llist_del_first(&c->free_by_rcu);
>  		if (!obj) {
> -			obj = __alloc(c, node);
> +			/* Allocate, but don't deplete atomic reserves that typical
> +			 * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
> +			 * will allocate from the current numa node which is what we
> +			 * want here.
> +			 */
> +			obj = __alloc(c, node, GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT);
>  			if (!obj)
>  				break;
>  		}
> @@ -676,3 +674,29 @@ void notrace bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr)
>
>  	unit_free(this_cpu_ptr(ma->cache), ptr);
>  }
> +
> +void bpf_mem_cache_raw_free(void *ptr)
> +{
> +	kfree(ptr - LLIST_NODE_SZ);
> +}

I think this needs a big comment explaining when it's ok to use it.
The tradeoffs of missing free list and what it means.

Also it needs

	if (!ptr)
		return;

for consistency.

The rest of the patches look fine. I've applied all except 12, 13, 14.
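For readers following along, a minimal sketch of what the requested change to bpf_mem_cache_raw_free() might look like; the comment wording is an assumption for illustration, not the text that eventually landed:

```c
/* bpf_mem_cache_raw_free() frees an object that was handed out by the bpf
 * mem allocator but will never go back onto its per-cpu free lists, e.g.
 * because the caller defers the free through its own RCU grace period and
 * no longer has the 'struct bpf_mem_alloc *ma' at that point. It bypasses
 * the cache entirely, so it gives up the reuse/batching benefit of
 * bpf_mem_cache_free() and should only be used when that is acceptable.
 */
void bpf_mem_cache_raw_free(void *ptr)
{
	if (!ptr)
		return;

	kfree(ptr - LLIST_NODE_SZ);
}
```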
diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index a7104af61ab4..3929be5743f4 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -31,5 +31,7 @@ void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr);
 /* kmem_cache_alloc/free equivalent: */
 void *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma);
 void bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr);
+void bpf_mem_cache_raw_free(void *ptr);
+void *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags);
 
 #endif /* _BPF_MEM_ALLOC_H */
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 5fcdacbb8439..2b78eed27c9c 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -121,15 +121,8 @@ static struct llist_node notrace *__llist_del_first(struct llist_head *head)
 	return entry;
 }
 
-static void *__alloc(struct bpf_mem_cache *c, int node)
+static void *__alloc(struct bpf_mem_cache *c, int node, gfp_t flags)
 {
-	/* Allocate, but don't deplete atomic reserves that typical
-	 * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
-	 * will allocate from the current numa node which is what we
-	 * want here.
-	 */
-	gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
-
 	if (c->percpu_size) {
 		void **obj = kmalloc_node(c->percpu_size, flags, node);
 		void *pptr = __alloc_percpu_gfp(c->unit_size, 8, flags);
@@ -185,7 +178,12 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 		 */
 		obj = __llist_del_first(&c->free_by_rcu);
 		if (!obj) {
-			obj = __alloc(c, node);
+			/* Allocate, but don't deplete atomic reserves that typical
+			 * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
+			 * will allocate from the current numa node which is what we
+			 * want here.
+			 */
+			obj = __alloc(c, node, GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT);
 			if (!obj)
 				break;
 		}
@@ -676,3 +674,29 @@ void notrace bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr)
 
 	unit_free(this_cpu_ptr(ma->cache), ptr);
 }
+
+void bpf_mem_cache_raw_free(void *ptr)
+{
+	kfree(ptr - LLIST_NODE_SZ);
+}
+
+void notrace *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags)
+{
+	struct bpf_mem_cache *c;
+	void *ret;
+
+	c = this_cpu_ptr(ma->cache);
+
+	ret = unit_alloc(c);
+	if (!ret && flags == GFP_KERNEL) {
+		struct mem_cgroup *memcg, *old_memcg;
+
+		memcg = get_memcg(c);
+		old_memcg = set_active_memcg(memcg);
+		ret = __alloc(c, NUMA_NO_NODE, GFP_KERNEL | __GFP_NOWARN | __GFP_ACCOUNT);
+		set_active_memcg(old_memcg);
+		mem_cgroup_put(memcg);
+	}
+
+	return !ret ? NULL : ret + LLIST_NODE_SZ;
+}
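As a usage illustration only (not part of this series), a hypothetical caller could pair the sleepable allocation with a deferred raw free roughly as follows; struct my_elem and the function names are made up for the example:

```c
/* Hypothetical caller: allocate in sleepable (GFP_KERNEL) context, defer
 * the free through a tasks-trace RCU grace period, then free the raw
 * object directly because 'ma' is no longer available at that point.
 */
struct my_elem {
	struct rcu_head rcu;
	/* payload follows */
};

static struct my_elem *my_elem_alloc(struct bpf_mem_alloc *ma)
{
	/* With GFP_KERNEL, falls back to __alloc(..., GFP_KERNEL) when the
	 * per-cpu cache is empty, as described in the commit message.
	 */
	return bpf_mem_cache_alloc_flags(ma, GFP_KERNEL);
}

static void my_elem_free_rcu(struct rcu_head *rcu)
{
	struct my_elem *e = container_of(rcu, struct my_elem, rcu);

	/* Bypasses the per-cpu free lists; see bpf_mem_cache_raw_free(). */
	bpf_mem_cache_raw_free(e);
}

static void my_elem_free_deferred(struct my_elem *e)
{
	call_rcu_tasks_trace(&e->rcu, my_elem_free_rcu);
}
```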