Message ID: 20240209040608.98927-15-alexei.starovoitov@gmail.com
State: New
Series: bpf: Introduce BPF arena.
On Thu, 2024-02-08 at 20:06 -0800, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
>
> LLVM automatically places __arena variables into ".arena.1" ELF section.
> When libbpf sees such section it creates internal 'struct bpf_map' LIBBPF_MAP_ARENA
> that is connected to actual BPF_MAP_TYPE_ARENA 'struct bpf_map'.
> They share the same kernel's side bpf map and single map_fd.
> Both are emitted into skeleton. Real arena with the name given by bpf program
> in SEC(".maps") and another with "__arena_internal" name.
> All global variables from ".arena.1" section are accessible from user space
> via skel->arena->name_of_var.

[...]

I hit a strange bug when playing with the patch. Consider a simple example [0].
When the following BPF global variable:

    int __arena * __arena bar;

- is commented -- the test passes;
- is uncommented -- the test fails because global variable 'shared' is NULL.

Note: the second __arena is necessary to put 'bar' into the .arena.1 section.

[0] https://github.com/kernel-patches/bpf/commit/6d95c8557c25d01ef3f13e6aef2bda9ac2516484
On Mon, Feb 12, 2024 at 4:34 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> [...]
>
> I hit a strange bug when playing with the patch. Consider a simple example [0].
> When the following BPF global variable:
>
>     int __arena * __arena bar;
>
> - is commented -- the test passes;
> - is uncommented -- the test fails because global variable 'shared' is NULL.

Right. That's expected, because of __uint(max_entries, 1);
The test creates an arena of 1 page and it's consumed
by the int __arena * __arena bar; variable.
Of course, one variable doesn't take the whole page.
There could have been many arena global vars.
But that page is not available anymore to bpf_arena_alloc_pages(),
so it returns NULL.
On Mon, 2024-02-12 at 16:44 -0800, Alexei Starovoitov wrote:
>
> Right. That's expected, because of __uint(max_entries, 1);
> The test creates an arena of 1 page and it's consumed
> by the int __arena * __arena bar; variable.
> [...]
> But that page is not available anymore to bpf_arena_alloc_pages,
> so it returns NULL.

My bad, thank you for explaining.
On Mon, Feb 12, 2024 at 4:49 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> [...]
>
> My bad, thank you for explaining.

Since it was a surprising behavior we can make libbpf
auto-extend max_entries with the number of pages necessary
for arena global vars, but it will be surprising too.

struct {
        __uint(type, BPF_MAP_TYPE_ARENA);
        __uint(map_flags, BPF_F_MMAPABLE);
        __ulong(map_extra, 2ull << 44); /* this is the start of the user VMA */
        __uint(max_entries, 1000);      /* this is the length of the user VMA in pages */
} arena SEC(".maps");

If libbpf adds extra pages to max_entries the user_vm_end shifts too
and libbpf would need to mmap() it with that size.
When all is hidden in libbpf it's fine, but it can still be a surprise
to see a different max_entries in map_info and bpftool map list.
Not sure which way is friendlier to users.
On Mon, 2024-02-12 at 18:08 -0800, Alexei Starovoitov wrote:

[...]

> Since it was a surprising behavior we can make libbpf
> auto-extend max_entries with the number of pages necessary
> for arena global vars, but it will be surprising too.
> [...]
> When all is hidden in libbpf it's fine, but still can be a surprise
> to see a different max_entries in map_info and bpftool map list.
> Not sure which way is user friendlier.

Adjusting max_entries would be surprising indeed.
On the other hand, it would remove the error condition about
"Declared arena map size %zd is too small ...".
Probably either way is fine, as long as it is documented.
Don't have a strong opinion.
On Thu, 2024-02-08 at 20:06 -0800, Alexei Starovoitov wrote:
> [...]
>
> For bss/data/rodata the skeleton/libbpf perform the following sequence:
> 1. addr = mmap(MAP_ANONYMOUS)
> 2. user space optionally modifies global vars
> 3. map_fd = bpf_create_map()
> 4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel
> 5. mmap(addr, MAP_FIXED, map_fd)
> after step 5 user space sees the values it wrote at step 2 at the same addresses
>
> arena doesn't support update_map_elem. Hence skeleton/libbpf do:
> 1. addr = mmap(MAP_ANONYMOUS)
> 2. user space optionally modifies global vars
> 3. map_fd = bpf_create_map(MAP_TYPE_ARENA)
> 4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd)
> 5. memcpy(real_addr, addr) // this will fault-in and allocate pages
> 6. munmap(addr)
>
> At the end look and feel of global data vs __arena global data is the same from bpf prog pov.

[...]

So, at first I thought that having two maps is a bit of a hack.
However, after trying to make it work with only one map I don't really
like that either :)

The patch looks good to me, have not spotted any logical issues.
I have two questions if you don't mind:

First is regarding initialization data.
In bpf_object__init_internal_map() the amount of bpf_map_mmap_sz(map)
bytes is mmaped and only data_sz bytes are copied,
then bpf_map_mmap_sz(map) bytes are copied in bpf_object__create_maps().
Is Linux/libc smart enough to skip action on pages which were mmaped
but never touched?

Second is regarding naming.
Currently only one arena is supported, and the generated skel has a
single '->arena' field.
Is there a plan to support multiple arenas at some point?
If so, should the '->arena' field use the same name as the arena map
declared in the program?
On Thu, Feb 8, 2024 at 8:07 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> LLVM automatically places __arena variables into ".arena.1" ELF section.
> [...]
> Both are emitted into skeleton. Real arena with the name given by bpf program
> in SEC(".maps") and another with "__arena_internal" name.
> All global variables from ".arena.1" section are accessible from user space
> via skel->arena->name_of_var.

This "real arena" vs "__arena_internal" seems like an unnecessary
complication. When processing the .arena.1 ELF section, we search for
an explicit BPF_MAP_TYPE_ARENA map, right? Great, at that point we can
use that map and its map->mmapable pointer (we mmap() anonymous memory
to hold initial values of global variables). We copy init values into
map->mmapable on the actual arena struct bpf_map.

Then during map creation (during load) we do a new mmap(map_fd),
taking into account map_extra. Then memcpy() from the original
anonymous mmap into this arena-linked mmap. Then discard the original
mmap.

This way we don't have fake maps anymore, we initialize the actual map
(we might need to just remember the smaller init mmap_sz, which
doesn't seem like a big complication). WDYT?

BTW, I think bpf_object__load_skeleton() can re-assign the skeleton's
arena data pointer, so a user accessing skel->arena->var before and
after skeleton load will be getting the correct pointer.

> [...]
>
> Another complication is:
> struct {
>         __uint(type, BPF_MAP_TYPE_ARENA);
> } arena SEC(".maps");
>
> int __arena foo;
> int bar;
>
> ptr1 = &foo;   // relocation against ".arena.1" section
> ptr2 = &arena; // relocation against ".maps" section
> ptr3 = &bar;   // relocation against ".bss" section
>
> For the kernel ptr1 and ptr2 point to the same arena's map_fd
> while ptr3 points to a different global array's map_fd.
> For the verifier:
> ptr1->type == unknown_scalar
> ptr2->type == const_ptr_to_map
> ptr3->type == ptr_to_map_value
>
> after the verifier and for JIT all 3 ptr-s are normal ld_imm64 insns.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  tools/bpf/bpftool/gen.c |  13 ++++-
>  tools/lib/bpf/libbpf.c  | 102 +++++++++++++++++++++++++++++++++++-----
>  2 files changed, 101 insertions(+), 14 deletions(-)

[...]

> @@ -1718,10 +1722,34 @@ static int
>  bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
>                               const char *real_name, int sec_idx, void *data, size_t data_sz)
>  {
> +       const long page_sz = sysconf(_SC_PAGE_SIZE);
> +       struct bpf_map *map, *arena = NULL;
>         struct bpf_map_def *def;
> -       struct bpf_map *map;
>         size_t mmap_sz;
> -       int err;
> +       int err, i;
> +
> +       if (type == LIBBPF_MAP_ARENA) {
> +               for (i = 0; i < obj->nr_maps; i++) {
> +                       map = &obj->maps[i];
> +                       if (map->def.type != BPF_MAP_TYPE_ARENA)
> +                               continue;
> +                       arena = map;
> +                       real_name = "__arena_internal";
> +                       mmap_sz = bpf_map_mmap_sz(map);
> +                       if (roundup(data_sz, page_sz) > mmap_sz) {
> +                               pr_warn("Declared arena map size %zd is too small to hold"
> +                                       "global __arena variables of size %zd\n",
> +                                       mmap_sz, data_sz);
> +                               return -E2BIG;
> +                       }
> +                       break;
> +               }
> +               if (!arena) {
> +                       pr_warn("To use global __arena variables the arena map should"
> +                               "be declared explicitly in SEC(\".maps\")\n");
> +                       return -ENOENT;
> +               }

so basically here we found arena, let's
arena->mmapable = mmap(MAP_ANONYMOUS) here,
memcpy(arena->mmapable, data, data_sz)
and exit early, not doing bpf_object__add_map()?

> +       }
>
>         map = bpf_object__add_map(obj);
>         if (IS_ERR(map))
> @@ -1732,6 +1760,7 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
>         map->sec_offset = 0;
>         map->real_name = strdup(real_name);
>         map->name = internal_map_name(obj, real_name);
> +       map->arena = arena;
>         if (!map->real_name || !map->name) {
>                 zfree(&map->real_name);
>                 zfree(&map->name);

[...]

> @@ -13437,6 +13508,11 @@ int bpf_object__load_skeleton(struct bpf_object_skeleton *s)
>                         continue;
>                 }
>
> +               if (map->arena) {
> +                       *mmaped = map->arena->mmaped;
> +                       continue;
> +               }
> +

yep, I was going to suggest that we can fix up this pointer in
load_skeleton, nice

>                 if (map->def.map_flags & BPF_F_RDONLY_PROG)
>                         prot = PROT_READ;
>                 else
> --
> 2.34.1
On Tue, Feb 13, 2024 at 3:11 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> [...]
>
> So, at first I thought that having two maps is a bit of a hack.

yep, that was my instinct as well

> However, after trying to make it work with only one map I don't really
> like that either :)

Can you elaborate? see my reply to Alexei, I wonder how you thought
about doing this?

> The patch looks good to me, have not spotted any logical issues.
>
> I have two questions if you don't mind:
> [...]
On Tue, 2024-02-13 at 15:17 -0800, Andrii Nakryiko wrote:

[...]

> Can you elaborate? see my reply to Alexei, I wonder how you thought
> about doing this?

Relocations in the ELF file are against a new section: ".arena.1".
This works nicely with the logic in bpf_program__record_reloc().
If a single map is used, we effectively need to track two indexes for
the map section:
- one used for relocations against map variables themselves
  (named "generic map reference relocation" in the function code);
- one used for relocations against ".arena.1"
  (named "global data map relocation" in the function code).

This spooked me off:
- either bpf_object__init_internal_map() would have a specialized
  branch for arenas, as with the current approach;
- or bpf_program__record_reloc() would have a specialized branch for
  arenas, as with the one-map approach.

Additionally, skel generation logic currently assumes that mmapable
bindings would be generated only for internal maps,
but that is probably not a big deal.
On Tue, Feb 13, 2024 at 3:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> [...]
>
> This spooked me off:
> - either bpf_object__init_internal_map() would have a specialized
>   branch for arenas, as with the current approach;
> - or bpf_program__record_reloc() would have a specialized branch for
>   arenas, as with the one-map approach.

Yes, relocations would know about .arena.1, but it's a pretty simple
check in a few places. We basically have the arena *definition* sec_idx
(corresponding to SEC(".maps")) and the arena *data* sec_idx. The
latter is what is recorded for global variables in .arena.1. We can
remember this arena data sec_idx in struct bpf_object once during ELF
processing, and then just special-case it internally in a few places.

The "fake" bpf_map for __arena_internal is user-visible and requires
autocreate=false tricks, etc. I feel like it's a worse tradeoff from a
user API perspective than a few extra ARENA-specific internal checks
(which we already have a few anyways, ARENA is not completely
transparent internally anyways).

> Additionally, skel generation logic currently assumes that mmapable
> bindings would be generated only for internal maps,
> but that is probably not a big deal.

yeah, it's not, we will have STRUCT_OPS maps handled specially anyways
(Kui-Feng posted an RFC already), so ARENA won't be the only special case
On Tue, 2024-02-13 at 16:09 -0800, Andrii Nakryiko wrote:

[...]

> The "fake" bpf_map for __arena_internal is user-visible and requires
> autocreate=false tricks, etc. I feel like it's a worse tradeoff from a
> user API perspective than a few extra ARENA-specific internal checks
> (which we already have a few anyways, ARENA is not completely
> transparent internally anyways).

By user-visible you mean when doing "bpf_object__for_each_map()", right?
Shouldn't users ignore bpf_map__is_internal() maps?
But I agree that having one map might be a bit cleaner.
On Tue, Feb 13, 2024 at 4:16 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> [...]
>
> By user-visible you mean when doing "bpf_object__for_each_map()", right?
> Shouldn't users ignore bpf_map__is_internal() maps?

no, not really, they are valid maps, and you can
bpf_map__set_value_size() on them (for example). Similarly for
bpf_map__initial_value(). And here it will be interesting that before
load you should use bpf_map__initial_value(__arena_internal) if you
want to tune something, and after load it will be
bpf_map__initial_value(real_arena). While if we combine them, it
actually will work more naturally.

> But I agree that having one map might be a bit cleaner.

+1
On Tue, Feb 13, 2024 at 3:11 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> [...]
>
> So, at first I thought that having two maps is a bit of a hack.
> However, after trying to make it work with only one map I don't really
> like that either :)

My first attempt was with a single arena map, but it ended up with
hacks all over libbpf and bpftool to treat one map differently
depending on conditions. Two maps simplified the code a lot.

> The patch looks good to me, have not spotted any logical issues.
>
> I have two questions if you don't mind:
>
> First is regarding initialization data.
> In bpf_object__init_internal_map() the amount of bpf_map_mmap_sz(map)
> bytes is mmaped and only data_sz bytes are copied,
> then bpf_map_mmap_sz(map) bytes are copied in bpf_object__create_maps().
> Is Linux/libc smart enough to skip action on pages which were mmaped
> but never touched?

The kernel gives zeroed out pages to user space.
So it's ok to mmap a page, copy data_sz bytes into it
and later copy the full page from one addr to another.
No garbage copy.
In this case there will be data by llvm construction of ".arena.1".
It looks to me that even .bss-like __arena vars have zeros in data
and non-zero data_sz.

> Second is regarding naming.
> Currently only one arena is supported, and the generated skel has a
> single '->arena' field.
> Is there a plan to support multiple arenas at some point?
> If so, should the '->arena' field use the same name as the arena map
> declared in the program?

I wanted to place all global arena vars into a default name "arena"
and let the skeleton use that name without thinking what name the bpf
prog gave to the actual map.
On Tue, Feb 13, 2024 at 4:09 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> [...]
>
> Yes, relocations would know about .arena.1, but it's a pretty simple
> check in a few places. We basically have the arena *definition* sec_idx
> (corresponding to SEC(".maps")) and the arena *data* sec_idx. The
> latter is what is recorded for global variables in .arena.1. We can
> remember this arena data sec_idx in struct bpf_object once during ELF
> processing, and then just special-case it internally in a few places.

That was my first attempt and bpf_program__record_reloc()
became a mess.
Currently it does relo search either in internal maps
or in obj->efile.btf_maps_shndx.
Doing a double search wasn't nice.
And further, such dual meaning of a 'struct bpf_map' object messes up
assumptions of bpf_object__shndx_is_maps, bpf_object__shndx_is_data
and the way libbpf treats map->libbpf_type everywhere.

bpf_map__is_internal() cannot really say true or false
for such a dual-use map.
Then skeleton gen gets ugly.
Needs more public libbpf APIs to use in bpftool gen.
Just a mess.

> The "fake" bpf_map for __arena_internal is user-visible and requires
> autocreate=false tricks, etc. I feel like it's a worse tradeoff from a
> user API perspective than a few extra ARENA-specific internal checks
> (which we already have a few anyways, ARENA is not completely
> transparent internally anyways).

what do you mean 'user visible'?
I can add a filter to avoid generating a pointer for it in a skeleton.
Then it won't be any more visible than other bss/data fake maps.
The 2nd fake arena returns true out of bpf_map__is_internal.

The key comment in the patch:
/* bpf_object will contain two arena maps:
 * LIBBPF_MAP_ARENA & BPF_MAP_TYPE_ARENA
 * and
 * LIBBPF_MAP_UNSPEC & BPF_MAP_TYPE_ARENA.
 * The former map->arena will point to latter.
 */
On Tue, 2024-02-13 at 17:02 -0800, Alexei Starovoitov wrote:

[...]

> The kernel gives zeroed out pages to user space.
> So it's ok to mmap a page, copy data_sz bytes into it
> and later copy the full page from one addr to another.
> No garbage copy.
> [...]

I was actually worried about the second memcpy increasing RSS
unnecessarily, but I missed that the internal map does:

    def->max_entries = roundup(data_sz, page_sz) / page_sz;

So that is not an issue, as bpf_map_mmap_sz() for the internal map is
proportional to data_sz, not the full arena.
Sorry for the noise.
On Tue, Feb 13, 2024 at 5:24 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Tue, Feb 13, 2024 at 4:09 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Tue, Feb 13, 2024 at 3:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > On Tue, 2024-02-13 at 15:17 -0800, Andrii Nakryiko wrote: > > > > > > [...] > > > > > > > > So, at first I thought that having two maps is a bit of a hack. > > > > > > > > yep, that was my instinct as well > > > > > > > > > However, after trying to make it work with only one map I don't really > > > > > like that either :) > > > > > > > > Can you elaborate? see my reply to Alexei, I wonder how did you think > > > > about doing this? > > > > > > Relocations in the ELF file are against a new section: ".arena.1". > > > This works nicely with logic in bpf_program__record_reloc(). > > > If single map is used, we effectively need to track two indexes for > > > the map section: > > > - one used for relocations against map variables themselves > > > (named "generic map reference relocation" in the function code); > > > - one used for relocations against ".arena.1" > > > (named "global data map relocation" in the function code). > > > > > > This spooked me off: > > > - either bpf_object__init_internal_map() would have a specialized > > > branch for arenas, as with current approach; > > > - or bpf_program__record_reloc() would have a specialized branch for arenas, > > > as with one map approach. > > > > Yes, relocations would know about .arena.1, but it's a pretty simple > > check in a few places. We basically have arena *definition* sec_idx > > (corresponding to SEC(".maps")) and arena *data* sec_idx. The latter > > is what is recorded for global variables in .arena.1. We can remember > > this arena data sec_idx in struct bpf_object once during ELF > > processing, and then just special case it internally in a few places. > > That was my first attempt and bpf_program__record_reloc() > became a mess. 
> Currently it does relo search either in internal maps > or in obj->efile.btf_maps_shndx. > Doing double search wasn't nice. > And further, such dual meaning of 'struct bpf_map' object messes > assumptions of bpf_object__shndx_is_maps, bpf_object__shndx_is_data > and the way libbpf treats map->libbpf_type everywhere. > > bpf_map__is_internal() cannot really say true or false > for such dual use map. > Then skeleton gen gets ugly. > Needs more public libbpf APIs to use in bpftool gen. > Just a mess. It might be easier for me to try implement it the way I see it than discuss it over emails. I'll give it a try today-tomorrow and get back to you. > > > The "fake" bpf_map for __arena_internal is user-visible and requires > > autocreate=false tricks, etc. I feel like it's a worse tradeoff from a > > user API perspective than a few extra ARENA-specific internal checks > > (which we already have a few anyways, ARENA is not completely > > transparent internally anyways). > > what do you mean 'user visible'? That __arena_internal (representing .area.1 data section) actually is separate from actual ARENA map (represented by variable in .maps section). And both have separate `struct bpf_map`, which you can look up by name or through iterating all maps of bpf_object. And that you can call getters/setters on __arena_internal, even though the only thing that actually makes sense there is bpf_map__initial_value(), which would just as much make sense on ARENA map itself. > I can add a filter to avoid generating a pointer for it in a skeleton. > Then it won't be any more visible than other bss/data fake maps. bss/data are not fake maps, they have corresponding BPF map (ARRAY) in the kernel. Which is different from __arena_internal. And even if we hide it from skeleton, it's still there in bpf_object, as I mentioned above. Let me try implementing what I have in mind and see how bad it is. > The 2nd fake arena returns true out of bpf_map__is_internal. 
> > The key comment in the patch: > /* bpf_object will contain two arena maps: > * LIBBPF_MAP_ARENA & BPF_MAP_TYPE_ARENA > * and > * LIBBPF_MAP_UNSPEC & BPF_MAP_TYPE_ARENA. > * The former map->arena will point to latter. > */ Yes, and I'd like to not have two arena maps because they are logically one.
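For readers following the thread, the two-map situation being debated arises from BPF C source shaped roughly like this (an illustrative sketch, not code from the series; the map name and page count are made up):

```c
/* Explicit arena map, declared by the BPF program author: */
struct {
	__uint(type, BPF_MAP_TYPE_ARENA);
	__uint(map_flags, BPF_F_MMAPABLE);
	__uint(max_entries, 10); /* arena size, in pages */
} arena SEC(".maps");

/* LLVM places __arena-tagged globals into the ".arena.1" ELF section;
 * the second __arena is what moves the variable itself there.
 * Relocations against such globals reference ".arena.1", not ".maps",
 * which is why libbpf must connect the two somehow -- either via a
 * second internal bpf_map (the __arena_internal approach) or via the
 * arena-data section index special case discussed below. */
int __arena * __arena bar;
```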
On Wed, Feb 14, 2024 at 9:24 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Tue, Feb 13, 2024 at 5:24 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Tue, Feb 13, 2024 at 4:09 PM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > On Tue, Feb 13, 2024 at 3:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > > > On Tue, 2024-02-13 at 15:17 -0800, Andrii Nakryiko wrote: > > > > > > > > [...] > > > > > > > > > > So, at first I thought that having two maps is a bit of a hack. > > > > > > > > > > yep, that was my instinct as well > > > > > > > > > > > However, after trying to make it work with only one map I don't really > > > > > > like that either :) > > > > > > > > > > Can you elaborate? see my reply to Alexei, I wonder how did you think > > > > > about doing this? > > > > > > > > Relocations in the ELF file are against a new section: ".arena.1". > > > > This works nicely with logic in bpf_program__record_reloc(). > > > > If single map is used, we effectively need to track two indexes for > > > > the map section: > > > > - one used for relocations against map variables themselves > > > > (named "generic map reference relocation" in the function code); > > > > - one used for relocations against ".arena.1" > > > > (named "global data map relocation" in the function code). > > > > > > > > This spooked me off: > > > > - either bpf_object__init_internal_map() would have a specialized > > > > branch for arenas, as with current approach; > > > > - or bpf_program__record_reloc() would have a specialized branch for arenas, > > > > as with one map approach. > > > > > > Yes, relocations would know about .arena.1, but it's a pretty simple > > > check in a few places. We basically have arena *definition* sec_idx > > > (corresponding to SEC(".maps")) and arena *data* sec_idx. The latter > > > is what is recorded for global variables in .arena.1. 
We can remember > > > this arena data sec_idx in struct bpf_object once during ELF > > > processing, and then just special case it internally in a few places. > > > > That was my first attempt and bpf_program__record_reloc() > > became a mess. > > Currently it does relo search either in internal maps > > or in obj->efile.btf_maps_shndx. > > Doing double search wasn't nice. > > And further, such dual meaning of 'struct bpf_map' object messes > > assumptions of bpf_object__shndx_is_maps, bpf_object__shndx_is_data > > and the way libbpf treats map->libbpf_type everywhere. > > > > bpf_map__is_internal() cannot really say true or false > > for such dual use map. > > Then skeleton gen gets ugly. > > Needs more public libbpf APIs to use in bpftool gen. > > Just a mess. > > It might be easier for me to try implement it the way I see it than > discuss it over emails. I'll give it a try today-tomorrow and get back > to you. > > > > > > The "fake" bpf_map for __arena_internal is user-visible and requires > > > autocreate=false tricks, etc. I feel like it's a worse tradeoff from a > > > user API perspective than a few extra ARENA-specific internal checks > > > (which we already have a few anyways, ARENA is not completely > > > transparent internally anyways). > > > > what do you mean 'user visible'? > > That __arena_internal (representing .area.1 data section) actually is > separate from actual ARENA map (represented by variable in .maps > section). And both have separate `struct bpf_map`, which you can look > up by name or through iterating all maps of bpf_object. And that you > can call getters/setters on __arena_internal, even though the only > thing that actually makes sense there is bpf_map__initial_value(), > which would just as much make sense on ARENA map itself. > > > I can add a filter to avoid generating a pointer for it in a skeleton. > > Then it won't be any more visible than other bss/data fake maps. 
> > bss/data are not fake maps, they have corresponding BPF map (ARRAY) in > the kernel. Which is different from __arena_internal. And even if we > hide it from skeleton, it's still there in bpf_object, as I mentioned > above. > > Let me try implementing what I have in mind and see how bad it is. > > > The 2nd fake arena returns true out of bpf_map__is_internal. > > > > The key comment in the patch: > > /* bpf_object will contain two arena maps: > > * LIBBPF_MAP_ARENA & BPF_MAP_TYPE_ARENA > > * and > > * LIBBPF_MAP_UNSPEC & BPF_MAP_TYPE_ARENA. > > * The former map->arena will point to latter. > > */ > > Yes, and I'd like to not have two arena maps because they are logically one. Alright, I'm back. I pushed 3 patches on top of your patches into [0]. Available also at [1], if that's more convenient. I'll paste the main diff below; gmail will inevitably butcher the formatting, but it's easier to discuss the code this way. [0] https://github.com/anakryiko/linux/commits/arena/ [1] https://git.kernel.org/pub/scm/linux/kernel/git/andrii/bpf-next.git/log/?h=arena First, as I was working on the code, I realized that the place where we do mmap() after creating the ARENA map is different from where we normally do post-creation steps, so I moved the code to keep all those extra steps in one place. No changes in logic, but now we also don't need to close map_fd and so on; I think it's better this way. And so the main changes are below. There are definitely a few ARENA-specific checks here and there, but I don't think it's that bad. A bunch of code is just undoing code changes from the previous patch, so once you incorporate these changes into your patches it will be even smaller. The main outcome is that we don't have a fake map as an independent struct bpf_map and bpf_map__initial_value() logic works transparently. We'll probably need similar special-casing for STRUCT_OPS maps that Kui-Feng is adding, so ARENA won't be the only one. 
The slightly annoying part is that special casing is necessary because of the assumption that map->mmaped has to be munmap()'ed and that its size is calculated by bpf_map_mmap_sz() (I could have hacked map->def.value_size for this, but that felt wrong). We could generalize/fix that, but I chose not to do that just yet. commit 2a7a90e06d02a4edb60cf92c19aee2b3f05d3cca Author: Andrii Nakryiko <andrii@kernel.org> Date: Thu Feb 15 14:55:00 2024 -0800 libbpf: remove fake __arena_internal map Unify the actual ARENA map and the fake __arena_internal map. The .arena.1 ELF section isn't a stand-alone BPF map, so it shouldn't be represented as a `struct bpf_map *` instance in libbpf APIs. Instead, use the ELF section data as an initial data image, which is exposed through the skeleton and bpf_map__initial_value() to the user, if they need to tune it before the load phase. During the load phase, this initial image is copied over into the mmap()'ed region corresponding to the ARENA, and discarded. A few small checks here and there had to be added to make sure this approach works with bpf_map__initial_value(), mostly due to the hard-coded assumption that map->mmaped is set up with the mmap() syscall and should be munmap()'ed. For ARENA, .arena.1 can be (much) smaller than the maximum ARENA size, so this smaller data size has to be tracked separately. Given it is enforced that there is only one ARENA for the entire bpf_object instance, we just keep it in a separate field. This can be generalized if necessary later. bpftool is adjusted a bit as well, because the ARENA map is not reported as "internal" (it's not a great fit in this case), plus we need to take into account that an ARENA map can exist without a corresponding .arena.1 ELF section, so the user-facing data section in the skeleton is optional. 
Signed-off-by: Andrii Nakryiko <andrii@kernel.org> diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c index 273da2098231..6e17b95436de 100644 --- a/tools/bpf/bpftool/gen.c +++ b/tools/bpf/bpftool/gen.c @@ -82,7 +82,7 @@ static bool get_map_ident(const struct bpf_map *map, char *buf, size_t buf_sz) const char *name = bpf_map__name(map); int i, n; - if (!bpf_map__is_internal(map) || bpf_map__type(map) == BPF_MAP_TYPE_ARENA) { + if (!bpf_map__is_internal(map)) { snprintf(buf, buf_sz, "%s", name); return true; } @@ -109,7 +109,7 @@ static bool get_datasec_ident(const char *sec_name, char *buf, size_t buf_sz) /* recognize hard coded LLVM section name */ if (strcmp(sec_name, ".arena.1") == 0) { /* this is the name to use in skeleton */ - strncpy(buf, "arena", buf_sz); + snprintf(buf, buf_sz, "arena"); return true; } for (i = 0, n = ARRAY_SIZE(pfxs); i < n; i++) { @@ -242,14 +242,16 @@ static const struct btf_type *find_type_for_map(struct btf *btf, const char *map static bool is_mmapable_map(const struct bpf_map *map, char *buf, size_t sz) { - if (!bpf_map__is_internal(map) || !(bpf_map__map_flags(map) & BPF_F_MMAPABLE)) - return false; + size_t tmp_sz; - if (bpf_map__type(map) == BPF_MAP_TYPE_ARENA) { - strncpy(buf, "arena", sz); + if (bpf_map__type(map) == BPF_MAP_TYPE_ARENA && bpf_map__initial_value(map, &tmp_sz)) { + snprintf(buf, sz, "arena"); return true; } + if (!bpf_map__is_internal(map) || !(bpf_map__map_flags(map) & BPF_F_MMAPABLE)) + return false; + if (!get_map_ident(map, buf, sz)) return false; diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 5a53f1ed87f2..c72577bef439 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -506,7 +506,6 @@ enum libbpf_map_type { LIBBPF_MAP_BSS, LIBBPF_MAP_RODATA, LIBBPF_MAP_KCONFIG, - LIBBPF_MAP_ARENA, }; struct bpf_map_def { @@ -549,7 +548,6 @@ struct bpf_map { bool reused; bool autocreate; __u64 map_extra; - struct bpf_map *arena; }; enum extern_type { @@ -616,7 +614,6 @@ enum 
sec_type { SEC_BSS, SEC_DATA, SEC_RODATA, - SEC_ARENA, }; struct elf_sec_desc { @@ -634,6 +631,7 @@ struct elf_state { Elf_Data *symbols; Elf_Data *st_ops_data; Elf_Data *st_ops_link_data; + Elf_Data *arena_data; size_t shstrndx; /* section index for section name strings */ size_t strtabidx; struct elf_sec_desc *secs; @@ -644,6 +642,7 @@ struct elf_state { int symbols_shndx; int st_ops_shndx; int st_ops_link_shndx; + int arena_data_shndx; }; struct usdt_manager; @@ -703,6 +702,10 @@ struct bpf_object { struct usdt_manager *usdt_man; + struct bpf_map *arena_map; + void *arena_data; + size_t arena_data_sz; + struct kern_feature_cache *feat_cache; char *token_path; int token_fd; @@ -1340,6 +1343,7 @@ static void bpf_object__elf_finish(struct bpf_object *obj) obj->efile.symbols = NULL; obj->efile.st_ops_data = NULL; obj->efile.st_ops_link_data = NULL; + obj->efile.arena_data = NULL; zfree(&obj->efile.secs); obj->efile.sec_cnt = 0; @@ -1722,34 +1726,10 @@ static int bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type, const char *real_name, int sec_idx, void *data, size_t data_sz) { - const long page_sz = sysconf(_SC_PAGE_SIZE); - struct bpf_map *map, *arena = NULL; struct bpf_map_def *def; + struct bpf_map *map; size_t mmap_sz; - int err, i; - - if (type == LIBBPF_MAP_ARENA) { - for (i = 0; i < obj->nr_maps; i++) { - map = &obj->maps[i]; - if (map->def.type != BPF_MAP_TYPE_ARENA) - continue; - arena = map; - real_name = "__arena_internal"; - mmap_sz = bpf_map_mmap_sz(map); - if (roundup(data_sz, page_sz) > mmap_sz) { - pr_warn("Declared arena map size %zd is too small to hold" - "global __arena variables of size %zd\n", - mmap_sz, data_sz); - return -E2BIG; - } - break; - } - if (!arena) { - pr_warn("To use global __arena variables the arena map should" - "be declared explicitly in SEC(\".maps\")\n"); - return -ENOENT; - } - } + int err; map = bpf_object__add_map(obj); if (IS_ERR(map)) @@ -1760,7 +1740,6 @@ 
bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type, map->sec_offset = 0; map->real_name = strdup(real_name); map->name = internal_map_name(obj, real_name); - map->arena = arena; if (!map->real_name || !map->name) { zfree(&map->real_name); zfree(&map->name); @@ -1768,32 +1747,18 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type, } def = &map->def; - if (type == LIBBPF_MAP_ARENA) { - /* bpf_object will contain two arena maps: - * LIBBPF_MAP_ARENA & BPF_MAP_TYPE_ARENA - * and - * LIBBPF_MAP_UNSPEC & BPF_MAP_TYPE_ARENA. - * The former map->arena will point to latter. - */ - def->type = BPF_MAP_TYPE_ARENA; - def->key_size = 0; - def->value_size = 0; - def->max_entries = roundup(data_sz, page_sz) / page_sz; - def->map_flags = BPF_F_MMAPABLE; - } else { - def->type = BPF_MAP_TYPE_ARRAY; - def->key_size = sizeof(int); - def->value_size = data_sz; - def->max_entries = 1; - def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG - ? BPF_F_RDONLY_PROG : 0; + def->type = BPF_MAP_TYPE_ARRAY; + def->key_size = sizeof(int); + def->value_size = data_sz; + def->max_entries = 1; + def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG + ? 
BPF_F_RDONLY_PROG : 0; - /* failures are fine because of maps like .rodata.str1.1 */ - (void) map_fill_btf_type_info(obj, map); + /* failures are fine because of maps like .rodata.str1.1 */ + (void) map_fill_btf_type_info(obj, map); - if (map_is_mmapable(obj, map)) - def->map_flags |= BPF_F_MMAPABLE; - } + if (map_is_mmapable(obj, map)) + def->map_flags |= BPF_F_MMAPABLE; pr_debug("map '%s' (global data): at sec_idx %d, offset %zu, flags %x.\n", map->name, map->sec_idx, map->sec_offset, def->map_flags); @@ -1857,13 +1822,6 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj) NULL, sec_desc->data->d_size); break; - case SEC_ARENA: - sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx)); - err = bpf_object__init_internal_map(obj, LIBBPF_MAP_ARENA, - sec_name, sec_idx, - sec_desc->data->d_buf, - sec_desc->data->d_size); - break; default: /* skip */ break; @@ -2786,6 +2744,32 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj, return 0; } +static int init_arena_map_data(struct bpf_object *obj, struct bpf_map *map, + const char *sec_name, int sec_idx, + void *data, size_t data_sz) +{ + const long page_sz = sysconf(_SC_PAGE_SIZE); + size_t mmap_sz; + + mmap_sz = bpf_map_mmap_sz(obj->arena_map); + if (roundup(data_sz, page_sz) > mmap_sz) { + pr_warn("elf: sec '%s': declared ARENA map size (%zu) is too small to hold global __arena variables of size %zu\n", + sec_name, mmap_sz, data_sz); + return -E2BIG; + } + + obj->arena_data = malloc(data_sz); + if (!obj->arena_data) + return -ENOMEM; + memcpy(obj->arena_data, data, data_sz); + obj->arena_data_sz = data_sz; + + /* make bpf_map__init_value() work for ARENA maps */ + map->mmaped = obj->arena_data; + + return 0; +} + static int bpf_object__init_user_btf_maps(struct bpf_object *obj, bool strict, const char *pin_root_path) { @@ -2835,6 +2819,33 @@ static int bpf_object__init_user_btf_maps(struct bpf_object *obj, bool strict, return err; } + for (i = 0; i < obj->nr_maps; i++) { + struct 
bpf_map *map = &obj->maps[i]; + + if (map->def.type != BPF_MAP_TYPE_ARENA) + continue; + + if (obj->arena_map) { + pr_warn("map '%s': only single ARENA map is supported (map '%s' is also ARENA)\n", + map->name, obj->arena_map->name); + return -EINVAL; + } + obj->arena_map = map; + + if (obj->efile.arena_data) { + err = init_arena_map_data(obj, map, ARENA_SEC, obj->efile.arena_data_shndx, + obj->efile.arena_data->d_buf, + obj->efile.arena_data->d_size); + if (err) + return err; + } + } + if (obj->efile.arena_data && !obj->arena_map) { + pr_warn("elf: sec '%s': to use global __arena variables the ARENA map should be explicitly declared in SEC(\".maps\")\n", + ARENA_SEC); + return -ENOENT; + } + return 0; } @@ -3699,9 +3710,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj) obj->efile.st_ops_link_data = data; obj->efile.st_ops_link_shndx = idx; } else if (strcmp(name, ARENA_SEC) == 0) { - sec_desc->sec_type = SEC_ARENA; - sec_desc->shdr = sh; - sec_desc->data = data; + obj->efile.arena_data = data; + obj->efile.arena_data_shndx = idx; } else { pr_info("elf: skipping unrecognized data section(%d) %s\n", idx, name); @@ -4204,7 +4214,6 @@ static bool bpf_object__shndx_is_data(const struct bpf_object *obj, case SEC_BSS: case SEC_DATA: case SEC_RODATA: - case SEC_ARENA: return true; default: return false; @@ -4230,8 +4239,6 @@ bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx) return LIBBPF_MAP_DATA; case SEC_RODATA: return LIBBPF_MAP_RODATA; - case SEC_ARENA: - return LIBBPF_MAP_ARENA; default: return LIBBPF_MAP_UNSPEC; } @@ -4332,6 +4339,15 @@ static int bpf_program__record_reloc(struct bpf_program *prog, type = bpf_object__section_to_libbpf_map_type(obj, shdr_idx); sym_sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, shdr_idx)); + /* arena data relocation */ + if (shdr_idx == obj->efile.arena_data_shndx) { + reloc_desc->type = RELO_DATA; + reloc_desc->insn_idx = insn_idx; + reloc_desc->map_idx = obj->arena_map - obj->maps; + 
reloc_desc->sym_off = sym->st_value; + return 0; + } + /* generic map reference relocation */ if (type == LIBBPF_MAP_UNSPEC) { if (!bpf_object__shndx_is_maps(obj, shdr_idx)) { @@ -4385,7 +4401,7 @@ static int bpf_program__record_reloc(struct bpf_program *prog, reloc_desc->type = RELO_DATA; reloc_desc->insn_idx = insn_idx; - reloc_desc->map_idx = map->arena ? map->arena - obj->maps : map_idx; + reloc_desc->map_idx = map_idx; reloc_desc->sym_off = sym->st_value; return 0; } @@ -4872,8 +4888,6 @@ bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map) bpf_gen__map_freeze(obj->gen_loader, map - obj->maps); return 0; } - if (map_type == LIBBPF_MAP_ARENA) - return 0; err = bpf_map_update_elem(map->fd, &zero, map->mmaped, 0); if (err) { @@ -5166,15 +5180,6 @@ bpf_object__create_maps(struct bpf_object *obj) if (bpf_map__is_internal(map) && !kernel_supports(obj, FEAT_GLOBAL_DATA)) map->autocreate = false; - if (map->libbpf_type == LIBBPF_MAP_ARENA) { - size_t len = bpf_map_mmap_sz(map); - - memcpy(map->arena->mmaped, map->mmaped, len); - map->autocreate = false; - munmap(map->mmaped, len); - map->mmaped = NULL; - } - if (!map->autocreate) { pr_debug("map '%s': skipped auto-creating...\n", map->name); continue; @@ -5229,6 +5234,10 @@ bpf_object__create_maps(struct bpf_object *obj) map->name, err); return err; } + if (obj->arena_data) { + memcpy(map->mmaped, obj->arena_data, obj->arena_data_sz); + zfree(&obj->arena_data); + } } if (map->init_slots_sz && map->def.type != BPF_MAP_TYPE_PROG_ARRAY) { err = init_map_in_map_slots(obj, map); @@ -8716,13 +8725,9 @@ static void bpf_map__destroy(struct bpf_map *map) zfree(&map->init_slots); map->init_slots_sz = 0; - if (map->mmaped) { - size_t mmap_sz; - - mmap_sz = bpf_map_mmap_sz(map); - munmap(map->mmaped, mmap_sz); - map->mmaped = NULL; - } + if (map->mmaped && map->mmaped != map->obj->arena_data) + munmap(map->mmaped, bpf_map_mmap_sz(map)); + map->mmaped = NULL; if (map->st_ops) { zfree(&map->st_ops->data); 
@@ -8782,6 +8787,8 @@ void bpf_object__close(struct bpf_object *obj) if (obj->token_fd > 0) close(obj->token_fd); + zfree(&obj->arena_data); + free(obj); } @@ -9803,8 +9810,6 @@ static bool map_uses_real_name(const struct bpf_map *map) return true; if (map->libbpf_type == LIBBPF_MAP_RODATA && strcmp(map->real_name, RODATA_SEC) != 0) return true; - if (map->libbpf_type == LIBBPF_MAP_ARENA) - return true; return false; } @@ -10006,22 +10011,35 @@ __u32 bpf_map__btf_value_type_id(const struct bpf_map *map) int bpf_map__set_initial_value(struct bpf_map *map, const void *data, size_t size) { + size_t actual_sz; + if (map->obj->loaded || map->reused) return libbpf_err(-EBUSY); - if (!map->mmaped || map->libbpf_type == LIBBPF_MAP_KCONFIG || - size != map->def.value_size) + if (!map->mmaped || map->libbpf_type == LIBBPF_MAP_KCONFIG) + return libbpf_err(-EINVAL); + + if (map->def.type == BPF_MAP_TYPE_ARENA) + actual_sz = map->obj->arena_data_sz; + else + actual_sz = map->def.value_size; + if (size != actual_sz) return libbpf_err(-EINVAL); memcpy(map->mmaped, data, size); return 0; } -void *bpf_map__initial_value(struct bpf_map *map, size_t *psize) +void *bpf_map__initial_value(const struct bpf_map *map, size_t *psize) { if (!map->mmaped) return NULL; - *psize = map->def.value_size; + + if (map->def.type == BPF_MAP_TYPE_ARENA) + *psize = map->obj->arena_data_sz; + else + *psize = map->def.value_size; + return map->mmaped; } @@ -13510,8 +13528,8 @@ int bpf_object__load_skeleton(struct bpf_object_skeleton *s) continue; } - if (map->arena) { - *mmaped = map->arena->mmaped; + if (map->def.type == BPF_MAP_TYPE_ARENA) { + *mmaped = map->mmaped; continue; } diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 5723cbbfcc41..7b510761f545 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -1014,7 +1014,7 @@ LIBBPF_API int bpf_map__set_map_extra(struct bpf_map *map, __u64 map_extra); LIBBPF_API int bpf_map__set_initial_value(struct bpf_map *map, const 
void *data, size_t size); -LIBBPF_API void *bpf_map__initial_value(struct bpf_map *map, size_t *psize); +LIBBPF_API void *bpf_map__initial_value(const struct bpf_map *map, size_t *psize); /** * @brief **bpf_map__is_internal()** tells the caller whether or not the
On Thu, Feb 15, 2024 at 3:22 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > { > @@ -2835,6 +2819,33 @@ static int > bpf_object__init_user_btf_maps(struct bpf_object *obj, bool strict, > return err; > } > > + for (i = 0; i < obj->nr_maps; i++) { > + struct bpf_map *map = &obj->maps[i]; > + > + if (map->def.type != BPF_MAP_TYPE_ARENA) > + continue; > + > + if (obj->arena_map) { > + pr_warn("map '%s': only single ARENA map is supported > (map '%s' is also ARENA)\n", > + map->name, obj->arena_map->name); > + return -EINVAL; > + } > + obj->arena_map = map; > + > + if (obj->efile.arena_data) { > + err = init_arena_map_data(obj, map, ARENA_SEC, > obj->efile.arena_data_shndx, > + obj->efile.arena_data->d_buf, > + obj->efile.arena_data->d_size); > + if (err) > + return err; > + } > + } > + if (obj->efile.arena_data && !obj->arena_map) { > + pr_warn("elf: sec '%s': to use global __arena variables the > ARENA map should be explicitly declared in SEC(\".maps\")\n", > + ARENA_SEC); > + return -ENOENT; > + } > + > return 0; > } > > @@ -3699,9 +3710,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj) > obj->efile.st_ops_link_data = data; > obj->efile.st_ops_link_shndx = idx; > } else if (strcmp(name, ARENA_SEC) == 0) { > - sec_desc->sec_type = SEC_ARENA; > - sec_desc->shdr = sh; > - sec_desc->data = data; > + obj->efile.arena_data = data; > + obj->efile.arena_data_shndx = idx; I see. So these two are sort of the main tricks. Special case ARENA_SEC like ".maps" and then look for this obj-level map in the right spots. The special case around bpf_map__[set_]initial_value kind of breaks the layering with: if (map->def.type == BPF_MAP_TYPE_ARENA) actual_sz = map->obj->arena_data_sz; but no big deal. How do you want me to squash the patches? Keep "rename is_internal_mmapable_map into is_mmapable_map" patch as-is and then squash mine and your 2nd and 3rd?
On Thu, Feb 15, 2024 at 6:45 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, Feb 15, 2024 at 3:22 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > { > > @@ -2835,6 +2819,33 @@ static int > > bpf_object__init_user_btf_maps(struct bpf_object *obj, bool strict, > > return err; > > } > > > > + for (i = 0; i < obj->nr_maps; i++) { > > + struct bpf_map *map = &obj->maps[i]; > > + > > + if (map->def.type != BPF_MAP_TYPE_ARENA) > > + continue; > > + > > + if (obj->arena_map) { > > + pr_warn("map '%s': only single ARENA map is supported > > (map '%s' is also ARENA)\n", > > + map->name, obj->arena_map->name); > > + return -EINVAL; > > + } > > + obj->arena_map = map; > > + > > + if (obj->efile.arena_data) { > > + err = init_arena_map_data(obj, map, ARENA_SEC, > > obj->efile.arena_data_shndx, > > + obj->efile.arena_data->d_buf, > > + obj->efile.arena_data->d_size); > > + if (err) > > + return err; > > + } > > + } > > + if (obj->efile.arena_data && !obj->arena_map) { > > + pr_warn("elf: sec '%s': to use global __arena variables the > > ARENA map should be explicitly declared in SEC(\".maps\")\n", > > + ARENA_SEC); > > + return -ENOENT; > > + } > > + > > return 0; > > } > > > > @@ -3699,9 +3710,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj) > > obj->efile.st_ops_link_data = data; > > obj->efile.st_ops_link_shndx = idx; > > } else if (strcmp(name, ARENA_SEC) == 0) { > > - sec_desc->sec_type = SEC_ARENA; > > - sec_desc->shdr = sh; > > - sec_desc->data = data; > > + obj->efile.arena_data = data; > > + obj->efile.arena_data_shndx = idx; > > I see. So these two are sort-of main tricks. > Special case ARENA_SEC like ".maps" and then look for this > obj level map in the right spots. yep > The special case around bpf_map__[set_]initial_value kind break > the layering with: > if (map->def.type == BPF_MAP_TYPE_ARENA) > actual_sz = map->obj->arena_data_sz; > but no big deal. 
> true, and struct_ops will be another special case, so we might want to think about generalizing that a bit, but that's a separate thing we can handle later on > How do you want me to squash the patches? > Keep "rename is_internal_mmapable_map into is_mmapable_map" patch as-is yep > and then squash mine and your 2nd and 3rd? I think `libbpf: move post-creation steps for ARENA map` should be squashed into your `libbpf: Add support for bpf_arena.` which introduces ARENA map by itself. And then `libbpf: Recognize __arena global variables.` and `libbpf: remove fake __arena_internal map` can be squashed together as well.
diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c index a9334c57e859..74fabbdbad2b 100644 --- a/tools/bpf/bpftool/gen.c +++ b/tools/bpf/bpftool/gen.c @@ -82,7 +82,7 @@ static bool get_map_ident(const struct bpf_map *map, char *buf, size_t buf_sz) const char *name = bpf_map__name(map); int i, n; - if (!bpf_map__is_internal(map)) { + if (!bpf_map__is_internal(map) || bpf_map__type(map) == BPF_MAP_TYPE_ARENA) { snprintf(buf, buf_sz, "%s", name); return true; } @@ -106,6 +106,12 @@ static bool get_datasec_ident(const char *sec_name, char *buf, size_t buf_sz) static const char *pfxs[] = { ".data", ".rodata", ".bss", ".kconfig" }; int i, n; + /* recognize hard coded LLVM section name */ + if (strcmp(sec_name, ".arena.1") == 0) { + /* this is the name to use in skeleton */ + strncpy(buf, "arena", buf_sz); + return true; + } for (i = 0, n = ARRAY_SIZE(pfxs); i < n; i++) { const char *pfx = pfxs[i]; @@ -239,6 +245,11 @@ static bool is_internal_mmapable_map(const struct bpf_map *map, char *buf, size_ if (!bpf_map__is_internal(map) || !(bpf_map__map_flags(map) & BPF_F_MMAPABLE)) return false; + if (bpf_map__type(map) == BPF_MAP_TYPE_ARENA) { + strncpy(buf, "arena", sz); + return true; + } + if (!get_map_ident(map, buf, sz)) return false; diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index f8158e250327..d5364280a06c 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -498,6 +498,7 @@ struct bpf_struct_ops { #define KSYMS_SEC ".ksyms" #define STRUCT_OPS_SEC ".struct_ops" #define STRUCT_OPS_LINK_SEC ".struct_ops.link" +#define ARENA_SEC ".arena.1" enum libbpf_map_type { LIBBPF_MAP_UNSPEC, @@ -505,6 +506,7 @@ enum libbpf_map_type { LIBBPF_MAP_BSS, LIBBPF_MAP_RODATA, LIBBPF_MAP_KCONFIG, + LIBBPF_MAP_ARENA, }; struct bpf_map_def { @@ -547,6 +549,7 @@ struct bpf_map { bool reused; bool autocreate; __u64 map_extra; + struct bpf_map *arena; }; enum extern_type { @@ -613,6 +616,7 @@ enum sec_type { SEC_BSS, SEC_DATA, SEC_RODATA, + SEC_ARENA, 
}; struct elf_sec_desc { @@ -1718,10 +1722,34 @@ static int bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type, const char *real_name, int sec_idx, void *data, size_t data_sz) { + const long page_sz = sysconf(_SC_PAGE_SIZE); + struct bpf_map *map, *arena = NULL; struct bpf_map_def *def; - struct bpf_map *map; size_t mmap_sz; - int err; + int err, i; + + if (type == LIBBPF_MAP_ARENA) { + for (i = 0; i < obj->nr_maps; i++) { + map = &obj->maps[i]; + if (map->def.type != BPF_MAP_TYPE_ARENA) + continue; + arena = map; + real_name = "__arena_internal"; + mmap_sz = bpf_map_mmap_sz(map); + if (roundup(data_sz, page_sz) > mmap_sz) { + pr_warn("Declared arena map size %zd is too small to hold" + "global __arena variables of size %zd\n", + mmap_sz, data_sz); + return -E2BIG; + } + break; + } + if (!arena) { + pr_warn("To use global __arena variables the arena map should" + "be declared explicitly in SEC(\".maps\")\n"); + return -ENOENT; + } + } map = bpf_object__add_map(obj); if (IS_ERR(map)) @@ -1732,6 +1760,7 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type, map->sec_offset = 0; map->real_name = strdup(real_name); map->name = internal_map_name(obj, real_name); + map->arena = arena; if (!map->real_name || !map->name) { zfree(&map->real_name); zfree(&map->name); @@ -1739,18 +1768,32 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type, } def = &map->def; - def->type = BPF_MAP_TYPE_ARRAY; - def->key_size = sizeof(int); - def->value_size = data_sz; - def->max_entries = 1; - def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG - ? BPF_F_RDONLY_PROG : 0; + if (type == LIBBPF_MAP_ARENA) { + /* bpf_object will contain two arena maps: + * LIBBPF_MAP_ARENA & BPF_MAP_TYPE_ARENA + * and + * LIBBPF_MAP_UNSPEC & BPF_MAP_TYPE_ARENA. + * The former map->arena will point to latter. 
+ */ + def->type = BPF_MAP_TYPE_ARENA; + def->key_size = 0; + def->value_size = 0; + def->max_entries = roundup(data_sz, page_sz) / page_sz; + def->map_flags = BPF_F_MMAPABLE; + } else { + def->type = BPF_MAP_TYPE_ARRAY; + def->key_size = sizeof(int); + def->value_size = data_sz; + def->max_entries = 1; + def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG + ? BPF_F_RDONLY_PROG : 0; - /* failures are fine because of maps like .rodata.str1.1 */ - (void) map_fill_btf_type_info(obj, map); + /* failures are fine because of maps like .rodata.str1.1 */ + (void) map_fill_btf_type_info(obj, map); - if (map_is_mmapable(obj, map)) - def->map_flags |= BPF_F_MMAPABLE; + if (map_is_mmapable(obj, map)) + def->map_flags |= BPF_F_MMAPABLE; + } pr_debug("map '%s' (global data): at sec_idx %d, offset %zu, flags %x.\n", map->name, map->sec_idx, map->sec_offset, def->map_flags); @@ -1814,6 +1857,13 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj) NULL, sec_desc->data->d_size); break; + case SEC_ARENA: + sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx)); + err = bpf_object__init_internal_map(obj, LIBBPF_MAP_ARENA, + sec_name, sec_idx, + sec_desc->data->d_buf, + sec_desc->data->d_size); + break; default: /* skip */ break; @@ -3646,6 +3696,10 @@ static int bpf_object__elf_collect(struct bpf_object *obj) } else if (strcmp(name, STRUCT_OPS_LINK_SEC) == 0) { obj->efile.st_ops_link_data = data; obj->efile.st_ops_link_shndx = idx; + } else if (strcmp(name, ARENA_SEC) == 0) { + sec_desc->sec_type = SEC_ARENA; + sec_desc->shdr = sh; + sec_desc->data = data; } else { pr_info("elf: skipping unrecognized data section(%d) %s\n", idx, name); @@ -4148,6 +4202,7 @@ static bool bpf_object__shndx_is_data(const struct bpf_object *obj, case SEC_BSS: case SEC_DATA: case SEC_RODATA: + case SEC_ARENA: return true; default: return false; @@ -4173,6 +4228,8 @@ bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx) return 
LIBBPF_MAP_DATA; case SEC_RODATA: return LIBBPF_MAP_RODATA; + case SEC_ARENA: + return LIBBPF_MAP_ARENA; default: return LIBBPF_MAP_UNSPEC; } @@ -4326,7 +4383,7 @@ static int bpf_program__record_reloc(struct bpf_program *prog, reloc_desc->type = RELO_DATA; reloc_desc->insn_idx = insn_idx; - reloc_desc->map_idx = map_idx; + reloc_desc->map_idx = map->arena ? map->arena - obj->maps : map_idx; reloc_desc->sym_off = sym->st_value; return 0; } @@ -4813,6 +4870,9 @@ bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map) bpf_gen__map_freeze(obj->gen_loader, map - obj->maps); return 0; } + if (map_type == LIBBPF_MAP_ARENA) + return 0; + err = bpf_map_update_elem(map->fd, &zero, map->mmaped, 0); if (err) { err = -errno; @@ -5119,6 +5179,15 @@ bpf_object__create_maps(struct bpf_object *obj) if (bpf_map__is_internal(map) && !kernel_supports(obj, FEAT_GLOBAL_DATA)) map->autocreate = false; + if (map->libbpf_type == LIBBPF_MAP_ARENA) { + size_t len = bpf_map_mmap_sz(map); + + memcpy(map->arena->mmaped, map->mmaped, len); + map->autocreate = false; + munmap(map->mmaped, len); + map->mmaped = NULL; + } + if (!map->autocreate) { pr_debug("map '%s': skipped auto-creating...\n", map->name); continue; @@ -9735,6 +9804,8 @@ static bool map_uses_real_name(const struct bpf_map *map) return true; if (map->libbpf_type == LIBBPF_MAP_RODATA && strcmp(map->real_name, RODATA_SEC) != 0) return true; + if (map->libbpf_type == LIBBPF_MAP_ARENA) + return true; return false; } @@ -13437,6 +13508,11 @@ int bpf_object__load_skeleton(struct bpf_object_skeleton *s) continue; } + if (map->arena) { + *mmaped = map->arena->mmaped; + continue; + } + if (map->def.map_flags & BPF_F_RDONLY_PROG) prot = PROT_READ; else
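A note on the sizing rule that appears in both versions of the patch: the arena's max_entries is a page count, and the .arena.1 data image, rounded up to whole pages, must fit inside it (otherwise -E2BIG). The same arithmetic explains the NULL from bpf_arena_alloc_pages() reported at the top of the thread: with max_entries = 1, even one small __arena global consumes the whole page. A standalone sketch of that arithmetic, assuming 4 KiB pages (the real code uses sysconf(_SC_PAGE_SIZE)):

```c
#include <stddef.h>

#define PAGE_SZ 4096UL /* assumption: 4 KiB pages */

/* Pages needed to hold the .arena.1 image:
 * equivalent to roundup(data_sz, PAGE_SZ) / PAGE_SZ in the patch. */
static size_t arena_pages_needed(size_t data_sz)
{
	return (data_sz + PAGE_SZ - 1) / PAGE_SZ;
}

/* Mirrors the -E2BIG check: the declared arena (max_entries pages)
 * must cover the rounded-up data image. */
static int arena_data_fits(size_t data_sz, size_t max_entries)
{
	return arena_pages_needed(data_sz) <= max_entries;
}

/* Pages left for bpf_arena_alloc_pages() after globals are placed. */
static size_t arena_pages_free(size_t data_sz, size_t max_entries)
{
	return max_entries - arena_pages_needed(data_sz);
}
```

With data_sz = sizeof(int __arena *) and max_entries = 1, the data fits but zero pages remain free, which is exactly the situation Eduard hit.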