Message ID | 20240402193945.17327-1-acme@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | pahole: Reproducible parallel DWARF loading/serial BTF encoding | expand |
On Tue, 2024-04-02 at 16:39 -0300, Arnaldo Carvalho de Melo wrote: > Hi, > > This allows us to have reproducible builds while keeping the > DWARF loading phase in parallel, achieving a noticeable speedup as > showed in the commit log messages: [...] > Working on libbpf to allow for parallel reproducible BTF encoding is the > next step. Another option would be to apply some sort of canonical ordering to BTF itself. E.g. put all PTR before STRUCT, sort same kinds by name, sort same kinds by vlen, etc. Something akin to [1], however this experiment has several flaws: - slowdown is much worse than with your patch-set; - I see a small number of functions with identical names appearing and disappearing from final BTF. I'll try to figure out the reason for slowdown tomorrow. [1] https://github.com/eddyz87/dwarves/tree/sort-btf
On 04/04/2024 01:08, Eduard Zingerman wrote: > On Tue, 2024-04-02 at 16:39 -0300, Arnaldo Carvalho de Melo wrote: >> Hi, >> >> This allows us to have reproducible builds while keeping the >> DWARF loading phase in parallel, achieving a noticeable speedup as >> showed in the commit log messages: > > [...] > >> Working on libbpf to allow for parallel reproducible BTF encoding is the >> next step. > > Another option would be to apply some sort of canonical ordering to > BTF itself. E.g. put all PTR before STRUCT, sort same kinds by name, > sort same kinds by vlen, etc. Something akin to [1], however this > experiment has several flaws: > - slowdown is much worse than with your patch-set; > - I see a small number of functions with identical names appearing and > disappearing from final BTF. > Could that be the handling of functions with same name, inconsistent prototypes? We leave them out deliberately (see btf_encoder__add_saved_funcs(). > I'll try to figure out the reason for slowdown tomorrow. > > [1] https://github.com/eddyz87/dwarves/tree/sort-btf >
On 02/04/2024 20:39, Arnaldo Carvalho de Melo wrote: > Hi, > > This allows us to have reproducible builds while keeping the > DWARF loading phase in parallel, achieving a noticeable speedup as > showed in the commit log messages: > > On a: > > model name : Intel(R) Core(TM) i7-14700K > > 8 performance cores (16 threads), 12 efficiency cores. > > Serial encoding: > > $ perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux > 5.18276 +- 0.00952 seconds time elapsed ( +- 0.18% ) > > Parallel, but non-reproducible: > > $ perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux > 1.8529 +- 0.0159 seconds time elapsed ( +- 0.86% ) > > reproducible build done using parallel DWARF loading + CUs-ordered-as-in-vmlinux serial BTF encoding: > > $ perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux > 2.3632 +- 0.0164 seconds time elapsed ( +- 0.69% ) > > Please take a look, its in the 'next' branch at: > > https://git.kernel.org/pub/scm/devel/pahole/pahole.git > https://git.kernel.org/pub/scm/devel/pahole/pahole.git/log/?h=next > > There is a new tool to do regression testing on this feature: > > https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=c751214c19bf8591bf8e4abdc677cbadee08f630 > > And here a more detailed set of tests using it: > > https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=4451467ca16a6e31834f6f98661c63587ce556f7 > > Working on libbpf to allow for parallel reproducible BTF encoding is the > next step. > > Thanks a lot, > Hey Arnaldo In testing this series I've hit a segmentation fault: Using host libthread_db library "/usr/lib64/libthread_db.so.1". Core was generated by `pahole -J --btf_features=all --reproducible_build -j vmlinux'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77) at /home/almagui/src/dwarves/dwarves.c:612 612 return id >= pt->nr_entries ? NULL : pt->entries[id]; [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))] (gdb) bt #0 0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77) at /home/almagui/src/dwarves/dwarves.c:612 #1 0x00007f8c8260ada2 in cu__type (cu=0x7f8c60001e40, id=77) at /home/almagui/src/dwarves/dwarves.c:806 #2 0x00007f8c8261342c in ftype__fprintf (ftype=0x7f8c60272f30, cu=0x7f8c60001e40, name=0x0, inlined=0, is_pointer=0, type_spacing=0, is_prototype=true, conf=0x7f8c653ff930, fp=0x7f8c3804bc90) at /home/almagui/src/dwarves/dwarves_fprintf.c:1388 #3 0x00007f8c8261289d in function__prototype_conf (func=0x7f8c60272f30, cu=0x7f8c60001e40, conf=0x7f8c653ff930, bf=0x7f8c27225dad "", len=512) at /home/almagui/src/dwarves/dwarves_fprintf.c:1183 #4 0x00007f8c8261b52b in proto__get (func=0x7f8c60272f30, proto=0x7f8c27225dad "", len=512) at /home/almagui/src/dwarves/btf_encoder.c:811 #5 0x00007f8c8261b665 in funcs__match (encoder=0x7f8c28023220, func=0x7f8c27225d88, f2=0x7f8c5805c560) at /home/almagui/src/dwarves/btf_encoder.c:839 #6 0x00007f8c8261b7fc in btf_encoder__save_func (encoder=0x7f8c28023220, fn=0x7f8c5805c560, func=0x7f8c27225d88) at /home/almagui/src/dwarves/btf_encoder.c:871 #7 0x00007f8c8261e361 in btf_encoder__encode_cu (encoder=0x7f8c28023220, cu=0x7f8c58001e20, conf_load=0x412400 <conf_load>) at /home/almagui/src/dwarves/btf_encoder.c:1888 #8 0x000000000040a36c in pahole_stealer (cu=0x7f8c58001e20, conf_load=0x412400 <conf_load>, thr_data=0x0) at /home/almagui/src/dwarves/pahole.c:3342 #9 0x00007f8c8262672c in cu__finalize (cu=0x7f8c38001e20, cus=0x21412a0, conf=0x412400 <conf_load>, thr_data=0x0) at /home/almagui/src/dwarves/dwarf_loader.c:3029 #10 0x00007f8c82626765 in cus__finalize (cus=0x21412a0, cu=0x7f8c38001e20, conf=0x412400 <conf_load>, thr_data=0x0) at /home/almagui/src/dwarves/dwarf_loader.c:3036 #11 0x00007f8c82626e9b in dwarf_cus__process_cu (dcus=0x7ffd71eaf0d0, cu_die=0x7f8c653ffeb0, cu=0x7f8c38001e20, thr_data=0x0) at /home/almagui/src/dwarves/dwarf_loader.c:3243 #12 0x00007f8c826270d2 in dwarf_cus__process_cu_thread (arg=0x7ffd71eaef50) at /home/almagui/src/dwarves/dwarf_loader.c:3313 #13 0x00007f8c816081da in start_thread () from /usr/lib64/libpthread.so.0 #14 0x00007f8c81239e73 in clone () from /usr/lib64/libc.so.6 So for conf_load->skip_encoding_btf_inconsistent_proto (enabled as part of "all" and enabled for vmlinux/module BTF), we use dwarves_fprintf() to write prototypes to check for inconsistent definitions. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77) at /home/almagui/src/dwarves/dwarves.c:612 612 return id >= pt->nr_entries ? NULL : pt->entries[id]; [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))] (gdb) print *(struct ptr_table *)0x7f8c60001e70 $1 = {entries = 0x0, nr_entries = 2979, allocated_entries = 4096} (gdb) So it looks like the ptr_table has 2979 entries but entries is NULL; could there be an issue where CU initialization is not yet complete for some threads (it also happens very early in processing)? Can you reproduce this failure at your end? Thanks! Alan > - Arnaldo > > Arnaldo Carvalho de Melo (12): > core: Allow asking for a reproducible build > pahole: Disable BTF multithreaded encoded when doing reproducible builds > dwarf_loader: Separate creating the cu/dcu pair from processing it > dwarf_loader: Introduce dwarf_cus__process_cu() > dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu() > dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu() > core: Add unlocked cus__add() variant > core: Add cus__remove(), counterpart of cus__add() > dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE > core/dwarf_loader: Add functions to set state of CU processing > pahole: Encode BTF serially in a reproducible build > tests: Add a BTF reproducible generation test > > dwarf_loader.c | 73 +++++++++++++++++++++++--------- > dwarves.c | 58 ++++++++++++++++++++++++- > dwarves.h | 17 ++++++++ > pahole.c | 84 +++++++++++++++++++++++++++++++++++-- > tests/reproducible_build.sh | 56 +++++++++++++++++++++++++ > 5 files changed, 264 insertions(+), 24 deletions(-) > create mode 100755 tests/reproducible_build.sh >
On Tue, Apr 02, 2024 at 04:39:33PM -0300, Arnaldo Carvalho de Melo wrote: > Hi, > > This allows us to have reproducible builds while keeping the > DWARF loading phase in parallel, achieving a noticeable speedup as > showed in the commit log messages: > > On a: > > model name : Intel(R) Core(TM) i7-14700K > > 8 performance cores (16 threads), 12 efficiency cores. > > Serial encoding: > > $ perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux > 5.18276 +- 0.00952 seconds time elapsed ( +- 0.18% ) > > Parallel, but non-reproducible: > > $ perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux > 1.8529 +- 0.0159 seconds time elapsed ( +- 0.86% ) > > reproducible build done using parallel DWARF loading + CUs-ordered-as-in-vmlinux serial BTF encoding: > > $ perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux > 2.3632 +- 0.0164 seconds time elapsed ( +- 0.69% ) hm, got it even faster than paralel build ;-) but it's within the 1 second deviation, I guess it shows better on bigger kernels reproducible_build: [root@krava linux-qemu]# perf stat -e cycles -r 3 -- /home/jolsa/kernel/bpf/pahole/build/pahole -j --reproducible_build --btf_encode_detached=btf.2 ./vmlinux Performance counter stats for '/home/jolsa/kernel/bpf/pahole/build/pahole -j --reproducible_build --btf_encode_detached=btf.2 ./vmlinux' (3 runs): 295,519,117,258 cycles ( +- 19.48% ) 9.43 +- 1.02 seconds time elapsed ( +- 10.84% ) paralel build: [root@krava linux-qemu]# perf stat -e cycles -r 3 -- /home/jolsa/kernel/bpf/pahole/build/pahole -j --btf_encode_detached=btf.2 ./vmlinux Performance counter stats for '/home/jolsa/kernel/bpf/pahole/build/pahole -j --btf_encode_detached=btf.2 ./vmlinux' (3 runs): 391,320,973,331 cycles ( +- 19.19% ) 9.851 +- 0.695 seconds time elapsed ( +- 7.06% ) 1 cpu: [root@krava linux-qemu]# perf stat -e cycles -r 3 -- /home/jolsa/kernel/bpf/pahole/build/pahole --btf_encode_detached=btf.2 ./vmlinux Performance counter stats for '/home/jolsa/kernel/bpf/pahole/build/pahole --btf_encode_detached=btf.2 ./vmlinux' (3 runs): 208,492,342,135 cycles ( +- 19.43% ) 16.761 +- 0.644 seconds time elapsed ( +- 3.84% ) jirka > > Please take a look, its in the 'next' branch at: > > https://git.kernel.org/pub/scm/devel/pahole/pahole.git > https://git.kernel.org/pub/scm/devel/pahole/pahole.git/log/?h=next > > There is a new tool to do regression testing on this feature: > > https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=c751214c19bf8591bf8e4abdc677cbadee08f630 > > And here a more detailed set of tests using it: > > https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=4451467ca16a6e31834f6f98661c63587ce556f7 > > Working on libbpf to allow for parallel reproducible BTF encoding is the > next step. > > Thanks a lot, > > - Arnaldo > > Arnaldo Carvalho de Melo (12): > core: Allow asking for a reproducible build > pahole: Disable BTF multithreaded encoded when doing reproducible builds > dwarf_loader: Separate creating the cu/dcu pair from processing it > dwarf_loader: Introduce dwarf_cus__process_cu() > dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu() > dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu() > core: Add unlocked cus__add() variant > core: Add cus__remove(), counterpart of cus__add() > dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE > core/dwarf_loader: Add functions to set state of CU processing > pahole: Encode BTF serially in a reproducible build > tests: Add a BTF reproducible generation test > > dwarf_loader.c | 73 +++++++++++++++++++++++--------- > dwarves.c | 58 ++++++++++++++++++++++++- > dwarves.h | 17 ++++++++ > pahole.c | 84 +++++++++++++++++++++++++++++++++++-- > tests/reproducible_build.sh | 56 +++++++++++++++++++++++++ > 5 files changed, 264 insertions(+), 24 deletions(-) > create mode 100755 tests/reproducible_build.sh > > -- > 2.44.0 >
On 04/04/2024 09:58, Alan Maguire wrote: > On 02/04/2024 20:39, Arnaldo Carvalho de Melo wrote: >> Hi, >> >> This allows us to have reproducible builds while keeping the >> DWARF loading phase in parallel, achieving a noticeable speedup as >> showed in the commit log messages: >> >> On a: >> >> model name : Intel(R) Core(TM) i7-14700K >> >> 8 performance cores (16 threads), 12 efficiency cores. >> >> Serial encoding: >> >> $ perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux >> 5.18276 +- 0.00952 seconds time elapsed ( +- 0.18% ) >> >> Parallel, but non-reproducible: >> >> $ perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux >> 1.8529 +- 0.0159 seconds time elapsed ( +- 0.86% ) >> >> reproducible build done using parallel DWARF loading + CUs-ordered-as-in-vmlinux serial BTF encoding: >> >> $ perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux >> 2.3632 +- 0.0164 seconds time elapsed ( +- 0.69% ) >> >> Please take a look, its in the 'next' branch at: >> >> https://git.kernel.org/pub/scm/devel/pahole/pahole.git >> https://git.kernel.org/pub/scm/devel/pahole/pahole.git/log/?h=next >> >> There is a new tool to do regression testing on this feature: >> >> https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=c751214c19bf8591bf8e4abdc677cbadee08f630 >> >> And here a more detailed set of tests using it: >> >> https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=4451467ca16a6e31834f6f98661c63587ce556f7 >> >> Working on libbpf to allow for parallel reproducible BTF encoding is the >> next step. >> >> Thanks a lot, >> > > Hey Arnaldo > > In testing this series I've hit a segmentation fault: > > Using host libthread_db library "/usr/lib64/libthread_db.so.1". > Core was generated by `pahole -J --btf_features=all --reproducible_build > -j vmlinux'. > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77) > at /home/almagui/src/dwarves/dwarves.c:612 > 612 return id >= pt->nr_entries ? NULL : pt->entries[id]; > [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))] > (gdb) bt > #0 0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77) > at /home/almagui/src/dwarves/dwarves.c:612 > #1 0x00007f8c8260ada2 in cu__type (cu=0x7f8c60001e40, id=77) > at /home/almagui/src/dwarves/dwarves.c:806 > #2 0x00007f8c8261342c in ftype__fprintf (ftype=0x7f8c60272f30, > cu=0x7f8c60001e40, name=0x0, inlined=0, is_pointer=0, type_spacing=0, > is_prototype=true, conf=0x7f8c653ff930, fp=0x7f8c3804bc90) > at /home/almagui/src/dwarves/dwarves_fprintf.c:1388 > #3 0x00007f8c8261289d in function__prototype_conf (func=0x7f8c60272f30, > cu=0x7f8c60001e40, conf=0x7f8c653ff930, bf=0x7f8c27225dad "", len=512) > at /home/almagui/src/dwarves/dwarves_fprintf.c:1183 > #4 0x00007f8c8261b52b in proto__get (func=0x7f8c60272f30, > proto=0x7f8c27225dad "", len=512) > at /home/almagui/src/dwarves/btf_encoder.c:811 > #5 0x00007f8c8261b665 in funcs__match (encoder=0x7f8c28023220, > func=0x7f8c27225d88, f2=0x7f8c5805c560) > at /home/almagui/src/dwarves/btf_encoder.c:839 > #6 0x00007f8c8261b7fc in btf_encoder__save_func (encoder=0x7f8c28023220, > fn=0x7f8c5805c560, func=0x7f8c27225d88) > at /home/almagui/src/dwarves/btf_encoder.c:871 > #7 0x00007f8c8261e361 in btf_encoder__encode_cu (encoder=0x7f8c28023220, > cu=0x7f8c58001e20, conf_load=0x412400 <conf_load>) > at /home/almagui/src/dwarves/btf_encoder.c:1888 > #8 0x000000000040a36c in pahole_stealer (cu=0x7f8c58001e20, > conf_load=0x412400 <conf_load>, thr_data=0x0) > at /home/almagui/src/dwarves/pahole.c:3342 > #9 0x00007f8c8262672c in cu__finalize (cu=0x7f8c38001e20, cus=0x21412a0, > conf=0x412400 <conf_load>, thr_data=0x0) > at /home/almagui/src/dwarves/dwarf_loader.c:3029 > #10 0x00007f8c82626765 in cus__finalize (cus=0x21412a0, cu=0x7f8c38001e20, > conf=0x412400 <conf_load>, thr_data=0x0) > at /home/almagui/src/dwarves/dwarf_loader.c:3036 > #11 0x00007f8c82626e9b in dwarf_cus__process_cu (dcus=0x7ffd71eaf0d0, > cu_die=0x7f8c653ffeb0, cu=0x7f8c38001e20, thr_data=0x0) > at /home/almagui/src/dwarves/dwarf_loader.c:3243 > #12 0x00007f8c826270d2 in dwarf_cus__process_cu_thread (arg=0x7ffd71eaef50) > at /home/almagui/src/dwarves/dwarf_loader.c:3313 > #13 0x00007f8c816081da in start_thread () from /usr/lib64/libpthread.so.0 > #14 0x00007f8c81239e73 in clone () from /usr/lib64/libc.so.6 > > So for conf_load->skip_encoding_btf_inconsistent_proto (enabled as part > of "all" and enabled for vmlinux/module BTF), we use dwarves_fprintf() > to write prototypes to check for inconsistent definitions. > > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77) > at /home/almagui/src/dwarves/dwarves.c:612 > 612 return id >= pt->nr_entries ? NULL : pt->entries[id]; > [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))] > (gdb) print *(struct ptr_table *)0x7f8c60001e70 > $1 = {entries = 0x0, nr_entries = 2979, allocated_entries = 4096} > (gdb) > > So it looks like the ptr_table has 2979 entries but entries is NULL; > could there be an issue where CU initialization is not yet complete > for some threads (it also happens very early in processing)? Can you > reproduce this failure at your end? Thanks! > the following (when applied on top of the series) resolves the segmentation fault for me: diff --git a/pahole.c b/pahole.c index 6c7e738..5ff0eaf 100644 --- a/pahole.c +++ b/pahole.c @@ -3348,8 +3348,8 @@ static enum load_steal_kind pahole_stealer(struct cu *cu, if (conf_load->reproducible_build) { ret = LSK__KEEPIT; // we're not processing the cu passed to this function, so keep it. - // Equivalent to LSK__DELETE since we processed this - cus__remove(cus, cu); - cu__delete(cu); } out_btf: if (!thr_data) // See comment about reproducibe_build above In other words, the problem is we remove/delete CUs when finished with them in each thread (when BTF is generated). However because the save/add_saved_funcs stashes CU references in the associated struct function * (to allow prototype comparison for the same function in different CUs), we end up with stale CU references and in this case the freed/nulled ptr_table caused an issue. As far as I can see we need to retain CUs until all BTF has been merged from threads. With the fix in place, I'm seeing less then 100msec difference between reproducible/non-reproducible vmlinux BTF generation; that's great! Alan > Alan > >> - Arnaldo >> >> Arnaldo Carvalho de Melo (12): >> core: Allow asking for a reproducible build >> pahole: Disable BTF multithreaded encoded when doing reproducible builds >> dwarf_loader: Separate creating the cu/dcu pair from processing it >> dwarf_loader: Introduce dwarf_cus__process_cu() >> dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu() >> dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu() >> core: Add unlocked cus__add() variant >> core: Add cus__remove(), counterpart of cus__add() >> dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE >> core/dwarf_loader: Add functions to set state of CU processing >> pahole: Encode BTF serially in a reproducible build >> tests: Add a BTF reproducible generation test >> >> dwarf_loader.c | 73 +++++++++++++++++++++++--------- >> dwarves.c | 58 ++++++++++++++++++++++++- >> dwarves.h | 17 ++++++++ >> pahole.c | 84 +++++++++++++++++++++++++++++++++++-- >> tests/reproducible_build.sh | 56 +++++++++++++++++++++++++ >> 5 files changed, 264 insertions(+), 24 deletions(-) >> create mode 100755 tests/reproducible_build.sh >> >
On Mon, Apr 08, 2024 at 01:00:59PM +0100, Alan Maguire wrote: > On 04/04/2024 09:58, Alan Maguire wrote: > > Program terminated with signal SIGSEGV, Segmentation fault. > > #0 0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77) > > at /home/almagui/src/dwarves/dwarves.c:612 > > 612 return id >= pt->nr_entries ? NULL : pt->entries[id]; > > [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))] > > (gdb) print *(struct ptr_table *)0x7f8c60001e70 > > $1 = {entries = 0x0, nr_entries = 2979, allocated_entries = 4096} > > (gdb) > > So it looks like the ptr_table has 2979 entries but entries is NULL; > > could there be an issue where CU initialization is not yet complete > > for some threads (it also happens very early in processing)? Can you > > reproduce this failure at your end? Thanks! > the following (when applied on top of the series) resolves the > segmentation fault for me: > diff --git a/pahole.c b/pahole.c > index 6c7e738..5ff0eaf 100644 > --- a/pahole.c > +++ b/pahole.c > @@ -3348,8 +3348,8 @@ static enum load_steal_kind pahole_stealer(struct > cu *cu, > if (conf_load->reproducible_build) { > ret = LSK__KEEPIT; // we're not processing the > cu passed to this function, so keep it. > - // Equivalent to LSK__DELETE since we processed > this > - cus__remove(cus, cu); > - cu__delete(cu); > } > out_btf: > if (!thr_data) // See comment about reproducibe_build above > Yeah, Jiri also pointed out this call to cu__delete() was new, I was trying to avoid having unprocessed 'struct cu' using too much memory, so after processing it, delete them, but as you found out there are references to that memory... > In other words, the problem is we remove/delete CUs when finished with > them in each thread (when BTF is generated). However because the > save/add_saved_funcs stashes CU references in the associated struct > function * (to allow prototype comparison for the same function in > different CUs), we end up with stale CU references and in this case the > freed/nulled ptr_table caused an issue. As far as I can see we need to > retain CUs until all BTF has been merged from threads. > With the fix in place, I'm seeing less then 100msec difference between > reproducible/non-reproducible vmlinux BTF generation; that's great! Excellent! I'll remove it and add a note crediting you with the removal and having the explanation about why its not possibe to delete it at that point (references to the associated 'struct function'). Perhaps we can save this info in some other way that allows us to free the CU after having it processed, I'll think about it. But its good to see that the difference is small, great! Thanks a lot! - Arnaldo
On Thu, 2024-04-04 at 09:05 +0100, Alan Maguire wrote: [...] > Could that be the handling of functions with same name, inconsistent > prototypes? We leave them out deliberately (see > btf_encoder__add_saved_funcs(). > > > I'll try to figure out the reason for slowdown tomorrow. > > > > [1] https://github.com/eddyz87/dwarves/tree/sort-btf > > Fwiw, the best I can do is here: https://github.com/eddyz87/dwarves/tree/sort-btf It adds total ordering to BTF types using kind, kflag, vlen, name etc properties, and rebuilds final BTF to follow this order. Here are the measurements: $ sudo cpupower frequency-set --min 3Ghz --max 3Ghz $ nice -n18 perf stat -r50 pahole --reproducible_build -j --btf_encode_detached=/dev/null vmlinux ... 3.08648 +- 0.00813 seconds time elapsed ( +- 0.26% ) $ nice -n18 perf stat -r50 pahole -j --btf_encode_detached=/dev/null vmlinux ... 3.00785 +- 0.00878 seconds time elapsed ( +- 0.29% ) Which gives 2.6% total time penalty when reproducible build option is used. ./test_progs tests are passing. Still, there are a few discrepancies in generated BTFs: some function prototypes are included twice at random (about 30 IDs added/deleted). This might be connected to Alan's suggestion and requires further investigation. All in all, Arnaldo's approach with CU ordering looks simpler. Best regards, Eduard
On Tue, Apr 9, 2024 at 7:34 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > On Thu, 2024-04-04 at 09:05 +0100, Alan Maguire wrote: > [...] > > > Could that be the handling of functions with same name, inconsistent > > prototypes? We leave them out deliberately (see > > btf_encoder__add_saved_funcs(). > > > > > I'll try to figure out the reason for slowdown tomorrow. > > > > > > [1] https://github.com/eddyz87/dwarves/tree/sort-btf > > > > > Fwiw, the best I can do is here: > https://github.com/eddyz87/dwarves/tree/sort-btf > > It adds total ordering to BTF types using kind, kflag, vlen, name etc properties, > and rebuilds final BTF to follow this order. Here are the measurements: > > $ sudo cpupower frequency-set --min 3Ghz --max 3Ghz > $ nice -n18 perf stat -r50 pahole --reproducible_build -j --btf_encode_detached=/dev/null vmlinux > ... > 3.08648 +- 0.00813 seconds time elapsed ( +- 0.26% ) > > $ nice -n18 perf stat -r50 pahole -j --btf_encode_detached=/dev/null vmlinux > ... > 3.00785 +- 0.00878 seconds time elapsed ( +- 0.29% ) > > Which gives 2.6% total time penalty when reproducible build option is used. > ./test_progs tests are passing. > > Still, there are a few discrepancies in generated BTFs: some function > prototypes are included twice at random (about 30 IDs added/deleted). > This might be connected to Alan's suggestion and requires > further investigation. > > All in all, Arnaldo's approach with CU ordering looks simpler. I would actually go with sorted BTF, since it will probably make diff-ing of BTFs practical. Will be easier to track changes from one kernel version to another. vmlinux.h will become a bit more sorted too and normal diff vmlinux_6_1.h vmlinux_6_2.h will be possible. Or am I misunderstanding the sorting concept?
On Tue, 2024-04-09 at 07:56 -0700, Alexei Starovoitov wrote: [...] > I would actually go with sorted BTF, since it will probably > make diff-ing of BTFs practical. Will be easier to track changes > from one kernel version to another. vmlinux.h will become > a bit more sorted too and normal diff vmlinux_6_1.h vmlinux_6_2.h > will be possible. > Or am I misunderstanding the sorting concept? You understand the concept correctly, here is a sample: [1] INT '_Bool' size=1 bits_offset=0 nr_bits=8 encoding=BOOL [2] INT '__int128' size=16 bits_offset=0 nr_bits=128 encoding=SIGNED [3] INT '__int128 unsigned' size=16 bits_offset=0 nr_bits=128 encoding=(none) [4] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none) [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED [6] INT 'long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED [7] INT 'long long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED ... [15085] STRUCT 'arch_elf_state' size=0 vlen=0 [15086] STRUCT 'arch_vdso_data' size=0 vlen=0 [15087] STRUCT 'bpf_run_ctx' size=0 vlen=0 [15088] STRUCT 'dev_archdata' size=0 vlen=0 [15089] STRUCT 'dyn_arch_ftrace' size=0 vlen=0 [15090] STRUCT 'fscrypt_dummy_policy' size=0 vlen=0 ... (Sort by kind, than by vlen, than by name because sorting by name is a bit costly, then by member properties)
On Tue, Apr 09, 2024 at 06:01:08PM +0300, Eduard Zingerman wrote: > On Tue, 2024-04-09 at 07:56 -0700, Alexei Starovoitov wrote: > [...] > > I would actually go with sorted BTF, since it will probably > > make diff-ing of BTFs practical. Will be easier to track changes What kind of diff-ing of BTFs from different kernels are you interested in? in pahole's repository we have btfdiff, that will, given a vmlinux with both DWARF and BTF use pahole to pretty print all types, expanded, and then compare the two outputs, which should produce the same results from BTF and DWARF. Ditto for DWARF from a vmlinux compared to a detached BTF file. And also now we have another regression test script that will produce the output from 'btftool btf dump' for the BTF generated from DWARF in serial mode, and then compare that with the output from 'bpftool btf dump' for reproducible encodings done using -j 1 ... number-of-processors-on-the-machine. All have to match, all types, all BTF ids. We can as well use something like btfdiff to compare the output from 'pahole --expand_types --sort' for two BTFs for two different kernels, to see what are the new types and the changes to types in both. What else do you want to compare? To be able to match we would have to somehow have ranges for each DWARF CU so that when encoding and then deduplicating we would have space in the ID space for new types to fill in while keeping the old types IDs matching the same types in the new vmlinux. While ordering all types we would have to have ID space available from each of the BTF kinds, no? I haven't looked at Eduard's patches, is that what it is done? > > from one kernel version to another. vmlinux.h will become > > a bit more sorted too and normal diff vmlinux_6_1.h vmlinux_6_2.h > > will be possible. > > Or am I misunderstanding the sorting concept? > You understand the concept correctly, here is a sample: > [1] INT '_Bool' size=1 bits_offset=0 nr_bits=8 encoding=BOOL > [2] INT '__int128' size=16 bits_offset=0 nr_bits=128 encoding=SIGNED > [3] INT '__int128 unsigned' size=16 bits_offset=0 nr_bits=128 encoding=(none) > [4] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none) > [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED > [6] INT 'long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED > [7] INT 'long long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED The above: so far so good, probably there will not be something that will push what is now BTF id 6 to become 7 in a new vmlinux, but can we say the same for the more dynamic parts, like the list of structs? A struct can vanish, that abstraction not being used anymore in the kernel, so its BTF id will vacate and all of the next struct IDs will "fall down" and gets its IDs decremented, no? If these difficulties are present as I mentioned, then rebuilding from the BTF data with something like the existing 'pahole --expand_types --sort' from the BTF from kernel N to compare with the same output for kernel N + 1 should be enough to see what changed from one kernel to the next one? - Arnaldo > ... > [15085] STRUCT 'arch_elf_state' size=0 vlen=0 > [15086] STRUCT 'arch_vdso_data' size=0 vlen=0 > [15087] STRUCT 'bpf_run_ctx' size=0 vlen=0 > [15088] STRUCT 'dev_archdata' size=0 vlen=0 > [15089] STRUCT 'dyn_arch_ftrace' size=0 vlen=0 > [15090] STRUCT 'fscrypt_dummy_policy' size=0 vlen=0 > ... > > (Sort by kind, than by vlen, than by name because sorting by name is a > bit costly, then by member properties)
On Tue, 2024-04-09 at 15:45 -0300, Arnaldo Carvalho de Melo wrote: > On Tue, Apr 09, 2024 at 06:01:08PM +0300, Eduard Zingerman wrote: > > On Tue, 2024-04-09 at 07:56 -0700, Alexei Starovoitov wrote: > > [...] > > > > I would actually go with sorted BTF, since it will probably > > > make diff-ing of BTFs practical. Will be easier to track changes > > What kind of diff-ing of BTFs from different kernels are you interested > in? > > in pahole's repository we have btfdiff, that will, given a vmlinux with > both DWARF and BTF use pahole to pretty print all types, expanded, and > then compare the two outputs, which should produce the same results from > BTF and DWARF. Ditto for DWARF from a vmlinux compared to a detached BTF > file. > > And also now we have another regression test script that will produce > the output from 'btftool btf dump' for the BTF generated from DWARF in > serial mode, and then compare that with the output from 'bpftool btf > dump' for reproducible encodings done using -j 1 ... > number-of-processors-on-the-machine. All have to match, all types, all > BTF ids. > > We can as well use something like btfdiff to compare the output from > 'pahole --expand_types --sort' for two BTFs for two different kernels, > to see what are the new types and the changes to types in both. > > What else do you want to compare? To be able to match we would have to > somehow have ranges for each DWARF CU so that when encoding and then > deduplicating we would have space in the ID space for new types to fill > in while keeping the old types IDs matching the same types in the new > vmlinux. As far as I understand Alexei, he means diffing two vmlinux.h files generated for different kernel versions. The vmlinux.h is generated by bpftool using command `bpftool btf dump file <binary-file> format c`. The output is topologically sorted to satisfy C compiler, but ordering is not total, so vmlinux.h content may vary from build to build if BTF type order differs. Thus, any kind of stable BTF type ordering would make vmlinux.h stable. On the other hand, topological ordering used by bpftool (the algorithm is in the libbpf, actually) might be extended with additional rules to make the ordering total. > While ordering all types we would have to have ID space available from > each of the BTF kinds, no? > > I haven't looked at Eduard's patches, is that what it is done? No, I don't reserve any ID space, the output of `bpftool btf dump file <binary-file> format raw` is not suitable for diffing w/o post-processing if some types are added or removed in the middle. I simply add a function to compare two BTF types and a pass that sorts all BTF types before finalizing BTF generation. > > > from one kernel version to another. vmlinux.h will become > > > a bit more sorted too and normal diff vmlinux_6_1.h vmlinux_6_2.h > > > will be possible. > > > Or am I misunderstanding the sorting concept? > > > You understand the concept correctly, here is a sample: > > > [1] INT '_Bool' size=1 bits_offset=0 nr_bits=8 encoding=BOOL > > [2] INT '__int128' size=16 bits_offset=0 nr_bits=128 encoding=SIGNED > > [3] INT '__int128 unsigned' size=16 bits_offset=0 nr_bits=128 encoding=(none) > > [4] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none) > > [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED > > [6] INT 'long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED > > [7] INT 'long long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED > > The above: so far so good, probably there will not be something that > will push what is now BTF id 6 to become 7 in a new vmlinux, but can we > say the same for the more dynamic parts, like the list of structs? > > A struct can vanish, that abstraction not being used anymore in the > kernel, so its BTF id will vacate and all of the next struct IDs will > "fall down" and gets its IDs decremented, no? Yes, this would happen. > If these difficulties are present as I mentioned, then rebuilding from > the BTF data with something like the existing 'pahole --expand_types > --sort' from the BTF from kernel N to compare with the same output for > kernel N + 1 should be enough to see what changed from one kernel to the > next one? Yes, this is an option. Thanks, Eduard
On Tue, Apr 9, 2024 at 12:29 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > On Tue, 2024-04-09 at 15:45 -0300, Arnaldo Carvalho de Melo wrote: > > On Tue, Apr 09, 2024 at 06:01:08PM +0300, Eduard Zingerman wrote: > > > On Tue, 2024-04-09 at 07:56 -0700, Alexei Starovoitov wrote: > > > [...] > > > > > > I would actually go with sorted BTF, since it will probably > > > > make diff-ing of BTFs practical. Will be easier to track changes > > > > What kind of diff-ing of BTFs from different kernels are you interested > > in? > > > > in pahole's repository we have btfdiff, that will, given a vmlinux with > > both DWARF and BTF use pahole to pretty print all types, expanded, and > > then compare the two outputs, which should produce the same results from > > BTF and DWARF. Ditto for DWARF from a vmlinux compared to a detached BTF > > file. > > > > And also now we have another regression test script that will produce > > the output from 'btftool btf dump' for the BTF generated from DWARF in > > serial mode, and then compare that with the output from 'bpftool btf > > dump' for reproducible encodings done using -j 1 ... > > number-of-processors-on-the-machine. All have to match, all types, all > > BTF ids. > > > > We can as well use something like btfdiff to compare the output from > > 'pahole --expand_types --sort' for two BTFs for two different kernels, > > to see what are the new types and the changes to types in both. > > > > What else do you want to compare? To be able to match we would have to > > somehow have ranges for each DWARF CU so that when encoding and then > > deduplicating we would have space in the ID space for new types to fill > > in while keeping the old types IDs matching the same types in the new > > vmlinux. > > As far as I understand Alexei, he means diffing two vmlinux.h files > generated for different kernel versions. The vmlinux.h is generated by > bpftool using command `bpftool btf dump file <binary-file> format c`. > The output is topologically sorted to satisfy C compiler, but ordering > is not total, so vmlinux.h content may vary from build to build if BTF > type order differs. > > Thus, any kind of stable BTF type ordering would make vmlinux.h stable. > On the other hand, topological ordering used by bpftool > (the algorithm is in the libbpf, actually) might be extended with > additional rules to make the ordering total. Not looking for perfect ordering. People check-in vmlinux.h into their repos. If it's more or less sorted the git diff of updated vmlinux.h will be a bit more human readable. Hopefully.
On Tue, Apr 09, 2024 at 10:29:18PM +0300, Eduard Zingerman wrote: > On Tue, 2024-04-09 at 15:45 -0300, Arnaldo Carvalho de Melo wrote: > > On Tue, Apr 09, 2024 at 06:01:08PM +0300, Eduard Zingerman wrote: > > > On Tue, 2024-04-09 at 07:56 -0700, Alexei Starovoitov wrote: > > > [...] > > > > > > I would actually go with sorted BTF, since it will probably > > > > make diff-ing of BTFs practical. Will be easier to track changes > > > > What kind of diff-ing of BTFs from different kernels are you interested > > in? > > > > in pahole's repository we have btfdiff, that will, given a vmlinux with > > both DWARF and BTF use pahole to pretty print all types, expanded, and > > then compare the two outputs, which should produce the same results from > > BTF and DWARF. Ditto for DWARF from a vmlinux compared to a detached BTF > > file. > > > > And also now we have another regression test script that will produce > > the output from 'btftool btf dump' for the BTF generated from DWARF in > > serial mode, and then compare that with the output from 'bpftool btf > > dump' for reproducible encodings done using -j 1 ... > > number-of-processors-on-the-machine. All have to match, all types, all > > BTF ids. > > > > We can as well use something like btfdiff to compare the output from > > 'pahole --expand_types --sort' for two BTFs for two different kernels, > > to see what are the new types and the changes to types in both. > > > > What else do you want to compare? To be able to match we would have to > > somehow have ranges for each DWARF CU so that when encoding and then > > deduplicating we would have space in the ID space for new types to fill > > in while keeping the old types IDs matching the same types in the new > > vmlinux. > > As far as I understand Alexei, he means diffing two vmlinux.h files > generated for different kernel versions. The vmlinux.h is generated by > bpftool using command `bpftool btf dump file <binary-file> format c`. > The output is topologically sorted to satisfy C compiler, but ordering > is not total, so vmlinux.h content may vary from build to build if BTF > type order differs. > > Thus, any kind of stable BTF type ordering would make vmlinux.h stable. > On the other hand, topological ordering used by bpftool > (the algorithm is in the libbpf, actually) might be extended with > additional rules to make the ordering total. Interesting, the other tool that is in the pahole repo is 'fullcircle', that given a .o file will generate a compileable file (a vmlinux.h say) and then build it again to generate DWARF and then compare the original DWARF with the new onbe. > > While ordering all types we would have to have ID space available from > > each of the BTF kinds, no? > > > > I haven't looked at Eduard's patches, is that what it is done? > > No, I don't reserve any ID space, the output of > `bpftool btf dump file <binary-file> format raw` is not suitable for > diffing w/o post-processing if some types are added or removed in the > middle. > I simply add a function to compare two BTF types and a pass that sorts > all BTF types before finalizing BTF generation. Ok, so I see that the BTF ids for the types will change, its the vmlinux.h that is to be compared. root@x1:~# pahole -F btf --compile | tail -12 struct ncsi_aen_handler { unsigned char type; /* 0 1 */ /* XXX 3 bytes hole, try to pack */ int payload; /* 4 4 */ int (*handler)(struct ncsi_dev_priv *, struct ncsi_aen_pkt_hdr *); /* 8 8 */ /* size: 16, cachelines: 1, members: 3 */ /* sum members: 13, holes: 1, sum holes: 3 */ /* last cacheline: 16 bytes */ }; root@x1:~# pahole -F btf --compile > a.c ; echo 'int main(void) { struct ncsi_aen_handler b = { 1, } ; return b.type ; } ' >> a.c ; gcc -g -o bla -c a.c root@x1:~# pahole --expand_types ncsi_aen_handler > from_kernel_btf root@x1:~# pahole --expand_types -C ncsi_aen_handler bla > from_bla_dwarf root@x1:~# diff -u from_kernel_btf from_bla_dwarf root@x1:~# The above is for a super simple struct, no expansions even, now for: root@x1:~# pahole -F btf --compile > a.c ; echo 'int main(void) { struct task_struct b = { .prio = 12345, } ; return b.prio ; } ' >> a.c ; gcc -g -o bla -c a.c root@x1:~# pahole --suppress_aligned_attribute --expand_types -C task_struct bla > from_bla_dwarf root@x1:~# pahole --suppress_aligned_attribute --expand_types task_struct > from_kernel_btf root@x1:~# diff -u from_kernel_btf from_bla_dwarf root@x1:~# I suppressed the align attribute as right now the output from pahole when it finds the __attribute__ alignment present in DWARF is slightly different, but equivalent (barring bugs) to when it infers the alignment and adds it to BTF data, that has no alignment info other than the member offsets (DWARF has both the member offsets to infer the alignment _and_ attributes when they are present in the source code, sometimes even duplicated, which probably is the reason for the difference in output (albeit the end result should be equivalent)). root@x1:~# pahole --expand_types task_struct | wc -l 1254 root@x1:~# pahole --expand_types task_struct | tail /* XXX last struct has 1 hole, 1 bit hole */ /* size: 13696, cachelines: 214, members: 265 */ /* sum members: 13522, holes: 20, sum holes: 158 */ /* sum bitfield members: 83 bits, bit holes: 2, sum bit holes: 45 bits */ /* member types with holes: 4, total: 6, bit holes: 2, total: 2 */ /* paddings: 6, sum paddings: 49 */ /* forced alignments: 2, forced holes: 2, sum forced holes: 88 */ }; root@x1:~# I.e. the original BTF doesn't have to be sorted (well, it will keep the order DWARF does, which, in turn, is another desire of reproducible builds, it will not have the same output for two kernel releases, but should be as close as possible) pahole (--sort or --compile) or bpftool can do it either by plain sorting the types (pahole --sort, used by btfdiff to compara output from DWARF to output from BTF) or by generating a compilable source code (pahole --compile, aka "topologically sorted to satisfy C compiler"). > > > > from one kernel version to another. vmlinux.h will become > > > > a bit more sorted too and normal diff vmlinux_6_1.h vmlinux_6_2.h > > > > will be possible. > > > > Or am I misunderstanding the sorting concept? > > > You understand the concept correctly, here is a sample: > > > [1] INT '_Bool' size=1 bits_offset=0 nr_bits=8 encoding=BOOL > > > [2] INT '__int128' size=16 bits_offset=0 nr_bits=128 encoding=SIGNED > > > [3] INT '__int128 unsigned' size=16 bits_offset=0 nr_bits=128 encoding=(none) > > > [4] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none) > > > [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED > > > [6] INT 'long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED > > > [7] INT 'long long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED > > The above: so far so good, probably there will not be something that > > will push what is now BTF id 6 to become 7 in a new vmlinux, but can we > > say the same for the more dynamic parts, like the list of structs? > > A struct can vanish, that abstraction not being used anymore in the > > kernel, so its BTF id will vacate and all of the next struct IDs will > > "fall down" and gets its IDs decremented, no? > Yes, this would happen. We're on the same page. > > If these difficulties are present as I mentioned, then rebuilding from > > the BTF data with something like the existing 'pahole --expand_types > > --sort' from the BTF from kernel N to compare with the same output for > > kernel N + 1 should be enough to see what changed from one kernel to the > > next one? > Yes, this is an option. Agreed. What I tried in my series was to do as little as possible to make the serial output be the same as whatever level of paralelism we have while making the whole process to cost as close to the unconstrained parallelism that we had in place, i.e. to get a reproducible build at the lowest cost in terms of code churn (the more code we touch, the more chances we have of new bugs to be introduced) and of CPU cycles/memory use, etc. - Arnaldo
On Mon, Apr 08, 2024 at 11:39:32AM -0300, Arnaldo Carvalho de Melo wrote: > On Mon, Apr 08, 2024 at 01:00:59PM +0100, Alan Maguire wrote: > > On 04/04/2024 09:58, Alan Maguire wrote: > > > Program terminated with signal SIGSEGV, Segmentation fault. > > > #0 0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77) > > > at /home/almagui/src/dwarves/dwarves.c:612 > > > 612 return id >= pt->nr_entries ? NULL : pt->entries[id]; > > > [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))] > > > (gdb) print *(struct ptr_table *)0x7f8c60001e70 > > > $1 = {entries = 0x0, nr_entries = 2979, allocated_entries = 4096} > > > (gdb) > > > > So it looks like the ptr_table has 2979 entries but entries is NULL; > > > could there be an issue where CU initialization is not yet complete > > > for some threads (it also happens very early in processing)? Can you > > > reproduce this failure at your end? Thanks! > > > the following (when applied on top of the series) resolves the > > segmentation fault for me: > > > diff --git a/pahole.c b/pahole.c > > index 6c7e738..5ff0eaf 100644 > > --- a/pahole.c > > +++ b/pahole.c > > @@ -3348,8 +3348,8 @@ static enum load_steal_kind pahole_stealer(struct > > cu *cu, > > if (conf_load->reproducible_build) { > > ret = LSK__KEEPIT; // we're not processing the > > cu passed to this function, so keep it. > > - // Equivalent to LSK__DELETE since we processed > > this > > - cus__remove(cus, cu); > > - cu__delete(cu); > > } > > out_btf: > > if (!thr_data) // See comment about reproducibe_build above > > > > Yeah, Jiri also pointed out this call to cu__delete() was new, I was > trying to avoid having unprocessed 'struct cu' using too much memory, so > after processing it, delete them, but as you found out there are > references to that memory... > > > In other words, the problem is we remove/delete CUs when finished with > > them in each thread (when BTF is generated). However because the > > save/add_saved_funcs stashes CU references in the associated struct > > function * (to allow prototype comparison for the same function in > > different CUs), we end up with stale CU references and in this case the > > freed/nulled ptr_table caused an issue. As far as I can see we need to > > retain CUs until all BTF has been merged from threads. > > > With the fix in place, I'm seeing less then 100msec difference between > > reproducible/non-reproducible vmlinux BTF generation; that's great! > > Excellent! > > I'll remove it and add a note crediting you with the removal and having > the explanation about why its not possibe to delete it at that point > (references to the associated 'struct function'). So I removed that cus__remove + cu__delete and also the other one at the flush operation, leaving all cleaning up to cus__delete() time: ⬢[acme@toolbox pahole]$ git diff diff --git a/dwarves.c b/dwarves.c index fbc8d8aa0060b7d0..1ec259f50dbd3778 100644 --- a/dwarves.c +++ b/dwarves.c @@ -489,8 +489,12 @@ struct cu *cus__get_next_processable_cu(struct cus *cus) cu->state = CU__PROCESSING; goto found; case CU__PROCESSING: - // This will only happen when we get to parallel - // reproducible BTF encoding, libbpf dedup work needed here. + // This will happen when we get to parallel + // reproducible BTF encoding, libbpf dedup work needed + // here. The other possibility is when we're flushing + // the DWARF processed CUs when the parallel DWARF + // loading stoped and we still have CUs to encode to + // BTF because of ordering requirements. continue; case CU__UNPROCESSED: // The first entry isn't loaded, signal the diff --git a/pahole.c b/pahole.c index 6c7e73835b3e9139..77772bb42bb443ce 100644 --- a/pahole.c +++ b/pahole.c @@ -3347,9 +3347,9 @@ static enum load_steal_kind pahole_stealer(struct cu *cu, if (conf_load->reproducible_build) { ret = LSK__KEEPIT; // we're not processing the cu passed to this function, so keep it. - // Equivalent to LSK__DELETE since we processed this - cus__remove(cus, cu); - cu__delete(cu); + // Kinda equivalent to LSK__DELETE since we processed this, but we can't delete it + // as we stash references to entries in CUs for 'struct function' in btf_encoder__add_saved_funcs() + // and btf_encoder__save_func(), so we can't delete them here. - Alan Maguire } out_btf: if (!thr_data) // See comment about reproducibe_build above @@ -3667,9 +3667,6 @@ static int cus__flush_reproducible_build(struct cus *cus, struct btf_encoder *en err = btf_encoder__encode_cu(encoder, cu, conf_load); if (err < 0) break; - - cus__remove(cus, cu); - cu__delete(cu); ⬢[acme@toolbox pahole]$ It ends up taking a bit more time on this 14700K with 32Gb, I'll later try to remove that need to keep everything in memory and also double check this hunch that this is due to keeping everyuthing in memory. Can I take this (with the above patch, that is a bit bigger than yours) as a Tested-by + Reviewed-by you? - Arnald
On Tue, Apr 09, 2024 at 05:34:46PM +0300, Eduard Zingerman wrote: > Still, there are a few discrepancies in generated BTFs: some function > prototypes are included twice at random (about 30 IDs added/deleted). > This might be connected to Alan's suggestion and requires > further investigation. > > All in all, Arnaldo's approach with CU ordering looks simpler. I'm going, for now, with the simple approach, can I take your comments as a Reviewed-by: you? - Arnaldo
On Fri, 2024-04-12 at 17:37 -0300, Arnaldo Carvalho de Melo wrote: > On Tue, Apr 09, 2024 at 05:34:46PM +0300, Eduard Zingerman wrote: > > Still, there are a few discrepancies in generated BTFs: some function > > prototypes are included twice at random (about 30 IDs added/deleted). > > This might be connected to Alan's suggestion and requires > > further investigation. > > > > All in all, Arnaldo's approach with CU ordering looks simpler. > > I'm going, for now, with the simple approach, can I take your comments > as a Reviewed-by: you? If you are going to post next version I'll go through the new series and ack the patches (I understand the main idea but did not read the series in detail).
On Fri, Apr 12, 2024 at 11:40:44PM +0300, Eduard Zingerman wrote: > On Fri, 2024-04-12 at 17:37 -0300, Arnaldo Carvalho de Melo wrote: > > On Tue, Apr 09, 2024 at 05:34:46PM +0300, Eduard Zingerman wrote: > > > Still, there are a few discrepancies in generated BTFs: some function > > > prototypes are included twice at random (about 30 IDs added/deleted). > > > This might be connected to Alan's suggestion and requires > > > further investigation. > > > > > > All in all, Arnaldo's approach with CU ordering looks simpler. > > > > I'm going, for now, with the simple approach, can I take your comments > > as a Reviewed-by: you? > > If you are going to post next version I'll go through the new series > and ack the patches (I understand the main idea but did not read the > series in detail). Ok, its now in the next branch, I'll repost here as well. - Arnaldo
On Fri, 2024-04-12 at 18:09 -0300, Arnaldo Carvalho de Melo wrote: > On Fri, Apr 12, 2024 at 11:40:44PM +0300, Eduard Zingerman wrote: > > On Fri, 2024-04-12 at 17:37 -0300, Arnaldo Carvalho de Melo wrote: > > > On Tue, Apr 09, 2024 at 05:34:46PM +0300, Eduard Zingerman wrote: > > > > Still, there are a few discrepancies in generated BTFs: some function > > > > prototypes are included twice at random (about 30 IDs added/deleted). > > > > This might be connected to Alan's suggestion and requires > > > > further investigation. > > > > > > > > All in all, Arnaldo's approach with CU ordering looks simpler. > > > > > > I'm going, for now, with the simple approach, can I take your comments > > > as a Reviewed-by: you? > > > > If you are going to post next version I'll go through the new series > > and ack the patches (I understand the main idea but did not read the > > series in detail). > > Ok, its now in the next branch, I'll repost here as well. Great, thank you, I'll go through it over the weekend.