bpf selftest pyperf180.c compilation failure with latest last llvm18 (in development)

Message ID	3e3a8a30-dde0-43a1-981e-2274962780ef@linux.dev (mailing list archive)
State	Changes Requested
Delegated to:	BPF
Headers	Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A7C255672 for <bpf@vger.kernel.org>; Tue, 31 Oct 2023 03:59:05 +0000 (UTC) Message-ID: <3e3a8a30-dde0-43a1-981e-2274962780ef@linux.dev> Date: Mon, 30 Oct 2023 20:58:55 -0700 Precedence: bulk MIME-Version: 1.0 Content-Language: en-GB To: bpf <bpf@vger.kernel.org> Cc: Eddy Z <eddyz87@gmail.com>, Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <martin.lau@kernel.org> From: Yonghong Song <yonghong.song@linux.dev> Subject: bpf selftest pyperf180.c compilation failure with latest last llvm18 (in development) Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit
Series	bpf selftest pyperf180.c compilation failure with latest last llvm18 (in development)

Context	Check	Description
netdev/tree_selection	success	Not a local patch
bpf/vmtest-bpf-VM_Test-0	success	Logs for Lint
bpf/vmtest-bpf-VM_Test-1	success	Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-2	success	Logs for Validate matrix.py
bpf/vmtest-bpf-PR	success	PR summary
bpf/vmtest-bpf-VM_Test-3	success	Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-4	success	Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-5	success	Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-7	success	Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-6	success	Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-9	success	Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-VM_Test-8	success	Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-VM_Test-11	success	Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-13	success	Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-10	success	Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc
bpf/vmtest-bpf-VM_Test-12	success	Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-15	success	Logs for set-matrix
bpf/vmtest-bpf-VM_Test-17	success	Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-14	success	Logs for s390x-gcc / veristat
bpf/vmtest-bpf-VM_Test-19	success	Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-16	success	Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-21	success	Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-18	success	Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-23	success	Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-20	success	Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-22	success	Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24	success	Logs for x86_64-llvm-16 / build / build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-26	success	Logs for x86_64-llvm-16 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-28	success	Logs for x86_64-llvm-16 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-25	success	Logs for x86_64-llvm-16 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-27	success	Logs for x86_64-llvm-16 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-29	success	Logs for x86_64-llvm-16 / veristat

Yonghong Song Oct. 31, 2023, 3:58 a.m. UTC

With latest llvm18 (main branch of llvm-project repo), when building bpf selftests,
    [~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j

The following compilation error happens:
    fatal error: error in backend: Branch target out of insn range
    PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
    Stack dump:
    0.      Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi -I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter /home/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include -idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 --target=bpf -c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
    1.      <eof> parser at end of file
    2.      Code generation
    .....

The compilation failure only happens with -mcpu=v2 and -mcpu=v3. -mcpu=v4 is okay
since cpu=v4 supports 32-bit branch target offsets.
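
To make the range difference concrete, here is a minimal sketch in Python. It relies on the jump offset in a BPF instruction being a signed 16-bit field counted in instructions. It models cpu=v4 simply as a signed 32-bit range, following the statement above. The function name `branch_fits` and the cpu strings are made up for illustration and are not part of any tool.

```python
# Signed ranges for a branch offset, counted in instructions.
S16_MIN, S16_MAX = -(1 << 15), (1 << 15) - 1
S32_MIN, S32_MAX = -(1 << 31), (1 << 31) - 1


def branch_fits(offset, cpu):
    """Return True if `offset` is encodable for the given cpu level.

    Hypothetical helper: "v4" is treated as having a 32-bit offset,
    every other level as having the 16-bit offset.
    """
    if cpu == "v4":
        return S32_MIN <= offset <= S32_MAX
    return S16_MIN <= offset <= S16_MAX


if __name__ == "__main__":
    for cpu in ("v2", "v3", "v4"):
        print(cpu, branch_fits(40000, cpu))
```

A branch that has to skip 40000 instructions is out of range in this model for v2 and v3 but fits for v4, which matches the behaviour described above.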

The above failure is due to the upstream llvm patch
https://reviews.llvm.org/D143624
which changed some of the inlining order in the compiler.
The following change can temporarily work around the issue:

We will do some more investigation to see whether we can do
anything on the llvm side to mitigate the issue; if not, we will
provide a proper patch to fix the issue.

Eduard Zingerman Nov. 8, 2023, 2:13 a.m. UTC | #1

On Mon, 2023-10-30 at 20:58 -0700, Yonghong Song wrote:
> With latest llvm18 (main branch of llvm-project repo), when building bpf selftests,
>     [~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j
> 
> The following compilation error happens:
>     fatal error: error in backend: Branch target out of insn range
>     PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
>     Stack dump:
>     0.      Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include -I/home/yhs
>     /work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi -I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter /hom
>     e/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include -idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE
>     _ATOMICS_TESTS -O2 --target=bpf -c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
>     1.      <eof> parser at end of file
>     2.      Code generation
>     .....
> 
> The compilation failure only happens to cpu=v2 and cpu=v3. cpu=v4 is okay
> since cpu=v4 supports 32-bit branch target offset.
> 
> The above failure is due to upstream llvm patch
>     https://reviews.llvm.org/D143624
> where some inlining ordering are changed in the compiler.

Hi Yonghong, Alexei,

This is a follow-up to the off-list discussion. I think I have a
relatively simple two-pass algorithm that allows replacing jumps
longer than 2**16 with a series of shorter jumps using "trampoline"
goto instructions.

The basic idea of the algorithm is to:
- Visit basic blocks sequentially from first to last (after LLVM is
  done figuring out BB ordering), effectively splitting the basic blocks
  into two parts: "processed" and "unexplored".
- Insert "trampoline" jumps only on the "unexplored" side, thus
  guaranteeing that distances between basic blocks on the "processed"
  side never change.
- Maintain the list of "pending jumps":
  - Whenever a basic block is picked from the "unexplored" side,
    information about edges coming to and from this basic block is
    added as pending jumps:
    - backward edges are added before the basic block is processed;
    - forward edges are added after the basic block is processed.
  - A pending jump is a tuple (off,src,dst,backedge):
    - 'src', 'dst' - basic blocks (swapped for backedges);
    - 'off' - current distance from 'src'.
- When a basic block is picked from the "unexplored" side:
  - discard all pending jumps that have this basic block as 'dst';
  - pick a pending jump for which jmp.off + bb.size > MAX_JUMP_DISTANCE;
  - if such a jump is present:
    - split the basic block;
    - insert a trampoline instruction;
    - discard the pending jump and schedule a new pending jump with
      the trampoline as src, the original dst, and off=0;
  - if no such jump is present, move the basic block from "unexplored"
    to "processed";
  - when a basic block is moved from the "unexplored" side to
    "processed", bump the 'off' field of each pending jump by the size
    of the basic block.

So, the main part is to keep 'off' fields of pending jumps smaller
than MAX_JUMP_DISTANCE by inserting trampoline jumps.
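
For illustration only, here is a heavily simplified Python sketch of the pending-jump bookkeeping. It is not the model from [0]. It handles forward jumps only and assumes the jump is the last instruction of its source block. Instead of splitting a block it places a trampoline "island" between whole blocks: one leading goto that skips the island, plus one goto per redirected jump. It raises an error where splitting would be needed. The names `Pending`, `insert_trampolines` and `max_dist` are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity comparison, so list membership is exact
class Pending:
    off: int  # instructions laid out since the jump instruction
    src: str  # block (or trampoline slot) holding the jump
    dst: str  # block the jump must eventually reach


def insert_trampolines(blocks, jumps, max_dist):
    """blocks: [(name, size)] in layout order; jumps: [(src, dst)] forward edges.

    Returns (layout, hops): layout is the [(name, size)] list with islands
    added, hops lists (src, dst, distance) for every jump that is emitted.
    """
    order = {name: i for i, (name, _) in enumerate(blocks)}
    if any(order[s] >= order[d] for s, d in jumps):
        raise ValueError("this sketch handles forward jumps only")
    layout, hops, pending = [], [], []
    for name, size in blocks:
        # Jumps targeting this block are resolved at their current distance.
        hops += [(j.src, j.dst, j.off) for j in pending if j.dst == name]
        pending = [j for j in pending if j.dst != name]
        # Jumps that would go out of range once this block is laid out.
        # Iterate, because the island itself pushes other jumps further away.
        far = [j for j in pending if j.off + size > max_dist]
        while far:
            grown = [j for j in pending
                     if j.off + 1 + len(far) + size > max_dist]
            if len(grown) == len(far):
                break
            far = grown
        if far:
            far.sort(key=lambda j: -j.off)  # farthest jump gets the first slot
            island, n = f"island_before_{name}", len(far)
            for i, j in enumerate(far):
                reach = j.off + 1 + i  # +1 for the island's leading skip goto
                if reach > max_dist or n - 1 - i + size > max_dist:
                    raise ValueError("a block would have to be split")
                hops.append((j.src, f"{island}[{i}]", reach))
            pending = [j for j in pending if j not in far]
            for j in pending:
                j.off += 1 + n  # the island moves the remaining jumps away
            # Each trampoline becomes a new pending jump to the original dst;
            # its offset counts the island slots that follow it.
            pending += [Pending(n - 1 - i, f"{island}[{i}]", j.dst)
                        for i, j in enumerate(far)]
            layout.append((island, 1 + n))
        layout.append((name, size))
        for j in pending:
            j.off += size
        # Forward edges leaving this block start out at distance 0.
        pending += [Pending(0, s, d) for s, d in jumps if s == name]
    return layout, hops
```

For example, with blocks a(2), b(4), c(4), d(1), a single jump from a to d, and max_dist=6, the direct jump would span 8 instructions. The sketch places a two-instruction island before c and emits two hops of distance 5 and 4 instead.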

I have a Python model for this algorithm at [0]. It passes a few
hand-coded tests but I still need to do some property-based testing.
I think I need another day to finish testing; after that it
should be possible to translate this code to LLVM/C++ in a couple of days.

Please let me know if this is interesting.

Thanks,
Eduard

[0] https://gist.github.com/eddyz87/7e8d162b2bb2071769a9b3d960898405

[...]

Alexei Starovoitov Nov. 8, 2023, 8:05 p.m. UTC | #2

On Tue, Nov 7, 2023 at 6:13 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-30 at 20:58 -0700, Yonghong Song wrote:
> > With latest llvm18 (main branch of llvm-project repo), when building bpf selftests,
> >     [~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j
> >
> > The following compilation error happens:
> >     fatal error: error in backend: Branch target out of insn range
> >     PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
> >     Stack dump:
> >     0.      Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include -I/home/yhs
> >     /work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi -I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter /hom
> >     e/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include -idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE
> >     _ATOMICS_TESTS -O2 --target=bpf -c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
> >     1.      <eof> parser at end of file
> >     2.      Code generation
> >     .....
> >
> > The compilation failure only happens to cpu=v2 and cpu=v3. cpu=v4 is okay
> > since cpu=v4 supports 32-bit branch target offset.
> >
> > The above failure is due to upstream llvm patch
> >     https://reviews.llvm.org/D143624
> > where some inlining ordering are changed in the compiler.
>
> Hi Yonghong, Alexei,
>
> This is a followup for the off-list discussion. I think I have a
> relatively simple two pass algorithm that allows to replace jumps
> longer than 2**16 by series of shorter jumps using "trampoline"
> goto instructions.
>
> The basic idea of the algorithm is to:
> - Visit basic blocks sequentially from first to last (after LLVM is
>   done with figuring BB ordering), effectively splitting basic blocks
>   in two parts: "processed" and "unexplored".
> - Insert "trampoline" jumps only at "unexplored" side, thus
>   guaranteeing that distances between basic blocks on "processed" side
>   never change.
> - Maintain the list of "pending jumps":
>   - Whenever a basic block is picked from "unexplored" side
>     information about edges coming to and from this basic block is
>     added as pending jumps:
>     - backward edges are added before basic block is processed;
>     - forward edges are added after basic block is processed.
>   - Pending jump is a tuple (off,src,dst,backedge):
>     - 'src', 'dst' - basic blocks (swapped for backedges);
>     - 'off' - current distance from 'src'.
> - When a basic block is picked from "unexplored" side:
>   - discard all pending jumps that have this basic block as 'dst';
>   - peek a pending jump for which jmp.off + bb.size > MAX_JUMP_DISTANCE;
>   - if such jump is present:
>     - split basic block;
>     - insert trampoline instruction;
>     - discard pending jump and schedule new pending jump with
>       trampoline src, original dst, and off=0;
>   - if such jump is not present move basic block from "unexplored" to
>     "processed";
>   - when basic block is moved from "unexplored" side to "processed",
>     bump 'off' field of each pending jump by the size of the basic
>     block.
>
> So, the main part is to keep 'off' fields of pending jumps smaller
> than MAX_JUMP_DISTANCE by inserting trampoline jumps.
>
> I have a Python model for this algorithm at [0]. It passes a few
> hand-coded tests but I still need to do some property-based testing.
> I think I need another day to finish with testing, after that it
> should be possible to translate this code to LLVM/C++ in a couple of days.

The algorithm doesn't look simple.
Even if we change llvm to do this, it's not clear whether
the verifier will be able to consume such code.
imo it's too much effort to address a non-issue.
I'd just adjust the pyperf180.c test.

Eduard Zingerman Nov. 9, 2023, 1:20 a.m. UTC | #3

On Wed, 2023-11-08 at 12:05 -0800, Alexei Starovoitov wrote:
[...]
> The algorithm doesn't look simple.
> Even if we change llvm to do this, it's not clear whether
> the verifier will be able to consume such code.

Actually, I don't think that trampoline jumps could cause any trouble.

> imo it's too much effort to address a non-issue.
> I'd just adjust the pyperf180.c test.

Ok, I'll drop this. Thank you for taking a look.

Yonghong Song Nov. 9, 2023, 2:58 a.m. UTC | #4

On 11/8/23 5:20 PM, Eduard Zingerman wrote:
> On Wed, 2023-11-08 at 12:05 -0800, Alexei Starovoitov wrote:
> [...]
>> The algorithm doesn't look simple.
>> Even if we change llvm to do this, it's not clear whether
>> the verifier will be able to consume such code.
> Actually, I don't think that trampoline jumps could cause any trouble.
>
>> imo it's too much effort to address a non-issue.
>> I'd just adjust the pyperf180.c test.
> Ok, I'll drop this. Thank you for taking a look.

Thanks Eduard for doing the analysis for this! I will send
a patch soon to fix the selftest failure.
