Message ID | 20220703130924.57240-1-dlan@gentoo.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | RISC-V/bpf: Enable bpf_probe_read{, str}() | expand |
On 03/07/2022 14:09, Yixun Lan wrote: > Enable this option to fix a bcc error in RISC-V platform > > And, the error shows as follows: > > ~ # runqlen > WARNING: This target JIT is not designed for the host you are running. \ > If bad things happen, please choose a different -march switch. > bpf: Failed to load program: Invalid argument > 0: R1=ctx(off=0,imm=0) R10=fp0 > 0: (85) call bpf_get_current_task#35 ; R0_w=scalar() > 1: (b7) r6 = 0 ; R6_w=0 > 2: (7b) *(u64 *)(r10 -8) = r6 ; R6_w=P0 R10=fp0 fp-8_w=00000000 > 3: (07) r0 += 312 ; R0_w=scalar() > 4: (bf) r1 = r10 ; R1_w=fp0 R10=fp0 > 5: (07) r1 += -8 ; R1_w=fp-8 > 6: (b7) r2 = 8 ; R2_w=8 > 7: (bf) r3 = r0 ; R0_w=scalar(id=1) R3_w=scalar(id=1) > 8: (85) call bpf_probe_read#4 > unknown func bpf_probe_read#4 > processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 > > Traceback (most recent call last): > File "/usr/lib/python-exec/python3.9/runqlen", line 187, in <module> > b.attach_perf_event(ev_type=PerfType.SOFTWARE, > File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 1228, in attach_perf_event > fn = self.load_func(fn_name, BPF.PERF_EVENT) > File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 522, in load_func > raise Exception("Failed to load BPF program %s: %s" % > Exception: Failed to load BPF program b'do_perf_event': Invalid argument > > Signed-off-by: Yixun Lan <dlan@gentoo.org> Do you know what commit this fixes? Thanks, Conor. > --- > arch/riscv/Kconfig | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > index 32ffef9f6e5b4..da0016f1be6ce 100644 > --- a/arch/riscv/Kconfig > +++ b/arch/riscv/Kconfig > @@ -25,6 +25,7 @@ config RISCV > select ARCH_HAS_GIGANTIC_PAGE > select ARCH_HAS_KCOV > select ARCH_HAS_MMIOWB > + select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE > select ARCH_HAS_PTE_SPECIAL > select ARCH_HAS_SET_DIRECT_MAP if MMU > select ARCH_HAS_SET_MEMORY if MMU
Hi Conor Dooley: On 14:12 Sun 03 Jul , Conor Dooley wrote: > On 03/07/2022 14:09, Yixun Lan wrote: > > Enable this option to fix a bcc error in RISC-V platform > > > > And, the error shows as follows: > > > > ~ # runqlen > > WARNING: This target JIT is not designed for the host you are running. \ > > If bad things happen, please choose a different -march switch. > > bpf: Failed to load program: Invalid argument > > 0: R1=ctx(off=0,imm=0) R10=fp0 > > 0: (85) call bpf_get_current_task#35 ; R0_w=scalar() > > 1: (b7) r6 = 0 ; R6_w=0 > > 2: (7b) *(u64 *)(r10 -8) = r6 ; R6_w=P0 R10=fp0 fp-8_w=00000000 > > 3: (07) r0 += 312 ; R0_w=scalar() > > 4: (bf) r1 = r10 ; R1_w=fp0 R10=fp0 > > 5: (07) r1 += -8 ; R1_w=fp-8 > > 6: (b7) r2 = 8 ; R2_w=8 > > 7: (bf) r3 = r0 ; R0_w=scalar(id=1) R3_w=scalar(id=1) > > 8: (85) call bpf_probe_read#4 > > unknown func bpf_probe_read#4 > > processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 > > > > Traceback (most recent call last): > > File "/usr/lib/python-exec/python3.9/runqlen", line 187, in <module> > > b.attach_perf_event(ev_type=PerfType.SOFTWARE, > > File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 1228, in attach_perf_event > > fn = self.load_func(fn_name, BPF.PERF_EVENT) > > File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 522, in load_func > > raise Exception("Failed to load BPF program %s: %s" % > > Exception: Failed to load BPF program b'do_perf_event': Invalid argument > > > > Signed-off-by: Yixun Lan <dlan@gentoo.org> > > Do you know what commit this fixes? > Thanks, > Conor. > I think this is effectively broken for RISC-V 64 at the commit: 0ebeea8ca8a4: bpf: Restrict bpf_probe_read{, str}() only to archs where they work However, the bcc tools haven't got BPF support for RISC-V at that time, so no one noticed it I can add a Fixes tag if you think it's a proper way > > --- > > arch/riscv/Kconfig | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > > index 32ffef9f6e5b4..da0016f1be6ce 100644 > > --- a/arch/riscv/Kconfig > > +++ b/arch/riscv/Kconfig > > @@ -25,6 +25,7 @@ config RISCV > > select ARCH_HAS_GIGANTIC_PAGE > > select ARCH_HAS_KCOV > > select ARCH_HAS_MMIOWB > > + select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE > > select ARCH_HAS_PTE_SPECIAL > > select ARCH_HAS_SET_DIRECT_MAP if MMU > > select ARCH_HAS_SET_MEMORY if MMU
On Sun, Jul 03, 2022 at 09:09:24PM +0800, Yixun Lan wrote: > Enable this option to fix a bcc error in RISC-V platform > > And, the error shows as follows: These should not be enabled on new platforms. Use the proper helpers to probe kernel vs user pointers instead.
On 04/07/2022 03:20, Yixun Lan wrote: > [You don't often get email from dlan@gentoo.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] > > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > Hi Conor Dooley: > > On 14:12 Sun 03 Jul , Conor Dooley wrote: >> On 03/07/2022 14:09, Yixun Lan wrote: >>> Enable this option to fix a bcc error in RISC-V platform >>> >>> And, the error shows as follows: >>> >>> ~ # runqlen >>> WARNING: This target JIT is not designed for the host you are running. \ >>> If bad things happen, please choose a different -march switch. >>> bpf: Failed to load program: Invalid argument >>> 0: R1=ctx(off=0,imm=0) R10=fp0 >>> 0: (85) call bpf_get_current_task#35 ; R0_w=scalar() >>> 1: (b7) r6 = 0 ; R6_w=0 >>> 2: (7b) *(u64 *)(r10 -8) = r6 ; R6_w=P0 R10=fp0 fp-8_w=00000000 >>> 3: (07) r0 += 312 ; R0_w=scalar() >>> 4: (bf) r1 = r10 ; R1_w=fp0 R10=fp0 >>> 5: (07) r1 += -8 ; R1_w=fp-8 >>> 6: (b7) r2 = 8 ; R2_w=8 >>> 7: (bf) r3 = r0 ; R0_w=scalar(id=1) R3_w=scalar(id=1) >>> 8: (85) call bpf_probe_read#4 >>> unknown func bpf_probe_read#4 >>> processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 >>> >>> Traceback (most recent call last): >>> File "/usr/lib/python-exec/python3.9/runqlen", line 187, in <module> >>> b.attach_perf_event(ev_type=PerfType.SOFTWARE, >>> File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 1228, in attach_perf_event >>> fn = self.load_func(fn_name, BPF.PERF_EVENT) >>> File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 522, in load_func >>> raise Exception("Failed to load BPF program %s: %s" % >>> Exception: Failed to load BPF program b'do_perf_event': Invalid argument >>> >>> Signed-off-by: Yixun Lan <dlan@gentoo.org> >> >> Do you know what commit this fixes? >> Thanks, >> Conor. >> > > I think this is effectively broken for RISC-V 64 at the commit: > 0ebeea8ca8a4: bpf: Restrict bpf_probe_read{, str}() only to archs where they work > > However, the bcc tools haven't got BPF support for RISC-V at that time, > so no one noticed it > > I can add a Fixes tag if you think it's a proper way Hmm, I had a read of that commit and the breakage sounded intentional. I ran a git log --grep "0ebeea8ca8a4" & it seems like a mixed bag of fixes tags. Whether or not it deserves the explicit tag, mentioning the commit would be useful. The tag would be: Fixes: 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work") Thanks, Conor. > >>> --- >>> arch/riscv/Kconfig | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig >>> index 32ffef9f6e5b4..da0016f1be6ce 100644 >>> --- a/arch/riscv/Kconfig >>> +++ b/arch/riscv/Kconfig >>> @@ -25,6 +25,7 @@ config RISCV >>> select ARCH_HAS_GIGANTIC_PAGE >>> select ARCH_HAS_KCOV >>> select ARCH_HAS_MMIOWB >>> + select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE >>> select ARCH_HAS_PTE_SPECIAL >>> select ARCH_HAS_SET_DIRECT_MAP if MMU >>> select ARCH_HAS_SET_MEMORY if MMU > > -- > Yixun Lan (dlan) > Gentoo Linux Developer > GPG Key ID AABEFD55 > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
On Sun, Jul 3, 2022 at 10:53 PM Christoph Hellwig <hch@infradead.org> wrote: > > On Sun, Jul 03, 2022 at 09:09:24PM +0800, Yixun Lan wrote: > > Enable this option to fix a bcc error in RISC-V platform > > > > And, the error shows as follows: > > These should not be enabled on new platforms. Use the proper helpers > to probe kernel vs user pointers instead. riscv existed as of [0], so I'd argue it is a proper bug fix, as corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should have been added back then. But I also agree that BCC tools should be updated to use proper bpf_probe_read_{kernel,user}[_str()] helpers, please contribute such fixes to BCC tools and BCC itself as well. Cc'ed Alan as his ksnoop in libbpf-tools seems to be using bpf_probe_read() as well and needs to be fixed. [0] 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work")
On 7/5/22 10:00 PM, Andrii Nakryiko wrote: > On Sun, Jul 3, 2022 at 10:53 PM Christoph Hellwig <hch@infradead.org> wrote: >> >> On Sun, Jul 03, 2022 at 09:09:24PM +0800, Yixun Lan wrote: >>> Enable this option to fix a bcc error in RISC-V platform >>> >>> And, the error shows as follows: >> >> These should not be enabled on new platforms. Use the proper helpers >> to probe kernel vs user pointers instead. > > riscv existed as of [0], so I'd argue it is a proper bug fix, as > corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should > have been added back then. > > But I also agree that BCC tools should be updated to use proper > bpf_probe_read_{kernel,user}[_str()] helpers, please contribute such > fixes to BCC tools and BCC itself as well. Cc'ed Alan as his ksnoop in > libbpf-tools seems to be using bpf_probe_read() as well and needs to > be fixed. Yixun, the bcc change looks like below: --- a/src/cc/frontends/clang/b_frontend_action.cc +++ b/src/cc/frontends/clang/b_frontend_action.cc @@ -132,7 +132,7 @@ static std::string check_bpf_probe_read_user(llvm::StringRef probe, /* For arch with overlapping address space, dont use bpf_probe_read for * user read. Just error out */ -#if defined(__s390x__) +#if defined(__s390x__) || defined(__riscv_) overlap_addr = true; return ""; #endif Basically, prevent using bpf_probe_read() helper, so it will force user to use bpf_probe_read_user() or bpf_probe_read_kernel(). and this should make it work for old kernels. > > [0] 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to > archs where they work")
On Tue, Jul 05, 2022 at 10:00:42PM -0700, Andrii Nakryiko wrote: > riscv existed as of [0], so I'd argue it is a proper bug fix, as > corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should > have been added back then. How much of an eBPF ecosystem was there on RISC-V at the point?
On Tue, Jul 05, 2022 at 11:41:30PM -0700, Yonghong Song wrote: > > > On 7/5/22 10:00 PM, Andrii Nakryiko wrote: > > On Sun, Jul 3, 2022 at 10:53 PM Christoph Hellwig <hch@infradead.org> wrote: > > > > > > On Sun, Jul 03, 2022 at 09:09:24PM +0800, Yixun Lan wrote: > > > > Enable this option to fix a bcc error in RISC-V platform > > > > > > > > And, the error shows as follows: > > > > > > These should not be enabled on new platforms. Use the proper helpers > > > to probe kernel vs user pointers instead. > > > > riscv existed as of [0], so I'd argue it is a proper bug fix, as > > corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should > > have been added back then. > > > > But I also agree that BCC tools should be updated to use proper > > bpf_probe_read_{kernel,user}[_str()] helpers, please contribute such > > fixes to BCC tools and BCC itself as well. Cc'ed Alan as his ksnoop in > > libbpf-tools seems to be using bpf_probe_read() as well and needs to > > be fixed. > > Yixun, the bcc change looks like below: No, this is broken. bcc needs to stop using bpf_probe_read entirely for user addresses and unconditionally use bpf_probe_read_user first and only fall back to bpf_probe_read if not supported.
On Wed, Jul 6, 2022 at 12:00 AM Christoph Hellwig <hch@infradead.org> wrote: > > On Tue, Jul 05, 2022 at 10:00:42PM -0700, Andrii Nakryiko wrote: > > riscv existed as of [0], so I'd argue it is a proper bug fix, as > > corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should > > have been added back then. > > How much of an eBPF ecosystem was there on RISC-V at the point? No idea, never used RISC-V and didn't pay much attention. But why does it matter?
Hi Christoph, YongHong On 00:01 Wed 06 Jul , Christoph Hellwig wrote: > On Tue, Jul 05, 2022 at 11:41:30PM -0700, Yonghong Song wrote: > > > > > > On 7/5/22 10:00 PM, Andrii Nakryiko wrote: > > > On Sun, Jul 3, 2022 at 10:53 PM Christoph Hellwig <hch@infradead.org> wrote: > > > > > > > > On Sun, Jul 03, 2022 at 09:09:24PM +0800, Yixun Lan wrote: > > > > > Enable this option to fix a bcc error in RISC-V platform > > > > > > > > > > And, the error shows as follows: > > > > > > > > These should not be enabled on new platforms. Use the proper helpers > > > > to probe kernel vs user pointers instead. > > > > > > riscv existed as of [0], so I'd argue it is a proper bug fix, as > > > corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should > > > have been added back then. > > > > > > But I also agree that BCC tools should be updated to use proper > > > bpf_probe_read_{kernel,user}[_str()] helpers, please contribute such > > > fixes to BCC tools and BCC itself as well. Cc'ed Alan as his ksnoop in > > > libbpf-tools seems to be using bpf_probe_read() as well and needs to > > > be fixed. > > > > Yixun, the bcc change looks like below: > > No, this is broken. bcc needs to stop using bpf_probe_read entirely > for user addresses and unconditionally use bpf_probe_read_user first > and only fall back to bpf_probe_read if not supported. I agree with Christoph, there is something in the bcc tools that need to adjust in order to use new bpf_probe_read_{kernel,user} Please check the ongoing discussion [0] in the bcc tools if you're interested in, advice and comments are welcome [0] https://github.com/iovisor/bcc/pull/4085#issuecomment-1179446738
On Sat, Jul 09, 2022 at 09:01:10AM +0800, Yixun Lan wrote: > Please check the ongoing discussion [0] in the bcc tools if you're > interested in, advice and comments are welcome > > [0] https://github.com/iovisor/bcc/pull/4085#issuecomment-1179446738 I can't find a way to post there, as replying eems to require a login. Is there a mailing list discussion somewhere that is broadly accessible?
On Fri, Jul 08, 2022 at 03:22:51PM -0700, Andrii Nakryiko wrote: > On Wed, Jul 6, 2022 at 12:00 AM Christoph Hellwig <hch@infradead.org> wrote: > > > > On Tue, Jul 05, 2022 at 10:00:42PM -0700, Andrii Nakryiko wrote: > > > riscv existed as of [0], so I'd argue it is a proper bug fix, as > > > corresponding select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE should > > > have been added back then. > > > > How much of an eBPF ecosystem was there on RISC-V at the point? > > No idea, never used RISC-V and didn't pay much attention. But why does > it matter? It matters because we should not spread broken legacy interfaces.
Hi Christoph: On 23:24 Fri 08 Jul , Christoph Hellwig wrote: > On Sat, Jul 09, 2022 at 09:01:10AM +0800, Yixun Lan wrote: > > Please check the ongoing discussion [0] in the bcc tools if you're > > interested in, advice and comments are welcome > > > > [0] https://github.com/iovisor/bcc/pull/4085#issuecomment-1179446738 > > I can't find a way to post there, as replying eems to require a login. > Is there a mailing list discussion somewhere that is broadly accessible? never mind, I think the logic is quite clear, we can do something in bcc: 1) adopt new _{kernel,user} interface whenever possible, this will work fine for all arch with new kernel versions 2) for old kernel versions which lack the _{kernel,user} support, fall back to old bpf_probe_read(), but take care of the Archs which have overlaping address space - like s390, and just error out for it
On Sat, Jul 09, 2022 at 04:48:05PM +0800, Yixun Lan wrote: > never mind, I think the logic is quite clear, we can do something in bcc: > > 1) adopt new _{kernel,user} interface whenever possible, this will > work fine for all arch with new kernel versions > > 2) for old kernel versions which lack the _{kernel,user} support, > fall back to old bpf_probe_read(), but take care of the Archs which > have overlaping address space - like s390, and just error out for it Yes, that is the right thing to do.
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 32ffef9f6e5b4..da0016f1be6ce 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -25,6 +25,7 @@ config RISCV select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_KCOV select ARCH_HAS_MMIOWB + select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_SET_DIRECT_MAP if MMU select ARCH_HAS_SET_MEMORY if MMU
Enable this option to fix a bcc error in RISC-V platform And, the error shows as follows: ~ # runqlen WARNING: This target JIT is not designed for the host you are running. \ If bad things happen, please choose a different -march switch. bpf: Failed to load program: Invalid argument 0: R1=ctx(off=0,imm=0) R10=fp0 0: (85) call bpf_get_current_task#35 ; R0_w=scalar() 1: (b7) r6 = 0 ; R6_w=0 2: (7b) *(u64 *)(r10 -8) = r6 ; R6_w=P0 R10=fp0 fp-8_w=00000000 3: (07) r0 += 312 ; R0_w=scalar() 4: (bf) r1 = r10 ; R1_w=fp0 R10=fp0 5: (07) r1 += -8 ; R1_w=fp-8 6: (b7) r2 = 8 ; R2_w=8 7: (bf) r3 = r0 ; R0_w=scalar(id=1) R3_w=scalar(id=1) 8: (85) call bpf_probe_read#4 unknown func bpf_probe_read#4 processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 Traceback (most recent call last): File "/usr/lib/python-exec/python3.9/runqlen", line 187, in <module> b.attach_perf_event(ev_type=PerfType.SOFTWARE, File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 1228, in attach_perf_event fn = self.load_func(fn_name, BPF.PERF_EVENT) File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 522, in load_func raise Exception("Failed to load BPF program %s: %s" % Exception: Failed to load BPF program b'do_perf_event': Invalid argument Signed-off-by: Yixun Lan <dlan@gentoo.org> --- arch/riscv/Kconfig | 1 + 1 file changed, 1 insertion(+)