diff mbox series

RISC-V uprobe bug (Was: Re: WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342)

Message ID 87v8d19aun.fsf@all.your.base.are.belong.to.us (mailing list archive)
State RFC
Delegated to: BPF
Headers show
Series RISC-V uprobe bug (Was: Re: WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342) | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Björn Töpel Aug. 26, 2023, 1:44 p.m. UTC
Björn Töpel <bjorn@kernel.org> writes:

> I'm chasing a workqueue hang on RISC-V/qemu (TCG), using the bpf
> selftests on bpf-next 9e3b47abeb8f.
>
> I'm able to reproduce the hang by multiple runs of:
>  | ./test_progs -a link_api -a linked_list
> I'm currently investigating that.

+Guo for uprobe

This was an interesting bug. The hang is an ebreak (RISC-V breakpoint),
that puts the kernel into an infinite loop.

To reproduce, simply run the BPF selftest:
./test_progs -v -a link_api -a linked_list

First the link_api test is being run, which exercises the uprobe
functionality. The link_api test completes, and test_progs will still
have the uprobe active/enabled. Next the linked_list test triggered a
WARN_ON (which is implemented via ebreak as well).

Now, handle_break() is entered, and the uprobe_breakpoint_handler()
returns true exiting the handle_break(), which returns to the WARN
ebreak, and we have merry-go-round.

Lucky for the RISC-V folks, the BPF memory handler had a WARN that
surfaced the bug! ;-)

This patch fixes the issue, but it's probably a prettier variant:
--8<--
--8<--

I'll cook a cleaner/proper patch for this, unless the uprobes folks has
a better solution.


Björn

Comments

Nam Cao Aug. 26, 2023, 6:12 p.m. UTC | #1
On Sat, Aug 26, 2023 at 03:44:48PM +0200, Björn Töpel wrote:
> Björn Töpel <bjorn@kernel.org> writes:
> 
> > I'm chasing a workqueue hang on RISC-V/qemu (TCG), using the bpf
> > selftests on bpf-next 9e3b47abeb8f.
> >
> > I'm able to reproduce the hang by multiple runs of:
> >  | ./test_progs -a link_api -a linked_list
> > I'm currently investigating that.
> 
> +Guo for uprobe
> 
> This was an interesting bug. The hang is an ebreak (RISC-V breakpoint),
> that puts the kernel into an infinite loop.
> 
> To reproduce, simply run the BPF selftest:
> ./test_progs -v -a link_api -a linked_list
> 
> First the link_api test is being run, which exercises the uprobe
> functionality. The link_api test completes, and test_progs will still
> have the uprobe active/enabled. Next the linked_list test triggered a
> WARN_ON (which is implemented via ebreak as well).
> 
> Now, handle_break() is entered, and the uprobe_breakpoint_handler()
> returns true exiting the handle_break(), which returns to the WARN
> ebreak, and we have merry-go-round.
> 
> Lucky for the RISC-V folks, the BPF memory handler had a WARN that
> surfaced the bug! ;-)

Thanks for the analysis.

I couldn't reproduce the problem, so I am just taking a guess here. The problem
is bebcause uprobes didn't find a probe point at that ebreak instruction. However,
it also doesn't think a ebreak instruction is there, then it got confused and just
return back to the ebreak instruction, then everything repeats.

The reason why uprobes didn't think there is a ebreak instruction is because
is_trap_insn() only returns true if it is a 32-bit ebreak, or 16-bit c.ebreak if
C extension is available, not both. So a 32-bit ebreak is not correctly recognized
as a trap instruction.

If my guess is correct, the following should fix it. Can you please try if it works?

(this is the first time I send a patch this way, so please let me know if you can't apply)

Best regards,
Nam

---
 arch/riscv/kernel/probes/uprobes.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/riscv/kernel/probes/uprobes.c b/arch/riscv/kernel/probes/uprobes.c
index 194f166b2cc4..91f4ce101cd1 100644
--- a/arch/riscv/kernel/probes/uprobes.c
+++ b/arch/riscv/kernel/probes/uprobes.c
@@ -3,6 +3,7 @@
 #include <linux/highmem.h>
 #include <linux/ptrace.h>
 #include <linux/uprobes.h>
+#include <asm/insn.h>
 
 #include "decode-insn.h"
 
@@ -17,6 +18,15 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
 #endif
 }
 
+bool is_trap_insn(uprobe_opcode_t *insn)
+{
+#ifdef CONFIG_RISCV_ISA_C
+	if (riscv_insn_is_c_ebreak(*insn))
+		return true;
+#endif
+	return riscv_insn_is_ebreak(*insn);
+}
+
 unsigned long uprobe_get_swbp_addr(struct pt_regs *regs)
 {
 	return instruction_pointer(regs);
Nam Cao Aug. 26, 2023, 6:31 p.m. UTC | #2
On Sat, Aug 26, 2023 at 08:12:30PM +0200, Nam Cao wrote:
> On Sat, Aug 26, 2023 at 03:44:48PM +0200, Björn Töpel wrote:
> > Björn Töpel <bjorn@kernel.org> writes:
> > 
> > > I'm chasing a workqueue hang on RISC-V/qemu (TCG), using the bpf
> > > selftests on bpf-next 9e3b47abeb8f.
> > >
> > > I'm able to reproduce the hang by multiple runs of:
> > >  | ./test_progs -a link_api -a linked_list
> > > I'm currently investigating that.
> > 
> > +Guo for uprobe
> > 
> > This was an interesting bug. The hang is an ebreak (RISC-V breakpoint),
> > that puts the kernel into an infinite loop.
> > 
> > To reproduce, simply run the BPF selftest:
> > ./test_progs -v -a link_api -a linked_list
> > 
> > First the link_api test is being run, which exercises the uprobe
> > functionality. The link_api test completes, and test_progs will still
> > have the uprobe active/enabled. Next the linked_list test triggered a
> > WARN_ON (which is implemented via ebreak as well).
> > 
> > Now, handle_break() is entered, and the uprobe_breakpoint_handler()
> > returns true exiting the handle_break(), which returns to the WARN
> > ebreak, and we have merry-go-round.
> > 
> > Lucky for the RISC-V folks, the BPF memory handler had a WARN that
> > surfaced the bug! ;-)
> 
> Thanks for the analysis.
> 
> I couldn't reproduce the problem, so I am just taking a guess here. The problem
> is bebcause uprobes didn't find a probe point at that ebreak instruction. However,
> it also doesn't think a ebreak instruction is there, then it got confused and just
> return back to the ebreak instruction, then everything repeats.
> 
> The reason why uprobes didn't think there is a ebreak instruction is because
> is_trap_insn() only returns true if it is a 32-bit ebreak, or 16-bit c.ebreak if
> C extension is available, not both. So a 32-bit ebreak is not correctly recognized
> as a trap instruction.

I feel like I wasn't very clear with this: I was talking about handle_swbp() in
kernel/events/uprobes.c. In this function, the call to find_active_uprobe() should
return false. Then uprobes check if the trap instruction is still there by
calling is_trap_insn(), who correctly says "no". So uprobes assume it is safe to
just comeback to that address. If is_trap_insn() correctly returns true, then
uprobes would know that this is a ebreak, but not a probe, and handle thing correctly.

Best regards,
Nam
Björn Töpel Aug. 27, 2023, 8:11 a.m. UTC | #3
Nam,

Nam Cao <namcaov@gmail.com> writes:

> On Sat, Aug 26, 2023 at 03:44:48PM +0200, Björn Töpel wrote:
>> Björn Töpel <bjorn@kernel.org> writes:
>> 
>> > I'm chasing a workqueue hang on RISC-V/qemu (TCG), using the bpf
>> > selftests on bpf-next 9e3b47abeb8f.
>> >
>> > I'm able to reproduce the hang by multiple runs of:
>> >  | ./test_progs -a link_api -a linked_list
>> > I'm currently investigating that.
>> 
>> +Guo for uprobe
>> 
>> This was an interesting bug. The hang is an ebreak (RISC-V breakpoint),
>> that puts the kernel into an infinite loop.
>> 
>> To reproduce, simply run the BPF selftest:
>> ./test_progs -v -a link_api -a linked_list
>> 
>> First the link_api test is being run, which exercises the uprobe
>> functionality. The link_api test completes, and test_progs will still
>> have the uprobe active/enabled. Next the linked_list test triggered a
>> WARN_ON (which is implemented via ebreak as well).
>> 
>> Now, handle_break() is entered, and the uprobe_breakpoint_handler()
>> returns true exiting the handle_break(), which returns to the WARN
>> ebreak, and we have merry-go-round.
>> 
>> Lucky for the RISC-V folks, the BPF memory handler had a WARN that
>> surfaced the bug! ;-)
>
> Thanks for the analysis.
>
> I couldn't reproduce the problem, so I am just taking a guess here. The problem
> is bebcause uprobes didn't find a probe point at that ebreak instruction. However,
> it also doesn't think a ebreak instruction is there, then it got confused and just
> return back to the ebreak instruction, then everything repeats.
>
> The reason why uprobes didn't think there is a ebreak instruction is because
> is_trap_insn() only returns true if it is a 32-bit ebreak, or 16-bit c.ebreak if
> C extension is available, not both. So a 32-bit ebreak is not correctly recognized
> as a trap instruction.
>
> If my guess is correct, the following should fix it. Can you please try if it works?
>
> (this is the first time I send a patch this way, so please let me know if you can't apply)
>
> Best regards,
> Nam
>
> ---
>  arch/riscv/kernel/probes/uprobes.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/arch/riscv/kernel/probes/uprobes.c b/arch/riscv/kernel/probes/uprobes.c
> index 194f166b2cc4..91f4ce101cd1 100644
> --- a/arch/riscv/kernel/probes/uprobes.c
> +++ b/arch/riscv/kernel/probes/uprobes.c
> @@ -3,6 +3,7 @@
>  #include <linux/highmem.h>
>  #include <linux/ptrace.h>
>  #include <linux/uprobes.h>
> +#include <asm/insn.h>
>  
>  #include "decode-insn.h"
>  
> @@ -17,6 +18,15 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
>  #endif
>  }
>  
> +bool is_trap_insn(uprobe_opcode_t *insn)
> +{
> +#ifdef CONFIG_RISCV_ISA_C
> +	if (riscv_insn_is_c_ebreak(*insn))
> +		return true;
> +#endif
> +	return riscv_insn_is_ebreak(*insn);
> +}
> +
>  unsigned long uprobe_get_swbp_addr(struct pt_regs *regs)
>  {
>  	return instruction_pointer(regs);
> -- 
> 2.34.1

The default implementation of is_trap_insn() which RISC-V is using calls
is_swbp_insn(), which is doing what your patch does. Your patch does not
address the issue.

We're taking an ebreak trap from kernel space. In this case we should
never look for a userland (uprobe) handler at all, only the kprobe
handlers should be considered.

In this case, the TIF_UPROBE is incorrectly set, and incorrectly (not)
handled in the "common entry" exit path, which takes us to the infinite
loop.


Björn
Nam Cao Aug. 27, 2023, 8:35 a.m. UTC | #4
On Sun, Aug 27, 2023 at 10:11:25AM +0200, Björn Töpel wrote:
> The default implementation of is_trap_insn() which RISC-V is using calls
> is_swbp_insn(), which is doing what your patch does. Your patch does not
> address the issue.

is_swbp_insn() does this:

        #ifdef CONFIG_RISCV_ISA_C
                return (*insn & 0xffff) == UPROBE_SWBP_INSN;
        #else
                return *insn == UPROBE_SWBP_INSN;
        #endif

...so it doesn't even check for 32-bit ebreak if C extension is on. My patch
is not the same.

But okay, if it doesn't solve the problem, then I must be wrong somewhere.
 
> We're taking an ebreak trap from kernel space. In this case we should
> never look for a userland (uprobe) handler at all, only the kprobe
> handlers should be considered.
> 
> In this case, the TIF_UPROBE is incorrectly set, and incorrectly (not)
> handled in the "common entry" exit path, which takes us to the infinite
> loop.

This change makes a lot of sense, no reason to check for uprobes if exception
comes from the kernel.

Best regards,
Nam
Björn Töpel Aug. 27, 2023, 9:04 a.m. UTC | #5
Nam Cao <namcaov@gmail.com> writes:

> On Sun, Aug 27, 2023 at 10:11:25AM +0200, Björn Töpel wrote:
>> The default implementation of is_trap_insn() which RISC-V is using calls
>> is_swbp_insn(), which is doing what your patch does. Your patch does not
>> address the issue.
>
> is_swbp_insn() does this:
>
>         #ifdef CONFIG_RISCV_ISA_C
>                 return (*insn & 0xffff) == UPROBE_SWBP_INSN;
>         #else
>                 return *insn == UPROBE_SWBP_INSN;
>         #endif
>
> ...so it doesn't even check for 32-bit ebreak if C extension is on. My patch
> is not the same.

Ah, was too quick.

AFAIU uprobes *always* uses c.ebreak when CONFIG_RISCV_ISA_C is set, and
ebreak otherwise. That's the reason is_swbp_insn() is implemented like
that. If that's not the case, there's a another bug, that your patches
addresses.

In that case we need an arch specific set_swbp() implementation together
with your patch.

Guo, thoughts? ...text patching on RISC-V is still a big WIP.

> But okay, if it doesn't solve the problem, then I must be wrong somewhere.

Yes, this bug is another issue.

>> We're taking an ebreak trap from kernel space. In this case we should
>> never look for a userland (uprobe) handler at all, only the kprobe
>> handlers should be considered.
>> 
>> In this case, the TIF_UPROBE is incorrectly set, and incorrectly (not)
>> handled in the "common entry" exit path, which takes us to the infinite
>> loop.
>
> This change makes a lot of sense, no reason to check for uprobes if exception
> comes from the kernel.

Ok! I sent a patch proper for this.


Björn
Nam Cao Aug. 27, 2023, 9:39 a.m. UTC | #6
On Sun, Aug 27, 2023 at 11:04:34AM +0200, Björn Töpel wrote:
> Nam Cao <namcaov@gmail.com> writes:
> 
> > On Sun, Aug 27, 2023 at 10:11:25AM +0200, Björn Töpel wrote:
> >> The default implementation of is_trap_insn() which RISC-V is using calls
> >> is_swbp_insn(), which is doing what your patch does. Your patch does not
> >> address the issue.
> >
> > is_swbp_insn() does this:
> >
> >         #ifdef CONFIG_RISCV_ISA_C
> >                 return (*insn & 0xffff) == UPROBE_SWBP_INSN;
> >         #else
> >                 return *insn == UPROBE_SWBP_INSN;
> >         #endif
> >
> > ...so it doesn't even check for 32-bit ebreak if C extension is on. My patch
> > is not the same.
> 
> Ah, was too quick.
> 
> AFAIU uprobes *always* uses c.ebreak when CONFIG_RISCV_ISA_C is set, and
> ebreak otherwise. That's the reason is_swbp_insn() is implemented like
> that.

That's what I understand too.

> If that's not the case, there's a another bug, that your patches
> addresses.

I think it's a bug regardless. is_trap_insn() is used by uprobes to figure out
if there is an instruction that generates trap exception, not just instructions
that are "SWBP". The reason is because when there is a trap, but uprobes doesn't
see a probe installed here, it needs is_trap_insn() to figure out if the trap
is generated by ebreak from something else, or because the probe is just removed.
In the latter case, uprobes will return back, because probe has already been removed,
so it should be safe to do so. That's why I think the incorrect is_swbp_insn()
would cause a hang, because uprobes incorrectly thinks there is no ebreak there,
so it should be okay to go back, but there actually is.

So, from my understanding, if uprobes encounter a 32-bit ebreak for any reason,
the kernel would hang. I think your patch is a great addition nonetheless, but I
am guessing that it only masks the problem by preventing uprobes from seeing the
32-bit ebreak in the specific test, not really solve it. So, if there is a 32-bit
ebreak in userspace, the bug still causes the kernel to hang.

I am still quite confident of my logic, so I would be very suprised if my fix
doesn't solve the reported hang. Do you mind testing my patch? My potato of a
laptop unfortunately cannot run the test :(

Best regards,
Nam
Björn Töpel Aug. 27, 2023, 7:20 p.m. UTC | #7
Nam Cao <namcaov@gmail.com> writes:

> On Sun, Aug 27, 2023 at 11:04:34AM +0200, Björn Töpel wrote:
>> Nam Cao <namcaov@gmail.com> writes:
>> 
>> > On Sun, Aug 27, 2023 at 10:11:25AM +0200, Björn Töpel wrote:
>> >> The default implementation of is_trap_insn() which RISC-V is using calls
>> >> is_swbp_insn(), which is doing what your patch does. Your patch does not
>> >> address the issue.
>> >
>> > is_swbp_insn() does this:
>> >
>> >         #ifdef CONFIG_RISCV_ISA_C
>> >                 return (*insn & 0xffff) == UPROBE_SWBP_INSN;
>> >         #else
>> >                 return *insn == UPROBE_SWBP_INSN;
>> >         #endif
>> >
>> > ...so it doesn't even check for 32-bit ebreak if C extension is on. My patch
>> > is not the same.
>> 
>> Ah, was too quick.
>> 
>> AFAIU uprobes *always* uses c.ebreak when CONFIG_RISCV_ISA_C is set, and
>> ebreak otherwise. That's the reason is_swbp_insn() is implemented like
>> that.
>
> That's what I understand too.
>
>> If that's not the case, there's a another bug, that your patches
>> addresses.
>
> I think it's a bug regardless. is_trap_insn() is used by uprobes to figure out
> if there is an instruction that generates trap exception, not just instructions
> that are "SWBP". The reason is because when there is a trap, but uprobes doesn't
> see a probe installed here, it needs is_trap_insn() to figure out if the trap
> is generated by ebreak from something else, or because the probe is just removed.
> In the latter case, uprobes will return back, because probe has already been removed,
> so it should be safe to do so. That's why I think the incorrect is_swbp_insn()
> would cause a hang, because uprobes incorrectly thinks there is no ebreak there,
> so it should be okay to go back, but there actually is.
>
> So, from my understanding, if uprobes encounter a 32-bit ebreak for any reason,
> the kernel would hang. I think your patch is a great addition nonetheless, but I
> am guessing that it only masks the problem by preventing uprobes from seeing the
> 32-bit ebreak in the specific test, not really solve it. So, if there is a 32-bit
> ebreak in userspace, the bug still causes the kernel to hang.
>
> I am still quite confident of my logic, so I would be very suprised if my fix
> doesn't solve the reported hang. Do you mind testing my patch? My potato of a
> laptop unfortunately cannot run the test :(

Maybe I wasn't clear, sorry for that! I did take the patch for a spin,
and it did not solve this particular problem.

When we're taking a trap from *kernel*mode, we should never deal with
uprobes at all. Have a look at uprobe_pre_sstep_notifier(), this
function returns 1, which then means that the trap handler exit
premature.

The code you're referring to (called from uprobe_notify_resume()), and
will never be entered, because we're not exiting the trap to
userland. Have a look in kernel/entry/common.c (search for
e.g. TIF_UPROBE).

Now, for your concern, which I see as a potential different bug. Not at
all related to my issue "trap from kernelmode touches uprobe
incorrectly"; A "random" ebreak from *userland* is trapped, when uprobes
is enabled will set the kernel in a hang. I suggest you construct try to
write a simple program to reproduce this!

I had a quick look in the uprobe handling code, and AFAIU the was used
when installing the uprobe as an additional check, and when searching
for an active uprobe. I'm still a bit puzzled how the issue you're
describing could trigger. A reproducer would help!


Cheers,
Björn
Nam Cao Aug. 27, 2023, 7:41 p.m. UTC | #8
On Sun, Aug 27, 2023 at 09:20:44PM +0200, Björn Töpel wrote:
> Nam Cao <namcaov@gmail.com> writes:
> 
> > On Sun, Aug 27, 2023 at 11:04:34AM +0200, Björn Töpel wrote:
> >> Nam Cao <namcaov@gmail.com> writes:
> >> 
> >> > On Sun, Aug 27, 2023 at 10:11:25AM +0200, Björn Töpel wrote:
> >> >> The default implementation of is_trap_insn() which RISC-V is using calls
> >> >> is_swbp_insn(), which is doing what your patch does. Your patch does not
> >> >> address the issue.
> >> >
> >> > is_swbp_insn() does this:
> >> >
> >> >         #ifdef CONFIG_RISCV_ISA_C
> >> >                 return (*insn & 0xffff) == UPROBE_SWBP_INSN;
> >> >         #else
> >> >                 return *insn == UPROBE_SWBP_INSN;
> >> >         #endif
> >> >
> >> > ...so it doesn't even check for 32-bit ebreak if C extension is on. My patch
> >> > is not the same.
> >> 
> >> Ah, was too quick.
> >> 
> >> AFAIU uprobes *always* uses c.ebreak when CONFIG_RISCV_ISA_C is set, and
> >> ebreak otherwise. That's the reason is_swbp_insn() is implemented like
> >> that.
> >
> > That's what I understand too.
> >
> >> If that's not the case, there's a another bug, that your patches
> >> addresses.
> >
> > I think it's a bug regardless. is_trap_insn() is used by uprobes to figure out
> > if there is an instruction that generates trap exception, not just instructions
> > that are "SWBP". The reason is because when there is a trap, but uprobes doesn't
> > see a probe installed here, it needs is_trap_insn() to figure out if the trap
> > is generated by ebreak from something else, or because the probe is just removed.
> > In the latter case, uprobes will return back, because probe has already been removed,
> > so it should be safe to do so. That's why I think the incorrect is_swbp_insn()
> > would cause a hang, because uprobes incorrectly thinks there is no ebreak there,
> > so it should be okay to go back, but there actually is.
> >
> > So, from my understanding, if uprobes encounter a 32-bit ebreak for any reason,
> > the kernel would hang. I think your patch is a great addition nonetheless, but I
> > am guessing that it only masks the problem by preventing uprobes from seeing the
> > 32-bit ebreak in the specific test, not really solve it. So, if there is a 32-bit
> > ebreak in userspace, the bug still causes the kernel to hang.
> >
> > I am still quite confident of my logic, so I would be very suprised if my fix
> > doesn't solve the reported hang. Do you mind testing my patch? My potato of a
> > laptop unfortunately cannot run the test :(
> 
> Maybe I wasn't clear, sorry for that! I did take the patch for a spin,
> and it did not solve this particular problem.

Okay, thanks for the comfirmation!
 
> When we're taking a trap from *kernel*mode, we should never deal with
> uprobes at all. Have a look at uprobe_pre_sstep_notifier(), this
> function returns 1, which then means that the trap handler exit
> premature.
>
> The code you're referring to (called from uprobe_notify_resume()), and
> will never be entered, because we're not exiting the trap to
> userland. Have a look in kernel/entry/common.c (search for
> e.g. TIF_UPROBE).

I will think about this a bit and answer later. I will answer the below part
first.
 
> Now, for your concern, which I see as a potential different bug. Not at
> all related to my issue "trap from kernelmode touches uprobe
> incorrectly"; A "random" ebreak from *userland* is trapped, when uprobes
> is enabled will set the kernel in a hang. I suggest you construct try to
> write a simple program to reproduce this!
> 
> I had a quick look in the uprobe handling code, and AFAIU the was used
> when installing the uprobe as an additional check, and when searching
> for an active uprobe. I'm still a bit puzzled how the issue you're
> describing could trigger. A reproducer would help!

I have just produced the problem, using this small program:

	.global _start                                                                                      
	_start:
		addi x0, x1, 0
		addi x0, x1, 1
	        addi x0, x1, 2
	.option push
	.option arch, -c
	        ebreak
	.option pop
	        ecall

Compile that with
	riscv64-linux-gnu-gcc test.s -nostdlib -static -o ebreak

And setup uprobes by:
	mount -t tracefs nodev /sys/kernel/tracing/
	echo "p /ebreak:0x0000010c" > /sys/kernel/tracing/uprobe_events
	echo 1 > /sys/kernel/tracing/events/uprobes/enable

(obviously you would have to edit the offset value to be _start symbol of your
binary)

Then I execute the program, and the kernel loop indefinitely (it keeps going in
and out of exception handler).

Then I apply my patch, then the kernel doesn't loop anymore.

So I think it is a valid issue, and I will send a proper patch to fix this.

Best regards,
Nam
Nam Cao Aug. 27, 2023, 8:15 p.m. UTC | #9
On Sun, Aug 27, 2023 at 09:20:44PM +0200, Björn Töpel wrote:
> Nam Cao <namcaov@gmail.com> writes:
> 
> > On Sun, Aug 27, 2023 at 11:04:34AM +0200, Björn Töpel wrote:
> >> Nam Cao <namcaov@gmail.com> writes:
> >> 
> >> > On Sun, Aug 27, 2023 at 10:11:25AM +0200, Björn Töpel wrote:
> >> >> The default implementation of is_trap_insn() which RISC-V is using calls
> >> >> is_swbp_insn(), which is doing what your patch does. Your patch does not
> >> >> address the issue.
> >> >
> >> > is_swbp_insn() does this:
> >> >
> >> >         #ifdef CONFIG_RISCV_ISA_C
> >> >                 return (*insn & 0xffff) == UPROBE_SWBP_INSN;
> >> >         #else
> >> >                 return *insn == UPROBE_SWBP_INSN;
> >> >         #endif
> >> >
> >> > ...so it doesn't even check for 32-bit ebreak if C extension is on. My patch
> >> > is not the same.
> >> 
> >> Ah, was too quick.
> >> 
> >> AFAIU uprobes *always* uses c.ebreak when CONFIG_RISCV_ISA_C is set, and
> >> ebreak otherwise. That's the reason is_swbp_insn() is implemented like
> >> that.
> >
> > That's what I understand too.
> >
> >> If that's not the case, there's a another bug, that your patches
> >> addresses.
> >
> > I think it's a bug regardless. is_trap_insn() is used by uprobes to figure out
> > if there is an instruction that generates trap exception, not just instructions
> > that are "SWBP". The reason is because when there is a trap, but uprobes doesn't
> > see a probe installed here, it needs is_trap_insn() to figure out if the trap
> > is generated by ebreak from something else, or because the probe is just removed.
> > In the latter case, uprobes will return back, because probe has already been removed,
> > so it should be safe to do so. That's why I think the incorrect is_swbp_insn()
> > would cause a hang, because uprobes incorrectly thinks there is no ebreak there,
> > so it should be okay to go back, but there actually is.
> >
> > So, from my understanding, if uprobes encounter a 32-bit ebreak for any reason,
> > the kernel would hang. I think your patch is a great addition nonetheless, but I
> > am guessing that it only masks the problem by preventing uprobes from seeing the
> > 32-bit ebreak in the specific test, not really solve it. So, if there is a 32-bit
> > ebreak in userspace, the bug still causes the kernel to hang.
> >
> > I am still quite confident of my logic, so I would be very suprised if my fix
> > doesn't solve the reported hang. Do you mind testing my patch? My potato of a
> > laptop unfortunately cannot run the test :(
> 
> Maybe I wasn't clear, sorry for that! I did take the patch for a spin,
> and it did not solve this particular problem.
> 
> When we're taking a trap from *kernel*mode, we should never deal with
> uprobes at all. Have a look at uprobe_pre_sstep_notifier(), this
> function returns 1, which then means that the trap handler exit
> premature.
> 
> The code you're referring to (called from uprobe_notify_resume()), and
> will never be entered, because we're not exiting the trap to
> userland. Have a look in kernel/entry/common.c (search for
> e.g. TIF_UPROBE).

Ah right, uprobe_notify_resume() is not called if we do not return to user
space. My bad, I thought it is called. Thanks for the discussion, now why I can
see my patch is irrelevant, and your patch is the correct fix for the reported
problem.

Best regards,
Nam
diff mbox series

Patch

diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index f798c853bede..1198cb879d2f 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -248,23 +248,29 @@  static inline unsigned long get_break_insn_length(unsigned long pc)
 
 void handle_break(struct pt_regs *regs)
 {
+       bool user = user_mode(regs);
+
 #ifdef CONFIG_KPROBES
-       if (kprobe_single_step_handler(regs))
-               return;
+       if (!user) {
+               if (kprobe_single_step_handler(regs))
+                       return;
 
-       if (kprobe_breakpoint_handler(regs))
-               return;
+               if (kprobe_breakpoint_handler(regs))
+                       return;
+       }
 #endif
 #ifdef CONFIG_UPROBES
-       if (uprobe_single_step_handler(regs))
-               return;
+       if (user) {
+               if (uprobe_single_step_handler(regs))
+                       return;
 
-       if (uprobe_breakpoint_handler(regs))
-               return;
+               if (uprobe_breakpoint_handler(regs))
+                       return;
+       }
 #endif
        current->thread.bad_cause = regs->cause;
 
-       if (user_mode(regs))
+       if (user)
                force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)regs->epc);
 #ifdef CONFIG_KGDB
        else if (notify_die(DIE_TRAP, "EBREAK", regs, 0, regs->cause, SIGTRAP)