
riscv: signal: handle syscall restart before get_signal

Message ID 20230803224458.4156006-1-ancientmodern4@gmail.com (mailing list archive)
State Accepted
Commit ce4f78f1b53d3327fbd32764aa333bf05fb68818
Series riscv: signal: handle syscall restart before get_signal

Commit Message

Haorong Lu Aug. 3, 2023, 10:44 p.m. UTC
In the current riscv implementation, blocking syscalls like read() may
not restart correctly after being interrupted by ptrace. The problem
arises when the syscall restart logic in arch_do_signal_or_restart()
is bypassed because regs->cause has changed, for example after the
tracee executes an ebreak instruction.

Steps to reproduce:
1. Interrupt the tracee process with PTRACE_SEIZE & PTRACE_INTERRUPT.
2. Back up the original registers and the instruction at new_pc.
3. Change pc to new_pc, and inject an instruction (like ebreak) at this
   address.
4. Resume with PTRACE_CONT and wait for the process to stop again after
   executing ebreak.
5. Restore the original registers and instruction, and detach from the
   tracee process.
6. Now the read() syscall in the tracee will return -1 with errno set to
   ERESTARTSYS.

Specifically, while the tracee is stopped, the injected ebreak changes
regs->cause from EXC_SYSCALL to EXC_BREAKPOINT. The cause register is
not accessible via ptrace, so the tracer cannot restore it. The changed
value breaks the syscall restart condition, and the read() syscall ends
with an ERESTARTSYS error which, according to include/linux/errno.h,
should never be seen by user programs. x86 avoids this issue because it
checks the syscall condition using a register (orig_ax) that is exposed
to user space. arm64 handles syscall restart before calling
get_signal(), which is where the task can be paused and inspected by
ptrace or a debugger.
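
The failure described above can be modeled in user space. The sketch
below is not kernel code: struct model_regs, old_restart_check() and
a0_after_detach() are hypothetical names introduced for illustration,
the constants are copied from the kernel headers, and only the
-ERESTARTSYS case of the pre-patch no-handler path is reproduced. It
assumes the tracer has restored a0 while regs->cause still holds
whatever the last trap left there.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied for illustration; the kernel defines them in its own
 * headers (asm/csr.h and include/linux/errno.h). */
#define EXC_BREAKPOINT 3UL
#define EXC_SYSCALL    8UL
#define ERESTARTSYS    512L

/* Hypothetical stand-in for the few pt_regs fields that matter here. */
struct model_regs {
	unsigned long epc;
	long a0;
	long orig_a0;
	unsigned long cause;
};

/* Pre-patch restart check (no handler), reduced to -ERESTARTSYS. */
bool old_restart_check(struct model_regs *regs)
{
	assert(regs);
	if (regs->cause != EXC_SYSCALL)
		return false;           /* restart logic skipped entirely */
	regs->cause = -1UL;
	if (regs->a0 == -ERESTARTSYS) {
		regs->a0 = regs->orig_a0;
		regs->epc -= 4;         /* re-execute the ecall */
		return true;
	}
	return false;
}

/* a0 as seen by the tracee once the tracer has restored the
 * registers it can reach and detached. */
long a0_after_detach(unsigned long cause_at_exit)
{
	struct model_regs regs = {
		.epc = 0x1004,          /* hypothetical address after the ecall */
		.a0 = -ERESTARTSYS,     /* restored by the tracer */
		.orig_a0 = 0,           /* first argument of read(): fd 0 */
		.cause = cause_at_exit, /* not restorable through ptrace */
	};

	old_restart_check(&regs);
	return regs.a0;
}
```

In this model, a0_after_detach(EXC_SYSCALL) yields the original first
argument (the syscall would be restarted), while
a0_after_detach(EXC_BREAKPOINT) leaves -ERESTARTSYS in a0, which is the
value the tracee ends up seeing.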

This patch changes the riscv implementation to follow the arm64
approach; arm64 likewise identifies a syscall through a kernel-internal
register (syscallno). The change ensures that the syscall restart logic
is not bypassed when the cause register changes, and makes the behavior
more consistent across architectures.
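
To illustrate why the new ordering helps, here is a small user-space
model, again only a sketch under stated assumptions. struct model_regs,
prepare_restart(), tracer_round_trip() and new_flow() are hypothetical
names, the addresses are made up, and only the -ERESTARTSYS case is
modeled. The point is that the restart is applied before the tracer
ever sees the registers, so a later change to cause no longer matters.

```c
#include <assert.h>

/* Values copied for illustration from the kernel headers. */
#define EXC_BREAKPOINT 3UL
#define EXC_SYSCALL    8UL
#define ERESTARTSYS    512L

/* Hypothetical stand-in for the relevant pt_regs fields. */
struct model_regs {
	unsigned long epc;
	long a0;
	long orig_a0;
	unsigned long cause;
};

/* Post-patch ordering: runs before get_signal(), that is, before any
 * ptrace stop can happen. */
void prepare_restart(struct model_regs *regs)
{
	assert(regs);
	if (regs->cause != EXC_SYSCALL)
		return;
	regs->cause = -1UL;
	if (regs->a0 == -ERESTARTSYS) {
		regs->a0 = regs->orig_a0;
		regs->epc -= 4;         /* restart_addr */
	}
}

/* Hypothetical tracer: save registers, run an ebreak at new_pc, then
 * restore only what ptrace exposes (epc and a0 in this model). */
struct model_regs tracer_round_trip(struct model_regs regs)
{
	struct model_regs saved = regs;

	regs.epc = 0x2000;              /* new_pc, made up */
	regs.cause = EXC_BREAKPOINT;    /* set by the trap, not the tracer */
	regs.epc = saved.epc;
	regs.a0 = saved.a0;
	return regs;
}

struct model_regs new_flow(void)
{
	struct model_regs regs = {
		.epc = 0x1004,          /* hypothetical address after the ecall */
		.a0 = -ERESTARTSYS,
		.orig_a0 = 0,
		.cause = EXC_SYSCALL,
	};

	prepare_restart(&regs);
	return tracer_round_trip(regs);
}
```

After new_flow(), the model has epc pointing back at the ecall and a0
holding the original first argument, even though cause ends up as
EXC_BREAKPOINT.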

For a simplified reproduction program, feel free to visit:
https://github.com/ancientmodern/riscv-ptrace-bug-demo.

Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
---
 arch/riscv/kernel/signal.c | 85 +++++++++++++++++++++-----------------
 1 file changed, 46 insertions(+), 39 deletions(-)

Comments

Guo Ren Aug. 4, 2023, 1:08 a.m. UTC | #1
On Fri, Aug 4, 2023 at 6:45 AM Haorong Lu <ancientmodern4@gmail.com> wrote:
>
> In the current riscv implementation, blocking syscalls like read() may
> not correctly restart after being interrupted by ptrace. This problem
> arises when the syscall restart process in arch_do_signal_or_restart()
> is bypassed due to changes to the regs->cause register, such as an
> ebreak instruction.
>
> Steps to reproduce:
> 1. Interrupt the tracee process with PTRACE_SEIZE & PTRACE_INTERRUPT.
> 2. Backup original registers and instruction at new_pc.
> 3. Change pc to new_pc, and inject an instruction (like ebreak) to this
>    address.
> 4. Resume with PTRACE_CONT and wait for the process to stop again after
>    executing ebreak.
> 5. Restore original registers and instructions, and detach from the
>    tracee process.
> 6. Now the read() syscall in tracee will return -1 with errno set to
>    ERESTARTSYS.
>
> Specifically, during an interrupt, the regs->cause changes from
> EXC_SYSCALL to EXC_BREAKPOINT due to the injected ebreak, which is
> inaccessible via ptrace so we cannot restore it. This alteration breaks
> the syscall restart condition and ends the read() syscall with an
> ERESTARTSYS error. According to include/linux/errno.h, it should never
> be seen by user programs. X86 can avoid this issue as it checks the
> syscall condition using a register (orig_ax) exposed to user space.
> Arm64 handles syscall restart before calling get_signal, where it could
> be paused and inspected by ptrace/debugger.
>
> This patch adjusts the riscv implementation to arm64 style, which also
> checks syscall using a kernel register (syscallno). It ensures the
> syscall restart process is not bypassed when changes to the cause
> register occur, providing more consistent behavior across various
> architectures.
>
> For a simplified reproduction program, feel free to visit:
> https://github.com/ancientmodern/riscv-ptrace-bug-demo.
>
> Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
> ---
>  arch/riscv/kernel/signal.c | 85 +++++++++++++++++++++-----------------
>  1 file changed, 46 insertions(+), 39 deletions(-)
>
> diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
> index 180d951d3624..d2d7169048ea 100644
> --- a/arch/riscv/kernel/signal.c
> +++ b/arch/riscv/kernel/signal.c
> @@ -391,30 +391,6 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
>         sigset_t *oldset = sigmask_to_save();
>         int ret;
>
> -       /* Are we from a system call? */
> -       if (regs->cause == EXC_SYSCALL) {
> -               /* Avoid additional syscall restarting via ret_from_exception */
> -               regs->cause = -1UL;
> -               /* If so, check system call restarting.. */
> -               switch (regs->a0) {
> -               case -ERESTART_RESTARTBLOCK:
> -               case -ERESTARTNOHAND:
> -                       regs->a0 = -EINTR;
> -                       break;
> -
> -               case -ERESTARTSYS:
> -                       if (!(ksig->ka.sa.sa_flags & SA_RESTART)) {
> -                               regs->a0 = -EINTR;
> -                               break;
> -                       }
> -                       fallthrough;
> -               case -ERESTARTNOINTR:
> -                        regs->a0 = regs->orig_a0;
> -                       regs->epc -= 0x4;
> -                       break;
> -               }
> -       }
> -
>         rseq_signal_deliver(ksig, regs);
>
>         /* Set up the stack frame */
> @@ -428,35 +404,66 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
>
>  void arch_do_signal_or_restart(struct pt_regs *regs)
>  {
> +       unsigned long continue_addr = 0, restart_addr = 0;
> +       int retval = 0;
>         struct ksignal ksig;
> +       bool syscall = (regs->cause == EXC_SYSCALL);
>
> -       if (get_signal(&ksig)) {
> -               /* Actually deliver the signal */
> -               handle_signal(&ksig, regs);
> -               return;
> -       }
> +       /* If we were from a system call, check for system call restarting */
> +       if (syscall) {
> +               continue_addr = regs->epc;
> +               restart_addr = continue_addr - 4;
> +               retval = regs->a0;
>
> -       /* Did we come from a system call? */
> -       if (regs->cause == EXC_SYSCALL) {
>                 /* Avoid additional syscall restarting via ret_from_exception */
>                 regs->cause = -1UL;
>
> -               /* Restart the system call - no handlers present */
> -               switch (regs->a0) {
> +               /*
> +                * Prepare for system call restart. We do this here so that a
> +                * debugger will see the already changed PC.
> +                */
> +               switch (retval) {
>                 case -ERESTARTNOHAND:
>                 case -ERESTARTSYS:
>                 case -ERESTARTNOINTR:
> -                        regs->a0 = regs->orig_a0;
> -                       regs->epc -= 0x4;
> -                       break;
>                 case -ERESTART_RESTARTBLOCK:
> -                        regs->a0 = regs->orig_a0;
> -                       regs->a7 = __NR_restart_syscall;
> -                       regs->epc -= 0x4;
> +                       regs->a0 = regs->orig_a0;
> +                       regs->epc = restart_addr;
>                         break;
>                 }
>         }
>
> +       /*
> +        * Get the signal to deliver. When running under ptrace, at this point
> +        * the debugger may change all of our registers.
> +        */
> +       if (get_signal(&ksig)) {
> +               /*
> +                * Depending on the signal settings, we may need to revert the
> +                * decision to restart the system call, but skip this if a
> +                * debugger has chosen to restart at a different PC.
> +                */
> +               if (regs->epc == restart_addr &&
> +                   (retval == -ERESTARTNOHAND ||
> +                    retval == -ERESTART_RESTARTBLOCK ||
> +                    (retval == -ERESTARTSYS &&
> +                     !(ksig.ka.sa.sa_flags & SA_RESTART)))) {
> +                       regs->a0 = -EINTR;
> +                       regs->epc = continue_addr;
> +               }
> +
> +               /* Actually deliver the signal */
> +               handle_signal(&ksig, regs);
> +               return;
> +       }
> +
> +       /*
> +        * Handle restarting a different system call. As above, if a debugger
> +        * has chosen to restart at a different PC, ignore the restart.
> +        */
> +       if (syscall && regs->epc == restart_addr && retval == -ERESTART_RESTARTBLOCK)
> +               regs->a7 = __NR_restart_syscall;
> +
I think your patch contains two parts:
1. The bugfix.
2. Some coding-convention changes or adjustments to the original signal
   logic.

Could we separate them into two patches and keep the bugfix one
minimal? Then it would be easier for people to review your patches.

>         /*
>          * If there is no signal to deliver, we just put the saved
>          * sigmask back.
> --
> 2.41.0
>
Haorong Lu Aug. 4, 2023, 6:36 a.m. UTC | #2
On Fri, Aug 04, 2023 at 09:08:53AM +0800, Guo Ren wrote:
> On Fri, Aug 4, 2023 at 6:45 AM Haorong Lu <ancientmodern4@gmail.com> wrote:
> >
> > In the current riscv implementation, blocking syscalls like read() may
> > not correctly restart after being interrupted by ptrace. This problem
> > arises when the syscall restart process in arch_do_signal_or_restart()
> > is bypassed due to changes to the regs->cause register, such as an
> > ebreak instruction.
> >
> > Steps to reproduce:
> > 1. Interrupt the tracee process with PTRACE_SEIZE & PTRACE_INTERRUPT.
> > 2. Backup original registers and instruction at new_pc.
> > 3. Change pc to new_pc, and inject an instruction (like ebreak) to this
> >    address.
> > 4. Resume with PTRACE_CONT and wait for the process to stop again after
> >    executing ebreak.
> > 5. Restore original registers and instructions, and detach from the
> >    tracee process.
> > 6. Now the read() syscall in tracee will return -1 with errno set to
> >    ERESTARTSYS.
> >
> > Specifically, during an interrupt, the regs->cause changes from
> > EXC_SYSCALL to EXC_BREAKPOINT due to the injected ebreak, which is
> > inaccessible via ptrace so we cannot restore it. This alteration breaks
> > the syscall restart condition and ends the read() syscall with an
> > ERESTARTSYS error. According to include/linux/errno.h, it should never
> > be seen by user programs. X86 can avoid this issue as it checks the
> > syscall condition using a register (orig_ax) exposed to user space.
> > Arm64 handles syscall restart before calling get_signal, where it could
> > be paused and inspected by ptrace/debugger.
> >
> > This patch adjusts the riscv implementation to arm64 style, which also
> > checks syscall using a kernel register (syscallno). It ensures the
> > syscall restart process is not bypassed when changes to the cause
> > register occur, providing more consistent behavior across various
> > architectures.
> >
> > For a simplified reproduction program, feel free to visit:
> > https://github.com/ancientmodern/riscv-ptrace-bug-demo.
> >
> > Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
> > ---
> >  arch/riscv/kernel/signal.c | 85 +++++++++++++++++++++-----------------
> >  1 file changed, 46 insertions(+), 39 deletions(-)
> >
> > diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
> > index 180d951d3624..d2d7169048ea 100644
> > --- a/arch/riscv/kernel/signal.c
> > +++ b/arch/riscv/kernel/signal.c
> > @@ -391,30 +391,6 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
> >         sigset_t *oldset = sigmask_to_save();
> >         int ret;
> >
> > -       /* Are we from a system call? */
> > -       if (regs->cause == EXC_SYSCALL) {
> > -               /* Avoid additional syscall restarting via ret_from_exception */
> > -               regs->cause = -1UL;
> > -               /* If so, check system call restarting.. */
> > -               switch (regs->a0) {
> > -               case -ERESTART_RESTARTBLOCK:
> > -               case -ERESTARTNOHAND:
> > -                       regs->a0 = -EINTR;
> > -                       break;
> > -
> > -               case -ERESTARTSYS:
> > -                       if (!(ksig->ka.sa.sa_flags & SA_RESTART)) {
> > -                               regs->a0 = -EINTR;
> > -                               break;
> > -                       }
> > -                       fallthrough;
> > -               case -ERESTARTNOINTR:
> > -                        regs->a0 = regs->orig_a0;
> > -                       regs->epc -= 0x4;
> > -                       break;
> > -               }
> > -       }
> > -
> >         rseq_signal_deliver(ksig, regs);
> >
> >         /* Set up the stack frame */
> > @@ -428,35 +404,66 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
> >
> >  void arch_do_signal_or_restart(struct pt_regs *regs)
> >  {
> > +       unsigned long continue_addr = 0, restart_addr = 0;
> > +       int retval = 0;
> >         struct ksignal ksig;
> > +       bool syscall = (regs->cause == EXC_SYSCALL);
> >
> > -       if (get_signal(&ksig)) {
> > -               /* Actually deliver the signal */
> > -               handle_signal(&ksig, regs);
> > -               return;
> > -       }
> > +       /* If we were from a system call, check for system call restarting */
> > +       if (syscall) {
> > +               continue_addr = regs->epc;
> > +               restart_addr = continue_addr - 4;
> > +               retval = regs->a0;
> >
> > -       /* Did we come from a system call? */
> > -       if (regs->cause == EXC_SYSCALL) {
> >                 /* Avoid additional syscall restarting via ret_from_exception */
> >                 regs->cause = -1UL;
> >
> > -               /* Restart the system call - no handlers present */
> > -               switch (regs->a0) {
> > +               /*
> > +                * Prepare for system call restart. We do this here so that a
> > +                * debugger will see the already changed PC.
> > +                */
> > +               switch (retval) {
> >                 case -ERESTARTNOHAND:
> >                 case -ERESTARTSYS:
> >                 case -ERESTARTNOINTR:
> > -                        regs->a0 = regs->orig_a0;
> > -                       regs->epc -= 0x4;
> > -                       break;
> >                 case -ERESTART_RESTARTBLOCK:
> > -                        regs->a0 = regs->orig_a0;
> > -                       regs->a7 = __NR_restart_syscall;
> > -                       regs->epc -= 0x4;
> > +                       regs->a0 = regs->orig_a0;
> > +                       regs->epc = restart_addr;
> >                         break;
> >                 }
> >         }
> >
> > +       /*
> > +        * Get the signal to deliver. When running under ptrace, at this point
> > +        * the debugger may change all of our registers.
> > +        */
> > +       if (get_signal(&ksig)) {
> > +               /*
> > +                * Depending on the signal settings, we may need to revert the
> > +                * decision to restart the system call, but skip this if a
> > +                * debugger has chosen to restart at a different PC.
> > +                */
> > +               if (regs->epc == restart_addr &&
> > +                   (retval == -ERESTARTNOHAND ||
> > +                    retval == -ERESTART_RESTARTBLOCK ||
> > +                    (retval == -ERESTARTSYS &&
> > +                     !(ksig.ka.sa.sa_flags & SA_RESTART)))) {
> > +                       regs->a0 = -EINTR;
> > +                       regs->epc = continue_addr;
> > +               }
> > +
> > +               /* Actually deliver the signal */
> > +               handle_signal(&ksig, regs);
> > +               return;
> > +       }
> > +
> > +       /*
> > +        * Handle restarting a different system call. As above, if a debugger
> > +        * has chosen to restart at a different PC, ignore the restart.
> > +        */
> > +       if (syscall && regs->epc == restart_addr && retval == -ERESTART_RESTARTBLOCK)
> > +               regs->a7 = __NR_restart_syscall;
> > +
> I thought your patch contains two parts:
> 1. bugfix
> 2. Some coding conventions or adjusting some logic of the original signal.
> 
> Could we separate them into two pieces and make the bugfix one
> minimalistic? Then, people could easier to review your patches.

Hi Guo, thanks for your feedback!

AFAIU, modifying the logic of these two functions is the means of fixing
this bug. These changes should not affect specific signals, as they
merely invert the order of handling (syscall restart and pending
signals).

Since syscall restart involves many different conditions, providing a
minimal bugfix could be hard and might introduce other issues. And
actually there are not too many changes:

- move syscall restart before get_signal to prevent it from being
  bypassed when regs->cause has been changed

- simplify the duplicated ERESTARTSYS handler in handle_signal to a
  small "if branch" between get_signal and handle_signal

I ran into this bug by accident while developing a riscv version of a
user-space checkpoint/restore tool. Frankly, I'm fairly new to this
area. I'd be interested to see if there are better solutions to this :)
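
For reference, the small "if branch" between get_signal and
handle_signal mentioned above can be written as a standalone predicate.
This is a user-space sketch, not the kernel code itself:
revert_restart() and check_revert_table() are hypothetical names, the
SA_RESTART flag is reduced to a bool, and the errno values are copied
from include/linux/errno.h for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied for illustration from include/linux/errno.h. */
#define ERESTARTSYS           512L
#define ERESTARTNOINTR        513L
#define ERESTARTNOHAND        514L
#define ERESTART_RESTARTBLOCK 516L

/* Returns true when a handler is about to run and the already prepared
 * restart must be turned back into -EINTR. */
bool revert_restart(long retval, bool sa_restart,
		    unsigned long epc, unsigned long restart_addr)
{
	/* A debugger moved the PC: leave its choice alone. */
	if (epc != restart_addr)
		return false;

	return retval == -ERESTARTNOHAND ||
	       retval == -ERESTART_RESTARTBLOCK ||
	       (retval == -ERESTARTSYS && !sa_restart);
}

/* A few spot checks of the decision table. */
void check_revert_table(void)
{
	const unsigned long restart = 0x1000, elsewhere = 0x2000;

	assert(revert_restart(-ERESTARTNOHAND, true, restart, restart));
	assert(revert_restart(-ERESTARTSYS, false, restart, restart));
	assert(!revert_restart(-ERESTARTSYS, true, restart, restart));
	assert(!revert_restart(-ERESTARTNOINTR, false, restart, restart));
	assert(!revert_restart(-ERESTARTNOHAND, true, elsewhere, restart));
}
```

The -ERESTARTNOINTR case never reverts, and nothing is reverted once
the PC differs from the restart address, which matches the comment in
the patch about a debugger choosing a different PC.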

Best,
Haorong

> 
> >         /*
> >          * If there is no signal to deliver, we just put the saved
> >          * sigmask back.
> > --
> > 2.41.0
> >
> 
> 
> -- 
> Best Regards
>  Guo Ren
patchwork-bot+linux-riscv@kernel.org Nov. 6, 2023, 3 p.m. UTC | #3
Hello:

This patch was applied to riscv/linux.git (for-next)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Thu,  3 Aug 2023 15:44:54 -0700 you wrote:
> In the current riscv implementation, blocking syscalls like read() may
> not correctly restart after being interrupted by ptrace. This problem
> arises when the syscall restart process in arch_do_signal_or_restart()
> is bypassed due to changes to the regs->cause register, such as an
> ebreak instruction.
> 
> Steps to reproduce:
> 1. Interrupt the tracee process with PTRACE_SEIZE & PTRACE_INTERRUPT.
> 2. Backup original registers and instruction at new_pc.
> 3. Change pc to new_pc, and inject an instruction (like ebreak) to this
>    address.
> 4. Resume with PTRACE_CONT and wait for the process to stop again after
>    executing ebreak.
> 5. Restore original registers and instructions, and detach from the
>    tracee process.
> 6. Now the read() syscall in tracee will return -1 with errno set to
>    ERESTARTSYS.
> 
> [...]

Here is the summary with links:
  - riscv: signal: handle syscall restart before get_signal
    https://git.kernel.org/riscv/c/ce4f78f1b53d

You are awesome, thank you!

Patch

diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index 180d951d3624..d2d7169048ea 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -391,30 +391,6 @@  static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	sigset_t *oldset = sigmask_to_save();
 	int ret;
 
-	/* Are we from a system call? */
-	if (regs->cause == EXC_SYSCALL) {
-		/* Avoid additional syscall restarting via ret_from_exception */
-		regs->cause = -1UL;
-		/* If so, check system call restarting.. */
-		switch (regs->a0) {
-		case -ERESTART_RESTARTBLOCK:
-		case -ERESTARTNOHAND:
-			regs->a0 = -EINTR;
-			break;
-
-		case -ERESTARTSYS:
-			if (!(ksig->ka.sa.sa_flags & SA_RESTART)) {
-				regs->a0 = -EINTR;
-				break;
-			}
-			fallthrough;
-		case -ERESTARTNOINTR:
-                        regs->a0 = regs->orig_a0;
-			regs->epc -= 0x4;
-			break;
-		}
-	}
-
 	rseq_signal_deliver(ksig, regs);
 
 	/* Set up the stack frame */
@@ -428,35 +404,66 @@  static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 
 void arch_do_signal_or_restart(struct pt_regs *regs)
 {
+	unsigned long continue_addr = 0, restart_addr = 0;
+	int retval = 0;
 	struct ksignal ksig;
+	bool syscall = (regs->cause == EXC_SYSCALL);
 
-	if (get_signal(&ksig)) {
-		/* Actually deliver the signal */
-		handle_signal(&ksig, regs);
-		return;
-	}
+	/* If we were from a system call, check for system call restarting */
+	if (syscall) {
+		continue_addr = regs->epc;
+		restart_addr = continue_addr - 4;
+		retval = regs->a0;
 
-	/* Did we come from a system call? */
-	if (regs->cause == EXC_SYSCALL) {
 		/* Avoid additional syscall restarting via ret_from_exception */
 		regs->cause = -1UL;
 
-		/* Restart the system call - no handlers present */
-		switch (regs->a0) {
+		/*
+		 * Prepare for system call restart. We do this here so that a
+		 * debugger will see the already changed PC.
+		 */
+		switch (retval) {
 		case -ERESTARTNOHAND:
 		case -ERESTARTSYS:
 		case -ERESTARTNOINTR:
-                        regs->a0 = regs->orig_a0;
-			regs->epc -= 0x4;
-			break;
 		case -ERESTART_RESTARTBLOCK:
-                        regs->a0 = regs->orig_a0;
-			regs->a7 = __NR_restart_syscall;
-			regs->epc -= 0x4;
+			regs->a0 = regs->orig_a0;
+			regs->epc = restart_addr;
 			break;
 		}
 	}
 
+	/*
+	 * Get the signal to deliver. When running under ptrace, at this point
+	 * the debugger may change all of our registers.
+	 */
+	if (get_signal(&ksig)) {
+		/*
+		 * Depending on the signal settings, we may need to revert the
+		 * decision to restart the system call, but skip this if a
+		 * debugger has chosen to restart at a different PC.
+		 */
+		if (regs->epc == restart_addr &&
+		    (retval == -ERESTARTNOHAND ||
+		     retval == -ERESTART_RESTARTBLOCK ||
+		     (retval == -ERESTARTSYS &&
+		      !(ksig.ka.sa.sa_flags & SA_RESTART)))) {
+			regs->a0 = -EINTR;
+			regs->epc = continue_addr;
+		}
+
+		/* Actually deliver the signal */
+		handle_signal(&ksig, regs);
+		return;
+	}
+
+	/*
+	 * Handle restarting a different system call. As above, if a debugger
+	 * has chosen to restart at a different PC, ignore the restart.
+	 */
+	if (syscall && regs->epc == restart_addr && retval == -ERESTART_RESTARTBLOCK)
+		regs->a7 = __NR_restart_syscall;
+
 	/*
 	 * If there is no signal to deliver, we just put the saved
 	 * sigmask back.