diff mbox series

[v2] arm64:kgdb: Fix kernel single-stepping

Message ID 20200219152403.3495-1-minyard@acm.org (mailing list archive)
State New, archived
Headers show
Series [v2] arm64:kgdb: Fix kernel single-stepping | expand

Commit Message

Corey Minyard Feb. 19, 2020, 3:24 p.m. UTC
From: Corey Minyard <cminyard@mvista.com>

I was working on a single-step bug on kgdb on an ARM64 system, and I saw
this scenario:

* A single step is setup to return to el1
* The ERET return to el1
* An interrupt is pending and runs before the instruction
* As soon as PSTATE.D (the debug disable bit) is cleared, the single
    step happens in that location, not where it should have.

This appears to be due to PSTATE.SS not being cleared when the exception
happens.  Per section D.2.12.5 of the ARMv8 reference manual, that
appears to be incorrect, it says "As part of exception entry, the PE
does all of the following: ...  Sets PSTATE.SS to 0."

However, I appear to not be the first person who has noticed this.  In
the el0-only portion of the kernel_entry macro in entry.S, I found the
following comment: "Ensure MDSCR_EL1.SS is clear, since we can unmask
debug exceptions when scheduling."  Exactly the same scenario, except
coming from a userland single step, not a kernel one.

As I was studying this, though, I realized that the following scenario
had an issue:

* Kernel enables MDSCR.SS, MDSCR.KDE, MDSCR.MDE (unnecessary), and
  PSTATE.SS to enable a single step in el1, for kgdb or kprobes,
  on the current CPU's MDSCR register and the process' PSTATE.SS
  register.
* Kernel returns from the exception with ERET.
* An interrupt or page fault happens on the instruction, causing the
  instruction to not be run, but the exception handler runs.
* The exception causes the task to migrate to a new core.
* The return from the exception runs on a different processor now,
  where the MDSCR values are not set up for a single step.
* The single step fails to happen.

This is bad for kgdb, of course, but it seems really bad for kprobes if
this happens.

To fix both these problems, rework the handling of single steps to clear
things out upon entry to the kernel from el1, and then to set up single
step when returning to el1, and not do the setup in debug-monitors.c.
This means that single stepping does not use
enable/disable_debug_monitors(); it is no longer necessary to track
those flags for single stepping.  This is much like single stepping is
handled for el0.  A new flag is added in pt_regs to enable single
stepping from el1.  Unfortunately, the old value of PSTATE.SS cannot be
used for this because of the hardware bug mentioned earlier.

As part of this, there is an interaction between single stepping and the
other users of debug monitors with the MDSCR.KDE bit.  That bit has to
be set for both hardware breakpoints at el1 and single stepping at el1.
A new variable was created to store the cpu-wide value of MDSCR.KDE; the
single stepping code makes sure not to clear that bit on kernel entry if
it's set in the per-cpu variable.

After fixing this and doing some more testing, I ran into another issue:

* Kernel enables the pt_regs single step
* Kernel returns from the exception with ERET.
* An interrupt or page fault happens on the instruction, causing the
  instruction to not be run, but the exception handler runs.
* The exception handling hits a breakpoint and stops.
* The user continues from the breakpoint, so the kernel is no longer
  expecting a single step.
* On the return from the first exception, the single step flag in
  pt_regs is still set, so a single step trap happens.
* The kernel keels over from an unexpected single step.

There's no easy way to find the pt_regs that has the single step flag
set.  So a thread info flag was added so that the single step could be
disabled in this case.  Both that flag and the flag in pt_regs must be
set to enable a single step.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
---
Changes from v1:

After studying the EL0 handling for this, I realized an issue with using
MDSCR to check if single step is enabled: it can be expensive on a VM.
So check the task flag first to see if single step is enabled.  Then
check MDSCR if the task flag is set.

 arch/arm64/include/asm/debug-monitors.h |  3 +-
 arch/arm64/include/asm/ptrace.h         |  4 +-
 arch/arm64/include/asm/thread_info.h    |  1 +
 arch/arm64/kernel/asm-offsets.c         |  1 +
 arch/arm64/kernel/debug-monitors.c      | 61 ++++++++++++++++++++++---
 arch/arm64/kernel/entry.S               | 44 ++++++++++++++++++
 arch/arm64/kernel/kgdb.c                |  6 ++-
 7 files changed, 109 insertions(+), 11 deletions(-)

Comments

Daniel Thompson Feb. 20, 2020, 2:06 p.m. UTC | #1
On Wed, Feb 19, 2020 at 09:24:03AM -0600, minyard@acm.org wrote:
> From: Corey Minyard <cminyard@mvista.com>
> 
> I was working on a single-step bug on kgdb on an ARM64 system, and I saw
> this scenario:
> 
> * A single step is setup to return to el1
> * The ERET return to el1
> * An interrupt is pending and runs before the instruction
> * As soon as PSTATE.D (the debug disable bit) is cleared, the single
>     step happens in that location, not where it should have.
> 
> This appears to be due to PSTATE.SS not being cleared when the exception
> happens.  Per section D.2.12.5 of the ARMv8 reference manual, that
> appears to be incorrect, it says "As part of exception entry, the PE
> does all of the following: ...  Sets PSTATE.SS to 0."
> 
> However, I appear to not be the first person who has noticed this.  In
> the el0-only portion of the kernel_entry macro in entry.S, I found the
> following comment: "Ensure MDSCR_EL1.SS is clear, since we can unmask
> debug exceptions when scheduling."  Exactly the same scenario, except
> coming from a userland single step, not a kernel one.
> 
> As I was studying this, though, I realized that the following scenario
> had an issue:
> 
> * Kernel enables MDSCR.SS, MDSCR.KDE, MDSCR.MDE (unnecessary), and
>   PSTATE.SS to enable a single step in el1, for kgdb or kprobes,
>   on the current CPU's MDSCR register and the process' PSTATE.SS
>   register.
> * Kernel returns from the exception with ERET.
> * An interrupt or page fault happens on the instruction, causing the
>   instruction to not be run, but the exception handler runs.
> * The exception causes the task to migrate to a new core.
> * The return from the exception runs on a different processor now,
>   where the MDSCR values are not set up for a single step.
> * The single step fails to happen.
> 
> This is bad for kgdb, of course, but it seems really bad for kprobes if
> this happens.
> 
> To fix both these problems, rework the handling of single steps to clear
> things out upon entry to the kernel from el1, and then to set up single
> step when returning to el1, and not do the setup in debug-monitors.c.
> This means that single stepping does not use
> enable/disable_debug_monitors(); it is no longer necessary to track
> those flags for single stepping.  This is much like single stepping is
> handled for el0.  A new flag is added in pt_regs to enable single
> stepping from el1.  Unfortunately, the old value of PSTATE.SS cannot be
> used for this because of the hardware bug mentioned earlier.
> 
> As part of this, there is an interaction between single stepping and the
> other users of debug monitors with the MDSCR.KDE bit.  That bit has to
> be set for both hardware breakpoints at el1 and single stepping at el1.
> A new variable was created to store the cpu-wide value of MDSCR.KDE; the
> single stepping code makes sure not to clear that bit on kernel entry if
> it's set in the per-cpu variable.
> 
> After fixing this and doing some more testing, I ran into another issue:
> 
> * Kernel enables the pt_regs single step
> * Kernel returns from the exception with ERET.
> * An interrupt or page fault happens on the instruction, causing the
>   instruction to not be run, but the exception handler runs.
> * The exception handling hits a breakpoint and stops.
> * The user continues from the breakpoint, so the kernel is no longer
>   expecting a single step.
> * On the return from the first exception, the single step flag in
>   pt_regs is still set, so a single step trap happens.
> * The kernel keels over from an unexpected single step.
> 
> There's no easy way to find the pt_regs that has the single step flag
> set.  So a thread info flag was added so that the single step could be
> disabled in this case.  Both that flag and the flag in pt_regs must be
> set to enable a single step.
> 
> Signed-off-by: Corey Minyard <cminyard@mvista.com>

I've pointed the kgdbtest suite at this patch (and run one of the
historically unstable test cases an extra 100 times just in case).

kgdbtest hasn't got great coverage, runs the code in qemu and some
of the strongest tests are still marked XFAIL on arm64 (for reasons
unrelated to stepping).

So the best I can say based on the above is that the test suite does not
observe any regression (but equally no improvement). Nevertheless FWIW:


Tested-by: Daniel Thompson <daniel.thompson@linaro.org>


Daniel.
Marc Zyngier Feb. 20, 2020, 2:21 p.m. UTC | #2
On 2020-02-19 15:24, minyard@acm.org wrote:
> From: Corey Minyard <cminyard@mvista.com>

[...]

> After studying the EL0 handling for this, I realized an issue with 
> using
> MDSCR to check if single step is enabled: it can be expensive on a VM.
> So check the task flag first to see if single step is enabled.  Then
> check MDSCR if the task flag is set.

Very tangential remark: I'd really like people *not* to try and optimize
Linux based on the behaviour of a hypervisor. In general, reading a
system register is fast, and the fact that it traps on a given 
hypervisor
at some point may not be true in the future, nor be a valid assumption
across hypervisors.

         M.
Will Deacon Feb. 20, 2020, 2:22 p.m. UTC | #3
On Wed, Feb 19, 2020 at 09:24:03AM -0600, minyard@acm.org wrote:
> From: Corey Minyard <cminyard@mvista.com>
> 
> I was working on a single-step bug on kgdb on an ARM64 system, and I saw
> this scenario:
> 
> * A single step is setup to return to el1
> * The ERET return to el1
> * An interrupt is pending and runs before the instruction
> * As soon as PSTATE.D (the debug disable bit) is cleared, the single
>     step happens in that location, not where it should have.
> 
> This appears to be due to PSTATE.SS not being cleared when the exception
> happens.  Per section D.2.12.5 of the ARMv8 reference manual, that
> appears to be incorrect, it says "As part of exception entry, the PE
> does all of the following: ...  Sets PSTATE.SS to 0."

Sorry, but I don't follow you here. If PSTATE.SS is not cleared, why would
you take the step exception?

> However, I appear to not be the first person who has noticed this.  In
> the el0-only portion of the kernel_entry macro in entry.S, I found the
> following comment: "Ensure MDSCR_EL1.SS is clear, since we can unmask
> debug exceptions when scheduling."  Exactly the same scenario, except
> coming from a userland single step, not a kernel one.

No, I think you might be conflating PSTATE.SS and MDSCR_EL1.SS.

> As I was studying this, though, I realized that the following scenario
> had an issue:
> 
> * Kernel enables MDSCR.SS, MDSCR.KDE, MDSCR.MDE (unnecessary), and
>   PSTATE.SS to enable a single step in el1, for kgdb or kprobes,
>   on the current CPU's MDSCR register and the process' PSTATE.SS
>   register.
> * Kernel returns from the exception with ERET.
> * An interrupt or page fault happens on the instruction, causing the
>   instruction to not be run, but the exception handler runs.
> * The exception causes the task to migrate to a new core.
> * The return from the exception runs on a different processor now,
>   where the MDSCR values are not set up for a single step.
> * The single step fails to happen.
> 
> This is bad for kgdb, of course, but it seems really bad for kprobes if
> this happens.

I don't see how this can happen for kprobes. Have you managed to reproduce
the failure?

> To fix both these problems, rework the handling of single steps to clear
> things out upon entry to the kernel from el1, and then to set up single
> step when returning to el1, and not do the setup in debug-monitors.c.
> This means that single stepping does not use
> enable/disable_debug_monitors(); it is no longer necessary to track
> those flags for single stepping.  This is much like single stepping is
> handled for el0.  A new flag is added in pt_regs to enable single
> stepping from el1.  Unfortunately, the old value of PSTATE.SS cannot be
> used for this because of the hardware bug mentioned earlier.

I don't think there's a hardware bug.

It sound like you're trying to make kernel debugging per-task instead
of per-cpu, but I don't think that's the right thing to do. What if I /want/
to debug an interrupt handler? For example, I might have a watchpoint on
something accessed by timer interrupt.

> As part of this, there is an interaction between single stepping and the
> other users of debug monitors with the MDSCR.KDE bit.  That bit has to
> be set for both hardware breakpoints at el1 and single stepping at el1.
> A new variable was created to store the cpu-wide value of MDSCR.KDE; the
> single stepping code makes sure not to clear that bit on kernel entry if
> it's set in the per-cpu variable.
> 
> After fixing this and doing some more testing, I ran into another issue:
> 
> * Kernel enables the pt_regs single step
> * Kernel returns from the exception with ERET.
> * An interrupt or page fault happens on the instruction, causing the
>   instruction to not be run, but the exception handler runs.

This sounds like you've broken debug; we should take the step exception
in the exception handler. That's the way this is supposed to work.

> There's no easy way to find the pt_regs that has the single step flag
> set.  So a thread info flag was added so that the single step could be
> disabled in this case.  Both that flag and the flag in pt_regs must be
> set to enable a single step.

Honestly, I get the feeling that you don't really understand the code
you're changing here and it's a tonne of effort to try to untangle what
you're doing. That's not necessarily your fault because the debug
architecture is a nightmare to comprehend, but I'm not keen to change it
unless we have a really good justification. I'm sure kgdb is riddled with
bugs but, as I said before, the fixes should be in kgdb, not by tearing
up the low-level debug code (which has the potential to break other users).

Maybe it would be easier if you tried to fix one problem per patch,
preferably with a way to reproduce the issue you're seeing each time?

Will
Corey Minyard Feb. 20, 2020, 2:50 p.m. UTC | #4
On Thu, Feb 20, 2020 at 02:21:36PM +0000, Marc Zyngier wrote:
> On 2020-02-19 15:24, minyard@acm.org wrote:
> > From: Corey Minyard <cminyard@mvista.com>
> 
> [...]
> 
> > After studying the EL0 handling for this, I realized an issue with using
> > MDSCR to check if single step is enabled: it can be expensive on a VM.
> > So check the task flag first to see if single step is enabled.  Then
> > check MDSCR if the task flag is set.
> 
> Very tangential remark: I'd really like people *not* to try and optimize
> Linux based on the behaviour of a hypervisor. In general, reading a
> system register is fast, and the fact that it traps on a given hypervisor
> at some point may not be true in the future, nor be a valid assumption
> across hypervisors.

Normally I would agree, but I based this upon git commit
https://github.com/torvalds/linux/commit/2a2830703a2371b47f7b50b1d35cb15dc0e2b717
which seemed to say that it was a significant enough factor to do in the
EL0 case.

-corey

> 
>         M.
> -- 
> Jazz is not dead. It just smells funny...
Corey Minyard Feb. 20, 2020, 2:52 p.m. UTC | #5
On Thu, Feb 20, 2020 at 02:06:50PM +0000, Daniel Thompson wrote:
> On Wed, Feb 19, 2020 at 09:24:03AM -0600, minyard@acm.org wrote:
> > From: Corey Minyard <cminyard@mvista.com>
> > 
> > I was working on a single-step bug on kgdb on an ARM64 system, and I saw
> > this scenario:
> > 
> > * A single step is setup to return to el1
> > * The ERET return to el1
> > * An interrupt is pending and runs before the instruction
> > * As soon as PSTATE.D (the debug disable bit) is cleared, the single
> >     step happens in that location, not where it should have.
> > 
> > This appears to be due to PSTATE.SS not being cleared when the exception
> > happens.  Per section D.2.12.5 of the ARMv8 reference manual, that
> > appears to be incorrect, it says "As part of exception entry, the PE
> > does all of the following: ...  Sets PSTATE.SS to 0."
> > 
> > However, I appear to not be the first person who has noticed this.  In
> > the el0-only portion of the kernel_entry macro in entry.S, I found the
> > following comment: "Ensure MDSCR_EL1.SS is clear, since we can unmask
> > debug exceptions when scheduling."  Exactly the same scenario, except
> > coming from a userland single step, not a kernel one.
> > 
> > As I was studying this, though, I realized that the following scenario
> > had an issue:
> > 
> > * Kernel enables MDSCR.SS, MDSCR.KDE, MDSCR.MDE (unnecessary), and
> >   PSTATE.SS to enable a single step in el1, for kgdb or kprobes,
> >   on the current CPU's MDSCR register and the process' PSTATE.SS
> >   register.
> > * Kernel returns from the exception with ERET.
> > * An interrupt or page fault happens on the instruction, causing the
> >   instruction to not be run, but the exception handler runs.
> > * The exception causes the task to migrate to a new core.
> > * The return from the exception runs on a different processor now,
> >   where the MDSCR values are not set up for a single step.
> > * The single step fails to happen.
> > 
> > This is bad for kgdb, of course, but it seems really bad for kprobes if
> > this happens.
> > 
> > To fix both these problems, rework the handling of single steps to clear
> > things out upon entry to the kernel from el1, and then to set up single
> > step when returning to el1, and not do the setup in debug-monitors.c.
> > This means that single stepping does not use
> > enable/disable_debug_monitors(); it is no longer necessary to track
> > those flags for single stepping.  This is much like single stepping is
> > handled for el0.  A new flag is added in pt_regs to enable single
> > stepping from el1.  Unfortunately, the old value of PSTATE.SS cannot be
> > used for this because of the hardware bug mentioned earlier.
> > 
> > As part of this, there is an interaction between single stepping and the
> > other users of debug monitors with the MDSCR.KDE bit.  That bit has to
> > be set for both hardware breakpoints at el1 and single stepping at el1.
> > A new variable was created to store the cpu-wide value of MDSCR.KDE; the
> > single stepping code makes sure not to clear that bit on kernel entry if
> > it's set in the per-cpu variable.
> > 
> > After fixing this and doing some more testing, I ran into another issue:
> > 
> > * Kernel enables the pt_regs single step
> > * Kernel returns from the exception with ERET.
> > * An interrupt or page fault happens on the instruction, causing the
> >   instruction to not be run, but the exception handler runs.
> > * The exception handling hits a breakpoint and stops.
> > * The user continues from the breakpoint, so the kernel is no longer
> >   expecting a single step.
> > * On the return from the first exception, the single step flag in
> >   pt_regs is still set, so a single step trap happens.
> > * The kernel keels over from an unexpected single step.
> > 
> > There's no easy way to find the pt_regs that has the single step flag
> > set.  So a thread info flag was added so that the single step could be
> > disabled in this case.  Both that flag and the flag in pt_regs must be
> > set to enable a single step.
> > 
> > Signed-off-by: Corey Minyard <cminyard@mvista.com>
> 
> I've pointed the kgdbtest suite at this patch (and run one of the
> historically unstable test cases an extra 100 times just in case).
> 
> kgdbtest hasn't got great coverage, runs the code in qemu and some
> of the strongest tests are still marked XFAIL on arm64 (for reasons
> unrelated to stepping).
> 
> So the best I can say based on the above is that the test suite does not
> observe any regression (but equally no improvement). Nevertheless FWIW:

Thanks for testing this.  This is not a surprise, you would either have
to have a broken processor like the one I'm using, or you would have to
have a migration occur on the instruction being single-stepped, which
would be extremely unlikely.

Since I've already gained some experience here, I'll try to look at
fixing things here for ARM64.

-corey

> 
> 
> Tested-by: Daniel Thompson <daniel.thompson@linaro.org>
> 
> 
> Daniel.
Marc Zyngier Feb. 20, 2020, 3:06 p.m. UTC | #6
On 2020-02-20 14:50, Corey Minyard wrote:
> On Thu, Feb 20, 2020 at 02:21:36PM +0000, Marc Zyngier wrote:
>> On 2020-02-19 15:24, minyard@acm.org wrote:
>> > From: Corey Minyard <cminyard@mvista.com>
>> 
>> [...]
>> 
>> > After studying the EL0 handling for this, I realized an issue with using
>> > MDSCR to check if single step is enabled: it can be expensive on a VM.
>> > So check the task flag first to see if single step is enabled.  Then
>> > check MDSCR if the task flag is set.
>> 
>> Very tangential remark: I'd really like people *not* to try and 
>> optimize
>> Linux based on the behaviour of a hypervisor. In general, reading a
>> system register is fast, and the fact that it traps on a given 
>> hypervisor
>> at some point may not be true in the future, nor be a valid assumption
>> across hypervisors.
> 
> Normally I would agree, but I based this upon git commit
> https://github.com/torvalds/linux/commit/2a2830703a2371b47f7b50b1d35cb15dc0e2b717
> which seemed to say that it was a significant enough factor to do in 
> the
> EL0 case.

And that's a blast from a distant past. Hypervisors have changed 
drastically
over these 6 years, and I'm still sitting on a bunch of patches that 
*could*
change the way MDSCR_EL1 is handled.

         M.
Corey Minyard Feb. 20, 2020, 4:30 p.m. UTC | #7
On Thu, Feb 20, 2020 at 02:22:14PM +0000, Will Deacon wrote:
> On Wed, Feb 19, 2020 at 09:24:03AM -0600, minyard@acm.org wrote:
> > From: Corey Minyard <cminyard@mvista.com>
> > 
> > I was working on a single-step bug on kgdb on an ARM64 system, and I saw
> > this scenario:
> > 
> > * A single step is setup to return to el1
> > * The ERET return to el1
> > * An interrupt is pending and runs before the instruction
> > * As soon as PSTATE.D (the debug disable bit) is cleared, the single
> >     step happens in that location, not where it should have.
> > 
> > This appears to be due to PSTATE.SS not being cleared when the exception
> > happens.  Per section D.2.12.5 of the ARMv8 reference manual, that
> > appears to be incorrect, it says "As part of exception entry, the PE
> > does all of the following: ...  Sets PSTATE.SS to 0."
> 
> Sorry, but I don't follow you here. If PSTATE.SS is not cleared, why would
> you take the step exception?

I don't follow.  If PSTATE.SS is set and MDSCR_EL1.SS is set, the
processor will take a single-step exception as soon as the debug
exceptions are enabled.  That's what I'm seeing.  The hardware bug is
that PSTATE.SS is not cleared on an exception, and MDSCR_EL1.SS is not
cleared on kernel entry from el1.

I'm not 100% sure that PSTATE.SS is supposed to clear on an exception.
The debug handling documentation in the ARM64 manual is extremely hard
to follow.  But I'm pretty sure about this, as you would see this
problem on every processor and it would be obvious.  You could never
continue from a breakpoint, because the following happens when
continuing from a breakpoint in what I'm seeing:

* gdb disables the breakpoint
* gdb does a single step
* The single step triggers when debug excecption are enabled, not
  after the instruction in question.
* gdb restores the breakpoint and continues
* The breakpoint occurs again because the single step never really
  happened.

> 
> > However, I appear to not be the first person who has noticed this.  In
> > the el0-only portion of the kernel_entry macro in entry.S, I found the
> > following comment: "Ensure MDSCR_EL1.SS is clear, since we can unmask
> > debug exceptions when scheduling."  Exactly the same scenario, except
> > coming from a userland single step, not a kernel one.
> 
> No, I think you might be conflating PSTATE.SS and MDSCR_EL1.SS.

Not exactly.  If the processor clears PSTATE.SS, why would you need to
clear MDSCR_EL1.SS?  You can just ignore it.  But looking at the git
commit where that code was introduced, I can see that wasn't the reason.

> 
> > As I was studying this, though, I realized that the following scenario
> > had an issue:
> > 
> > * Kernel enables MDSCR.SS, MDSCR.KDE, MDSCR.MDE (unnecessary), and
> >   PSTATE.SS to enable a single step in el1, for kgdb or kprobes,
> >   on the current CPU's MDSCR register and the process' PSTATE.SS
> >   register.
> > * Kernel returns from the exception with ERET.
> > * An interrupt or page fault happens on the instruction, causing the
> >   instruction to not be run, but the exception handler runs.
> > * The exception causes the task to migrate to a new core.
> > * The return from the exception runs on a different processor now,
> >   where the MDSCR values are not set up for a single step.
> > * The single step fails to happen.
> > 
> > This is bad for kgdb, of course, but it seems really bad for kprobes if
> > this happens.
> 
> I don't see how this can happen for kprobes. Have you managed to reproduce
> the failure?

Can a migration happen if kprobes sets up a single-step, does the step,
and an interrupt or page fault happens before the single step occurs?
If so, that single-step will never happen.

I would be hard to reproduce.  I think I could force this to happen by
modifying the kernel to force a migration in the single-step code, but
it would be hard without modifying the kernel.

> 
> > To fix both these problems, rework the handling of single steps to clear
> > things out upon entry to the kernel from el1, and then to set up single
> > step when returning to el1, and not do the setup in debug-monitors.c.
> > This means that single stepping does not use
> > enable/disable_debug_monitors(); it is no longer necessary to track
> > those flags for single stepping.  This is much like single stepping is
> > handled for el0.  A new flag is added in pt_regs to enable single
> > stepping from el1.  Unfortunately, the old value of PSTATE.SS cannot be
> > used for this because of the hardware bug mentioned earlier.
> 
> I don't think there's a hardware bug.
> 
> It sound like you're trying to make kernel debugging per-task instead
> of per-cpu, but I don't think that's the right thing to do. What if I /want/
> to debug an interrupt handler? For example, I might have a watchpoint on
> something accessed by timer interrupt.
> 
> > As part of this, there is an interaction between single stepping and the
> > other users of debug monitors with the MDSCR.KDE bit.  That bit has to
> > be set for both hardware breakpoints at el1 and single stepping at el1.
> > A new variable was created to store the cpu-wide value of MDSCR.KDE; the
> > single stepping code makes sure not to clear that bit on kernel entry if
> > it's set in the per-cpu variable.
> > 
> > After fixing this and doing some more testing, I ran into another issue:
> > 
> > * Kernel enables the pt_regs single step
> > * Kernel returns from the exception with ERET.
> > * An interrupt or page fault happens on the instruction, causing the
> >   instruction to not be run, but the exception handler runs.
> 
> This sounds like you've broken debug; we should take the step exception
> in the exception handler. That's the way this is supposed to work.

Ok, here is the disconnect, I think.  If that is the case, then what I'm
seeing is working like it should.  That doesn't work with gdb, though,
gdb expects to be able to single-step and get to the next instruction.
The scenario I mentioned at the top of this email.

Let me look at this a bit more.  I'll look at this on qemu and maybe a
pi.

-corey

> 
> > There's no easy way to find the pt_regs that has the single step flag
> > set.  So a thread info flag was added so that the single step could be
> > disabled in this case.  Both that flag and the flag in pt_regs must be
> > set to enable a single step.
> 
> Honestly, I get the feeling that you don't really understand the code
> you're changing here and it's a tonne of effort to try to untangle what
> you're doing. That's not necessarily your fault because the debug
> architecture is a nightmare to comprehend, but I'm not keen to change it
> unless we have a really good justification. I'm sure kgdb is riddled with
> bugs but, as I said before, the fixes should be in kgdb, not by tearing
> up the low-level debug code (which has the potential to break other users).
> 
> Maybe it would be easier if you tried to fix one problem per patch,
> preferably with a way to reproduce the issue you're seeing each time?
> 
> Will
Corey Minyard Feb. 20, 2020, 9:30 p.m. UTC | #8
On Thu, Feb 20, 2020 at 10:30:38AM -0600, Corey Minyard wrote:
> On Thu, Feb 20, 2020 at 02:22:14PM +0000, Will Deacon wrote:
> > On Wed, Feb 19, 2020 at 09:24:03AM -0600, minyard@acm.org wrote:

snip...

> > > After fixing this and doing some more testing, I ran into another issue:
> > > 
> > > * Kernel enables the pt_regs single step
> > > * Kernel returns from the exception with ERET.
> > > * An interrupt or page fault happens on the instruction, causing the
> > >   instruction to not be run, but the exception handler runs.
> > 
> > This sounds like you've broken debug; we should take the step exception
> > in the exception handler. That's the way this is supposed to work.
> 
> Ok, here is the disconnect, I think.  If that is the case, then what I'm
> seeing is working like it should.  That doesn't work with gdb, though,
> gdb expects to be able to single-step and get to the next instruction.
> The scenario I mentioned at the top of this email.
> 
> Let me look at this a bit more.  I'll look at this on qemu and maybe a
> pi.
> 

Ok, this is the disconnect.  I was assuming that single step would stop
at the next instruction after returning from an exception.  qemu works
the same way the hardware I have does.  So I'm assuming arm64 doesn't
clear PTRACE.SS on an exception, even though that seems to be what the
manual says.

You can reproduce this by setting up kgdb on the kernel and hooking up
gdb, setting a breakpoint somewhere that has interrupts enabled, then
doing a "continue".  It will hit the same breakpoint again and again
because the PC doesn't get advanced by the single step and the timer
interrupt is always going to be pending.  I can do a more detailed set
of instructions with qemu, if you like.

I looked at kprobes a bit.  I don't think kprobes will have a problem
with this particular issue, it disables interrupts while single
stepping and doesn't allow a probe on any instruction that would modify
the interrupt settings.  I didn't look at page faults, but I assume that
it also won't allow a probe where there can be a page fault.

If a single-step is enabled and an exception occurs before the
instruction is executed, the single step is happening in the exception
handler when debug is re-enabled.  This what you are saying is how it is
supposed to work.

That's not what gdb is expecting, and that's not how x86 works, at
least.  I looked at ARM and MIPS and they don't even do single steps in
the kernel debugger.  PPC seems to work like x86 from code examination
and since our testers haven't reported this bug on that architecture.

I can do a patch that works sort of like kprobes, disabling interrupts
and simulating a single-step if the instruction modifies the daif bits.
Then you couldn't single step across an instruction that did a page
fault, but that's probably not a huge restriction.

I could modify the patch I have now to ifdef it out unless kgdb is
enabled.

I can do a patch that just pulls single step support out of the kgdb
interface for ARM64, since it doesn't work as gdb expects, anyway.  And
that's what ARM does.

It doesn't really matter to me.  I'm just trying to fix a bug that was
reported to me, and trying to get it upstream as a good citizen.  I don't
use kgdb.

-corey
James Morse Feb. 24, 2020, 6:07 p.m. UTC | #9
Hi Corey,

On 20/02/2020 21:30, Corey Minyard wrote:
> On Thu, Feb 20, 2020 at 10:30:38AM -0600, Corey Minyard wrote:
>> On Thu, Feb 20, 2020 at 02:22:14PM +0000, Will Deacon wrote:
>>> On Wed, Feb 19, 2020 at 09:24:03AM -0600, minyard@acm.org wrote:
>>>> After fixing this and doing some more testing, I ran into another issue:
>>>>
>>>> * Kernel enables the pt_regs single step
>>>> * Kernel returns from the exception with ERET.
>>>> * An interrupt or page fault happens on the instruction, causing the
>>>>   instruction to not be run, but the exception handler runs.
>>>
>>> This sounds like you've broken debug; we should take the step exception
>>> in the exception handler. That's the way this is supposed to work.
>>
>> Ok, here is the disconnect, I think.  If that is the case, then what I'm
>> seeing is working like it should.  That doesn't work with gdb, though,
>> gdb expects to be able to single-step and get to the next instruction.
>> The scenario I mentioned at the top of this email.
>>
>> Let me look at this a bit more.  I'll look at this on qemu and maybe a
>> pi.

> Ok, this is the disconnect.  I was assuming that single step would stop
> at the next instruction after returning from an exception.  qemu works
> the same way the hardware I have does.  So I'm assuming arm64 doesn't
> clear PTRACE.SS on an exception, even though that seems to be what the
> manual says.

PSTATE.SS isn't an enable bit for single step ... its part of a bigger state-machine.
(my made-up terminology for it is 'PSTATE.Suppress-Step'...)

The diagram in the Arm-Arm's D2.12.3 "The software step state machine" may help.

MDSCR_EL1.SS enables single-step, if PSTATE.D is clear the CPU will now take step
exceptions instead of pretty much anything else. (active pending state)
To execute one instruction you need to ERET with SPSR_ELx.SS set. (active, not pending)
The CPU will execute one instruction, then clear PSTATE.SS. (taking us back to active pending)

Taking an exception clears PSTATE.SS so that you know you're in active-pending state, and
will take a step exception once you re-enable debug with PSTATE.D. This lets you step the
exception handlers.
(if it was set, you wouldn't see the first instruction in the step handler, if it was
inherited, you couldn't know if you would see the first instruction or not).
If you take something other than a step exception, PSTATE.SS will be preserved in SPSR_EL1.SS.


What I think you are seeing is the step exception once debug is re-enabled, after taking
an exception you didn't want. This happens because MDSCR_EL1.SS is still set.


> You can reproduce this by setting up kgdb on the kernel and hooking up
> gdb, setting a breakpoint somewhere that has interrupts enabled, then
> doing a "continue".  It will hit the same breakpoint again and again
> because the PC doesn't get advanced by the single step and the timer
> interrupt is always going to be pending.  I can do a more detailed set
> of instructions with qemu, if you like.

> I looked at kprobes a bit.  I don't think kprobes will have a problem
> with this particular issue, it disables interrupts while single
> stepping and doesn't allow a probe on any instruction that would modify
> the interrupt settings.  I didn't look at page faults, but I assume that
> it also won't allow a probe where there can be a page fault.

Yes, arch_prepare_kprobe() checks search_exception_tables() for locations that we know may
cause page faults. These are blacklisted.


Thanks,

James
Corey Minyard Feb. 25, 2020, 3:38 p.m. UTC | #10
On Mon, Feb 24, 2020 at 06:07:17PM +0000, James Morse wrote:
> Hi Corey,
> 
> On 20/02/2020 21:30, Corey Minyard wrote:
> > On Thu, Feb 20, 2020 at 10:30:38AM -0600, Corey Minyard wrote:
> >> On Thu, Feb 20, 2020 at 02:22:14PM +0000, Will Deacon wrote:
> >>> On Wed, Feb 19, 2020 at 09:24:03AM -0600, minyard@acm.org wrote:
> >>>> After fixing this and doing some more testing, I ran into another issue:
> >>>>
> >>>> * Kernel enables the pt_regs single step
> >>>> * Kernel returns from the exception with ERET.
> >>>> * An interrupt or page fault happens on the instruction, causing the
> >>>>   instruction to not be run, but the exception handler runs.
> >>>
> >>> This sounds like you've broken debug; we should take the step exception
> >>> in the exception handler. That's the way this is supposed to work.
> >>
> >> Ok, here is the disconnect, I think.  If that is the case, then what I'm
> >> seeing is working like it should.  That doesn't work with gdb, though,
> >> gdb expects to be able to single-step and get to the next instruction.
> >> The scenario I mentioned at the top of this email.
> >>
> >> Let me look at this a bit more.  I'll look at this on qemu and maybe a
> >> pi.
> 
> > Ok, this is the disconnect.  I was assuming that single step would stop
> > at the next instruction after returning from an exception.  qemu works
> > the same way the hardware I have does.  So I'm assuming arm64 doesn't
> > clear PTRACE.SS on an exception, even though that seems to be what the
> > manual says.
> 
> PSTATE.SS isn't an enable bit for single step ... its part of a bigger state-machine.
> (my made-up terminology for it is 'PSTATE.Suppress-Step'...)
> 
> The diagram in the Arm-Arm's D2.12.3 "The software step state machine" may help.
> 
> MDSCR_EL1.SS enables single-step, if PSTATE.D is clear the CPU will now take step
> exceptions instead of pretty much anything else. (active pending state)
> To execute one instruction you need to ERET with SPSR_ELx.SS set. (active, not pending)
> The CPU will execute one instruction, then clear PSTATE.SS. (taking us back to active pending)
> 
> Taking an exception clears PSTATE.SS so that you know you're in active-pending state, and
> will take a step exception once you re-enable debug with PSTATE.D. This lets you step the
> exception handlers.
> (if it was set, you wouldn't see the first instruction in the step handler, if it was
> inherited, you couldn't know if you would see the first instruction or not).
> If you take something other than a step exception, PSTATE.SS will be preserved in SPSR_EL1.SS.
> 
> 
> What I think you are seeing is the step exception once debug is re-enabled, after taking
> an exception you didn't want. This happens because MDSCR_EL1.SS is still set.

Ok, I was familiar with that diagram, but I was trying to fit it into
how the other architectures where I have done this type of work.  This
is a little bizarre to me, but I understand now.  Your explaination was
very helpful, though the code I have is correct either way.

The problem is that kgdb doesn't work right with the current
implementation.  If you continue from a breakpoint, it does not
continue.  It just stops at the same place.  What happens is:

* gdb remove the breakpoint and single steps.
* An exception happens and the single step stops in the kernel entry.
  Thus the state machine goes to inactive.
* gdb re-inserts the breakpoint and continues.
* When the exception returns, the breakpoint is there and is hit again.

You can never continue from a breakpoint without removing it, because
there's alway a timer interrupt pending.  You can't single-step through
instructions (stepi) because it always stops in the kernel entry.  If
you do a normal gdb single step in code it just hangs because it keeps
trying to single step through instructions and keeps stopping in kernel
entry.  So gdb does not expect the behavior that is currently
implemented.

The patch as I have posted it is probably the simplest way to fix it.
It basically makes single-step work like other architectures, and like
the userspace single step works.  I could ifdef it so that the entry
code is only there if kgdb is enabled.  You can single step through
instructions that cause page faults, so it's a little more general.

The other way is to run the single-stepped instruction with interrupts
disabled and emulate any messing with the DAIF bits.  I assume
that's only "MRS <Xt>, DAIF", "MSR DAIF, <Xt>", "MSR DAIFSet, #<imm>",
and "MSR DAIFClr, #<imm>".  Well, I guess ERET also does that, but maybe
that's ok, probably not a big deal.  In this case you can't single step
over instructions that take page faults.  I'm not sure if that's a big
deal or not, but I assume users would do that.  And it's more complex
since you have to emulate those instructions messing with DAIF.

I would like to get this fixed, either way.

Thanks,

-corey

> 
> 
> > You can reproduce this by setting up kgdb on the kernel and hooking up
> > gdb, setting a breakpoint somewhere that has interrupts enabled, then
> > doing a "continue".  It will hit the same breakpoint again and again
> > because the PC doesn't get advanced by the single step and the timer
> > interrupt is always going to be pending.  I can do a more detailed set
> > of instructions with qemu, if you like.
> 
> > I looked at kprobes a bit.  I don't think kprobes will have a problem
> > with this particular issue, it disables interrupts while single
> > stepping and doesn't allow a probe on any instruction that would modify
> > the interrupt settings.  I didn't look at page faults, but I assume that
> > it also won't allow a probe where there can be a page fault.
> 
> Yes, arch_prepare_kprobe() checks search_exception_tables() for locations that we know may
> cause page faults. These are blacklisted.
> 
> 
> Thanks,
> 
> James
James Morse Feb. 25, 2020, 5:55 p.m. UTC | #11
Hi Corey,

On 25/02/2020 15:38, Corey Minyard wrote:
> On Mon, Feb 24, 2020 at 06:07:17PM +0000, James Morse wrote:
>> On 20/02/2020 21:30, Corey Minyard wrote:
>>> Ok, this is the disconnect.  I was assuming that single step would stop
>>> at the next instruction after returning from an exception.  qemu works
>>> the same way the hardware I have does.  So I'm assuming arm64 doesn't
>>> clear PTRACE.SS on an exception, even though that seems to be what the
>>> manual says.
>>
>> PSTATE.SS isn't an enable bit for single step ... its part of a bigger state-machine.
>> (my made-up terminology for it is 'PSTATE.Suppress-Step'...)
>>
>> The diagram in the Arm-Arm's D2.12.3 "The software step state machine" may help.
>>
>> MDSCR_EL1.SS enables single-step, if PSTATE.D is clear the CPU will now take step
>> exceptions instead of pretty much anything else. (active pending state)
>> To execute one instruction you need to ERET with SPSR_ELx.SS set. (active, not pending)
>> The CPU will execute one instruction, then clear PSTATE.SS. (taking us back to active pending)
>>
>> Taking an exception clears PSTATE.SS so that you know you're in active-pending state, and
>> will take a step exception once you re-enable debug with PSTATE.D. This lets you step the
>> exception handlers.
>> (if it was set, you wouldn't see the first instruction in the step handler, if it was
>> inherited, you couldn't know if you would see the first instruction or not).
>> If you take something other than a step exception, PSTATE.SS will be preserved in SPSR_EL1.SS.
>>
>>
>> What I think you are seeing is the step exception once debug is re-enabled, after taking
>> an exception you didn't want. This happens because MDSCR_EL1.SS is still set.


> Ok, I was familiar with that diagram, but I was trying to fit it into
> how the other architectures where I have done this type of work.  This
> is a little bizarre to me, but I understand now.  Your explaination was
> very helpful, though the code I have is correct either way.

| +/*
| + * The task that is currently being single-stepped.  There can be only
| + * one.
| + */
| +struct task_struct *single_step_task;

? I think this would break kprobes and perf's use of single-step on SMP systems.


> The problem is that kgdb doesn't work right with the current
> implementation.  If you continue from a breakpoint, it does not
> continue.  It just stops at the same place.  What happens is:
> 
> * gdb remove the breakpoint and single steps.
> * An exception happens and the single step stops in the kernel entry.
>   Thus the state machine goes to inactive.

(e.g. the original instruction caused a page fault)


> * gdb re-inserts the breakpoint and continues.

> * When the exception returns, the breakpoint is there and is hit again.

Yes, because the original instruction hadn't run, it caused a page fault. This time its
more likely to succeed.

perf's use of arm64's breakpoints is quite happy with this. It means if you hit a
breakpoint in the fault handler, you see those too. If an instruction causes a page-fault,
you may see it twice, but that is because the CPU tried to execute it twice.

(I agree the irq case is probably just annoying for kgdb)


> You can never continue from a breakpoint without removing it, because
> there's alway a timer interrupt pending.

Are you driving the single-step hardware directly here, or using the behaviour from
breakpoint_handler() and reinstall_suspended_bps()?

These disable breakpoints and step the original instruction, then re-enable breakpoints.
This is because breakpoints fire before the instruction runs, and single-step doesn't
suppress breakpoints. This has to happen regardless of asynchronous exceptions.


> You can't single-step through
> instructions (stepi) because it always stops in the kernel entry.  If
> you do a normal gdb single step in code it just hangs because it keeps
> trying to single step through instructions and keeps stopping in kernel
> entry.  So gdb does not expect the behavior that is currently
> implemented.

Is it fair to say that the user driving kgdb is very-slow compared to IRQs firing?
This isn't true for the other consumers of single-step (kprobes, perf).


> The patch as I have posted it is probably the simplest way to fix it.
> It basically makes single-step work like other architectures, and like
> the userspace single step works.  I could ifdef it so that the entry
> code is only there if kgdb is enabled.  You can single step through
> instructions that cause page faults, so it's a little more general.

> The other way is to run the single-stepped instruction with interrupts
> disabled and emulate any messing with the DAIF bits.  I assume
> that's only "MRS <Xt>, DAIF", "MSR DAIF, <Xt>", "MSR DAIFSet, #<imm>",
> and "MSR DAIFClr, #<imm>". 

> Well, I guess ERET also does that, but maybe
> that's ok, probably not a big deal.

(tangent: you can't step ERET!)


> In this case you can't single step over instructions that take page faults.

This works for perf. It doesn't for kprobes, which is why kprobes blacklists those sites.


> I'm not sure if that's a big
> deal or not, but I assume users would do that.  And it's more complex
> since you have to emulate those instructions messing with DAIF.
> 
> I would like to get this fixed, either way.

If the problem is IRQs preventing the very-slow user making forward-progress, it may be
possible for kgdb to ask the irqchip code to mute all IRQs on this CPU while it is
stepping. The PMR mechanism we use for pNMI could do this for the GIC. (Caveat: I don't
know anything about the GIC). For the pi ... no idea.

This doesn't stop you seeing instructions that fault from taking a fault, or taking the
breakpoint twice when the CPU tries to run the instruction twice. This is just the debug
hardware showing you what happened.


Thanks,

James
Corey Minyard Feb. 26, 2020, 2:58 a.m. UTC | #12
Hello James,

On Tue, Feb 25, 2020 at 05:55:44PM +0000, James Morse wrote:
> Hi Corey,
> 
> On 25/02/2020 15:38, Corey Minyard wrote:
> > On Mon, Feb 24, 2020 at 06:07:17PM +0000, James Morse wrote:
> >> On 20/02/2020 21:30, Corey Minyard wrote:
> >>> Ok, this is the disconnect.  I was assuming that single step would stop
> >>> at the next instruction after returning from an exception.  qemu works
> >>> the same way the hardware I have does.  So I'm assuming arm64 doesn't
> >>> clear PTRACE.SS on an exception, even though that seems to be what the
> >>> manual says.
> >>
> >> PSTATE.SS isn't an enable bit for single step ... its part of a bigger state-machine.
> >> (my made-up terminology for it is 'PSTATE.Suppress-Step'...)
> >>
> >> The diagram in the Arm-Arm's D2.12.3 "The software step state machine" may help.
> >>
> >> MDSCR_EL1.SS enables single-step, if PSTATE.D is clear the CPU will now take step
> >> exceptions instead of pretty much anything else. (active pending state)
> >> To execute one instruction you need to ERET with SPSR_ELx.SS set. (active, not pending)
> >> The CPU will execute one instruction, then clear PSTATE.SS. (taking us back to active pending)
> >>
> >> Taking an exception clears PSTATE.SS so that you know you're in active-pending state, and
> >> will take a step exception once you re-enable debug with PSTATE.D. This lets you step the
> >> exception handlers.
> >> (if it was set, you wouldn't see the first instruction in the step handler, if it was
> >> inherited, you couldn't know if you would see the first instruction or not).
> >> If you take something other than a step exception, PSTATE.SS will be preserved in SPSR_EL1.SS.
> >>
> >>
> >> What I think you are seeing is the step exception once debug is re-enabled, after taking
> >> an exception you didn't want. This happens because MDSCR_EL1.SS is still set.
> 
> 
> > Ok, I was familiar with that diagram, but I was trying to fit it into
> > how the other architectures where I have done this type of work.  This
> > is a little bizarre to me, but I understand now.  Your explaination was
> > very helpful, though the code I have is correct either way.
> 
> | +/*
> | + * The task that is currently being single-stepped.  There can be only
> | + * one.
> | + */
> | +struct task_struct *single_step_task;
> 
> ? I think this would break kprobes and perf's use of single-step on SMP systems.

It shouldn't.  The task will never change in that case, interrupts are
disabled as the instruction runs.

> 
> 
> > The problem is that kgdb doesn't work right with the current
> > implementation.  If you continue from a breakpoint, it does not
> > continue.  It just stops at the same place.  What happens is:
> > 
> > * gdb remove the breakpoint and single steps.
> > * An exception happens and the single step stops in the kernel entry.
> >   Thus the state machine goes to inactive.
> 
> (e.g. the original instruction caused a page fault)
> 
> 
> > * gdb re-inserts the breakpoint and continues.
> 
> > * When the exception returns, the breakpoint is there and is hit again.
> 
> Yes, because the original instruction hadn't run, it caused a page fault. This time its
> more likely to succeed.
> 
> perf's use of arm64's breakpoints is quite happy with this. It means if you hit a
> breakpoint in the fault handler, you see those too. If an instruction causes a page-fault,
> you may see it twice, but that is because the CPU tried to execute it twice.

That's true, but gdb runs at human speed.  There will always be a timer
interrupt pending.

> 
> (I agree the irq case is probably just annoying for kgdb)
> 
> 
> > You can never continue from a breakpoint without removing it, because
> > there's alway a timer interrupt pending.
> 
> Are you driving the single-step hardware directly here, or using the behaviour from
> breakpoint_handler() and reinstall_suspended_bps()?
> 
> These disable breakpoints and step the original instruction, then re-enable breakpoints.
> This is because breakpoints fire before the instruction runs, and single-step doesn't
> suppress breakpoints. This has to happen regardless of asynchronous exceptions.
> 
> 
> > You can't single-step through
> > instructions (stepi) because it always stops in the kernel entry.  If
> > you do a normal gdb single step in code it just hangs because it keeps
> > trying to single step through instructions and keeps stopping in kernel
> > entry.  So gdb does not expect the behavior that is currently
> > implemented.
> 
> Is it fair to say that the user driving kgdb is very-slow compared to IRQs firing?
> This isn't true for the other consumers of single-step (kprobes, perf).

Yes.  You hit a breakpoint, the user does whatever then does a continue
or single-step.  Which is, of course, not true of the other single-step
users, as you say.

> 
> 
> > The patch as I have posted it is probably the simplest way to fix it.
> > It basically makes single-step work like other architectures, and like
> > the userspace single step works.  I could ifdef it so that the entry
> > code is only there if kgdb is enabled.  You can single step through
> > instructions that cause page faults, so it's a little more general.
> 
> > The other way is to run the single-stepped instruction with interrupts
> > disabled and emulate any messing with the DAIF bits.  I assume
> > that's only "MRS <Xt>, DAIF", "MSR DAIF, <Xt>", "MSR DAIFSet, #<imm>",
> > and "MSR DAIFClr, #<imm>". 
> 
> > Well, I guess ERET also does that, but maybe
> > that's ok, probably not a big deal.
> 
> (tangent: you can't step ERET!)
> 
> 
> > In this case you can't single step over instructions that take page faults.
> 
> This works for perf. It doesn't for kprobes, which is why kprobes blacklists those sites.
> 
> 
> > I'm not sure if that's a big
> > deal or not, but I assume users would do that.  And it's more complex
> > since you have to emulate those instructions messing with DAIF.
> > 
> > I would like to get this fixed, either way.
> 
> If the problem is IRQs preventing the very-slow user making forward-progress, it may be
> possible for kgdb to ask the irqchip code to mute all IRQs on this CPU while it is
> stepping. The PMR mechanism we use for pNMI could do this for the GIC. (Caveat: I don't
> know anything about the GIC). For the pi ... no idea.
> 
> This doesn't stop you seeing instructions that fault from taking a fault, or taking the
> breakpoint twice when the CPU tries to run the instruction twice. This is just the debug
> hardware showing you what happened.

I'm not sure messing around with the GIC is really the right solution.
But, it's an interesting idea.

I'm not sure how to proceed from here, though.

-corey

> 
> 
> Thanks,
> 
> James
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
index 7619f473155f..02842fef74bb 100644
--- a/arch/arm64/include/asm/debug-monitors.h
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -13,7 +13,8 @@ 
 #include <asm/ptrace.h>
 
 /* Low-level stepping controls. */
-#define DBG_MDSCR_SS		(1 << 0)
+#define DBG_MDSCR_SS_BIT	0
+#define DBG_MDSCR_SS		(1 << DBG_MDSCR_SS_BIT)
 #define DBG_SPSR_SS		(1 << 21)
 
 /* MDSCR_EL1 enabling bits */
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index bf57308fcd63..61b69f1f7c26 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -169,11 +169,11 @@  struct pt_regs {
 	};
 	u64 orig_x0;
 #ifdef __AARCH64EB__
-	u32 unused2;
+	u32 ss_enable; /* Kernel single-step for a task */
 	s32 syscallno;
 #else
 	s32 syscallno;
-	u32 unused2;
+	u32 ss_enable;
 #endif
 
 	u64 orig_addr_limit;
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index f0cec4160136..445519a5d2c9 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -78,6 +78,7 @@  void arch_release_task_struct(struct task_struct *tsk);
 #define TIF_SVE_VL_INHERIT	24	/* Inherit sve_vl_onexec across exec */
 #define TIF_SSBD		25	/* Wants SSB mitigation */
 #define TIF_TAGGED_ADDR		26	/* Allow tagged user addresses */
+#define TIF_KERNEL_SINGLESTEP	27	/* Single-stepping in EL1. */
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index a5bdce8af65b..038e76d2f0c4 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -62,6 +62,7 @@  int main(void)
   DEFINE(S_PSTATE,		offsetof(struct pt_regs, pstate));
   DEFINE(S_PC,			offsetof(struct pt_regs, pc));
   DEFINE(S_SYSCALLNO,		offsetof(struct pt_regs, syscallno));
+  DEFINE(S_SS_ENABLE,		offsetof(struct pt_regs, ss_enable));
   DEFINE(S_ORIG_ADDR_LIMIT,	offsetof(struct pt_regs, orig_addr_limit));
   DEFINE(S_PMR_SAVE,		offsetof(struct pt_regs, pmr_save));
   DEFINE(S_STACKFRAME,		offsetof(struct pt_regs, stackframe));
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 48222a4760c2..9260f1cfe985 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -77,6 +77,14 @@  early_param("nodebugmon", early_debug_disable);
 static DEFINE_PER_CPU(int, mde_ref_count);
 static DEFINE_PER_CPU(int, kde_ref_count);
 
+/*
+ * The KDE bit must be set for hardware breakpoints or single stepping
+ * to work in the kernel.  So when a kernel single-step finishes, it
+ * will clear the SS bit and the KDE bit.  It uses the below to restore
+ * the KDE bit if we need it set for hardware breakpoints.
+ */
+DEFINE_PER_CPU_READ_MOSTLY(u64, mdscr_debug_bits);
+
 void enable_debug_monitors(enum dbg_active_el el)
 {
 	u32 mdscr, enable = 0;
@@ -94,6 +102,7 @@  void enable_debug_monitors(enum dbg_active_el el)
 		mdscr = mdscr_read();
 		mdscr |= enable;
 		mdscr_write(mdscr);
+		__this_cpu_write(mdscr_debug_bits, mdscr & DBG_MDSCR_KDE);
 	}
 }
 NOKPROBE_SYMBOL(enable_debug_monitors);
@@ -115,6 +124,7 @@  void disable_debug_monitors(enum dbg_active_el el)
 		mdscr = mdscr_read();
 		mdscr &= disable;
 		mdscr_write(mdscr);
+		__this_cpu_write(mdscr_debug_bits, mdscr & DBG_MDSCR_KDE);
 	}
 }
 NOKPROBE_SYMBOL(disable_debug_monitors);
@@ -405,27 +415,66 @@  void user_fastforward_single_step(struct task_struct *task)
 }
 
 /* Kernel API */
+
+/*
+ * The task that is currently being single-stepped.  There can be only
+ * one.
+ */
+struct task_struct *single_step_task;
+
+/*
+ * Why, you may ask, does this have both regs->ss_enable and
+ * TIF_KERNEL_SINGLESTEP to enable single stepping?  The trouble lies
+ * in nested exceptions in the kernel.  If an interrupt is pending
+ * when a single-step occurs, it will happen before the single step,
+ * and it will go through the el1 kernel entry.
+ *
+ * One scenario is that another exception may occur during this
+ * interrupt processing.  If you only had TIF_KERNEL_SINGLESTEP, single
+ * stepping would be enabled there, but that's the wrong place.  So you
+ * have to rely on regs->ss_enabled to tell you if that is the case,
+ * since it won't be enabled in the second interrupt handler.
+ *
+ * A breakpoint may be hit during this interrupt, and the kernel will
+ * stop there.  But if you only had regs->ss_enabled, you couldn't
+ * disable the single stepping since you have no idea where regs is
+ * you return from kgdb.  So when it returned, it would hit the single
+ * step, and the kernel would die from an unknown single step source.
+ * So you have TIF_KERNEL_SINGLESTEP to prevent that problem.  The
+ * task being single-stepped is saved and the flag cleared when it is
+ * disabled.
+ *
+ * It would be nice to be able to use SPSR.SS instead of having a
+ * separate regs->ss_enable flag.  However, some processors don't
+ * clear PSTATE.SS on an exception, so SPSR.SS will be set in
+ * subsequent exception handlers.
+ */
 void kernel_enable_single_step(struct pt_regs *regs)
 {
 	WARN_ON(!irqs_disabled());
-	set_regs_spsr_ss(regs);
-	mdscr_write(mdscr_read() | DBG_MDSCR_SS);
-	enable_debug_monitors(DBG_ACTIVE_EL1);
+	regs->ss_enable = DBG_MDSCR_SS;
+	set_ti_thread_flag(task_thread_info(current), TIF_KERNEL_SINGLESTEP);
+	get_task_struct(current);
+	single_step_task = current;
 }
 NOKPROBE_SYMBOL(kernel_enable_single_step);
 
 void kernel_disable_single_step(void)
 {
 	WARN_ON(!irqs_disabled());
-	mdscr_write(mdscr_read() & ~DBG_MDSCR_SS);
-	disable_debug_monitors(DBG_ACTIVE_EL1);
+	if (single_step_task) {
+		clear_ti_thread_flag(task_thread_info(single_step_task),
+				     TIF_KERNEL_SINGLESTEP);
+		put_task_struct(single_step_task);
+		single_step_task = NULL;
+	}
 }
 NOKPROBE_SYMBOL(kernel_disable_single_step);
 
 int kernel_active_single_step(void)
 {
 	WARN_ON(!irqs_disabled());
-	return mdscr_read() & DBG_MDSCR_SS;
+	return single_step_task != NULL;
 }
 NOKPROBE_SYMBOL(kernel_active_single_step);
 
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 9461d812ae27..336fc667bf04 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -25,6 +25,7 @@ 
 #include <asm/thread_info.h>
 #include <asm/asm-uaccess.h>
 #include <asm/unistd.h>
+#include <asm/debug-monitors.h>
 
 /*
  * Context tracking subsystem.  Used to instrument transitions
@@ -191,6 +192,32 @@  alternative_cb_end
 	mrs	x23, spsr_el1
 	stp	lr, x21, [sp, #S_LR]
 
+	.if \el != 0
+	/*
+	 * If single-step was enabled, save it off and disable it,
+	 * or it will trap on enable_dbg.
+	 * The restore code will re-enable it if necessary.
+	 * Use the task flag to know if kernel single step might be enabled,
+	 * not the mdscr value first, accessing mdscr in a VM is expensive.
+	 * But the task flag MDSCR.SS must be set for it to be enabled,
+	 * so that has to be checked, if the task flag is set.
+	 */
+	mov	w21, #0
+	ldr	x20, [tsk, #TSK_TI_FLAGS]
+	tbz	x20, #TIF_KERNEL_SINGLESTEP, 1f
+	mrs	x20, mdscr_el1
+	tbz	x20, #DBG_MDSCR_SS_BIT, 1f
+	ldr_this_cpu x19, mdscr_debug_bits, x21
+	bic	x20, x20, #DBG_MDSCR_SS
+	bic	x20, x20, #DBG_MDSCR_KDE
+	orr	x20, x20, x19
+	msr	mdscr_el1, x20
+	mov	w21, #DBG_MDSCR_SS
+1:
+	str	w21, [sp, #S_SS_ENABLE]
+	bic	x23, x23, #DBG_SPSR_SS
+	.endif /* \el != 0 */
+
 	/*
 	 * In order to be able to dump the contents of struct pt_regs at the
 	 * time the exception was taken (in case we attempt to walk the call
@@ -344,6 +371,23 @@  alternative_else_nop_endif
 	apply_ssbd 0, x0, x1
 	.endif
 
+	.if	\el != 0
+	/* Restore the single-step bit. */
+	ldr	w20, [sp, #S_SS_ENABLE]
+	tbz	w20, #DBG_MDSCR_SS_BIT, 6f
+	ldr	x20, [tsk, #TSK_TI_FLAGS]
+	tbz	x20, #TIF_KERNEL_SINGLESTEP, 6f
+	disable_daif
+	mrs	x20, mdscr_el1
+	orr	x20, x20, #DBG_MDSCR_SS // Enable single step
+	/* KDE must be set for SS in EL1 */
+	orr     x20, x20, #DBG_MDSCR_KDE
+	msr	mdscr_el1, x20
+	orr	x22, x22, #DBG_SPSR_SS
+6:
+	/* PSTATE.D and PSTATE.SS will be restored from SPSR_EL1. */
+	.endif
+
 	msr	elr_el1, x21			// set up the return data
 	msr	spsr_el1, x22
 	ldp	x0, x1, [sp, #16 * 0]
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 43119922341f..5b40f190f455 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -221,8 +221,10 @@  int kgdb_arch_handle_exception(int exception_vector, int signo,
 		/*
 		 * Enable single step handling
 		 */
-		if (!kernel_active_single_step())
-			kernel_enable_single_step(linux_regs);
+		if (kernel_active_single_step())
+			/* Clear out the old one */
+			kernel_disable_single_step();
+		kernel_enable_single_step(linux_regs);
 		err = 0;
 		break;
 	default: