
[RESEND] arm64: kgdb: fix single stepping

Message ID 1429578793-3971-1-git-send-email-takahiro.akashi@linaro.org (mailing list archive)
State New, archived

Commit Message

AKASHI Takahiro April 21, 2015, 1:13 a.m. UTC
Jason,

Could you please review my patch below?
See also arm64 maintainer's comment:
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-January/313712.html

Thanks,
-Takahiro AKASHI

I tried to verify kgdb in vanilla kernel on fast model, but it seems that
the single stepping with kgdb doesn't work correctly since its first
appearance at v3.15.

On v3.15, 'stepi' command after breaking the kernel at some breakpoint
steps forward to the next instruction, but the succeeding 'stepi' never
goes beyond that.
On v3.16, 'stepi' moves forward and stops at the next instruction just
after enable_dbg in el1_dbg, and never goes beyond that. This variance of
behavior seems to come in with the following patch in v3.16:

    commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault
    paths where possible")

This patch
(1) moves kgdb_disable_single_step() from 'c' command handling to single
    step handler.
    This makes sure that single stepping gets effective at every 's' command.
    Please note that, under the current implementation, single step bit in
    spsr, which is cleared by the first single stepping, will not be set
    again for the consecutive 's' commands because single step bit in mdscr
    is still kept on (that is, kernel_active_single_step() in
    kgdb_arch_handle_exception() is true).
(2) re-implements kgdb_roundup_cpus() because the current implementation
    enabled interrupts naively. See below.
(3) removes 'enable_dbg' in el1_dbg.
    Single step bit in mdscr is turned on in do_debug_exception()->
    kgdb_handle_exception() before returning to debugged context, and if
    debug exception is enabled in el1_dbg, we will see unexpected
    single-stepping in el1_dbg.
    Since v3.18, the following patch does the same:
      commit 1059c6bf8534 ("arm64: debug: don't re-enable debug exceptions
      on return from el1_dbg")
(4) masks interrupts while single-stepping one instruction.
    If an interrupt is caught during processing a single-stepping, debug
    exception is unintentionally enabled by el1_irq's 'enable_dbg' before
    returning to debugged context.
    Thus, like in (3), we will see unexpected single-stepping in el1_irq.

Basically (1) and (2) are for v3.15, (3) and (4) for v3.1[67].
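
The behaviour described in (1) can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not kernel code:
struct cpu_model and the model_*() and step_cmd_*() helpers are hypothetical
names, and the model tracks nothing but the two SS bits and the PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one CPU; only the state relevant to stepping. */
struct cpu_model {
	bool mdscr_ss;	/* models MDSCR_EL1.SS */
	bool spsr_ss;	/* models SPSR_EL1.SS, copied to PSTATE.SS on return */
	uint64_t pc;
};

/* Models kernel_enable_single_step(): sets both SS bits. */
void model_enable_step(struct cpu_model *c)
{
	c->mdscr_ss = true;
	c->spsr_ss = true;
}

/* Models kernel_disable_single_step(): clears MDSCR.SS only. */
void model_disable_step(struct cpu_model *c)
{
	c->mdscr_ss = false;
}

/*
 * Models returning to the debugged context until the step exception.
 * With both bits set, one instruction retires and SPSR.SS reads back 0.
 * With MDSCR.SS set but SPSR.SS clear, the exception is taken at once
 * and the PC does not move.
 */
void model_resume(struct cpu_model *c)
{
	if (c->mdscr_ss && c->spsr_ss) {
		c->pc += 4;
		c->spsr_ss = false;
	}
}

/* 's' before the patch: stepping is never disarmed between steps. */
void step_cmd_old(struct cpu_model *c)
{
	if (!c->mdscr_ss)
		model_enable_step(c);
	model_resume(c);
}

/* 's' with the patch: the step handler disarms, so each 's' re-arms. */
void step_cmd_new(struct cpu_model *c)
{
	if (!c->mdscr_ss)
		model_enable_step(c);
	model_resume(c);
	model_disable_step(c);	/* done on entry to kgdb_step_brk_fn() */
}
```

In this model, two calls to step_cmd_old() leave the PC one instruction past
the starting point, while two calls to step_cmd_new() advance it by two
instructions.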

* issue fixed by (2):
Without (2), we would see another problem if a breakpoint is set at
interrupt-sensitive places, like gic_handle_irq():

    KGDB: re-enter error: breakpoint removed ffffffc000081258
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 650 at kernel/debug/debug_core.c:435
					kgdb_handle_exception+0x1dc/0x1f4()
    Modules linked in:
    CPU: 0 PID: 650 Comm: sh Not tainted 3.17.0-rc2+ #177
    Call trace:
    [<ffffffc000087fac>] dump_backtrace+0x0/0x130
    [<ffffffc0000880ec>] show_stack+0x10/0x1c
    [<ffffffc0004d683c>] dump_stack+0x74/0xb8
    [<ffffffc0000ab824>] warn_slowpath_common+0x8c/0xb4
    [<ffffffc0000ab90c>] warn_slowpath_null+0x14/0x20
    [<ffffffc000121bfc>] kgdb_handle_exception+0x1d8/0x1f4
    [<ffffffc000092ffc>] kgdb_brk_fn+0x18/0x28
    [<ffffffc0000821c8>] brk_handler+0x9c/0xe8
    [<ffffffc0000811e8>] do_debug_exception+0x3c/0xac
    Exception stack(0xffffffc07e027650 to 0xffffffc07e027770)
    ...
    [<ffffffc000083cac>] el1_dbg+0x14/0x68
    [<ffffffc00012178c>] kgdb_cpu_enter+0x464/0x5c0
    [<ffffffc000121bb4>] kgdb_handle_exception+0x190/0x1f4
    [<ffffffc000092ffc>] kgdb_brk_fn+0x18/0x28
    [<ffffffc0000821c8>] brk_handler+0x9c/0xe8
    [<ffffffc0000811e8>] do_debug_exception+0x3c/0xac
    Exception stack(0xffffffc07e027ac0 to 0xffffffc07e027be0)
    ...
    [<ffffffc000083cac>] el1_dbg+0x14/0x68
    [<ffffffc00032e4b4>] __handle_sysrq+0x11c/0x190
    [<ffffffc00032e93c>] write_sysrq_trigger+0x4c/0x60
    [<ffffffc0001e7d58>] proc_reg_write+0x54/0x84
    [<ffffffc000192fa4>] vfs_write+0x98/0x1c8
    [<ffffffc0001939b0>] SyS_write+0x40/0xa0

Once some interrupt occurs, a breakpoint at gic_handle_irq() triggers kgdb.
Kgdb then calls kgdb_roundup_cpus() to sync with other cpus.
Current kgdb_roundup_cpus() unmasks interrupts temporarily to
use smp_call_function().
This eventually allows another interrupt to occur and likely results in
hitting a breakpoint at gic_handle_irq() again since debug exception is
always enabled in el1_irq.

We can avoid this issue by specifying "nokgdbroundup" as a kernel parameter,
but this leaves the other cpus in an unknown state as far as kgdb is
concerned, and they may interfere with kgdb activity.
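
The target selection done by the re-implemented kgdb_roundup_cpus() in (2)
can be sketched in user space as follows. This is an illustration under
stated assumptions: a 64-bit bitmap stands in for struct cpumask, self must
be below MODEL_NR_CPUS, roundup_targets() and queue_roundup() are
hypothetical names, and the pending[] array merely stands in for
irq_work_queue_on().

```c
#include <stdint.h>

#define MODEL_NR_CPUS 64

/* Online CPUs minus the CPU that is running the debugger. */
uint64_t roundup_targets(uint64_t online_mask, unsigned int self)
{
	return online_mask & ~(UINT64_C(1) << self);
}

/*
 * Mark one pending roundup request per target CPU and return how many
 * were queued. Nothing here touches the local interrupt mask.
 */
unsigned int queue_roundup(uint64_t online_mask, unsigned int self,
			   unsigned char pending[MODEL_NR_CPUS])
{
	uint64_t targets = roundup_targets(online_mask, self);
	unsigned int cpu, queued = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (targets & (UINT64_C(1) << cpu)) {
			pending[cpu] = 1;
			queued++;
		}
	}
	return queued;
}
```

The point of the sketch is that the requesting CPU only posts work for the
others and returns; it never needs to re-enable its own interrupts.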

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/kernel/kgdb.c |   60 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 46 insertions(+), 14 deletions(-)

Comments

Peppe CAVALLARO Sept. 14, 2016, 12:48 p.m. UTC | #1
Hello

[snip]

> We can avoid this issue by specifying "nokgdbroundup" as a kernel parameter,
> but this leaves the other cpus in an unknown state as far as kgdb is
> concerned, and they may interfere with kgdb activity.

> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>

Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

[snip]
Will Deacon Sept. 14, 2016, 2:58 p.m. UTC | #2
Hi Akashi,

On Tue, Apr 21, 2015 at 02:13:13AM +0100, AKASHI Takahiro wrote:
> Could you please review my patch below?
> See also arm64 maintainer's comment:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-January/313712.html

-ETIMEDOUT waiting for the kgdb folk to comment. People have reported
that this patch is required for kgdb to work correctly on arm64, so I'm
happy to merge it.

However, as detailed in your comment log:

> This patch
> (1) moves kgdb_disable_single_step() from 'c' command handling to single
>     step handler.
>     This makes sure that single stepping gets effective at every 's' command.
>     Please note that, under the current implementation, single step bit in
>     spsr, which is cleared by the first single stepping, will not be set
>     again for the consecutive 's' commands because single step bit in mdscr
>     is still kept on (that is, kernel_active_single_step() in
>     kgdb_arch_handle_exception() is true).
> (2) re-implements kgdb_roundup_cpus() because the current implementation
>     enabled interrupts naively. See below.
> (3) removes 'enable_dbg' in el1_dbg.
>     Single step bit in mdscr is turned on in do_debug_exception()->
>     kgdb_handle_exception() before returning to debugged context, and if
>     debug exception is enabled in el1_dbg, we will see unexpected
>     single-stepping in el1_dbg.
>     Since v3.18, the following patch does the same:
>       commit 1059c6bf8534 ("arm64: debug: don't re-enable debug exceptions
>       on return from el1_dbg")
> (4) masks interrupts while single-stepping one instruction.
>     If an interrupt is caught during processing a single-stepping, debug
>     exception is unintentionally enabled by el1_irq's 'enable_dbg' before
>     returning to debugged context.
>     Thus, like in (3), we will see unexpected single-stepping in el1_irq.

this patch is doing *far* too much in one go. Could you please repost it
as a series of self-contained fixes with clear commit messages, so I can
queue them and cc stable where appropriate?

Thanks,

Will
AKASHI Takahiro Sept. 15, 2016, 7:56 a.m. UTC | #3
On Wed, Sep 14, 2016 at 03:58:51PM +0100, Will Deacon wrote:
> Hi Akashi,
> 
> On Tue, Apr 21, 2015 at 02:13:13AM +0100, AKASHI Takahiro wrote:
> > Could you please review my patch below?
> > See also arm64 maintainer's comment:
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-January/313712.html
> 
> -ETIMEDOUT waiting for the kgdb folk to comment. People have reported
> that this patch is required for kgdb to work correctly on arm64, so I'm
> happy to merge it.

I'm happy, too.

> However, as detailed in your comment log:
> 
> > This patch
> > (1) moves kgdb_disable_single_step() from 'c' command handling to single
> >     step handler.
> >     This makes sure that single stepping gets effective at every 's' command.
> >     Please note that, under the current implementation, single step bit in
> >     spsr, which is cleared by the first single stepping, will not be set
> >     again for the consecutive 's' commands because single step bit in mdscr
> >     is still kept on (that is, kernel_active_single_step() in
> >     kgdb_arch_handle_exception() is true).
> > (2) re-implements kgdb_roundup_cpus() because the current implementation
> >     enabled interrupts naively. See below.
> > (3) removes 'enable_dbg' in el1_dbg.
> >     Single step bit in mdscr is turned on in do_debug_exception()->
> >     kgdb_handle_exception() before returning to debugged context, and if
> >     debug exception is enabled in el1_dbg, we will see unexpected
> >     single-stepping in el1_dbg.
> >     Since v3.18, the following patch does the same:
> >       commit 1059c6bf8534 ("arm64: debug: don't re-enable debug exceptions
> >       on return from el1_dbg")
> > (4) masks interrupts while single-stepping one instruction.
> >     If an interrupt is caught during processing a single-stepping, debug
> >     exception is unintentionally enabled by el1_irq's 'enable_dbg' before
> >     returning to debugged context.
> >     Thus, like in (3), we will see unexpected single-stepping in el1_irq.
> 
> this patch is doing *far* too much in one go. Could you please repost it
> as a series of self-contained fixes with clear commit messages, so I can
> queue them and cc stable where appropriate?

Sure, but I need to refresh my memory here.

-Takahiro AKASHI

> Thanks,
> 
> Will
Daniel Thompson Sept. 15, 2016, 10:41 a.m. UTC | #4
On 15/09/16 08:56, AKASHI Takahiro wrote:
> On Wed, Sep 14, 2016 at 03:58:51PM +0100, Will Deacon wrote:
>> Hi Akashi,
>>
>> On Tue, Apr 21, 2015 at 02:13:13AM +0100, AKASHI Takahiro wrote:
>>> Could you please review my patch below?
>>> See also arm64 maintainer's comment:
>>> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-January/313712.html
>>
>> -ETIMEDOUT waiting for the kgdb folk to comment. People have reported
>> that this patch is required for kgdb to work correctly on arm64, so I'm
>> happy to merge it.
>
> I'm happy, too.

I'll keep an eye out and FWIW see if I can throw in a review. I'm not  
really one of "kgdb folk" but did examine it fairly deeply in the early  
stages of the FIQ/NMI work (and which has since stopped focussing on kgdb).

I have some equally elderly, albeit rather less critical, kdb patches  
that I should have pushed harder for so I'm sympathetic here ;-)


Daniel.


Jason Wessel Sept. 15, 2016, 12:32 p.m. UTC | #5
On 09/15/2016 05:41 AM, Daniel Thompson wrote:
> On 15/09/16 08:56, AKASHI Takahiro wrote:
>> On Wed, Sep 14, 2016 at 03:58:51PM +0100, Will Deacon wrote:
>>> Hi Akashi,
>>>
>>> On Tue, Apr 21, 2015 at 02:13:13AM +0100, AKASHI Takahiro wrote:
>>>> Could you please review my patch below?
>>>> See also arm64 maintainer's comment:
>>>> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-January/313712.html
>>>
>>> -ETIMEDOUT waiting for the kgdb folk to comment. People have reported
>>> that this patch is required for kgdb to work correctly on arm64, so I'm
>>> happy to merge it.
>>
>> I'm happy, too.
>
> I'll keep an eye out and FWIW see if I can throw in a review. I'm not
> really one of "kgdb folk" but did examine it fairly deeply in the early
> stages of the FIQ/NMI work (and which has since stopped focussing on kgdb).
>
> I have some equally elderly, albeit rather less critical, kdb patches
> that I should have pushed harder for so I'm sympathetic here ;-)


Hey, if you do happen to have something, why not send it along.  I just asked the linux-next folks to re-activate the kgdb-next branch so that merges can start taking place again.  It appears that the final merge request never even went through from linux-next as I just rebased it and there was a patch in there from Daniel dating back to over a year ago.

If there is work out there that needs merging and reviews lets get it done.  I had been kind of stuck in time on what seems like the stone age 3.14 kernel projects until recently, but jumped back into mainline development about a month ago.

I have no objection to merging the ARM64 single step patch and have comments in a separate mail that follows with respect to the patch.


Cheers,
Jason.
Jason Wessel Sept. 15, 2016, 1:04 p.m. UTC | #6
On 04/20/2015 08:13 PM, AKASHI Takahiro wrote:
> Jason,
>
> Could you please review my patch below?
> See also arm64 maintainer's comment:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-January/313712.html
>
> Thanks,
> -Takahiro AKASHI
>
> I tried to verify kgdb in vanilla kernel on fast model, but it seems that
> the single stepping with kgdb doesn't work correctly since its first
> appearance at v3.15.
>
> On v3.15, 'stepi' command after breaking the kernel at some breakpoint
> steps forward to the next instruction, but the succeeding 'stepi' never
> goes beyond that.
> On v3.16, 'stepi' moves forward and stops at the next instruction just
> after enable_dbg in el1_dbg, and never goes beyond that. This variance of
> behavior seems to come in with the following patch in v3.16:
>
>     commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault
>     paths where possible")
>
> This patch
> (1) moves kgdb_disable_single_step() from 'c' command handling to single
>     step handler.
>     This makes sure that single stepping gets effective at every 's' command.
>     Please note that, under the current implementation, single step bit in
>     spsr, which is cleared by the first single stepping, will not be set
>     again for the consecutive 's' commands because single step bit in mdscr
>     is still kept on (that is, kernel_active_single_step() in
>     kgdb_arch_handle_exception() is true).
> (2) re-implements kgdb_roundup_cpus() because the current implementation
>     enabled interrupts naively. See below.
> (3) removes 'enable_dbg' in el1_dbg.
>     Single step bit in mdscr is turned on in do_debug_exception()->
>     kgdb_handle_exception() before returning to debugged context, and if
>     debug exception is enabled in el1_dbg, we will see unexpected
>     single-stepping in el1_dbg.
>     Since v3.18, the following patch does the same:
>       commit 1059c6bf8534 ("arm64: debug: don't re-enable debug exceptions
>       on return from el1_dbg")
> (4) masks interrupts while single-stepping one instruction.
>     If an interrupt is caught during processing a single-stepping, debug
>     exception is unintentionally enabled by el1_irq's 'enable_dbg' before
>     returning to debugged context.
>     Thus, like in (3), we will see unexpected single-stepping in el1_irq.
>
> Basically (1) and (2) are for v3.15, (3) and (4) for v3.1[67].
>
> @@ -176,18 +183,14 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
>  		 * over and over again.
>  		 */
>  		kgdb_arch_update_addr(linux_regs, remcom_in_buffer);
> -		atomic_set(&kgdb_cpu_doing_single_step, -1);
> -		kgdb_single_step =  0;


This is a subtle change, but I assume it is what you intended?  All the CPUs will get released into the run state when exiting the kgdb exception handler.


> -
> -		/*
> -		 * Received continue command, disable single step
> -		 */
> -		if (kernel_active_single_step())
> -			kernel_disable_single_step();
>


I see why this is not needed above any more given that you have a function that cleans up the state on entry to the kgdb exception handler.

The rest of the patch is fine.

I added the patch to kgdb-next after fixing up the context since it no longer applied to the mainline (https://git.kernel.org/cgit/linux/kernel/git/jwessel/kgdb.git/log/?h=kgdb-next).  If there is further discussion on the point above, another patch can be added, but I am assuming this is the way you desire it to work as there are some other architectures that use the same behaviour.  I do not presently have any ARM64 hardware to validate this particular change.

I also added to the patch a "Cc: linux-stable <stable@vger.kernel.org>" so we can have this appear on some of the older kernels.


Thanks,
Jason.
AKASHI Takahiro Sept. 16, 2016, 4:32 a.m. UTC | #7
Hi Jason,

Welcome back to mainline development.
I've been looking forward to your comments.

On Thu, Sep 15, 2016 at 08:04:57AM -0500, Jason Wessel wrote:
> On 04/20/2015 08:13 PM, AKASHI Takahiro wrote:
> >Jason,
> >
> >Could you please review my patch below?
> >See also arm64 maintainer's comment:
> >http://lists.infradead.org/pipermail/linux-arm-kernel/2015-January/313712.html
> >
> >Thanks,
> >-Takahiro AKASHI
> >
> >I tried to verify kgdb in vanilla kernel on fast model, but it seems that
> >the single stepping with kgdb doesn't work correctly since its first
> >appearance at v3.15.
> >
> >On v3.15, 'stepi' command after breaking the kernel at some breakpoint
> >steps forward to the next instruction, but the succeeding 'stepi' never
> >goes beyond that.
> >On v3.16, 'stepi' moves forward and stops at the next instruction just
> >after enable_dbg in el1_dbg, and never goes beyond that. This variance of
> >behavior seems to come in with the following patch in v3.16:
> >
> >    commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault
> >    paths where possible")
> >
> >This patch
> >(1) moves kgdb_disable_single_step() from 'c' command handling to single
> >    step handler.
> >    This makes sure that single stepping gets effective at every 's' command.
> >    Please note that, under the current implementation, single step bit in
> >    spsr, which is cleared by the first single stepping, will not be set
> >    again for the consecutive 's' commands because single step bit in mdscr
> >    is still kept on (that is, kernel_active_single_step() in
> >    kgdb_arch_handle_exception() is true).
> >(2) re-implements kgdb_roundup_cpus() because the current implementation
> >    enabled interrupts naively. See below.
> >(3) removes 'enable_dbg' in el1_dbg.
> >    Single step bit in mdscr is turned on in do_debug_exception()->
> >    kgdb_handle_exception() before returning to debugged context, and if
> >    debug exception is enabled in el1_dbg, we will see unexpected
> >    single-stepping in el1_dbg.
> >    Since v3.18, the following patch does the same:
> >      commit 1059c6bf8534 ("arm64: debug: don't re-enable debug exceptions
> >      on return from el1_dbg")
> >(4) masks interrupts while single-stepping one instruction.
> >    If an interrupt is caught during processing a single-stepping, debug
> >    exception is unintentionally enabled by el1_irq's 'enable_dbg' before
> >    returning to debugged context.
> >    Thus, like in (3), we will see unexpected single-stepping in el1_irq.
> >
> >Basically (1) and (2) are for v3.15, (3) and (4) for v3.1[67].
> >
> >@@ -176,18 +183,14 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
> > 		 * over and over again.
> > 		 */
> > 		kgdb_arch_update_addr(linux_regs, remcom_in_buffer);
> >-		atomic_set(&kgdb_cpu_doing_single_step, -1);
> >-		kgdb_single_step =  0;
> 
> 
> This is a subtle change, but I assume it is what you intended?  All the CPUs will get released into the run state when exiting the kgdb exception handler.

You are talking about "- kgdb_single_step = 0." Right?
Do you think that there is any (negative) side effect of this change?
Since most of architectures, including x86, don't handle this variable
explicitly, I didn't expect that it was essential.

> 
> >-
> >-		/*
> >-		 * Received continue command, disable single step
> >-		 */
> >-		if (kernel_active_single_step())
> >-			kernel_disable_single_step();
> >
> 
> 
> I see why this is not needed above any more given that you have a function that cleans up the state on entry to the kgdb exception handler.

Yeah, the point here is that, on arm64, we need to set SS (Software step)
bit in SPSR as well as MDSCR before returning from the exception handler
in order to enable a hardware single step according to ARM ARM D2.12
("Software Step exceptions").
So it might be good enough just to call kernel_enable_single_step()
at 's' command unconditionally instead of "if (!kernel_active_single_step)".
(Please note that, as I mentioned in the commit message, SPSR.SS bit will
be cleared while MDSCR.SS bit is kept on after a hardware single step.)

But anyhow, I believe that the hunk above is redundant in my implementation.

> The rest of the patch is fine.

Thank you.

> I added the patch to kgdb-next after fixing up the context since it no longer applied to the mainline (https://git.kernel.org/cgit/linux/kernel/git/jwessel/kgdb.git/log/?h=kgdb-next).  If there is further discussion on the point above, another patch can be added, but I am assuming this is the way you desire it to work as there are some other architectures that use the same behaviour.  I do not presently have any ARM64 hardware to validate this particular change.
> 
> I also added to the patch a "Cc: linux-stable <stable@vger.kernel.org>" so we can have this appear on some of the older kernels.

Since Will asked me to split this patch into a few, I need some reworks
to clarify which hunks in the patch are necessary for which version of kernel.

Thanks,
-Takahiro AKASHI

> 
> Thanks,
> Jason.
>
Will Deacon Sept. 16, 2016, 7:45 a.m. UTC | #8
A Jason appeared!

On Fri, Sep 16, 2016 at 01:32:19PM +0900, AKASHI Takahiro wrote:
> On Thu, Sep 15, 2016 at 08:04:57AM -0500, Jason Wessel wrote:
> > I added the patch to kgdb-next after fixing up the context since it no
> > longer applied to the mainline (
> > https://git.kernel.org/cgit/linux/kernel/git/jwessel/kgdb.git/log/?h=kgdb-next).
> > If there is further discussion on the point above, another patch can be
> > added, but I am assuming this is the way you desire it to work as
> > there are some other architectures that use the same behaviour.  I do
> > not presently have any ARM64 hardware to validate this particular
> > change.
> > 
> > I also added to the patch a "Cc: linux-stable <stable@vger.kernel.org>"
> > so we can have this appear on some of the older kernels.
> 
> Since Will asked me to split this patch into a few, I need some reworks
> to clarify which hunks in the patch are necessary for which version of kernel.

Yes, splitting the patch would be much better for sorting out the stable
backports too. Jason, please can you drop the patch for now? I don't mind
whether the end result goes via arm64 or kgdb, but we should at least both
agree on it first :)

Will
Jason Wessel Sept. 19, 2016, 10:25 p.m. UTC | #9
On 09/16/2016 02:45 AM, Will Deacon wrote:
> On Fri, Sep 16, 2016 at 01:32:19PM +0900, AKASHI Takahiro wrote:
>> On Thu, Sep 15, 2016 at 08:04:57AM -0500, Jason Wessel wrote:
>>> I added the patch to kgdb-next after fixing up the context since it no
>>> longer applied to the mainline (
>>> https://git.kernel.org/cgit/linux/kernel/git/jwessel/kgdb.git/log/?h=kgdb-next).
>>> If there is further discussion on the point above, another patch can be
>>> added, but I am assuming this is the way you desire it to work as
>>> there are some other architectures that use the same behaviour.  I do
>>> not presently have any ARM64 hardware to validate this particular
>>> change.
>>>
>>> I also added to the patch a "Cc: linux-stable <stable@vger.kernel.org>"
>>> so we can have this appear on some of the older kernels.
>> Since Will asked me to split this patch into a few, I need some reworks
>> to clarify which hunks in the patch are necessary for which version of kernel.
> Yes, splitting the patch would be much better for sorting out the stable
> backports too. Jason, please can you drop the patch for now? I don't mind
> whether the end result goes via arm64 or kgdb, but we should at least both
> agree on it first :)

Splitting it is a very wise idea so that we can have all the -stable kernels patched up with a working single step function.

The separated patches can easily be tagged with the CC line examples as shown below:

Cc: <stable@vger.kernel.org> # 3.15.x-
Cc: <stable@vger.kernel.org> # 4.4
Cc: <stable@vger.kernel.org> # 4.4-4.5

I have dropped the original patch.

Cheers,
Jason.
Jason Wessel Sept. 19, 2016, 10:29 p.m. UTC | #10
On 09/15/2016 11:32 PM, AKASHI Takahiro wrote:
> @@ -176,18 +183,14 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
>>> 		 * over and over again.
>>> 		 */
>>> 		kgdb_arch_update_addr(linux_regs, remcom_in_buffer);
>>> -		atomic_set(&kgdb_cpu_doing_single_step, -1);
>>> -		kgdb_single_step =  0;
>>
>> This is a subtle change, but I assume it is what you intended?  All the CPUs will get released into the run state when exiting the kgdb exception handler.
> You are talking about "- kgdb_single_step = 0." Right?


Correct.

> Do you think that there is any (negative) side effect of this change?


Not at all.   The kernel debugger always skids to a stop, and it is more reliable from a locking perspective if the other CPU threads are released while a single CPU is asked to single step until the next "skid" for all the other CPUs.

When you do not release the other CPUs, you can end up single stepping a CPU which deadlocks or never exits a lock elsewhere, because whatever it was blocking on never gets freed by another CPU.

Cheers,
Jason.
AKASHI Takahiro Sept. 20, 2016, 8:43 a.m. UTC | #11
Jason,

On Mon, Sep 19, 2016 at 05:29:36PM -0500, Jason Wessel wrote:
> On 09/15/2016 11:32 PM, AKASHI Takahiro wrote:
> >@@ -176,18 +183,14 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
> >>>		 * over and over again.
> >>>		 */
> >>>		kgdb_arch_update_addr(linux_regs, remcom_in_buffer);
> >>>-		atomic_set(&kgdb_cpu_doing_single_step, -1);
> >>>-		kgdb_single_step =  0;
> >>
> >>This is a subtle change, but I assume it is what you intended?  All the CPUs will get released into the run state when exiting the kgdb exception handler.
> >You are talking about "- kgdb_single_step = 0." Right?
> 
> 
> Correct.
> 
> >Do you think that there is any (negative) side effect of this change?
> 
> 
> Not at all.   The kernel debugger always skids to a stop, and it is more reliable from a locking perspective if the other CPU threads are released while a single CPU is asked to single step until the next "skid" for all the other CPUs.
> 
> When you do not release the other CPUs, you can end up single stepping a CPU which deadlocks or never exits a lock elsewhere, because whatever it was blocking on never gets freed by another CPU.

Thank you for the explanation. This convinces me very much.

-Takahiro AKASHI

> Cheers,
> Jason.

Patch

diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index a0d10c5..81b5910 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -19,9 +19,13 @@ 
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/cpumask.h>
 #include <linux/irq.h>
+#include <linux/irq_work.h>
 #include <linux/kdebug.h>
 #include <linux/kgdb.h>
+#include <linux/percpu.h>
+#include <asm/ptrace.h>
 #include <asm/traps.h>
 
 struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
@@ -95,6 +99,9 @@  struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
 	{ "fpcr", 4, -1 },
 };
 
+static DEFINE_PER_CPU(unsigned int, kgdb_pstate);
+static DEFINE_PER_CPU(struct irq_work, kgdb_irq_work);
+
 char *dbg_get_reg(int regno, void *mem, struct pt_regs *regs)
 {
 	if (regno >= DBG_MAX_REG_NUM || regno < 0)
@@ -176,18 +183,14 @@  int kgdb_arch_handle_exception(int exception_vector, int signo,
 		 * over and over again.
 		 */
 		kgdb_arch_update_addr(linux_regs, remcom_in_buffer);
-		atomic_set(&kgdb_cpu_doing_single_step, -1);
-		kgdb_single_step =  0;
-
-		/*
-		 * Received continue command, disable single step
-		 */
-		if (kernel_active_single_step())
-			kernel_disable_single_step();
 
 		err = 0;
 		break;
 	case 's':
+		/* mask interrupts while single stepping */
+		__this_cpu_write(kgdb_pstate, linux_regs->pstate);
+		linux_regs->pstate |= PSR_I_BIT;
+
 		/*
 		 * Update step address value with address passed
 		 * with step packet.
@@ -198,8 +201,6 @@  int kgdb_arch_handle_exception(int exception_vector, int signo,
 		 */
 		kgdb_arch_update_addr(linux_regs, remcom_in_buffer);
 		atomic_set(&kgdb_cpu_doing_single_step, raw_smp_processor_id());
-		kgdb_single_step =  1;
-
 		/*
 		 * Enable single step handling
 		 */
@@ -229,6 +230,18 @@  static int kgdb_compiled_brk_fn(struct pt_regs *regs, unsigned int esr)
 
 static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned int esr)
 {
+	unsigned int pstate;
+
+	kernel_disable_single_step();
+	atomic_set(&kgdb_cpu_doing_single_step, -1);
+
+	/* restore interrupt mask status */
+	pstate = __this_cpu_read(kgdb_pstate);
+	if (pstate & PSR_I_BIT)
+		regs->pstate |= PSR_I_BIT;
+	else
+		regs->pstate &= ~PSR_I_BIT;
+
 	kgdb_handle_exception(1, SIGTRAP, 0, regs);
 	return 0;
 }
@@ -249,16 +262,27 @@  static struct step_hook kgdb_step_hook = {
 	.fn		= kgdb_step_brk_fn
 };
 
-static void kgdb_call_nmi_hook(void *ignored)
+static void kgdb_roundup_hook(struct irq_work *work)
 {
 	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
 void kgdb_roundup_cpus(unsigned long flags)
 {
-	local_irq_enable();
-	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-	local_irq_disable();
+	int cpu;
+	struct cpumask mask;
+	struct irq_work *work;
+
+	mask = *cpu_online_mask;
+	cpumask_clear_cpu(smp_processor_id(), &mask);
+	cpu = cpumask_first(&mask);
+	if (cpu >= nr_cpu_ids)
+		return;
+
+	for_each_cpu(cpu, &mask) {
+		work = per_cpu_ptr(&kgdb_irq_work, cpu);
+		irq_work_queue_on(work, cpu);
+	}
 }
 
 static int __kgdb_notify(struct die_args *args, unsigned long cmd)
@@ -299,6 +323,8 @@  static struct notifier_block kgdb_notifier = {
 int kgdb_arch_init(void)
 {
 	int ret = register_die_notifier(&kgdb_notifier);
+	int cpu;
+	struct irq_work *work;
 
 	if (ret != 0)
 		return ret;
@@ -306,6 +332,12 @@  int kgdb_arch_init(void)
 	register_break_hook(&kgdb_brkpt_hook);
 	register_break_hook(&kgdb_compiled_brkpt_hook);
 	register_step_hook(&kgdb_step_hook);
+
+	for_each_possible_cpu(cpu) {
+		work = per_cpu_ptr(&kgdb_irq_work, cpu);
+		init_irq_work(work, kgdb_roundup_hook);
+	}
+
 	return 0;
 }
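
The interrupt masking added in the 's' hunk and its undo in
kgdb_step_brk_fn() can be sketched outside the kernel as follows. This is an
illustration only: MODEL_PSR_I_BIT is assumed to match the arm64 PSR_I_BIT
value (0x80), a plain global stands in for the per-CPU kgdb_pstate, and
mask_irq_for_step() and restore_irq_after_step() are hypothetical helper
names.

```c
#include <stdint.h>

/* Assumed to mirror PSR_I_BIT (IRQ mask) from the arm64 ptrace header. */
#define MODEL_PSR_I_BIT UINT64_C(0x80)

/* Stand-in for the per-CPU kgdb_pstate variable in the patch. */
uint64_t model_kgdb_pstate;

/* 's' command: remember the saved pstate, then mask IRQs for the step. */
void mask_irq_for_step(uint64_t *pstate)
{
	model_kgdb_pstate = *pstate;
	*pstate |= MODEL_PSR_I_BIT;
}

/*
 * Step handler: put back only the I bit as it was before the step.
 * Any other pstate bits changed by the stepped instruction are kept.
 */
void restore_irq_after_step(uint64_t *pstate)
{
	if (model_kgdb_pstate & MODEL_PSR_I_BIT)
		*pstate |= MODEL_PSR_I_BIT;
	else
		*pstate &= ~MODEL_PSR_I_BIT;
}
```

Restoring just the I bit, rather than writing back the whole saved value,
means condition flags updated by the stepped instruction survive the
round trip.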