Message ID | 20170523043058.5463-3-takahiro.akashi@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, May 23, 2017 at 01:30:56PM +0900, AKASHI Takahiro wrote: > After entering kgdb mode, 'stepi' may unexpectedly breaks the execution > somewhere in el1_irq. > > This happens because a debug exception is always enabled in el1_irq > due to the following commit merged in v3.16: > commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault > paths where possible") > A pending interrupt can be taken after kgdb has enabled a software step, > but before a debug exception is actually taken. > > This patch enforces interrupts to be masked while single stepping. The desired behaviour here boils down to whether or not KGDB expects to step into or over interrupts in response a stepi instruction. What do other architectures do? What happens if the instruction being stepped faults? Will > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> > Cc: Catalin Marinas <catalin.marinas@arm.com> > Cc: Will Deacon <will.deacon@arm.com> > Cc: Jason Wessel <jason.wessel@windriver.com> > --- > arch/arm64/kernel/kgdb.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) > > diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c > index b9176b324e5a..fddbc6be3780 100644 > --- a/arch/arm64/kernel/kgdb.c > +++ b/arch/arm64/kernel/kgdb.c > @@ -28,6 +28,7 @@ > > #include <asm/debug-monitors.h> > #include <asm/insn.h> > +#include <asm/ptrace.h> > #include <asm/traps.h> > > struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = { > @@ -111,6 +112,8 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = { > { "fpcr", 4, -1 }, > }; > > +static DEFINE_PER_CPU(unsigned int, kgdb_pstate); > + > char *dbg_get_reg(int regno, void *mem, struct pt_regs *regs) > { > if (regno >= DBG_MAX_REG_NUM || regno < 0) > @@ -200,6 +203,10 @@ int kgdb_arch_handle_exception(int exception_vector, int signo, > err = 0; > break; > case 's': > + /* mask interrupts while single stepping */ > + __this_cpu_write(kgdb_pstate, linux_regs->pstate); > + linux_regs->pstate |= PSR_I_BIT; > + > /* > * Update step address value with address passed > * with step packet. > @@ -242,11 +249,20 @@ NOKPROBE_SYMBOL(kgdb_compiled_brk_fn); > > static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned int esr) > { > + unsigned int pstate; > + > if (!kgdb_single_step) > return DBG_HOOK_ERROR; > > kernel_disable_single_step(); > > + /* restore interrupt mask status */ > + pstate = __this_cpu_read(kgdb_pstate); > + if (pstate & PSR_I_BIT) > + regs->pstate |= PSR_I_BIT; > + else > + regs->pstate &= ~PSR_I_BIT; > + > kgdb_handle_exception(1, SIGTRAP, 0, regs); > return 0; > } > -- > 2.11.1 >
On Mon, Jun 05, 2017 at 05:29:19PM +0100, Will Deacon wrote: > On Tue, May 23, 2017 at 01:30:56PM +0900, AKASHI Takahiro wrote: > > After entering kgdb mode, 'stepi' may unexpectedly breaks the execution > > somewhere in el1_irq. > > > > This happens because a debug exception is always enabled in el1_irq > > due to the following commit merged in v3.16: > > commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault > > paths where possible") > > A pending interrupt can be taken after kgdb has enabled a software step, > > but before a debug exception is actually taken. > > > > This patch enforces interrupts to be masked while single stepping. > > The desired behaviour here boils down to whether or not KGDB expects to step > into or over interrupts in response a stepi instruction. What do other > architectures do? I don't know x86 case, but if we step into interrupt code here, doing stepi on a normal function will be almost useless as every attempt of stepi will end up falling into irq code (mostly for timer interrupt). > What happens if the instruction being stepped faults? Well, as a matter of fact, we get a gdb control somewhere in exception code (actually just after 'enable_dbg' in el1_sync). But this is a synchronous event, and 'c' command will put us back where we were before stepi. I hope this will be a more desirable behavior. (The only drawback here is we can't see a correct/complete backtrace of stack because the current gdb is not kernel-aware.) Thanks, -Takahiro AKASHI > Will > > > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> > > Cc: Catalin Marinas <catalin.marinas@arm.com> > > Cc: Will Deacon <will.deacon@arm.com> > > Cc: Jason Wessel <jason.wessel@windriver.com> > > --- > > arch/arm64/kernel/kgdb.c | 16 ++++++++++++++++ > > 1 file changed, 16 insertions(+) > > > > diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c > > index b9176b324e5a..fddbc6be3780 100644 > > --- a/arch/arm64/kernel/kgdb.c > > +++ b/arch/arm64/kernel/kgdb.c > > @@ -28,6 +28,7 @@ > > > > #include <asm/debug-monitors.h> > > #include <asm/insn.h> > > +#include <asm/ptrace.h> > > #include <asm/traps.h> > > > > struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = { > > @@ -111,6 +112,8 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = { > > { "fpcr", 4, -1 }, > > }; > > > > +static DEFINE_PER_CPU(unsigned int, kgdb_pstate); > > + > > char *dbg_get_reg(int regno, void *mem, struct pt_regs *regs) > > { > > if (regno >= DBG_MAX_REG_NUM || regno < 0) > > @@ -200,6 +203,10 @@ int kgdb_arch_handle_exception(int exception_vector, int signo, > > err = 0; > > break; > > case 's': > > + /* mask interrupts while single stepping */ > > + __this_cpu_write(kgdb_pstate, linux_regs->pstate); > > + linux_regs->pstate |= PSR_I_BIT; > > + > > /* > > * Update step address value with address passed > > * with step packet. > > @@ -242,11 +249,20 @@ NOKPROBE_SYMBOL(kgdb_compiled_brk_fn); > > > > static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned int esr) > > { > > + unsigned int pstate; > > + > > if (!kgdb_single_step) > > return DBG_HOOK_ERROR; > > > > kernel_disable_single_step(); > > > > + /* restore interrupt mask status */ > > + pstate = __this_cpu_read(kgdb_pstate); > > + if (pstate & PSR_I_BIT) > > + regs->pstate |= PSR_I_BIT; > > + else > > + regs->pstate &= ~PSR_I_BIT; > > + > > kgdb_handle_exception(1, SIGTRAP, 0, regs); > > return 0; > > } > > -- > > 2.11.1 > >
On Wed, Jun 07, 2017 at 02:34:50PM +0900, AKASHI Takahiro wrote: > On Mon, Jun 05, 2017 at 05:29:19PM +0100, Will Deacon wrote: > > On Tue, May 23, 2017 at 01:30:56PM +0900, AKASHI Takahiro wrote: > > > After entering kgdb mode, 'stepi' may unexpectedly breaks the execution > > > somewhere in el1_irq. > > > > > > This happens because a debug exception is always enabled in el1_irq > > > due to the following commit merged in v3.16: > > > commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault > > > paths where possible") > > > A pending interrupt can be taken after kgdb has enabled a software step, > > > but before a debug exception is actually taken. > > > > > > This patch enforces interrupts to be masked while single stepping. > > > > The desired behaviour here boils down to whether or not KGDB expects to step > > into or over interrupts in response a stepi instruction. What do other > > architectures do? > > I don't know x86 case, but if we step into interrupt code here, > doing stepi on a normal function will be almost useless as every > attempt of stepi will end up falling into irq code (mostly for timer > interrupt). > > > What happens if the instruction being stepped faults? > > Well, as a matter of fact, we get a gdb control somewhere in exception code > (actually just after 'enable_dbg' in el1_sync). Ok, but don't we need to re-enable interrupts, otherwise we can't safely handle the fault (which might involve blocking)? Will
Will, Sorry for late response, On Wed, Jun 07, 2017 at 05:50:18PM +0100, Will Deacon wrote: > On Wed, Jun 07, 2017 at 02:34:50PM +0900, AKASHI Takahiro wrote: > > On Mon, Jun 05, 2017 at 05:29:19PM +0100, Will Deacon wrote: > > > On Tue, May 23, 2017 at 01:30:56PM +0900, AKASHI Takahiro wrote: > > > > After entering kgdb mode, 'stepi' may unexpectedly breaks the execution > > > > somewhere in el1_irq. > > > > > > > > This happens because a debug exception is always enabled in el1_irq > > > > due to the following commit merged in v3.16: > > > > commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault > > > > paths where possible") > > > > A pending interrupt can be taken after kgdb has enabled a software step, > > > > but before a debug exception is actually taken. > > > > > > > > This patch enforces interrupts to be masked while single stepping. > > > > > > The desired behaviour here boils down to whether or not KGDB expects to step > > > into or over interrupts in response a stepi instruction. What do other > > > architectures do? > > > > I don't know x86 case, but if we step into interrupt code here, > > doing stepi on a normal function will be almost useless as every > > attempt of stepi will end up falling into irq code (mostly for timer > > interrupt). > > > > > What happens if the instruction being stepped faults? > > > > Well, as a matter of fact, we get a gdb control somewhere in exception code > > (actually just after 'enable_dbg' in el1_sync). > > Ok, but don't we need to re-enable interrupts, otherwise we can't safely > handle the fault (which might involve blocking)? I thought a lot, but have got no other way to solve the issue, which totally makes stepi in vain. In theory, you might be right, but in practice, people don't always expect to step through the whole sequence of fault recovery with single stepping. Once we do 'c(ontinue),' interrupts are enabled again and the execution will follow as expected. If you want to 'step over' a faulted instruction here, or to break somewhere in a middle of exception handler, you need to manage to set a breakpoint explicitly. But it will, I believe, be much better than useless stepi from day-1 :) Meanwhile, kprobes also disables interrupts while single stepping. See setup_singlestep(). Thanks, -Takahiro AKASHI > Will
On Tue, Jun 20, 2017 at 11:43:34AM +0900, AKASHI Takahiro wrote: > On Wed, Jun 07, 2017 at 05:50:18PM +0100, Will Deacon wrote: > > On Wed, Jun 07, 2017 at 02:34:50PM +0900, AKASHI Takahiro wrote: > > > On Mon, Jun 05, 2017 at 05:29:19PM +0100, Will Deacon wrote: > > > > On Tue, May 23, 2017 at 01:30:56PM +0900, AKASHI Takahiro wrote: > > > > > After entering kgdb mode, 'stepi' may unexpectedly breaks the execution > > > > > somewhere in el1_irq. > > > > > > > > > > This happens because a debug exception is always enabled in el1_irq > > > > > due to the following commit merged in v3.16: > > > > > commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault > > > > > paths where possible") > > > > > A pending interrupt can be taken after kgdb has enabled a software step, > > > > > but before a debug exception is actually taken. > > > > > > > > > > This patch enforces interrupts to be masked while single stepping. > > > > > > > > The desired behaviour here boils down to whether or not KGDB expects to step > > > > into or over interrupts in response a stepi instruction. What do other > > > > architectures do? > > > > > > I don't know x86 case, but if we step into interrupt code here, > > > doing stepi on a normal function will be almost useless as every > > > attempt of stepi will end up falling into irq code (mostly for timer > > > interrupt). > > > > > > > What happens if the instruction being stepped faults? > > > > > > Well, as a matter of fact, we get a gdb control somewhere in exception code > > > (actually just after 'enable_dbg' in el1_sync). > > > > Ok, but don't we need to re-enable interrupts, otherwise we can't safely > > handle the fault (which might involve blocking)? > > I thought a lot, but have got no other way to solve the issue, which > totally makes stepi in vain. > In theory, you might be right, but in practice, people don't always expect > to step through the whole sequence of fault recovery with single stepping. > Once we do 'c(ontinue),' interrupts are enabled again and the execution > will follow as expected. It's not the stepping guarantees I'm worried about. I'm more worried that the fault handler panics because it's called with IRQs disabled, so the debugger has ended up changing the behaviour of the kernel which is absolutely not what you want! > If you want to 'step over' a faulted instruction here, or to break > somewhere in a middle of exception handler, you need to manage to set > a breakpoint explicitly. But it will, I believe, be much better than > useless stepi from day-1 :) > > Meanwhile, kprobes also disables interrupts while single stepping. > See setup_singlestep(). Sure, but I don't think those instructions can fault. Can KGDB make the same guarantees? Will
On Tue, Jun 20, 2017 at 06:07:00PM +0100, Will Deacon wrote: > On Tue, Jun 20, 2017 at 11:43:34AM +0900, AKASHI Takahiro wrote: > > On Wed, Jun 07, 2017 at 05:50:18PM +0100, Will Deacon wrote: > > > On Wed, Jun 07, 2017 at 02:34:50PM +0900, AKASHI Takahiro wrote: > > > > Ok, but don't we need to re-enable interrupts, otherwise we can't safely > > > handle the fault (which might involve blocking)? > > > > I thought a lot, but have got no other way to solve the issue, which > > totally makes stepi in vain. > > In theory, you might be right, but in practice, people don't always expect > > to step through the whole sequence of fault recovery with single stepping. > > Once we do 'c(ontinue),' interrupts are enabled again and the execution > > will follow as expected. > > It's not the stepping guarantees I'm worried about. I'm more worried that > the fault handler panics because it's called with IRQs disabled, I'm still wondering how it can cause panic. > so the > debugger has ended up changing the behaviour of the kernel which is > absolutely not what you want! That's what I'm saying, I guess. As far as we are keeping on single stepping, no interrupts will be taken but once the execution is continued, any outstanding interrupts will be served. Human intervention with a debugger, either software-oriented kgdb or jtag-based ICE, will inevitably affect the system's behavior somehow. > > If you want to 'step over' a faulted instruction here, or to break > > somewhere in a middle of exception handler, you need to manage to set > > a breakpoint explicitly. But it will, I believe, be much better than > > useless stepi from day-1 :) > > > > Meanwhile, kprobes also disables interrupts while single stepping. > > See setup_singlestep(). > > Sure, but I don't think those instructions can fault. Looking into arm_probe_decode_insn() & aarch64_insn_is_steppable(), I doubt that no instruction to be single stepped by kprobes will cause any (page) fault. In addition, a comment in kprobe_fault_handler() says, /* * We are here because the instruction being single * stepped caused a page fault. We reset the current * kprobe and the ip points back to the probe address * and allow the page fault handler to continue as a * normal page fault. */ Kprobes seems to redo the instruction with single step disabled if it has generated a page fault. > Can KGDB make the same > guarantees? Taking a similar approach won't be impossible, but require lots of work, in particular decoding instructions to be single stepped. (I'm even not sure that gdb protocol allows this.) It also raises another question, as you mentioned early, should we step into or step over exception/interrupt handler? Thanks, -Takahiro AKASHI > Will
On Wed, Jun 21, 2017 at 11:43:05AM +0900, AKASHI Takahiro wrote: > On Tue, Jun 20, 2017 at 06:07:00PM +0100, Will Deacon wrote: > > On Tue, Jun 20, 2017 at 11:43:34AM +0900, AKASHI Takahiro wrote: > > > On Wed, Jun 07, 2017 at 05:50:18PM +0100, Will Deacon wrote: > > > > On Wed, Jun 07, 2017 at 02:34:50PM +0900, AKASHI Takahiro wrote: > > > > > > Ok, but don't we need to re-enable interrupts, otherwise we can't safely > > > > handle the fault (which might involve blocking)? > > > > > > I thought a lot, but have got no other way to solve the issue, which > > > totally makes stepi in vain. > > > In theory, you might be right, but in practice, people don't always expect > > > to step through the whole sequence of fault recovery with single stepping. > > > Once we do 'c(ontinue),' interrupts are enabled again and the execution > > > will follow as expected. > > > > It's not the stepping guarantees I'm worried about. I'm more worried that > > the fault handler panics because it's called with IRQs disabled, > > I'm still wondering how it can cause panic. Well you're calling the page fault path with interrupts disabled. It might try to allocate page tables without passing GFP_ATOMIC, or it might block on I/O or reschedule. The answer might well be "you can't step put_user and friends", but perhaps we should try to enforce that rather than go horribly wrong if you do it. Will
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c index b9176b324e5a..fddbc6be3780 100644 --- a/arch/arm64/kernel/kgdb.c +++ b/arch/arm64/kernel/kgdb.c @@ -28,6 +28,7 @@ #include <asm/debug-monitors.h> #include <asm/insn.h> +#include <asm/ptrace.h> #include <asm/traps.h> struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = { @@ -111,6 +112,8 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = { { "fpcr", 4, -1 }, }; +static DEFINE_PER_CPU(unsigned int, kgdb_pstate); + char *dbg_get_reg(int regno, void *mem, struct pt_regs *regs) { if (regno >= DBG_MAX_REG_NUM || regno < 0) @@ -200,6 +203,10 @@ int kgdb_arch_handle_exception(int exception_vector, int signo, err = 0; break; case 's': + /* mask interrupts while single stepping */ + __this_cpu_write(kgdb_pstate, linux_regs->pstate); + linux_regs->pstate |= PSR_I_BIT; + /* * Update step address value with address passed * with step packet. @@ -242,11 +249,20 @@ NOKPROBE_SYMBOL(kgdb_compiled_brk_fn); static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned int esr) { + unsigned int pstate; + if (!kgdb_single_step) return DBG_HOOK_ERROR; kernel_disable_single_step(); + /* restore interrupt mask status */ + pstate = __this_cpu_read(kgdb_pstate); + if (pstate & PSR_I_BIT) + regs->pstate |= PSR_I_BIT; + else + regs->pstate &= ~PSR_I_BIT; + kgdb_handle_exception(1, SIGTRAP, 0, regs); return 0; }
After entering kgdb mode, 'stepi' may unexpectedly breaks the execution somewhere in el1_irq. This happens because a debug exception is always enabled in el1_irq due to the following commit merged in v3.16: commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault paths where possible") A pending interrupt can be taken after kgdb has enabled a software step, but before a debug exception is actually taken. This patch enforces interrupts to be masked while single stepping. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Jason Wessel <jason.wessel@windriver.com> --- arch/arm64/kernel/kgdb.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)