[RFC,v1,4/4] arm64: kgdb: Round up cpus using IPI_CALL_NMI_FUNC

Message ID 1587726554-32018-5-git-send-email-sumit.garg@linaro.org (mailing list archive)
State New, archived
Series arm64: Introduce new IPI as IPI_CALL_NMI_FUNC

Commit Message

Sumit Garg April 24, 2020, 11:09 a.m. UTC
arm64 platforms with GICv3 or later support pseudo NMIs, which can be
leveraged to round up CPUs that are stuck in a hard lockup state with
interrupts disabled, something that wouldn't be possible with a normal
IPI.

So instead switch to rounding up CPUs using IPI_CALL_NMI_FUNC. In case
a particular arm64 platform doesn't support pseudo NMIs,
IPI_CALL_NMI_FUNC will act as a normal IPI, which maintains existing
kgdb functionality.

Also, note that for CPUs running in NMI context, the kernel has special
handling for printk() which involves CPU-specific buffers and deferring
printk() until exit from NMI context. But with kgdb we don't want to
defer printk(), especially the backtrace on the corresponding CPUs. So
switch to normal printk() context prior to entering kgdb context.

Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
---
 arch/arm64/kernel/kgdb.c | 15 +++++++++++++++
 arch/arm64/kernel/smp.c  | 17 ++++++++++++++---
 2 files changed, 29 insertions(+), 3 deletions(-)

Comments

Doug Anderson April 24, 2020, 8:46 p.m. UTC | #1
Hi,

On Fri, Apr 24, 2020 at 4:11 AM Sumit Garg <sumit.garg@linaro.org> wrote:
>
> arm64 platforms with GICv3 or later support pseudo NMIs, which can be
> leveraged to round up CPUs that are stuck in a hard lockup state with
> interrupts disabled, something that wouldn't be possible with a normal
> IPI.
>
> So instead switch to rounding up CPUs using IPI_CALL_NMI_FUNC. In case
> a particular arm64 platform doesn't support pseudo NMIs,
> IPI_CALL_NMI_FUNC will act as a normal IPI, which maintains existing
> kgdb functionality.
>
> Also, note that for CPUs running in NMI context, the kernel has special
> handling for printk() which involves CPU-specific buffers and deferring
> printk() until exit from NMI context. But with kgdb we don't want to
> defer printk(), especially the backtrace on the corresponding CPUs. So
> switch to normal printk() context prior to entering kgdb context.
>
> Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
> ---
>  arch/arm64/kernel/kgdb.c | 15 +++++++++++++++
>  arch/arm64/kernel/smp.c  | 17 ++++++++++++++---
>  2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index 4311992..0851ead 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -14,6 +14,7 @@
>  #include <linux/kgdb.h>
>  #include <linux/kprobes.h>
>  #include <linux/sched/task_stack.h>
> +#include <linux/smp.h>
>
>  #include <asm/debug-monitors.h>
>  #include <asm/insn.h>
> @@ -353,3 +354,17 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
>         return aarch64_insn_write((void *)bpt->bpt_addr,
>                         *(u32 *)bpt->saved_instr);
>  }
> +
> +#ifdef CONFIG_SMP
> +void kgdb_roundup_cpus(void)
> +{
> +       struct cpumask mask;
> +
> +       cpumask_copy(&mask, cpu_online_mask);
> +       cpumask_clear_cpu(raw_smp_processor_id(), &mask);
> +       if (cpumask_empty(&mask))
> +               return;
> +
> +       arch_send_call_nmi_func_ipi_mask(&mask);
> +}
> +#endif
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 27c8ee1..c7158f6e8 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -31,6 +31,7 @@
>  #include <linux/of.h>
>  #include <linux/irq_work.h>
>  #include <linux/kexec.h>
> +#include <linux/kgdb.h>
>  #include <linux/kvm_host.h>
>
>  #include <asm/alternative.h>
> @@ -976,9 +977,19 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
>                 /* Handle it as a normal interrupt if not in NMI context */
>                 if (!in_nmi())
>                         irq_enter();
> -
> -               /* nop, IPI handlers for special features can be added here. */
> -
> +#ifdef CONFIG_KGDB

My vote would be to keep "ifdef"s out of the middle of functions.  Can
you put your code in "arch/arm64/kernel/kgdb.c" and then have a dummy
no-op function if "CONFIG_KGDB" isn't defined?
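
For illustration, the stub pattern being asked for could look roughly
like the sketch below. This is a standalone user-space mock, not kernel
code: kgdb_handle_nmi_ipi() and handle_ipi_sketch() are hypothetical
names that appear nowhere in the patch, and CONFIG_KGDB is treated as
an ordinary preprocessor macro.

```c
#include <stdbool.h>

/*
 * Header-style section. With CONFIG_KGDB the real function is only
 * declared here (its body would live in arch/arm64/kernel/kgdb.c).
 * Without it, callers get a static inline no-op instead.
 * kgdb_handle_nmi_ipi() is a hypothetical name used for illustration.
 */
#ifdef CONFIG_KGDB
bool kgdb_handle_nmi_ipi(int cpu);
#else
static inline bool kgdb_handle_nmi_ipi(int cpu)
{
	(void)cpu;
	return false;	/* kgdb not built in: nothing to do */
}
#endif

/*
 * The caller stays free of #ifdef. This stands in for the
 * IPI_CALL_NMI_FUNC case in handle_IPI().
 */
static inline bool handle_ipi_sketch(int cpu)
{
	return kgdb_handle_nmi_ipi(cpu);
}
```

With CONFIG_KGDB undefined the compiler can drop the call entirely, so
the caller pays nothing for the abstraction.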


> +               if (atomic_read(&kgdb_active) != -1) {
> +                       /*
> +                        * For kgdb to work properly, we need printk to operate
> +                        * in normal context.
> +                        */
> +                       if (in_nmi())
> +                               printk_nmi_exit();

It feels like all the printk management belongs in kgdb_nmicallback().
...or is there some reason that this isn't a problem for other
platforms using NMI?  Maybe it's just that nobody has noticed it yet?


> +                       kgdb_nmicallback(raw_smp_processor_id(), regs);

Why do you need to call raw_smp_processor_id()?  Are you expecting a
different value than the local variable "cpu"?


> +                       if (in_nmi())
> +                               printk_nmi_enter();
> +               }
> +#endif
>                 if (!in_nmi())
>                         irq_exit();
>                 break;

Not that I really know what I'm talking about since I really don't
know arm64 at this level very well, but I'll ask anyway and probably
look like a fool...  I had a note that said:

* Will Deacon says:
*
* the whole roundup code is sketchy and it's the only place in the kernel
* which tries to perform I-cache maintenance with irqs disabled, leading
* to this nasty hack in the arch code:
*
* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/include/asm/cacheflush.h#n74

I presume that, if nothing else, the comment needs to be updated.
...but is the situation any better (or worse?) with your new solution?

-Doug

Sumit Garg April 27, 2020, 4:52 a.m. UTC | #2
Hi Doug,

Thanks for your comments.

On Sat, 25 Apr 2020 at 02:17, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Fri, Apr 24, 2020 at 4:11 AM Sumit Garg <sumit.garg@linaro.org> wrote:
> >
> > arm64 platforms with GICv3 or later support pseudo NMIs, which can be
> > leveraged to round up CPUs that are stuck in a hard lockup state with
> > interrupts disabled, something that wouldn't be possible with a normal
> > IPI.
> >
> > So instead switch to rounding up CPUs using IPI_CALL_NMI_FUNC. In case
> > a particular arm64 platform doesn't support pseudo NMIs,
> > IPI_CALL_NMI_FUNC will act as a normal IPI, which maintains existing
> > kgdb functionality.
> >
> > Also, note that for CPUs running in NMI context, the kernel has special
> > handling for printk() which involves CPU-specific buffers and deferring
> > printk() until exit from NMI context. But with kgdb we don't want to
> > defer printk(), especially the backtrace on the corresponding CPUs. So
> > switch to normal printk() context prior to entering kgdb context.
> >
> > Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
> > ---
> >  arch/arm64/kernel/kgdb.c | 15 +++++++++++++++
> >  arch/arm64/kernel/smp.c  | 17 ++++++++++++++---
> >  2 files changed, 29 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> > index 4311992..0851ead 100644
> > --- a/arch/arm64/kernel/kgdb.c
> > +++ b/arch/arm64/kernel/kgdb.c
> > @@ -14,6 +14,7 @@
> >  #include <linux/kgdb.h>
> >  #include <linux/kprobes.h>
> >  #include <linux/sched/task_stack.h>
> > +#include <linux/smp.h>
> >
> >  #include <asm/debug-monitors.h>
> >  #include <asm/insn.h>
> > @@ -353,3 +354,17 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
> >         return aarch64_insn_write((void *)bpt->bpt_addr,
> >                         *(u32 *)bpt->saved_instr);
> >  }
> > +
> > +#ifdef CONFIG_SMP
> > +void kgdb_roundup_cpus(void)
> > +{
> > +       struct cpumask mask;
> > +
> > +       cpumask_copy(&mask, cpu_online_mask);
> > +       cpumask_clear_cpu(raw_smp_processor_id(), &mask);
> > +       if (cpumask_empty(&mask))
> > +               return;
> > +
> > +       arch_send_call_nmi_func_ipi_mask(&mask);
> > +}
> > +#endif
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index 27c8ee1..c7158f6e8 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -31,6 +31,7 @@
> >  #include <linux/of.h>
> >  #include <linux/irq_work.h>
> >  #include <linux/kexec.h>
> > +#include <linux/kgdb.h>
> >  #include <linux/kvm_host.h>
> >
> >  #include <asm/alternative.h>
> > @@ -976,9 +977,19 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
> >                 /* Handle it as a normal interrupt if not in NMI context */
> >                 if (!in_nmi())
> >                         irq_enter();
> > -
> > -               /* nop, IPI handlers for special features can be added here. */
> > -
> > +#ifdef CONFIG_KGDB
>
> My vote would be to keep "ifdef"s out of the middle of functions.  Can
> you put your code in "arch/arm64/kernel/kgdb.c" and then have a dummy
> no-op function if "CONFIG_KGDB" isn't defined?
>

Sure.

>
> > +               if (atomic_read(&kgdb_active) != -1) {
> > +                       /*
> > +                        * For kgdb to work properly, we need printk to operate
> > +                        * in normal context.
> > +                        */
> > +                       if (in_nmi())
> > +                               printk_nmi_exit();
>
> It feels like all the printk management belongs in kgdb_nmicallback().
> ...or is there some reason that this isn't a problem for other
> platforms using NMI?  Maybe it's just that nobody has noticed it yet?
>

Initially I was skeptical of moving this printk handling into the common
kgdb framework, but after exploring other platforms like x86 (probably
an unnoticed bug there), I agree with you that it belongs in
kgdb_nmicallback(). So I will move it there.
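
As a rough sketch of what that move could look like, here is the
exit/re-enter wrapping folded into a single common callback. This is a
user-space mock under stated assumptions, not the actual kgdb code:
every mock_* name and nmicallback_sketch() are hypothetical stand-ins
for in_nmi(), printk_nmi_exit(), printk_nmi_enter() and
kgdb_nmicallback().

```c
#include <stdbool.h>

/* Mock state standing in for the kernel's NMI/printk bookkeeping. */
static bool mock_in_nmi;
static bool mock_printk_deferred;	/* true while printk would be deferred */
static bool mock_saw_deferred_in_cb;

static void mock_printk_nmi_enter(void) { mock_printk_deferred = true; }
static void mock_printk_nmi_exit(void)  { mock_printk_deferred = false; }

/* Stand-in for the debugger work; records the printk mode it observed. */
static void mock_debugger_work(int cpu)
{
	(void)cpu;
	mock_saw_deferred_in_cb = mock_printk_deferred;
}

/*
 * Sketch of a common callback: leave NMI printk context while the
 * debugger work runs, then restore it, so every arch caller gets the
 * same behaviour without open-coding it.
 */
static void nmicallback_sketch(int cpu)
{
	bool nmi = mock_in_nmi;

	if (nmi)
		mock_printk_nmi_exit();
	mock_debugger_work(cpu);
	if (nmi)
		mock_printk_nmi_enter();
}
```

Sampling the NMI state once up front keeps the exit and re-enter calls
paired even if the debugger work were to change it.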

>
> > +                       kgdb_nmicallback(raw_smp_processor_id(), regs);
>
> Why do you need to call raw_smp_processor_id()?  Are you expecting a
> different value than the local variable "cpu"?

Ah, no. Will use the local variable "cpu" instead.

>
>
> > +                       if (in_nmi())
> > +                               printk_nmi_enter();
> > +               }
> > +#endif
> >                 if (!in_nmi())
> >                         irq_exit();
> >                 break;
>
> Not that I really know what I'm talking about since I really don't
> know arm64 at this level very well, but I'll ask anyway and probably
> look like a fool...  I had a note that said:
>
> * Will Deacon says:
> *
> * the whole roundup code is sketchy and it's the only place in the kernel
> * which tries to perform I-cache maintenance with irqs disabled, leading
> * to this nasty hack in the arch code:
> *
> * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/include/asm/cacheflush.h#n74
>
> I presume that, if nothing else, the comment needs to be updated.
> ...but is the situation any better (or worse?) with your new solution?

I think the situation remains the same with the new solution as well.
Whether the IPI used to round up CPUs is a pseudo NMI or a normal IRQ,
kgdb still does I-cache maintenance with irqs disabled, which could
lead to a deadlock trying to IPI the secondary CPUs without this nasty
hack in the arch code.

-Sumit

>
> -Doug

Patch

diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 4311992..0851ead 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -14,6 +14,7 @@ 
 #include <linux/kgdb.h>
 #include <linux/kprobes.h>
 #include <linux/sched/task_stack.h>
+#include <linux/smp.h>
 
 #include <asm/debug-monitors.h>
 #include <asm/insn.h>
@@ -353,3 +354,17 @@  int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
 	return aarch64_insn_write((void *)bpt->bpt_addr,
 			*(u32 *)bpt->saved_instr);
 }
+
+#ifdef CONFIG_SMP
+void kgdb_roundup_cpus(void)
+{
+	struct cpumask mask;
+
+	cpumask_copy(&mask, cpu_online_mask);
+	cpumask_clear_cpu(raw_smp_processor_id(), &mask);
+	if (cpumask_empty(&mask))
+		return;
+
+	arch_send_call_nmi_func_ipi_mask(&mask);
+}
+#endif
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 27c8ee1..c7158f6e8 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -31,6 +31,7 @@ 
 #include <linux/of.h>
 #include <linux/irq_work.h>
 #include <linux/kexec.h>
+#include <linux/kgdb.h>
 #include <linux/kvm_host.h>
 
 #include <asm/alternative.h>
@@ -976,9 +977,19 @@  void handle_IPI(int ipinr, struct pt_regs *regs)
 		/* Handle it as a normal interrupt if not in NMI context */
 		if (!in_nmi())
 			irq_enter();
-
-		/* nop, IPI handlers for special features can be added here. */
-
+#ifdef CONFIG_KGDB
+		if (atomic_read(&kgdb_active) != -1) {
+			/*
+			 * For kgdb to work properly, we need printk to operate
+			 * in normal context.
+			 */
+			if (in_nmi())
+				printk_nmi_exit();
+			kgdb_nmicallback(raw_smp_processor_id(), regs);
+			if (in_nmi())
+				printk_nmi_enter();
+		}
+#endif
 		if (!in_nmi())
 			irq_exit();
 		break;