perf/arm-cmn: Add shutdown routine

Message ID	20221215180039.18035-1-blakgeof@amazon.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org> From: Geoff Blake <blakgeof@amazon.com> To: <Robin.Murphy@arm.com> CC: <blakgeof@amazon.com>, Will Deacon <will@kernel.org>, Mark Rutland <mark.rutland@arm.com>, <linux-arm-kernel@lists.infradead.org>, <linux-kernel@vger.kernel.org> Subject: [PATCH] perf/arm-cmn: Add shutdown routine Date: Thu, 15 Dec 2022 12:00:39 -0600 Message-ID: <20221215180039.18035-1-blakgeof@amazon.com> In-Reply-To: <2bb86e97-6cef-700e-70ed-4f303da10fd9@amazon.com> References: <2bb86e97-6cef-700e-70ed-4f303da10fd9@amazon.com> MIME-Version: 1.0 Precedence: list Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org
Series	perf/arm-cmn: Add shutdown routine \| expand perf/arm-cmn: Add shutdown routine

Geoff Blake Dec. 15, 2022, 6 p.m. UTC

Attempt #2 with all the feedback from Robin to do the minimal amount of
shutdown and handle spurious IRQs within the CMN driver but still do
limited logging in the event a spurious IRQ still occurs in the future.
Tested over 100's of kexec's and have no reproduced the spurious IRQs.

The CMN driver does not gracefully handle all
restart cases, such as kexec.  On a kexec if the
arm-cmn driver is in use it can be left in a state
with still active  events that can cause spurious and/or
unhandled interrupts that appear as non-fatal kernel errors
like below, that can be confusing and misleading:

[    3.895093] irq 28: nobody cared (try booting with the "irqpoll" option)
[    3.895170] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-1011-aws #12
[    3.895172] Hardware name: Amazon EC2 c6g.metal/Not Specified, BIOS 1.0 10/16/2017
[    3.895174] Call trace:
[    3.895175]  dump_backtrace+0xe8/0x150
[    3.895181]  show_stack+0x28/0x70
[    3.895183]  dump_stack_lvl+0x68/0x9c
[    3.895188]  dump_stack+0x1c/0x48
[    3.895190]  __report_bad_irq+0x58/0x138
[    3.895193]  note_interrupt+0x23c/0x360
[    3.895196]  handle_irq_event+0x108/0x1a0
[    3.895198]  handle_fasteoi_irq+0xd0/0x24c
[    3.895201]  generic_handle_domain_irq+0x3c/0x70
[    3.895203]  __gic_handle_irq_from_irqson.isra.0+0xcc/0x2c0
[    3.895207]  gic_handle_irq+0x34/0xb0
[    3.895209]  call_on_irq_stack+0x40/0x50
[    3.895211]  do_interrupt_handler+0xb0/0xb4
[    3.895214]  el1_interrupt+0x4c/0xe0
[    3.895217]  el1h_64_irq_handler+0x1c/0x40
[    3.895220]  el1h_64_irq+0x78/0x7c
[    3.895222]  __do_softirq+0xd0/0x450
[    3.895223]  __irq_exit_rcu+0xcc/0x120
[    3.895227]  irq_exit_rcu+0x20/0x40
[    3.895229]  el1_interrupt+0x50/0xe0
[    3.895231]  el1h_64_irq_handler+0x1c/0x40
[    3.895233]  el1h_64_irq+0x78/0x7c
[    3.895235]  arch_cpu_idle+0x1c/0x6c
[    3.895238]  default_idle_call+0x4c/0x19c
[    3.895240]  cpuidle_idle_call+0x18c/0x1f0
[    3.895243]  do_idle+0xb0/0x11c
[    3.895245]  cpu_startup_entry+0x34/0x40
[    3.895248]  rest_init+0xec/0x104
[    3.895250]  arch_post_acpi_subsys_init+0x0/0x30
[    3.895254]  start_kernel+0x4d0/0x534
[    3.895256]  __primary_switched+0xc4/0xcc
[    3.895259] handlers:
[    3.895292] [<000000008f5364c7>] arm_cmn_handle_irq [arm_cmn]
[    3.895369] Disabling IRQ #28

This type of kernel error can be reproduced by running perf with
an arm_cmn event active and then forcing a kexec.  On return from
the kexec, this message can appear semi-regularly.


Signed-off-by: Geoff Blake <blakgeof@amazon.com>
---
 drivers/perf/arm-cmn.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

Geoff Blake Jan. 4, 2023, 3:55 p.m. UTC | #1

Robin, Will,

Happy new year!  Hope I can get some attention back on this patch.  
Only difference from Robin's is it will do limited logging if spurious 
interrupts still happen to occur on current or future CMN implementations.

Thanks,
Geoff

Geoff Blake Jan. 19, 2023, 3:30 p.m. UTC | #2

Robin, Will,

Still looking for a follow-up on this patch, would appreciate another 
review cycle. 

-Geoff

Robin Murphy Jan. 19, 2023, 3:32 p.m. UTC | #3

On 04/01/2023 3:55 pm, Geoff Blake wrote:
> Robin, Will,
> 
> Happy new year!  Hope I can get some attention back on this patch.
> Only difference from Robin's is it will do limited logging if spurious
> interrupts still happen to occur on current or future CMN implementations.

In all honesty I'm not sure what you want me to say... now you've 
written the same patch that I already sent, but still with an incorrect 
commit message, and with some unrelated changes that aren't mentioned 
and have nothing to do with shutdown anyway. Please see:

https://www.kernel.org/doc/html/latest/process/submitting-patches.html#separate-your-changes

If you have a convincing argument that returning IRQ_NONE for unexpected 
spurious interrupts is a real and important concern, then please propose 
a general solution, because if it matters for arm-cmn then it matters 
for hundreds of other drivers too, by rough estimate:

$ git grep -l IRQ_NONE '*.c' | xargs git grep -L IRQF_SHARED | wc -l
834

Thanks,
Robin.

Geoff Blake Jan. 19, 2023, 5:10 p.m. UTC | #4

On Thu, 19 Jan 2023, Robin Murphy wrote:

> If you have a convincing argument that returning IRQ_NONE for unexpected
> spurious interrupts is a real and important concern, then please propose
> a general solution, because if it matters for arm-cmn then it matters
> for hundreds of other drivers too, by rough estimate:
> 
> $ git grep -l IRQ_NONE '*.c' | xargs git grep -L IRQF_SHARED | wc -l
> 834

The general solution for IRQ_NONE exists in the layer above the 
driver, it complains with a visible warning that something might be wrong 
and then moves on. Nothing more is needed.

Your shutdown routine that flips DT_EN in CMN_DT_DTC_CTL is sufficient, as 
after some testing it solves the problem with left over IRQs for my kexec 
use case.  

-Geoff

Will Deacon Jan. 19, 2023, 6:14 p.m. UTC | #5

On Thu, Jan 19, 2023 at 11:10:29AM -0600, Geoff Blake wrote:
> On Thu, 19 Jan 2023, Robin Murphy wrote:
>  
> > If you have a convincing argument that returning IRQ_NONE for unexpected
> > spurious interrupts is a real and important concern, then please propose
> > a general solution, because if it matters for arm-cmn then it matters
> > for hundreds of other drivers too, by rough estimate:
> > 
> > $ git grep -l IRQ_NONE '*.c' | xargs git grep -L IRQF_SHARED | wc -l
> > 834
> 
> The general solution for IRQ_NONE exists in the layer above the 
> driver, it complains with a visible warning that something might be wrong 
> and then moves on. Nothing more is needed.
> 
> Your shutdown routine that flips DT_EN in CMN_DT_DTC_CTL is sufficient, as 
> after some testing it solves the problem with left over IRQs for my kexec 
> use case.  

Please can you two let me know when you've settled on a fix for this? I'm
not going to queue something that you don't both agree on, although it seems
like we're quibbling over details at this point so maybe let's get
_something_ in and then work to improve it later?

Cheers,

Will

perf/arm-cmn: Add shutdown routine

Commit Message

Comments

Patch