diff mbox series

genirq: clear IRQS_PENDING in irq descriptor

Message ID 20250211023040.180330-1-bo.ye@mediatek.com (mailing list archive)
State New
Series genirq: clear IRQS_PENDING in irq descriptor

Commit Message

Bo Ye Feb. 11, 2025, 2:30 a.m. UTC
From: Bosser Ye <bo.ye@mediatek.com>

In the kernel-6.6 IRQ subsystem, there is a case of IRQ retrigger:
Because electrical signal glitches can cause false interrupts on edge-triggered IRQs, any false or
re-triggered interrupt signal from the interrupt source must be cleared between disabling and enabling
the edge-triggered IRQ.

A module using this IRQ may disable it as needed. If the disabled IRQ then triggers, the IRQ subsystem
sets the istate of the corresponding IRQ descriptor to pending. After the module completes its other
tasks, it clears the pending state on the GIC using irq_set_irqchip_state(). However, the pending state
in the IRQ descriptor's istate is not cleared. As a result, the module receives the IRQ again after
enabling it, even though the interrupt source has not triggered, because the IRQ subsystem retriggers
the interrupt based on the pending state in the IRQ descriptor.

[ 1015.093550] [T300432] ccci_fsm: CPU: 3 PID: 432 Comm: ccci_fsm Tainted: P        W  OE      6.6.30-android15-8-o-g3d1adaff8937-4k #1 4e6ae6c76d81ac612e982b5e84c39c55b332fb77
...
[ 1015.093609] [T300432] ccci_fsm: Call trace:
[ 1015.093628] [T300432] ccci_fsm:  dump_backtrace+0xec/0x138
[ 1015.093668] [T300432] ccci_fsm:  show_stack+0x18/0x28
[ 1015.093697] [T300432] ccci_fsm:  dump_stack_lvl+0x50/0x6c
[ 1015.093728] [T300432] ccci_fsm:  dump_stack+0x18/0x24
[ 1015.093747] [T300432] ccci_fsm:  gic_retrigger+0x74/0x7c
[ 1015.093764] [T300432] ccci_fsm:  check_irq_resend+0x8c/0x16c
[ 1015.093777] [T300432] ccci_fsm:  irq_startup+0x2ec/0x360
[ 1015.093788] [T300432] ccci_fsm:  enable_irq+0x84/0xf4
[ 1015.093798] [T300432] ccci_fsm:  wdt_enable_irq+0x2c/0xec [ccci_md_all 029335d5c64293385f41211c1eb232e631274782]
[ 1015.094263] [T300432] ccci_fsm:  md_cd_start+0x34c/0x510 [ccci_md_all 029335d5c64293385f41211c1eb232e631274782]
[ 1015.094707] [T300432] ccci_fsm:  ccci_md_start+0x38/0x48 [ccci_md_all 029335d5c64293385f41211c1eb232e631274782]
[ 1015.095150] [T300432] ccci_fsm:  fsm_routine_start+0x448/0x1ed8 [ccci_md_all 029335d5c64293385f41211c1eb232e631274782]
[ 1015.095593] [T300432] ccci_fsm:  fsm_main_thread+0x20c/0xa78 [ccci_md_all 029335d5c64293385f41211c1eb232e631274782]
[ 1015.096035] [T300432] ccci_fsm:  kthread+0x110/0x1b8
[ 1015.096049] [T300432] ccci_fsm:  ret_from_fork+0x10/0x20
...
[ 1015.096067] [    C0] swapper/0: HWIRQ 107 handle_fasteoi_irq[714] set desc->istates to IRQS_PENDING // CPU execute IRQ's ISR

Solution: the corresponding upstream patch modifies irq_set_irqchip_state() in the IRQ subsystem so that
the pending state in the IRQ descriptor's istate is also cleared when the pending state of the
corresponding IRQ is successfully cleared on the GIC.

Test: Stress tests have verified that the patch is effective and does not cause any side effects.

Signed-off-by: Bosser Ye <bo.ye@mediatek.com>
---
 kernel/irq/manage.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Thomas Gleixner Feb. 13, 2025, 11:39 a.m. UTC | #1
On Tue, Feb 11 2025 at 10:30, Bo Ye wrote:
> In the kernel-6.6 IRQ subsystem, there is a case of IRQ retrigger:

How is kernel 6.6 relevant here?

> Due to the possibility of electrical signal glitches causing false
> interrupts for edge-triggered type IRQs, it is necessary to clear any
> potential false interrupts or re-triggered interrupt signals from the
> interrupt source between disabling and enabling the edge-triggered
> IRQ.

This claim is just wrong.

A disable_irq(); enable_irq(); sequence must preserve the pending bit so
that interrupts do not get lost. The lazy disabling mechanism is there
to guarantee that.
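
For illustration only, the lazy disabling behaviour described above can be
sketched as a small userspace model. This is not kernel code: struct
model_desc and the model_*() helpers are hypothetical names, and the replay
in model_enable_irq() merely stands in for the kernel's resend path
(check_irq_resend() in the trace).

```c
#include <stdbool.h>

/* Hypothetical userspace model; fields only loosely mirror struct irq_desc. */
struct model_desc {
	int depth;	/* disable nesting depth; 0 means enabled */
	bool masked;	/* line masked at the modelled chip */
	bool pending;	/* stands in for IRQS_PENDING in desc->istate */
	int handled;	/* number of times the handler ran */
};

/* Lazy disable: only record the software state, leave the chip unmasked. */
static void model_disable_irq(struct model_desc *d)
{
	d->depth++;
}

/* An interrupt arrives from the hardware. */
static void model_edge(struct model_desc *d)
{
	if (d->masked)
		return;			/* nothing reaches the core in this model */
	if (d->depth > 0) {
		d->pending = true;	/* remember the event ... */
		d->masked = true;	/* ... and only now mask for real */
		return;
	}
	d->handled++;			/* enabled: run the handler */
}

/* Enable: unmask and replay an event that was recorded while disabled. */
static void model_enable_irq(struct model_desc *d)
{
	if (d->depth > 0 && --d->depth == 0) {
		d->masked = false;
		if (d->pending) {
			d->pending = false;
			d->handled++;	/* replayed, so the event is not lost */
		}
	}
}
```

In this model an interrupt that arrives between model_disable_irq() and
model_enable_irq() is handled exactly once, after the enable.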

> When the module using this IRQ may disable the IRQ as needed and then
> If the disabled IRQ is triggered, the IRQ subsystem will set the
> istate of the corresponding IRQ descriptor to pending.

Rightfully so.

> After the module using this IRQ completes other tasks, it clears the
> pending state on the GIC using irq_set_irqchip_state().

So this is a problem related to a specific out of tree driver and the
GIC, right? 

> However, the pending state in the IRQ descriptor's istate is not
> cleared, which leads to the module receiving the IRQ again after
> enabling it, even though the interrupt source has not triggered,
> because the IRQ subsystem retriggers the interrupt based on the
> pending state in the IRQ descriptor.

What's the actual problem here? A driver has to be able to handle
spurious interrupts at any given time.
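
As a rough sketch of what handling a spurious interrupt can look like, the
userspace model below checks a device status bit before doing any work. It
assumes the device exposes such a latched status bit, which this thread does
not confirm for the driver in question. All names are hypothetical; in a real
handler the two return values would correspond to IRQ_NONE and IRQ_HANDLED.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED return values. */
enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

/* Hypothetical device state, for illustration only. */
struct model_dev {
	bool event_latched;	/* assumed status bit set by the device on a real event */
	int events_processed;	/* how many real events the handler serviced */
};

/* Handler that tolerates being invoked without a real device event. */
static enum model_irqreturn model_handler(struct model_dev *dev)
{
	if (!dev->event_latched)
		return MODEL_IRQ_NONE;	/* spurious: nothing to do */

	dev->event_latched = false;	/* acknowledge the event */
	dev->events_processed++;
	return MODEL_IRQ_HANDLED;
}
```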

> Solution: the corresponding upstream patch modifies the
> irq_set_irqchip_state(...) in the IRQ subsystem.

Which corresponding upstream patch?

> The purpose is to clear the pending state in the IRQ descriptor's
> istate when successfully clearing the corresponding IRQ on the GIC.

Sure that's the purpose, but you fail to explain the actual problem and
the interaction with irq_set_irqchip_state().

Thanks,

        tglx

Patch

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 8a936c1ffad3..ad1cefb2e5aa 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -2893,8 +2893,11 @@  int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
 #endif
 	} while (data);
 
-	if (data)
+	if (data) {
 		err = chip->irq_set_irqchip_state(data, which, val);
+		if (!err && which == IRQCHIP_STATE_PENDING && !val)
+			desc->istate &= ~IRQS_PENDING;
+	}
 
 out_unlock:
 	irq_put_desc_busunlock(desc, flags);
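
To make the effect of the hunk above easier to follow, here is a minimal
userspace model of it. It is a sketch, not kernel code: struct model_irq,
MODEL_IRQS_PENDING and the model_*() functions are hypothetical stand-ins for
desc->istate, IRQS_PENDING, chip->irq_set_irqchip_state() and the resend
decision made when the interrupt is enabled.

```c
#include <stdbool.h>

#define MODEL_IRQS_PENDING 0x1u	/* stand-in for IRQS_PENDING */

struct model_irq {
	unsigned int istate;	/* stands in for desc->istate */
	bool chip_pending;	/* pending latch in the modelled interrupt controller */
};

/* Models the chip callback for the pending state; returns 0 on success. */
static int model_chip_set_pending(struct model_irq *irq, bool val)
{
	irq->chip_pending = val;
	return 0;
}

/* Models setting the pending state; 'patched' selects the behaviour of the hunk. */
static int model_set_pending_state(struct model_irq *irq, bool val, bool patched)
{
	int err = model_chip_set_pending(irq, val);

	/* With the patch, a successful clear also drops the software flag. */
	if (patched && !err && !val)
		irq->istate &= ~MODEL_IRQS_PENDING;
	return err;
}

/* Models the resend decision at enable time: true if the IRQ is replayed. */
static bool model_enable_replays(struct model_irq *irq)
{
	if (!(irq->istate & MODEL_IRQS_PENDING))
		return false;
	irq->istate &= ~MODEL_IRQS_PENDING;
	return true;
}
```

Without the patch, clearing the chip's pending latch leaves the software flag
set and the model replays the interrupt on enable. With the patch, the flag is
dropped and no replay happens. Whether dropping a recorded interrupt this way
is acceptable is the question raised in the review.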