diff mbox series

[2/2] irqchip/sifive-plic: Fix the interrupt was enabled accidentally issue.

Message ID 20201012135724.110579-2-greentime.hu@sifive.com (mailing list archive)
State New, archived
Headers show
Series [1/2] irqchip/sifive-plic: Enable or disable interrupt based on its previous setting | expand

Commit Message

Greentime Hu Oct. 12, 2020, 1:57 p.m. UTC
In commit 2ca0b460bbcb ("genirq/affinity: Make affinity setting if activated opt-in"),
it added irqd_affinity_on_activate() checking in the function
irq_set_affinity_deactivated() so it will return false here.
In that case, it will call irq_try_set_affinity() -> plic_irq_toggle()
which will enable the interrupt to cause the CPU hang.

	if (irq_set_affinity_deactivated())
		return 0;
	...
	irq_try_set_affinity(data, mask, force);

[  919.015783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  919.020922] rcu:     0-...0: (0 ticks this GP)
idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=105807
[  919.030295]  (detected by 1, t=225825 jiffies, g=1561, q=3496)
[  919.036109] Task dump for CPU 0:
[  919.039321] kworker/0:1     R  running task        0    30      2 0x00000008
[  919.046359] Workqueue: events set_brightness_delayed
[  919.051302] Call Trace:
[  919.053738] [<ffffffe000930d92>] __schedule+0x194/0x4de
[  982.035783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  982.040923] rcu:     0-...0: (0 ticks this GP)
idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=113325
[  982.050294]  (detected by 1, t=241580 jiffies, g=1561, q=3509)
[  982.056108] Task dump for CPU 0:
[  982.059321] kworker/0:1     R  running task        0    30      2 0x00000008
[  982.066359] Workqueue: events set_brightness_delayed
[  982.071302] Call Trace:
[  982.073739] [<ffffffe000930d92>] __schedule+0x194/0x4de
[..]

After applying this patch, the CPU hang issue can be fixed.

Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 drivers/irqchip/irq-sifive-plic.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Marc Zyngier Oct. 15, 2020, 9:23 p.m. UTC | #1
On 2020-10-12 14:57, Greentime Hu wrote:
> In commit 2ca0b460bbcb ("genirq/affinity: Make affinity setting if
> activated opt-in"),
> it added irqd_affinity_on_activate() checking in the function
> irq_set_affinity_deactivated() so it will return false here.
> In that case, it will call irq_try_set_affinity() -> plic_irq_toggle()
> which will enable the interrupt to cause the CPU hang.
> 
> 	if (irq_set_affinity_deactivated())
> 		return 0;
> 	...
> 	irq_try_set_affinity(data, mask, force);
> 
> [  919.015783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [  919.020922] rcu:     0-...0: (0 ticks this GP)
> idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=105807
> [  919.030295]  (detected by 1, t=225825 jiffies, g=1561, q=3496)
> [  919.036109] Task dump for CPU 0:
> [  919.039321] kworker/0:1     R  running task        0    30      2 
> 0x00000008
> [  919.046359] Workqueue: events set_brightness_delayed
> [  919.051302] Call Trace:
> [  919.053738] [<ffffffe000930d92>] __schedule+0x194/0x4de
> [  982.035783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [  982.040923] rcu:     0-...0: (0 ticks this GP)
> idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=113325
> [  982.050294]  (detected by 1, t=241580 jiffies, g=1561, q=3509)
> [  982.056108] Task dump for CPU 0:
> [  982.059321] kworker/0:1     R  running task        0    30      2 
> 0x00000008
> [  982.066359] Workqueue: events set_brightness_delayed
> [  982.071302] Call Trace:
> [  982.073739] [<ffffffe000930d92>] __schedule+0x194/0x4de
> [..]
> 
> After applying this patch, the CPU hang issue can be fixed.
> 
> Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> ---
>  drivers/irqchip/irq-sifive-plic.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/irqchip/irq-sifive-plic.c
> b/drivers/irqchip/irq-sifive-plic.c
> index 4cc8a2657a6d..8a673d9cff69 100644
> --- a/drivers/irqchip/irq-sifive-plic.c
> +++ b/drivers/irqchip/irq-sifive-plic.c
> @@ -183,10 +183,14 @@ static int plic_irqdomain_map(struct irq_domain
> *d, unsigned int irq,
>  			      irq_hw_number_t hwirq)
>  {
>  	struct plic_priv *priv = d->host_data;
> +	struct irq_data *irqd;
> 
>  	irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
>  			    handle_fasteoi_irq, NULL, NULL);
>  	irq_set_noprobe(irq);
> +	irqd = irq_get_irq_data(irq);
> +	irqd_set_single_target(irqd);
> +	irqd_set_affinity_on_activate(irqd);
>  	irq_set_affinity(irq, &priv->lmask);
>  	return 0;
>  }

How does this fix anything? The plic driver doesn't have an activate
callback, so how does it make any difference? I get the feeling this
papers over another issue.

         M.
Thomas Gleixner Oct. 16, 2020, 1:42 p.m. UTC | #2
On Thu, Oct 15 2020 at 22:23, Marc Zyngier wrote:
> On 2020-10-12 14:57, Greentime Hu wrote:
>> In commit 2ca0b460bbcb ("genirq/affinity: Make affinity setting if
>> activated opt-in"),

This commit sha does not exist in mainline. Referencing a random stable
kernel tree is useless.

>> it added irqd_affinity_on_activate() checking in the function
>> irq_set_affinity_deactivated() so it will return false here.
>> In that case, it will call irq_try_set_affinity() -> plic_irq_toggle()
>> which will enable the interrupt to cause the CPU hang.
>> @@ -183,10 +183,14 @@ static int plic_irqdomain_map(struct irq_domain
>> *d, unsigned int irq,
>>  			      irq_hw_number_t hwirq)
>>  {
>>  	struct plic_priv *priv = d->host_data;
>> +	struct irq_data *irqd;
>> 
>>  	irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
>>  			    handle_fasteoi_irq, NULL, NULL);
>>  	irq_set_noprobe(irq);
>> +	irqd = irq_get_irq_data(irq);
>> +	irqd_set_single_target(irqd);
>> +	irqd_set_affinity_on_activate(irqd);
>>  	irq_set_affinity(irq, &priv->lmask);
>>  	return 0;
>>  }
>
> How does this fix anything? The plic driver doesn't have an activate
> callback, so how does it make any difference? I get the feeling this
> papers over another issue.

The existence of the activate callback is not really the interesting
part. And of course the analysis is completely bogus.

The referenced commit is a follow up to:

baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")

which was added on Jul 17 and the fix was added on Jul 24:

f0c7baca1800 ("genirq/affinity: Make affinity setting if activated opt-in")

The latter restored the original behaviour within 7 days, except for
x86. And the original behaviour was that the core did not care about the
activated state at all. It just invoked irqchip->irq_set_affinity()
unconditionally, which is still does except when the interrupt is marked
with irqd_set_affinity_on_activate().

So the plic code had this problem before that change already:

 	irq_set_noprobe(irq);
 	irq_set_affinity(irq, &priv->lmask);

Mapping happens way before anything else, so Hu is definitely barking up
the wrong tree.

What the patch works around is the irq_set_affinity() callback which is
completely bogus and still broken even after that duct tape which"
fixes" the boot fail.

irq_set_affinity() can be called on masked interrupts so this:

        plic_irq_toggle(&priv->lmask, d, 0);
        plic_irq_toggle(cpumask_of(cpu), d, 1);

will always unconditionally unmask the interrupt when affinity is
changed.

        plic_irq_toggle(cpumask_of(cpu), d, !irqd_irq_masked(d));

is all it needs to work everywhere. Obvious, right?

The changelog wants to be "...: Fix broken irq_set_affinity() callback"
and the Fixes: tag referencing the commit which introduced that
unconditional unmask. Oh well.

Thanks,

        tglx
diff mbox series

Patch

diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 4cc8a2657a6d..8a673d9cff69 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -183,10 +183,14 @@  static int plic_irqdomain_map(struct irq_domain *d, unsigned int irq,
 			      irq_hw_number_t hwirq)
 {
 	struct plic_priv *priv = d->host_data;
+	struct irq_data *irqd;
 
 	irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
 			    handle_fasteoi_irq, NULL, NULL);
 	irq_set_noprobe(irq);
+	irqd = irq_get_irq_data(irq);
+	irqd_set_single_target(irqd);
+	irqd_set_affinity_on_activate(irqd);
 	irq_set_affinity(irq, &priv->lmask);
 	return 0;
 }