Message ID | 20201012135724.110579-2-greentime.hu@sifive.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] irqchip/sifive-plic: Enable or disable interrupt based on its previous setting | expand |
On 2020-10-12 14:57, Greentime Hu wrote: > In commit 2ca0b460bbcb ("genirq/affinity: Make affinity setting if > activated opt-in"), > it added irqd_affinity_on_activate() checking in the function > irq_set_affinity_deactivated() so it will return false here. > In that case, it will call irq_try_set_affinity() -> plic_irq_toggle() > which will enable the interrupt to cause the CPU hang. > > if (irq_set_affinity_deactivated()) > return 0; > ... > irq_try_set_affinity(data, mask, force); > > [ 919.015783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > [ 919.020922] rcu: 0-...0: (0 ticks this GP) > idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=105807 > [ 919.030295] (detected by 1, t=225825 jiffies, g=1561, q=3496) > [ 919.036109] Task dump for CPU 0: > [ 919.039321] kworker/0:1 R running task 0 30 2 > 0x00000008 > [ 919.046359] Workqueue: events set_brightness_delayed > [ 919.051302] Call Trace: > [ 919.053738] [<ffffffe000930d92>] __schedule+0x194/0x4de > [ 982.035783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > [ 982.040923] rcu: 0-...0: (0 ticks this GP) > idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=113325 > [ 982.050294] (detected by 1, t=241580 jiffies, g=1561, q=3509) > [ 982.056108] Task dump for CPU 0: > [ 982.059321] kworker/0:1 R running task 0 30 2 > 0x00000008 > [ 982.066359] Workqueue: events set_brightness_delayed > [ 982.071302] Call Trace: > [ 982.073739] [<ffffffe000930d92>] __schedule+0x194/0x4de > [..] > > After applying this patch, the CPU hang issue can be fixed. > > Signed-off-by: Greentime Hu <greentime.hu@sifive.com> > --- > drivers/irqchip/irq-sifive-plic.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/irqchip/irq-sifive-plic.c > b/drivers/irqchip/irq-sifive-plic.c > index 4cc8a2657a6d..8a673d9cff69 100644 > --- a/drivers/irqchip/irq-sifive-plic.c > +++ b/drivers/irqchip/irq-sifive-plic.c > @@ -183,10 +183,14 @@ static int plic_irqdomain_map(struct irq_domain > *d, unsigned int irq, > irq_hw_number_t hwirq) > { > struct plic_priv *priv = d->host_data; > + struct irq_data *irqd; > > irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data, > handle_fasteoi_irq, NULL, NULL); > irq_set_noprobe(irq); > + irqd = irq_get_irq_data(irq); > + irqd_set_single_target(irqd); > + irqd_set_affinity_on_activate(irqd); > irq_set_affinity(irq, &priv->lmask); > return 0; > } How does this fix anything? The plic driver doesn't have an activate callback, so how does it make any difference? I get the feeling this papers over another issue. M.
On Thu, Oct 15 2020 at 22:23, Marc Zyngier wrote: > On 2020-10-12 14:57, Greentime Hu wrote: >> In commit 2ca0b460bbcb ("genirq/affinity: Make affinity setting if >> activated opt-in"), This commit sha does not exist in mainline. Referencing a random stable kernel tree is useless. >> it added irqd_affinity_on_activate() checking in the function >> irq_set_affinity_deactivated() so it will return false here. >> In that case, it will call irq_try_set_affinity() -> plic_irq_toggle() >> which will enable the interrupt to cause the CPU hang. >> @@ -183,10 +183,14 @@ static int plic_irqdomain_map(struct irq_domain >> *d, unsigned int irq, >> irq_hw_number_t hwirq) >> { >> struct plic_priv *priv = d->host_data; >> + struct irq_data *irqd; >> >> irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data, >> handle_fasteoi_irq, NULL, NULL); >> irq_set_noprobe(irq); >> + irqd = irq_get_irq_data(irq); >> + irqd_set_single_target(irqd); >> + irqd_set_affinity_on_activate(irqd); >> irq_set_affinity(irq, &priv->lmask); >> return 0; >> } > > How does this fix anything? The plic driver doesn't have an activate > callback, so how does it make any difference? I get the feeling this > papers over another issue. The existence of the activate callback is not really the interesting part. And of course the analysis is completely bogus. The referenced commit is a follow up to: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly") which was added on Jul 17 and the fix was added on Jul 24: f0c7baca1800 ("genirq/affinity: Make affinity setting if activated opt-in") The latter restored the original behaviour within 7 days, except for x86. And the original behaviour was that the core did not care about the activated state at all. It just invoked irqchip->irq_set_affinity() unconditionally, which is still does except when the interrupt is marked with irqd_set_affinity_on_activate(). So the plic code had this problem before that change already: irq_set_noprobe(irq); irq_set_affinity(irq, &priv->lmask); Mapping happens way before anything else, so Hu is definitely barking up the wrong tree. What the patch works around is the irq_set_affinity() callback which is completely bogus and still broken even after that duct tape which" fixes" the boot fail. irq_set_affinity() can be called on masked interrupts so this: plic_irq_toggle(&priv->lmask, d, 0); plic_irq_toggle(cpumask_of(cpu), d, 1); will always unconditionally unmask the interrupt when affinity is changed. plic_irq_toggle(cpumask_of(cpu), d, !irqd_irq_masked(d)); is all it needs to work everywhere. Obvious, right? The changelog wants to be "...: Fix broken irq_set_affinity() callback" and the Fixes: tag referencing the commit which introduced that unconditional unmask. Oh well. Thanks, tglx
diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c index 4cc8a2657a6d..8a673d9cff69 100644 --- a/drivers/irqchip/irq-sifive-plic.c +++ b/drivers/irqchip/irq-sifive-plic.c @@ -183,10 +183,14 @@ static int plic_irqdomain_map(struct irq_domain *d, unsigned int irq, irq_hw_number_t hwirq) { struct plic_priv *priv = d->host_data; + struct irq_data *irqd; irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data, handle_fasteoi_irq, NULL, NULL); irq_set_noprobe(irq); + irqd = irq_get_irq_data(irq); + irqd_set_single_target(irqd); + irqd_set_affinity_on_activate(irqd); irq_set_affinity(irq, &priv->lmask); return 0; }
In commit 2ca0b460bbcb ("genirq/affinity: Make affinity setting if activated opt-in"), it added irqd_affinity_on_activate() checking in the function irq_set_affinity_deactivated() so it will return false here. In that case, it will call irq_try_set_affinity() -> plic_irq_toggle() which will enable the interrupt to cause the CPU hang. if (irq_set_affinity_deactivated()) return 0; ... irq_try_set_affinity(data, mask, force); [ 919.015783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 919.020922] rcu: 0-...0: (0 ticks this GP) idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=105807 [ 919.030295] (detected by 1, t=225825 jiffies, g=1561, q=3496) [ 919.036109] Task dump for CPU 0: [ 919.039321] kworker/0:1 R running task 0 30 2 0x00000008 [ 919.046359] Workqueue: events set_brightness_delayed [ 919.051302] Call Trace: [ 919.053738] [<ffffffe000930d92>] __schedule+0x194/0x4de [ 982.035783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 982.040923] rcu: 0-...0: (0 ticks this GP) idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=113325 [ 982.050294] (detected by 1, t=241580 jiffies, g=1561, q=3509) [ 982.056108] Task dump for CPU 0: [ 982.059321] kworker/0:1 R running task 0 30 2 0x00000008 [ 982.066359] Workqueue: events set_brightness_delayed [ 982.071302] Call Trace: [ 982.073739] [<ffffffe000930d92>] __schedule+0x194/0x4de [..] After applying this patch, the CPU hang issue can be fixed. Signed-off-by: Greentime Hu <greentime.hu@sifive.com> --- drivers/irqchip/irq-sifive-plic.c | 4 ++++ 1 file changed, 4 insertions(+)