| Message ID | 1432252433-25206-1-git-send-email-dianders@chromium.org (mailing list archive) |
|---|---|
| State | New, archived |
On Thu, May 21, 2015 at 04:53:53PM -0700, Doug Anderson wrote:
> It's "perf_event_cpu" that we're trying to fix here.  For
> perf_event_cpu, we actually expect to have a single IRQ per CPU.  This
> doesn't appear to be an "IRQ_PER_CPU IRQ" because each CPU has a
> distinct IRQ number.  However, moving a perf event IRQ from one CPU to
> another makes no sense since the IRQ can only be handled on the CPU
> it's destined for (we can't access the relevant CP15 registers on the
> other CPUs).  While we could come up with a new concept just for
> perf_event_cpu that indicates that we have a non-"per cpu" IRQ that
> also shouldn't be migrated, simply using the already present
> IRQF_NOBALANCING seems safe and should work just fine.

Let's be clear: what you're talking about is extending the workaround
for broken hardware into yet more code.  Perf interrupts should _never_
have been connected to SPIs - the hardware designers that did that need
to be re-educated. :)

> The clocksource files I've checked appear to use IRQF_NOBALANCING for
> interrupts that are also supposed to be destined for a CPU.  For
> instance:
> - exynos_mct.c: Used for local (per CPU) timers
> - qcom-timer.c: Also for local timer
> - dw_apb_timer.c: Register function has "cpu" parameter indicating
>   that IRQ is targeted at a certain CPU.

I don't think these are affected; when a CPU goes down, the CPU-local
timer is stopped.  When that happens:

- exynos_mct.c frees its interrupt.
- qcom-timer.c doesn't (but ought to).
- dw_apb_timer.c doesn't.

Considering that local timers are set up each time a CPU is hotplugged
back in, not freeing the interrupt is a bug in itself, as the interrupt
won't be able to be claimed when the CPU comes back.

So, of those you list, only exynos_mct.c gets it right; the others are
buggy.  When they're fixed, they won't cause the affinity warning,
because there won't be an IRQ in use on the offlining CPU.
Perf should _probably_ do the same thing - cleanly shut itself down on
the CPU which is being unplugged.  This is going to become more
important if/when we start using FIQs for perf on platforms which
support that (which will give much better perf traces, since it'll then
be able to profile those IRQ-off regions as well.)

So, all in all, I think the answer is:

- timers need their bugs fixing
- perf needs improving to keep its workarounds for broken hardware
  within the bounds of the perf code rather than having them spread
  elsewhere.
diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index 350f188..08399fe 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -145,10 +145,10 @@ static bool migrate_one_irq(struct irq_desc *desc)
 	bool ret = false;
 
 	/*
-	 * If this is a per-CPU interrupt, or the affinity does not
+	 * If this is a non-balancable interrupt, or the affinity does not
 	 * include this CPU, then we have nothing to do.
 	 */
-	if (irqd_is_per_cpu(d) || !cpumask_test_cpu(smp_processor_id(), affinity))
+	if (!irqd_can_balance(d) || !cpumask_test_cpu(smp_processor_id(), affinity))
 		return false;
 
 	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
Right now we don't ever try to break affinity for "per CPU" IRQs when a
CPU goes down.  We should apply this logic to all non-balancable IRQs.
All non-balancable IRQs I can find are supposed to be targeted at a
specific CPU and don't make sense on other CPUs.  From a "grep", the
list of interrupts possible on ARM devices that are non-balancable and
_not_ per CPU consists of at most things in these files:
- arch/arm/kernel/perf_event_cpu.c
- drivers/clocksource/*.c

It's "perf_event_cpu" that we're trying to fix here.  For
perf_event_cpu, we actually expect to have a single IRQ per CPU.  This
doesn't appear to be an "IRQ_PER_CPU IRQ" because each CPU has a
distinct IRQ number.  However, moving a perf event IRQ from one CPU to
another makes no sense since the IRQ can only be handled on the CPU
it's destined for (we can't access the relevant CP15 registers on the
other CPUs).  While we could come up with a new concept just for
perf_event_cpu that indicates that we have a non-"per cpu" IRQ that
also shouldn't be migrated, simply using the already present
IRQF_NOBALANCING seems safe and should work just fine.

The clocksource files I've checked appear to use IRQF_NOBALANCING for
interrupts that are also supposed to be destined for a CPU.  For
instance:
- exynos_mct.c: Used for local (per CPU) timers
- qcom-timer.c: Also for local timer
- dw_apb_timer.c: Register function has "cpu" parameter indicating
  that IRQ is targeted at a certain CPU.

Note that without this change, if you're doing perf recording across a
suspend/resume cycle (where CPUs go down and then come back up) you'll
get warnings about 'IRQX no longer affine to CPUn', then eventually get
warnings about 'irq X: nobody cared (try booting with the "irqpoll"
option)' and 'Disabling IRQ #X'.  When this happens (obviously) perf
recording stops.  After this change the problems are resolved.
A similar change ought to be made to arm64, and likely to other
architectures as well, if this concept of "per cpu" interrupts with
unique IRQ numbers makes sense there too.

Signed-off-by: Doug Anderson <dianders@chromium.org>
---
 arch/arm/kernel/irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)