Message ID | 877dv2rv25.fsf@nanos.tec.linutronix.de (mailing list archive) |
---|---|
State | Mainlined |
Commit | baedb87d1b53532f81b4bd0387f83b05d4f7eb9a |
Series | [V2] genirq/affinity: Handle affinity setting on inactive interrupts correctly |
Hi,

On Fri, 17 Jul 2020 18:00:02 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> Setting interrupt affinity on inactive interrupts is inconsistent when
> hierarchical irq domains are enabled. The core code should just store the
> affinity and not call into the irq chip driver for inactive interrupts
> because the chip drivers may not be in a state to handle such requests.
>
> X86 has a hacky workaround for that but all other irq chips have not which
> causes problems e.g. on GIC V3 ITS.
>
> Instead of adding more ugly hacks all over the place, solve the problem in
> the core code. If the affinity is set on an inactive interrupt then:
>
>   - Store it in the irq descriptors affinity mask
>   - Update the effective affinity to reflect that so user space has
>     a consistent view
>   - Don't call into the irq chip driver
>
> This is the core equivalent of the X86 workaround and works correctly
> because the affinity setting is established in the irq chip when the
> interrupt is activated later on.
>
> Note, that this is only effective when hierarchical irq domains are enabled
> by the architecture. Doing it unconditionally would break legacy irq chip
> implementations.
>
> For hierarchial irq domains this works correctly as none of the drivers can
> have a dependency on affinity setting in inactive state by design.
>
> Remove the X86 workaround as it is not longer required.
>
> Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts")
> Reported-by: Ali Saidi <alisaidi@amazon.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com
> ---
> V2: Fix the fallout for CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=n (0day)

It seems that this patch breaks perf events on RK3288 because the PMU
interrupts that should be per-cpu are now all on CPU0 so no events are
collected from CPUs 1-3 and those interrupts are killed as spurious
after a few seconds.

I'm seeing this on 4.19.134 and 5.4.53 but as far as I can tell the
relevant code hasn't changed through to next-20200723. Reverting the
backport of this change fixes the problem.

It looks like what happens is that because the interrupts are not
per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity()
while the interrupt is deactivated and then request_irq() with
IRQF_PERCPU | IRQF_NOBALANCING.

Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
irq_setup_affinity() which returns early because IRQF_PERCPU and
IRQF_NOBALANCING are set, leaving the interrupt on its original CPU.

At this point /proc/interrupts clearly shows the interrupts occurring on
CPU0 despite /proc/irq/N/effective_affinity and /proc/irq/N/smp_affinity
showing them spread across the cores as expected.

I don't think I understand what's meant to happen well enough to propose
a patch, but hopefully the above explanation explains the problem.

Regards,
John

> ---
>  arch/x86/kernel/apic/vector.c |   22 +++++-----------------
>  kernel/irq/manage.c           |   37 +++++++++++++++++++++++++++++++++++--
>  2 files changed, 40 insertions(+), 19 deletions(-)
>
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -446,12 +446,10 @@ static int x86_vector_activate(struct ir
>  	trace_vector_activate(irqd->irq, apicd->is_managed,
>  			      apicd->can_reserve, reserve);
>
> -	/* Nothing to do for fixed assigned vectors */
> -	if (!apicd->can_reserve && !apicd->is_managed)
> -		return 0;
> -
>  	raw_spin_lock_irqsave(&vector_lock, flags);
> -	if (reserve || irqd_is_managed_and_shutdown(irqd))
> +	if (!apicd->can_reserve && !apicd->is_managed)
> +		assign_irq_vector_any_locked(irqd);
> +	else if (reserve || irqd_is_managed_and_shutdown(irqd))
>  		vector_assign_managed_shutdown(irqd);
>  	else if (apicd->is_managed)
>  		ret = activate_managed(irqd);
> @@ -775,20 +773,10 @@ void lapic_offline(void)
>  static int apic_set_affinity(struct irq_data *irqd,
>  			     const struct cpumask *dest, bool force)
>  {
> -	struct apic_chip_data *apicd = apic_chip_data(irqd);
>  	int err;
>
> -	/*
> -	 * Core code can call here for inactive interrupts. For inactive
> -	 * interrupts which use managed or reservation mode there is no
> -	 * point in going through the vector assignment right now as the
> -	 * activation will assign a vector which fits the destination
> -	 * cpumask. Let the core code store the destination mask and be
> -	 * done with it.
> -	 */
> -	if (!irqd_is_activated(irqd) &&
> -	    (apicd->is_managed || apicd->can_reserve))
> -		return IRQ_SET_MASK_OK;
> +	if (WARN_ON_ONCE(!irqd_is_activated(irqd)))
> +		return -EIO;
>
>  	raw_spin_lock(&vector_lock);
>  	cpumask_and(vector_searchmask, dest, cpu_online_mask);
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -195,9 +195,9 @@ void irq_set_thread_affinity(struct irq_
>  			set_bit(IRQTF_AFFINITY, &action->thread_flags);
>  }
>
> +#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
>  static void irq_validate_effective_affinity(struct irq_data *data)
>  {
> -#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
>  	const struct cpumask *m = irq_data_get_effective_affinity_mask(data);
>  	struct irq_chip *chip = irq_data_get_irq_chip(data);
>
> @@ -205,9 +205,19 @@ static void irq_validate_effective_affin
>  		return;
>  	pr_warn_once("irq_chip %s did not update eff. affinity mask of irq %u\n",
>  		     chip->name, data->irq);
> -#endif
>  }
>
> +static inline void irq_init_effective_affinity(struct irq_data *data,
> +					       const struct cpumask *mask)
> +{
> +	cpumask_copy(irq_data_get_effective_affinity_mask(data), mask);
> +}
> +#else
> +static inline void irq_validate_effective_affinity(struct irq_data *data) { }
> +static inline void irq_init_effective_affinity(struct irq_data *data,
> +					       const struct cpumask *mask) { }
> +#endif
> +
>  int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>  			bool force)
>  {
> @@ -304,6 +314,26 @@ static int irq_try_set_affinity(struct i
>  	return ret;
>  }
>
> +static bool irq_set_affinity_deactivated(struct irq_data *data,
> +					 const struct cpumask *mask, bool force)
> +{
> +	struct irq_desc *desc = irq_data_to_desc(data);
> +
> +	/*
> +	 * If the interrupt is not yet activated, just store the affinity
> +	 * mask and do not call the chip driver at all. On activation the
> +	 * driver has to make sure anyway that the interrupt is in a
> +	 * useable state so startup works.
> +	 */
> +	if (!IS_ENABLED(CONFIG_IRQ_DOMAIN_HIERARCHY) || irqd_is_activated(data))
> +		return false;
> +
> +	cpumask_copy(desc->irq_common_data.affinity, mask);
> +	irq_init_effective_affinity(data, mask);
> +	irqd_set(data, IRQD_AFFINITY_SET);
> +	return true;
> +}
> +
>  int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
>  			    bool force)
>  {
> @@ -314,6 +344,9 @@ int irq_set_affinity_locked(struct irq_d
>  	if (!chip || !chip->irq_set_affinity)
>  		return -EINVAL;
>
> +	if (irq_set_affinity_deactivated(data, mask, force))
> +		return 0;
> +
>  	if (irq_can_move_pcntxt(data) && !irqd_is_setaffinity_pending(data)) {
>  		ret = irq_try_set_affinity(data, mask, force);
>  	} else {
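The failure John describes in this message can be reproduced in a few lines of ordinary C. What follows is only a toy model of the sequence, not kernel code: every name in it (toy_irq, toy_set_affinity(), toy_startup(), toy_pmu_request()) is hypothetical, and the assumption that the hardware initially routes the interrupt to CPU0 is taken from the symptom in the report rather than from any driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_irq {
    bool activated;         /* models irqd_is_activated() */
    bool percpu_nobalance;  /* models IRQF_PERCPU | IRQF_NOBALANCING */
    unsigned int stored;    /* mask reported under /proc/irq/N/ */
    unsigned int routed;    /* mask the hardware actually uses */
};

/* Models the patched core: inactive interrupts only get the mask stored. */
static void toy_set_affinity(struct toy_irq *irq, unsigned int mask)
{
    irq->stored = mask;
    if (irq->activated)
        irq->routed = mask; /* chip driver reached only when activated */
}

/* Models startup: affinity setup is skipped for per-CPU, no-balance IRQs. */
static void toy_startup(struct toy_irq *irq)
{
    irq->activated = true;
    if (irq->percpu_nobalance)
        return;             /* stored mask never reaches the hardware */
    irq->routed = irq->stored;
}

/* The PMU request sequence: force affinity first, then request and start. */
static struct toy_irq toy_pmu_request(unsigned int cpu)
{
    /* Assumption: the hardware default routing is CPU0. */
    struct toy_irq irq = { .routed = 1u << 0 };

    assert(cpu < 32);
    toy_set_affinity(&irq, 1u << cpu);
    irq.percpu_nobalance = true;
    toy_startup(&irq);
    return irq;
}
```

Calling toy_pmu_request(2) leaves stored pointing at CPU2 while routed still points at CPU0, which is the same mismatch between /proc/irq/N/smp_affinity and /proc/interrupts that the report mentions.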
John,

John Keeping <john@metanate.com> writes:
> On Fri, 17 Jul 2020 18:00:02 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
> It seems that this patch breaks perf events on RK3288 because the PMU
> interrupts that should be per-cpu are now all on CPU0 so no events are
> collected from CPUs 1-3 and those interrupts are killed as spurious
> after a few seconds.
>
> I'm seeing this on 4.19.134 and 5.4.53 but as far as I can tell the
> relevant code hasn't changed through to next-20200723. Reverting the
> backport of this change fixes the problem.

Bah.

> It looks like what happens is that because the interrupts are not
> per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity()
> while the interrupt is deactivated and then request_irq() with
> IRQF_PERCPU | IRQF_NOBALANCING.
>
> Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
> irq_setup_affinity() which returns early because IRQF_PERCPU and
> IRQF_NOBALANCING are set, leaving the interrupt on its original CPU.

Right. My brain tricked me to believe that we made activation mandatory,
but that's not.

I have some ideas for a trivial generic way to solve this without
undoing the commit in question and without going through all the irq
chip drivers. So far everything I came up with is butt ugly. Maybe Marc
has some brilliant idea.

Sorry for the wreckage and thanks for the excellent problem
description. I'll come back to you in the next days.

Thanks,

        tglx
John,

Thomas Gleixner <tglx@linutronix.de> writes:
> I have some ideas for a trivial generic way to solve this without
> undoing the commit in question and without going through all the irq
> chip drivers. So far everything I came up with is butt ugly. Maybe Marc
> has some brilliant idea.
>
> Sorry for the wreckage and thanks for the excellent problem
> description. I'll come back to you in the next days.

couldn't give up :)

So after staring in too many drivers, I resorted to make this mode
opt-in and mark the interrupts accordingly for the two drivers which are
known to want this. Not that I love it, but it's the least dangerous
option. Completely untested patch below.

Thanks,

        tglx
---
 arch/x86/kernel/apic/vector.c    |    4 ++++
 drivers/irqchip/irq-gic-v3-its.c |    5 ++++-
 include/linux/irq.h              |   13 +++++++++++++
 kernel/irq/manage.c              |    6 +++++-
 4 files changed, 26 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -560,6 +560,10 @@ static int x86_vector_alloc_irqs(struct
 		 * as that can corrupt the affinity move state.
 		 */
 		irqd_set_handle_enforce_irqctx(irqd);
+
+		/* Don't invoke affinity setter on deactivated interrupts */
+		irqd_set_affinity_on_activate(irqd);
+
 		/*
 		 * Legacy vectors are already assigned when the IOAPIC
 		 * takes them over. They stay on the same vector. This is
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3523,6 +3523,7 @@ static int its_irq_domain_alloc(struct i
 	msi_alloc_info_t *info = args;
 	struct its_device *its_dev = info->scratchpad[0].ptr;
 	struct its_node *its = its_dev->its;
+	struct irq_data *irqd;
 	irq_hw_number_t hwirq;
 	int err;
 	int i;
@@ -3542,7 +3543,9 @@ static int its_irq_domain_alloc(struct i
 
 		irq_domain_set_hwirq_and_chip(domain, virq + i,
 					      hwirq + i, &its_irq_chip, its_dev);
-		irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(virq + i)));
+		irqd = irq_get_irq_data(virq + i);
+		irqd_set_single_target(irqd);
+		irqd_set_affinity_on_activate(irqd);
 		pr_debug("ID:%d pID:%d vID:%d\n",
 			 (int)(hwirq + i - its_dev->event_map.lpi_base),
 			 (int)(hwirq + i), virq + i);
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -213,6 +213,8 @@ struct irq_data {
  *				  required
  * IRQD_HANDLE_ENFORCE_IRQCTX	- Enforce that handle_irq_*() is only invoked
  *				  from actual interrupt context.
+ * IRQD_AFFINITY_ON_ACTIVATE	- Affinity is set on activation. Don't call
+ *				  irq_chip::irq_set_affinity() when deactivated.
  */
 enum {
 	IRQD_TRIGGER_MASK		= 0xf,
@@ -237,6 +239,7 @@ enum {
 	IRQD_CAN_RESERVE		= (1 << 26),
 	IRQD_MSI_NOMASK_QUIRK		= (1 << 27),
 	IRQD_HANDLE_ENFORCE_IRQCTX	= (1 << 28),
+	IRQD_AFFINITY_ON_ACTIVATE	= (1 << 29),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -421,6 +424,16 @@ static inline bool irqd_msi_nomask_quirk
 	return __irqd_to_state(d) & IRQD_MSI_NOMASK_QUIRK;
 }
 
+static inline void irqd_set_affinity_on_activate(struct irq_data *d)
+{
+	__irqd_to_state(d) |= IRQD_AFFINITY_ON_ACTIVATE;
+}
+
+static inline bool irqd_affinity_on_activate(struct irq_data *d)
+{
+	return __irqd_to_state(d) & IRQD_AFFINITY_ON_ACTIVATE;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -320,12 +320,16 @@ static bool irq_set_affinity_deactivated
 	struct irq_desc *desc = irq_data_to_desc(data);
 
 	/*
+	 * Handle irq chips which can handle affinity only in activated
+	 * state correctly
+	 *
 	 * If the interrupt is not yet activated, just store the affinity
 	 * mask and do not call the chip driver at all. On activation the
 	 * driver has to make sure anyway that the interrupt is in a
 	 * useable state so startup works.
 	 */
-	if (!IS_ENABLED(CONFIG_IRQ_DOMAIN_HIERARCHY) || irqd_is_activated(data))
+	if (!IS_ENABLED(CONFIG_IRQ_DOMAIN_HIERARCHY) ||
+	    irqd_is_activated(data) || !irqd_affinity_on_activate(data))
 		return false;
 
 	cpumask_copy(desc->irq_common_data.affinity, mask);
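The effect of the extra opt-in condition in irq_set_affinity_deactivated() is easiest to see as a truth table. Below is a minimal user-space sketch of that three-way test. The names toy_store_only() and toy_check_cases() are hypothetical, TOY_ACTIVATED uses an arbitrary bit position chosen for illustration, and only the (1 << 29) value is copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ACTIVATED             (1u << 0)   /* hypothetical bit position */
#define TOY_AFFINITY_ON_ACTIVATE  (1u << 29)  /* same bit as in the patch */

/*
 * Returns true when the core should only store the mask and skip the
 * chip driver, mirroring the three-way test in the patch.
 */
static bool toy_store_only(bool hierarchy, unsigned int state)
{
    if (!hierarchy)
        return false;
    if (state & TOY_ACTIVATED)
        return false;
    if (!(state & TOY_AFFINITY_ON_ACTIVATE))
        return false;
    return true;
}

/* Walk the cases discussed in the thread. */
static void toy_check_cases(void)
{
    /* Opted-in chip (x86 vector, ITS), not yet activated: store only. */
    assert(toy_store_only(true, TOY_AFFINITY_ON_ACTIVATE));
    /* Same chip once activated: the driver is called as usual. */
    assert(!toy_store_only(true, TOY_AFFINITY_ON_ACTIVATE | TOY_ACTIVATED));
    /* Chip that did not opt in (the PMU case from John's report). */
    assert(!toy_store_only(true, 0));
    /* No hierarchical domains: never short-circuit. */
    assert(!toy_store_only(false, TOY_AFFINITY_ON_ACTIVATE));
}
```

Under this model, interrupts whose chip never sets the flag fall back to the pre-V2 behaviour, so an affinity request made while inactive goes to the chip driver again.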
Hi both,

On Fri, 24 Jul 2020 21:03:50 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
>
> John,
>
> John Keeping <john@metanate.com> writes:
> > On Fri, 17 Jul 2020 18:00:02 +0200
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> > It seems that this patch breaks perf events on RK3288 because the PMU
> > interrupts that should be per-cpu are now all on CPU0 so no events are
> > collected from CPUs 1-3 and those interrupts are killed as spurious
> > after a few seconds.

SPI-backed PMUs. Urgh...

> >
> > I'm seeing this on 4.19.134 and 5.4.53 but as far as I can tell the
> > relevant code hasn't changed through to next-20200723. Reverting the
> > backport of this change fixes the problem.
>
> Bah.
>
> > It looks like what happens is that because the interrupts are not
> > per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity()
> > while the interrupt is deactivated and then request_irq() with
> > IRQF_PERCPU | IRQF_NOBALANCING.
> >
> > Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
> > irq_setup_affinity() which returns early because IRQF_PERCPU and
> > IRQF_NOBALANCING are set, leaving the interrupt on its original CPU.
>
> Right. My brain tricked me to believe that we made activation mandatory,
> but that's not.
>
> I have some ideas for a trivial generic way to solve this without
> undoing the commit in question and without going through all the irq
> chip drivers. So far everything I came up with is butt ugly. Maybe Marc
> has some brilliant idea.

Not really. We have contradicting behaviours here, where some
interrupts want to see the set_affinity early (the above case), and
some cannot handle that (x86 vectors and the GICv3 ITS). We could key
it on the presence of an activate callback, but it feels fragile.

I'll follow up on your patch in the next email, which seems like a
sensible approach.

	M.
Hi Thomas,

On Fri, 24 Jul 2020 21:44:41 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
>
> John,
>
> Thomas Gleixner <tglx@linutronix.de> writes:
> > I have some ideas for a trivial generic way to solve this without
> > undoing the commit in question and without going through all the irq
> > chip drivers. So far everything I came up with is butt ugly. Maybe Marc
> > has some brilliant idea.
> >
> > Sorry for the wreckage and thanks for the excellent problem
> > description. I'll come back to you in the next days.
>
> couldn't give up :)
>
> So after staring in too many drivers, I resorted to make this mode
> opt-in and mark the interrupts accordingly for the two drivers which are
> known to want this. Not that I love it, but it's the least dangerous
> option. Completely untested patch below.
>
> Thanks,
>
>         tglx
> ---
>  arch/x86/kernel/apic/vector.c    |    4 ++++
>  drivers/irqchip/irq-gic-v3-its.c |    5 ++++-
>  include/linux/irq.h              |   13 +++++++++++++
>  kernel/irq/manage.c              |    6 +++++-
>  4 files changed, 26 insertions(+), 2 deletions(-)
>
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -560,6 +560,10 @@ static int x86_vector_alloc_irqs(struct
>  		 * as that can corrupt the affinity move state.
>  		 */
>  		irqd_set_handle_enforce_irqctx(irqd);
> +
> +		/* Don't invoke affinity setter on deactivated interrupts */
> +		irqd_set_affinity_on_activate(irqd);
> +
>  		/*
>  		 * Legacy vectors are already assigned when the IOAPIC
>  		 * takes them over. They stay on the same vector. This is
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3523,6 +3523,7 @@ static int its_irq_domain_alloc(struct i
>  	msi_alloc_info_t *info = args;
>  	struct its_device *its_dev = info->scratchpad[0].ptr;
>  	struct its_node *its = its_dev->its;
> +	struct irq_data *irqd;
>  	irq_hw_number_t hwirq;
>  	int err;
>  	int i;
> @@ -3542,7 +3543,9 @@ static int its_irq_domain_alloc(struct i
>
>  		irq_domain_set_hwirq_and_chip(domain, virq + i,
>  					      hwirq + i, &its_irq_chip, its_dev);
> -		irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(virq + i)));
> +		irqd = irq_get_irq_data(virq + i);
> +		irqd_set_single_target(irqd);
> +		irqd_set_affinity_on_activate(irqd);
>  		pr_debug("ID:%d pID:%d vID:%d\n",
>  			 (int)(hwirq + i - its_dev->event_map.lpi_base),
>  			 (int)(hwirq + i), virq + i);
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -213,6 +213,8 @@ struct irq_data {
>   *				  required
>   * IRQD_HANDLE_ENFORCE_IRQCTX	- Enforce that handle_irq_*() is only invoked
>   *				  from actual interrupt context.
> + * IRQD_AFFINITY_ON_ACTIVATE	- Affinity is set on activation. Don't call
> + *				  irq_chip::irq_set_affinity() when deactivated.
>   */
>  enum {
>  	IRQD_TRIGGER_MASK		= 0xf,
> @@ -237,6 +239,7 @@ enum {
>  	IRQD_CAN_RESERVE		= (1 << 26),
>  	IRQD_MSI_NOMASK_QUIRK		= (1 << 27),
>  	IRQD_HANDLE_ENFORCE_IRQCTX	= (1 << 28),
> +	IRQD_AFFINITY_ON_ACTIVATE	= (1 << 29),
>  };
>
>  #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
> @@ -421,6 +424,16 @@ static inline bool irqd_msi_nomask_quirk
>  	return __irqd_to_state(d) & IRQD_MSI_NOMASK_QUIRK;
>  }
>
> +static inline void irqd_set_affinity_on_activate(struct irq_data *d)
> +{
> +	__irqd_to_state(d) |= IRQD_AFFINITY_ON_ACTIVATE;
> +}
> +
> +static inline bool irqd_affinity_on_activate(struct irq_data *d)
> +{
> +	return __irqd_to_state(d) & IRQD_AFFINITY_ON_ACTIVATE;
> +}
> +
>  #undef __irqd_to_state
>
>  static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -320,12 +320,16 @@ static bool irq_set_affinity_deactivated
>  	struct irq_desc *desc = irq_data_to_desc(data);
>
>  	/*
> +	 * Handle irq chips which can handle affinity only in activated
> +	 * state correctly
> +	 *
>  	 * If the interrupt is not yet activated, just store the affinity
>  	 * mask and do not call the chip driver at all. On activation the
>  	 * driver has to make sure anyway that the interrupt is in a
>  	 * useable state so startup works.
>  	 */
> -	if (!IS_ENABLED(CONFIG_IRQ_DOMAIN_HIERARCHY) || irqd_is_activated(data))
> +	if (!IS_ENABLED(CONFIG_IRQ_DOMAIN_HIERARCHY) ||
> +	    irqd_is_activated(data) || !irqd_affinity_on_activate(data))
>  		return false;
>
>  	cpumask_copy(desc->irq_common_data.affinity, mask);
>

I have given this a go on two systems: one with a GICv2 that has its
PMUs wired in a similar way to John's system (each CPU PMU is on a
separate SPI), and another one that has an ITS.
Both came up normally and their interrupts are routed as expected:

* GICv2 PMU:

 30:    121      0      0      0      0      0      0      0   GICv2  39 Level   arm-pmu
 31:      0    167      0      0      0      0      0      0   GICv2  40 Level   arm-pmu
 32:      0      0    145      0      0      0      0      0   GICv2  41 Level   arm-pmu
 33:      0      0      0    400      0      0      0      0   GICv2  42 Level   arm-pmu
 34:      0      0      0      0     97      0      0      0   GICv2  43 Level   arm-pmu
 35:      0      0      0      0      0    463      0      0   GICv2  44 Level   arm-pmu
 36:      0      0      0      0      0      0     74      0   GICv2  45 Level   arm-pmu
 37:      0      0      0      0      0      0      0      8   GICv2  46 Level   arm-pmu

* GICv3+ITS:

241:    219      0      0      0      0      0   ITS-MSI 524289 Edge   nvme0q1
242:      0    251      0      0      0      0   ITS-MSI 524290 Edge   nvme0q2
243:      0      0    197      0      0      0   ITS-MSI 524291 Edge   nvme0q3
244:      0      0      0    380      0      0   ITS-MSI 524292 Edge   nvme0q4
245:      0      0      0      0    675      0   ITS-MSI 524293 Edge   nvme0q5
246:      0      0      0      0      0    436   ITS-MSI 524294 Edge   nvme0q6

For a good measure, I've added this on top, adding the missing bits to
the debugfs entries:

diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c
index 4f9f844074db..d44fc8a5dab2 100644
--- a/kernel/irq/debugfs.c
+++ b/kernel/irq/debugfs.c
@@ -120,6 +120,9 @@ static const struct irq_bit_descr irqdata_states[] = {
 
 	BIT_MASK_DESCR(IRQD_WAKEUP_STATE),
 	BIT_MASK_DESCR(IRQD_WAKEUP_ARMED),
+	BIT_MASK_DESCR(IRQD_DEFAULT_TRIGGER_SET),
+	BIT_MASK_DESCR(IRQD_HANDLE_ENFORCE_IRQCTX),
+	BIT_MASK_DESCR(IRQD_AFFINITY_ON_ACTIVATE),
 };
 
 static const struct irq_bit_descr irqdesc_states[] = {

FWIW:

Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>

	M.
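For readers unfamiliar with the descriptor-table idiom used in the debugfs hunk above, here is a small stand-alone sketch of how a table of mask/name pairs can turn a state word into readable flag names. It is an illustration under assumptions, not the kernel's debugfs code: toy_bit_descr, TOY_BIT_DESCR() and toy_decode() are hypothetical names, and only the two bit positions are taken from the patch in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins; the bit positions follow the irq.h hunk in this thread. */
#define TOY_HANDLE_ENFORCE_IRQCTX  (1u << 28)
#define TOY_AFFINITY_ON_ACTIVATE   (1u << 29)

struct toy_bit_descr {
    unsigned int mask;
    const char *name;
};

/* Pair a mask with its own identifier, stringified by the preprocessor. */
#define TOY_BIT_DESCR(m) { .mask = (m), .name = #m }

static const struct toy_bit_descr toy_states[] = {
    TOY_BIT_DESCR(TOY_HANDLE_ENFORCE_IRQCTX),
    TOY_BIT_DESCR(TOY_AFFINITY_ON_ACTIVATE),
};

/* Write the names of all set bits into buf; return how many were found. */
static int toy_decode(unsigned int state, char *buf, size_t len)
{
    int hits = 0;
    size_t i;

    assert(len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(toy_states) / sizeof(toy_states[0]); i++) {
        if (!(state & toy_states[i].mask))
            continue;
        if (hits)
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, toy_states[i].name, len - strlen(buf) - 1);
        hits++;
    }
    return hits;
}
```

A flag that is missing from such a table is simply never printed, which is presumably why the hunk adds the new bit alongside the two that had been left out earlier.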
Marc,

Marc Zyngier <maz@kernel.org> writes:
> On Fri, 24 Jul 2020 21:03:50 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> Right. My brain tricked me to believe that we made activation mandatory,
>> but that's not.
>>
>> I have some ideas for a trivial generic way to solve this without
>> undoing the commit in question and without going through all the irq
>> chip drivers. So far everything I came up with is butt ugly. Maybe Marc
>> has some brilliant idea.
>
> Not really. We have contradicting behaviours here, where some
> interrupts want to see the set_affinity early (the above case), and
> some cannot handle that (x86 vectors and the GICv3 ITS). We could key
> it on the presence of an activate callback, but it feels fragile.

Yes, I thought about that briefly, but yeah, it's fragile and bound to
break in weird ways.

That said, we should make activate mandatory and actually set up the
affinity during activation correctly.

Thanks,

        tglx
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -446,12 +446,10 @@ static int x86_vector_activate(struct ir
 	trace_vector_activate(irqd->irq, apicd->is_managed,
 			      apicd->can_reserve, reserve);
 
-	/* Nothing to do for fixed assigned vectors */
-	if (!apicd->can_reserve && !apicd->is_managed)
-		return 0;
-
 	raw_spin_lock_irqsave(&vector_lock, flags);
-	if (reserve || irqd_is_managed_and_shutdown(irqd))
+	if (!apicd->can_reserve && !apicd->is_managed)
+		assign_irq_vector_any_locked(irqd);
+	else if (reserve || irqd_is_managed_and_shutdown(irqd))
 		vector_assign_managed_shutdown(irqd);
 	else if (apicd->is_managed)
 		ret = activate_managed(irqd);
@@ -775,20 +773,10 @@ void lapic_offline(void)
 static int apic_set_affinity(struct irq_data *irqd,
 			     const struct cpumask *dest, bool force)
 {
-	struct apic_chip_data *apicd = apic_chip_data(irqd);
 	int err;
 
-	/*
-	 * Core code can call here for inactive interrupts. For inactive
-	 * interrupts which use managed or reservation mode there is no
-	 * point in going through the vector assignment right now as the
-	 * activation will assign a vector which fits the destination
-	 * cpumask. Let the core code store the destination mask and be
-	 * done with it.
-	 */
-	if (!irqd_is_activated(irqd) &&
-	    (apicd->is_managed || apicd->can_reserve))
-		return IRQ_SET_MASK_OK;
+	if (WARN_ON_ONCE(!irqd_is_activated(irqd)))
+		return -EIO;
 
 	raw_spin_lock(&vector_lock);
 	cpumask_and(vector_searchmask, dest, cpu_online_mask);
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -195,9 +195,9 @@ void irq_set_thread_affinity(struct irq_
 			set_bit(IRQTF_AFFINITY, &action->thread_flags);
 }
 
+#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
 static void irq_validate_effective_affinity(struct irq_data *data)
 {
-#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
 	const struct cpumask *m = irq_data_get_effective_affinity_mask(data);
 	struct irq_chip *chip = irq_data_get_irq_chip(data);
 
@@ -205,9 +205,19 @@ static void irq_validate_effective_affin
 		return;
 	pr_warn_once("irq_chip %s did not update eff. affinity mask of irq %u\n",
 		     chip->name, data->irq);
-#endif
 }
 
+static inline void irq_init_effective_affinity(struct irq_data *data,
+					       const struct cpumask *mask)
+{
+	cpumask_copy(irq_data_get_effective_affinity_mask(data), mask);
+}
+#else
+static inline void irq_validate_effective_affinity(struct irq_data *data) { }
+static inline void irq_init_effective_affinity(struct irq_data *data,
+					       const struct cpumask *mask) { }
+#endif
+
 int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 			bool force)
 {
@@ -304,6 +314,26 @@ static int irq_try_set_affinity(struct i
 	return ret;
 }
 
+static bool irq_set_affinity_deactivated(struct irq_data *data,
+					 const struct cpumask *mask, bool force)
+{
+	struct irq_desc *desc = irq_data_to_desc(data);
+
+	/*
+	 * If the interrupt is not yet activated, just store the affinity
+	 * mask and do not call the chip driver at all. On activation the
+	 * driver has to make sure anyway that the interrupt is in a
+	 * useable state so startup works.
+	 */
+	if (!IS_ENABLED(CONFIG_IRQ_DOMAIN_HIERARCHY) || irqd_is_activated(data))
+		return false;
+
+	cpumask_copy(desc->irq_common_data.affinity, mask);
+	irq_init_effective_affinity(data, mask);
+	irqd_set(data, IRQD_AFFINITY_SET);
+	return true;
+}
+
 int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
 			    bool force)
 {
@@ -314,6 +344,9 @@ int irq_set_affinity_locked(struct irq_d
 	if (!chip || !chip->irq_set_affinity)
 		return -EINVAL;
 
+	if (irq_set_affinity_deactivated(data, mask, force))
+		return 0;
+
 	if (irq_can_move_pcntxt(data) && !irqd_is_setaffinity_pending(data)) {
 		ret = irq_try_set_affinity(data, mask, force);
 	} else {
Setting interrupt affinity on inactive interrupts is inconsistent when
hierarchical irq domains are enabled. The core code should just store the
affinity and not call into the irq chip driver for inactive interrupts
because the chip drivers may not be in a state to handle such requests.

X86 has a hacky workaround for that but all other irq chips have not which
causes problems e.g. on GIC V3 ITS.

Instead of adding more ugly hacks all over the place, solve the problem in
the core code. If the affinity is set on an inactive interrupt then:

  - Store it in the irq descriptor's affinity mask
  - Update the effective affinity to reflect that so user space has
    a consistent view
  - Don't call into the irq chip driver

This is the core equivalent of the X86 workaround and works correctly
because the affinity setting is established in the irq chip when the
interrupt is activated later on.

Note, that this is only effective when hierarchical irq domains are enabled
by the architecture. Doing it unconditionally would break legacy irq chip
implementations.

For hierarchical irq domains this works correctly as none of the drivers can
have a dependency on affinity setting in inactive state by design.

Remove the X86 workaround as it is no longer required.

Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts")
Reported-by: Ali Saidi <alisaidi@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com
---
V2: Fix the fallout for CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=n (0day)
---
 arch/x86/kernel/apic/vector.c |   22 +++++-----------------
 kernel/irq/manage.c           |   37 +++++++++++++++++++++++++++++++++++--
 2 files changed, 40 insertions(+), 19 deletions(-)