Message ID | 1377015070-26320-1-git-send-email-t.figa@samsung.com (mailing list archive) |
---|---|
State | New, archived |
On 08/20/13 09:11, Tomasz Figa wrote:
> Sometimes it is necessary to fix interrupt affinity to an offline CPU,
> for example in initialization of local timers. This patch modifies
> .set_affinity() operation of irq-gic driver to fall back to any possible
> CPU if no online CPU can be found in requested CPU mask.
>
> This fixes broken Exynos4210 support since commit
> ee98d27df6 ARM: EXYNOS4: Divorce mct from local timer API
> caused by timer initialization code unable to set affinity for local
> timer interrupts.

Care to elaborate further? I don't see how the interrupt affinity is set
for a CPU that isn't online because the mct code runs on the CPU that
the affinity is being set to.
On Tuesday 20 of August 2013 09:33:31 Stephen Boyd wrote:
> On 08/20/13 09:11, Tomasz Figa wrote:
> > Sometimes it is necessary to fix interrupt affinity to an offline CPU,
> > for example in initialization of local timers. This patch modifies
> > .set_affinity() operation of irq-gic driver to fall back to any
> > possible CPU if no online CPU can be found in requested CPU mask.
> >
> > This fixes broken Exynos4210 support since commit
> >
> > ee98d27df6 ARM: EXYNOS4: Divorce mct from local timer API
> >
> > caused by timer initialization code unable to set affinity for local
> > timer interrupts.
>
> Care to elaborate further? I don't see how the interrupt affinity is set
> for a CPU that isn't online because the mct code runs on the CPU that
> the affinity is being set to.

Well, please look at secondary_start_kernel() in arch/arm/kernel/smp.c [1].
You can see that notify_cpu_starting() (line 348) that fires the notifier
registered in MCT driver is called before set_cpu_online() (line 359) that
marks the CPU as online. Also notice that, originally, local timer
initialization was happening after set_cpu_online() - see line 365.

[1] - http://lxr.free-electrons.com/source/arch/arm/kernel/smp.c?a=arm#L312

Best regards,
Tomasz
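The ordering problem described above can be modelled in a few lines of
userspace C. This is only a sketch, not kernel code: CPU masks are plain
bit masks, and any_and(), set_affinity() and bring_up() are hypothetical
stand-ins for cpumask_any_and(), the pre-patch gic_set_affinity() and
secondary_start_kernel(). It shows why an affinity request made from the
starting notifier fails while the same request made after the CPU is
marked online succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Only the boot CPU (CPU 0) is online at the start. */
static unsigned int online_mask = 1u << 0;

/* Stand-in for cpumask_any_and(): lowest CPU set in both masks, or NR_CPUS. */
static unsigned int any_and(unsigned int a, unsigned int b)
{
	unsigned int both = a & b;

	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (both & (1u << cpu))
			return cpu;
	return NR_CPUS;
}

/* Stand-in for the pre-patch gic_set_affinity(): online CPUs only. */
static int set_affinity(unsigned int requested)
{
	return any_and(requested, online_mask) < NR_CPUS ? 0 : -EINVAL;
}

/*
 * Stand-in for secondary CPU bring-up. With notifier_first set, the timer
 * code asks for affinity before the CPU is marked online, which is the
 * ordering described in the message above.
 */
static int bring_up(unsigned int cpu, bool notifier_first)
{
	int ret;

	assert(cpu < NR_CPUS);
	if (notifier_first) {
		ret = set_affinity(1u << cpu);	/* CPU not online yet */
		online_mask |= 1u << cpu;
	} else {
		online_mask |= 1u << cpu;
		ret = set_affinity(1u << cpu);	/* CPU already online */
	}
	return ret;
}
```

Calling bring_up(1, true) returns -EINVAL in this model, while
bring_up(2, false) returns 0.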
On 08/20/13 09:41, Tomasz Figa wrote:
> On Tuesday 20 of August 2013 09:33:31 Stephen Boyd wrote:
>> On 08/20/13 09:11, Tomasz Figa wrote:
>>> Sometimes it is necessary to fix interrupt affinity to an offline CPU,
>>> for example in initialization of local timers. This patch modifies
>>> .set_affinity() operation of irq-gic driver to fall back to any
>>> possible CPU if no online CPU can be found in requested CPU mask.
>>>
>>> This fixes broken Exynos4210 support since commit
>>>
>>> ee98d27df6 ARM: EXYNOS4: Divorce mct from local timer API
>>>
>>> caused by timer initialization code unable to set affinity for local
>>> timer interrupts.
>> Care to elaborate further? I don't see how the interrupt affinity is set
>> for a CPU that isn't online because the mct code runs on the CPU that
>> the affinity is being set to.
> Well, please look at secondary_start_kernel() in arch/arm/kernel/smp.c [1].
> You can see that notify_cpu_starting() (line 348) that fires the notifier
> registered in MCT driver is called before set_cpu_online() (line 359) that
> marks the CPU as online. Also notice that, originally, local timer
> initialization was happening after set_cpu_online() - see line 365.

Great, thank you. Please put this information in the commit text next
time.

I wonder if we shouldn't make the cpumask_any_and() work on the present
mask instead? If we ever support physical hotplug on ARM I think we
wouldn't want to allow interrupts to go to CPUs that aren't even present
(but still possible).
On Tuesday 20 of August 2013 09:48:25 Stephen Boyd wrote:
> On 08/20/13 09:41, Tomasz Figa wrote:
> > On Tuesday 20 of August 2013 09:33:31 Stephen Boyd wrote:
> >> On 08/20/13 09:11, Tomasz Figa wrote:
> >>> Sometimes it is necessary to fix interrupt affinity to an offline
> >>> CPU, for example in initialization of local timers. This patch
> >>> modifies .set_affinity() operation of irq-gic driver to fall back
> >>> to any possible CPU if no online CPU can be found in requested CPU
> >>> mask.
> >>>
> >>> This fixes broken Exynos4210 support since commit
> >>>
> >>> ee98d27df6 ARM: EXYNOS4: Divorce mct from local timer API
> >>>
> >>> caused by timer initialization code unable to set affinity for local
> >>> timer interrupts.
> >>
> >> Care to elaborate further? I don't see how the interrupt affinity is
> >> set for a CPU that isn't online because the mct code runs on the CPU
> >> that the affinity is being set to.
> >
> > Well, please look at secondary_start_kernel() in arch/arm/kernel/smp.c
> > [1]. You can see that notify_cpu_starting() (line 348) that fires the
> > notifier registered in MCT driver is called before set_cpu_online()
> > (line 359) that marks the CPU as online. Also notice that, originally,
> > local timer initialization was happening after set_cpu_online() - see
> > line 365.
>
> Great, thank you. Please put this information in the commit text next
> time.

Right, I could have added a sentence or two about this.

> I wonder if we shouldn't make the cpumask_any_and() work on the present
> mask instead? If we ever support physical hotplug on ARM I think we
> wouldn't want to allow interrupts to go to CPUs that aren't even present
> (but still possible).

Yes, cpu_present_mask might be better indeed.

Best regards,
Tomasz
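The difference between falling back to the possible mask and falling back
to the present mask can be illustrated with a small userspace model. This
is a sketch under stated assumptions, not the driver code: struct
cpu_masks, first_cpu() and pick_target() are hypothetical names, and the
masks are plain bit masks standing in for cpu_possible_mask,
cpu_present_mask and cpu_online_mask.

```c
#include <assert.h>

#define NR_CPUS 4

/* Plain bit masks standing in for the kernel's global CPU masks. */
struct cpu_masks {
	unsigned int possible;	/* could ever be plugged in */
	unsigned int present;	/* physically there right now */
	unsigned int online;	/* up and scheduling */
};

/* Lowest CPU set in mask, or NR_CPUS if the mask is empty. */
static unsigned int first_cpu(unsigned int mask)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPUS;
}

/*
 * Prefer an online CPU from the requested mask; if none is online, fall
 * back to a present one. CPUs that are merely possible are never chosen.
 * Returns NR_CPUS when no acceptable CPU exists.
 */
static unsigned int pick_target(const struct cpu_masks *m,
				unsigned int requested)
{
	unsigned int cpu;

	/* Sanity check: online is a subset of present, present of possible. */
	assert((m->online & ~m->present) == 0);
	assert((m->present & ~m->possible) == 0);

	cpu = first_cpu(requested & m->online);
	if (cpu >= NR_CPUS)
		cpu = first_cpu(requested & m->present);
	return cpu;
}
```

With four possible CPUs, two present and one online, a request for CPU 1
(present, not yet online) is accepted, while a request for CPU 2 (possible
only) is rejected.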
On Tue, Aug 20, 2013 at 06:11:10PM +0200, Tomasz Figa wrote:
> Sometimes it is necessary to fix interrupt affinity to an offline CPU,
> for example in initialization of local timers. This patch modifies
> .set_affinity() operation of irq-gic driver to fall back to any possible
> CPU if no online CPU can be found in requested CPU mask.

Err, this is a bad idea. If a CPU is offline, then it must not respond
to interrupts. If you bind an interrupt to an offline CPU, and that
device asserts its interrupt, what happens? It doesn't get serviced
until that CPU comes back online, which may be a very long time.

If, for example, that is your network device, it would mean your
network stops operating. Worse, the network layer will time out and
reset the ethernet device, trying to get things working (which it
won't.)

I think how I used to handle this case prior to genirq is that I fell
back to any online CPU if the interrupt ended up only routed to offline
CPUs, but when an offline CPU comes back, it could then be re-routed
back to that CPU. In other words, the mask change was non-destructive.

I think with genirq, such mask changes are destructive.
On Tuesday 20 of August 2013 22:14:42 Russell King - ARM Linux wrote:
> On Tue, Aug 20, 2013 at 06:11:10PM +0200, Tomasz Figa wrote:
> > Sometimes it is necessary to fix interrupt affinity to an offline CPU,
> > for example in initialization of local timers. This patch modifies
> > .set_affinity() operation of irq-gic driver to fall back to any
> > possible CPU if no online CPU can be found in requested CPU mask.
>
> Err, this is a bad idea. If a CPU is offline, then it must not respond
> to interrupts. If you bind an interrupt to an offline CPU, and that
> device asserts its interrupt, what happens? It doesn't get serviced
> until that CPU comes back online, which may be a very long time.
>
> If, for example, that is your network device, it would mean your
> network stops operating. Worse, the network layer will time out and
> reset the ethernet device, trying to get things working (which it
> won't.)
>
> I think how I used to handle this case prior to genirq is that I fell
> back to any online CPU if the interrupt ended up only routed to offline
> CPUs, but when an offline CPU comes back, it could then be re-routed
> back to that CPU. In other words, the mask change was non-destructive.
>
> I think with genirq, such mask changes are destructive.

Yes, that's correct. Although if you _explicitly_ request the interrupt to
be routed to an offline CPU (i.e. only offline CPUs have bits set in
passed cpumask), is it an error?

There is at least one irqchip that does not check received cpumask for
this (metag) and I don't see any documentation saying what should happen
in this case in .set_affinity operation.

Still, if you have any better solution for the original problem (broken
Exynos4210 local timers, due to failing irq_set_affinity()), then I'd
appreciate it, as I don't like the one from this patch too much either.

Best regards,
Tomasz
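The "non-destructive" mask handling mentioned in the quoted message can be
sketched as follows. This is a userspace model built on assumptions, not a
reconstruction of the pre-genirq ARM code: struct irq_model, first_cpu()
and reroute() are hypothetical names. The idea is to store the requested
mask separately from the CPU currently in use, so that a temporary
fallback never overwrites what the user asked for.

```c
#include <assert.h>

#define NR_CPUS 4

struct irq_model {
	unsigned int requested;	/* mask the user asked for; never overwritten */
	unsigned int target;	/* CPU currently receiving the interrupt */
};

/* Lowest CPU set in mask, or NR_CPUS if the mask is empty. */
static unsigned int first_cpu(unsigned int mask)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPUS;
}

/*
 * Recompute the routing whenever the online mask changes. If none of the
 * requested CPUs is online, fall back to any online CPU, but leave
 * irq->requested alone so that the interrupt moves back once a requested
 * CPU returns.
 */
static void reroute(struct irq_model *irq, unsigned int online)
{
	unsigned int cpu;

	assert(online != 0);	/* at least one CPU must be online */

	cpu = first_cpu(irq->requested & online);
	if (cpu >= NR_CPUS)
		cpu = first_cpu(online);
	irq->target = cpu;
}
```

In this model an interrupt requested for CPU 1 is serviced by CPU 0 while
CPU 1 is offline and returns to CPU 1 on the next reroute() after CPU 1
comes online.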
On 08/21, Tomasz Figa wrote:
> On Tuesday 20 of August 2013 22:14:42 Russell King - ARM Linux wrote:
> > On Tue, Aug 20, 2013 at 06:11:10PM +0200, Tomasz Figa wrote:
> > > Sometimes it is necessary to fix interrupt affinity to an offline CPU,
> > > for example in initialization of local timers. This patch modifies
> > > .set_affinity() operation of irq-gic driver to fall back to any
> > > possible CPU if no online CPU can be found in requested CPU mask.
> >
> > Err, this is a bad idea. If a CPU is offline, then it must not respond
> > to interrupts. If you bind an interrupt to an offline CPU, and that
> > device asserts its interrupt, what happens? It doesn't get serviced
> > until that CPU comes back online, which may be a very long time.
> >
> > If, for example, that is your network device, it would mean your
> > network stops operating. Worse, the network layer will time out and
> > reset the ethernet device, trying to get things working (which it
> > won't.)
> >
> > I think how I used to handle this case prior to genirq is that I fell
> > back to any online CPU if the interrupt ended up only routed to offline
> > CPUs, but when an offline CPU comes back, it could then be re-routed
> > back to that CPU. In other words, the mask change was non-destructive.
> >
> > I think with genirq, such mask changes are destructive.
>
> Yes, that's correct. Although if you _explicitly_ request the interrupt to
> be routed to an offline CPU (i.e. only offline CPUs have bits set in
> passed cpumask), is it an error?
>
> There is at least one irqchip that does not check received cpumask for
> this (metag) and I don't see any documentation saying what should happen
> in this case in .set_affinity operation.
>
> Still, if you have any better solution for the original problem (broken
> Exynos4210 local timers, due to failing irq_set_affinity()), then I'd
> appreciate it, as I don't like the one from this patch too much either.

One "solution" might be to change the irq affinity after the CPU
is marked online via the hotplug notifier chain. For a short
period of time the timer interrupt will go to a different CPU but
I don't see how that is a problem.
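The suggestion above, deferring the affinity change until the CPU is
marked online, can be modelled in userspace C as below. This is a sketch
only: timer_starting(), timer_online() and bring_up() are hypothetical
stand-ins for a starting-stage notifier, an online-stage notifier and the
secondary CPU bring-up path, and the assumption that an unrouted timer
interrupt defaults to CPU 0 is made purely for illustration.

```c
#include <assert.h>
#include <errno.h>

#define NR_CPUS 4

/* Only the boot CPU (CPU 0) is online at the start. */
static unsigned int online_mask = 1u << 0;

/*
 * timer_irq_target[n] is the CPU that services CPU n's local timer
 * interrupt. Zero-initialised, so everything starts out routed to CPU 0.
 */
static unsigned int timer_irq_target[NR_CPUS];

/* Strict affinity setter: refuses CPUs that are not online. */
static int set_affinity(unsigned int owner, unsigned int cpu)
{
	if (!(online_mask & (1u << cpu)))
		return -EINVAL;
	timer_irq_target[owner] = cpu;
	return 0;
}

/* Starting stage: the CPU is not online yet, so routing is left alone. */
static void timer_starting(unsigned int cpu)
{
	(void)cpu;	/* the timer would be enabled here; IRQ stays on CPU 0 */
}

/* Online stage: the CPU is online, so the strict setter now succeeds. */
static int timer_online(unsigned int cpu)
{
	return set_affinity(cpu, cpu);
}

static int bring_up(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	timer_starting(cpu);
	online_mask |= 1u << cpu;	/* window: IRQ still goes to CPU 0 */
	return timer_online(cpu);
}
```

Between timer_starting() and timer_online() the interrupt is still
delivered to CPU 0 in this model, which is the short window the message
above refers to.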
[Copying Daniel]

On Tuesday 20 of August 2013 15:39:17 Stephen Boyd wrote:
> On 08/21, Tomasz Figa wrote:
> > On Tuesday 20 of August 2013 22:14:42 Russell King - ARM Linux wrote:
> > > On Tue, Aug 20, 2013 at 06:11:10PM +0200, Tomasz Figa wrote:
> > > > Sometimes it is necessary to fix interrupt affinity to an offline
> > > > CPU, for example in initialization of local timers. This patch
> > > > modifies .set_affinity() operation of irq-gic driver to fall back
> > > > to any possible CPU if no online CPU can be found in requested
> > > > CPU mask.
> > >
> > > Err, this is a bad idea. If a CPU is offline, then it must not
> > > respond to interrupts. If you bind an interrupt to an offline CPU,
> > > and that device asserts its interrupt, what happens? It doesn't get
> > > serviced until that CPU comes back online, which may be a very long
> > > time.
> > >
> > > If, for example, that is your network device, it would mean your
> > > network stops operating. Worse, the network layer will time out and
> > > reset the ethernet device, trying to get things working (which it
> > > won't.)
> > >
> > > I think how I used to handle this case prior to genirq is that I
> > > fell back to any online CPU if the interrupt ended up only routed
> > > to offline CPUs, but when an offline CPU comes back, it could then
> > > be re-routed back to that CPU. In other words, the mask change was
> > > non-destructive.
> > >
> > > I think with genirq, such mask changes are destructive.
> >
> > Yes, that's correct. Although if you _explicitly_ request the interrupt
> > to be routed to an offline CPU (i.e. only offline CPUs have bits set
> > in passed cpumask), is it an error?
> >
> > There is at least one irqchip that does not check received cpumask for
> > this (metag) and I don't see any documentation saying what should
> > happen in this case in .set_affinity operation.
> >
> > Still, if you have any better solution for the original problem (broken
> > Exynos4210 local timers, due to failing irq_set_affinity()), then I'd
> > appreciate it, as I don't like the one from this patch too much either.
>
> One "solution" might be to change the irq affinity after the CPU
> is marked online via the hotplug notifier chain. For a short
> period of time the timer interrupt will go to a different CPU but
> I don't see how that is a problem.

After initial testing, this seems to work, but it still seems a little
hackish. I'd like to make sure that nothing bad happens if the irq
somehow fires before setting the affinity.

An opinion from someone who is more into kernel timekeeping than me would
be nice.

Best regards,
Tomasz
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index ee7c503..5f0797e 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -250,6 +250,15 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 	unsigned int cpu = cpumask_any_and(mask_val, cpu_online_mask);
 	u32 val, mask, bit;
 
+	/*
+	 * If no online CPU could be found, fall back to any possible CPU.
+	 *
+	 * This is to allow setting affinity of some interrupts to a CPU
+	 * before it is marked as online, i.e. local timer initialization.
+	 */
+	if (cpu >= nr_cpu_ids)
+		cpu = cpumask_any_and(mask_val, cpu_possible_mask);
+
 	if (cpu >= NR_GIC_CPU_IF || cpu >= nr_cpu_ids)
 		return -EINVAL;
 