Message ID | 1465235791-7064-2-git-send-email-marc.zyngier@arm.com (mailing list archive) |
---|---|
State | Not Applicable |
On 06/06/2016 10:56 AM, Marc Zyngier wrote:
> The ARM architected timer produces level-triggered interrupts (this
> is mandated by the architecture). Unfortunately, most device-trees
> get this wrong, and expose an edge-triggered interrupt.
>
> Until now, this wasn't too much of an issue, as the programming of the
> trigger would fail (the corresponding PPI cannot be reconfigured),
> and the kernel would be happy with this. But we're about to change
> this, and trust DT a lot if the driver doesn't provide its own
> trigger information. In that context, the timer breaks badly.
>
> While we do need to fix the DTs, there is also some userspace out
> there (kvmtool) that generates the same kind of broken DT on the
> fly, and that will completely break with newer kernels.
>
> As a safety measure, and to keep buggy software alive as well as
> buying us some time to fix DTs all over the place, let's check
> what trigger configuration has been given to us by the firmware.
> If this is not a level configuration, then we know that the
> DT/ACPI configuration is bust, and we pick some defaults which
> won't be worse than the existing setup.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

I tried to test this patch, but there is a problem somewhere that I have
not yet tracked down. On Cavium Thunder (gic-v3 based) I have tested
with the device tree interrupt type of both 4 and 8 and get the same result:

[    0.000000] arm_arch_timer: WARNING: Invalid trigger for IRQ2, assuming level low
[    0.000000] arm_arch_timer: WARNING: Please fix your firmware
[    0.000000] arm_arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
[    0.000000] arm_arch_timer: WARNING: Please fix your firmware
[    0.000000] arm_arch_timer: Architected cp15 timer(s) running at 100.00MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[    0.000002] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns

It could be that the gic-v3 irq mapping code is broken. I will try to
look into it, but there may be other fixes needed before we would
consider this patch to be an improvement.

> ---
>  drivers/clocksource/arm_arch_timer.c | 27 ++++++++++++++++++++++++---
>  1 file changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 3628ac8..1310641 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -8,6 +8,9 @@
>   * it under the terms of the GNU General Public License version 2 as
>   * published by the Free Software Foundation.
>   */
> +
> +#define pr_fmt(fmt) "arm_arch_timer: " fmt
> +
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/device.h>
> @@ -462,14 +465,32 @@ static bool arch_timer_has_nonsecure_ppi(void)
>                  arch_timer_ppi[PHYS_NONSECURE_PPI]);
>  }
>
> +static u32 check_ppi_trigger(int irq)
> +{
> +        u32 flags = irq_get_trigger_type(irq);
> +
> +        if (flags != IRQF_TRIGGER_HIGH && flags != IRQF_TRIGGER_LOW) {
> +                pr_warn("WARNING: Invalid trigger for IRQ%d, assuming level low\n", irq);
> +                pr_warn("WARNING: Please fix your firmware\n");
> +                flags = IRQF_TRIGGER_LOW;
> +        }
> +
> +        return flags;
> +}
> +
>  static int arch_timer_setup(struct clock_event_device *clk)
>  {
> +        u32 flags;
> +
>          __arch_timer_setup(ARCH_CP15_TIMER, clk);
>
> -        enable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi], 0);
> +        flags = check_ppi_trigger(arch_timer_ppi[arch_timer_uses_ppi]);
> +        enable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi], flags);
>
> -        if (arch_timer_has_nonsecure_ppi())
> -                enable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI], 0);
> +        if (arch_timer_has_nonsecure_ppi()) {
> +                flags = check_ppi_trigger(arch_timer_ppi[PHYS_NONSECURE_PPI]);
> +                enable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI], flags);
> +        }
>
>          arch_counter_set_user_access();
>          if (IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM))
>
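The "interrupt type of both 4 and 8" in the message above refers to the trigger flags in the GIC device-tree binding: the low four bits of the third interrupt cell encode the trigger, with the same numbering as the kernel's IRQ_TYPE_* constants (1 = rising edge, 2 = falling edge, 4 = level high, 8 = level low). A minimal userspace sketch of that decoding follows. The names `DT_IRQ_*`, `dt_trigger_name` and `dt_trigger_is_level` are made up for illustration and are not kernel symbols.

```c
#include <stdint.h>

/* Trigger encodings from bits [3:0] of the third GIC interrupt cell.
 * The macro names are hypothetical; only the numeric values come
 * from the binding. */
#define DT_IRQ_EDGE_RISING  0x1u
#define DT_IRQ_EDGE_FALLING 0x2u
#define DT_IRQ_LEVEL_HIGH   0x4u
#define DT_IRQ_LEVEL_LOW    0x8u
#define DT_IRQ_SENSE_MASK   0xfu

/* Return a readable name for the trigger bits of a flags cell. */
static const char *dt_trigger_name(uint32_t flags_cell)
{
    switch (flags_cell & DT_IRQ_SENSE_MASK) {
    case DT_IRQ_EDGE_RISING:  return "edge-rising";
    case DT_IRQ_EDGE_FALLING: return "edge-falling";
    case DT_IRQ_LEVEL_HIGH:   return "level-high";
    case DT_IRQ_LEVEL_LOW:    return "level-low";
    default:                  return "none-or-invalid";
    }
}

/* The architected timer is level-triggered, so only 4 and 8 describe it
 * correctly. Returns 1 for a level trigger, 0 otherwise. */
static int dt_trigger_is_level(uint32_t flags_cell)
{
    uint32_t trigger = flags_cell & DT_IRQ_SENSE_MASK;

    return trigger == DT_IRQ_LEVEL_HIGH || trigger == DT_IRQ_LEVEL_LOW;
}
```

Under this sketch, a flags cell such as 0xf08 (a PPI CPU mask in the upper bits plus trigger 8) decodes to "level-low", so both values tested on Thunder describe a level trigger.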
On Thu, 09 Jun 2016 14:10:48 -0700
David Daney <ddaney.cavm@gmail.com> wrote:

> On 06/06/2016 10:56 AM, Marc Zyngier wrote:
> > The ARM architected timer produces level-triggered interrupts (this
> > is mandated by the architecture). Unfortunately, most device-trees
> > get this wrong, and expose an edge-triggered interrupt.
> >
> > Until now, this wasn't too much of an issue, as the programming of the
> > trigger would fail (the corresponding PPI cannot be reconfigured),
> > and the kernel would be happy with this. But we're about to change
> > this, and trust DT a lot if the driver doesn't provide its own
> > trigger information. In that context, the timer breaks badly.
> >
> > While we do need to fix the DTs, there is also some userspace out
> > there (kvmtool) that generates the same kind of broken DT on the
> > fly, and that will completely break with newer kernels.
> >
> > As a safety measure, and to keep buggy software alive as well as
> > buying us some time to fix DTs all over the place, let's check
> > what trigger configuration has been given to us by the firmware.
> > If this is not a level configuration, then we know that the
> > DT/ACPI configuration is bust, and we pick some defaults which
> > won't be worse than the existing setup.
> >
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>
>
> I tried to test this patch, but there is a problem somewhere that I have
> not yet tracked down. On Cavium Thunder (gic-v3 based) I have tested
> with the device tree interrupt type of both 4 and 8 and get the same result:
>
>
> [    0.000000] arm_arch_timer: WARNING: Invalid trigger for IRQ2, assuming level low
> [    0.000000] arm_arch_timer: WARNING: Please fix your firmware
> [    0.000000] arm_arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
> [    0.000000] arm_arch_timer: WARNING: Please fix your firmware
> [    0.000000] arm_arch_timer: Architected cp15 timer(s) running at 100.00MHz (phys).
> [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
> [    0.000002] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
>
> It could be that the gic-v3 irq mapping code is broken. I will try to
> look into it, but there may be other fixes needed before we would
> consider this patch to be an improvement.

That's because the core kernel has other bugs which are going to be
addressed in 4.8. So far, we cannot set the trigger of a per-cpu
interrupt from the device tree, and we end up with whatever is the
default (edge). You can put whatever you want in the DT, it will be
ignored.

This series is in preparation for these fixes landing in 4.8, where we'll
be able to do the right thing, and will start noticing stupid things
coming from the DT.

Thanks,

M.
On 06/10/2016 12:29 AM, Marc Zyngier wrote:
> On Thu, 09 Jun 2016 14:10:48 -0700
> David Daney <ddaney.cavm@gmail.com> wrote:
>
>> On 06/06/2016 10:56 AM, Marc Zyngier wrote:
>>> The ARM architected timer produces level-triggered interrupts (this
>>> is mandated by the architecture). Unfortunately, most device-trees
>>> get this wrong, and expose an edge-triggered interrupt.
>>>
>>> Until now, this wasn't too much of an issue, as the programming of the
>>> trigger would fail (the corresponding PPI cannot be reconfigured),
>>> and the kernel would be happy with this. But we're about to change
>>> this, and trust DT a lot if the driver doesn't provide its own
>>> trigger information. In that context, the timer breaks badly.
>>>
>>> While we do need to fix the DTs, there is also some userspace out
>>> there (kvmtool) that generates the same kind of broken DT on the
>>> fly, and that will completely break with newer kernels.
>>>
>>> As a safety measure, and to keep buggy software alive as well as
>>> buying us some time to fix DTs all over the place, let's check
>>> what trigger configuration has been given to us by the firmware.
>>> If this is not a level configuration, then we know that the
>>> DT/ACPI configuration is bust, and we pick some defaults which
>>> won't be worse than the existing setup.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>
>>
>> I tried to test this patch, but there is a problem somewhere that I have
>> not yet tracked down. On Cavium Thunder (gic-v3 based) I have tested
>> with the device tree interrupt type of both 4 and 8 and get the same result:
>>
>>
>> [    0.000000] arm_arch_timer: WARNING: Invalid trigger for IRQ2, assuming level low
>> [    0.000000] arm_arch_timer: WARNING: Please fix your firmware
>> [    0.000000] arm_arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
>> [    0.000000] arm_arch_timer: WARNING: Please fix your firmware
>> [    0.000000] arm_arch_timer: Architected cp15 timer(s) running at 100.00MHz (phys).
>> [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
>> [    0.000002] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
>>
>> It could be that the gic-v3 irq mapping code is broken. I will try to
>> look into it, but there may be other fixes needed before we would
>> consider this patch to be an improvement.
>
> That's because the core kernel has other bugs which are going to be
> addressed in 4.8. So far, we cannot set the trigger of a per-cpu
> interrupt from the device tree, and we end up with whatever is the
> default (edge). You can put whatever you want in the DT, it will be
> ignored.

Yes, after looking into it, I see what you mean.

>
> This series is in preparation for these fixes landing in 4.8, where we'll
> be able to do the right thing, and will start noticing stupid things
> coming from the DT.
>

I don't object to the patch, but would suggest a couple of things:

o We need to test it *after* the irq configuration issues are corrected.

o The merging order should be such that we never get the WARNING messages.

Thanks,
David.
On Fri, Jun 10, 2016 at 12:29 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On Thu, 09 Jun 2016 14:10:48 -0700
> David Daney <ddaney.cavm@gmail.com> wrote:
>
>> On 06/06/2016 10:56 AM, Marc Zyngier wrote:
>> > The ARM architected timer produces level-triggered interrupts (this
>> > is mandated by the architecture). Unfortunately, most device-trees
>> > get this wrong, and expose an edge-triggered interrupt.
>> >
>> > Until now, this wasn't too much of an issue, as the programming of the
>> > trigger would fail (the corresponding PPI cannot be reconfigured),
>> > and the kernel would be happy with this. But we're about to change
>> > this, and trust DT a lot if the driver doesn't provide its own
>> > trigger information. In that context, the timer breaks badly.
>> >
>> > While we do need to fix the DTs, there is also some userspace out
>> > there (kvmtool) that generates the same kind of broken DT on the
>> > fly, and that will completely break with newer kernels.
>> >
>> > As a safety measure, and to keep buggy software alive as well as
>> > buying us some time to fix DTs all over the place, let's check
>> > what trigger configuration has been given to us by the firmware.
>> > If this is not a level configuration, then we know that the
>> > DT/ACPI configuration is bust, and we pick some defaults which
>> > won't be worse than the existing setup.
>> >
>> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>
>>
>> I tried to test this patch, but there is a problem somewhere that I have
>> not yet tracked down. On Cavium Thunder (gic-v3 based) I have tested
>> with the device tree interrupt type of both 4 and 8 and get the same result:
>>
>>
>> [    0.000000] arm_arch_timer: WARNING: Invalid trigger for IRQ2, assuming level low
>> [    0.000000] arm_arch_timer: WARNING: Please fix your firmware
>> [    0.000000] arm_arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
>> [    0.000000] arm_arch_timer: WARNING: Please fix your firmware
>> [    0.000000] arm_arch_timer: Architected cp15 timer(s) running at 100.00MHz (phys).
>> [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
>> [    0.000002] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
>>
>> It could be that the gic-v3 irq mapping code is broken. I will try to
>> look into it, but there may be other fixes needed before we would
>> consider this patch to be an improvement.
>
> That's because the core kernel has other bugs which are going to be
> addressed in 4.8. So far, we cannot set the trigger of a per-cpu
> interrupt from the device tree, and we end up with whatever is the
> default (edge). You can put whatever you want in the DT, it will be
> ignored.
>
> This series is in preparation for these fixes landing in 4.8, where we'll
> be able to do the right thing, and will start noticing stupid things
> coming from the DT.

Hi Marc,

I also see the same warning that David saw. Can you please cc me when
the bug fix series is available? I will test it out for X-Gene 1 and
will need to change the interrupt setting for timer events on X-Gene 2
as well.

Regards,
Duc Dang.

>
> Thanks,
>
> M.
> --
> Jazz is not dead. It just smells funny.
On Fri, 10 Jun 2016 10:39:22 -0700
David Daney <ddaney.cavm@gmail.com> wrote:

> On 06/10/2016 12:29 AM, Marc Zyngier wrote:
> > On Thu, 09 Jun 2016 14:10:48 -0700
> > David Daney <ddaney.cavm@gmail.com> wrote:
> >
> >> On 06/06/2016 10:56 AM, Marc Zyngier wrote:
> >>> The ARM architected timer produces level-triggered interrupts (this
> >>> is mandated by the architecture). Unfortunately, most device-trees
> >>> get this wrong, and expose an edge-triggered interrupt.
> >>>
> >>> Until now, this wasn't too much of an issue, as the programming of the
> >>> trigger would fail (the corresponding PPI cannot be reconfigured),
> >>> and the kernel would be happy with this. But we're about to change
> >>> this, and trust DT a lot if the driver doesn't provide its own
> >>> trigger information. In that context, the timer breaks badly.
> >>>
> >>> While we do need to fix the DTs, there is also some userspace out
> >>> there (kvmtool) that generates the same kind of broken DT on the
> >>> fly, and that will completely break with newer kernels.
> >>>
> >>> As a safety measure, and to keep buggy software alive as well as
> >>> buying us some time to fix DTs all over the place, let's check
> >>> what trigger configuration has been given to us by the firmware.
> >>> If this is not a level configuration, then we know that the
> >>> DT/ACPI configuration is bust, and we pick some defaults which
> >>> won't be worse than the existing setup.
> >>>
> >>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >>
> >>
> >> I tried to test this patch, but there is a problem somewhere that I have
> >> not yet tracked down. On Cavium Thunder (gic-v3 based) I have tested
> >> with the device tree interrupt type of both 4 and 8 and get the same result:
> >>
> >>
> >> [    0.000000] arm_arch_timer: WARNING: Invalid trigger for IRQ2, assuming level low
> >> [    0.000000] arm_arch_timer: WARNING: Please fix your firmware
> >> [    0.000000] arm_arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
> >> [    0.000000] arm_arch_timer: WARNING: Please fix your firmware
> >> [    0.000000] arm_arch_timer: Architected cp15 timer(s) running at 100.00MHz (phys).
> >> [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
> >> [    0.000002] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
> >>
> >> It could be that the gic-v3 irq mapping code is broken. I will try to
> >> look into it, but there may be other fixes needed before we would
> >> consider this patch to be an improvement.
> >
> > That's because the core kernel has other bugs which are going to be
> > addressed in 4.8. So far, we cannot set the trigger of a per-cpu
> > interrupt from the device tree, and we end up with whatever is the
> > default (edge). You can put whatever you want in the DT, it will be
> > ignored.
>
> Yes, after looking into it, I see what you mean.
>
> >
> > This series is in preparation for these fixes landing in 4.8, where we'll
> > be able to do the right thing, and will start noticing stupid things
> > coming from the DT.
> >
>
> I don't object to the patch, but would suggest a couple of things:
>
> o We need to test it *after* the irq configuration issues are corrected.
>
> o The merging order should be such that we never get the WARNING messages.

What do you prefer? A benign warning message, or a broken system? You
sound more afraid of the former than the latter (and yes, things *are*
broken at the moment).

Thanks,

M.
On Sat, 11 Jun 2016 13:02:44 +0100
Ben Dooks <bjdooks@googlemail.com> wrote:

> out of interest, do you have a list of what the problems are?

The trigger configuration for per-cpu interrupts silently fails
(because set_irq_type cannot deal with them), which means we're relying
on whatever configuration the firmware has left in there. Also, the
kernel defaults to considering the interrupt as edge.

What saves most platforms so far is that they are using a GIC:

1) Most GIC implementations have their PPI configuration as RO, which
   means that we can't get it wrong.
2) If using a fasteoi handler, there is no significant difference in the
   flow between edge and level (we're relying on the HW dealing with it,
   so (1) is critical).

If your GIC allows PPI configuration to be written and firmware gets it
wrong, you'll miss interrupts. If you don't have a GIC, all bets are
off.

I've queued a number of patches to solve this, which I hope to send to
tglx tomorrow (after looking at this weekend's test run).

Thanks,

M.
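To illustrate why a level-triggered source configured as edge can miss interrupts, here is a deliberately simplified model. It is not how a GIC is implemented, and the function names are hypothetical. Each array entry is the state of the timer's interrupt line at the moment the CPU has finished handling the previous interrupt and the line is looked at again. An edge-sensitive input only latches a new interrupt on a 0 to 1 transition, while a level-sensitive input is pending for as long as the line is asserted.

```c
#include <stddef.h>

/* Edge-sensitive view: a delivery happens only when the line goes
 * from deasserted (0) to asserted (1). */
static unsigned count_edge_deliveries(const int *line, size_t n)
{
    unsigned count = 0;
    int prev = 0;               /* assume the line starts deasserted */

    for (size_t i = 0; i < n; i++) {
        if (line[i] && !prev)
            count++;
        prev = line[i];
    }
    return count;
}

/* Level-sensitive view: a delivery happens every time the line is
 * found asserted, regardless of its previous state. */
static unsigned count_level_deliveries(const int *line, size_t n)
{
    unsigned count = 0;

    for (size_t i = 0; i < n; i++) {
        if (line[i])
            count++;
    }
    return count;
}
```

With the sequence {1, 1, 0, 1}, where the timer has already expired again by the second observation, the level view delivers three interrupts and the edge view only two: the expiry that occurred while the line was still asserted produced no new edge and is lost.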
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 3628ac8..1310641 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -8,6 +8,9 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
+
+#define pr_fmt(fmt) "arm_arch_timer: " fmt
+
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/device.h>
@@ -462,14 +465,32 @@ static bool arch_timer_has_nonsecure_ppi(void)
                 arch_timer_ppi[PHYS_NONSECURE_PPI]);
 }
 
+static u32 check_ppi_trigger(int irq)
+{
+        u32 flags = irq_get_trigger_type(irq);
+
+        if (flags != IRQF_TRIGGER_HIGH && flags != IRQF_TRIGGER_LOW) {
+                pr_warn("WARNING: Invalid trigger for IRQ%d, assuming level low\n", irq);
+                pr_warn("WARNING: Please fix your firmware\n");
+                flags = IRQF_TRIGGER_LOW;
+        }
+
+        return flags;
+}
+
 static int arch_timer_setup(struct clock_event_device *clk)
 {
+        u32 flags;
+
         __arch_timer_setup(ARCH_CP15_TIMER, clk);
 
-        enable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi], 0);
+        flags = check_ppi_trigger(arch_timer_ppi[arch_timer_uses_ppi]);
+        enable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi], flags);
 
-        if (arch_timer_has_nonsecure_ppi())
-                enable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI], 0);
+        if (arch_timer_has_nonsecure_ppi()) {
+                flags = check_ppi_trigger(arch_timer_ppi[PHYS_NONSECURE_PPI]);
+                enable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI], flags);
+        }
 
         arch_counter_set_user_access();
         if (IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM))
The ARM architected timer produces level-triggered interrupts (this
is mandated by the architecture). Unfortunately, most device-trees
get this wrong, and expose an edge-triggered interrupt.

Until now, this wasn't too much of an issue, as the programming of the
trigger would fail (the corresponding PPI cannot be reconfigured),
and the kernel would be happy with this. But we're about to change
this, and trust DT a lot if the driver doesn't provide its own
trigger information. In that context, the timer breaks badly.

While we do need to fix the DTs, there is also some userspace out
there (kvmtool) that generates the same kind of broken DT on the
fly, and that will completely break with newer kernels.

As a safety measure, and to keep buggy software alive as well as
buying us some time to fix DTs all over the place, let's check
what trigger configuration has been given to us by the firmware.
If this is not a level configuration, then we know that the
DT/ACPI configuration is bust, and we pick some defaults which
won't be worse than the existing setup.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 drivers/clocksource/arm_arch_timer.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)
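The fallback described in the commit message can be tried outside the kernel with a small userspace sketch. This is not the driver code: `pick_ppi_trigger` and the `TRIG_*` macros are hypothetical names, the reported trigger is passed in as a parameter instead of being read with `irq_get_trigger_type()`, and only the numeric values match the kernel's IRQF_TRIGGER_* flags.

```c
#include <stdint.h>
#include <stdio.h>

/* Same numeric values as the kernel's IRQF_TRIGGER_* flags; the names
 * are local to this sketch. */
#define TRIG_NONE    0x0u
#define TRIG_RISING  0x1u
#define TRIG_FALLING 0x2u
#define TRIG_HIGH    0x4u
#define TRIG_LOW     0x8u

/* Keep a level trigger as reported; replace anything else (edge, none,
 * or a combination of bits) with level low and warn, mirroring the
 * policy of check_ppi_trigger() in the patch. */
static uint32_t pick_ppi_trigger(int irq, uint32_t reported)
{
    if (reported != TRIG_HIGH && reported != TRIG_LOW) {
        fprintf(stderr,
                "Invalid trigger for IRQ%d, assuming level low\n", irq);
        return TRIG_LOW;
    }
    return reported;
}
```

Calling `pick_ppi_trigger(2, TRIG_RISING)` returns `TRIG_LOW` after printing the warning, while `pick_ppi_trigger(2, TRIG_HIGH)` returns its input unchanged. This matches the reports in the thread: as long as the core code leaves per-cpu interrupts at the edge default, the warning fires whatever the DT says.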