Message ID | 1363369032-12639-1-git-send-email-nicolas.ferre@atmel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, 15 Mar 2013 18:37:12 +0100 Nicolas Ferre <nicolas.ferre@atmel.com> wrote: > On some revisions of AT91 SoCs, the RTC IMR register is not working. > Instead of elaborating a workaround for that specific SoC or IP version, > we simply use a software variable to store the Interrupt Mask Register and > modify it for each enabling/disabling of an interrupt. The overhead of this > is negligible anyway. This description doesn't really allow me or others to work out whether the fix should be included in 3.9 or backported into earlier kernels. So please, when fixing a bug do include a full description of the user-visible effects of that bug. And your opinion regarding the -mainline and -stable decision is always useful.
On 13-03-20 05:50 PM, Andrew Morton wrote: > On Fri, 15 Mar 2013 18:37:12 +0100 Nicolas Ferre <nicolas.ferre@atmel.com> wrote: > >> On some revisions of AT91 SoCs, the RTC IMR register is not working. >> Instead of elaborating a workaround for that specific SoC or IP version, >> we simply use a software variable to store the Interrupt Mask Register and >> modify it for each enabling/disabling of an interrupt. The overhead of this >> is negligible anyway. > > This description doesn't really allow me or others to work out whether > the fix should be included in 3.9 or backported into earlier kernels. > > So please, when fixing a bug do include a full description of the > user-visible effects of that bug. And your opinion regarding the > -mainline and -stable decision is always useful. The interrupt mask register (IMR) for the RTC is broken on the AT91SAM9x5 sub-family of SoCs (good overview of the members here: http://www.eewiki.net/display/linuxonarm/AT91SAM9x5 ). The "user visible effect" is the RTC doesn't work. That sub-family is less than two years old and only has devicetree (DT) support and came online circa lk 3.7 . The dust is yet to settle on the DT stuff at least for AT91 SoCs (translation: lots of stuff is still broken, so much that it is hard to know where to start). The fix in the patch is pretty simple: just shadow the silicon IMR register with a variable in the driver. Some older SoCs (pre-DT) use the the rtc-at91rm9200 driver (e.g. obviously the AT91RM9200) and they should not be impacted by the change. There shouldn't be a large volume of interrupts associated with a RTC. Compared to a relatively stable kernel subsystem like SCSI, what is happening in the ARM architecture with DT is huge and ongoing. So I think you either need new rules or suspend some of the stricter rules applied to more stable subsystems. Just my two cents worth. Doug Gilbert who hasn't seen that frill-necked lizard for a while
On 03/20/2013 10:50 PM, Andrew Morton : > On Fri, 15 Mar 2013 18:37:12 +0100 Nicolas Ferre <nicolas.ferre@atmel.com> wrote: > >> On some revisions of AT91 SoCs, the RTC IMR register is not working. >> Instead of elaborating a workaround for that specific SoC or IP version, >> we simply use a software variable to store the Interrupt Mask Register and >> modify it for each enabling/disabling of an interrupt. The overhead of this >> is negligible anyway. > > This description doesn't really allow me or others to work out whether > the fix should be included in 3.9 or backported into earlier kernels. > > So please, when fixing a bug do include a full description of the > user-visible effects of that bug. And your opinion regarding the > -mainline and -stable decision is always useful. Yes, sure. Concerning this patch, we can imagine to push it upstream as a bugfix for 3.9-rc. It is not a regression because... well... it has never worked properly on the affected SoC family. For -stable, we can add this tag: Cc: stable <stable@vger.kernel.org> [v3.8+] As this patch applies up to this revision. For other kernel releases, I will need to rework the patch a little bit due to files name modifications. Thanks, best regards,
On Wed, 20 Mar 2013 21:15:23 -0400 Douglas Gilbert <dgilbert@interlog.com> wrote: > On 13-03-20 05:50 PM, Andrew Morton wrote: > > On Fri, 15 Mar 2013 18:37:12 +0100 Nicolas Ferre <nicolas.ferre@atmel.com> wrote: > > > >> On some revisions of AT91 SoCs, the RTC IMR register is not working. > >> Instead of elaborating a workaround for that specific SoC or IP version, > >> we simply use a software variable to store the Interrupt Mask Register and > >> modify it for each enabling/disabling of an interrupt. The overhead of this > >> is negligible anyway. > > > > This description doesn't really allow me or others to work out whether > > the fix should be included in 3.9 or backported into earlier kernels. > > > > So please, when fixing a bug do include a full description of the > > user-visible effects of that bug. And your opinion regarding the > > -mainline and -stable decision is always useful. > > The interrupt mask register (IMR) for the RTC is broken > on the AT91SAM9x5 sub-family of SoCs (good overview of the > members here: http://www.eewiki.net/display/linuxonarm/AT91SAM9x5 ). > The "user visible effect" is the RTC doesn't work. > > That sub-family is less than two years old and only has devicetree > (DT) support and came online circa lk 3.7 . The dust is yet to > settle on the DT stuff at least for AT91 SoCs (translation: > lots of stuff is still broken, so much that it is hard to know > where to start). > > The fix in the patch is pretty simple: just shadow the silicon > IMR register with a variable in the driver. Some older SoCs (pre-DT) > use the the rtc-at91rm9200 driver (e.g. obviously the AT91RM9200) > and they should not be impacted by the change. There shouldn't > be a large volume of interrupts associated with a RTC. Thanks. > Compared to a relatively stable kernel subsystem like SCSI, what > is happening in the ARM architecture with DT is huge and ongoing. > So I think you either need new rules or suspend some of the stricter > rules applied to more stable subsystems. Just my two cents worth. I don't know what this means. Shrug. I tagged the patch for -stable backporting on the basis that "the RTC doesn't work" is undesirable ;)
On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote: > On some revisions of AT91 SoCs, the RTC IMR register is not working. > Instead of elaborating a workaround for that specific SoC or IP version, > we simply use a software variable to store the Interrupt Mask Register and > modify it for each enabling/disabling of an interrupt. The overhead of this > is negligible anyway. The patch does not add any memory barriers or register read-backs when manipulating the interrupt-mask variable. This could possibly lead to spurious interrupts both when enabling and disabling the various RTC-interrupts due to write reordering and bus latencies. Has this been considered? And is this reason enough for a more targeted work-around so that the SOCs with functional RTC_IMR are not affected? > Reported-by: Douglas Gilbert <dgilbert@interlog.com> > Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> > --- > drivers/rtc/rtc-at91rm9200.c | 50 +++++++++++++++++++++++++++----------------- > drivers/rtc/rtc-at91rm9200.h | 1 - > 2 files changed, 31 insertions(+), 20 deletions(-) > > diff --git a/drivers/rtc/rtc-at91rm9200.c b/drivers/rtc/rtc-at91rm9200.c > index 79233d0..29b92e4 100644 > --- a/drivers/rtc/rtc-at91rm9200.c > +++ b/drivers/rtc/rtc-at91rm9200.c > @@ -46,6 +46,7 @@ static DECLARE_COMPLETION(at91_rtc_updated); > static unsigned int at91_alarm_year = AT91_RTC_EPOCH; > static void __iomem *at91_rtc_regs; > static int irq; > +static u32 at91_rtc_imr; > > /* > > * Decode time/date into rtc_time structure [...] > @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled) > > if (enabled) { > at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM); > + at91_rtc_imr |= AT91_RTC_ALARM; wmb() needed before enabling interrupt as at91_rtc_write() uses __raw_writel() which does not add any barriers? > at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM); > - } else > + } else { > at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM); wmb() and register read-back needed before updating interrupt mask? > + at91_rtc_imr &= ~AT91_RTC_ALARM; > + } > > return 0; > } [...] > @@ -229,7 +235,7 @@ static irqreturn_t at91_rtc_interrupt(int irq, void *dev_id) > unsigned int rtsr; > unsigned long events = 0; > > - rtsr = at91_rtc_read(AT91_RTC_SR) & at91_rtc_read(AT91_RTC_IMR); > + rtsr = at91_rtc_read(AT91_RTC_SR) & at91_rtc_imr; Does at91_rtc_imr necessarily match the hardware state here? > if (rtsr) { /* this interrupt is shared! Is it ours? */ > if (rtsr & AT91_RTC_ALARM) > events |= (RTC_AF | RTC_IRQF); Johan
On 13-03-26 03:27 PM, Johan Hovold wrote: > On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote: >> On some revisions of AT91 SoCs, the RTC IMR register is not working. >> Instead of elaborating a workaround for that specific SoC or IP version, >> we simply use a software variable to store the Interrupt Mask Register and >> modify it for each enabling/disabling of an interrupt. The overhead of this >> is negligible anyway. > > The patch does not add any memory barriers or register read-backs when > manipulating the interrupt-mask variable. This could possibly lead to > spurious interrupts both when enabling and disabling the various > RTC-interrupts due to write reordering and bus latencies. > > Has this been considered? And is this reason enough for a more targeted > work-around so that the SOCs with functional RTC_IMR are not affected? The SoCs in question use a single embedded ARM926EJ-S and according to the Atmel documentation, that CPU's instruction set contains no barrier (or related) instructions. In the arch/arm/mach-at91 sub-tree of the kernel source I can find no use of the wmb() call. Also checked all drivers in the kernel containing "at91" and none called wmb(). Doug Gilbert
On Tue, Mar 26, 2013 at 05:09:59PM -0400, Douglas Gilbert wrote: > On 13-03-26 03:27 PM, Johan Hovold wrote: > > On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote: > >> On some revisions of AT91 SoCs, the RTC IMR register is not working. > >> Instead of elaborating a workaround for that specific SoC or IP version, > >> we simply use a software variable to store the Interrupt Mask Register and > >> modify it for each enabling/disabling of an interrupt. The overhead of this > >> is negligible anyway. > > > > The patch does not add any memory barriers or register read-backs when > > manipulating the interrupt-mask variable. This could possibly lead to > > spurious interrupts both when enabling and disabling the various > > RTC-interrupts due to write reordering and bus latencies. > > > > Has this been considered? And is this reason enough for a more targeted > > work-around so that the SOCs with functional RTC_IMR are not affected? > > The SoCs in question use a single embedded ARM926EJ-S and > according to the Atmel documentation, that CPU's instruction > set contains no barrier (or related) instructions. The ARM926EJ-S actually does have a Drain Write Buffer instruction but it's not used by the ARM barrier-implementation unless CONFIG_ARM_DMA_MEM_BUFFERABLE or CONFIG_SMP is set. However, wmb() always implies a compiler barrier which is what is needed in this case. > In the arch/arm/mach-at91 sub-tree of the kernel source > I can find no use of the wmb() call. Also checked all drivers > in the kernel containing "at91" and none called wmb(). I/O-operations are normally not reordered, but this patch is faking a hardware register and thus extra care needs to be taken. To repeat: > @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled) > > if (enabled) { > at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM); > + at91_rtc_imr |= AT91_RTC_ALARM; Here a barrier is needed to prevent the compiler from reordering the two writes (i.e., mask update and interrupt enable). > at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM); > - } else > + } else { > at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM); Here a barrier is again needed to prevent the compiler from reordering, but we also need a register read back (of some RTC-register) before updating the mask. Without the register read back, there could be a window where the mask does not match the hardware state due to bus latencies. Note that even with a register read back there is a (theoretical) possibility that the interrupts have not yet been disabled when the fake mask is updated. The only way to know for sure is to poll RTC_IMR but that is the very register you're trying to emulate. > + at91_rtc_imr &= ~AT91_RTC_ALARM; > + } > > return 0; > } In the worst-case scenario ignoring the shared RTC-interrupt could lead to the disabling of the system interrupt and thus also PIT, DBGU, ... I think this patch should be reverted and a fix for the broken SoCs be implemented which does not penalise the other SoCs. That is, only fall-back to faking IMR on the SoCs where it is actually broken. Nicolas, should I send a revert patch and follow up with a fix for the broken SoCs which includes the required barriers and read-backs? Note that the patch is already being picked up for some stable trees. The fix I'm proposing would require adding minimal DT-support to the driver and is not really stable material. Therefore, a revert followed by a patch for 3.10 seems like the way to go. Johan
On 03/28/2013 10:57 AM, Johan Hovold : > On Tue, Mar 26, 2013 at 05:09:59PM -0400, Douglas Gilbert wrote: >> On 13-03-26 03:27 PM, Johan Hovold wrote: >>> On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote: >>>> On some revisions of AT91 SoCs, the RTC IMR register is not working. >>>> Instead of elaborating a workaround for that specific SoC or IP version, >>>> we simply use a software variable to store the Interrupt Mask Register and >>>> modify it for each enabling/disabling of an interrupt. The overhead of this >>>> is negligible anyway. >>> >>> The patch does not add any memory barriers or register read-backs when >>> manipulating the interrupt-mask variable. This could possibly lead to >>> spurious interrupts both when enabling and disabling the various >>> RTC-interrupts due to write reordering and bus latencies. You are right pointing out that it has to be considered. I am in the process of analyzing the possible issues generated by this patch. I am not convinced for now that we should revert it... Give me a little more time to study this. >>> Has this been considered? And is this reason enough for a more targeted >>> work-around so that the SOCs with functional RTC_IMR are not affected? >> >> The SoCs in question use a single embedded ARM926EJ-S and >> according to the Atmel documentation, that CPU's instruction >> set contains no barrier (or related) instructions. > > The ARM926EJ-S actually does have a Drain Write Buffer instruction but > it's not used by the ARM barrier-implementation unless > CONFIG_ARM_DMA_MEM_BUFFERABLE or CONFIG_SMP is set. > > However, wmb() always implies a compiler barrier which is what is needed > in this case. > >> In the arch/arm/mach-at91 sub-tree of the kernel source >> I can find no use of the wmb() call. Also checked all drivers >> in the kernel containing "at91" and none called wmb(). > > I/O-operations are normally not reordered, but this patch is faking a > hardware register and thus extra care needs to be taken. > > To repeat: > >> @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled) >> >> if (enabled) { >> at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM); >> + at91_rtc_imr |= AT91_RTC_ALARM; > > Here a barrier is needed to prevent the compiler from reordering the two > writes (i.e., mask update and interrupt enable). > >> at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM); >> - } else >> + } else { >> at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM); > > Here a barrier is again needed to prevent the compiler from reordering, > but we also need a register read back (of some RTC-register) before > updating the mask. Without the register read back, there could be a > window where the mask does not match the hardware state due to bus > latencies. > > Note that even with a register read back there is a (theoretical) > possibility that the interrupts have not yet been disabled when the fake > mask is updated. The only way to know for sure is to poll RTC_IMR but > that is the very register you're trying to emulate. In fact, if we protect the two code lines with the proper spinlock, we may find that all this reordering issue will not lead to a race condition. So I guess it is a simpler solution to the problem that you highlight. >> + at91_rtc_imr &= ~AT91_RTC_ALARM; >> + } >> >> return 0; >> } > > In the worst-case scenario ignoring the shared RTC-interrupt could lead > to the disabling of the system interrupt and thus also PIT, DBGU, ... > > I think this patch should be reverted and a fix for the broken SoCs be > implemented which does not penalise the other SoCs. That is, only > fall-back to faking IMR on the SoCs where it is actually broken. > > Nicolas, should I send a revert patch and follow up with a fix for the > broken SoCs which includes the required barriers and read-backs? I prefer to not distinguish between broken SoC and others. But I may be too optimistic... > Note that the patch is already being picked up for some stable trees. > The fix I'm proposing would require adding minimal DT-support to the > driver and is not really stable material. Therefore, a revert followed > by a patch for 3.10 seems like the way to go. I hope that we could avoid this scenario. Best regards,
On 13-03-28 05:57 AM, Johan Hovold wrote: > On Tue, Mar 26, 2013 at 05:09:59PM -0400, Douglas Gilbert wrote: >> On 13-03-26 03:27 PM, Johan Hovold wrote: >>> On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote: >>>> On some revisions of AT91 SoCs, the RTC IMR register is not working. >>>> Instead of elaborating a workaround for that specific SoC or IP version, >>>> we simply use a software variable to store the Interrupt Mask Register and >>>> modify it for each enabling/disabling of an interrupt. The overhead of this >>>> is negligible anyway. >>> >>> The patch does not add any memory barriers or register read-backs when >>> manipulating the interrupt-mask variable. This could possibly lead to >>> spurious interrupts both when enabling and disabling the various >>> RTC-interrupts due to write reordering and bus latencies. >>> >>> Has this been considered? And is this reason enough for a more targeted >>> work-around so that the SOCs with functional RTC_IMR are not affected? >> >> The SoCs in question use a single embedded ARM926EJ-S and >> according to the Atmel documentation, that CPU's instruction >> set contains no barrier (or related) instructions. > > The ARM926EJ-S actually does have a Drain Write Buffer instruction but > it's not used by the ARM barrier-implementation unless > CONFIG_ARM_DMA_MEM_BUFFERABLE or CONFIG_SMP is set. The ARM926EJ-S is ARMv5 so CONFIG_ARM_DMA_MEM_BUFFERABLE is not available. SMP is not an option for arm/mach-at91. > However, wmb() always implies a compiler barrier which is what is needed > in this case. Even if wmb() did anything, would it make this case "safe"? >> In the arch/arm/mach-at91 sub-tree of the kernel source >> I can find no use of the wmb() call. Also checked all drivers >> in the kernel containing "at91" and none called wmb(). > > I/O-operations are normally not reordered, but this patch is faking a > hardware register and thus extra care needs to be taken. > > To repeat: > >> @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled) >> >> if (enabled) { >> at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM); >> + at91_rtc_imr |= AT91_RTC_ALARM; > > Here a barrier is needed to prevent the compiler from reordering the two > writes (i.e., mask update and interrupt enable). Isn't either order potentially unsafe? So even if the compiler did foolishly re-order them, the sequence is still unsafe when a SYS interrupt splits those two lines (since the SYS interrupt is shared, it can occur at any time). >> at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM); >> - } else >> + } else { >> at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM); > > Here a barrier is again needed to prevent the compiler from reordering, > but we also need a register read back (of some RTC-register) before > updating the mask. Without the register read back, there could be a > window where the mask does not match the hardware state due to bus > latencies. > > Note that even with a register read back there is a (theoretical) > possibility that the interrupts have not yet been disabled when the fake > mask is updated. The only way to know for sure is to poll RTC_IMR but > that is the very register you're trying to emulate. > >> + at91_rtc_imr &= ~AT91_RTC_ALARM; >> + } >> >> return 0; >> } > > In the worst-case scenario ignoring the shared RTC-interrupt could lead > to the disabling of the system interrupt and thus also PIT, DBGU, ... And how often does the AT91_RTC_ALARM alarm interrupt fire? > I think this patch should be reverted and a fix for the broken SoCs be > implemented which does not penalise the other SoCs. That is, only > fall-back to faking IMR on the SoCs where it is actually broken. Even though I sent a patch to fix this problem to Nicolas, what was presented is not my version. In mine I added DT support: #ifdef CONFIG_OF static const struct of_device_id at91rm9200_rtc_dt_ids[] = { { .compatible = "atmel,at91rm9200-rtc", .data = &at91rm9200_config }, { .compatible = "atmel,at91sam9x5-rtc", .data = &at91sam9x5_config }, { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, at91rm9200_rtc_dt_ids); #else #define at91rm9200_rtc_dt_ids NULL #endif /* CONFIG_OF */ The shadow IMR variable was only active in the .compatible = "atmel,at91sam9x5-rtc" case. That protected all existing users from any problems that might be introduced. Doug Gilbert
On 03/28/2013 07:20 PM, Douglas Gilbert : > On 13-03-28 05:57 AM, Johan Hovold wrote: >> On Tue, Mar 26, 2013 at 05:09:59PM -0400, Douglas Gilbert wrote: >>> On 13-03-26 03:27 PM, Johan Hovold wrote: >>>> On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote: >>>>> On some revisions of AT91 SoCs, the RTC IMR register is not working. >>>>> Instead of elaborating a workaround for that specific SoC or IP >>>>> version, >>>>> we simply use a software variable to store the Interrupt Mask >>>>> Register and >>>>> modify it for each enabling/disabling of an interrupt. The overhead >>>>> of this >>>>> is negligible anyway. >>>> >>>> The patch does not add any memory barriers or register read-backs when >>>> manipulating the interrupt-mask variable. This could possibly lead to >>>> spurious interrupts both when enabling and disabling the various >>>> RTC-interrupts due to write reordering and bus latencies. >>>> >>>> Has this been considered? And is this reason enough for a more targeted >>>> work-around so that the SOCs with functional RTC_IMR are not affected? >>> >>> The SoCs in question use a single embedded ARM926EJ-S and >>> according to the Atmel documentation, that CPU's instruction >>> set contains no barrier (or related) instructions. >> >> The ARM926EJ-S actually does have a Drain Write Buffer instruction but >> it's not used by the ARM barrier-implementation unless >> CONFIG_ARM_DMA_MEM_BUFFERABLE or CONFIG_SMP is set. > > The ARM926EJ-S is ARMv5 so CONFIG_ARM_DMA_MEM_BUFFERABLE is not > available. SMP is not an option for arm/mach-at91. > >> However, wmb() always implies a compiler barrier which is what is needed >> in this case. > > Even if wmb() did anything, would it make this case "safe"? > >>> In the arch/arm/mach-at91 sub-tree of the kernel source >>> I can find no use of the wmb() call. Also checked all drivers >>> in the kernel containing "at91" and none called wmb(). >> >> I/O-operations are normally not reordered, but this patch is faking a >> hardware register and thus extra care needs to be taken. >> >> To repeat: >> >>> @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct >>> device *dev, unsigned int enabled) >>> >>> if (enabled) { >>> at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM); >>> + at91_rtc_imr |= AT91_RTC_ALARM; >> >> Here a barrier is needed to prevent the compiler from reordering the two >> writes (i.e., mask update and interrupt enable). > > Isn't either order potentially unsafe? So even if the compiler > did foolishly re-order them, the sequence is still unsafe when > a SYS interrupt splits those two lines (since the SYS interrupt > is shared, it can occur at any time). Absolutely: I think it has to be protected by the proper spin_lock_(irqsave)() functions, each time we: - modify an interrupt + updated the shadow register - read the shadow register Note, on our current UP, it is the "irqsave" part that makes the difference tough... >>> at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM); >>> - } else >>> + } else { >>> at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM); >> >> Here a barrier is again needed to prevent the compiler from reordering, >> but we also need a register read back (of some RTC-register) before >> updating the mask. Without the register read back, there could be a >> window where the mask does not match the hardware state due to bus >> latencies. >> >> Note that even with a register read back there is a (theoretical) >> possibility that the interrupts have not yet been disabled when the fake >> mask is updated. The only way to know for sure is to poll RTC_IMR but >> that is the very register you're trying to emulate. >> >>> + at91_rtc_imr &= ~AT91_RTC_ALARM; >>> + } >>> >>> return 0; >>> } >> >> In the worst-case scenario ignoring the shared RTC-interrupt could lead >> to the disabling of the system interrupt and thus also PIT, DBGU, ... > > And how often does the AT91_RTC_ALARM alarm interrupt fire? > >> I think this patch should be reverted and a fix for the broken SoCs be >> implemented which does not penalise the other SoCs. That is, only >> fall-back to faking IMR on the SoCs where it is actually broken. > > Even though I sent a patch to fix this problem to Nicolas, > what was presented is not my version. In mine I added DT > support: > > #ifdef CONFIG_OF > static const struct of_device_id at91rm9200_rtc_dt_ids[] = { > { .compatible = "atmel,at91rm9200-rtc", .data = > &at91rm9200_config }, > { .compatible = "atmel,at91sam9x5-rtc", .data = > &at91sam9x5_config }, > { /* sentinel */ } > }; > MODULE_DEVICE_TABLE(of, at91rm9200_rtc_dt_ids); > #else > #define at91rm9200_rtc_dt_ids NULL > #endif /* CONFIG_OF */ > > > The shadow IMR variable was only active in the > .compatible = "atmel,at91sam9x5-rtc" > case. That protected all existing users from any problems > that might be introduced. Indeed. But I am trying to build a patch that take the "broken/not broken" information out of the IP revision number. I try to write it and post it for your reviewing quickly. Best regards,
On Thu, Mar 28, 2013 at 05:16:00PM +0100, Nicolas Ferre wrote: > On 03/28/2013 10:57 AM, Johan Hovold : > > On Tue, Mar 26, 2013 at 05:09:59PM -0400, Douglas Gilbert wrote: > >> On 13-03-26 03:27 PM, Johan Hovold wrote: > >>> On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote: [...] > >> @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled) > >> > >> if (enabled) { > >> at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM); > >> + at91_rtc_imr |= AT91_RTC_ALARM; > > > > Here a barrier is needed to prevent the compiler from reordering the two > > writes (i.e., mask update and interrupt enable). > > > >> at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM); > >> - } else > >> + } else { > >> at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM); > > > > Here a barrier is again needed to prevent the compiler from reordering, > > but we also need a register read back (of some RTC-register) before > > updating the mask. Without the register read back, there could be a > > window where the mask does not match the hardware state due to bus > > latencies. > > > > Note that even with a register read back there is a (theoretical) > > possibility that the interrupts have not yet been disabled when the fake > > mask is updated. The only way to know for sure is to poll RTC_IMR but > > that is the very register you're trying to emulate. > > In fact, if we protect the two code lines with the proper spinlock, we > may find that all this reordering issue will not lead to a race > condition. So I guess it is a simpler solution to the problem that you > highlight. A spinlock would also be a solution to the reordering problem, albeit a slightly heavier one than the wmb(). A comment from from Doug made me realise that one more barrier would actually be needed, so I think using a spinlock is indeed preferred. It would however not be sufficient to address the second problem, which is that the interrupt disable is not immediate. An RTC-register read back is needed to make sure the IDR-write has reached the peripheral, but as I mentioned above it is merely a reasonable heuristic as the only way to be certain that the interrupts have been disabled in hardware would be to poll the RTC_IMR-register, which is register we are trying to emulate. In practice, the read-back heuristic would most likely be perfectly sufficient, but I'd prefer this to only be used for SoCs where the RTC_IMR-register is actually broken. > >> + at91_rtc_imr &= ~AT91_RTC_ALARM; > >> + } > >> > >> return 0; > >> } > > > > In the worst-case scenario ignoring the shared RTC-interrupt could lead > > to the disabling of the system interrupt and thus also PIT, DBGU, ... > > > > I think this patch should be reverted and a fix for the broken SoCs be > > implemented which does not penalise the other SoCs. That is, only > > fall-back to faking IMR on the SoCs where it is actually broken. > > > > Nicolas, should I send a revert patch and follow up with a fix for the > > broken SoCs which includes the required barriers and read-backs? > > I prefer to not distinguish between broken SoC and others. But I may be > too optimistic... If you agree with me that the second problem (interrupt-disable latency) cannot be solved without resorting to heuristics (e.g., adding read-backs or delays) then I think that should be the deciding point. I'm responding to this message with an RFC of how the work-around could be implemented (adding DT-support to the driver in the process). It applies after having reverted the current workaround. Note that only using the shadow interrupt mask when it is actually needed does not add much extra code at all. Thanks, Johan
On Thu, Mar 28, 2013 at 02:20:17PM -0400, Douglas Gilbert wrote: > On 13-03-28 05:57 AM, Johan Hovold wrote: > > On Tue, Mar 26, 2013 at 05:09:59PM -0400, Douglas Gilbert wrote: > >> On 13-03-26 03:27 PM, Johan Hovold wrote: > >>> On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote: > >>>> On some revisions of AT91 SoCs, the RTC IMR register is not working. > >>>> Instead of elaborating a workaround for that specific SoC or IP version, > >>>> we simply use a software variable to store the Interrupt Mask Register and > >>>> modify it for each enabling/disabling of an interrupt. The overhead of this > >>>> is negligible anyway. > >>> > >>> The patch does not add any memory barriers or register read-backs when > >>> manipulating the interrupt-mask variable. This could possibly lead to > >>> spurious interrupts both when enabling and disabling the various > >>> RTC-interrupts due to write reordering and bus latencies. > >>> > >>> Has this been considered? And is this reason enough for a more targeted > >>> work-around so that the SOCs with functional RTC_IMR are not affected? > >> > >> The SoCs in question use a single embedded ARM926EJ-S and > >> according to the Atmel documentation, that CPU's instruction > >> set contains no barrier (or related) instructions. > > > > The ARM926EJ-S actually does have a Drain Write Buffer instruction but > > it's not used by the ARM barrier-implementation unless > > CONFIG_ARM_DMA_MEM_BUFFERABLE or CONFIG_SMP is set. > > The ARM926EJ-S is ARMv5 so CONFIG_ARM_DMA_MEM_BUFFERABLE is not > available. SMP is not an option for arm/mach-at91. Never said it was. I merely disputed the claim that the ARM926EJ-S has no barrier instruction. > > However, wmb() always implies a compiler barrier which is what is needed > > in this case. > > Even if wmb() did anything, would it make this case "safe"? As I write above, wmb() always implies a compiler barrier, which is is sufficient to prevent write reordering on this SoC. In particular, wmb() is defined as a compiler barrier on AT91. > >> In the arch/arm/mach-at91 sub-tree of the kernel source > >> I can find no use of the wmb() call. Also checked all drivers > >> in the kernel containing "at91" and none called wmb(). > > > > I/O-operations are normally not reordered, but this patch is faking a > > hardware register and thus extra care needs to be taken. > > > > To repeat: > > > >> @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled) > >> > >> if (enabled) { > >> at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM); > >> + at91_rtc_imr |= AT91_RTC_ALARM; > > > > Here a barrier is needed to prevent the compiler from reordering the two > > writes (i.e., mask update and interrupt enable). > > Isn't either order potentially unsafe? So even if the compiler > did foolishly re-order them, the sequence is still unsafe when > a SYS interrupt splits those two lines (since the SYS interrupt > is shared, it can occur at any time). Good point. This would not cause any problem as the interrupt mask is ANDed with the (hardware) status register in the interrupt handler, but only if the status register is _guaranteed_ to be cleared before updating the mask and that would require another wmb() after writing SCCR above. To avoid having to worry about such subtleties, a spinlock (and the barriers it implies) seems like the most reasonable way to prevent the race in this case. Note however that this only fixes the reordering part of the problem. The register read back would still be needed below. > >> at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM); > >> - } else > >> + } else { > >> at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM); > > > > Here a barrier is again needed to prevent the compiler from reordering, > > but we also need a register read back (of some RTC-register) before > > updating the mask. Without the register read back, there could be a > > window where the mask does not match the hardware state due to bus > > latencies. > > > > Note that even with a register read back there is a (theoretical) > > possibility that the interrupts have not yet been disabled when the fake > > mask is updated. The only way to know for sure is to poll RTC_IMR but > > that is the very register you're trying to emulate. > > > >> + at91_rtc_imr &= ~AT91_RTC_ALARM; > >> + } > >> > >> return 0; > >> } > > > > In the worst-case scenario ignoring the shared RTC-interrupt could lead > > to the disabling of the system interrupt and thus also PIT, DBGU, ... > > And how often does the AT91_RTC_ALARM alarm interrupt fire? That's not relevant is it? We should not knowingly add code that could potentially blow up, no matter how improbable it is, and at least not when there are options. > > I think this patch should be reverted and a fix for the broken SoCs be > > implemented which does not penalise the other SoCs. That is, only > > fall-back to faking IMR on the SoCs where it is actually broken. > > Even though I sent a patch to fix this problem to Nicolas, > what was presented is not my version. In mine I added DT > support: > > #ifdef CONFIG_OF > static const struct of_device_id at91rm9200_rtc_dt_ids[] = { > { .compatible = "atmel,at91rm9200-rtc", .data = &at91rm9200_config }, > { .compatible = "atmel,at91sam9x5-rtc", .data = &at91sam9x5_config }, > { /* sentinel */ } > }; > MODULE_DEVICE_TABLE(of, at91rm9200_rtc_dt_ids); > #else > #define at91rm9200_rtc_dt_ids NULL > #endif /* CONFIG_OF */ > > The shadow IMR variable was only active in the > .compatible = "atmel,at91sam9x5-rtc" > case. That protected all existing users from any problems > that might be introduced. That's all I'm asking for. Do you have a link to your patch and the thread it was posted in? Is there a hardware erratum for the bug? Thanks, Johan
diff --git a/drivers/rtc/rtc-at91rm9200.c b/drivers/rtc/rtc-at91rm9200.c index 79233d0..29b92e4 100644 --- a/drivers/rtc/rtc-at91rm9200.c +++ b/drivers/rtc/rtc-at91rm9200.c @@ -46,6 +46,7 @@ static DECLARE_COMPLETION(at91_rtc_updated); static unsigned int at91_alarm_year = AT91_RTC_EPOCH; static void __iomem *at91_rtc_regs; static int irq; +static u32 at91_rtc_imr; /* * Decode time/date into rtc_time structure @@ -110,9 +111,11 @@ static int at91_rtc_settime(struct device *dev, struct rtc_time *tm) cr = at91_rtc_read(AT91_RTC_CR); at91_rtc_write(AT91_RTC_CR, cr | AT91_RTC_UPDCAL | AT91_RTC_UPDTIM); + at91_rtc_imr |= AT91_RTC_ACKUPD; at91_rtc_write(AT91_RTC_IER, AT91_RTC_ACKUPD); wait_for_completion(&at91_rtc_updated); /* wait for ACKUPD interrupt */ at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ACKUPD); + at91_rtc_imr &= ~AT91_RTC_ACKUPD; at91_rtc_write(AT91_RTC_TIMR, bin2bcd(tm->tm_sec) << 0 @@ -144,7 +147,7 @@ static int at91_rtc_readalarm(struct device *dev, struct rtc_wkalrm *alrm) tm->tm_yday = rtc_year_days(tm->tm_mday, tm->tm_mon, tm->tm_year); tm->tm_year = at91_alarm_year - 1900; - alrm->enabled = (at91_rtc_read(AT91_RTC_IMR) & AT91_RTC_ALARM) + alrm->enabled = (at91_rtc_imr & AT91_RTC_ALARM) ? 1 : 0; dev_dbg(dev, "%s(): %4d-%02d-%02d %02d:%02d:%02d\n", __func__, @@ -170,6 +173,7 @@ static int at91_rtc_setalarm(struct device *dev, struct rtc_wkalrm *alrm) tm.tm_sec = alrm->time.tm_sec; at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM); + at91_rtc_imr &= ~AT91_RTC_ALARM; at91_rtc_write(AT91_RTC_TIMALR, bin2bcd(tm.tm_sec) << 0 | bin2bcd(tm.tm_min) << 8 @@ -182,6 +186,7 @@ static int at91_rtc_setalarm(struct device *dev, struct rtc_wkalrm *alrm) if (alrm->enabled) { at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM); + at91_rtc_imr |= AT91_RTC_ALARM; at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM); } @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled) if (enabled) { at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM); + at91_rtc_imr |= AT91_RTC_ALARM; at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM); - } else + } else { at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM); + at91_rtc_imr &= ~AT91_RTC_ALARM; + } return 0; } @@ -209,12 +217,10 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled) */ static int at91_rtc_proc(struct device *dev, struct seq_file *seq) { - unsigned long imr = at91_rtc_read(AT91_RTC_IMR); - seq_printf(seq, "update_IRQ\t: %s\n", - (imr & AT91_RTC_ACKUPD) ? "yes" : "no"); + (at91_rtc_imr & AT91_RTC_ACKUPD) ? "yes" : "no"); seq_printf(seq, "periodic_IRQ\t: %s\n", - (imr & AT91_RTC_SECEV) ? "yes" : "no"); + (at91_rtc_imr & AT91_RTC_SECEV) ? "yes" : "no"); return 0; } @@ -229,7 +235,7 @@ static irqreturn_t at91_rtc_interrupt(int irq, void *dev_id) unsigned int rtsr; unsigned long events = 0; - rtsr = at91_rtc_read(AT91_RTC_SR) & at91_rtc_read(AT91_RTC_IMR); + rtsr = at91_rtc_read(AT91_RTC_SR) & at91_rtc_imr; if (rtsr) { /* this interrupt is shared! Is it ours? */ if (rtsr & AT91_RTC_ALARM) events |= (RTC_AF | RTC_IRQF); @@ -293,6 +299,7 @@ static int __init at91_rtc_probe(struct platform_device *pdev) at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ACKUPD | AT91_RTC_ALARM | AT91_RTC_SECEV | AT91_RTC_TIMEV | AT91_RTC_CALEV); + at91_rtc_imr = 0; ret = request_irq(irq, at91_rtc_interrupt, IRQF_SHARED, @@ -331,6 +338,7 @@ static int __exit at91_rtc_remove(struct platform_device *pdev) at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ACKUPD | AT91_RTC_ALARM | AT91_RTC_SECEV | AT91_RTC_TIMEV | AT91_RTC_CALEV); + at91_rtc_imr = 0; free_irq(irq, pdev); rtc_device_unregister(rtc); @@ -343,31 +351,35 @@ static int __exit at91_rtc_remove(struct platform_device *pdev) /* AT91RM9200 RTC Power management control */ -static u32 at91_rtc_imr; +static u32 at91_rtc_bkpimr; + static int at91_rtc_suspend(struct device *dev) { /* this IRQ is shared with DBGU and other hardware which isn't * necessarily doing PM like we are... */ - at91_rtc_imr = at91_rtc_read(AT91_RTC_IMR) - & (AT91_RTC_ALARM|AT91_RTC_SECEV); - if (at91_rtc_imr) { - if (device_may_wakeup(dev)) + at91_rtc_bkpimr = at91_rtc_imr & (AT91_RTC_ALARM|AT91_RTC_SECEV); + if (at91_rtc_bkpimr) { + if (device_may_wakeup(dev)) { enable_irq_wake(irq); - else - at91_rtc_write(AT91_RTC_IDR, at91_rtc_imr); - } + } else { + at91_rtc_write(AT91_RTC_IDR, at91_rtc_bkpimr); + at91_rtc_imr &= ~at91_rtc_bkpimr; + } +} return 0; } static int at91_rtc_resume(struct device *dev) { - if (at91_rtc_imr) { - if (device_may_wakeup(dev)) + if (at91_rtc_bkpimr) { + if (device_may_wakeup(dev)) { disable_irq_wake(irq); - else - at91_rtc_write(AT91_RTC_IER, at91_rtc_imr); + } else { + at91_rtc_imr |= at91_rtc_bkpimr; + at91_rtc_write(AT91_RTC_IER, at91_rtc_bkpimr); + } } return 0; } diff --git a/drivers/rtc/rtc-at91rm9200.h b/drivers/rtc/rtc-at91rm9200.h index da1945e..5f940b6 100644 --- a/drivers/rtc/rtc-at91rm9200.h +++ b/drivers/rtc/rtc-at91rm9200.h @@ -64,7 +64,6 @@ #define AT91_RTC_SCCR 0x1c /* Status Clear Command Register */ #define AT91_RTC_IER 0x20 /* Interrupt Enable Register */ #define AT91_RTC_IDR 0x24 /* Interrupt Disable Register */ -#define AT91_RTC_IMR 0x28 /* Interrupt Mask Register */ #define AT91_RTC_VER 0x2c /* Valid Entry Register */ #define AT91_RTC_NVTIM (1 << 0) /* Non valid Time */
On some revisions of AT91 SoCs, the RTC IMR register is not working. Instead of elaborating a workaround for that specific SoC or IP version, we simply use a software variable to store the Interrupt Mask Register and modify it for each enabling/disabling of an interrupt. The overhead of this is negligible anyway. Reported-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> --- drivers/rtc/rtc-at91rm9200.c | 50 +++++++++++++++++++++++++++----------------- drivers/rtc/rtc-at91rm9200.h | 1 - 2 files changed, 31 insertions(+), 20 deletions(-)