diff mbox

rtc: rtc-at91rm9200: use a variable for storing IMR

Message ID 1363369032-12639-1-git-send-email-nicolas.ferre@atmel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Nicolas Ferre March 15, 2013, 5:37 p.m. UTC
On some revisions of AT91 SoCs, the RTC IMR register is not working.
Instead of elaborating a workaround for that specific SoC or IP version,
we simply use a software variable to store the Interrupt Mask Register and
modify it for each enabling/disabling of an interrupt. The overhead of this
is negligible anyway.

Reported-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
---
 drivers/rtc/rtc-at91rm9200.c | 50 +++++++++++++++++++++++++++-----------------
 drivers/rtc/rtc-at91rm9200.h |  1 -
 2 files changed, 31 insertions(+), 20 deletions(-)

Comments

Andrew Morton March 20, 2013, 9:50 p.m. UTC | #1
On Fri, 15 Mar 2013 18:37:12 +0100 Nicolas Ferre <nicolas.ferre@atmel.com> wrote:

> On some revisions of AT91 SoCs, the RTC IMR register is not working.
> Instead of elaborating a workaround for that specific SoC or IP version,
> we simply use a software variable to store the Interrupt Mask Register and
> modify it for each enabling/disabling of an interrupt. The overhead of this
> is negligible anyway.

This description doesn't really allow me or others to work out whether
the fix should be included in 3.9 or backported into earlier kernels.

So please, when fixing a bug do include a full description of the
user-visible effects of that bug.  And your opinion regarding the
-mainline and -stable decision is always useful.
Douglas Gilbert March 21, 2013, 1:15 a.m. UTC | #2
On 13-03-20 05:50 PM, Andrew Morton wrote:
> On Fri, 15 Mar 2013 18:37:12 +0100 Nicolas Ferre <nicolas.ferre@atmel.com> wrote:
>
>> On some revisions of AT91 SoCs, the RTC IMR register is not working.
>> Instead of elaborating a workaround for that specific SoC or IP version,
>> we simply use a software variable to store the Interrupt Mask Register and
>> modify it for each enabling/disabling of an interrupt. The overhead of this
>> is negligible anyway.
>
> This description doesn't really allow me or others to work out whether
> the fix should be included in 3.9 or backported into earlier kernels.
>
> So please, when fixing a bug do include a full description of the
> user-visible effects of that bug.  And your opinion regarding the
> -mainline and -stable decision is always useful.

The interrupt mask register (IMR) for the RTC is broken
on the AT91SAM9x5 sub-family of SoCs (good overview of the
members here: http://www.eewiki.net/display/linuxonarm/AT91SAM9x5 ).
The "user visible effect" is the RTC doesn't work.

That sub-family is less than two years old and only has devicetree
(DT) support and came online circa lk 3.7 . The dust is yet to
settle on the DT stuff at least for AT91 SoCs (translation:
lots of stuff is still broken, so much that it is hard to know
where to start).

The fix in the patch is pretty simple: just shadow the silicon
IMR register with a variable in the driver. Some older SoCs (pre-DT)
use the the rtc-at91rm9200 driver (e.g. obviously the AT91RM9200)
and they should not be impacted by the change. There shouldn't
be a large volume of interrupts associated with a RTC.


Compared to a relatively stable kernel subsystem like SCSI, what
is happening in the ARM architecture with DT is huge and ongoing.
So I think you either need new rules or suspend some of the stricter
rules applied to more stable subsystems. Just my two cents worth.


Doug Gilbert

who hasn't seen that frill-necked lizard for a while
Nicolas Ferre March 21, 2013, 9:46 a.m. UTC | #3
On 03/20/2013 10:50 PM, Andrew Morton :
> On Fri, 15 Mar 2013 18:37:12 +0100 Nicolas Ferre <nicolas.ferre@atmel.com> wrote:
> 
>> On some revisions of AT91 SoCs, the RTC IMR register is not working.
>> Instead of elaborating a workaround for that specific SoC or IP version,
>> we simply use a software variable to store the Interrupt Mask Register and
>> modify it for each enabling/disabling of an interrupt. The overhead of this
>> is negligible anyway.
> 
> This description doesn't really allow me or others to work out whether
> the fix should be included in 3.9 or backported into earlier kernels.
> 
> So please, when fixing a bug do include a full description of the
> user-visible effects of that bug.  And your opinion regarding the
> -mainline and -stable decision is always useful.

Yes, sure.

Concerning this patch, we can imagine to push it upstream as a bugfix
for 3.9-rc. It is not a regression because... well... it has never
worked properly on the affected SoC family.

For -stable, we can add this tag:

Cc: stable <stable@vger.kernel.org> [v3.8+]


As this patch applies up to this revision. For other kernel releases, I
will need to rework the patch a little bit due to files name modifications.

Thanks, best regards,
Andrew Morton March 21, 2013, 9:33 p.m. UTC | #4
On Wed, 20 Mar 2013 21:15:23 -0400 Douglas Gilbert <dgilbert@interlog.com> wrote:

> On 13-03-20 05:50 PM, Andrew Morton wrote:
> > On Fri, 15 Mar 2013 18:37:12 +0100 Nicolas Ferre <nicolas.ferre@atmel.com> wrote:
> >
> >> On some revisions of AT91 SoCs, the RTC IMR register is not working.
> >> Instead of elaborating a workaround for that specific SoC or IP version,
> >> we simply use a software variable to store the Interrupt Mask Register and
> >> modify it for each enabling/disabling of an interrupt. The overhead of this
> >> is negligible anyway.
> >
> > This description doesn't really allow me or others to work out whether
> > the fix should be included in 3.9 or backported into earlier kernels.
> >
> > So please, when fixing a bug do include a full description of the
> > user-visible effects of that bug.  And your opinion regarding the
> > -mainline and -stable decision is always useful.
> 
> The interrupt mask register (IMR) for the RTC is broken
> on the AT91SAM9x5 sub-family of SoCs (good overview of the
> members here: http://www.eewiki.net/display/linuxonarm/AT91SAM9x5 ).
> The "user visible effect" is the RTC doesn't work.
> 
> That sub-family is less than two years old and only has devicetree
> (DT) support and came online circa lk 3.7 . The dust is yet to
> settle on the DT stuff at least for AT91 SoCs (translation:
> lots of stuff is still broken, so much that it is hard to know
> where to start).
> 
> The fix in the patch is pretty simple: just shadow the silicon
> IMR register with a variable in the driver. Some older SoCs (pre-DT)
> use the the rtc-at91rm9200 driver (e.g. obviously the AT91RM9200)
> and they should not be impacted by the change. There shouldn't
> be a large volume of interrupts associated with a RTC.

Thanks.

> Compared to a relatively stable kernel subsystem like SCSI, what
> is happening in the ARM architecture with DT is huge and ongoing.
> So I think you either need new rules or suspend some of the stricter
> rules applied to more stable subsystems. Just my two cents worth.

I don't know what this means.

Shrug.  I tagged the patch for -stable backporting on the basis that
"the RTC doesn't work" is undesirable ;)
Johan Hovold March 26, 2013, 7:27 p.m. UTC | #5
On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote:
> On some revisions of AT91 SoCs, the RTC IMR register is not working.
> Instead of elaborating a workaround for that specific SoC or IP version,
> we simply use a software variable to store the Interrupt Mask Register and
> modify it for each enabling/disabling of an interrupt. The overhead of this
> is negligible anyway.

The patch does not add any memory barriers or register read-backs when
manipulating the interrupt-mask variable. This could possibly lead to
spurious interrupts both when enabling and disabling the various
RTC-interrupts due to write reordering and bus latencies.

Has this been considered? And is this reason enough for a more targeted
work-around so that the SOCs with functional RTC_IMR are not affected?

> Reported-by: Douglas Gilbert <dgilbert@interlog.com>
> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
> ---
>  drivers/rtc/rtc-at91rm9200.c | 50 +++++++++++++++++++++++++++-----------------
>  drivers/rtc/rtc-at91rm9200.h |  1 -
>  2 files changed, 31 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/rtc/rtc-at91rm9200.c b/drivers/rtc/rtc-at91rm9200.c
> index 79233d0..29b92e4 100644
> --- a/drivers/rtc/rtc-at91rm9200.c
> +++ b/drivers/rtc/rtc-at91rm9200.c
> @@ -46,6 +46,7 @@ static DECLARE_COMPLETION(at91_rtc_updated);
>  static unsigned int at91_alarm_year = AT91_RTC_EPOCH;
>  static void __iomem *at91_rtc_regs;
>  static int irq;
> +static u32 at91_rtc_imr;
>  
>  /*
>
> * Decode time/date into rtc_time structure

[...]

> @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled)
>  
>  	if (enabled) {
>  		at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM);
> +		at91_rtc_imr |= AT91_RTC_ALARM;

wmb() needed before enabling interrupt as at91_rtc_write() uses
__raw_writel() which does not add any barriers?

>  		at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM);
> -	} else
> +	} else {
>  		at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM);

wmb() and register read-back needed before updating interrupt mask?

> +		at91_rtc_imr &= ~AT91_RTC_ALARM;
> +	}
>  
>  	return 0;
> }

[...]

> @@ -229,7 +235,7 @@ static irqreturn_t at91_rtc_interrupt(int irq, void *dev_id)
>  	unsigned int rtsr;
>  	unsigned long events = 0;
>  
> -	rtsr = at91_rtc_read(AT91_RTC_SR) & at91_rtc_read(AT91_RTC_IMR);
> +	rtsr = at91_rtc_read(AT91_RTC_SR) & at91_rtc_imr;

Does at91_rtc_imr necessarily match the hardware state here?

>  	if (rtsr) {		/* this interrupt is shared!  Is it ours? */
>  		if (rtsr & AT91_RTC_ALARM)
>  			events |= (RTC_AF | RTC_IRQF);

Johan
Douglas Gilbert March 26, 2013, 9:09 p.m. UTC | #6
On 13-03-26 03:27 PM, Johan Hovold wrote:
> On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote:
>> On some revisions of AT91 SoCs, the RTC IMR register is not working.
>> Instead of elaborating a workaround for that specific SoC or IP version,
>> we simply use a software variable to store the Interrupt Mask Register and
>> modify it for each enabling/disabling of an interrupt. The overhead of this
>> is negligible anyway.
>
> The patch does not add any memory barriers or register read-backs when
> manipulating the interrupt-mask variable. This could possibly lead to
> spurious interrupts both when enabling and disabling the various
> RTC-interrupts due to write reordering and bus latencies.
>
> Has this been considered? And is this reason enough for a more targeted
> work-around so that the SOCs with functional RTC_IMR are not affected?

The SoCs in question use a single embedded ARM926EJ-S and
according to the Atmel documentation, that CPU's instruction
set contains no barrier (or related) instructions.

In the arch/arm/mach-at91 sub-tree of the kernel source
I can find no use of the wmb() call. Also checked all drivers
in the kernel containing "at91" and none called wmb().

Doug Gilbert
Johan Hovold March 28, 2013, 9:57 a.m. UTC | #7
On Tue, Mar 26, 2013 at 05:09:59PM -0400, Douglas Gilbert wrote:
> On 13-03-26 03:27 PM, Johan Hovold wrote:
> > On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote:
> >> On some revisions of AT91 SoCs, the RTC IMR register is not working.
> >> Instead of elaborating a workaround for that specific SoC or IP version,
> >> we simply use a software variable to store the Interrupt Mask Register and
> >> modify it for each enabling/disabling of an interrupt. The overhead of this
> >> is negligible anyway.
> >
> > The patch does not add any memory barriers or register read-backs when
> > manipulating the interrupt-mask variable. This could possibly lead to
> > spurious interrupts both when enabling and disabling the various
> > RTC-interrupts due to write reordering and bus latencies.
> >
> > Has this been considered? And is this reason enough for a more targeted
> > work-around so that the SOCs with functional RTC_IMR are not affected?
> 
> The SoCs in question use a single embedded ARM926EJ-S and
> according to the Atmel documentation, that CPU's instruction
> set contains no barrier (or related) instructions.

The ARM926EJ-S actually does have a Drain Write Buffer instruction but
it's not used by the ARM barrier-implementation unless
CONFIG_ARM_DMA_MEM_BUFFERABLE or CONFIG_SMP is set.

However, wmb() always implies a compiler barrier which is what is needed
in this case.

> In the arch/arm/mach-at91 sub-tree of the kernel source
> I can find no use of the wmb() call. Also checked all drivers
> in the kernel containing "at91" and none called wmb().

I/O-operations are normally not reordered, but this patch is faking a
hardware register and thus extra care needs to be taken.

To repeat:

> @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled)
>
>       if (enabled) {
>               at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM);
> +             at91_rtc_imr |= AT91_RTC_ALARM;

Here a barrier is needed to prevent the compiler from reordering the two
writes (i.e., mask update and interrupt enable).

>               at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM);
> -     } else
> +     } else {
>               at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM);

Here a barrier is again needed to prevent the compiler from reordering,
but we also need a register read back (of some RTC-register) before
updating the mask. Without the register read back, there could be a
window where the mask does not match the hardware state due to bus
latencies.

Note that even with a register read back there is a (theoretical)
possibility that the interrupts have not yet been disabled when the fake
mask is updated. The only way to know for sure is to poll RTC_IMR but
that is the very register you're trying to emulate.

> +             at91_rtc_imr &= ~AT91_RTC_ALARM;
> +     }
>
>       return 0;
> }

In the worst-case scenario ignoring the shared RTC-interrupt could lead
to the disabling of the system interrupt and thus also PIT, DBGU, ...

I think this patch should be reverted and a fix for the broken SoCs be
implemented which does not penalise the other SoCs. That is, only
fall-back to faking IMR on the SoCs where it is actually broken.

Nicolas, should I send a revert patch and follow up with a fix for the
broken SoCs which includes the required barriers and read-backs?

Note that the patch is already being picked up for some stable trees.
The fix I'm proposing would require adding minimal DT-support to the
driver and is not really stable material. Therefore, a revert followed
by a patch for 3.10 seems like the way to go.

Johan
Nicolas Ferre March 28, 2013, 4:16 p.m. UTC | #8
On 03/28/2013 10:57 AM, Johan Hovold :
> On Tue, Mar 26, 2013 at 05:09:59PM -0400, Douglas Gilbert wrote:
>> On 13-03-26 03:27 PM, Johan Hovold wrote:
>>> On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote:
>>>> On some revisions of AT91 SoCs, the RTC IMR register is not working.
>>>> Instead of elaborating a workaround for that specific SoC or IP version,
>>>> we simply use a software variable to store the Interrupt Mask Register and
>>>> modify it for each enabling/disabling of an interrupt. The overhead of this
>>>> is negligible anyway.
>>>
>>> The patch does not add any memory barriers or register read-backs when
>>> manipulating the interrupt-mask variable. This could possibly lead to
>>> spurious interrupts both when enabling and disabling the various
>>> RTC-interrupts due to write reordering and bus latencies.

You are right pointing out that it has to be considered.
I am in the process of analyzing the possible issues generated by this
patch.
I am not convinced for now that we should revert it... Give me a little
more time to study this.

>>> Has this been considered? And is this reason enough for a more targeted
>>> work-around so that the SOCs with functional RTC_IMR are not affected?
>>
>> The SoCs in question use a single embedded ARM926EJ-S and
>> according to the Atmel documentation, that CPU's instruction
>> set contains no barrier (or related) instructions.
> 
> The ARM926EJ-S actually does have a Drain Write Buffer instruction but
> it's not used by the ARM barrier-implementation unless
> CONFIG_ARM_DMA_MEM_BUFFERABLE or CONFIG_SMP is set.
> 
> However, wmb() always implies a compiler barrier which is what is needed
> in this case.
> 
>> In the arch/arm/mach-at91 sub-tree of the kernel source
>> I can find no use of the wmb() call. Also checked all drivers
>> in the kernel containing "at91" and none called wmb().
> 
> I/O-operations are normally not reordered, but this patch is faking a
> hardware register and thus extra care needs to be taken.
> 
> To repeat:
> 
>> @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled)
>>
>>       if (enabled) {
>>               at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM);
>> +             at91_rtc_imr |= AT91_RTC_ALARM;
> 
> Here a barrier is needed to prevent the compiler from reordering the two
> writes (i.e., mask update and interrupt enable).
> 
>>               at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM);
>> -     } else
>> +     } else {
>>               at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM);
> 
> Here a barrier is again needed to prevent the compiler from reordering,
> but we also need a register read back (of some RTC-register) before
> updating the mask. Without the register read back, there could be a
> window where the mask does not match the hardware state due to bus
> latencies.
> 
> Note that even with a register read back there is a (theoretical)
> possibility that the interrupts have not yet been disabled when the fake
> mask is updated. The only way to know for sure is to poll RTC_IMR but
> that is the very register you're trying to emulate.

In fact, if we protect the two code lines with the proper spinlock, we
may find that all this reordering issue will not lead to a race
condition. So I guess it is a simpler solution to the problem that you
highlight.

>> +             at91_rtc_imr &= ~AT91_RTC_ALARM;
>> +     }
>>
>>       return 0;
>> }
> 
> In the worst-case scenario ignoring the shared RTC-interrupt could lead
> to the disabling of the system interrupt and thus also PIT, DBGU, ...
> 
> I think this patch should be reverted and a fix for the broken SoCs be
> implemented which does not penalise the other SoCs. That is, only
> fall-back to faking IMR on the SoCs where it is actually broken.
> 
> Nicolas, should I send a revert patch and follow up with a fix for the
> broken SoCs which includes the required barriers and read-backs?

I prefer to not distinguish between broken SoC and others. But I may be
too optimistic...

> Note that the patch is already being picked up for some stable trees.
> The fix I'm proposing would require adding minimal DT-support to the
> driver and is not really stable material. Therefore, a revert followed
> by a patch for 3.10 seems like the way to go.

I hope that we could avoid this scenario.

Best regards,
Douglas Gilbert March 28, 2013, 6:20 p.m. UTC | #9
On 13-03-28 05:57 AM, Johan Hovold wrote:
> On Tue, Mar 26, 2013 at 05:09:59PM -0400, Douglas Gilbert wrote:
>> On 13-03-26 03:27 PM, Johan Hovold wrote:
>>> On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote:
>>>> On some revisions of AT91 SoCs, the RTC IMR register is not working.
>>>> Instead of elaborating a workaround for that specific SoC or IP version,
>>>> we simply use a software variable to store the Interrupt Mask Register and
>>>> modify it for each enabling/disabling of an interrupt. The overhead of this
>>>> is negligible anyway.
>>>
>>> The patch does not add any memory barriers or register read-backs when
>>> manipulating the interrupt-mask variable. This could possibly lead to
>>> spurious interrupts both when enabling and disabling the various
>>> RTC-interrupts due to write reordering and bus latencies.
>>>
>>> Has this been considered? And is this reason enough for a more targeted
>>> work-around so that the SOCs with functional RTC_IMR are not affected?
>>
>> The SoCs in question use a single embedded ARM926EJ-S and
>> according to the Atmel documentation, that CPU's instruction
>> set contains no barrier (or related) instructions.
>
> The ARM926EJ-S actually does have a Drain Write Buffer instruction but
> it's not used by the ARM barrier-implementation unless
> CONFIG_ARM_DMA_MEM_BUFFERABLE or CONFIG_SMP is set.

The ARM926EJ-S is ARMv5 so CONFIG_ARM_DMA_MEM_BUFFERABLE is not
available. SMP is not an option for arm/mach-at91.

> However, wmb() always implies a compiler barrier which is what is needed
> in this case.

Even if wmb() did anything, would it make this case "safe"?

>> In the arch/arm/mach-at91 sub-tree of the kernel source
>> I can find no use of the wmb() call. Also checked all drivers
>> in the kernel containing "at91" and none called wmb().
>
> I/O-operations are normally not reordered, but this patch is faking a
> hardware register and thus extra care needs to be taken.
>
> To repeat:
>
>> @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled)
>>
>>        if (enabled) {
>>                at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM);
>> +             at91_rtc_imr |= AT91_RTC_ALARM;
>
> Here a barrier is needed to prevent the compiler from reordering the two
> writes (i.e., mask update and interrupt enable).

Isn't either order potentially unsafe? So even if the compiler
did foolishly re-order them, the sequence is still unsafe when
a SYS interrupt splits those two lines (since the SYS interrupt
is shared, it can occur at any time).

>>                at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM);
>> -     } else
>> +     } else {
>>                at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM);
>
> Here a barrier is again needed to prevent the compiler from reordering,
> but we also need a register read back (of some RTC-register) before
> updating the mask. Without the register read back, there could be a
> window where the mask does not match the hardware state due to bus
> latencies.
>
> Note that even with a register read back there is a (theoretical)
> possibility that the interrupts have not yet been disabled when the fake
> mask is updated. The only way to know for sure is to poll RTC_IMR but
> that is the very register you're trying to emulate.
>
>> +             at91_rtc_imr &= ~AT91_RTC_ALARM;
>> +     }
>>
>>        return 0;
>> }
>
> In the worst-case scenario ignoring the shared RTC-interrupt could lead
> to the disabling of the system interrupt and thus also PIT, DBGU, ...

And how often does the AT91_RTC_ALARM alarm interrupt fire?

> I think this patch should be reverted and a fix for the broken SoCs be
> implemented which does not penalise the other SoCs. That is, only
> fall-back to faking IMR on the SoCs where it is actually broken.

Even though I sent a patch to fix this problem to Nicolas,
what was presented is not my version. In mine I added DT
support:

#ifdef CONFIG_OF
static const struct of_device_id at91rm9200_rtc_dt_ids[] = {
        { .compatible = "atmel,at91rm9200-rtc", .data = &at91rm9200_config },
        { .compatible = "atmel,at91sam9x5-rtc", .data = &at91sam9x5_config },
        { /* sentinel */ }
};
MODULE_DEVICE_TABLE(of, at91rm9200_rtc_dt_ids);
#else
#define at91rm9200_rtc_dt_ids NULL
#endif /* CONFIG_OF */


The shadow IMR variable was only active in the
      .compatible = "atmel,at91sam9x5-rtc"
case. That protected all existing users from any problems
that might be introduced.

Doug Gilbert
Nicolas Ferre March 29, 2013, 3:45 p.m. UTC | #10
On 03/28/2013 07:20 PM, Douglas Gilbert :
> On 13-03-28 05:57 AM, Johan Hovold wrote:
>> On Tue, Mar 26, 2013 at 05:09:59PM -0400, Douglas Gilbert wrote:
>>> On 13-03-26 03:27 PM, Johan Hovold wrote:
>>>> On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote:
>>>>> On some revisions of AT91 SoCs, the RTC IMR register is not working.
>>>>> Instead of elaborating a workaround for that specific SoC or IP
>>>>> version,
>>>>> we simply use a software variable to store the Interrupt Mask
>>>>> Register and
>>>>> modify it for each enabling/disabling of an interrupt. The overhead
>>>>> of this
>>>>> is negligible anyway.
>>>>
>>>> The patch does not add any memory barriers or register read-backs when
>>>> manipulating the interrupt-mask variable. This could possibly lead to
>>>> spurious interrupts both when enabling and disabling the various
>>>> RTC-interrupts due to write reordering and bus latencies.
>>>>
>>>> Has this been considered? And is this reason enough for a more targeted
>>>> work-around so that the SOCs with functional RTC_IMR are not affected?
>>>
>>> The SoCs in question use a single embedded ARM926EJ-S and
>>> according to the Atmel documentation, that CPU's instruction
>>> set contains no barrier (or related) instructions.
>>
>> The ARM926EJ-S actually does have a Drain Write Buffer instruction but
>> it's not used by the ARM barrier-implementation unless
>> CONFIG_ARM_DMA_MEM_BUFFERABLE or CONFIG_SMP is set.
> 
> The ARM926EJ-S is ARMv5 so CONFIG_ARM_DMA_MEM_BUFFERABLE is not
> available. SMP is not an option for arm/mach-at91.
> 
>> However, wmb() always implies a compiler barrier which is what is needed
>> in this case.
> 
> Even if wmb() did anything, would it make this case "safe"?
> 
>>> In the arch/arm/mach-at91 sub-tree of the kernel source
>>> I can find no use of the wmb() call. Also checked all drivers
>>> in the kernel containing "at91" and none called wmb().
>>
>> I/O-operations are normally not reordered, but this patch is faking a
>> hardware register and thus extra care needs to be taken.
>>
>> To repeat:
>>
>>> @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct
>>> device *dev, unsigned int enabled)
>>>
>>>        if (enabled) {
>>>                at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM);
>>> +             at91_rtc_imr |= AT91_RTC_ALARM;
>>
>> Here a barrier is needed to prevent the compiler from reordering the two
>> writes (i.e., mask update and interrupt enable).
> 
> Isn't either order potentially unsafe? So even if the compiler
> did foolishly re-order them, the sequence is still unsafe when
> a SYS interrupt splits those two lines (since the SYS interrupt
> is shared, it can occur at any time).

Absolutely: I think it has to be protected by the proper
spin_lock_(irqsave)() functions, each time we:
- modify an interrupt + updated the shadow register
- read the shadow register

Note, on our current UP, it is the "irqsave" part that makes the
difference tough...

>>>                at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM);
>>> -     } else
>>> +     } else {
>>>                at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM);
>>
>> Here a barrier is again needed to prevent the compiler from reordering,
>> but we also need a register read back (of some RTC-register) before
>> updating the mask. Without the register read back, there could be a
>> window where the mask does not match the hardware state due to bus
>> latencies.
>>
>> Note that even with a register read back there is a (theoretical)
>> possibility that the interrupts have not yet been disabled when the fake
>> mask is updated. The only way to know for sure is to poll RTC_IMR but
>> that is the very register you're trying to emulate.
>>
>>> +             at91_rtc_imr &= ~AT91_RTC_ALARM;
>>> +     }
>>>
>>>        return 0;
>>> }
>>
>> In the worst-case scenario ignoring the shared RTC-interrupt could lead
>> to the disabling of the system interrupt and thus also PIT, DBGU, ...
> 
> And how often does the AT91_RTC_ALARM alarm interrupt fire?
> 
>> I think this patch should be reverted and a fix for the broken SoCs be
>> implemented which does not penalise the other SoCs. That is, only
>> fall-back to faking IMR on the SoCs where it is actually broken.
> 
> Even though I sent a patch to fix this problem to Nicolas,
> what was presented is not my version. In mine I added DT
> support:
> 
> #ifdef CONFIG_OF
> static const struct of_device_id at91rm9200_rtc_dt_ids[] = {
>        { .compatible = "atmel,at91rm9200-rtc", .data =
> &at91rm9200_config },
>        { .compatible = "atmel,at91sam9x5-rtc", .data =
> &at91sam9x5_config },
>        { /* sentinel */ }
> };
> MODULE_DEVICE_TABLE(of, at91rm9200_rtc_dt_ids);
> #else
> #define at91rm9200_rtc_dt_ids NULL
> #endif /* CONFIG_OF */
> 
> 
> The shadow IMR variable was only active in the
>      .compatible = "atmel,at91sam9x5-rtc"
> case. That protected all existing users from any problems
> that might be introduced.

Indeed. But I am trying to build a patch that take the "broken/not
broken" information out of the IP revision number.

I try to write it and post it for your reviewing quickly.

Best regards,
Johan Hovold March 29, 2013, 3:57 p.m. UTC | #11
On Thu, Mar 28, 2013 at 05:16:00PM +0100, Nicolas Ferre wrote:
> On 03/28/2013 10:57 AM, Johan Hovold :
> > On Tue, Mar 26, 2013 at 05:09:59PM -0400, Douglas Gilbert wrote:
> >> On 13-03-26 03:27 PM, Johan Hovold wrote:
> >>> On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote:

[...]

> >> @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled)
> >>
> >>       if (enabled) {
> >>               at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM);
> >> +             at91_rtc_imr |= AT91_RTC_ALARM;
> > 
> > Here a barrier is needed to prevent the compiler from reordering the two
> > writes (i.e., mask update and interrupt enable).
> > 
> >>               at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM);
> >> -     } else
> >> +     } else {
> >>               at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM);
> > 
> > Here a barrier is again needed to prevent the compiler from reordering,
> > but we also need a register read back (of some RTC-register) before
> > updating the mask. Without the register read back, there could be a
> > window where the mask does not match the hardware state due to bus
> > latencies.
> > 
> > Note that even with a register read back there is a (theoretical)
> > possibility that the interrupts have not yet been disabled when the fake
> > mask is updated. The only way to know for sure is to poll RTC_IMR but
> > that is the very register you're trying to emulate.
> 
> In fact, if we protect the two code lines with the proper spinlock, we
> may find that all this reordering issue will not lead to a race
> condition. So I guess it is a simpler solution to the problem that you
> highlight.

A spinlock would also be a solution to the reordering problem, albeit a
slightly heavier one than the wmb(). A comment from from Doug made me
realise that one more barrier would actually be needed, so I think using
a spinlock is indeed preferred.

It would however not be sufficient to address the second problem, which
is that the interrupt disable is not immediate. An RTC-register read
back is needed to make sure the IDR-write has reached the peripheral,
but as I mentioned above it is merely a reasonable heuristic as the only
way to be certain that the interrupts have been disabled in hardware
would be to poll the RTC_IMR-register, which is register we are trying
to emulate.

In practice, the read-back heuristic would most likely be perfectly
sufficient, but I'd prefer this to only be used for SoCs where the
RTC_IMR-register is actually broken.

> >> +             at91_rtc_imr &= ~AT91_RTC_ALARM;
> >> +     }
> >>
> >>       return 0;
> >> }
> > 
> > In the worst-case scenario ignoring the shared RTC-interrupt could lead
> > to the disabling of the system interrupt and thus also PIT, DBGU, ...
> > 
> > I think this patch should be reverted and a fix for the broken SoCs be
> > implemented which does not penalise the other SoCs. That is, only
> > fall-back to faking IMR on the SoCs where it is actually broken.
> > 
> > Nicolas, should I send a revert patch and follow up with a fix for the
> > broken SoCs which includes the required barriers and read-backs?
> 
> I prefer to not distinguish between broken SoC and others. But I may be
> too optimistic...

If you agree with me that the second problem (interrupt-disable latency)
cannot be solved without resorting to heuristics (e.g., adding
read-backs or delays) then I think that should be the deciding point.

I'm responding to this message with an RFC of how the work-around could
be implemented (adding DT-support to the driver in the process). It
applies after having reverted the current workaround.

Note that only using the shadow interrupt mask when it is actually
needed does not add much extra code at all.

Thanks,
Johan
Johan Hovold March 29, 2013, 4:01 p.m. UTC | #12
On Thu, Mar 28, 2013 at 02:20:17PM -0400, Douglas Gilbert wrote:
> On 13-03-28 05:57 AM, Johan Hovold wrote:
> > On Tue, Mar 26, 2013 at 05:09:59PM -0400, Douglas Gilbert wrote:
> >> On 13-03-26 03:27 PM, Johan Hovold wrote:
> >>> On Fri, Mar 15, 2013 at 06:37:12PM +0100, Nicolas Ferre wrote:
> >>>> On some revisions of AT91 SoCs, the RTC IMR register is not working.
> >>>> Instead of elaborating a workaround for that specific SoC or IP version,
> >>>> we simply use a software variable to store the Interrupt Mask Register and
> >>>> modify it for each enabling/disabling of an interrupt. The overhead of this
> >>>> is negligible anyway.
> >>>
> >>> The patch does not add any memory barriers or register read-backs when
> >>> manipulating the interrupt-mask variable. This could possibly lead to
> >>> spurious interrupts both when enabling and disabling the various
> >>> RTC-interrupts due to write reordering and bus latencies.
> >>>
> >>> Has this been considered? And is this reason enough for a more targeted
> >>> work-around so that the SOCs with functional RTC_IMR are not affected?
> >>
> >> The SoCs in question use a single embedded ARM926EJ-S and
> >> according to the Atmel documentation, that CPU's instruction
> >> set contains no barrier (or related) instructions.
> >
> > The ARM926EJ-S actually does have a Drain Write Buffer instruction but
> > it's not used by the ARM barrier-implementation unless
> > CONFIG_ARM_DMA_MEM_BUFFERABLE or CONFIG_SMP is set.
> 
> The ARM926EJ-S is ARMv5 so CONFIG_ARM_DMA_MEM_BUFFERABLE is not
> available. SMP is not an option for arm/mach-at91.

Never said it was. I merely disputed the claim that the ARM926EJ-S has
no barrier instruction.

> > However, wmb() always implies a compiler barrier which is what is needed
> > in this case.
> 
> Even if wmb() did anything, would it make this case "safe"?

As I write above, wmb() always implies a compiler barrier, which is is
sufficient to prevent write reordering on this SoC. In particular, wmb()
is defined as a compiler barrier on AT91.

> >> In the arch/arm/mach-at91 sub-tree of the kernel source
> >> I can find no use of the wmb() call. Also checked all drivers
> >> in the kernel containing "at91" and none called wmb().
> >
> > I/O-operations are normally not reordered, but this patch is faking a
> > hardware register and thus extra care needs to be taken.
> >
> > To repeat:
> >
> >> @@ -198,9 +203,12 @@ static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled)
> >>
> >>        if (enabled) {
> >>                at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM);
> >> +             at91_rtc_imr |= AT91_RTC_ALARM;
> >
> > Here a barrier is needed to prevent the compiler from reordering the two
> > writes (i.e., mask update and interrupt enable).
> 
> Isn't either order potentially unsafe? So even if the compiler
> did foolishly re-order them, the sequence is still unsafe when
> a SYS interrupt splits those two lines (since the SYS interrupt
> is shared, it can occur at any time).

Good point. This would not cause any problem as the interrupt mask is
ANDed with the (hardware) status register in the interrupt handler, but
only if the status register is _guaranteed_ to be cleared before
updating the mask and that would require another wmb() after writing
SCCR above.

To avoid having to worry about such subtleties, a spinlock (and the
barriers it implies) seems like the most reasonable way to prevent the
race in this case. Note however that this only fixes the reordering part
of the problem. The register read back would still be needed below.

> >>                at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM);
> >> -     } else
> >> +     } else {
> >>                at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM);
> >
> > Here a barrier is again needed to prevent the compiler from reordering,
> > but we also need a register read back (of some RTC-register) before
> > updating the mask. Without the register read back, there could be a
> > window where the mask does not match the hardware state due to bus
> > latencies.
> >
> > Note that even with a register read back there is a (theoretical)
> > possibility that the interrupts have not yet been disabled when the fake
> > mask is updated. The only way to know for sure is to poll RTC_IMR but
> > that is the very register you're trying to emulate.
> >
> >> +             at91_rtc_imr &= ~AT91_RTC_ALARM;
> >> +     }
> >>
> >>        return 0;
> >> }
> >
> > In the worst-case scenario ignoring the shared RTC-interrupt could lead
> > to the disabling of the system interrupt and thus also PIT, DBGU, ...
> 
> And how often does the AT91_RTC_ALARM alarm interrupt fire?

That's not relevant is it? We should not knowingly add code that could
potentially blow up, no matter how improbable it is, and at least not
when there are options.

> > I think this patch should be reverted and a fix for the broken SoCs be
> > implemented which does not penalise the other SoCs. That is, only
> > fall-back to faking IMR on the SoCs where it is actually broken.
> 
> Even though I sent a patch to fix this problem to Nicolas,
> what was presented is not my version. In mine I added DT
> support:
> 
> #ifdef CONFIG_OF
> static const struct of_device_id at91rm9200_rtc_dt_ids[] = {
>         { .compatible = "atmel,at91rm9200-rtc", .data = &at91rm9200_config },
>         { .compatible = "atmel,at91sam9x5-rtc", .data = &at91sam9x5_config },
>         { /* sentinel */ }
> };
> MODULE_DEVICE_TABLE(of, at91rm9200_rtc_dt_ids);
> #else
> #define at91rm9200_rtc_dt_ids NULL
> #endif /* CONFIG_OF */
> 
> The shadow IMR variable was only active in the
>       .compatible = "atmel,at91sam9x5-rtc"
> case. That protected all existing users from any problems
> that might be introduced.

That's all I'm asking for.

Do you have a link to your patch and the thread it was posted in?

Is there a hardware erratum for the bug?

Thanks,
Johan
diff mbox

Patch

diff --git a/drivers/rtc/rtc-at91rm9200.c b/drivers/rtc/rtc-at91rm9200.c
index 79233d0..29b92e4 100644
--- a/drivers/rtc/rtc-at91rm9200.c
+++ b/drivers/rtc/rtc-at91rm9200.c
@@ -46,6 +46,7 @@  static DECLARE_COMPLETION(at91_rtc_updated);
 static unsigned int at91_alarm_year = AT91_RTC_EPOCH;
 static void __iomem *at91_rtc_regs;
 static int irq;
+static u32 at91_rtc_imr;
 
 /*
  * Decode time/date into rtc_time structure
@@ -110,9 +111,11 @@  static int at91_rtc_settime(struct device *dev, struct rtc_time *tm)
 	cr = at91_rtc_read(AT91_RTC_CR);
 	at91_rtc_write(AT91_RTC_CR, cr | AT91_RTC_UPDCAL | AT91_RTC_UPDTIM);
 
+	at91_rtc_imr |= AT91_RTC_ACKUPD;
 	at91_rtc_write(AT91_RTC_IER, AT91_RTC_ACKUPD);
 	wait_for_completion(&at91_rtc_updated);	/* wait for ACKUPD interrupt */
 	at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ACKUPD);
+	at91_rtc_imr &= ~AT91_RTC_ACKUPD;
 
 	at91_rtc_write(AT91_RTC_TIMR,
 			  bin2bcd(tm->tm_sec) << 0
@@ -144,7 +147,7 @@  static int at91_rtc_readalarm(struct device *dev, struct rtc_wkalrm *alrm)
 	tm->tm_yday = rtc_year_days(tm->tm_mday, tm->tm_mon, tm->tm_year);
 	tm->tm_year = at91_alarm_year - 1900;
 
-	alrm->enabled = (at91_rtc_read(AT91_RTC_IMR) & AT91_RTC_ALARM)
+	alrm->enabled = (at91_rtc_imr & AT91_RTC_ALARM)
 			? 1 : 0;
 
 	dev_dbg(dev, "%s(): %4d-%02d-%02d %02d:%02d:%02d\n", __func__,
@@ -170,6 +173,7 @@  static int at91_rtc_setalarm(struct device *dev, struct rtc_wkalrm *alrm)
 	tm.tm_sec = alrm->time.tm_sec;
 
 	at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM);
+	at91_rtc_imr &= ~AT91_RTC_ALARM;
 	at91_rtc_write(AT91_RTC_TIMALR,
 		  bin2bcd(tm.tm_sec) << 0
 		| bin2bcd(tm.tm_min) << 8
@@ -182,6 +186,7 @@  static int at91_rtc_setalarm(struct device *dev, struct rtc_wkalrm *alrm)
 
 	if (alrm->enabled) {
 		at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM);
+		at91_rtc_imr |= AT91_RTC_ALARM;
 		at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM);
 	}
 
@@ -198,9 +203,12 @@  static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled)
 
 	if (enabled) {
 		at91_rtc_write(AT91_RTC_SCCR, AT91_RTC_ALARM);
+		at91_rtc_imr |= AT91_RTC_ALARM;
 		at91_rtc_write(AT91_RTC_IER, AT91_RTC_ALARM);
-	} else
+	} else {
 		at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ALARM);
+		at91_rtc_imr &= ~AT91_RTC_ALARM;
+	}
 
 	return 0;
 }
@@ -209,12 +217,10 @@  static int at91_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled)
  */
 static int at91_rtc_proc(struct device *dev, struct seq_file *seq)
 {
-	unsigned long imr = at91_rtc_read(AT91_RTC_IMR);
-
 	seq_printf(seq, "update_IRQ\t: %s\n",
-			(imr & AT91_RTC_ACKUPD) ? "yes" : "no");
+			(at91_rtc_imr & AT91_RTC_ACKUPD) ? "yes" : "no");
 	seq_printf(seq, "periodic_IRQ\t: %s\n",
-			(imr & AT91_RTC_SECEV) ? "yes" : "no");
+			(at91_rtc_imr & AT91_RTC_SECEV) ? "yes" : "no");
 
 	return 0;
 }
@@ -229,7 +235,7 @@  static irqreturn_t at91_rtc_interrupt(int irq, void *dev_id)
 	unsigned int rtsr;
 	unsigned long events = 0;
 
-	rtsr = at91_rtc_read(AT91_RTC_SR) & at91_rtc_read(AT91_RTC_IMR);
+	rtsr = at91_rtc_read(AT91_RTC_SR) & at91_rtc_imr;
 	if (rtsr) {		/* this interrupt is shared!  Is it ours? */
 		if (rtsr & AT91_RTC_ALARM)
 			events |= (RTC_AF | RTC_IRQF);
@@ -293,6 +299,7 @@  static int __init at91_rtc_probe(struct platform_device *pdev)
 	at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ACKUPD | AT91_RTC_ALARM |
 					AT91_RTC_SECEV | AT91_RTC_TIMEV |
 					AT91_RTC_CALEV);
+	at91_rtc_imr = 0;
 
 	ret = request_irq(irq, at91_rtc_interrupt,
 				IRQF_SHARED,
@@ -331,6 +338,7 @@  static int __exit at91_rtc_remove(struct platform_device *pdev)
 	at91_rtc_write(AT91_RTC_IDR, AT91_RTC_ACKUPD | AT91_RTC_ALARM |
 					AT91_RTC_SECEV | AT91_RTC_TIMEV |
 					AT91_RTC_CALEV);
+	at91_rtc_imr = 0;
 	free_irq(irq, pdev);
 
 	rtc_device_unregister(rtc);
@@ -343,31 +351,35 @@  static int __exit at91_rtc_remove(struct platform_device *pdev)
 
 /* AT91RM9200 RTC Power management control */
 
-static u32 at91_rtc_imr;
+static u32 at91_rtc_bkpimr;
+
 
 static int at91_rtc_suspend(struct device *dev)
 {
 	/* this IRQ is shared with DBGU and other hardware which isn't
 	 * necessarily doing PM like we are...
 	 */
-	at91_rtc_imr = at91_rtc_read(AT91_RTC_IMR)
-			& (AT91_RTC_ALARM|AT91_RTC_SECEV);
-	if (at91_rtc_imr) {
-		if (device_may_wakeup(dev))
+	at91_rtc_bkpimr = at91_rtc_imr & (AT91_RTC_ALARM|AT91_RTC_SECEV);
+	if (at91_rtc_bkpimr) {
+		if (device_may_wakeup(dev)) {
 			enable_irq_wake(irq);
-		else
-			at91_rtc_write(AT91_RTC_IDR, at91_rtc_imr);
-	}
+		} else {
+			at91_rtc_write(AT91_RTC_IDR, at91_rtc_bkpimr);
+			at91_rtc_imr &= ~at91_rtc_bkpimr;
+		}
+}
 	return 0;
 }
 
 static int at91_rtc_resume(struct device *dev)
 {
-	if (at91_rtc_imr) {
-		if (device_may_wakeup(dev))
+	if (at91_rtc_bkpimr) {
+		if (device_may_wakeup(dev)) {
 			disable_irq_wake(irq);
-		else
-			at91_rtc_write(AT91_RTC_IER, at91_rtc_imr);
+		} else {
+			at91_rtc_imr |= at91_rtc_bkpimr;
+			at91_rtc_write(AT91_RTC_IER, at91_rtc_bkpimr);
+		}
 	}
 	return 0;
 }
diff --git a/drivers/rtc/rtc-at91rm9200.h b/drivers/rtc/rtc-at91rm9200.h
index da1945e..5f940b6 100644
--- a/drivers/rtc/rtc-at91rm9200.h
+++ b/drivers/rtc/rtc-at91rm9200.h
@@ -64,7 +64,6 @@ 
 #define	AT91_RTC_SCCR		0x1c			/* Status Clear Command Register */
 #define	AT91_RTC_IER		0x20			/* Interrupt Enable Register */
 #define	AT91_RTC_IDR		0x24			/* Interrupt Disable Register */
-#define	AT91_RTC_IMR		0x28			/* Interrupt Mask Register */
 
 #define	AT91_RTC_VER		0x2c			/* Valid Entry Register */
 #define		AT91_RTC_NVTIM		(1 <<  0)		/* Non valid Time */