diff mbox

[RESEND] pinctrl: rockchip: Disable interrupt when changing it's capability

Message ID 20180503065553.7762-1-jeffy.chen@rock-chips.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jeffy Chen May 3, 2018, 6:55 a.m. UTC
We saw spurious irq when changing irq's trigger type, for example
setting gpio-keys's wakeup irq trigger type.

And according to the TRM:
"Programming the GPIO registers for interrupt capability, edge-sensitive
or level-sensitive interrupts, and interrupt polarity should be
completed prior to enabling the interrupts on Port A in order to prevent
spurious glitches on the interrupt lines to the interrupt controller."

Reported-by: Brian Norris <briannorris@google.com>
Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
---

 drivers/pinctrl/pinctrl-rockchip.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Brian Norris May 7, 2018, 10:15 p.m. UTC | #1
+ Doug

Hi Jeffy,

On Thu, May 03, 2018 at 02:55:53PM +0800, Jeffy Chen wrote:
> We saw spurious irq when changing irq's trigger type, for example
> setting gpio-keys's wakeup irq trigger type.
> 
> And according to the TRM:
> "Programming the GPIO registers for interrupt capability, edge-sensitive
> or level-sensitive interrupts, and interrupt polarity should be
> completed prior to enabling the interrupts on Port A in order to prevent
> spurious glitches on the interrupt lines to the interrupt controller."
> 
> Reported-by: Brian Norris <briannorris@google.com>
> Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
> ---
> 
>  drivers/pinctrl/pinctrl-rockchip.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/pinctrl/pinctrl-rockchip.c b/drivers/pinctrl/pinctrl-rockchip.c
> index 3924779f55785..7ff45ec8330d1 100644
> --- a/drivers/pinctrl/pinctrl-rockchip.c
> +++ b/drivers/pinctrl/pinctrl-rockchip.c
> @@ -2727,9 +2727,19 @@ static int rockchip_irq_set_type(struct irq_data *d, unsigned int type)
>  		return -EINVAL;
>  	}
>  
> +	/**

The typical multiline comment style is to use only 1 asterisk -- 2
asterisks usually denote something more formal, like kerneldoc.

> +	 * According to the TRM, we should keep irq disabled during programming
> +	 * interrupt capability to prevent spurious glitches on the interrupt
> +	 * lines to the interrupt controller.
> +	 */

It's also worth noting that this case may come up in
rockchip_irq_demux() for EDGE_BOTH triggers:

		/*
		 * Triggering IRQ on both rising and falling edge
		 * needs manual intervention.
		 */
		if (bank->toggle_edge_mode & BIT(irq)) {
...
				polarity = readl_relaxed(bank->reg_base +
							 GPIO_INT_POLARITY);
				if (data & BIT(irq))
					polarity &= ~BIT(irq);
				else
					polarity |= BIT(irq);
				writel(polarity,
				       bank->reg_base + GPIO_INT_POLARITY);

AIUI, this case is not actually a problem, because this polarity
reconfiguration happens before we call generic_handle_irq(), so the
extra spurious interrupt is handled and cleared after this point. But it
seems like this should be noted somewhere in either the commit message,
the code comments, or both.

On the other hand...this also implies there may be a race condition
there, where we might lose an interrupt if there is an edge between the
re-configuration of the polarity in rockchip_irq_demux() and the
clearing/handling of the interrupt (handle_edge_irq() ->
chip->irq_ack()). If we have an edge between there, then we might ack
it, but leave the polarity such that we aren't ready for the next
(inverted) edge.

I'm not 100% sure about the above, so it might be good to verify it. But
I also expect this means there's really no way to 100% reliably
implement both-edge trigger types on hardware like this that doesn't
directly implement it. So I'm not sure what we'd do with that knowledge.

> +	data = readl(bank->reg_base + GPIO_INTEN);
> +	writel_relaxed(data & ~mask, gc->reg_base + GPIO_INTEN);

Not sure if this is a needless optimization: but couldn't you only
disable the interrupt (and make the level and polarity changes) only if
the level or polarity are actually changing? For example, it's possible
to end up here when changing from EDGE_BOTH to EDGE_RISING, but we might
not actually program a new value.

Brian

> +
>  	writel_relaxed(level, gc->reg_base + GPIO_INTTYPE_LEVEL);
>  	writel_relaxed(polarity, gc->reg_base + GPIO_INT_POLARITY);
>  
> +	writel_relaxed(data, gc->reg_base + GPIO_INTEN);
> +
>  	irq_gc_unlock(gc);
>  	raw_spin_unlock_irqrestore(&bank->slock, flags);
>  	clk_disable(bank->clk);
> -- 
> 2.11.0
> 
>
Jeffy Chen May 8, 2018, 1:36 a.m. UTC | #2
Hi Brian,

On 05/08/2018 06:15 AM, Brian Norris wrote:
> + Doug
>
> Hi Jeffy,
>
> On Thu, May 03, 2018 at 02:55:53PM +0800, Jeffy Chen wrote:
>> We saw spurious irq when changing irq's trigger type, for example
>> setting gpio-keys's wakeup irq trigger type.
>>
>> And according to the TRM:
>> "Programming the GPIO registers for interrupt capability, edge-sensitive
>> or level-sensitive interrupts, and interrupt polarity should be
>> completed prior to enabling the interrupts on Port A in order to prevent
>> spurious glitches on the interrupt lines to the interrupt controller."
>>
>> Reported-by: Brian Norris <briannorris@google.com>
>> Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
>> ---
>>
>>   drivers/pinctrl/pinctrl-rockchip.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/pinctrl/pinctrl-rockchip.c b/drivers/pinctrl/pinctrl-rockchip.c
>> index 3924779f55785..7ff45ec8330d1 100644
>> --- a/drivers/pinctrl/pinctrl-rockchip.c
>> +++ b/drivers/pinctrl/pinctrl-rockchip.c
>> @@ -2727,9 +2727,19 @@ static int rockchip_irq_set_type(struct irq_data *d, unsigned int type)
>>   		return -EINVAL;
>>   	}
>>
>> +	/**
>
> The typical multiline comment style is to use only 1 asterisk -- 2
> asterisks usually denote something more formal, like kerneldoc.

ok, will fix it.

>
>> +	 * According to the TRM, we should keep irq disabled during programming
>> +	 * interrupt capability to prevent spurious glitches on the interrupt
>> +	 * lines to the interrupt controller.
>> +	 */
>
> It's also worth noting that this case may come up in
> rockchip_irq_demux() for EDGE_BOTH triggers:
>
> 		/*
> 		 * Triggering IRQ on both rising and falling edge
> 		 * needs manual intervention.
> 		 */
> 		if (bank->toggle_edge_mode & BIT(irq)) {
> ...
> 				polarity = readl_relaxed(bank->reg_base +
> 							 GPIO_INT_POLARITY);
> 				if (data & BIT(irq))
> 					polarity &= ~BIT(irq);
> 				else
> 					polarity |= BIT(irq);
> 				writel(polarity,
> 				       bank->reg_base + GPIO_INT_POLARITY);
>
> AIUI, this case is not actually a problem, because this polarity
> reconfiguration happens before we call generic_handle_irq(), so the
> extra spurious interrupt is handled and cleared after this point. But it
> seems like this should be noted somewhere in either the commit message,
> the code comments, or both.

just from my testing, the spurious irq only happen when set EDGE_RISING 
to a gpio which is already high level, or set EDGE_FALLING to a gpio 
which is already low level.so the demux / EDGE_BOTH seem ok.

but adding comments as your suggested is a good idea, will do that.

>
> On the other hand...this also implies there may be a race condition
> there, where we might lose an interrupt if there is an edge between the
> re-configuration of the polarity in rockchip_irq_demux() and the
> clearing/handling of the interrupt (handle_edge_irq() ->
> chip->irq_ack()). If we have an edge between there, then we might ack
> it, but leave the polarity such that we aren't ready for the next
> (inverted) edge.

if let me guess, the unexpected irq we saw is the hardware trying to 
avoid losing irq? for example, we set a EDGE_RISING, and the hardware 
saw the gpio is already high, then though it might lost an irq, so fake 
one for safe?

i'll try to confirm it with IC guys.

>
> I'm not 100% sure about the above, so it might be good to verify it. But
> I also expect this means there's really no way to 100% reliably
> implement both-edge trigger types on hardware like this that doesn't
> directly implement it. So I'm not sure what we'd do with that knowledge.
>
>> +	data = readl(bank->reg_base + GPIO_INTEN);
>> +	writel_relaxed(data & ~mask, gc->reg_base + GPIO_INTEN);
>
> Not sure if this is a needless optimization: but couldn't you only
> disable the interrupt (and make the level and polarity changes) only if
> the level or polarity are actually changing? For example, it's possible
> to end up here when changing from EDGE_BOTH to EDGE_RISING, but we might
> not actually program a new value.

right, i noticed that too, will add a patch for that in v2.
>
> Brian
>
>> +
>>   	writel_relaxed(level, gc->reg_base + GPIO_INTTYPE_LEVEL);
>>   	writel_relaxed(polarity, gc->reg_base + GPIO_INT_POLARITY);
>>
>> +	writel_relaxed(data, gc->reg_base + GPIO_INTEN);
>> +
>>   	irq_gc_unlock(gc);
>>   	raw_spin_unlock_irqrestore(&bank->slock, flags);
>>   	clk_disable(bank->clk);
>> --
>> 2.11.0
>>
>>
>
>
>
Brian Norris May 8, 2018, 1:56 a.m. UTC | #3
On Tue, May 08, 2018 at 09:36:24AM +0800, Jeffy Chen wrote:
> On 05/08/2018 06:15 AM, Brian Norris wrote:
> > On the other hand...this also implies there may be a race condition
> > there, where we might lose an interrupt if there is an edge between the
> > re-configuration of the polarity in rockchip_irq_demux() and the
> > clearing/handling of the interrupt (handle_edge_irq() ->
> > chip->irq_ack()). If we have an edge between there, then we might ack
> > it, but leave the polarity such that we aren't ready for the next
> > (inverted) edge.
> 
> if let me guess, the unexpected irq we saw is the hardware trying to avoid
> losing irq? for example, we set a EDGE_RISING, and the hardware saw the gpio
> is already high, then though it might lost an irq, so fake one for safe?

I won't pretend to know what the IC designers were doing, but I don't
think that would resolve the problem I'm talking about. The sequence is
something like:
1. EDGE_BOTH IRQ occurs (e.g., low to high)
2. reconfigure polarity in rockchip_irq_demux() (polarity=low)
3. continue to handle_edge_irq()
4. another HW edge occurs (e.g., high to low)
5. handle_edge_irq() (from 3) acks (clears) IRQ (before a subsequent
   rockchip_irq_demux() gets a chance to run and flip the polarity)
...

Now the polarity is still low, but the next trigger should be a
low-to-high edge.

> i'll try to confirm it with IC guys.

Brian
Jeffy Chen May 8, 2018, 2:31 a.m. UTC | #4
Hi Brian,

On 05/08/2018 09:56 AM, Brian Norris wrote:
> On Tue, May 08, 2018 at 09:36:24AM +0800, Jeffy Chen wrote:
>> On 05/08/2018 06:15 AM, Brian Norris wrote:
>>> On the other hand...this also implies there may be a race condition
>>> there, where we might lose an interrupt if there is an edge between the
>>> re-configuration of the polarity in rockchip_irq_demux() and the
>>> clearing/handling of the interrupt (handle_edge_irq() ->
>>> chip->irq_ack()). If we have an edge between there, then we might ack
>>> it, but leave the polarity such that we aren't ready for the next
>>> (inverted) edge.
>>
>> if let me guess, the unexpected irq we saw is the hardware trying to avoid
>> losing irq? for example, we set a EDGE_RISING, and the hardware saw the gpio
>> is already high, then though it might lost an irq, so fake one for safe?
>
> I won't pretend to know what the IC designers were doing, but I don't
> think that would resolve the problem I'm talking about. The sequence is
> something like:
> 1. EDGE_BOTH IRQ occurs (e.g., low to high)
> 2. reconfigure polarity in rockchip_irq_demux() (polarity=low)
> 3. continue to handle_edge_irq()
> 4. another HW edge occurs (e.g., high to low)
> 5. handle_edge_irq() (from 3) acks (clears) IRQ (before a subsequent
>     rockchip_irq_demux() gets a chance to run and flip the polarity)
> ...
>
> Now the polarity is still low, but the next trigger should be a
> low-to-high edge.

oops, i see the problem.

so what if we do these:
1/ edge irq triggered
2/ read gpio level
3/ handle irq(ack irq)
4/ toggle edge mode(with a while gpio level check)

if the gpio changed in 2/ -> 3/, the 4/ will trigger an irq when writing 
GPIO_INT_POLARITY(which is what we are trying to avoid in the set_type case)

but this would not work if i'm wrong about how the HW fake an irq when 
changing POLARITY...


or maybe we could just check the gpio status again after 
handle_edge_irq, and correct the polarity in this case

>
>> i'll try to confirm it with IC guys.
>
> Brian
>
>
>
Jeffy Chen May 8, 2018, 2:56 a.m. UTC | #5
Hi Brian,

On 05/08/2018 10:31 AM, JeffyChen wrote:
> Hi Brian,
>
> On 05/08/2018 09:56 AM, Brian Norris wrote:
>> On Tue, May 08, 2018 at 09:36:24AM +0800, Jeffy Chen wrote:
>>> On 05/08/2018 06:15 AM, Brian Norris wrote:
>>>> On the other hand...this also implies there may be a race condition
>>>> there, where we might lose an interrupt if there is an edge between the
>>>> re-configuration of the polarity in rockchip_irq_demux() and the
>>>> clearing/handling of the interrupt (handle_edge_irq() ->
>>>> chip->irq_ack()). If we have an edge between there, then we might ack
>>>> it, but leave the polarity such that we aren't ready for the next
>>>> (inverted) edge.
>>>
>>> if let me guess, the unexpected irq we saw is the hardware trying to
>>> avoid
>>> losing irq? for example, we set a EDGE_RISING, and the hardware saw
>>> the gpio
>>> is already high, then though it might lost an irq, so fake one for safe?
>>
>> I won't pretend to know what the IC designers were doing, but I don't
>> think that would resolve the problem I'm talking about. The sequence is
>> something like:
>> 1. EDGE_BOTH IRQ occurs (e.g., low to high)
>> 2. reconfigure polarity in rockchip_irq_demux() (polarity=low)
>> 3. continue to handle_edge_irq()
>> 4. another HW edge occurs (e.g., high to low)
>> 5. handle_edge_irq() (from 3) acks (clears) IRQ (before a subsequent
>>     rockchip_irq_demux() gets a chance to run and flip the polarity)
>> ...
>>
>> Now the polarity is still low, but the next trigger should be a
>> low-to-high edge.
>
> oops, i see the problem.
>
> so what if we do these:
> 1/ edge irq triggered
> 2/ read gpio level
> 3/ handle irq(ack irq)
> 4/ toggle edge mode(with a while gpio level check)
>
> if the gpio changed in 2/ -> 3/, the 4/ will trigger an irq when writing
> GPIO_INT_POLARITY(which is what we are trying to avoid in the set_type
> case)
>
> but this would not work if i'm wrong about how the HW fake an irq when
> changing POLARITY...
>
>
> or maybe we could just check the gpio status again after
> handle_edge_irq, and correct the polarity in this case

i saw the pinctrl-msm.c do this in the ack(),

and also at the end of set_type(), which might avoid another race in the 
set_type()

>
>>
>>> i'll try to confirm it with IC guys.
>>
>> Brian
>>
>>
>>
>
Doug Anderson May 8, 2018, 7:46 p.m. UTC | #6
Hi,

On Mon, May 7, 2018 at 6:56 PM, Brian Norris <briannorris@chromium.org> wrote:
> On Tue, May 08, 2018 at 09:36:24AM +0800, Jeffy Chen wrote:
>> On 05/08/2018 06:15 AM, Brian Norris wrote:
>> > On the other hand...this also implies there may be a race condition
>> > there, where we might lose an interrupt if there is an edge between the
>> > re-configuration of the polarity in rockchip_irq_demux() and the
>> > clearing/handling of the interrupt (handle_edge_irq() ->
>> > chip->irq_ack()). If we have an edge between there, then we might ack
>> > it, but leave the polarity such that we aren't ready for the next
>> > (inverted) edge.
>>
>> if let me guess, the unexpected irq we saw is the hardware trying to avoid
>> losing irq? for example, we set a EDGE_RISING, and the hardware saw the gpio
>> is already high, then though it might lost an irq, so fake one for safe?
>
> I won't pretend to know what the IC designers were doing, but I don't
> think that would resolve the problem I'm talking about. The sequence is
> something like:
> 1. EDGE_BOTH IRQ occurs (e.g., low to high)
> 2. reconfigure polarity in rockchip_irq_demux() (polarity=low)
> 3. continue to handle_edge_irq()
> 4. another HW edge occurs (e.g., high to low)
> 5. handle_edge_irq() (from 3) acks (clears) IRQ (before a subsequent
>    rockchip_irq_demux() gets a chance to run and flip the polarity)
> ...
>
> Now the polarity is still low, but the next trigger should be a
> low-to-high edge.
>
>> i'll try to confirm it with IC guys.

One note is that in the case Brian points at (where we need to
simulate EDGE_BOTH by swapping edges) we purposely ignored the TRM and
we needed to do that to avoid losing interrupts.  For details, see
commit 53b1bfc76df2 ("pinctrl: rockchip: Avoid losing interrupts when
supporting both edges").  We did this because:

1. We believed that the IP block in Rockchip SoCs has nearly the same
logic as "gpio-dwapb.c" and that's what "gpio-dwapb.c" did.

2. We were actually losing real interrupts and this was the only way
we could figure out how to fix it.

When I tested that back in the day I was fairly convinced that we
weren't losing any interrupts in the EDGE_BOTH case after my fix, but
I certainly could have messed up.


For the EDGE_BOTH case it was important not to lose an interrupt
because, as you guys are talking about, we could end up configured the
wrong way.  I think in your case where you're just picking one
polarity losing an interrupt shouldn't matter since it's undefined
exactly if an edge happens while you're in the middle of executing
rockchip_irq_set_type().  Is that right?


-Doug
Jeffy Chen May 9, 2018, 2:21 a.m. UTC | #7
Hi Doug,

On 05/09/2018 03:46 AM, Doug Anderson wrote:
> One note is that in the case Brian points at (where we need to
> simulate EDGE_BOTH by swapping edges) we purposely ignored the TRM and
> we needed to do that to avoid losing interrupts.  For details, see
> commit 53b1bfc76df2 ("pinctrl: rockchip: Avoid losing interrupts when
> supporting both edges").  We did this because:
>
> 1. We believed that the IP block in Rockchip SoCs has nearly the same
> logic as "gpio-dwapb.c" and that's what "gpio-dwapb.c" did.
>

hmm, but i saw the gpio-dwapb.c actually toggle trigger after handle 
irq, which might avoid the race Brian mentioned:
+               generic_handle_irq(gpio_irq);
+               irq_status &= ~BIT(hwirq);
+
+               if ((irq_get_trigger_type(gpio_irq) & IRQ_TYPE_SENSE_MASK)
+                       == IRQ_TYPE_EDGE_BOTH)
+                       dwapb_toggle_trigger(gpio, hwirq);


and i also saw ./qcom/pinctrl-msm.c do the 
toggle(msm_gpio_update_dual_edge_pos) at the end of 
msm_gpio_irq_set_type and msm_gpio_irq_ack, that seems can avoid the 
polarity races too.

> 2. We were actually losing real interrupts and this was the only way
> we could figure out how to fix it.
>
> When I tested that back in the day I was fairly convinced that we
> weren't losing any interrupts in the EDGE_BOTH case after my fix, but
> I certainly could have messed up.
>
>
> For the EDGE_BOTH case it was important not to lose an interrupt
> because, as you guys are talking about, we could end up configured the
> wrong way.  I think in your case where you're just picking one
> polarity losing an interrupt shouldn't matter since it's undefined
> exactly if an edge happens while you're in the middle of executing
> rockchip_irq_set_type().  Is that right?

right, so we now have 2 cases: rockchip_irq_demux/ rockchip_irq_set_type

if i'm right about the spurious irq(only happen when set rising for a 
high gpio, or set falling for a low gpio), then:

1/ rockchip_irq_demux
it's important to not losing irqs in this case, maybe we can

a) ack irq
b) update polarity for edge both irq

we don't need to disable irq in b), since we would not hit the spurious 
irq cases here(always check gpio level to toggle it)


2/ rockchip_irq_set_type
it's important to not having spurious irqs

so we can disable irq during changing polarity only in these case:
((rising && gpio is heigh) || (falling && gpio is low))

i'm still confirming the spurious irq with IC guys.

>
>
> -Doug
>
>
>
Doug Anderson May 9, 2018, 5:18 a.m. UTC | #8
Hi,

On Tue, May 8, 2018 at 7:21 PM, JeffyChen <jeffy.chen@rock-chips.com> wrote:
> Hi Doug,
>
> On 05/09/2018 03:46 AM, Doug Anderson wrote:
>>
>> One note is that in the case Brian points at (where we need to
>> simulate EDGE_BOTH by swapping edges) we purposely ignored the TRM and
>> we needed to do that to avoid losing interrupts.  For details, see
>> commit 53b1bfc76df2 ("pinctrl: rockchip: Avoid losing interrupts when
>> supporting both edges").  We did this because:
>>
>> 1. We believed that the IP block in Rockchip SoCs has nearly the same
>> logic as "gpio-dwapb.c" and that's what "gpio-dwapb.c" did.
>>
>
> hmm, but i saw the gpio-dwapb.c actually toggle trigger after handle irq,
> which might avoid the race Brian mentioned:
> +               generic_handle_irq(gpio_irq);
> +               irq_status &= ~BIT(hwirq);
> +
> +               if ((irq_get_trigger_type(gpio_irq) & IRQ_TYPE_SENSE_MASK)
> +                       == IRQ_TYPE_EDGE_BOTH)
> +                       dwapb_toggle_trigger(gpio, hwirq);

The code you point at in dwapb_toggle_trigger() specifically is an
example of toggling the polarity _without_ disabling the interrupt in
between.  We took this as "working" code and as proof that it was OK
to change polarity without disabling the interrupt in-between.


> and i also saw ./qcom/pinctrl-msm.c do the
> toggle(msm_gpio_update_dual_edge_pos) at the end of msm_gpio_irq_set_type
> and msm_gpio_irq_ack, that seems can avoid the polarity races too.
>
>> 2. We were actually losing real interrupts and this was the only way
>> we could figure out how to fix it.
>>
>> When I tested that back in the day I was fairly convinced that we
>> weren't losing any interrupts in the EDGE_BOTH case after my fix, but
>> I certainly could have messed up.
>>
>>
>> For the EDGE_BOTH case it was important not to lose an interrupt
>> because, as you guys are talking about, we could end up configured the
>> wrong way.  I think in your case where you're just picking one
>> polarity losing an interrupt shouldn't matter since it's undefined
>> exactly if an edge happens while you're in the middle of executing
>> rockchip_irq_set_type().  Is that right?
>
>
> right, so we now have 2 cases: rockchip_irq_demux/ rockchip_irq_set_type
>
> if i'm right about the spurious irq(only happen when set rising for a high
> gpio, or set falling for a low gpio), then:
>
> 1/ rockchip_irq_demux
> it's important to not losing irqs in this case, maybe we can
>
> a) ack irq
> b) update polarity for edge both irq
>
> we don't need to disable irq in b), since we would not hit the spurious irq
> cases here(always check gpio level to toggle it)

Unless you have some sort of proof that rockchip_irq_demux(), I would
take it as an example of something that works.  I remember stress
testing the heck out of it.  Do you have some evidence that it's not
working?  I think Brian was simply speculating that there might be a
race here, but I don't think anyone has shown it have they?  Looking
back at my notes, the thing I really made sure to stress was that we
never got into a situation where we were losing an edge (AKA we were
never configured to look for a falling edge when the line was already
low).  I'm not sure I confirmed that we never got an extra interrupt.

I'm at home right now and I can't add prints and poke at things, but
as I understand it for edge interrupts the usual flow to make sure
interrupts aren't ever lost is:

1. See that the interrupt went off
2. Ack it (clear it)
3. Call the interrupt handler

...presumably in this case rockchip_irq_demux() is called after step
#2 (but I don't know if it's called before or after step #3).  If the
line is toggling like crazy while the 3 steps are going on, it's OK if
the interrupt handler is called more than once.  In general this could
be considered expected.  That's why you Ack before handling--any extra
edges that come in any time after the interrupt handler starts (even
after the very first instruction) need to cause the interrupt handler
to get called again.

This is different than Brian's understanding since he seemed to think
the Ack was happening later.  If you're in front of something where
you can add printouts, maybe you can tell us.  I tried to look through
the code and it was too twisted for me to be sure.


> 2/ rockchip_irq_set_type
> it's important to not having spurious irqs
>
> so we can disable irq during changing polarity only in these case:
> ((rising && gpio is heigh) || (falling && gpio is low))
>
> i'm still confirming the spurious irq with IC guys.

Hmmm, thinking about all this more, I'm curious how callers expect
this to work.  Certainly things are undefined if you have the
following situation:

Start: rising edge trigger, line is actually high
Request: change to falling edge trigger
Line falls during the request

In that case it's OK to throw the interrupt away because it can be
argued that the line could have fallen before the request actually
took place.  ...but it seems like there could be situations where the
user wouldn't expect interrupts to be thrown away by a call to
irq_set_type().  In other words:

Start: rising edge trigger, line is actually high
Request: change to falling edge trigger
Line falls, rises, and falls during the request

...in that case you'd expect that some sort of interrupt would have
gone off and not be thrown away.  No matter what instant in time the
request actually took place it should have caught an edge, right?


Said another way: As a client of irq_set_type() I'd expect it to not
throw away interrupts, but I'd expect that the change in type would be
atomic.  That is: if the interrupt came in before the type change in
type applied then it should trigger with the old rules.  If the
interrupt came in after the type change then it should trigger with
the new rules.


I would be tempted to confirm your testing and just clear the spurious
interrupts that you're aware of.  AKA: if there's no interrupt pending
and you change the type to "IRQ_TYPE_EDGE_RISING" or
"IRQ_TYPE_EDGE_FALLING" then you should clear the interrupt.  It's
still racy, but I guess it's the best you can do unless IC guys come
up with something better.



Anyway, it's past my bedtime.  Hopefully some of the above made sense.
I'm sure you'll tell me if it didn't or if I said something
stupid/wrong.  :-P

-Doug
Jeffy Chen May 9, 2018, 6:41 a.m. UTC | #9
Hi Doug,

Thanks for your reply :)

On 05/09/2018 01:18 PM, Doug Anderson wrote:
>> >
>> >
>> >right, so we now have 2 cases: rockchip_irq_demux/ rockchip_irq_set_type
>> >
>> >if i'm right about the spurious irq(only happen when set rising for a high
>> >gpio, or set falling for a low gpio), then:
>> >
>> >1/ rockchip_irq_demux
>> >it's important to not losing irqs in this case, maybe we can
>> >
>> >a) ack irq
>> >b) update polarity for edge both irq
>> >
>> >we don't need to disable irq in b), since we would not hit the spurious irq
>> >cases here(always check gpio level to toggle it)
> Unless you have some sort of proof that rockchip_irq_demux(), I would
> take it as an example of something that works.  I remember stress
> testing the heck out of it.  Do you have some evidence that it's not
> working?  I think Brian was simply speculating that there might be a
> race here, but I don't think anyone has shown it have they?  Looking
> back at my notes, the thing I really made sure to stress was that we
> never got into a situation where we were losing an edge (AKA we were
> never configured to look for a falling edge when the line was already
> low).  I'm not sure I confirmed that we never got an extra interrupt.
>
> I'm at home right now and I can't add prints and poke at things, but
> as I understand it for edge interrupts the usual flow to make sure
> interrupts aren't ever lost is:
>
> 1. See that the interrupt went off
> 2. Ack it (clear it)
> 3. Call the interrupt handler
>
> ...presumably in this case rockchip_irq_demux() is called after step
> #2 (but I don't know if it's called before or after step #3).  If the
> line is toggling like crazy while the 3 steps are going on, it's OK if
> the interrupt handler is called more than once.  In general this could
> be considered expected.  That's why you Ack before handling--any extra
> edges that come in any time after the interrupt handler starts (even
> after the very first instruction) need to cause the interrupt handler
> to get called again.
>
> This is different than Brian's understanding since he seemed to think
> the Ack was happening later.  If you're in front of something where
> you can add printouts, maybe you can tell us.  I tried to look through
> the code and it was too twisted for me to be sure.

i think the current edge both irq flow for rk3399 would be:
gic_handle_irq //irq-gic-v3.c
   handle_domain_irq
     rockchip_irq_demux //pinctrl-rockchip.c
       toggle polarity
       generic_handle_irq
         handle_edge_irq //kernel/irq
           irq_ack
           handle_irq_event
             action->handler

so i think the race might actually exist (maybe we can add some delay 
after toggle polarity to confirm)


BTW, checking other drivers, there're quite a few using this kind of 
toggle edge for edge both, and they go different ways:

1/ toggle it in ack():
mediatek/pinctrl-mtk-common.c
gpio-ingenic.c
gpio-ep93xx.c

2/ toggle it before handle_irq:
pinctrl-rockchip.c
pinctrl-armada-37xx.c
gpio-ath79.c
gpio-mxs.c
gpio-omap.c
gpio-mvebu.c

3/ toggle it after handle_irq:
gpio-dwapb.c
gpio-pmic-eic-sprd.c


would it make sense to support this kind of emulate edge both irq in 
some core codes?

>
>
>> >2/ rockchip_irq_set_type
>> >it's important to not having spurious irqs
>> >
>> >so we can disable irq during changing polarity only in these case:
>> >((rising && gpio is heigh) || (falling && gpio is low))
>> >
>> >i'm still confirming the spurious irq with IC guys.
> Hmmm, thinking about all this more, I'm curious how callers expect
> this to work.  Certainly things are undefined if you have the
> following situation:
>
> Start: rising edge trigger, line is actually high
> Request: change to falling edge trigger
> Line falls during the request
>
> In that case it's OK to throw the interrupt away because it can be
> argued that the line could have fallen before the request actually
> took place.  ...but it seems like there could be situations where the
> user wouldn't expect interrupts to be thrown away by a call to
> irq_set_type().  In other words:
>
> Start: rising edge trigger, line is actually high
> Request: change to falling edge trigger
> Line falls, rises, and falls during the request
>
> ...in that case you'd expect that some sort of interrupt would have
> gone off and not be thrown away.  No matter what instant in time the
> request actually took place it should have caught an edge, right?
>
>
> Said another way: As a client of irq_set_type() I'd expect it to not
> throw away interrupts, but I'd expect that the change in type would be
> atomic.  That is: if the interrupt came in before the type change in
> type applied then it should trigger with the old rules.  If the
> interrupt came in after the type change then it should trigger with
> the new rules.
>
>
> I would be tempted to confirm your testing and just clear the spurious
> interrupts that you're aware of.  AKA: if there's no interrupt pending
> and you change the type to "IRQ_TYPE_EDGE_RISING" or
> "IRQ_TYPE_EDGE_FALLING" then you should clear the interrupt.  It's
> still racy, but I guess it's the best you can do unless IC guys come
> up with something better.
>

hmmm, right, clear the spurious irq seems to be a better way, will try 
to do it in the next version.
>
>
> Anyway, it's past my bedtime.  Hopefully some of the above made sense.
> I'm sure you'll tell me if it didn't or if I said something
> stupid/wrong.:-P
>
> -Doug
>
>
>
Brian Norris May 10, 2018, 10:16 p.m. UTC | #10
+ irqchip maintainers

[ irqchip is weird -- it's all over drivers/{pinctrl,gpio,irqchip}/ :D ]

Hi Doug,

On Tue, May 08, 2018 at 10:18:18PM -0700, Doug Anderson wrote:
> On Tue, May 8, 2018 at 7:21 PM, JeffyChen <jeffy.chen@rock-chips.com> wrote:
> > On 05/09/2018 03:46 AM, Doug Anderson wrote:
> >> One note is that in the case Brian points at (where we need to
> >> simulate EDGE_BOTH by swapping edges) we purposely ignored the TRM and
> >> we needed to do that to avoid losing interrupts.  For details, see
> >> commit 53b1bfc76df2 ("pinctrl: rockchip: Avoid losing interrupts when
> >> supporting both edges").  We did this because:
> >>
> >> 1. We believed that the IP block in Rockchip SoCs has nearly the same
> >> logic as "gpio-dwapb.c" and that's what "gpio-dwapb.c" did.
> >>
> >
> > hmm, but i saw the gpio-dwapb.c actually toggle trigger after handle irq,
> > which might avoid the race Brian mentioned:
> > +               generic_handle_irq(gpio_irq);
> > +               irq_status &= ~BIT(hwirq);
> > +
> > +               if ((irq_get_trigger_type(gpio_irq) & IRQ_TYPE_SENSE_MASK)
> > +                       == IRQ_TYPE_EDGE_BOTH)
> > +                       dwapb_toggle_trigger(gpio, hwirq);
> 
> The code you point at in dwapb_toggle_trigger() specifically is an
> example of toggling the polarity _without_ disabling the interrupt in
> between.  We took this as "working" code and as proof that it was OK
> to change polarity without disabling the interrupt in-between.

There's a crucial ordering difference though: gpio-dwapb performs its
polarity adjustments *after* calling generic_handle_irq(), which among
other things calls ->irq_ack(). This means that it's re-ensuring that
the polarity is correct *after* the point at which it last ack'ed the
interrupt. So there's no chance of it clearing a second interrupt
without appropriately reconfiguring the polarity.

> > and i also saw ./qcom/pinctrl-msm.c do the
> > toggle(msm_gpio_update_dual_edge_pos) at the end of msm_gpio_irq_set_type
> > and msm_gpio_irq_ack, that seems can avoid the polarity races too.
> >
> >> 2. We were actually losing real interrupts and this was the only way
> >> we could figure out how to fix it.
> >>
> >> When I tested that back in the day I was fairly convinced that we
> >> weren't losing any interrupts in the EDGE_BOTH case after my fix, but
> >> I certainly could have messed up.
> >>
> >>
> >> For the EDGE_BOTH case it was important not to lose an interrupt
> >> because, as you guys are talking about, we could end up configured the
> >> wrong way.  I think in your case where you're just picking one
> >> polarity losing an interrupt shouldn't matter since it's undefined
> >> exactly if an edge happens while you're in the middle of executing
> >> rockchip_irq_set_type().  Is that right?
> >
> >
> > right, so we now have 2 cases: rockchip_irq_demux/ rockchip_irq_set_type
> >
> > if i'm right about the spurious irq(only happen when set rising for a high
> > gpio, or set falling for a low gpio), then:
> >
> > 1/ rockchip_irq_demux
> > it's important to not losing irqs in this case, maybe we can
> >
> > a) ack irq
> > b) update polarity for edge both irq
> >
> > we don't need to disable irq in b), since we would not hit the spurious irq
> > cases here(always check gpio level to toggle it)
> 
> Unless you have some sort of proof that rockchip_irq_demux(), I would
> take it as an example of something that works.  I remember stress
> testing the heck out of it.  Do you have some evidence that it's not
> working?  I think Brian was simply speculating that there might be a
> race here, but I don't think anyone has shown it have they?  Looking
> back at my notes, the thing I really made sure to stress was that we
> never got into a situation where we were losing an edge (AKA we were
> never configured to look for a falling edge when the line was already
> low).  I'm not sure I confirmed that we never got an extra interrupt.

I'll agree wholeheartedly that I'm only at the speculation stage right
now :) I can try to poke at it sometime if I get some cycles. I'd
definitely want to get better test results to prove this before changing
this part.

This is really just a side tangent anyway, because apparently the
existing code is working well enough for rockchip_irq_demux(), and for
$subject, we're really only working on improving set_type().

> I'm at home right now and I can't add prints and poke at things, but
> as I understand it for edge interrupts the usual flow to make sure
> interrupts aren't ever lost is:
> 
> 1. See that the interrupt went off
> 2. Ack it (clear it)
> 3. Call the interrupt handler
> 
> ...presumably in this case rockchip_irq_demux() is called after step
> #2 (but I don't know if it's called before or after step #3).  If the

One thing to note here is that we're talking about a 2-level (chained)
interrupt system. We've got the GIC, which does all of 1, 2, 3 at its
level, and as part of #3 for the GIC, it runs the 2nd-level handler --
rockchip_irq_demux() -- which has to perform all of 1, 2, 3 at its level.

1 -> this is looping over GPIO_INT_STATUS in rockchip_irq_demux()
2 -> this happens when writing to GPIO_PORTS_EOI, which only is called
     by ->irq_ack() (irq_gc_ack_set_bit()) ... which happens from:
       rockchip_irq_demux()
         -> generic_handle_irq()
	   -> handle_edge_irq()
             -> desc->irq_data.chip->irq_ack() = irq_gc_ack_set_bit()
3 -> same callpath as 2, except it's
     -> handle_edge_irq()
       -> handle_irq_event()

Those all happen in the correct order.

But the problem is that you left out the part about "change polarity for
emulating EDGE_BOTH"; in your example (gpio-dwapb.c), polarity
adjustment happens *after* 2; in Rockchip's driver, we have it before.
I'm pretty sure dwapb is correct, and we are not.

> line is toggling like crazy while the 3 steps are going on, it's OK if
> the interrupt handler is called more than once.  In general this could
> be considered expected.  That's why you Ack before handling--any extra
> edges that come in any time after the interrupt handler starts (even
> after the very first instruction) need to cause the interrupt handler
> to get called again.
> 
> This is different than Brian's understanding since he seemed to think
> the Ack was happening later.  If you're in front of something where
> you can add printouts, maybe you can tell us.  I tried to look through
> the code and it was too twisted for me to be sure.

I'm not sure your understanding of my understanding is accurate :)
Hopefully the above clarifies what I'm thinking?

> > 2/ rockchip_irq_set_type
> > it's important to not having spurious irqs
> >
> > so we can disable irq during changing polarity only in these case:
> > ((rising && gpio is heigh) || (falling && gpio is low))
> >
> > i'm still confirming the spurious irq with IC guys.
> 
> Hmmm, thinking about all this more, I'm curious how callers expect
> this to work.  Certainly things are undefined if you have the
> following situation:
> 
> Start: rising edge trigger, line is actually high
> Request: change to falling edge trigger
> Line falls during the request
> 
> In that case it's OK to throw the interrupt away because it can be
> argued that the line could have fallen before the request actually
> took place.  ...but it seems like there could be situations where the
> user wouldn't expect interrupts to be thrown away by a call to
> irq_set_type().  In other words:
> 
> Start: rising edge trigger, line is actually high
> Request: change to falling edge trigger
> Line falls, rises, and falls during the request
> 
> ...in that case you'd expect that some sort of interrupt would have
> gone off and not be thrown away.  No matter what instant in time the
> request actually took place it should have caught an edge, right?
> 
> 
> Said another way: As a client of irq_set_type() I'd expect it to not
> throw away interrupts, but I'd expect that the change in type would be
> atomic.  That is: if the interrupt came in before the type change in
> type applied then it should trigger with the old rules.  If the
> interrupt came in after the type change then it should trigger with
> the new rules.

I'm not sure it's totally possible to differentiate these, but that
seems about right I think.

> I would be tempted to confirm your testing and just clear the spurious
> interrupts that you're aware of.  AKA: if there's no interrupt pending
> and you change the type to "IRQ_TYPE_EDGE_RISING" or
> "IRQ_TYPE_EDGE_FALLING" then you should clear the interrupt.  It's
> still racy, but I guess it's the best you can do unless IC guys come
> up with something better.

Thanks! Yeah, clearing (rather than temporarily disabling) seems to make
more sense.

> Anyway, it's past my bedtime.  Hopefully some of the above made sense.
> I'm sure you'll tell me if it didn't or if I said something
> stupid/wrong.  :-P

Brian
diff mbox

Patch

diff --git a/drivers/pinctrl/pinctrl-rockchip.c b/drivers/pinctrl/pinctrl-rockchip.c
index 3924779f55785..7ff45ec8330d1 100644
--- a/drivers/pinctrl/pinctrl-rockchip.c
+++ b/drivers/pinctrl/pinctrl-rockchip.c
@@ -2727,9 +2727,19 @@  static int rockchip_irq_set_type(struct irq_data *d, unsigned int type)
 		return -EINVAL;
 	}
 
+	/**
+	 * According to the TRM, we should keep irq disabled during programming
+	 * interrupt capability to prevent spurious glitches on the interrupt
+	 * lines to the interrupt controller.
+	 */
+	data = readl(bank->reg_base + GPIO_INTEN);
+	writel_relaxed(data & ~mask, gc->reg_base + GPIO_INTEN);
+
 	writel_relaxed(level, gc->reg_base + GPIO_INTTYPE_LEVEL);
 	writel_relaxed(polarity, gc->reg_base + GPIO_INT_POLARITY);
 
+	writel_relaxed(data, gc->reg_base + GPIO_INTEN);
+
 	irq_gc_unlock(gc);
 	raw_spin_unlock_irqrestore(&bank->slock, flags);
 	clk_disable(bank->clk);