Message ID | 20241028160748.489596-9-u.kleine-koenig@baylibre.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | iio: adc: ad7124: Make it work on de10-nano | expand |
On Mon, 2024-10-28 at 17:07 +0100, Uwe Kleine-König wrote: > Some of the ADCs by Analog signal their irq condition on the MISO line. > So typically that line is connected to an SPI controller and a GPIO. The > GPIO is used as input and the respective interrupt is enabled when the > last SPI transfer is completed. > > Depending on the GPIO controller the toggling MISO line might make the > interrupt pending even while it's masked. In that case the irq handler > is called immediately after irq_enable() and so before the device > actually pulls that line low which results in non-sense values being > reported to the upper layers. > > The only way to find out if the line was actually pulled low is to read > the GPIO. (There is a flag in AD7124's status register that also signals > if an interrupt was asserted, but reading that register toggles the MISO > line and so might trigger another spurious interrupt.) > > Add the possibility to specify an interrupt GPIO in the machine > description instead of a plain interrupt. This GPIO is used as interrupt > source and to check if the irq line is actually active in the irq > handler. > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> > --- Hi all, Regarding this, I do share some of the concerns already raised by Jonathan. I fear that we're papering around an issue with the IRQ controller rather than being an issue with the device. When I look at irq_disable() docs [1], it feels that we're already doing what we're supposed to do. IOW, we disable the lazy approach so we *should* not get any pending IRQ. Also looking at drivers as the xilinx gpio controller, it seems some are careful about this [2] and make sure to clear all pending IRQs when unmasking. Jonathan also said this: "True enough - that race is a possibility, but not all interrupt inputs are capable of gpio usage whilst setup to received interrupts." To my understanding this also means this is doomed to fail for some devices or am I not following it? All that said, my naive feeling would be for a masked line to not get any pending IRQ and if it does, the driver should make sure to clean all outstanding interrupts when unmasking. But I'm far from being an expert of the IRQ subsystem. Maybe it would be interesting to get some inputs about someone who actually knows better? [1]: https://elixir.bootlin.com/linux/v6.11.5/source/kernel/irq/chip.c#L366 [2]: https://elixir.bootlin.com/linux/v6.11.5/source/kernel/irq/chip.c#L366 - Nuno Sá
On Wed, 30 Oct 2024 14:04:58 +0100 Nuno Sá <noname.nuno@gmail.com> wrote: > On Mon, 2024-10-28 at 17:07 +0100, Uwe Kleine-König wrote: > > Some of the ADCs by Analog signal their irq condition on the MISO line. > > So typically that line is connected to an SPI controller and a GPIO. The > > GPIO is used as input and the respective interrupt is enabled when the > > last SPI transfer is completed. > > > > Depending on the GPIO controller the toggling MISO line might make the > > interrupt pending even while it's masked. In that case the irq handler > > is called immediately after irq_enable() and so before the device > > actually pulls that line low which results in non-sense values being > > reported to the upper layers. > > > > The only way to find out if the line was actually pulled low is to read > > the GPIO. (There is a flag in AD7124's status register that also signals > > if an interrupt was asserted, but reading that register toggles the MISO > > line and so might trigger another spurious interrupt.) > > > > Add the possibility to specify an interrupt GPIO in the machine > > description instead of a plain interrupt. This GPIO is used as interrupt > > source and to check if the irq line is actually active in the irq > > handler. > > > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> > > --- > > Hi all, > > Regarding this, I do share some of the concerns already raised by Jonathan. I fear > that we're papering around an issue with the IRQ controller rather than being an > issue with the device. When I look at irq_disable() docs [1], it feels that we're > already doing what we're supposed to do. IOW, we disable the lazy approach so we > *should* not get any pending IRQ. Also looking at drivers as the xilinx gpio > controller, it seems some are careful about this [2] and make sure to clear all > pending IRQs when unmasking. Your links are both to the same place. > > Jonathan also said this: > > "True enough - that race is a possibility, but not all interrupt inputs > are capable of gpio usage whilst setup to received interrupts." Race should be easy to avoid using a level interrupt now I think more on that: can't miss a level. > > To my understanding this also means this is doomed to fail for some devices or am I > not following it? If you were wired to one of those, you couldn't use the GPIO trick, but then don't have a GPIO in your DT in that case. > > All that said, my naive feeling would be for a masked line to not get any pending IRQ > and if it does, the driver should make sure to clean all outstanding interrupts when > unmasking. But I'm far from being an expert of the IRQ subsystem. Maybe it would be > interesting to get some inputs about someone who actually knows better? +CC Thomas Gleixner, Hi Thomas, Annoying case where a wire is both the interrupt source for dataready and the SPI data line (if separate clock signal is toggling) So currently the driver masks interrupts at the host end, but we have at least one interrupt controller where they end up pending and fire on reenabling the interrupt. Querying the device to check the status register then ends up causing it to happen again, so that doesn't help. Proposal is to query it as a GPIO (or maybe a separate GPIO wired to the same pin) to check the level when an interrupt comes in. Any thoughts on this nasty hardware and what is responsiblity of the device driver vs the interrupt controller driver in this case? Jonathan > > [1]: https://elixir.bootlin.com/linux/v6.11.5/source/kernel/irq/chip.c#L366 > [2]: https://elixir.bootlin.com/linux/v6.11.5/source/kernel/irq/chip.c#L366 > > - Nuno Sá > >
Hello, On Wed, Oct 30, 2024 at 08:44:29PM +0000, Jonathan Cameron wrote: > On Wed, 30 Oct 2024 14:04:58 +0100 > Nuno Sá <noname.nuno@gmail.com> wrote: > > > On Mon, 2024-10-28 at 17:07 +0100, Uwe Kleine-König wrote: > > > Some of the ADCs by Analog signal their irq condition on the MISO line. > > > So typically that line is connected to an SPI controller and a GPIO. The > > > GPIO is used as input and the respective interrupt is enabled when the > > > last SPI transfer is completed. > > > > > > Depending on the GPIO controller the toggling MISO line might make the > > > interrupt pending even while it's masked. In that case the irq handler > > > is called immediately after irq_enable() and so before the device > > > actually pulls that line low which results in non-sense values being > > > reported to the upper layers. > > > > > > The only way to find out if the line was actually pulled low is to read > > > the GPIO. (There is a flag in AD7124's status register that also signals > > > if an interrupt was asserted, but reading that register toggles the MISO > > > line and so might trigger another spurious interrupt.) > > > > > > Add the possibility to specify an interrupt GPIO in the machine > > > description instead of a plain interrupt. This GPIO is used as interrupt > > > source and to check if the irq line is actually active in the irq > > > handler. > > > > > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> > > > --- > > > > Hi all, > > > > Regarding this, I do share some of the concerns already raised by Jonathan. I fear > > that we're papering around an issue with the IRQ controller rather than being an > > issue with the device. When I look at irq_disable() docs [1], it feels that we're > > already doing what we're supposed to do. IOW, we disable the lazy approach so we > > *should* not get any pending IRQ. I think this is wrong and you always have to be prepared to see an irq triggering that became pending while masked. > > Also looking at drivers as the xilinx gpio controller, it seems some > > are careful about this [2] and make sure to clear all pending IRQs > > when unmasking. > Your links are both to the same place. The right one is: https://elixir.bootlin.com/linux/v6.11.5/source/drivers/gpio/gpio-xilinx.c#L419 I think this is buggy, see below for the reasoning. > > Jonathan also said this: > > > > "True enough - that race is a possibility, but not all interrupt inputs > > are capable of gpio usage whilst setup to received interrupts." > Race should be easy to avoid using a level interrupt now I think more on that: > can't miss a level. In general this isn't true. If it were that easy we could just assume all irqs being level interrupts and simplify the irq code a bit. At least for the ad7124 if a conversion is done, the chip holds the line low until the next conversion is done. In that case it deasserts DOUT/̅R̅D̅Y for a short while to create another falling edge signalling another event. I can imagine this to confuse level detection?! > > To my understanding this also means this is doomed to fail for some devices or am I > > not following it? > > If you were wired to one of those, you couldn't use the GPIO trick, but then > don't have a GPIO in your DT in that case. Yes. If the device isn't properly connected in hardware you're out of luck. But that is also true if the spi clock line isn't connected. So apart from the requirement that "properly" involves things that are unusual for other SPI devices, that's expected. Having said that it was clear before because the MISO (aka DOUT/̅R̅D̅Y) line was already know to have to be connected to an irq capable pin. > > All that said, my naive feeling would be for a masked line to not get any pending IRQ > > and if it does, the driver should make sure to clean all outstanding interrupts when > > unmasking. But I'm far from being an expert of the IRQ subsystem. Maybe it would be > > interesting to get some inputs about someone who actually knows better? > +CC Thomas Gleixner, > > Annoying case where a wire is both the interrupt source for dataready and the > SPI data line (if separate clock signal is toggling) So currently the driver > masks interrupts at the host end, but we have at least one interrupt controller > where they end up pending and fire on reenabling the interrupt. Querying the > device to check the status register then ends up causing it to happen again, > so that doesn't help. > > Proposal is to query it as a GPIO (or maybe a separate GPIO wired to the same > pin) to check the level when an interrupt comes in. In my understanding it's the expected behaviour of an irq controller that a masked irq becomes pending if the irq event (level or edge) happens and then triggers immediately after enable_irq() -- independent of laziness being used or not. That's also the only way to prevent missing interrupts. To understand that consider an irq that signals there is data in a fifo. The handler reads that data out and when it's empty returns. Returning results in unmasking the interrupt again. If new data arrives between seeing the fifo empty and reenabling the irq you miss the event if the irq doesn't become pending while it's masked. > Any thoughts on this nasty hardware and what is responsiblity of the device > driver vs the interrupt controller driver in this case? +1 Best regards Uwe
On Thu, 2024-10-31 at 11:40 +0100, Uwe Kleine-König wrote: > Hello, > > On Wed, Oct 30, 2024 at 08:44:29PM +0000, Jonathan Cameron wrote: > > On Wed, 30 Oct 2024 14:04:58 +0100 > > Nuno Sá <noname.nuno@gmail.com> wrote: > > > > > On Mon, 2024-10-28 at 17:07 +0100, Uwe Kleine-König wrote: > > > > Some of the ADCs by Analog signal their irq condition on the MISO line. > > > > So typically that line is connected to an SPI controller and a GPIO. The > > > > GPIO is used as input and the respective interrupt is enabled when the > > > > last SPI transfer is completed. > > > > > > > > Depending on the GPIO controller the toggling MISO line might make the > > > > interrupt pending even while it's masked. In that case the irq handler > > > > is called immediately after irq_enable() and so before the device > > > > actually pulls that line low which results in non-sense values being > > > > reported to the upper layers. > > > > > > > > The only way to find out if the line was actually pulled low is to read > > > > the GPIO. (There is a flag in AD7124's status register that also signals > > > > if an interrupt was asserted, but reading that register toggles the MISO > > > > line and so might trigger another spurious interrupt.) > > > > > > > > Add the possibility to specify an interrupt GPIO in the machine > > > > description instead of a plain interrupt. This GPIO is used as interrupt > > > > source and to check if the irq line is actually active in the irq > > > > handler. > > > > > > > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> > > > > --- > > > > > > Hi all, > > > > > > Regarding this, I do share some of the concerns already raised by Jonathan. I > > > fear > > > that we're papering around an issue with the IRQ controller rather than being > > > an > > > issue with the device. When I look at irq_disable() docs [1], it feels that > > > we're > > > already doing what we're supposed to do. IOW, we disable the lazy approach so > > > we > > > *should* not get any pending IRQ. > > I think this is wrong and you always have to be prepared to see an irq > triggering that became pending while masked. > > > > Also looking at drivers as the xilinx gpio controller, it seems some > > > are careful about this [2] and make sure to clear all pending IRQs > > > when unmasking. > > Your links are both to the same place. > > The right one is: > https://elixir.bootlin.com/linux/v6.11.5/source/drivers/gpio/gpio-xilinx.c#L419 > > I think this is buggy, see below for the reasoning. > > > > Jonathan also said this: > > > > > > "True enough - that race is a possibility, but not all interrupt inputs > > > are capable of gpio usage whilst setup to received interrupts." > > Race should be easy to avoid using a level interrupt now I think more on that: > > can't miss a level. > > In general this isn't true. If it were that easy we could just assume > all irqs being level interrupts and simplify the irq code a bit. At > least for the ad7124 if a conversion is done, the chip holds the line > low until the next conversion is done. In that case it deasserts > DOUT/̅R̅D̅Y for a short while to create another falling edge signalling > another event. I can imagine this to confuse level detection?! > > > > To my understanding this also means this is doomed to fail for some devices or > > > am I > > > not following it? > > > > If you were wired to one of those, you couldn't use the GPIO trick, but then > > don't have a GPIO in your DT in that case. > > Yes. If the device isn't properly connected in hardware you're out of > luck. But that is also true if the spi clock line isn't connected. So > apart from the requirement that "properly" involves things that are > unusual for other SPI devices, that's expected. Having said that it was > clear before because the MISO (aka DOUT/̅R̅D̅Y) line was already know to have > to be connected to an irq capable pin. > > > > All that said, my naive feeling would be for a masked line to not get any > > > pending IRQ > > > and if it does, the driver should make sure to clean all outstanding interrupts > > > when > > > unmasking. But I'm far from being an expert of the IRQ subsystem. Maybe it > > > would be > > > interesting to get some inputs about someone who actually knows better? > > +CC Thomas Gleixner, > > > > Annoying case where a wire is both the interrupt source for dataready and the > > SPI data line (if separate clock signal is toggling) So currently the driver > > masks interrupts at the host end, but we have at least one interrupt controller > > where they end up pending and fire on reenabling the interrupt. Querying the > > device to check the status register then ends up causing it to happen again, > > so that doesn't help. > > > > Proposal is to query it as a GPIO (or maybe a separate GPIO wired to the same > > pin) to check the level when an interrupt comes in. > > In my understanding it's the expected behaviour of an irq controller > that a masked irq becomes pending if the irq event (level or edge) > happens and then triggers immediately after enable_irq() -- independent > of laziness being used or not. > I'm really not sure about that. If a consumer disables/masks an interrupt, then I would think it expects no interrupts. If one comes during that time, it seems reasonable to me that the IRQ is discarded. And if the expected behavior is to have a pending IRQ if we got it while masked, then I'm not sure why we have the UNLAZY thing. I mean, let's always do the lazy approach which is effectively only masking the line if we get an IRQ while disabled (and also mark it as pending). If both approaches result in a pending IRQ... But again, not sure what are the expectations here and far from an expert on the IRQ subsystem. > That's also the only way to prevent missing interrupts. To understand > that consider an irq that signals there is data in a fifo. The handler > reads that data out and when it's empty returns. Returning results in > unmasking the interrupt again. If new data arrives between seeing the > fifo empty and reenabling the irq you miss the event if the irq doesn't > become pending while it's masked. > > I could argue that's the old issue of not handling the IRQs fast enough in the handler in which case some other alternatives would have to be implemented... Anyways, I guess it can go both ways. Because it can also be plausible that if one IRQ came while the line is masked then the consumer does not really care about it and if we get it right after enable_irq(), we get a spurious interrupt (which is the case here). But well, in the end you might be right in the sense that putting this awareness into the irqchip is asking for other issues. And for devices that cannot disable the IRQ at the device level and rely on disable_irq(), we need to handle it at the device level. - Nuno Sá >
On Thu, 2024-10-31 at 13:05 +0100, Nuno Sá wrote: > On Thu, 2024-10-31 at 11:40 +0100, Uwe Kleine-König wrote: > > Hello, > > > > On Wed, Oct 30, 2024 at 08:44:29PM +0000, Jonathan Cameron wrote: > > > On Wed, 30 Oct 2024 14:04:58 +0100 > > > Nuno Sá <noname.nuno@gmail.com> wrote: > > > > > > > On Mon, 2024-10-28 at 17:07 +0100, Uwe Kleine-König wrote: > > > > > Some of the ADCs by Analog signal their irq condition on the MISO line. > > > > > So typically that line is connected to an SPI controller and a GPIO. The > > > > > GPIO is used as input and the respective interrupt is enabled when the > > > > > last SPI transfer is completed. > > > > > > > > > > Depending on the GPIO controller the toggling MISO line might make the > > > > > interrupt pending even while it's masked. In that case the irq handler > > > > > is called immediately after irq_enable() and so before the device > > > > > actually pulls that line low which results in non-sense values being > > > > > reported to the upper layers. > > > > > > > > > > The only way to find out if the line was actually pulled low is to read > > > > > the GPIO. (There is a flag in AD7124's status register that also signals > > > > > if an interrupt was asserted, but reading that register toggles the MISO > > > > > line and so might trigger another spurious interrupt.) > > > > > > > > > > Add the possibility to specify an interrupt GPIO in the machine > > > > > description instead of a plain interrupt. This GPIO is used as interrupt > > > > > source and to check if the irq line is actually active in the irq > > > > > handler. > > > > > > > > > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> > > > > > --- > > > > > > > > Hi all, > > > > > > > > Regarding this, I do share some of the concerns already raised by Jonathan. I > > > > fear > > > > that we're papering around an issue with the IRQ controller rather than being > > > > an > > > > issue with the device. When I look at irq_disable() docs [1], it feels that > > > > we're > > > > already doing what we're supposed to do. IOW, we disable the lazy approach so > > > > we > > > > *should* not get any pending IRQ. > > > > I think this is wrong and you always have to be prepared to see an irq > > triggering that became pending while masked. > > > > > > Also looking at drivers as the xilinx gpio controller, it seems some > > > > are careful about this [2] and make sure to clear all pending IRQs > > > > when unmasking. > > > Your links are both to the same place. > > > > The right one is: > > https://elixir.bootlin.com/linux/v6.11.5/source/drivers/gpio/gpio-xilinx.c#L419 > > > > I think this is buggy, see below for the reasoning. > > > > > > Jonathan also said this: > > > > > > > > "True enough - that race is a possibility, but not all interrupt inputs > > > > are capable of gpio usage whilst setup to received interrupts." > > > Race should be easy to avoid using a level interrupt now I think more on that: > > > can't miss a level. > > > > In general this isn't true. If it were that easy we could just assume > > all irqs being level interrupts and simplify the irq code a bit. At > > least for the ad7124 if a conversion is done, the chip holds the line > > low until the next conversion is done. In that case it deasserts > > DOUT/̅R̅D̅Y for a short while to create another falling edge signalling > > another event. I can imagine this to confuse level detection?! > > > > > > To my understanding this also means this is doomed to fail for some devices > > > > or > > > > am I > > > > not following it? > > > > > > If you were wired to one of those, you couldn't use the GPIO trick, but then > > > don't have a GPIO in your DT in that case. > > > > Yes. If the device isn't properly connected in hardware you're out of > > luck. But that is also true if the spi clock line isn't connected. So > > apart from the requirement that "properly" involves things that are > > unusual for other SPI devices, that's expected. Having said that it was > > clear before because the MISO (aka DOUT/̅R̅D̅Y) line was already know to have > > to be connected to an irq capable pin. > > > > > > All that said, my naive feeling would be for a masked line to not get any > > > > pending IRQ > > > > and if it does, the driver should make sure to clean all outstanding > > > > interrupts > > > > when > > > > unmasking. But I'm far from being an expert of the IRQ subsystem. Maybe it > > > > would be > > > > interesting to get some inputs about someone who actually knows better? > > > +CC Thomas Gleixner, > > > > > > Annoying case where a wire is both the interrupt source for dataready and the > > > SPI data line (if separate clock signal is toggling) So currently the driver > > > masks interrupts at the host end, but we have at least one interrupt controller > > > where they end up pending and fire on reenabling the interrupt. Querying the > > > device to check the status register then ends up causing it to happen again, > > > so that doesn't help. > > > > > > Proposal is to query it as a GPIO (or maybe a separate GPIO wired to the same > > > pin) to check the level when an interrupt comes in. > > > > In my understanding it's the expected behaviour of an irq controller > > that a masked irq becomes pending if the irq event (level or edge) > > happens and then triggers immediately after enable_irq() -- independent > > of laziness being used or not. > > > > I'm really not sure about that. If a consumer disables/masks an interrupt, then I > would think it expects no interrupts. If one comes during that time, it seems > reasonable to me that the IRQ is discarded. And if the expected behavior is to have > a > pending IRQ if we got it while masked, then I'm not sure why we have the UNLAZY > thing. I mean, let's always do the lazy approach which is effectively only masking > the line if we get an IRQ while disabled (and also mark it as pending). If both > approaches result in a pending IRQ... Hmmm maybe the real point with UNLAZY is to not take the same interrupt twice. One through the resend mechanism and another one through HW. - Nuno Sá
Hello, [adding rmk to Cc as the docs state that he invented lazy disabling] On Thu, Oct 31, 2024 at 01:05:21PM +0100, Nuno Sá wrote: > On Thu, 2024-10-31 at 11:40 +0100, Uwe Kleine-König wrote: > > On Wed, Oct 30, 2024 at 08:44:29PM +0000, Jonathan Cameron wrote: > > > On Wed, 30 Oct 2024 14:04:58 +0100 > > > Nuno Sá <noname.nuno@gmail.com> wrote: > > > > Regarding this, I do share some of the concerns already raised by Jonathan. I fear > > > > that we're papering around an issue with the IRQ controller rather than being an > > > > issue with the device. When I look at irq_disable() docs [1], it feels that we're > > > > already doing what we're supposed to do. IOW, we disable the lazy approach so we > > > > *should* not get any pending IRQ. > > > > I think this is wrong and you always have to be prepared to see an irq > > triggering that became pending while masked. I did some research, here are my findings: https://www.kernel.org/doc/html/v6.12-rc6/core-api/genericirq.html#delayed-interrupt-disable reads: The interrupt is kept enabled and is masked in the flow handler when an interrupt event happens. This prevents losing edge interrupts on hardware which does not store an edge interrupt event while the interrupt is disabled at the hardware level. This suggests that lazy disabling is needed for some controllers that stop their event detection when disabled. I read that as: *Normally* an irq event gets pending in hardware while the irq is disabled. The lazy disable approach is expected to work fine always, the reason to implement non-lazy disabling is "only" a performance optimisation. See commit e9849777d0e27cdd2902805be51da73e7c79578c. With the DOUT/̅R̅D̅Y pin the ad7124 (and others) is in this "Unfortunately there are devices which do not allow the interrupt to be disabled easily at the device level." class. However that makes me wonder what is the difference between the irq_mask() and irq_disable() callbacks defined in struct irq_chip. I don't know, but there is a difference for sure. So please forgive me for (probably) using the terms disable and mask wrongly. Also I wonder what happens if a device driver calls irq_set_status_flags(myirq, IRQ_DISABLE_UNLAZY); when the respective irq controller is one of those that miss events while masked. Shouldn't that better be caught? Best regards Uwe
On Mon, 2024-11-04 at 13:49 +0100, Uwe Kleine-König wrote: > Hello, > > [adding rmk to Cc as the docs state that he invented lazy disabling] > > On Thu, Oct 31, 2024 at 01:05:21PM +0100, Nuno Sá wrote: > > On Thu, 2024-10-31 at 11:40 +0100, Uwe Kleine-König wrote: > > > On Wed, Oct 30, 2024 at 08:44:29PM +0000, Jonathan Cameron wrote: > > > > On Wed, 30 Oct 2024 14:04:58 +0100 > > > > Nuno Sá <noname.nuno@gmail.com> wrote: > > > > > Regarding this, I do share some of the concerns already raised by Jonathan. > > > > > I fear > > > > > that we're papering around an issue with the IRQ controller rather than > > > > > being an > > > > > issue with the device. When I look at irq_disable() docs [1], it feels that > > > > > we're > > > > > already doing what we're supposed to do. IOW, we disable the lazy approach > > > > > so we > > > > > *should* not get any pending IRQ. > > > > > > I think this is wrong and you always have to be prepared to see an irq > > > triggering that became pending while masked. > > I did some research, here are my findings: > > https://www.kernel.org/doc/html/v6.12-rc6/core-api/genericirq.html#delayed-interrupt-disable > reads: > > The interrupt is kept enabled and is masked in the flow handler > when an interrupt event happens. This prevents losing edge > interrupts on hardware which does not store an edge interrupt > event while the interrupt is disabled at the hardware level. > > This suggests that lazy disabling is needed for some controllers that > stop their event detection when disabled. I read that as: *Normally* an > irq event gets pending in hardware while the irq is disabled. I might be wrong, but I think that the lazy approach is the one for optimization. I think the reasoning is that __normally__ no more IRQs will come so no need to access the HW. But also thinking more on the subject and looking at what the lazy approach is doing, I take back what I said in previous emails. I *think* the expectation for a received IRQ while the line is masked (or disabled?!), is to keep it as pending (both on HW - when possible - and in SW). > > The lazy disable approach is expected to work fine always, the reason to > implement non-lazy disabling is "only" a performance optimisation. See > commit e9849777d0e27cdd2902805be51da73e7c79578c. Not sure If I understood you correctly, but I think is the other way around? Also, as said in the commit, I think it also prevents the same interrupt from happening twice (in some cases). > > With the DOUT/̅R̅D̅Y pin the ad7124 (and others) is in this "Unfortunately > there are devices which do not allow the interrupt to be disabled easily > at the device level." class. > > However that makes me wonder what is the difference between the > irq_mask() and irq_disable() callbacks defined in struct irq_chip. Wondering the same... Thanks for digging into this. This has been a long standing thing with sigma delta ADCs (I'm fairly sure this discussion about being an issue on the IRQ controller or not already happened before). - Nuno Sá
Hello Nuno, On Mon, Nov 04, 2024 at 02:15:49PM +0100, Nuno Sá wrote: > On Mon, 2024-11-04 at 13:49 +0100, Uwe Kleine-König wrote: > > On Thu, Oct 31, 2024 at 01:05:21PM +0100, Nuno Sá wrote: > > > On Thu, 2024-10-31 at 11:40 +0100, Uwe Kleine-König wrote: > > > > On Wed, Oct 30, 2024 at 08:44:29PM +0000, Jonathan Cameron wrote: > > > > > On Wed, 30 Oct 2024 14:04:58 +0100 > > > > > Nuno Sá <noname.nuno@gmail.com> wrote: > > > > > > Regarding this, I do share some of the concerns already raised by Jonathan. > > > > > > I fear > > > > > > that we're papering around an issue with the IRQ controller rather than > > > > > > being an > > > > > > issue with the device. When I look at irq_disable() docs [1], it feels that > > > > > > we're > > > > > > already doing what we're supposed to do. IOW, we disable the lazy approach > > > > > > so we > > > > > > *should* not get any pending IRQ. > > > > > > > > I think this is wrong and you always have to be prepared to see an irq > > > > triggering that became pending while masked. > > > > I did some research, here are my findings: > > > > https://www.kernel.org/doc/html/v6.12-rc6/core-api/genericirq.html#delayed-interrupt-disable > > reads: > > > > The interrupt is kept enabled and is masked in the flow handler > > when an interrupt event happens. This prevents losing edge > > interrupts on hardware which does not store an edge interrupt > > event while the interrupt is disabled at the hardware level. > > > > This suggests that lazy disabling is needed for some controllers that > > stop their event detection when disabled. I read that as: *Normally* an > > irq event gets pending in hardware while the irq is disabled. > > I might be wrong, but I think that the lazy approach is the one for optimization. It's both. Needed for some controllers *and* an optimisation (that isn't beneficial in some corner cases). > I think the reasoning is that __normally__ no more IRQs will come so > no need to access the HW. But also thinking more on the subject and > looking at what the lazy approach is doing, I take back what I said in > previous emails. I *think* the expectation for a received IRQ while > the line is masked (or disabled?!), is to keep it as pending (both on > HW - when possible - and in SW). > > > The lazy disable approach is expected to work fine always, the reason to > > implement non-lazy disabling is "only" a performance optimisation. See > > commit e9849777d0e27cdd2902805be51da73e7c79578c. > > Not sure If I understood you correctly, but I think is the other way around? > Also, as said in the commit, I think it also prevents the same interrupt from > happening twice (in some cases). The conversation thread isn't complete on lore.kernel.org, so I don't know for sure, but the way I understand it is: Normally while you handle an irq no new irq comes in and so it's sensible to do lazy disabling. Approximately: In 99.9 % of the cases you save 1 µs by not masking and in the remaining 0.1% you get another hard irq that costs you say 500 µs. So on average you save: 0.999 * 1 µs + 0.001 * (1 - 500) µs = 0.5 µs. However if for a certain device it's normal that another irq comes in the "improvement" degrades to: In 20 % of the cases you save 1 µs and in the remaining 80 % you get a penalty of 500 µs. So in this case it's not an expected win anymore and you can better stop doing lazy disabling and invest the time to mask the irq improving the average cost from - 0.2 * 1 µs + 0.8 * 499 µs = 399.4 µs to 1 µs. The interrupt happening twice is not a problem for correctness as the second one is not given to the device driver but caught in the irq subsystem that only then disables the irq in hardware and marking it pending for later consumption. It "only" costs cpu time. (And maybe it's given to the driver twice after enable_irq is called?) Looking at the problem at hand with the shared DOUT/̅R̅D̅Y line: It's (nearly?) 100% of the cases that the line toggles while the irq is logically disabled/masked. So it makes sense to do irq_set_status_flags(sigma_delta->irq_line, IRQ_DISABLE_UNLAZY); which however should only have an effect iff the irq controller doesn't miss an edge while the irq is disabled. So assuming my understanding is correct, there is something to do about the raspberry pi gpio controller to prevent setting IRQ_DISABLE_UNLAZY have an effect, because that one looses events. > > However that makes me wonder what is the difference between the > > irq_mask() and irq_disable() callbacks defined in struct irq_chip. > > Wondering the same... > > Thanks for digging into this. This has been a long standing thing with sigma delta > ADCs (I'm fairly sure this discussion about being an issue on the IRQ controller or > not already happened before). I keep that paragraph in my reply because the question is still open even though I don't add new infos here. Best regards Uwe
On Tue, 2024-11-05 at 10:20 +0100, Uwe Kleine-König wrote: > Hello Nuno, > > On Mon, Nov 04, 2024 at 02:15:49PM +0100, Nuno Sá wrote: > > On Mon, 2024-11-04 at 13:49 +0100, Uwe Kleine-König wrote: > > > On Thu, Oct 31, 2024 at 01:05:21PM +0100, Nuno Sá wrote: > > > > On Thu, 2024-10-31 at 11:40 +0100, Uwe Kleine-König wrote: > > > > > On Wed, Oct 30, 2024 at 08:44:29PM +0000, Jonathan Cameron wrote: > > > > > > On Wed, 30 Oct 2024 14:04:58 +0100 > > > > > > Nuno Sá <noname.nuno@gmail.com> wrote: > > > > > > > Regarding this, I do share some of the concerns already raised by > > > > > > > Jonathan. > > > > > > > I fear > > > > > > > that we're papering around an issue with the IRQ controller rather > > > > > > > than > > > > > > > being an > > > > > > > issue with the device. When I look at irq_disable() docs [1], it > > > > > > > feels that > > > > > > > we're > > > > > > > already doing what we're supposed to do. IOW, we disable the lazy > > > > > > > approach > > > > > > > so we > > > > > > > *should* not get any pending IRQ. > > > > > > > > > > I think this is wrong and you always have to be prepared to see an irq > > > > > triggering that became pending while masked. > > > > > > I did some research, here are my findings: > > > > > > https://www.kernel.org/doc/html/v6.12-rc6/core-api/genericirq.html#delayed-interrupt-disable > > > reads: > > > > > > The interrupt is kept enabled and is masked in the flow handler > > > when an interrupt event happens. This prevents losing edge > > > interrupts on hardware which does not store an edge interrupt > > > event while the interrupt is disabled at the hardware level. > > > > > > This suggests that lazy disabling is needed for some controllers that > > > stop their event detection when disabled. I read that as: *Normally* an > > > irq event gets pending in hardware while the irq is disabled. > > > > I might be wrong, but I think that the lazy approach is the one for > > optimization. > > It's both. Needed for some controllers *and* an optimisation (that > isn't beneficial in some corner cases). > > > I think the reasoning is that __normally__ no more IRQs will come so > > no need to access the HW. But also thinking more on the subject and > > looking at what the lazy approach is doing, I take back what I said in > > previous emails. I *think* the expectation for a received IRQ while > > the line is masked (or disabled?!), is to keep it as pending (both on > > HW - when possible - and in SW). > > > > > The lazy disable approach is expected to work fine always, the reason to > > > implement non-lazy disabling is "only" a performance optimisation. See > > > commit e9849777d0e27cdd2902805be51da73e7c79578c. > > > > Not sure If I understood you correctly, but I think is the other way > > around? > > Also, as said in the commit, I think it also prevents the same interrupt > > from > > happening twice (in some cases). > > The conversation thread isn't complete on lore.kernel.org, so I don't > know for sure, but the way I understand it is: Normally while you handle > an irq no new irq comes in and so it's sensible to do lazy disabling. > Approximately: In 99.9 % of the cases you save 1 µs by not masking and > in the remaining 0.1% you get another hard irq that costs you say > 500 µs. So on average you save: 0.999 * 1 µs + 0.001 * (1 - 500) µs = 0.5 µs. > > However if for a certain device it's normal that another irq comes in > the "improvement" degrades to: In 20 % of the cases you save 1 µs and in > the remaining 80 % you get a penalty of 500 µs. So in this case it's not > an expected win anymore and you can better stop doing lazy disabling and > invest the time to mask the irq improving the average cost from > - 0.2 * 1 µs + 0.8 * 499 µs = 399.4 µs to 1 µs. > > The interrupt happening twice is not a problem for correctness as the > second one is not given to the device driver but caught in the irq > subsystem that only then disables the irq in hardware and marking it > pending for later consumption. It "only" costs cpu time. (And maybe it's > given to the driver twice after enable_irq is called?) Yeah, enable_irq() was what I meant. So, in the commit message, it's stated: "Unfortunately there are devices which do not allow the interrupt to be disabled easily at the device level. They are forced to use disable_irq_nosync(). This can result in taking each interrupt twice." And the taking twice part was something that I was not getting it. Still not 100% sure but I think that what is meant is that when we enable the IRQ we might get it through [1] and then afterwards through the IRQ controller for devices that latch the events (as soon as you unmask the line, the event should trigger). On [1], there's a retrigger path that goes through HW and I'm not sure if that one is problematic. But I would expect the SW one to be... [1]: https://elixir.bootlin.com/linux/v6.11.6/source/kernel/irq/chip.c#L283 - Nuno Sá > > Looking at the problem at hand with the shared DOUT/̅R̅D̅Y line: It's > (nearly?) 100% of the cases that the line toggles while the irq is > logically disabled/masked. So it makes sense to do > > irq_set_status_flags(sigma_delta->irq_line, IRQ_DISABLE_UNLAZY); > > which however should only have an effect iff the irq controller doesn't > miss an edge while the irq is disabled. > > So assuming my understanding is correct, there is something to do about > the raspberry pi gpio controller to prevent setting IRQ_DISABLE_UNLAZY > have an effect, because that one looses events. > > > > However that makes me wonder what is the difference between the > > > irq_mask() and irq_disable() callbacks defined in struct irq_chip. > > > > Wondering the same... > > > > Thanks for digging into this. This has been a long standing thing with sigma > > delta > > ADCs (I'm fairly sure this discussion about being an issue on the IRQ > > controller or > > not already happened before). > > I keep that paragraph in my reply because the question is still open > even though I don't add new infos here. > > Best regards > Uwe
diff --git a/drivers/iio/adc/ad_sigma_delta.c b/drivers/iio/adc/ad_sigma_delta.c index e2bed2d648f2..b5785f2a0abe 100644 --- a/drivers/iio/adc/ad_sigma_delta.c +++ b/drivers/iio/adc/ad_sigma_delta.c @@ -539,12 +539,29 @@ static irqreturn_t ad_sd_data_rdy_trig_poll(int irq, void *private) { struct ad_sigma_delta *sigma_delta = private; - complete(&sigma_delta->completion); - disable_irq_nosync(irq); - sigma_delta->irq_dis = true; - iio_trigger_poll(sigma_delta->trig); + /* + * AD7124 and a few others use the same physical line for interrupt + * reporting (nRDY) and MISO. + * As MISO toggles when reading a register, this likely results in a + * pending interrupt. This has two consequences: a) The irq might + * trigger immediately after it's enabled even though the conversion + * isn't done yet; and b) checking the STATUS register's nRDY flag is + * off-limits as reading that would trigger another irq event. + * + * So read the MOSI line as GPIO (if available) and only trigger the irq + * if the line is active. + */ - return IRQ_HANDLED; + if (!sigma_delta->rdy_gpiod || gpiod_get_value(sigma_delta->rdy_gpiod)) { + complete(&sigma_delta->completion); + disable_irq_nosync(irq); + sigma_delta->irq_dis = true; + iio_trigger_poll(sigma_delta->trig); + + return IRQ_HANDLED; + } else { + return IRQ_NONE; + } } /** @@ -679,6 +696,14 @@ int ad_sd_init(struct ad_sigma_delta *sigma_delta, struct iio_dev *indio_dev, else sigma_delta->irq_line = spi->irq; + sigma_delta->rdy_gpiod = devm_gpiod_get_optional(&spi->dev, "rdy", GPIOD_IN); + if (IS_ERR(sigma_delta->rdy_gpiod)) + return dev_err_probe(&spi->dev, PTR_ERR(sigma_delta->rdy_gpiod), + "Failed to find rdy gpio\n"); + + if (sigma_delta->rdy_gpiod && !sigma_delta->irq_line) + sigma_delta->irq_line = gpiod_to_irq(sigma_delta->rdy_gpiod); + iio_device_set_drvdata(indio_dev, sigma_delta); return 0; diff --git a/include/linux/iio/adc/ad_sigma_delta.h b/include/linux/iio/adc/ad_sigma_delta.h index f8c1d2505940..866b4c21794b 100644 --- a/include/linux/iio/adc/ad_sigma_delta.h +++ b/include/linux/iio/adc/ad_sigma_delta.h @@ -96,6 +96,7 @@ struct ad_sigma_delta { unsigned int active_slots; unsigned int current_slot; unsigned int num_slots; + struct gpio_desc *rdy_gpiod; int irq_line; bool status_appended; /* map slots to channels in order to know what to expect from devices */
Some of the ADCs by Analog signal their irq condition on the MISO line. So typically that line is connected to an SPI controller and a GPIO. The GPIO is used as input and the respective interrupt is enabled when the last SPI transfer is completed. Depending on the GPIO controller the toggling MISO line might make the interrupt pending even while it's masked. In that case the irq handler is called immediately after irq_enable() and so before the device actually pulls that line low which results in non-sense values being reported to the upper layers. The only way to find out if the line was actually pulled low is to read the GPIO. (There is a flag in AD7124's status register that also signals if an interrupt was asserted, but reading that register toggles the MISO line and so might trigger another spurious interrupt.) Add the possibility to specify an interrupt GPIO in the machine description instead of a plain interrupt. This GPIO is used as interrupt source and to check if the irq line is actually active in the irq handler. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> --- drivers/iio/adc/ad_sigma_delta.c | 35 ++++++++++++++++++++++---- include/linux/iio/adc/ad_sigma_delta.h | 1 + 2 files changed, 31 insertions(+), 5 deletions(-)