diff mbox

[v3,2/7] i2c: bcm2835: Protect against unexpected TXW/RXR interrupts

Message ID 1475085056-5205-3-git-send-email-noralf@tronnes.org (mailing list archive)
State New, archived
Headers show

Commit Message

Noralf Trønnes Sept. 28, 2016, 5:50 p.m. UTC
If an unexpected TXW or RXR interrupt occurs (msg_buf_remaining == 0),
the driver has no way to fill/drain the FIFO to stop the interrupts.
In this case the controller has to be disabled and the transfer
completed to avoid hang.

(CLKT | ERR) and DONE interrupts are completed in their own paths, and
the controller is disabled in the transfer function after completion.
Unite the code paths and do disabling inside the interrupt routine.

Clear interrupt status bits in the united completion path instead of
trying to do it on every interrupt which isn't necessary.
Only CLKT, ERR and DONE can be cleared that way.

Add the status value to the error value in case of TXW/RXR errors to
distinguish them from the other S_LEN error.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
---
 drivers/i2c/busses/i2c-bcm2835.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

Comments

Eric Anholt Sept. 28, 2016, 10 p.m. UTC | #1
Noralf Trønnes <noralf@tronnes.org> writes:

> If an unexpected TXW or RXR interrupt occurs (msg_buf_remaining == 0),
> the driver has no way to fill/drain the FIFO to stop the interrupts.
> In this case the controller has to be disabled and the transfer
> completed to avoid hang.
>
> (CLKT | ERR) and DONE interrupts are completed in their own paths, and
> the controller is disabled in the transfer function after completion.
> Unite the code paths and do disabling inside the interrupt routine.
>
> Clear interrupt status bits in the united completion path instead of
> trying to do it on every interrupt which isn't necessary.
> Only CLKT, ERR and DONE can be cleared that way.
>
> Add the status value to the error value in case of TXW/RXR errors to
> distinguish them from the other S_LEN error.

I was surprised that not writing the TXW/RXR bits on handling their
interrupts was OK, given that we were doing so before, but it's a level
interrupt and those bits are basically ignored on write.

This patch and 3, 4, and 6 are:

Reviewed-by: Eric Anholt <eric@anholt.net>

Patch 5 is:

Acked-by: Eric Anholt <eric@anholt.net>

Note for future debug: The I2C_C_CLEAR on errors will take some time to
resolve -- if you were in non-idle state and I2C_C_READ, it sets an
abort_rx flag and runs through the state machine to send a NACK and a
STOP, I think.  Since we're setting CLEAR without I2CEN, that NACK will
be hanging around queued up for next time we start the engine.

Patch 7 I had questions about but probably will send an ack when you
reply.
Noralf Trønnes Sept. 28, 2016, 10:22 p.m. UTC | #2
Den 29.09.2016 00:00, skrev Eric Anholt:
> Noralf Trønnes <noralf@tronnes.org> writes:
>
>> If an unexpected TXW or RXR interrupt occurs (msg_buf_remaining == 0),
>> the driver has no way to fill/drain the FIFO to stop the interrupts.
>> In this case the controller has to be disabled and the transfer
>> completed to avoid hang.
>>
>> (CLKT | ERR) and DONE interrupts are completed in their own paths, and
>> the controller is disabled in the transfer function after completion.
>> Unite the code paths and do disabling inside the interrupt routine.
>>
>> Clear interrupt status bits in the united completion path instead of
>> trying to do it on every interrupt which isn't necessary.
>> Only CLKT, ERR and DONE can be cleared that way.
>>
>> Add the status value to the error value in case of TXW/RXR errors to
>> distinguish them from the other S_LEN error.
> I was surprised that not writing the TXW/RXR bits on handling their
> interrupts was OK, given that we were doing so before, but it's a level
> interrupt and those bits are basically ignored on write.
>
> This patch and 3, 4, and 6 are:
>
> Reviewed-by: Eric Anholt <eric@anholt.net>
>
> Patch 5 is:
>
> Acked-by: Eric Anholt <eric@anholt.net>
>
> Note for future debug: The I2C_C_CLEAR on errors will take some time to
> resolve -- if you were in non-idle state and I2C_C_READ, it sets an
> abort_rx flag and runs through the state machine to send a NACK and a
> STOP, I think.  Since we're setting CLEAR without I2CEN, that NACK will
> be hanging around queued up for next time we start the engine.

Maybe you're able to explain the issues I had with reset:
https://github.com/raspberrypi/linux/issues/1653

Should we put your note into the commit message?
It will most likely be lost if it just stays in this email.

Noralf.
Stefan Wahren Sept. 29, 2016, 5:37 a.m. UTC | #3
> Noralf Trønnes <noralf@tronnes.org> hat am 29. September 2016 um 00:22
> geschrieben:
> 
> 
> 
> Den 29.09.2016 00:00, skrev Eric Anholt:
> > Noralf Trønnes <noralf@tronnes.org> writes:
> >
> >> If an unexpected TXW or RXR interrupt occurs (msg_buf_remaining == 0),
> >> the driver has no way to fill/drain the FIFO to stop the interrupts.
> >> In this case the controller has to be disabled and the transfer
> >> completed to avoid hang.
> >>
> >> (CLKT | ERR) and DONE interrupts are completed in their own paths, and
> >> the controller is disabled in the transfer function after completion.
> >> Unite the code paths and do disabling inside the interrupt routine.
> >>
> >> Clear interrupt status bits in the united completion path instead of
> >> trying to do it on every interrupt which isn't necessary.
> >> Only CLKT, ERR and DONE can be cleared that way.
> >>
> >> Add the status value to the error value in case of TXW/RXR errors to
> >> distinguish them from the other S_LEN error.
> > I was surprised that not writing the TXW/RXR bits on handling their
> > interrupts was OK, given that we were doing so before, but it's a level
> > interrupt and those bits are basically ignored on write.
> >
> > This patch and 3, 4, and 6 are:
> >
> > Reviewed-by: Eric Anholt <eric@anholt.net>
> >
> > Patch 5 is:
> >
> > Acked-by: Eric Anholt <eric@anholt.net>
> >
> > Note for future debug: The I2C_C_CLEAR on errors will take some time to
> > resolve -- if you were in non-idle state and I2C_C_READ, it sets an
> > abort_rx flag and runs through the state machine to send a NACK and a
> > STOP, I think.  Since we're setting CLEAR without I2CEN, that NACK will
> > be hanging around queued up for next time we start the engine.
> 
> Maybe you're able to explain the issues I had with reset:
> https://github.com/raspberrypi/linux/issues/1653
> 
> Should we put your note into the commit message?
> It will most likely be lost if it just stays in this email.

I prefer to have this kind of information as a code comment.

> 
> Noralf.
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Noralf Trønnes Oct. 2, 2016, 2:19 p.m. UTC | #4
Den 29.09.2016 07:37, skrev Stefan Wahren:
>> Noralf Trønnes <noralf@tronnes.org> hat am 29. September 2016 um 00:22
>> geschrieben:
>>
>>
>>
>> Den 29.09.2016 00:00, skrev Eric Anholt:
>>> Noralf Trønnes <noralf@tronnes.org> writes:
>>>
>>>> If an unexpected TXW or RXR interrupt occurs (msg_buf_remaining == 0),
>>>> the driver has no way to fill/drain the FIFO to stop the interrupts.
>>>> In this case the controller has to be disabled and the transfer
>>>> completed to avoid hang.
>>>>
>>>> (CLKT | ERR) and DONE interrupts are completed in their own paths, and
>>>> the controller is disabled in the transfer function after completion.
>>>> Unite the code paths and do disabling inside the interrupt routine.
>>>>
>>>> Clear interrupt status bits in the united completion path instead of
>>>> trying to do it on every interrupt which isn't necessary.
>>>> Only CLKT, ERR and DONE can be cleared that way.
>>>>
>>>> Add the status value to the error value in case of TXW/RXR errors to
>>>> distinguish them from the other S_LEN error.
>>> I was surprised that not writing the TXW/RXR bits on handling their
>>> interrupts was OK, given that we were doing so before, but it's a level
>>> interrupt and those bits are basically ignored on write.
>>>
>>> This patch and 3, 4, and 6 are:
>>>
>>> Reviewed-by: Eric Anholt <eric@anholt.net>
>>>
>>> Patch 5 is:
>>>
>>> Acked-by: Eric Anholt <eric@anholt.net>
>>>
>>> Note for future debug: The I2C_C_CLEAR on errors will take some time to
>>> resolve -- if you were in non-idle state and I2C_C_READ, it sets an
>>> abort_rx flag and runs through the state machine to send a NACK and a
>>> STOP, I think.  Since we're setting CLEAR without I2CEN, that NACK will
>>> be hanging around queued up for next time we start the engine.
>> Maybe you're able to explain the issues I had with reset:
>> https://github.com/raspberrypi/linux/issues/1653
>>
>> Should we put your note into the commit message?
>> It will most likely be lost if it just stays in this email.
> I prefer to have this kind of information as a code comment.

Eric, does this look good to you as a code comment:

/*
  * Note about I2C_C_CLEAR on error:
  * The I2C_C_CLEAR on errors will take some time to resolve -- if you 
were in
  * non-idle state and I2C_C_READ, it sets an abort_rx flag and runs through
  * the state machine to send a NACK and a STOP. Since we're setting CLEAR
  * without I2CEN, that NACK will be hanging around queued up for next time
  * we start the engine.
  */


If it is, I'll resend the series with this change and add all the ack's 
and r-b's.

Noralf.
Eric Anholt Oct. 3, 2016, 6:58 p.m. UTC | #5
Noralf Trønnes <noralf@tronnes.org> writes:

> Den 29.09.2016 07:37, skrev Stefan Wahren:
>>> Noralf Trønnes <noralf@tronnes.org> hat am 29. September 2016 um 00:22
>>> geschrieben:
>>>
>>>
>>>
>>> Den 29.09.2016 00:00, skrev Eric Anholt:
>>>> Noralf Trønnes <noralf@tronnes.org> writes:
>>>>
>>>>> If an unexpected TXW or RXR interrupt occurs (msg_buf_remaining == 0),
>>>>> the driver has no way to fill/drain the FIFO to stop the interrupts.
>>>>> In this case the controller has to be disabled and the transfer
>>>>> completed to avoid hang.
>>>>>
>>>>> (CLKT | ERR) and DONE interrupts are completed in their own paths, and
>>>>> the controller is disabled in the transfer function after completion.
>>>>> Unite the code paths and do disabling inside the interrupt routine.
>>>>>
>>>>> Clear interrupt status bits in the united completion path instead of
>>>>> trying to do it on every interrupt which isn't necessary.
>>>>> Only CLKT, ERR and DONE can be cleared that way.
>>>>>
>>>>> Add the status value to the error value in case of TXW/RXR errors to
>>>>> distinguish them from the other S_LEN error.
>>>> I was surprised that not writing the TXW/RXR bits on handling their
>>>> interrupts was OK, given that we were doing so before, but it's a level
>>>> interrupt and those bits are basically ignored on write.
>>>>
>>>> This patch and 3, 4, and 6 are:
>>>>
>>>> Reviewed-by: Eric Anholt <eric@anholt.net>
>>>>
>>>> Patch 5 is:
>>>>
>>>> Acked-by: Eric Anholt <eric@anholt.net>
>>>>
>>>> Note for future debug: The I2C_C_CLEAR on errors will take some time to
>>>> resolve -- if you were in non-idle state and I2C_C_READ, it sets an
>>>> abort_rx flag and runs through the state machine to send a NACK and a
>>>> STOP, I think.  Since we're setting CLEAR without I2CEN, that NACK will
>>>> be hanging around queued up for next time we start the engine.
>>> Maybe you're able to explain the issues I had with reset:
>>> https://github.com/raspberrypi/linux/issues/1653
>>>
>>> Should we put your note into the commit message?
>>> It will most likely be lost if it just stays in this email.
>> I prefer to have this kind of information as a code comment.
>
> Eric, does this look good to you as a code comment:
>
> /*
>   * Note about I2C_C_CLEAR on error:
>   * The I2C_C_CLEAR on errors will take some time to resolve -- if you 
> were in
>   * non-idle state and I2C_C_READ, it sets an abort_rx flag and runs through
>   * the state machine to send a NACK and a STOP. Since we're setting CLEAR
>   * without I2CEN, that NACK will be hanging around queued up for next time
>   * we start the engine.
>   */
>
>
> If it is, I'll resend the series with this change and add all the ack's 
> and r-b's.

Looks good to me.  Thanks!
Eric Anholt Oct. 3, 2016, 7:42 p.m. UTC | #6
Noralf Trønnes <noralf@tronnes.org> writes:

> Den 29.09.2016 00:00, skrev Eric Anholt:
>> Noralf Trønnes <noralf@tronnes.org> writes:
>>
>>> If an unexpected TXW or RXR interrupt occurs (msg_buf_remaining == 0),
>>> the driver has no way to fill/drain the FIFO to stop the interrupts.
>>> In this case the controller has to be disabled and the transfer
>>> completed to avoid hang.
>>>
>>> (CLKT | ERR) and DONE interrupts are completed in their own paths, and
>>> the controller is disabled in the transfer function after completion.
>>> Unite the code paths and do disabling inside the interrupt routine.
>>>
>>> Clear interrupt status bits in the united completion path instead of
>>> trying to do it on every interrupt which isn't necessary.
>>> Only CLKT, ERR and DONE can be cleared that way.
>>>
>>> Add the status value to the error value in case of TXW/RXR errors to
>>> distinguish them from the other S_LEN error.
>> I was surprised that not writing the TXW/RXR bits on handling their
>> interrupts was OK, given that we were doing so before, but it's a level
>> interrupt and those bits are basically ignored on write.
>>
>> This patch and 3, 4, and 6 are:
>>
>> Reviewed-by: Eric Anholt <eric@anholt.net>
>>
>> Patch 5 is:
>>
>> Acked-by: Eric Anholt <eric@anholt.net>
>>
>> Note for future debug: The I2C_C_CLEAR on errors will take some time to
>> resolve -- if you were in non-idle state and I2C_C_READ, it sets an
>> abort_rx flag and runs through the state machine to send a NACK and a
>> STOP, I think.  Since we're setting CLEAR without I2CEN, that NACK will
>> be hanging around queued up for next time we start the engine.
>
> Maybe you're able to explain the issues I had with reset:
> https://github.com/raspberrypi/linux/issues/1653

One of the questions I think you might have is "what state does the
controller end up in after the various interrupts?"

ERR:
- produced if we get a nack that's not at the end of a read.

- Proceeds to repeated start if BCM2835_I2C_C_ST is queued, otherwise
  stop.

CLKT:
- Triggered by a counter outside of the state machine when stretching
  happens and then times out.

- Sets cs_override, which causes proceeding through the state machine as
  if the clock wasn't getting stretched, until the end of the next byte
  sent/received.

- According to Wolfram we shouldn't be timing out on clock stretching
  for i2c, just on the transfer as a whole
  (https://patchwork.kernel.org/patch/9148431/), so I wrote
  https://github.com/anholt/linux/commit/894200276239d2e4c60b378bdc52164fcb13af8d.
  However, I don't see an obvious way to get back to IDLE while the
  slave is still stretching, without triggering the clock stretching
  timeout path.

DONE:
- Signaled at STOP, and just moves to IDLE state which keeps scl/sda
  high and waits for a BCM2835_I2C_C_ST while we're not clearing the
  FIFOs (if you do signal start while the fifos are clearing, the start
  will hang around until the fifo clear is done).  This is the only way
  to get to IDLE.

I'm don't think I have an answer to the "what should I do?" question you
had, but hopefully this helps.
Noralf Trønnes Oct. 4, 2016, 7:24 p.m. UTC | #7
Den 03.10.2016 21:42, skrev Eric Anholt:
> Noralf Trønnes <noralf@tronnes.org> writes:
>
>> Den 29.09.2016 00:00, skrev Eric Anholt:
>>> Noralf Trønnes <noralf@tronnes.org> writes:
>>>
>>>> If an unexpected TXW or RXR interrupt occurs (msg_buf_remaining == 0),
>>>> the driver has no way to fill/drain the FIFO to stop the interrupts.
>>>> In this case the controller has to be disabled and the transfer
>>>> completed to avoid hang.
>>>>
>>>> (CLKT | ERR) and DONE interrupts are completed in their own paths, and
>>>> the controller is disabled in the transfer function after completion.
>>>> Unite the code paths and do disabling inside the interrupt routine.
>>>>
>>>> Clear interrupt status bits in the united completion path instead of
>>>> trying to do it on every interrupt which isn't necessary.
>>>> Only CLKT, ERR and DONE can be cleared that way.
>>>>
>>>> Add the status value to the error value in case of TXW/RXR errors to
>>>> distinguish them from the other S_LEN error.
>>> I was surprised that not writing the TXW/RXR bits on handling their
>>> interrupts was OK, given that we were doing so before, but it's a level
>>> interrupt and those bits are basically ignored on write.
>>>
>>> This patch and 3, 4, and 6 are:
>>>
>>> Reviewed-by: Eric Anholt <eric@anholt.net>
>>>
>>> Patch 5 is:
>>>
>>> Acked-by: Eric Anholt <eric@anholt.net>
>>>
>>> Note for future debug: The I2C_C_CLEAR on errors will take some time to
>>> resolve -- if you were in non-idle state and I2C_C_READ, it sets an
>>> abort_rx flag and runs through the state machine to send a NACK and a
>>> STOP, I think.  Since we're setting CLEAR without I2CEN, that NACK will
>>> be hanging around queued up for next time we start the engine.
>> Maybe you're able to explain the issues I had with reset:
>> https://github.com/raspberrypi/linux/issues/1653
> One of the questions I think you might have is "what state does the
> controller end up in after the various interrupts?"
>
> ERR:
> - produced if we get a nack that's not at the end of a read.
>
> - Proceeds to repeated start if BCM2835_I2C_C_ST is queued, otherwise
>    stop.
>
> CLKT:
> - Triggered by a counter outside of the state machine when stretching
>    happens and then times out.
>
> - Sets cs_override, which causes proceeding through the state machine as
>    if the clock wasn't getting stretched, until the end of the next byte
>    sent/received.
>
> - According to Wolfram we shouldn't be timing out on clock stretching
>    for i2c, just on the transfer as a whole
>    (https://patchwork.kernel.org/patch/9148431/), so I wrote
>    https://github.com/anholt/linux/commit/894200276239d2e4c60b378bdc52164fcb13af8d.
>    However, I don't see an obvious way to get back to IDLE while the
>    slave is still stretching, without triggering the clock stretching
>    timeout path.

If the transfer times out, whatever the reason, we clear the fifo
(and disable). Doesn't that get us back to IDLE?

Code with my patches:

static int bcm2835_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[],
                 int num)
{
[...]
     bcm2835_i2c_start_transfer(i2c_dev);

     time_left = wait_for_completion_timeout(&i2c_dev->completion,
                         adap->timeout);
     if (!time_left) {
         bcm2835_i2c_writel(i2c_dev, BCM2835_I2C_C,
                    BCM2835_I2C_C_CLEAR);
         dev_err(i2c_dev->dev, "i2c transfer timed out\n");
         return -ETIMEDOUT;
     }

> DONE:
> - Signaled at STOP, and just moves to IDLE state which keeps scl/sda
>    high and waits for a BCM2835_I2C_C_ST while we're not clearing the
>    FIFOs (if you do signal start while the fifos are clearing, the start
>    will hang around until the fifo clear is done).  This is the only way
>    to get to IDLE.
>
> I'm don't think I have an answer to the "what should I do?" question you
> had, but hopefully this helps.
diff mbox

Patch

diff --git a/drivers/i2c/busses/i2c-bcm2835.c b/drivers/i2c/busses/i2c-bcm2835.c
index f283b71..df036ed 100644
--- a/drivers/i2c/busses/i2c-bcm2835.c
+++ b/drivers/i2c/busses/i2c-bcm2835.c
@@ -50,8 +50,6 @@ 
 #define BCM2835_I2C_S_CLKT	BIT(9)
 #define BCM2835_I2C_S_LEN	BIT(10) /* Fake bit for SW error reporting */
 
-#define BCM2835_I2C_BITMSK_S	0x03FF
-
 #define BCM2835_I2C_CDIV_MIN	0x0002
 #define BCM2835_I2C_CDIV_MAX	0xFFFE
 
@@ -117,14 +115,11 @@  static irqreturn_t bcm2835_i2c_isr(int this_irq, void *data)
 	u32 val, err;
 
 	val = bcm2835_i2c_readl(i2c_dev, BCM2835_I2C_S);
-	val &= BCM2835_I2C_BITMSK_S;
-	bcm2835_i2c_writel(i2c_dev, BCM2835_I2C_S, val);
 
 	err = val & (BCM2835_I2C_S_CLKT | BCM2835_I2C_S_ERR);
 	if (err) {
 		i2c_dev->msg_err = err;
-		complete(&i2c_dev->completion);
-		return IRQ_HANDLED;
+		goto complete;
 	}
 
 	if (val & BCM2835_I2C_S_DONE) {
@@ -137,21 +132,38 @@  static irqreturn_t bcm2835_i2c_isr(int this_irq, void *data)
 			i2c_dev->msg_err = BCM2835_I2C_S_LEN;
 		else
 			i2c_dev->msg_err = 0;
-		complete(&i2c_dev->completion);
-		return IRQ_HANDLED;
+		goto complete;
 	}
 
 	if (val & BCM2835_I2C_S_TXW) {
+		if (!i2c_dev->msg_buf_remaining) {
+			i2c_dev->msg_err = val | BCM2835_I2C_S_LEN;
+			goto complete;
+		}
+
 		bcm2835_fill_txfifo(i2c_dev);
 		return IRQ_HANDLED;
 	}
 
 	if (val & BCM2835_I2C_S_RXR) {
+		if (!i2c_dev->msg_buf_remaining) {
+			i2c_dev->msg_err = val | BCM2835_I2C_S_LEN;
+			goto complete;
+		}
+
 		bcm2835_drain_rxfifo(i2c_dev);
 		return IRQ_HANDLED;
 	}
 
 	return IRQ_NONE;
+
+complete:
+	bcm2835_i2c_writel(i2c_dev, BCM2835_I2C_C, BCM2835_I2C_C_CLEAR);
+	bcm2835_i2c_writel(i2c_dev, BCM2835_I2C_S, BCM2835_I2C_S_CLKT |
+			   BCM2835_I2C_S_ERR | BCM2835_I2C_S_DONE);
+	complete(&i2c_dev->completion);
+
+	return IRQ_HANDLED;
 }
 
 static int bcm2835_i2c_xfer_msg(struct bcm2835_i2c_dev *i2c_dev,
@@ -181,8 +193,9 @@  static int bcm2835_i2c_xfer_msg(struct bcm2835_i2c_dev *i2c_dev,
 
 	time_left = wait_for_completion_timeout(&i2c_dev->completion,
 						BCM2835_I2C_TIMEOUT);
-	bcm2835_i2c_writel(i2c_dev, BCM2835_I2C_C, BCM2835_I2C_C_CLEAR);
 	if (!time_left) {
+		bcm2835_i2c_writel(i2c_dev, BCM2835_I2C_C,
+				   BCM2835_I2C_C_CLEAR);
 		dev_err(i2c_dev->dev, "i2c transfer timed out\n");
 		return -ETIMEDOUT;
 	}