Message ID | 20161118193542.GO1041@n2100.armlinux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Nov 18, 2016 at 07:35:42PM +0000, Russell King - ARM Linux wrote: > With reference to this commit: > > commit d39f77b06a712fcba6185a20bb209e357923d980 > Author: Andrew Jackson <Andrew.Jackson@arm.com> > Date: Fri Nov 7 12:10:44 2014 +0000 > > i2c: designware: prevent early stop on TX FIFO empty > > If the Designware core is configured with IC_EMPTYFIFO_HOLD_MASTER_EN > set to zero, allowing the TX FIFO to become empty causes a STOP > condition to be generated on the I2C bus. If the transmit FIFO > threshold is set too high, an erroneous STOP condition can be > generated on long transfers - particularly where the interrupt > latency is extended. > > Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com> > Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com> > Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> > Signed-off-by: Wolfram Sang <wsa@the-dreams.de> > > The TDA998x driver issues long I2C transfers to read the EDID from the > device - and userspace can also issue large transfers too. However, > if a DW core is configured with IC_EMPTYFIFO_HOLD_MASTER_EN set as > zero, the above commit doesn't seem to solve the problem. During > boot, with the patch below, I see: > > [ 1.736549] i2c_designware 7ffa0000.i2c: i2c_dw_isr: enabled=0x1 stat=0x10 > [ 1.736564] i2c_designware 7ffa0000.i2c: i2c_dw_isr: enabled=0x1 stat=0x510 > [ 1.736608] i2c_designware 7ffa0000.i2c: i2c_dw_isr: enabled=0x1 stat=0x504 > [ 1.736799] i2c_designware 7ffa0000.i2c: i2c_dw_isr: enabled=0x1 stat=0x514 > [ 1.736819] i2c_designware 7ffa0000.i2c: i2c_dw_isr: enabled=0x1 stat=0x510 > ... > [ 1.737986] i2c_designware 7ffa0000.i2c: i2c_dw_isr: enabled=0x1 stat=0x504 > [ 1.738010] i2c_designware 7ffa0000.i2c: i2c_dw_isr: enabled=0x1 stat=0x514 > [ 1.738034] i2c_designware 7ffa0000.i2c: i2c_dw_isr: enabled=0x1 stat=0x504 > [ 1.738039] random: fast init done > [ 1.740120] i2c_designware 7ffa0000.i2c: i2c_dw_isr: enabled=0x1 stat=0x714 > [ 1.740231] i2c_dw_xfer: ffffffc97657b770:1 -> ffffffc97657b770:1 (0:0) [0 0 3 0] 8 [tx:ffffffc976682380:47] [rx:ffffffc9766823c9:55] > [ 1.740249] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 93 > [ 1.746979] Raw EDID: > [ 1.747934] 00 ff ff ff ff ff ff 00 34 a9 96 a2 01 01 01 01 > [ 1.752342] 00 17 01 03 80 80 48 78 0a da ff a3 58 4a a2 29 > [ 1.756748] 17 49 4b 21 08 00 31 40 45 40 61 40 81 80 01 01 > [ 1.761153] 01 01 01 01 01 01 02 3a 80 d0 72 38 2d 40 10 2c > [ 1.765555] 45 80 ba 88 21 00 00 1e 02 00 d0 4e 30 09 12 54 > [ 1.769958] 01 08 02 00 23 36 01 40 01 05 00 80 a1 4c 4b 49 > [ 1.774361] 22 00 00 40 03 00 28 00 23 01 20 00 01 88 00 01 > [ 1.778762] 08 00 00 40 00 02 03 04 0a 00 80 00 02 00 00 40 > > The significant thing is the "i2c_dw_xfer" line, where I add a print of > the current state. Here, we can see that the transfer is mid-way, but > a stop condition has been generated by the hardware, leaving 55 bytes > to be received. > > Unfortunately, the i2c-designware driver ignores this, and believes that > the transfer completed both fully and successfully, but returns bogus > data to userspace or the kernel driver. That's really _bad_ behaviour > by the driver - it should at least return an error. Totally agree. > This problem is _soo_ bad that on my Juno, I can't run Xorg (it hits > this every time we try to read the EDID) nor can I boot with the TV > connected (it hits this every boot as well.) > > I'd go as far as to say that the i2c-designware hardware, when > configured with this option set to zero, is fundamentally broken for OS > which do not provide any guarantee for interrupt latency, such as Linux. > > The commit above tries to mitigate this by reducing the Tx FIFO > threshold, so the interrupt is raised sooner, but that's clearly not > enough for reliable operation. > > Another mitigation would be to lower the I2C bus frequency on Juno from > 400kHz to 100kHz, so that there's 4x longer IRQ latency possible. > However, even that isn't going to be reliable - even going to 100kHz > isn't going to allow the above case to be solved - the interrupt is > delayed by around 2ms, and it takes about 1.4ms to send/receive 16 bytes > at 100kHz. (9 * 16 / (100*10^3)). > > So, I think all hope is lost for i2c-designware on Juno to cope with > reading the EDID from TDA998x reliably. :-( I wonder if we can get it work more reliably by using DMA (provided that there are DMA channels available for I2C in Juno)? That would allow the hardware to perform longer reads without relying on how fast the interrupt handler is able to empty the Rx FIFO.
On Mon, Nov 21, 2016 at 12:29:01PM +0200, Mika Westerberg wrote: > On Fri, Nov 18, 2016 at 07:35:42PM +0000, Russell King - ARM Linux wrote: > > Another mitigation would be to lower the I2C bus frequency on Juno from > > 400kHz to 100kHz, so that there's 4x longer IRQ latency possible. > > However, even that isn't going to be reliable - even going to 100kHz > > isn't going to allow the above case to be solved - the interrupt is > > delayed by around 2ms, and it takes about 1.4ms to send/receive 16 bytes > > at 100kHz. (9 * 16 / (100*10^3)). > > > > So, I think all hope is lost for i2c-designware on Juno to cope with > > reading the EDID from TDA998x reliably. > > :-( > > I wonder if we can get it work more reliably by using DMA (provided that > there are DMA channels available for I2C in Juno)? That would allow the > hardware to perform longer reads without relying on how fast the > interrupt handler is able to empty the Rx FIFO. It would need to DMA to the Tx FIFO to keep it filled - it triggers the stop condition when the Tx FIFO empties. From what I can see in the driver, the Tx FIFO not only takes the data but also a "command" to tell the hardware what to do. The Rx FIFO would also need DMA to avoid it overflowing due to high interrupt latency. I don't know what state DMA is in on the Juno, or even whether it has DMA - it has a PL330 DMA controller, but I see nothing in the DT files making use of it.
On Mon, Nov 21, 2016 at 10:43:29AM +0000, Russell King - ARM Linux wrote: > On Mon, Nov 21, 2016 at 12:29:01PM +0200, Mika Westerberg wrote: > > On Fri, Nov 18, 2016 at 07:35:42PM +0000, Russell King - ARM Linux wrote: > > > Another mitigation would be to lower the I2C bus frequency on Juno from > > > 400kHz to 100kHz, so that there's 4x longer IRQ latency possible. > > > However, even that isn't going to be reliable - even going to 100kHz > > > isn't going to allow the above case to be solved - the interrupt is > > > delayed by around 2ms, and it takes about 1.4ms to send/receive 16 bytes > > > at 100kHz. (9 * 16 / (100*10^3)). > > > > > > So, I think all hope is lost for i2c-designware on Juno to cope with > > > reading the EDID from TDA998x reliably. > > > > :-( > > > > I wonder if we can get it work more reliably by using DMA (provided that > > there are DMA channels available for I2C in Juno)? That would allow the > > hardware to perform longer reads without relying on how fast the > > interrupt handler is able to empty the Rx FIFO. > > It would need to DMA to the Tx FIFO to keep it filled - it triggers the > stop condition when the Tx FIFO empties. From what I can see in the > driver, the Tx FIFO not only takes the data but also a "command" to tell > the hardware what to do. Yes, the command is either "read" or "write" (meaning even for Rx a write to the Tx FIFO is needed). > The Rx FIFO would also need DMA to avoid it overflowing due to high > interrupt latency. I guess for Rx we would need to supply a dummy buffer of "read" commands for the Tx FIFO in addition to reading bytes from Rx FIFO. But still that might help to improve reliability in case of Juno. > I don't know what state DMA is in on the Juno, or even whether it has > DMA - it has a PL330 DMA controller, but I see nothing in the DT files > making use of it. OK
On Mon, Nov 21, 2016 at 10:43:29AM +0000, Russell King - ARM Linux wrote: > On Mon, Nov 21, 2016 at 12:29:01PM +0200, Mika Westerberg wrote: > > On Fri, Nov 18, 2016 at 07:35:42PM +0000, Russell King - ARM Linux wrote: > > > Another mitigation would be to lower the I2C bus frequency on Juno from > > > 400kHz to 100kHz, so that there's 4x longer IRQ latency possible. > > > However, even that isn't going to be reliable - even going to 100kHz > > > isn't going to allow the above case to be solved - the interrupt is > > > delayed by around 2ms, and it takes about 1.4ms to send/receive 16 bytes > > > at 100kHz. (9 * 16 / (100*10^3)). > > > > > > So, I think all hope is lost for i2c-designware on Juno to cope with > > > reading the EDID from TDA998x reliably. > > > > :-( > > > > I wonder if we can get it work more reliably by using DMA (provided that > > there are DMA channels available for I2C in Juno)? That would allow the > > hardware to perform longer reads without relying on how fast the > > interrupt handler is able to empty the Rx FIFO. > > It would need to DMA to the Tx FIFO to keep it filled - it triggers the > stop condition when the Tx FIFO empties. From what I can see in the > driver, the Tx FIFO not only takes the data but also a "command" to tell > the hardware what to do. > > The Rx FIFO would also need DMA to avoid it overflowing due to high > interrupt latency. > > I don't know what state DMA is in on the Juno, or even whether it has > DMA - it has a PL330 DMA controller, but I see nothing in the DT files > making use of it. The only thing we have tested PL330 with was UART and I2S. I'm not sure we can really use it with I2C given the above hardware configuration issue. The other thing we have tried in our private branches was to repeat the EDID transfer a number of times until the checksum was correct. Andrew Jackson sent a patch a year or so back to have some ridiculously high number of retries (30) and that has been rightfully shut down in the upstream. The unfortunate thing is that the monitors and TVs that we use inside ARM for testing don't seem to trigger this issue on a regular basis. We had one or two small 7" TVs that did that, and the brute force retry of EDID transfer sorted them out for the limited use that we had for them. Best regards, Liviu > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net.
On Mon, Nov 21, 2016 at 11:21:35AM +0000, Liviu Dudau wrote: > On Mon, Nov 21, 2016 at 10:43:29AM +0000, Russell King - ARM Linux wrote: > > It would need to DMA to the Tx FIFO to keep it filled - it triggers the > > stop condition when the Tx FIFO empties. From what I can see in the > > driver, the Tx FIFO not only takes the data but also a "command" to tell > > the hardware what to do. > > > > The Rx FIFO would also need DMA to avoid it overflowing due to high > > interrupt latency. > > > > I don't know what state DMA is in on the Juno, or even whether it has > > DMA - it has a PL330 DMA controller, but I see nothing in the DT files > > making use of it. > > The only thing we have tested PL330 with was UART and I2S. I'm not sure we > can really use it with I2C given the above hardware configuration issue. > > The other thing we have tried in our private branches was to repeat the EDID > transfer a number of times until the checksum was correct. Andrew Jackson > sent a patch a year or so back to have some ridiculously high number of > retries (30) and that has been rightfully shut down in the upstream. > > The unfortunate thing is that the monitors and TVs that we use inside ARM > for testing don't seem to trigger this issue on a regular basis. We had one > or two small 7" TVs that did that, and the brute force retry of EDID transfer > sorted them out for the limited use that we had for them. Due to the way the TDA998x works, it doesn't have much to do with the TV. The TDA998x is instructed to read up to 128 bytes of EDID data into its own internal buffer. It then does so, and raises an interrupt (or we notice that the interrupt flag is set when there's no hardware interrupt line), and we then read the EDID data out of the TDA998x. At the point we're reading from the TDA998x, it has loaded the data from the DDC bus and the DDC bus is idle... we don't have direct access to the DDC bus. The only reason I can think that the checksum would pass is if you were somehow ending up with data in the buffer that did cause the basic checksum to pass, even though you had an incomplete read.
On 21/11/16 10:43, Russell King - ARM Linux wrote: > On Mon, Nov 21, 2016 at 12:29:01PM +0200, Mika Westerberg wrote: >> On Fri, Nov 18, 2016 at 07:35:42PM +0000, Russell King - ARM Linux wrote: >>> Another mitigation would be to lower the I2C bus frequency on Juno from >>> 400kHz to 100kHz, so that there's 4x longer IRQ latency possible. >>> However, even that isn't going to be reliable - even going to 100kHz >>> isn't going to allow the above case to be solved - the interrupt is >>> delayed by around 2ms, and it takes about 1.4ms to send/receive 16 bytes >>> at 100kHz. (9 * 16 / (100*10^3)). >>> >>> So, I think all hope is lost for i2c-designware on Juno to cope with >>> reading the EDID from TDA998x reliably. >> >> :-( >> >> I wonder if we can get it work more reliably by using DMA (provided that >> there are DMA channels available for I2C in Juno)? That would allow the >> hardware to perform longer reads without relying on how fast the >> interrupt handler is able to empty the Rx FIFO. > > It would need to DMA to the Tx FIFO to keep it filled - it triggers the > stop condition when the Tx FIFO empties. From what I can see in the > driver, the Tx FIFO not only takes the data but also a "command" to tell > the hardware what to do. > > The Rx FIFO would also need DMA to avoid it overflowing due to high > interrupt latency. > > I don't know what state DMA is in on the Juno, or even whether it has > DMA - it has a PL330 DMA controller, but I see nothing in the DT files > making use of it. There's no DMA request wired up for I2C, unfortunately - only for the UARTs, I2S, FPFGA tile and trace unit. I recall Liviu did have a patch to lower the I2C clock as the mitigation you mention, which I thought had made it into mainline, but apparently not. Robin.
diff --git a/drivers/i2c/busses/i2c-designware-core.c b/drivers/i2c/busses/i2c-designware-core.c index 11e866d05368..060ae9e5a916 100644 --- a/drivers/i2c/busses/i2c-designware-core.c +++ b/drivers/i2c/busses/i2c-designware-core.c @@ -752,6 +752,15 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num) goto done; } + printk(KERN_DEBUG "%s: %p:%d -> %p:%d (%d:%d) [%x %x %x %x] %d [tx:%p:%d] [rx:%p:%d]\n", + __func__, msgs, num, + dev->msgs, dev->msgs_num, + dev->msg_write_idx, dev->msg_read_idx, + dev->cmd_err, dev->msg_err, dev->status, dev->abort_source, + dev->rx_outstanding, + dev->tx_buf, dev->tx_buf_len, + dev->rx_buf, dev->rx_buf_len); + if (dev->msg_err) { ret = dev->msg_err; goto done; @@ -857,7 +866,7 @@ static irqreturn_t i2c_dw_isr(int this_irq, void *dev_id) enabled = dw_readl(dev, DW_IC_ENABLE); stat = dw_readl(dev, DW_IC_RAW_INTR_STAT); - dev_dbg(dev->dev, "%s: enabled=%#x stat=%#x\n", __func__, enabled, stat); + dev_printk(KERN_DEBUG, dev->dev, "%s: enabled=%#x stat=%#x\n", __func__, enabled, stat); if (!enabled || !(stat & ~DW_IC_INTR_ACTIVITY)) return IRQ_NONE;