diff mbox series

i2c: imx: increase retries on arbitration loss

Message ID 20221216084511.2576786-1-primoz.fiser@norik.com (mailing list archive)
State New, archived
Headers show
Series i2c: imx: increase retries on arbitration loss | expand

Commit Message

Primoz Fiser Dec. 16, 2022, 8:45 a.m. UTC
By default, retries value is set to 0 (no retries). Set retries to more
sensible value of 3 to allow i2c core to re-attempt transfer in case of
i2c arbitration loss (i2c-imx returns -EAGAIN errno is such case).

Signed-off-by: Primoz Fiser <primoz.fiser@norik.com>
---
 drivers/i2c/busses/i2c-imx.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Marco Felsch Dec. 16, 2022, 9:45 a.m. UTC | #1
Hi Primoz,

On 22-12-16, Primoz Fiser wrote:
> By default, retries value is set to 0 (no retries). Set retries to more
> sensible value of 3 to allow i2c core to re-attempt transfer in case of
> i2c arbitration loss (i2c-imx returns -EAGAIN errno is such case).

apart the fact that the number of retries vary a lot and so the client
driver behaviour can vary a lot which is not good IMHO, why do you think
that 3 is a sufficient number?

If an arbitration loss happen, why do you think that retrying it 3 times
changes that?

Regards,
  Marco


> 
> Signed-off-by: Primoz Fiser <primoz.fiser@norik.com>
> ---
>  drivers/i2c/busses/i2c-imx.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
> index cf5bacf3a488..6a5694cfe1cc 100644
> --- a/drivers/i2c/busses/i2c-imx.c
> +++ b/drivers/i2c/busses/i2c-imx.c
> @@ -1478,6 +1478,7 @@ static int i2c_imx_probe(struct platform_device *pdev)
>  	i2c_imx->adapter.dev.parent	= &pdev->dev;
>  	i2c_imx->adapter.nr		= pdev->id;
>  	i2c_imx->adapter.dev.of_node	= pdev->dev.of_node;
> +	i2c_imx->adapter.retries	= 3;
>  	i2c_imx->base			= base;
>  	ACPI_COMPANION_SET(&i2c_imx->adapter.dev, ACPI_COMPANION(&pdev->dev));
>  
> -- 
> 2.25.1
> 
> 
>
Primoz Fiser Dec. 16, 2022, 10:41 a.m. UTC | #2
Hi Marco,

On 16. 12. 22 10:45, Marco Felsch wrote:
> Hi Primoz,
> 
> On 22-12-16, Primoz Fiser wrote:
>> By default, retries value is set to 0 (no retries). Set retries to more
>> sensible value of 3 to allow i2c core to re-attempt transfer in case of
>> i2c arbitration loss (i2c-imx returns -EAGAIN errno is such case).
> 
> apart the fact that the number of retries vary a lot and so the client
> driver behaviour can vary a lot which is not good IMHO, why do you think
> that 3 is a sufficient number?

IMHO it is better than leaving it at 0 (no retries)?

Setting it to sensible value like 3 will at least attempt to make 
transfer in case arbitration-loss occurs.

> 
> If an arbitration loss happen, why do you think that retrying it 3 times
> changes that?

I our case, setting retries to non-zero value solves issues with PMIC 
shutdown on phyboard-mira which in some rare cases fails with "Failed to
shutdown (err =  -11)" (-EAGAIN).

To me it makes common sense retries is set to non-zero value especially 
for such rare conditions/situations.

BR,
Primoz

> 
> Regards,
>    Marco
> 
> 
>>
>> Signed-off-by: Primoz Fiser <primoz.fiser@norik.com>
>> ---
>>   drivers/i2c/busses/i2c-imx.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
>> index cf5bacf3a488..6a5694cfe1cc 100644
>> --- a/drivers/i2c/busses/i2c-imx.c
>> +++ b/drivers/i2c/busses/i2c-imx.c
>> @@ -1478,6 +1478,7 @@ static int i2c_imx_probe(struct platform_device *pdev)
>>   	i2c_imx->adapter.dev.parent	= &pdev->dev;
>>   	i2c_imx->adapter.nr		= pdev->id;
>>   	i2c_imx->adapter.dev.of_node	= pdev->dev.of_node;
>> +	i2c_imx->adapter.retries	= 3;
>>   	i2c_imx->base			= base;
>>   	ACPI_COMPANION_SET(&i2c_imx->adapter.dev, ACPI_COMPANION(&pdev->dev));
>>   
>> -- 
>> 2.25.1
>>
>>
>>
Oleksij Rempel Dec. 16, 2022, 11:02 a.m. UTC | #3
On Fri, Dec 16, 2022 at 11:41:08AM +0100, Primoz Fiser wrote:
> Hi Marco,
> 
> On 16. 12. 22 10:45, Marco Felsch wrote:
> > Hi Primoz,
> > 
> > On 22-12-16, Primoz Fiser wrote:
> > > By default, retries value is set to 0 (no retries). Set retries to more
> > > sensible value of 3 to allow i2c core to re-attempt transfer in case of
> > > i2c arbitration loss (i2c-imx returns -EAGAIN errno is such case).
> > 
> > apart the fact that the number of retries vary a lot and so the client
> > driver behaviour can vary a lot which is not good IMHO, why do you think
> > that 3 is a sufficient number?
> 
> IMHO it is better than leaving it at 0 (no retries)?
> 
> Setting it to sensible value like 3 will at least attempt to make transfer
> in case arbitration-loss occurs.
> 
> > 
> > If an arbitration loss happen, why do you think that retrying it 3 times
> > changes that?
> 
> I our case, setting retries to non-zero value solves issues with PMIC
> shutdown on phyboard-mira which in some rare cases fails with "Failed to
> shutdown (err =  -11)" (-EAGAIN).
> 
> To me it makes common sense retries is set to non-zero value especially for
> such rare conditions/situations.

https://lore.kernel.org/all/Ys1bw9zuIwWS+bqw@shikoro/
Uwe Kleine-König Dec. 16, 2022, 11:13 a.m. UTC | #4
On Fri, Dec 16, 2022 at 12:02:27PM +0100, Oleksij Rempel wrote:
> On Fri, Dec 16, 2022 at 11:41:08AM +0100, Primoz Fiser wrote:
> > Hi Marco,
> > 
> > On 16. 12. 22 10:45, Marco Felsch wrote:
> > > Hi Primoz,
> > > 
> > > On 22-12-16, Primoz Fiser wrote:
> > > > By default, retries value is set to 0 (no retries). Set retries to more
> > > > sensible value of 3 to allow i2c core to re-attempt transfer in case of
> > > > i2c arbitration loss (i2c-imx returns -EAGAIN errno is such case).
> > > 
> > > apart the fact that the number of retries vary a lot and so the client
> > > driver behaviour can vary a lot which is not good IMHO, why do you think
> > > that 3 is a sufficient number?
> > 
> > IMHO it is better than leaving it at 0 (no retries)?
> > 
> > Setting it to sensible value like 3 will at least attempt to make transfer
> > in case arbitration-loss occurs.
> > 
> > > 
> > > If an arbitration loss happen, why do you think that retrying it 3 times
> > > changes that?
> > 
> > I our case, setting retries to non-zero value solves issues with PMIC
> > shutdown on phyboard-mira which in some rare cases fails with "Failed to
> > shutdown (err =  -11)" (-EAGAIN).
> > 
> > To me it makes common sense retries is set to non-zero value especially for
> > such rare conditions/situations.
> 
> https://lore.kernel.org/all/Ys1bw9zuIwWS+bqw@shikoro/

Also in the same thread there is the question about better setting it in
the i2c core if 3 instead of 0 is a good idea for the imx driver.

Best regards
Uwe
Primoz Fiser Dec. 16, 2022, 12:23 p.m. UTC | #5
Hi all,

On 16. 12. 22 12:13, Uwe Kleine-König wrote:
> On Fri, Dec 16, 2022 at 12:02:27PM +0100, Oleksij Rempel wrote:
>> On Fri, Dec 16, 2022 at 11:41:08AM +0100, Primoz Fiser wrote:
>>> Hi Marco,
>>>
>>> On 16. 12. 22 10:45, Marco Felsch wrote:
>>>> Hi Primoz,
>>>>
>>>> On 22-12-16, Primoz Fiser wrote:
>>>>> By default, retries value is set to 0 (no retries). Set retries to more
>>>>> sensible value of 3 to allow i2c core to re-attempt transfer in case of
>>>>> i2c arbitration loss (i2c-imx returns -EAGAIN errno is such case).
>>>>
>>>> apart the fact that the number of retries vary a lot and so the client
>>>> driver behaviour can vary a lot which is not good IMHO, why do you think
>>>> that 3 is a sufficient number?
>>>
>>> IMHO it is better than leaving it at 0 (no retries)?
>>>
>>> Setting it to sensible value like 3 will at least attempt to make transfer
>>> in case arbitration-loss occurs.
>>>
>>>>
>>>> If an arbitration loss happen, why do you think that retrying it 3 times
>>>> changes that?
>>>
>>> I our case, setting retries to non-zero value solves issues with PMIC
>>> shutdown on phyboard-mira which in some rare cases fails with "Failed to
>>> shutdown (err =  -11)" (-EAGAIN).
>>>
>>> To me it makes common sense retries is set to non-zero value especially for
>>> such rare conditions/situations.
>>
>> https://lore.kernel.org/all/Ys1bw9zuIwWS+bqw@shikoro/

Ohh I see.

Reading through the thread I guess we aren't getting this mainlined at 
all :)

But let me switch side and ask why do you think leaving retries = 0 is a 
good idea?

The only solid point in the thread seems to be that in that case we are 
not covering up the potential i2c hardware issues?

Yeah fair point but on the other hand, goal of this patch would be to 
improve robustness in case of otherwise good performing hardware. From 
user perspective I just want it to work despite it retrying under the 
hood from time to time. I think Francesco had the same idea.

> 
> Also in the same thread there is the question about better setting it in
> the i2c core if 3 instead of 0 is a good idea for the imx driver.

Using I2C_RETRIES ioctl for this seems a bit of an overkill considering 
other i2c bus drivers also set retries to non-zero value. But anyways, 
thank you for the idea.


> 
> Best regards
> Uwe
> 

BR,
Primoz
Francesco Dolcini Dec. 16, 2022, 12:51 p.m. UTC | #6
On Fri, Dec 16, 2022 at 01:23:29PM +0100, Primoz Fiser wrote:
> Hi all,
> 
> On 16. 12. 22 12:13, Uwe Kleine-König wrote:
> > On Fri, Dec 16, 2022 at 12:02:27PM +0100, Oleksij Rempel wrote:
> > > On Fri, Dec 16, 2022 at 11:41:08AM +0100, Primoz Fiser wrote:
> > > > Hi Marco,
> > > > 
> > > > On 16. 12. 22 10:45, Marco Felsch wrote:
> > > > > Hi Primoz,
> > > > > 
> > > > > On 22-12-16, Primoz Fiser wrote:
> > > > > > By default, retries value is set to 0 (no retries). Set retries to more
> > > > > > sensible value of 3 to allow i2c core to re-attempt transfer in case of
> > > > > > i2c arbitration loss (i2c-imx returns -EAGAIN errno is such case).
> > > > > 
> > > > > apart the fact that the number of retries vary a lot and so the client
> > > > > driver behaviour can vary a lot which is not good IMHO, why do you think
> > > > > that 3 is a sufficient number?
> > > > 
> > > > IMHO it is better than leaving it at 0 (no retries)?
> > > > 
> > > > Setting it to sensible value like 3 will at least attempt to make transfer
> > > > in case arbitration-loss occurs.
> > > > 
> > > > > 
> > > > > If an arbitration loss happen, why do you think that retrying it 3 times
> > > > > changes that?
> > > > 
> > > > I our case, setting retries to non-zero value solves issues with PMIC
> > > > shutdown on phyboard-mira which in some rare cases fails with "Failed to
> > > > shutdown (err =  -11)" (-EAGAIN).
> > > > 
> > > > To me it makes common sense retries is set to non-zero value especially for
> > > > such rare conditions/situations.
> > > 
> > > https://lore.kernel.org/all/Ys1bw9zuIwWS+bqw@shikoro/
> 
> Ohh I see.
> 
> Reading through the thread I guess we aren't getting this mainlined at all
> :)
>
> The only solid point in the thread seems to be that in that case we are not
> covering up the potential i2c hardware issues?

I believe that in this case we should just have a warning in the kernel.
The retry potentially work-around a transient issue and we do not hide any hardware
issue at the same time. It seems an easy win-win solution.

> Yeah fair point but on the other hand, goal of this patch would be to
> improve robustness in case of otherwise good performing hardware. From user
> perspective I just want it to work despite it retrying under the hood from
> time to time. I think Francesco had the same idea.

Unfortunately I was missing the exact background that made us do this
change, we just had it sitting in our fork for too long :-/
This is one of the reason I gave up on it.

Quoting Uwe [1]
> sometimes there is no practical way around such work arounds. I happens
> from time to time that the reason for problem is known, but fixing the
> hardware is no option, then you need such workrounds. (This applies to
> both, retrying the transfers and resetting the bus.) 

Francesco

[1] https://lore.kernel.org/all/20220715083400.q226rrwxsgt4eomp@pengutronix.de/
Primoz Fiser Dec. 28, 2022, 8:01 a.m. UTC | #7
Hi all,

On 16. 12. 22 13:51, Francesco Dolcini wrote:
> On Fri, Dec 16, 2022 at 01:23:29PM +0100, Primoz Fiser wrote:
>> Hi all,
>>
>> On 16. 12. 22 12:13, Uwe Kleine-König wrote:
>>> On Fri, Dec 16, 2022 at 12:02:27PM +0100, Oleksij Rempel wrote:
>>>> On Fri, Dec 16, 2022 at 11:41:08AM +0100, Primoz Fiser wrote:
>>>>> Hi Marco,
>>>>>
>>>>> On 16. 12. 22 10:45, Marco Felsch wrote:
>>>>>> Hi Primoz,
>>>>>>
>>>>>> On 22-12-16, Primoz Fiser wrote:
>>>>>>> By default, retries value is set to 0 (no retries). Set retries to more
>>>>>>> sensible value of 3 to allow i2c core to re-attempt transfer in case of
>>>>>>> i2c arbitration loss (i2c-imx returns -EAGAIN errno is such case).
>>>>>>
>>>>>> apart the fact that the number of retries vary a lot and so the client
>>>>>> driver behaviour can vary a lot which is not good IMHO, why do you think
>>>>>> that 3 is a sufficient number?
>>>>>
>>>>> IMHO it is better than leaving it at 0 (no retries)?
>>>>>
>>>>> Setting it to sensible value like 3 will at least attempt to make transfer
>>>>> in case arbitration-loss occurs.
>>>>>
>>>>>>
>>>>>> If an arbitration loss happen, why do you think that retrying it 3 times
>>>>>> changes that?
>>>>>
>>>>> I our case, setting retries to non-zero value solves issues with PMIC
>>>>> shutdown on phyboard-mira which in some rare cases fails with "Failed to
>>>>> shutdown (err =  -11)" (-EAGAIN).
>>>>>
>>>>> To me it makes common sense retries is set to non-zero value especially for
>>>>> such rare conditions/situations.
>>>>
>>>> https://lore.kernel.org/all/Ys1bw9zuIwWS+bqw@shikoro/
>>
>> Ohh I see.
>>
>> Reading through the thread I guess we aren't getting this mainlined at all
>> :)
>>
>> The only solid point in the thread seems to be that in that case we are not
>> covering up the potential i2c hardware issues?
> 
> I believe that in this case we should just have a warning in the kernel.
> The retry potentially work-around a transient issue and we do not hide any hardware
> issue at the same time. It seems an easy win-win solution.

I would agree about throwing a warning message in retry case.

Not sure how would it affect other i2c bus drivers using retries > 0. 
Retries might be pretty rare with i2c-imx but some other drivers set 
this to 5 for example. At least using _ratelimited printk is a must 
using this approach.

> 
>> Yeah fair point but on the other hand, goal of this patch would be to
>> improve robustness in case of otherwise good performing hardware. From user
>> perspective I just want it to work despite it retrying under the hood from
>> time to time. I think Francesco had the same idea.
> 
> Unfortunately I was missing the exact background that made us do this
> change, we just had it sitting in our fork for too long :-/
> This is one of the reason I gave up on it.
> 
> Quoting Uwe [1]
>> sometimes there is no practical way around such work arounds. I happens
>> from time to time that the reason for problem is known, but fixing the
>> hardware is no option, then you need such workrounds. (This applies to
>> both, retrying the transfers and resetting the bus.)

I wouldn't say this is exactly a workaround if "retries mechanism" is 
standard part of the i2c core. Other drivers use it as well. it is just 
a setting to improve bus robustness.

But OK, I guess we can live with this one-liner in the downstream kernel.

> 
> Francesco
> 
> [1] https://lore.kernel.org/all/20220715083400.q226rrwxsgt4eomp@pengutronix.de/
>
Francesco Dolcini Dec. 30, 2022, 2:40 p.m. UTC | #8
+Wolfram

On Wed, Dec 28, 2022 at 09:01:46AM +0100, Primoz Fiser wrote:
> On 16. 12. 22 13:51, Francesco Dolcini wrote:
> > On Fri, Dec 16, 2022 at 01:23:29PM +0100, Primoz Fiser wrote:
> > > The only solid point in the thread seems to be that in that case we are not
> > > covering up the potential i2c hardware issues?
> > 
> > I believe that in this case we should just have a warning in the kernel.
> > The retry potentially work-around a transient issue and we do not hide any hardware
> > issue at the same time. It seems an easy win-win solution.
> 
> I would agree about throwing a warning message in retry case.
> 
> Not sure how would it affect other i2c bus drivers using retries > 0.
> Retries might be pretty rare with i2c-imx but some other drivers set this to
> 5 for example. At least using _ratelimited printk is a must using this
> approach.

Wolfram, Uwe, Oleksij

Would it be acceptable to have a warning when we have I2C retries, and
with that in place enabling retries on the imx driver?

It exists hardware that requires this to work correctly, and at a
minimum setting the retry count from user space is not going to solve
potential issues during initial driver probe.

To me the only reasonable solution is to have the retry enabled with a
sensible number (3? 5?), however there is a concern that this might
hide real hardware issues.

> > > Yeah fair point but on the other hand, goal of this patch would be to
> > > improve robustness in case of otherwise good performing hardware. From user
> > > perspective I just want it to work despite it retrying under the hood from
> > > time to time. I think Francesco had the same idea.
> > 
> > Unfortunately I was missing the exact background that made us do this
> > change, we just had it sitting in our fork for too long :-/
> > This is one of the reason I gave up on it.
> > 
> > Quoting Uwe [1]
> > > sometimes there is no practical way around such work arounds. I happens
> > > from time to time that the reason for problem is known, but fixing the
> > > hardware is no option, then you need such workrounds. (This applies to
> > > both, retrying the transfers and resetting the bus.)
> 
> I wouldn't say this is exactly a workaround if "retries mechanism" is
> standard part of the i2c core. Other drivers use it as well. it is just a
> setting to improve bus robustness.
> 
> But OK, I guess we can live with this one-liner in the downstream kernel.
To me this is not a good idea.

Francesco
Oleksij Rempel Dec. 30, 2022, 4:12 p.m. UTC | #9
On Fri, Dec 30, 2022 at 03:40:58PM +0100, Francesco Dolcini wrote:
> +Wolfram
> 
> On Wed, Dec 28, 2022 at 09:01:46AM +0100, Primoz Fiser wrote:
> > On 16. 12. 22 13:51, Francesco Dolcini wrote:
> > > On Fri, Dec 16, 2022 at 01:23:29PM +0100, Primoz Fiser wrote:
> > > > The only solid point in the thread seems to be that in that case we are not
> > > > covering up the potential i2c hardware issues?
> > > 
> > > I believe that in this case we should just have a warning in the kernel.
> > > The retry potentially work-around a transient issue and we do not hide any hardware
> > > issue at the same time. It seems an easy win-win solution.
> > 
> > I would agree about throwing a warning message in retry case.
> > 
> > Not sure how would it affect other i2c bus drivers using retries > 0.
> > Retries might be pretty rare with i2c-imx but some other drivers set this to
> > 5 for example. At least using _ratelimited printk is a must using this
> > approach.
> 
> Wolfram, Uwe, Oleksij
> 
> Would it be acceptable to have a warning when we have I2C retries, and
> with that in place enabling retries on the imx driver?
> 
> It exists hardware that requires this to work correctly,

Well, this is persistent confusion in this monolog. It will not make it
correctly.

> and at a
> minimum setting the retry count from user space is not going to solve
> potential issues during initial driver probe.

I assume it is not clear from programmer point of view. Lets try other way:

- The I2C slave could not correctly interpret the data on SDA because the SDA
  high or low-level voltages do not reach its appropriate input
  thresholds.

This means:

You have this:

    /-\    /-\ ----- 2.5Vcc
___/   \__/   \___

Instead of this:

     /-\     /-\ ----- 3.3Vcc
    /  \    /   \
___/    \__/     \___

This is bad, because master or slave will not be able to interpret the pick level
correctly. It may see some times 0 instead of 1. This means, what ever we are
writing we are to the slave or reading from the slave is potentially corrupt
and only __sometimes__ the master was able to detect it. 

- The I2C slave missed an SCL cycle because the SCL high or low-level voltages
  do not reach its appropriate input thresholds.

This means, the bus frequency is too high for current configured or physical PCB
designed. So, you will have different kind of corruptions and some times they
will be detected. 

- The I2C slave accidently interpreted a spike etc. as an SCL cycle.

This means the noise level is to high. The driver strange should be increased
or PCB redesign should be made. May be there are more options. If not done,
data corruption can be expected.

None of this issue can be "fixed" by retries or made more "robust".
Doing more retries means: we do what ever we do until the system was not able to
detect the error.

> To me the only reasonable solution is to have the retry enabled with a
> sensible number (3? 5?), however there is a concern that this might
> hide real hardware issues.
 
There is real hardware issue. 

Regards,
Oleksij
Francesco Dolcini Dec. 30, 2022, 4:47 p.m. UTC | #10
On Fri, Dec 30, 2022 at 05:12:09PM +0100, Oleksij Rempel wrote:
> On Fri, Dec 30, 2022 at 03:40:58PM +0100, Francesco Dolcini wrote:
> > +Wolfram
> > 
> > On Wed, Dec 28, 2022 at 09:01:46AM +0100, Primoz Fiser wrote:
> > > On 16. 12. 22 13:51, Francesco Dolcini wrote:
> > > > On Fri, Dec 16, 2022 at 01:23:29PM +0100, Primoz Fiser wrote:
> > > > > The only solid point in the thread seems to be that in that case we are not
> > > > > covering up the potential i2c hardware issues?
> > > > 
> > > > I believe that in this case we should just have a warning in the kernel.
> > > > The retry potentially work-around a transient issue and we do not hide any hardware
> > > > issue at the same time. It seems an easy win-win solution.
> > > 
> > > I would agree about throwing a warning message in retry case.
> > > 
> > > Not sure how would it affect other i2c bus drivers using retries > 0.
> > > Retries might be pretty rare with i2c-imx but some other drivers set this to
> > > 5 for example. At least using _ratelimited printk is a must using this
> > > approach.
> > 
> > Wolfram, Uwe, Oleksij
> > 
> > Would it be acceptable to have a warning when we have I2C retries, and
> > with that in place enabling retries on the imx driver?
> > 
> > It exists hardware that requires this to work correctly,
> 
> Well, this is persistent confusion in this monolog. It will not make it
> correctly.
> 
> > and at a
> > minimum setting the retry count from user space is not going to solve
> > potential issues during initial driver probe.
> 
> I assume it is not clear from programmer point of view. Lets try other way:
> 
> - The I2C slave could not correctly interpret the data on SDA because the SDA
>   high or low-level voltages do not reach its appropriate input
>   thresholds.
> 
> This means:
> 
> You have this:
> 
>     /-\    /-\ ----- 2.5Vcc
> ___/   \__/   \___
> 
> Instead of this:
> 
>      /-\     /-\ ----- 3.3Vcc
>     /  \    /   \
> ___/    \__/     \___
> 
> This is bad, because master or slave will not be able to interpret the pick level
> correctly. It may see some times 0 instead of 1. This means, what ever we are
> writing we are to the slave or reading from the slave is potentially corrupt
> and only __sometimes__ the master was able to detect it. 
> 
> - The I2C slave missed an SCL cycle because the SCL high or low-level voltages
>   do not reach its appropriate input thresholds.
> 
> This means, the bus frequency is too high for current configured or physical PCB
> designed. So, you will have different kind of corruptions and some times they
> will be detected. 
> 
> - The I2C slave accidently interpreted a spike etc. as an SCL cycle.
> 
> This means the noise level is to high. The driver strange should be increased
> or PCB redesign should be made. May be there are more options. If not done,
> data corruption can be expected.
> 
> None of this issue can be "fixed" by retries or made more "robust".
> Doing more retries means: we do what ever we do until the system was not able to
> detect the error.

Hello Oleksij,
thanks for the detailed explanation, appreciated.

Given that is it correct that the i2c imx driver return EAGAIN in such a
case (arbitration error)? You made it crystal clear that there is no
such thing as try again for this error, I would be inclined to prepare a
patch to fix this.

diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index cf5bacf3a488..a2a581c8ae07 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -492,7 +492,7 @@ static int i2c_imx_bus_busy(struct imx_i2c_struct *i2c_imx, int for_busy, bool a
                /* check for arbitration lost */
                if (temp & I2SR_IAL) {
                        i2c_imx_clear_irq(i2c_imx, I2SR_IAL);
-                       return -EAGAIN;
+                       return -EIO;
                }

                if (for_busy && (temp & I2SR_IBB)) {


In addition to that is there any valid use case of the i2c retry
mechanism? Is possible for an I2C controller to report anything that can
be recovered with a retry?

Francesco
Francesco Dolcini Dec. 30, 2022, 5:09 p.m. UTC | #11
On Fri, Dec 30, 2022 at 05:47:42PM +0100, Francesco Dolcini wrote:
> On Fri, Dec 30, 2022 at 05:12:09PM +0100, Oleksij Rempel wrote:
> > On Fri, Dec 30, 2022 at 03:40:58PM +0100, Francesco Dolcini wrote:
> > > +Wolfram
> > > 
> > > On Wed, Dec 28, 2022 at 09:01:46AM +0100, Primoz Fiser wrote:
> > > > On 16. 12. 22 13:51, Francesco Dolcini wrote:
> > > > > On Fri, Dec 16, 2022 at 01:23:29PM +0100, Primoz Fiser wrote:
> > > > > > The only solid point in the thread seems to be that in that case we are not
> > > > > > covering up the potential i2c hardware issues?
> > > > > 
> > > > > I believe that in this case we should just have a warning in the kernel.
> > > > > The retry potentially work-around a transient issue and we do not hide any hardware
> > > > > issue at the same time. It seems an easy win-win solution.
> > > > 
> > > > I would agree about throwing a warning message in retry case.
> > > > 
> > > > Not sure how would it affect other i2c bus drivers using retries > 0.
> > > > Retries might be pretty rare with i2c-imx but some other drivers set this to
> > > > 5 for example. At least using _ratelimited printk is a must using this
> > > > approach.
> > > 
> > > Wolfram, Uwe, Oleksij
> > > 
> > > Would it be acceptable to have a warning when we have I2C retries, and
> > > with that in place enabling retries on the imx driver?
> > > 
> > > It exists hardware that requires this to work correctly,
> > 
> > Well, this is persistent confusion in this monolog. It will not make it
> > correctly.
> > 
> > > and at a
> > > minimum setting the retry count from user space is not going to solve
> > > potential issues during initial driver probe.
> > 
> > I assume it is not clear from programmer point of view. Lets try other way:
> > 
> > - The I2C slave could not correctly interpret the data on SDA because the SDA
> >   high or low-level voltages do not reach its appropriate input
> >   thresholds.
> > 
> > This means:
> > 
> > You have this:
> > 
> >     /-\    /-\ ----- 2.5Vcc
> > ___/   \__/   \___
> > 
> > Instead of this:
> > 
> >      /-\     /-\ ----- 3.3Vcc
> >     /  \    /   \
> > ___/    \__/     \___
> > 
> > This is bad, because master or slave will not be able to interpret the pick level
> > correctly. It may see some times 0 instead of 1. This means, what ever we are
> > writing we are to the slave or reading from the slave is potentially corrupt
> > and only __sometimes__ the master was able to detect it. 
> > 
> > - The I2C slave missed an SCL cycle because the SCL high or low-level voltages
> >   do not reach its appropriate input thresholds.
> > 
> > This means, the bus frequency is too high for current configured or physical PCB
> > designed. So, you will have different kind of corruptions and some times they
> > will be detected. 
> > 
> > - The I2C slave accidently interpreted a spike etc. as an SCL cycle.
> > 
> > This means the noise level is to high. The driver strange should be increased
> > or PCB redesign should be made. May be there are more options. If not done,
> > data corruption can be expected.
> > 
> > None of this issue can be "fixed" by retries or made more "robust".
> > Doing more retries means: we do what ever we do until the system was not able to
> > detect the error.
> 
> Hello Oleksij,
> thanks for the detailed explanation, appreciated.
> 
> Given that is it correct that the i2c imx driver return EAGAIN in such a
> case (arbitration error)? You made it crystal clear that there is no
> such thing as try again for this error, I would be inclined to prepare a
> patch to fix this.
> 
> diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
> index cf5bacf3a488..a2a581c8ae07 100644
> --- a/drivers/i2c/busses/i2c-imx.c
> +++ b/drivers/i2c/busses/i2c-imx.c
> @@ -492,7 +492,7 @@ static int i2c_imx_bus_busy(struct imx_i2c_struct *i2c_imx, int for_busy, bool a
>                 /* check for arbitration lost */
>                 if (temp & I2SR_IAL) {
>                         i2c_imx_clear_irq(i2c_imx, I2SR_IAL);
> -                       return -EAGAIN;
> +                       return -EIO;
>                 }
> 
>                 if (for_busy && (temp & I2SR_IBB)) {
> 

Just a small addition, the tegra i2c driver is interesting.
It returns EAGAIN only when an arbitration error is detected on multi master
node, otherwise it tries the bus recovery procedure.

	/* start recovery upon arbitration loss in single master mode */
	if (i2c_dev->msg_err == I2C_ERR_ARBITRATION_LOST) {
		if (!i2c_dev->multimaster_mode)
			return i2c_recover_bus(&i2c_dev->adapter);

		return -EAGAIN;
	}

Francesco
Oleksij Rempel Dec. 30, 2022, 5:25 p.m. UTC | #12
On Fri, Dec 30, 2022 at 05:47:21PM +0100, Francesco Dolcini wrote:
> On Fri, Dec 30, 2022 at 05:12:09PM +0100, Oleksij Rempel wrote:
> > On Fri, Dec 30, 2022 at 03:40:58PM +0100, Francesco Dolcini wrote:
> > > +Wolfram
> > > 
> > > On Wed, Dec 28, 2022 at 09:01:46AM +0100, Primoz Fiser wrote:
> > > > On 16. 12. 22 13:51, Francesco Dolcini wrote:
> > > > > On Fri, Dec 16, 2022 at 01:23:29PM +0100, Primoz Fiser wrote:
> > > > > > The only solid point in the thread seems to be that in that case we are not
> > > > > > covering up the potential i2c hardware issues?
> > > > > 
> > > > > I believe that in this case we should just have a warning in the kernel.
> > > > > The retry potentially work-around a transient issue and we do not hide any hardware
> > > > > issue at the same time. It seems an easy win-win solution.
> > > > 
> > > > I would agree about throwing a warning message in retry case.
> > > > 
> > > > Not sure how would it affect other i2c bus drivers using retries > 0.
> > > > Retries might be pretty rare with i2c-imx but some other drivers set this to
> > > > 5 for example. At least using _ratelimited printk is a must using this
> > > > approach.
> > > 
> > > Wolfram, Uwe, Oleksij
> > > 
> > > Would it be acceptable to have a warning when we have I2C retries, and
> > > with that in place enabling retries on the imx driver?
> > > 
> > > It exists hardware that requires this to work correctly,
> > 
> > Well, this is persistent confusion in this monolog. It will not make it
> > correctly.
> > 
> > > and at a
> > > minimum setting the retry count from user space is not going to solve
> > > potential issues during initial driver probe.
> > 
> > I assume it is not clear from programmer point of view. Lets try other way:
> > 
> > - The I2C slave could not correctly interpret the data on SDA because the SDA
> >   high or low-level voltages do not reach its appropriate input
> >   thresholds.
> > 
> > This means:
> > 
> > You have this:
> > 
> >     /-\    /-\ ----- 2.5Vcc
> > ___/   \__/   \___
> > 
> > Instead of this:
> > 
> >      /-\     /-\ ----- 3.3Vcc
> >     /  \    /   \
> > ___/    \__/     \___
> > 
> > This is bad, because master or slave will not be able to interpret the pick level
> > correctly. It may see some times 0 instead of 1. This means, what ever we are
> > writing we are to the slave or reading from the slave is potentially corrupt
> > and only __sometimes__ the master was able to detect it. 
> > 
> > - The I2C slave missed an SCL cycle because the SCL high or low-level voltages
> >   do not reach its appropriate input thresholds.
> > 
> > This means, the bus frequency is too high for current configured or physical PCB
> > designed. So, you will have different kind of corruptions and some times they
> > will be detected. 
> > 
> > - The I2C slave accidently interpreted a spike etc. as an SCL cycle.
> > 
> > This means the noise level is to high. The driver strange should be increased
> > or PCB redesign should be made. May be there are more options. If not done,
> > data corruption can be expected.
> > 
> > None of this issue can be "fixed" by retries or made more "robust".
> > Doing more retries means: we do what ever we do until the system was not able to
> > detect the error.
> 
> Hello Oleksij,
> thanks for the detailed explanation, appreciated.
> 
> Given that is it correct that the i2c imx driver return EAGAIN in such a
> case (arbitration error)? You made it crystal clear that there is no
> such thing as try again for this error, I would be inclined to prepare a
> patch to fix this.
> 
> diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
> index cf5bacf3a488..a2a581c8ae07 100644
> --- a/drivers/i2c/busses/i2c-imx.c
> +++ b/drivers/i2c/busses/i2c-imx.c
> @@ -492,7 +492,7 @@ static int i2c_imx_bus_busy(struct imx_i2c_struct *i2c_imx, int for_busy, bool a
>                 /* check for arbitration lost */
>                 if (temp & I2SR_IAL) {
>                         i2c_imx_clear_irq(i2c_imx, I2SR_IAL);
> -                       return -EAGAIN;
> +                       return -EIO;
>                 }
> 
>                 if (for_busy && (temp & I2SR_IBB)) {
> 

Hm, good question.

> In addition to that is there any valid use case of the i2c retry
> mechanism?
> Is possible for an I2C controller to report anything that can
> be recovered with a retry?

In case of multimaster bus, except of noise and signal level issues, we
would have a simple conflict between masters. In this case, we should
retry. Potentially, every master should use randomized pause before
retrying (at last it is done by some other protocol using shared
medium).

Regards,
Oleksij
diff mbox series

Patch

diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index cf5bacf3a488..6a5694cfe1cc 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -1478,6 +1478,7 @@  static int i2c_imx_probe(struct platform_device *pdev)
 	i2c_imx->adapter.dev.parent	= &pdev->dev;
 	i2c_imx->adapter.nr		= pdev->id;
 	i2c_imx->adapter.dev.of_node	= pdev->dev.of_node;
+	i2c_imx->adapter.retries	= 3;
 	i2c_imx->base			= base;
 	ACPI_COMPANION_SET(&i2c_imx->adapter.dev, ACPI_COMPANION(&pdev->dev));