[RFC,0/2] Throttle I2C transfers to UCD9000 devices

Message ID 20200914122811.3295678-1-andrew@aj.id.au (mailing list archive)

Message

Andrew Jeffery Sept. 14, 2020, 12:28 p.m. UTC
Hello,

While working with system designs making use of TI's UCD90320 Power
Sequencer we've found that communication with the device isn't terribly
reliable.

It appears that back-to-back transfers where commands addressed to the
device are put onto the bus with intervals between STOP and START in the
neighbourhood of 250us or less can cause bad behaviour. This primarily
happens during driver probe while scanning the device to determine its
capabilities.

We have observed the device causing excessive clock stretches and bus
lockups, and also corruption of the device's volatile state (requiring it
to be reset).  The latter is particularly disruptive in that the controlled
rails are brought down either by:

1. The corruption causing a fault condition, or
2. Asserting the device's reset line to recover

A further observation is that pacing transfers to the device appears to
mitigate the bad behaviour. We're in discussion with TI to better
understand the limitations and at least get the behaviour documented.

This short series implements the mitigation in terms of a throttle in the
i2c_client associated with the device's driver. Before the first
communication with the device in the probe() of ucd9000 we configure the
i2c_client to throttle transfers with a minimum of a 1ms delay (with the
delay exposed as a module parameter).
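
The pacing rule described above can be sketched in plain user-space C. This is only an illustration of the arithmetic, not the code from the patches: the struct, its field names and both helper functions are hypothetical, and timestamps are assumed to come from a monotonic microsecond clock.

```c
#include <stdint.h>

/* Hypothetical per-client throttle state; not the fields used in the series. */
struct throttle {
	uint64_t min_gap_us;   /* minimum STOP-to-START interval, e.g. 1000 for 1ms */
	uint64_t last_stop_us; /* time at which the previous transfer finished */
	int primed;            /* 0 until the first transfer has completed */
};

/* Microseconds the caller should still wait before starting a transfer. */
static uint64_t throttle_wait_us(const struct throttle *t, uint64_t now_us)
{
	uint64_t elapsed;

	if (!t->primed)
		return 0; /* nothing has been sent yet, so there is no gap to enforce */

	elapsed = now_us - t->last_stop_us; /* assumes now_us never goes backwards */
	return elapsed >= t->min_gap_us ? 0 : t->min_gap_us - elapsed;
}

/* Record the end of a transfer so the next one can be paced against it. */
static void throttle_mark_stop(struct throttle *t, uint64_t now_us)
{
	t->last_stop_us = now_us;
	t->primed = 1;
}
```

With min_gap_us set to 1000, a transfer that ended at t=100us followed by a request at t=350us would have to wait a further 750us, while a request arriving after t=1100us would go out immediately.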

The series is RFC for several reasons:

The first is to suss out feelings on the general direction. The problem is
pretty unfortunate - are there better ways to implement the mitigation?

If there aren't, then:

I'd like thoughts on whether we want to account for i2c-dev clients.
Implementing throttling in i2c_client feels like a solution-by-proxy as the
throttling is really a property of the targeted device, but we don't have a
coherent representation between platform devices and devices associated
with i2c-dev clients. At the moment we'd have to resort to address-based
lookups for platform data stashed in the transfer functions.

Next is that I've only implemented throttling for SMBus devices. I don't
yet have a use-case for throttling non-SMBus devices so I'm not sure it's
worth poking at it, but would appreciate thoughts there.

Further, I've had a bit of a stab at dealing with atomic transfers that's
not been tested. Hopefully it makes sense.

Finally I'm also interested in feedback on exposing the control in a little
more general manner than having to implement a module parameter in all
drivers that want to take advantage of throttling. This isn't a big problem
at the moment, but if anyone has thoughts there then I'm happy to poke at
those too.

Please review!

Andrew

Andrew Jeffery (2):
  i2c: smbus: Allow throttling of transfers to client devices
  hwmon: (pmbus/ucd9000) Throttle SMBus transfers to avoid poor
    behaviour

 drivers/hwmon/pmbus/ucd9000.c |   6 ++
 drivers/i2c/i2c-core-base.c   |   8 +-
 drivers/i2c/i2c-core-smbus.c  | 149 +++++++++++++++++++++++++++-------
 drivers/i2c/i2c-core.h        |  22 +++++
 include/linux/i2c.h           |   3 +
 5 files changed, 157 insertions(+), 31 deletions(-)

Comments

Andrew Jeffery Sept. 15, 2020, 12:19 a.m. UTC | #1
On Tue, 15 Sep 2020, at 02:13, Guenter Roeck wrote:
> On 9/14/20 5:28 AM, Andrew Jeffery wrote:
> > [...]
> > 
> 
> As mentioned in patch 2/2, I don't think a module parameter is a good idea.
> I think this should be implemented at the driver level, similar to zl6100.c;
> it should be limited to affected devices and not be user controllable.

Yep. I will look at zl6100.c.

> 
> In respect to implementation in the i2c core vs in drivers: So far we
> encountered this problem for some Zilker Labs devices and for some LTC
> devices. While the solution needed here looks similar to the solution
> implemented for Zilker Labs devices, the solution for LTC devices is
> different. I am not sure if an implementation in the i2c core is
> desirable. It looks quite invasive to me, and it won't solve the problem
> for all devices since it isn't always a simple "wait <n> microseconds
> between accesses". For example, some devices may require a wait after
> a write but not after a read, or a wait only after certain commands (such
> as commands writing to an EEPROM). Other devices may require a mechanism
> different to "wait a certain period of time". It seems all but impossible
> to implement a generic mechanism on i2c level.

Yep, that's fair. I went this route to avoid implementing two sets of handlers 
providing the pacing in the driver (for before and after we register with the 
pmbus core), but it is invasive as you point out. Let me look at your suggested 
alternatives and get back to you.

Andrew

Andrew Jeffery Sept. 16, 2020, 5:35 a.m. UTC | #2
On Tue, 15 Sep 2020, at 02:13, Guenter Roeck wrote:
> On 9/14/20 5:28 AM, Andrew Jeffery wrote:
> > [...]
> > 
> 
> As mentioned in patch 2/2, I don't think a module parameter is a good idea.
> I think this should be implemented at the driver level, similar to zl6100.c;
> it should be limited to affected devices and not be user controllable.
> 
> In respect to implementation in the i2c core vs in drivers: So far we
> encountered this problem for some Zilker Labs devices and for some LTC
> devices. While the solution needed here looks similar to the solution
> implemented for Zilker Labs devices, the solution for LTC devices is
> different. I am not sure if an implementation in the i2c core is
> desirable. It looks quite invasive to me, and it won't solve the problem
> for all devices since it isn't always a simple "wait <n> microseconds
> between accesses". For example, some devices may require a wait after
> a write but not after a read, or a wait only after certain commands (such
> as commands writing to an EEPROM). Other devices may require a mechanism
> different to "wait a certain period of time". It seems all but impossible
> to implement a generic mechanism on i2c level.

So I think it could be handled with an optional i2c client callback: e.g.

struct i2c_client {
	...
	/* Optional hook; the delay logic itself stays in the driver */
	bool (*prepare_device)(const struct i2c_client *client);
};

This way the logic to delay is kept inside the driver, catering to both the 
Zilker and the LTC devices. If the problem exists only after specific 
operations then we can stash some state in the client in the same way I've done 
in patch 1, test that state in the callback and only do the "preparation" if 
it's necessary.
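
As a rough user-space illustration of that idea (not kernel code), the sketch below models a client that stashes state about its last operation and a callback that only asks for a delay after a write. Everything here is hypothetical: the struct, the field names and the callback signature, which returns a wait time in microseconds instead of the bool proposed above so that it can be exercised without a real bus.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an i2c_client carrying driver-maintained state (hypothetical). */
struct sketch_client {
	bool last_op_was_write;     /* stashed by the driver after each transfer */
	uint64_t last_op_end_us;    /* when that transfer finished */
	uint64_t post_write_gap_us; /* gap the device needs after a write */
	/* Optional hook: how long to wait before the next transfer may start. */
	uint64_t (*prepare_device)(const struct sketch_client *client,
				   uint64_t now_us);
};

/* Driver-side policy: pace only after writes, never after reads. */
static uint64_t wait_after_write_only(const struct sketch_client *client,
				      uint64_t now_us)
{
	uint64_t elapsed = now_us - client->last_op_end_us;

	if (!client->last_op_was_write)
		return 0;
	return elapsed >= client->post_write_gap_us ?
		0 : client->post_write_gap_us - elapsed;
}

/* Core-side helper: clients without a hook are never delayed. */
static uint64_t core_pre_transfer_wait(const struct sketch_client *client,
				       uint64_t now_us)
{
	return client->prepare_device ?
		client->prepare_device(client, now_us) : 0;
}
```

A device with different requirements, such as a wait only after EEPROM-writing commands, would supply its own callback and stash whatever state that policy needs, leaving the core-side helper unchanged.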

I can knock that up and post another RFC, just so we can get a feel for how 
that solution looks.

Andrew