Message ID | 1465252079-126836-1-git-send-email-eswierk@skyportsystems.com (mailing list archive)
---|---
State | New, archived
Ed Swierk <eswierk@skyportsystems.com> wrote on 06/06/2016 06:27:59 PM:

> The STMicro ST19NP18-TPM sometimes takes much longer to execute
> commands than it reports in its capabilities. For example, command 186
> (TPM_FlushSpecific) has been observed to take 14560 msec to complete,
> far longer than the 750 msec limit for "short" commands reported by
> the chip. The behavior has also been seen with command 101
> (TPM_GetCapability).

Hm, those should be really fast.

> Worse, when the tpm_tis driver attempts to cancel the current command
> (by writing commandReady = 1 to TPM_STS_x), the chip locks up
> completely, returning all-1s from all memory-mapped register
> reads. The lockup can be cleared only by resetting the system.
>
> The occurrence of this excessive command duration depends on the
> sequence of commands preceding it. One sequence is creating at least 2
> new keys via TPM_CreateWrapKey, then letting the TPM idle for at least

How long does it take to create those keys? Maybe it will create new
keys in the 'background' after that.

> 30 seconds, then loading a key via TPM_LoadKey2. The next
> TPM_FlushSpecific occasionally takes tens of seconds to
> complete. Another sequence is creating many keys in a row without
> pause. The TPM_CreateWrapKey operation gets much slower after the
> first few iterations, as one would expect when the pool of precomputed
> keys is exhausted. Then after a 35-second pause, the same TPM_LoadKey2
> followed by TPM_FlushSpecific sequence triggers the behavior.
>
> Our working theory is that this older TPM sometimes pauses to perform
> internal garbage collection, which modern chips implement as a
> background process. Without access to the chip's implementation
> details it's impossible to know whether any commands are immune to
> this behavior. So it seems safest to ignore the chip's reported
> command durations, and use a value much higher than any observed
> duration, like 2 minutes (which happens to be the value used for
> TPM_UNDEFINED commands in tpm_calc_ordinal_duration()).
>
> v2: Minor correction of patch description.
>
> Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>

Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>

------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are
consuming the most bandwidth. Provides multi-vendor support for NetFlow,
J-Flow, sFlow and other flows. Make informed decisions using capacity
planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e
On Mon, Jun 6, 2016 at 6:07 PM, Stefan Berger <stefanb@us.ibm.com> wrote:
> Ed Swierk <eswierk@skyportsystems.com> wrote on 06/06/2016 06:27:59 PM:
> > The occurrence of this excessive command duration depends on the
> > sequence of commands preceding it. One sequence is creating at least 2
> > new keys via TPM_CreateWrapKey, then letting the TPM idle for at least
>
> How long does it take to create those keys? Maybe it will create new
> keys in the 'background' after that.

The first few TPM_CreateWrapKey commands take roughly 300 msec. I've
seen it go as high as 80 seconds after several of those in a row.

It makes sense that a key generation process starts up after the chip
thinks it's idle. Slowing down unrelated operations stretches the
meaning of "background", of course. And locking up the chip is
downright impolite.

> > 30 seconds, then loading a key via TPM_LoadKey2. The next
> > TPM_FlushSpecific occasionally takes tens of seconds to
> > complete. Another sequence is creating many keys in a row without
> > pause. The TPM_CreateWrapKey operation gets much slower after the
> > first few iterations, as one would expect when the pool of precomputed
> > keys is exhausted. Then after a 35-second pause, the same TPM_LoadKey2
> > followed by TPM_FlushSpecific sequence triggers the behavior.
> >
> > Our working theory is that this older TPM sometimes pauses to perform
> > internal garbage collection, which modern chips implement as a
> > background process. Without access to the chip's implementation
> > details it's impossible to know whether any commands are immune to
> > this behavior. So it seems safest to ignore the chip's reported
> > command durations, and use a value much higher than any observed
> > duration, like 2 minutes (which happens to be the value used for
> > TPM_UNDEFINED commands in tpm_calc_ordinal_duration()).

On further testing of my patch, I realize that I have totally confused
timeouts and durations. I need to override the command durations
(short, medium, long) reported by the chip. The reported protocol
timeouts (a, b, c, d) are fine.

Please consider this patch withdrawn. I'll send an updated patch shortly.

--Ed
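Ed's correction turns on two different kinds of limits in the TIS driver: timeouts a-d bound low-level interface handshakes, while per-ordinal durations (short, medium, long) bound how long a whole command may run. A minimal sketch of the duration side, with illustrative values rather than the kernel's actual constants and names, might look like this; the fallback mirrors the idea behind tpm_calc_ordinal_duration(), where an unknown ordinal gets a generous 2-minute cap:

```c
#include <assert.h>

/* Duration classes as discussed in the thread. Names and non-default
 * values here are illustrative, not copied from the kernel. */
enum tpm_duration_class {
	TPM_DUR_SHORT,		/* e.g. the 750 msec limit the chip reports */
	TPM_DUR_MEDIUM,
	TPM_DUR_LONG,
	TPM_DUR_UNDEFINED	/* ordinal not in the driver's table */
};

/* Per-ordinal command duration in microseconds. Unknown ordinals fall
 * back to 2 minutes, the value Ed proposes applying across the board. */
static unsigned long duration_usec(enum tpm_duration_class c)
{
	switch (c) {
	case TPM_DUR_SHORT:
		return 750UL * 1000;		/* 750 msec */
	case TPM_DUR_MEDIUM:
		return 20UL * 1000 * 1000;	/* illustrative: 20 sec */
	case TPM_DUR_LONG:
		return 60UL * 1000 * 1000;	/* illustrative: 60 sec */
	default:
		return 120UL * 1000 * 1000;	/* 2 minutes */
	}
}
```

The protocol timeouts (a, b, c, d) are a separate, per-handshake set of values; conflating the two is exactly the confusion that led to the patch being withdrawn.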
On Mon, Jun 06, 2016 at 03:27:59PM -0700, Ed Swierk wrote:
> The STMicro ST19NP18-TPM sometimes takes much longer to execute
> commands than it reports in its capabilities. For example, command 186
> (TPM_FlushSpecific) has been observed to take 14560 msec to complete,
> far longer than the 750 msec limit for "short" commands reported by
> the chip. The behavior has also been seen with command 101
> (TPM_GetCapability).
>
> Worse, when the tpm_tis driver attempts to cancel the current command
> (by writing commandReady = 1 to TPM_STS_x), the chip locks up
> completely, returning all-1s from all memory-mapped register
> reads. The lockup can be cleared only by resetting the system.

Does this also happen when a command doesn't take enormously long? I'm
just trying to understand whether this contains one or two issues. I
don't know how hard this would be to test in practice with commands
that last 750 ms (maybe with a quirk to the kernel code).

> The occurrence of this excessive command duration depends on the
> sequence of commands preceding it. One sequence is creating at least 2
> new keys via TPM_CreateWrapKey, then letting the TPM idle for at least
> 30 seconds, then loading a key via TPM_LoadKey2. The next
> TPM_FlushSpecific occasionally takes tens of seconds to
> complete. Another sequence is creating many keys in a row without
> pause. The TPM_CreateWrapKey operation gets much slower after the
> first few iterations, as one would expect when the pool of precomputed
> keys is exhausted. Then after a 35-second pause, the same TPM_LoadKey2
> followed by TPM_FlushSpecific sequence triggers the behavior.
>
> Our working theory is that this older TPM sometimes pauses to perform
> internal garbage collection, which modern chips implement as a
> background process. Without access to the chip's implementation
> details it's impossible to know whether any commands are immune to
> this behavior. So it seems safest to ignore the chip's reported
> command durations, and use a value much higher than any observed
> duration, like 2 minutes (which happens to be the value used for
> TPM_UNDEFINED commands in tpm_calc_ordinal_duration()).
>
> v2: Minor correction of patch description.
>
> Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>

Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

/Jarkko
On Mon, Jun 06, 2016 at 06:48:10PM -0700, Ed Swierk wrote:
> On Mon, Jun 6, 2016 at 6:07 PM, Stefan Berger <stefanb@us.ibm.com> wrote:
> > Ed Swierk <eswierk@skyportsystems.com> wrote on 06/06/2016 06:27:59 PM:
> > > The occurrence of this excessive command duration depends on the
> > > sequence of commands preceding it. One sequence is creating at least 2
> > > new keys via TPM_CreateWrapKey, then letting the TPM idle for at least
> >
> > How long does it take to create those keys? Maybe it will create new
> > keys in the 'background' after that.
>
> The first few TPM_CreateWrapKey commands take roughly 300 msec. I've
> seen it go as high as 80 seconds after several of those in a row.
>
> It makes sense that a key generation process starts up after the chip
> thinks it's idle. Slowing down unrelated operations stretches the
> meaning of "background", of course. And locking up the chip is
> downright impolite.
>
> > > 30 seconds, then loading a key via TPM_LoadKey2. The next
> > > TPM_FlushSpecific occasionally takes tens of seconds to
> > > complete. Another sequence is creating many keys in a row without
> > > pause. The TPM_CreateWrapKey operation gets much slower after the
> > > first few iterations, as one would expect when the pool of precomputed
> > > keys is exhausted. Then after a 35-second pause, the same TPM_LoadKey2
> > > followed by TPM_FlushSpecific sequence triggers the behavior.
> > >
> > > Our working theory is that this older TPM sometimes pauses to perform
> > > internal garbage collection, which modern chips implement as a
> > > background process. Without access to the chip's implementation
> > > details it's impossible to know whether any commands are immune to
> > > this behavior. So it seems safest to ignore the chip's reported
> > > command durations, and use a value much higher than any observed
> > > duration, like 2 minutes (which happens to be the value used for
> > > TPM_UNDEFINED commands in tpm_calc_ordinal_duration()).
>
> On further testing of my patch, I realize that I have totally confused
> timeouts and durations. I need to override the command durations
> (short, medium, long) reported by the chip. The reported protocol
> timeouts (a, b, c, d) are fine.
>
> Please consider this patch withdrawn. I'll send an updated patch shortly.

Ack.

> --Ed

/Jarkko
diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index 088fa86..3070be2 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ b/drivers/char/tpm/tpm_tis.c
@@ -484,6 +484,9 @@ static const struct tis_vendor_timeout_override vendor_timeout_overrides[] = {
 	/* Atmel 3204 */
 	{ 0x32041114, { (TIS_SHORT_TIMEOUT*1000), (TIS_LONG_TIMEOUT*1000),
 			(TIS_SHORT_TIMEOUT*1000), (TIS_SHORT_TIMEOUT*1000) } },
+	/* STMicro ST19NP18-TPM */
+	{ 0x0000104a, { (120*1000*1000), (120*1000*1000),
+			(120*1000*1000), (120*1000*1000) } },
 };
 
 static bool tpm_tis_update_timeouts(struct tpm_chip *chip,
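For readers unfamiliar with the table the diff extends: each entry pairs a chip's DID/VID value with replacement values for the four timeouts. A hedged sketch of how such a table could be looked up is below; the struct layout and lookup function are modeled on the shape visible in the diff, not copied from the kernel source:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout, following the shape of the entries in the diff. */
struct tis_vendor_timeout_override {
	unsigned int did_vid;		/* chip device/vendor ID */
	unsigned long timeout_us[4];	/* timeouts a, b, c, d in usec */
};

static const struct tis_vendor_timeout_override overrides[] = {
	/* STMicro ST19NP18-TPM: 120 s for all four timeouts */
	{ 0x0000104a, { 120UL * 1000 * 1000, 120UL * 1000 * 1000,
			120UL * 1000 * 1000, 120UL * 1000 * 1000 } },
};

/* Return the override entry matching a chip's DID/VID, or NULL if the
 * chip has no quirk and the reported values should be used as-is. */
static const struct tis_vendor_timeout_override *
find_override(unsigned int did_vid)
{
	size_t i;

	for (i = 0; i < sizeof(overrides) / sizeof(overrides[0]); i++)
		if (overrides[i].did_vid == did_vid)
			return &overrides[i];
	return NULL;
}
```

Note that this table overrides the protocol timeouts a-d; as Ed points out later in the thread, the ST19NP18 problem is actually with command durations, which is why this version of the patch was withdrawn.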
The STMicro ST19NP18-TPM sometimes takes much longer to execute
commands than it reports in its capabilities. For example, command 186
(TPM_FlushSpecific) has been observed to take 14560 msec to complete,
far longer than the 750 msec limit for "short" commands reported by
the chip. The behavior has also been seen with command 101
(TPM_GetCapability).

Worse, when the tpm_tis driver attempts to cancel the current command
(by writing commandReady = 1 to TPM_STS_x), the chip locks up
completely, returning all-1s from all memory-mapped register
reads. The lockup can be cleared only by resetting the system.

The occurrence of this excessive command duration depends on the
sequence of commands preceding it. One sequence is creating at least 2
new keys via TPM_CreateWrapKey, then letting the TPM idle for at least
30 seconds, then loading a key via TPM_LoadKey2. The next
TPM_FlushSpecific occasionally takes tens of seconds to
complete. Another sequence is creating many keys in a row without
pause. The TPM_CreateWrapKey operation gets much slower after the
first few iterations, as one would expect when the pool of precomputed
keys is exhausted. Then after a 35-second pause, the same TPM_LoadKey2
followed by TPM_FlushSpecific sequence triggers the behavior.

Our working theory is that this older TPM sometimes pauses to perform
internal garbage collection, which modern chips implement as a
background process. Without access to the chip's implementation
details it's impossible to know whether any commands are immune to
this behavior. So it seems safest to ignore the chip's reported
command durations, and use a value much higher than any observed
duration, like 2 minutes (which happens to be the value used for
TPM_UNDEFINED commands in tpm_calc_ordinal_duration()).

v2: Minor correction of patch description.

Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
---
 drivers/char/tpm/tpm_tis.c | 3 +++
 1 file changed, 3 insertions(+)
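As a quick sanity check on the constant used throughout the thread, the 2-minute cap converts to the microsecond literals in the diff as follows (trivial arithmetic, but it confirms that (120*1000*1000) really is 2 minutes):

```c
#include <assert.h>

/* 2 minutes expressed in microseconds, matching the values added to
 * the override table: 120 s * 1000 ms/s * 1000 us/ms. */
static unsigned long two_minutes_usec(void)
{
	return 120UL * 1000 * 1000;
}
```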