Message ID: 20171208184658.9588-9-Alexander.Steffen@infineon.com (mailing list archive)
State: Rejected
Hi Alexander,

On Fri, 2017-12-08 at 19:46 +0100, Alexander Steffen wrote:
> There do not seem to be any real limits on the amount/duration of wait
> states that the TPM is allowed to generate. So at least give the TPM some
> time to clean up the situation that caused the wait states instead of
> retrying the transfers as fast as possible.

Without this patch, the TPM performance on the pi is amazing! A
thousand extends takes ~6.5 seconds. Unfortunately, the same thousand
extends with this patch takes > 2+ minutes. TPM_TIMEOUT (5 msecs) is
a really long time. Why so long?

Mimi

> Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
> ---
>  drivers/char/tpm/tpm_tis_spi.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/char/tpm/tpm_tis_spi.c b/drivers/char/tpm/tpm_tis_spi.c
> index 2cc6aa9..e53c9c3 100644
> --- a/drivers/char/tpm/tpm_tis_spi.c
> +++ b/drivers/char/tpm/tpm_tis_spi.c
> @@ -88,7 +88,7 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
>
>  	if ((phy->iobuf[3] & 0x01) == 0) {
>  		// handle SPI wait states
> -		for (i = 0; i < TPM_RETRY; i++) {
> +		for (i = 1; i <= TPM_RETRY; i++) {
>  			phy->iobuf[0] = addr;
>  			spi_xfer.len = 1;
>  			spi_message_init(&m);
> @@ -98,9 +98,10 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
>  				goto exit;
>  			if (phy->iobuf[0] & 0x01)
>  				break;
> +			tpm_msleep(i * TPM_TIMEOUT);
>  		}
>
> -		if (i == TPM_RETRY) {
> +		if (i > TPM_RETRY) {
>  			spi_xfer.tx_buf = NULL;
>  			spi_xfer.rx_buf = NULL;
>  			spi_xfer.len = 0;
On 11.01.2018 20:46, Mimi Zohar wrote:
> Hi Alexander,
>
> On Fri, 2017-12-08 at 19:46 +0100, Alexander Steffen wrote:
>> There do not seem to be any real limits on the amount/duration of wait
>> states that the TPM is allowed to generate. So at least give the TPM some
>> time to clean up the situation that caused the wait states instead of
>> retrying the transfers as fast as possible.
>
> Without this patch, the TPM performance on the pi is amazing! A
> thousand extends takes ~6.5 seconds. Unfortunately, the same thousand
> extends with this patch takes > 2+ minutes. TPM_TIMEOUT (5 msecs) is
> a really long time. Why so long?

My problem is the lack of specification for the wait state
functionality. How many wait states may be signaled by the TPM? For how
long may the TPM signal wait states? Do you know any more details?

The current implementation is based on the assumption that wait states
are the exception rather than the rule, so that longer delays are
acceptable. And without knowing for how long wait states can happen, I
chose rather longer delays (with this patch the TPM has several seconds
to clear wait states) to avoid failing on some unknown TPM
implementation in some special cases.

What would be a better implementation? No delay and simply retry for
five seconds?

What TPMs are you using for your tests? At least for the TPMs that I
have available for my tests (with a FIFO size of ~256 bytes) I would not
expect any wait states for extend commands.

Alexander

>> Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
>> ---
>>  drivers/char/tpm/tpm_tis_spi.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/char/tpm/tpm_tis_spi.c b/drivers/char/tpm/tpm_tis_spi.c
>> index 2cc6aa9..e53c9c3 100644
>> --- a/drivers/char/tpm/tpm_tis_spi.c
>> +++ b/drivers/char/tpm/tpm_tis_spi.c
>> @@ -88,7 +88,7 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
>>
>>  	if ((phy->iobuf[3] & 0x01) == 0) {
>>  		// handle SPI wait states
>> -		for (i = 0; i < TPM_RETRY; i++) {
>> +		for (i = 1; i <= TPM_RETRY; i++) {
>>  			phy->iobuf[0] = addr;
>>  			spi_xfer.len = 1;
>>  			spi_message_init(&m);
>> @@ -98,9 +98,10 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
>>  				goto exit;
>>  			if (phy->iobuf[0] & 0x01)
>>  				break;
>> +			tpm_msleep(i * TPM_TIMEOUT);
>>  		}
>>
>> -		if (i == TPM_RETRY) {
>> +		if (i > TPM_RETRY) {
>>  			spi_xfer.tx_buf = NULL;
>>  			spi_xfer.rx_buf = NULL;
>>  			spi_xfer.len = 0;
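[Worst case implied by the escalating sleep in this patch, taking TPM_TIMEOUT as 5 msec (quoted above) and TPM_RETRY as 50 (the value quoted further down the thread); this is an illustrative calculation, not a figure stated by either poster:

    total sleep = 5 ms * (1 + 2 + ... + 50)
                = 5 ms * 1275
                ≈ 6.4 s

i.e. a transfer is given roughly six seconds of accumulated sleeping before it is aborted, which is the "several seconds to clear wait states" referred to above.]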
On Fri, 2018-01-12 at 09:28 +0100, Alexander Steffen wrote:
> On 11.01.2018 20:46, Mimi Zohar wrote:
> > Hi Alexander,
> >
> > On Fri, 2017-12-08 at 19:46 +0100, Alexander Steffen wrote:
> >> There do not seem to be any real limits on the amount/duration of wait
> >> states that the TPM is allowed to generate. So at least give the TPM some
> >> time to clean up the situation that caused the wait states instead of
> >> retrying the transfers as fast as possible.
> >
> > Without this patch, the TPM performance on the pi is amazing! A
> > thousand extends takes ~6.5 seconds. Unfortunately, the same thousand
> > extends with this patch takes > 2+ minutes. TPM_TIMEOUT (5 msecs) is
> > a really long time. Why so long?
>
> My problem is the lack of specification for the wait state
> functionality. How many wait states may be signaled by the TPM? For how
> long may the TPM signal wait states? Do you know any more details?

First, let me apologize for not thanking you for making these changes
in the first place. With just the burstcount patch, but without your
wait state patches, the pi doesn't boot. I now have a test environment
and can reproduce the problem. :)

With your wait state patches, but without this patch, the time is ~14
seconds per thousand extends, as opposed to the ~6.5 I reported above.
[The ~6.5 seconds is for already measured files. Files that have
already been measured are not re-measured, so the TPM is not extended
again. On other systems, "re-measuring" 1000 files takes less than
1 sec. I'm not sure why it is taking so long on the pi.]

Without any sleeps, I'm seeing that when it goes into a wait state, it
mostly loops once, every so often 3 times, but printing the counter
introduces delays and probably skews the results. Getting more accurate
results would require aggregating them and only printing them after N
number of extends. I'll try to get that information for you next week.

> The current implementation is based on the assumption that wait states
> are the exception rather than the rule, so that longer delays are
> acceptable. And without knowing for how long wait states can happen, I
> chose rather longer delays (with this patch the TPM has several seconds
> to clear wait states) to avoid failing on some unknown TPM
> implementation in some special cases.
>
> What would be a better implementation? No delay and simply retry for
> five seconds?

Perhaps use TPM_POLL_SLEEP (1 msec), which Nayna introduced to sleep
within a max time loop. Instead of using the number of retries, which
is currently defined as 50, define a max end time. You could still
increment the timeout, but instead of "i * TPM_TIMEOUT" use
"i * TPM_POLL_SLEEP".

Although usleep_range is a lot better than msleep(), there is still
some overhead. With tpm_msleep(), we don't have a way of sleeping for
less than 1 msec. Perhaps the first time, when "i == 0", try again
without sleeping.

> What TPMs are you using for your tests? At least for the TPMs that I
> have available for my tests (with a FIFO size of ~256 bytes) I would not
> expect any wait states for extend commands.

I'm also surprised. The TPM on the pi isn't on a real SPI bus. The test
is a tight loop, which forces N number of extends. The pi is at work,
but I'm working from home today. :( I'll have to get this info next
week.

Have you tried booting the pi with the "ima_tcb" boot command line
option? Do you have any performance results that you could share?

thanks,

Mimi
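[A minimal sketch of the approach suggested here: bound the wait-state handling by a deadline instead of a retry count, and do the first retry without sleeping. This is illustrative only; the 5-second budget is an assumed value, TPM_POLL_SLEEP is the 1 msec constant referred to above (it may need to be made visible to tpm_tis_spi.c), and the SPI sync call falls in the elided diff context, so treat spi_sync_locked() as an assumption about the surrounding driver code.

	/* Hypothetical replacement for the retry-counted loop: poll the
	 * wait-state bit until a deadline expires, sleeping only from
	 * the second retry onwards. */
	unsigned long deadline = jiffies + msecs_to_jiffies(5000);	/* assumed budget */
	unsigned int attempt = 0;

	do {
		phy->iobuf[0] = addr;
		spi_xfer.len = 1;
		spi_message_init(&m);
		spi_message_add_tail(&spi_xfer, &m);
		ret = spi_sync_locked(phy->spi_device, &m);
		if (ret < 0)
			goto exit;
		if (phy->iobuf[0] & 0x01)	/* wait state cleared */
			break;
		if (attempt)			/* first retry without sleeping */
			tpm_msleep(attempt * TPM_POLL_SLEEP);
		attempt++;
	} while (time_before(jiffies, deadline));

	/* Afterwards, !(phy->iobuf[0] & 0x01) takes the place of the
	 * (i > TPM_RETRY) check for aborting the transfer. */

Because the budget is expressed in time, the worst-case stall stays the same no matter how many polls fit into it, which is what both Mimi and Jarkko ask for further down the thread.]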
> What would be a better implementation? No delay and simply retry for
> five seconds?
>
> What TPMs are you using for your tests? At least for the TPMs that I
> have available for my tests (with a FIFO size of ~256 bytes) I would not
> expect any wait states for extend commands.

The TPM burstcount is 32. Unfortunately, with this patch set the
performance on the pi is worse than without it. Without this patch set
we're seeing ~14 secs for a thousand measurements vs. either ~38s or
~23s, depending on the wait sleep length introduced in patch 8/9. This
testing was done on a pi with an emulated SPI bus.

On systems with a large burstcount, there is no difference, but on
systems with a small burstcount, we're seeing an improvement of ~39%
(~14s base time, ~8.5 with patches).

Does anyone have a system with a TPM on a real SPI bus and is willing
to test this patch set?

thanks,

Mimi
On Mon, 2018-01-15 at 17:30 -0500, Mimi Zohar wrote:
> > What would be a better implementation? No delay and simply retry for
> > five seconds?
> >
> > What TPMs are you using for your tests? At least for the TPMs that I
> > have available for my tests (with a FIFO size of ~256 bytes) I would not
> > expect any wait states for extend commands.
>
> The TPM burstcount is 32. Unfortunately, with this patch set the
> performance on the pi is worse than without it. Without this patch
> set we're seeing ~14 secs for a thousand measurements vs. either ~38s
> or ~23s, depending on the wait sleep length introduced in patch 8/9.

From my response, it might not have been clear that with all of the
patches, except this one 8/9 calling tpm_msleep(), the performance is
equivalent to without any of the patches. With this patch set, but with
or without the call to tpm_msleep(), it loops 2 or 3 times.

Mimi
On 17.01.2018 18:15, Mimi Zohar wrote:
> On Mon, 2018-01-15 at 17:30 -0500, Mimi Zohar wrote:
>>> What would be a better implementation? No delay and simply retry for
>>> five seconds?
>>>
>>> What TPMs are you using for your tests? At least for the TPMs that I
>>> have available for my tests (with a FIFO size of ~256 bytes) I would not
>>> expect any wait states for extend commands.
>>
>> The TPM burstcount is 32. Unfortunately, with this patch set the
>> performance on the pi is worse than without it. Without this patch
>> set we're seeing ~14 secs for a thousand measurements vs. either ~38s
>> or ~23s, depending on the wait sleep length introduced in patch 8/9.
>
> From my response, it might not have been clear that with all of the
> patches, except this one 8/9 calling tpm_msleep(), the performance is
> equivalent to without any of the patches. With this patch set, but
> with or without the call to tpm_msleep(), it loops 2 or 3 times.

Thanks for your tests. I haven't yet done such extensive performance
tests for those changes. But I've got some performance tests that I
still want to run to see what performance impact I can measure with my
TPMs.

It seems clear to me that for optimum performance there shouldn't be
any sleeps, at least not for the first few retries. If I use a time
limit instead of counting the number of retries, are sleeps necessary
at all? The SPI bus is locked anyway, so nothing else can use it while
we are sleeping. The SPI transfers should slow us down sufficiently, so
that we are not consuming 100% of CPU time. And in the common cases (at
least those that we know) the wait states seem to clear up pretty
quickly.

If there are no objections, I'll rewrite the patch in such a way (time
limit of several seconds, no sleeps). Maybe it is also worth trying to
see what performance can be gained by removing more sleeps, in
wait_for_tpm_stat and elsewhere. Perhaps the first millisecond(s) or so
should always be busy waiting, and sleeps should only come afterwards.

Alexander
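[A rough sketch of the rewrite proposed here (time limit of several seconds, no sleeps), again for illustration only: the limit value and variable names are assumptions, and the loop body reuses the same assumed transfer setup as the earlier sketch rather than quoting the actual driver.

	/* Hypothetical time-limited variant with no sleeps: the
	 * back-to-back one-byte SPI transfers provide the pacing, so
	 * the CPU is not spinning unthrottled while the bus is locked. */
	unsigned long limit = jiffies + msecs_to_jiffies(5000);	/* assumed limit */

	do {
		phy->iobuf[0] = addr;
		spi_xfer.len = 1;
		spi_message_init(&m);
		spi_message_add_tail(&spi_xfer, &m);
		ret = spi_sync_locked(phy->spi_device, &m);
		if (ret < 0)
			goto exit;
	} while (!(phy->iobuf[0] & 0x01) && time_before(jiffies, limit));

	/* If the wait state is still asserted here, abort the transfer
	 * just as the retry-exhausted branch does in the current patch. */
]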
On Fri, Dec 08, 2017 at 07:46:57PM +0100, Alexander Steffen wrote:
> There do not seem to be any real limits on the amount/duration of wait
> states that the TPM is allowed to generate. So at least give the TPM some
> time to clean up the situation that caused the wait states instead of
> retrying the transfers as fast as possible.
>
> Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>

I agree with Mimi's comment that this change should be based on time
rather than retries.

/Jarkko
diff --git a/drivers/char/tpm/tpm_tis_spi.c b/drivers/char/tpm/tpm_tis_spi.c
index 2cc6aa9..e53c9c3 100644
--- a/drivers/char/tpm/tpm_tis_spi.c
+++ b/drivers/char/tpm/tpm_tis_spi.c
@@ -88,7 +88,7 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,

 	if ((phy->iobuf[3] & 0x01) == 0) {
 		// handle SPI wait states
-		for (i = 0; i < TPM_RETRY; i++) {
+		for (i = 1; i <= TPM_RETRY; i++) {
 			phy->iobuf[0] = addr;
 			spi_xfer.len = 1;
 			spi_message_init(&m);
@@ -98,9 +98,10 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
 				goto exit;
 			if (phy->iobuf[0] & 0x01)
 				break;
+			tpm_msleep(i * TPM_TIMEOUT);
 		}

-		if (i == TPM_RETRY) {
+		if (i > TPM_RETRY) {
 			spi_xfer.tx_buf = NULL;
 			spi_xfer.rx_buf = NULL;
 			spi_xfer.len = 0;
There do not seem to be any real limits on the amount/duration of wait
states that the TPM is allowed to generate. So at least give the TPM some
time to clean up the situation that caused the wait states instead of
retrying the transfers as fast as possible.

Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
---
 drivers/char/tpm/tpm_tis_spi.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)