Message ID | 1517498648.3145.4.camel@HansenPartnership.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Feb 01, 2018 at 03:24:08PM +0000, James Bottomley wrote: > OK, I investigated but now my TPM has returned to normal (as in it > passes the selftest immediately). There's clearly something that makes > it return TPM_RC_TESTING to every self test probe for seconds at a > time, but I don't know what it is. Sending a different command seems > to cause the problem to clear (Managed to reproduce once with the patch > to verify), so this is my proposed fix. It's clearly nonsensical to > detach the driver because the self test still returns TPM_RC_TESTING, > so convert that return to a TPM_RC_SUCCESS on timeout. It prints a > warning message so we'll see it in the logs if it causes problems. > Given that this seems to be some type of internal TPM issue, I don't > believe changing the timings would work. But if a selftest returns TPM2_RC_TESTING I would expect the next command to also fail with a testing in progress code? At least by the spec.. The point of invoking selftest is to get to a state where future TPM commands will succeed, so returning immediately on RC_TESTING seems wrong? Jason
On Thu, 2018-02-01 at 10:40 -0700, Jason Gunthorpe wrote: > On Thu, Feb 01, 2018 at 03:24:08PM +0000, James Bottomley wrote: > > > > > OK, I investigated but now my TPM has returned to normal (as in it > > passes the selftest immediately). There's clearly something that > > makes it return TPM_RC_TESTING to every self test probe for seconds > > at a time, but I don't know what it is. Sending a different > > command seems to cause the problem to clear (Managed to reproduce > > once with the patch to verify), so this is my proposed fix. It's > > clearly nonsensical to detach the driver because the self test > > still returns TPM_RC_TESTING, so convert that return to a > > TPM_RC_SUCCESS on timeout. It prints a warning message so we'll > > see it in the logs if it causes problems. Given that this seems to > > be some type of internal TPM issue, I don't believe changing the > > timings would work. > > But if a selftest returns TPM2_RC_TESTING I would expect the next > command to also fail with a testing in progress code? At least by the > spec.. No, the spec says only that the command may fail with TPM_RC_TESTING *if* the system it requires is under test. I have no idea what's up with my TPM. The selftests usually complete in under 1ms and return TPM_RC_SUCCESS. However occasionally the self test can return TPM_RC_TESTING for seconds. I've already booted with the patch and verified it doesn't induce any failures for me. Either the long running test is in a completely unconnected subsystem or invoking a different command forces the TPM to clear the testing state. > The point of invoking selftest is to get to a state where future TPM > commands will succeed, so returning immediately on RC_TESTING seems > wrong? Well it's definitely less wrong than the converse of actually detaching the TPM driver because a self test is apparently still in progress. If you depend on the TPM for a critical function, your entire system is hosed at that point. I honestly don't think we should be waiting for the self test at all. We should kick it off and treat any TPM_RC_TESTING error as -EAGAIN. We're already under fire for slow boot sequences and adding 2s just to wait for the TPM to self test adds to that for no real value. James
On Thu, Feb 01, 2018 at 07:46:04PM +0100, James Bottomley wrote: > I honestly don't think we should be waiting for the self test at all. > We should kick it off and treat any TPM_RC_TESTING error as -EAGAIN. > We're already under fire for slow boot sequences and adding 2s just to > wait for the TPM to self test adds to that for no real value. Arguably the BIOS should have completed the selftest - this stuff generally only exists to support embedded. I don't like the idea of EAGAIN, that just expose all our users to this mess. I would support making transmit_cmd genericly wait and retry if the TPM insists we need to wait for selftest to complete the specific command though. Jason
Dear James, Am 01.02.2018 um 16:24 schrieb James Bottomley: > On Thu, 2018-02-01 at 12:42 +0000, James Bottomley wrote: >> On Thu, 2018-02-01 at 13:21 +0100, Paul Menzel wrote: >>> On 02/01/18 13:16, James Bottomley wrote: >>>> >>>> >>>> Embarrassingly enough, I'm just on my way to do a TPM talk at >>>> FOSDEM. I installed my shiny new 4.15 kernel on the 'plane and >>>> this is what I got after I arrived this morning: >>>> >>>> jejb@jarvis:~> dmesg | grep -i tpm >>>> [ 0.000000] ACPI: TPM2 0x0000000079446CC0 000034 (v03 Tpm2Tabl 00000001 AMI 00000000) >>>> [ 1.598059] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2) >>>> [ 1.608863] tpm tpm0: A TPM error (2314) occurred continue selftest >>>> [ 1.640052] tpm tpm0: A TPM error (2314) occurred continue selftest >>>> [ 1.691215] tpm tpm0: A TPM error (2314) occurred continue selftest >>>> [ 1.782377] tpm tpm0: A TPM error (2314) occurred continue selftest >>>> [ 1.953539] tpm tpm0: A TPM error (2314) occurred continue selftest >>>> [ 2.284701] tpm tpm0: A TPM error (2314) occurred continue selftest >>>> [ 2.935743] tpm tpm0: A TPM error (2314) occurred continue selftest >>>> [ 4.216236] tpm tpm0: TPM self test failed >>>> [ 4.236829] ima: No TPM chip found, activating TPM-bypass! (rc=-19) >>>> >>>> The error is TPM_RC_TESTING, which means it looks like we don't >>>> wait long enough for the selftests to complete. I get this all >>>> the time booting with 4.15. Fortunately I have a 4.13 backup >>>> kernel which is fine (otherwise I'd be a bit hosed since all my >>>> keys now require a TPM). >>>> >>>> I'll debug on the train; my current suspicion is that the >>>> TPM_LONG duration might be a bit short for this chip (A nuvoton >>>> 6xx in a dell XPS-13). >>> >>> Please join the thread [1], where I reported the same problem for >>> the Dell XPS 13 9360. Unfortunately, no solution was found, >>> especially, as I did not use the TPM. Other owners of that system >>> unfortunately didn’t have time to report back if it work for them, >>> so the “conclusion” kind of was, that my TPM was broken, and had to >>> be tested. >> >> OK, I'll try to find a fix. It's clearly a marginal problem since >> I've booted most -rc kernels without issue, so there's some slight >> timing change in 4.15 that triggered it. It could also be a shutdown >> issue. Any NV ram stuff deferred to start up would take a variable >> amount of time. >> >> You'd almost think it's some sort of TPM self protest: the more stuff >> I use it for the more problems it seems to create. I'm definitely >> motivated to fix it because without a TPM I can't actually do much >> with my laptop. > > OK, I investigated but now my TPM has returned to normal (as in it > passes the selftest immediately). There's clearly something that makes > it return TPM_RC_TESTING to every self test probe for seconds at a > time, but I don't know what it is. Sending a different command seems > to cause the problem to clear (Managed to reproduce once with the patch > to verify), so this is my proposed fix. It's clearly nonsensical to > detach the driver because the self test still returns TPM_RC_TESTING, > so convert that return to a TPM_RC_SUCCESS on timeout. It prints a > warning message so we'll see it in the logs if it causes problems. > Given that this seems to be some type of internal TPM issue, I don't > believe changing the timings would work. Maybe Mario can confirm this issue too, now that Linux 4.15 is released. Maybe he also has a way to get the Nuvoton people involved. > --- > > diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c > index f40d20671a78..3e1b062d8888 100644 > --- a/drivers/char/tpm/tpm2-cmd.c > +++ b/drivers/char/tpm/tpm2-cmd.c > @@ -872,6 +872,17 @@ static int tpm2_do_selftest(struct tpm_chip *chip) > /* wait longer the next round */ > delay_msec *= 2; > } > + if (rc == TPM2_RC_TESTING) { > + /* > + * A return of RC_TESTING means the TPM is still > + * running self tests. If one fails it will go into > + * failure mode and return RC_FAILED to every command, > + * so treat a still in testing return as a success > + * rather than causing a driver detach. > + */ > + dev_err(&chip->dev,"TPM: Still in testing mode after %dms, continuing\n", delay_msec); > + rc = TPM2_RC_SUCCESS; > + } > > return rc; > } Alexander replied the following in the other thread. No idea if you read it yet. > The list of "A TPM error (2314) occurred continue selftest" is caused by my > commit 125a2210541079e8e7c69e629ad06cabed788f8c ("tpm: React correctly to > RC_TESTING from TPM 2.0 self tests") [1]. 2314 is TPM_RC_TESTING, so the TPM > tells us that self-tests are still running in the background. This problem was > not visible in previous versions, since it (incorrectly) ignored > TPM_RC_TESTING. Maybe the commit should be reverted for now until this has cleared up for the Dell XPS 13 9360(?) to adhere to Linux’ no regression policy. Kind regards, Paul PS: Alexander will also be at FOSDEM and mentioned your talk [2]. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=125a2210541079e8e7c69e629ad06cabed788f8[2] https://lists.01.org/pipermail/tpm2/2018-January/000486.html
[resend with regressions@ address fixed, sorry] Am 01.02.2018 um 20:16 schrieb Paul Menzel: > Dear James, > > > Am 01.02.2018 um 16:24 schrieb James Bottomley: >> On Thu, 2018-02-01 at 12:42 +0000, James Bottomley wrote: >>> On Thu, 2018-02-01 at 13:21 +0100, Paul Menzel wrote: > >>>> On 02/01/18 13:16, James Bottomley wrote: >>>>> >>>>> >>>>> Embarrassingly enough, I'm just on my way to do a TPM talk at >>>>> FOSDEM. I installed my shiny new 4.15 kernel on the 'plane and >>>>> this is what I got after I arrived this morning: >>>>> >>>>> jejb@jarvis:~> dmesg | grep -i tpm >>>>> [ 0.000000] ACPI: TPM2 0x0000000079446CC0 000034 >>>>> (v03 Tpm2Tabl 00000001 AMI 00000000) >>>>> [ 1.598059] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2) >>>>> [ 1.608863] tpm tpm0: A TPM error (2314) occurred continue selftest >>>>> [ 1.640052] tpm tpm0: A TPM error (2314) occurred continue selftest >>>>> [ 1.691215] tpm tpm0: A TPM error (2314) occurred continue selftest >>>>> [ 1.782377] tpm tpm0: A TPM error (2314) occurred continue selftest >>>>> [ 1.953539] tpm tpm0: A TPM error (2314) occurred continue selftest >>>>> [ 2.284701] tpm tpm0: A TPM error (2314) occurred continue selftest >>>>> [ 2.935743] tpm tpm0: A TPM error (2314) occurred continue selftest >>>>> [ 4.216236] tpm tpm0: TPM self test failed >>>>> [ 4.236829] ima: No TPM chip found, activating TPM-bypass! (rc=-19) >>>>> >>>>> The error is TPM_RC_TESTING, which means it looks like we don't >>>>> wait long enough for the selftests to complete. I get this all >>>>> the time booting with 4.15. Fortunately I have a 4.13 backup >>>>> kernel which is fine (otherwise I'd be a bit hosed since all my >>>>> keys now require a TPM). >>>>> >>>>> I'll debug on the train; my current suspicion is that the >>>>> TPM_LONG duration might be a bit short for this chip (A nuvoton >>>>> 6xx in a dell XPS-13). >>>> >>>> Please join the thread [1], where I reported the same problem for >>>> the Dell XPS 13 9360. Unfortunately, no solution was found, >>>> especially, as I did not use the TPM. Other owners of that system >>>> unfortunately didn’t have time to report back if it work for them, >>>> so the “conclusion” kind of was, that my TPM was broken, and had to >>>> be tested. >>> >>> OK, I'll try to find a fix. It's clearly a marginal problem since >>> I've booted most -rc kernels without issue, so there's some slight >>> timing change in 4.15 that triggered it. It could also be a shutdown >>> issue. Any NV ram stuff deferred to start up would take a variable >>> amount of time. >>> >>> You'd almost think it's some sort of TPM self protest: the more stuff >>> I use it for the more problems it seems to create. I'm definitely >>> motivated to fix it because without a TPM I can't actually do much >>> with my laptop. >> >> OK, I investigated but now my TPM has returned to normal (as in it >> passes the selftest immediately). There's clearly something that makes >> it return TPM_RC_TESTING to every self test probe for seconds at a >> time, but I don't know what it is. Sending a different command seems >> to cause the problem to clear (Managed to reproduce once with the patch >> to verify), so this is my proposed fix. It's clearly nonsensical to >> detach the driver because the self test still returns TPM_RC_TESTING, >> so convert that return to a TPM_RC_SUCCESS on timeout. It prints a >> warning message so we'll see it in the logs if it causes problems. >> Given that this seems to be some type of internal TPM issue, I don't >> believe changing the timings would work. > > Maybe Mario can confirm this issue too, now that Linux 4.15 is released. > Maybe he also has a way to get the Nuvoton people involved. > >> --- >> >> diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c >> index f40d20671a78..3e1b062d8888 100644 >> --- a/drivers/char/tpm/tpm2-cmd.c >> +++ b/drivers/char/tpm/tpm2-cmd.c >> @@ -872,6 +872,17 @@ static int tpm2_do_selftest(struct tpm_chip *chip) >> /* wait longer the next round */ >> delay_msec *= 2; >> } >> + if (rc == TPM2_RC_TESTING) { >> + /* >> + * A return of RC_TESTING means the TPM is still >> + * running self tests. If one fails it will go into >> + * failure mode and return RC_FAILED to every command, >> + * so treat a still in testing return as a success >> + * rather than causing a driver detach. >> + */ >> + dev_err(&chip->dev,"TPM: Still in testing mode after %dms, >> continuing\n", delay_msec); >> + rc = TPM2_RC_SUCCESS; >> + } >> return rc; >> } > > Alexander replied the following in the other thread. No idea if you read > it yet. > >> The list of "A TPM error (2314) occurred continue selftest" is caused >> by my commit 125a2210541079e8e7c69e629ad06cabed788f8c ("tpm: React >> correctly to >> RC_TESTING from TPM 2.0 self tests") [1]. 2314 is TPM_RC_TESTING, so >> the TPM >> tells us that self-tests are still running in the background. This >> problem was >> not visible in previous versions, since it (incorrectly) ignored > >> TPM_RC_TESTING. > > Maybe the commit should be reverted for now until this has cleared up > for the Dell XPS 13 9360(?) to adhere to Linux’ no regression policy. > > > Kind regards, > > Paul > > > PS: Alexander will also be at FOSDEM and mentioned your talk [2]. > > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=125a2210541079e8e7c69e629ad06cabed788f8[2] > https://lists.01.org/pipermail/tpm2/2018-January/000486.html
> -----Original Message----- > From: Paul Menzel [mailto:pmenzel@molgen.mpg.de] > Sent: Thursday, February 1, 2018 1:17 PM > To: James Bottomley <James.Bottomley@HansenPartnership.com> > Cc: linux-integrity <linux-integrity@vger.kernel.org>; Limonciello, Mario > <Mario_Limonciello@Dell.com>; regressions@leemhuis.info; Alexander Steffen > <Alexander.Steffen@infineon.com> > Subject: Re: TPM selftest failure in 4.15 (Dell XPS 13, Nuvoton 6xx) > > [resend with regressions@ address fixed, sorry] > > Am 01.02.2018 um 20:16 schrieb Paul Menzel: > > Dear James, > > > > > > Am 01.02.2018 um 16:24 schrieb James Bottomley: > >> On Thu, 2018-02-01 at 12:42 +0000, James Bottomley wrote: > >>> On Thu, 2018-02-01 at 13:21 +0100, Paul Menzel wrote: > > > >>>> On 02/01/18 13:16, James Bottomley wrote: > >>>>> > >>>>> > >>>>> Embarrassingly enough, I'm just on my way to do a TPM talk at > >>>>> FOSDEM. I installed my shiny new 4.15 kernel on the 'plane and > >>>>> this is what I got after I arrived this morning: > >>>>> > >>>>> jejb@jarvis:~> dmesg | grep -i tpm > >>>>> [ 0.000000] ACPI: TPM2 0x0000000079446CC0 000034 > >>>>> (v03 Tpm2Tabl 00000001 AMI 00000000) > >>>>> [ 1.598059] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2) > >>>>> [ 1.608863] tpm tpm0: A TPM error (2314) occurred continue selftest > >>>>> [ 1.640052] tpm tpm0: A TPM error (2314) occurred continue selftest > >>>>> [ 1.691215] tpm tpm0: A TPM error (2314) occurred continue selftest > >>>>> [ 1.782377] tpm tpm0: A TPM error (2314) occurred continue selftest > >>>>> [ 1.953539] tpm tpm0: A TPM error (2314) occurred continue selftest > >>>>> [ 2.284701] tpm tpm0: A TPM error (2314) occurred continue selftest > >>>>> [ 2.935743] tpm tpm0: A TPM error (2314) occurred continue selftest > >>>>> [ 4.216236] tpm tpm0: TPM self test failed > >>>>> [ 4.236829] ima: No TPM chip found, activating TPM-bypass! (rc=-19) > >>>>> > >>>>> The error is TPM_RC_TESTING, which means it looks like we don't > >>>>> wait long enough for the selftests to complete. I get this all > >>>>> the time booting with 4.15. Fortunately I have a 4.13 backup > >>>>> kernel which is fine (otherwise I'd be a bit hosed since all my > >>>>> keys now require a TPM). > >>>>> > >>>>> I'll debug on the train; my current suspicion is that the > >>>>> TPM_LONG duration might be a bit short for this chip (A nuvoton > >>>>> 6xx in a dell XPS-13). > >>>> > >>>> Please join the thread [1], where I reported the same problem for > >>>> the Dell XPS 13 9360. Unfortunately, no solution was found, > >>>> especially, as I did not use the TPM. Other owners of that system > >>>> unfortunately didn’t have time to report back if it work for them, > >>>> so the “conclusion” kind of was, that my TPM was broken, and had to > >>>> be tested. > >>> > >>> OK, I'll try to find a fix. It's clearly a marginal problem since > >>> I've booted most -rc kernels without issue, so there's some slight > >>> timing change in 4.15 that triggered it. It could also be a shutdown > >>> issue. Any NV ram stuff deferred to start up would take a variable > >>> amount of time. > >>> > >>> You'd almost think it's some sort of TPM self protest: the more stuff > >>> I use it for the more problems it seems to create. I'm definitely > >>> motivated to fix it because without a TPM I can't actually do much > >>> with my laptop. > >> > >> OK, I investigated but now my TPM has returned to normal (as in it > >> passes the selftest immediately). There's clearly something that makes > >> it return TPM_RC_TESTING to every self test probe for seconds at a > >> time, but I don't know what it is. Sending a different command seems > >> to cause the problem to clear (Managed to reproduce once with the patch > >> to verify), so this is my proposed fix. It's clearly nonsensical to > >> detach the driver because the self test still returns TPM_RC_TESTING, > >> so convert that return to a TPM_RC_SUCCESS on timeout. It prints a > >> warning message so we'll see it in the logs if it causes problems. > >> Given that this seems to be some type of internal TPM issue, I don't > >> believe changing the timings would work. > > > > Maybe Mario can confirm this issue too, now that Linux 4.15 is released. > > Maybe he also has a way to get the Nuvoton people involved. James, Did you actually experiment with changing the timings? I was told that TPMs that are FIPS validated (such as that in the XPS 13) may take longer for the self tests to run. > > > >> --- > >> > >> diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c > >> index f40d20671a78..3e1b062d8888 100644 > >> --- a/drivers/char/tpm/tpm2-cmd.c > >> +++ b/drivers/char/tpm/tpm2-cmd.c > >> @@ -872,6 +872,17 @@ static int tpm2_do_selftest(struct tpm_chip *chip) > >> /* wait longer the next round */ > >> delay_msec *= 2; > >> } > >> + if (rc == TPM2_RC_TESTING) { > >> + /* > >> + * A return of RC_TESTING means the TPM is still > >> + * running self tests. If one fails it will go into > >> + * failure mode and return RC_FAILED to every command, > >> + * so treat a still in testing return as a success > >> + * rather than causing a driver detach. > >> + */ > >> + dev_err(&chip->dev,"TPM: Still in testing mode after %dms, > >> continuing\n", delay_msec); > >> + rc = TPM2_RC_SUCCESS; > >> + } > >> return rc; > >> } > > > > Alexander replied the following in the other thread. No idea if you read > > it yet. > > > >> The list of "A TPM error (2314) occurred continue selftest" is caused > >> by my commit 125a2210541079e8e7c69e629ad06cabed788f8c ("tpm: React > >> correctly to > >> RC_TESTING from TPM 2.0 self tests") [1]. 2314 is TPM_RC_TESTING, so > >> the TPM > >> tells us that self-tests are still running in the background. This > >> problem was > >> not visible in previous versions, since it (incorrectly) ignored > > >> TPM_RC_TESTING. > > > > Maybe the commit should be reverted for now until this has cleared up > > for the Dell XPS 13 9360(?) to adhere to Linux’ no regression policy. > > > > > > Kind regards, > > > > Paul > > > > > > PS: Alexander will also be at FOSDEM and mentioned your talk [2]. > > > > > > [1] > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=125 > a2210541079e8e7c69e629ad06cabed788f8[2] > > https://lists.01.org/pipermail/tpm2/2018-January/000486.html
> -----Original Message----- > From: Limonciello, Mario > Sent: Thursday, February 1, 2018 2:12 PM > To: 'Paul Menzel' <pmenzel@molgen.mpg.de>; James Bottomley > <James.Bottomley@HansenPartnership.com> > Cc: linux-integrity <linux-integrity@vger.kernel.org>; regressions@leemhuis.info; > Alexander Steffen <Alexander.Steffen@infineon.com> > Subject: RE: TPM selftest failure in 4.15 (Dell XPS 13, Nuvoton 6xx) > > > > > -----Original Message----- > > From: Paul Menzel [mailto:pmenzel@molgen.mpg.de] > > Sent: Thursday, February 1, 2018 1:17 PM > > To: James Bottomley <James.Bottomley@HansenPartnership.com> > > Cc: linux-integrity <linux-integrity@vger.kernel.org>; Limonciello, Mario > > <Mario_Limonciello@Dell.com>; regressions@leemhuis.info; Alexander Steffen > > <Alexander.Steffen@infineon.com> > > Subject: Re: TPM selftest failure in 4.15 (Dell XPS 13, Nuvoton 6xx) > > > > [resend with regressions@ address fixed, sorry] > > > > Am 01.02.2018 um 20:16 schrieb Paul Menzel: > > > Dear James, > > > > > > > > > Am 01.02.2018 um 16:24 schrieb James Bottomley: > > >> On Thu, 2018-02-01 at 12:42 +0000, James Bottomley wrote: > > >>> On Thu, 2018-02-01 at 13:21 +0100, Paul Menzel wrote: > > > > > >>>> On 02/01/18 13:16, James Bottomley wrote: > > >>>>> > > >>>>> > > >>>>> Embarrassingly enough, I'm just on my way to do a TPM talk at > > >>>>> FOSDEM. I installed my shiny new 4.15 kernel on the 'plane and > > >>>>> this is what I got after I arrived this morning: > > >>>>> > > >>>>> jejb@jarvis:~> dmesg | grep -i tpm > > >>>>> [ 0.000000] ACPI: TPM2 0x0000000079446CC0 000034 > > >>>>> (v03 Tpm2Tabl 00000001 AMI 00000000) > > >>>>> [ 1.598059] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2) > > >>>>> [ 1.608863] tpm tpm0: A TPM error (2314) occurred continue selftest > > >>>>> [ 1.640052] tpm tpm0: A TPM error (2314) occurred continue selftest > > >>>>> [ 1.691215] tpm tpm0: A TPM error (2314) occurred continue selftest > > >>>>> [ 1.782377] tpm tpm0: A TPM error (2314) occurred continue selftest > > >>>>> [ 1.953539] tpm tpm0: A TPM error (2314) occurred continue selftest > > >>>>> [ 2.284701] tpm tpm0: A TPM error (2314) occurred continue selftest > > >>>>> [ 2.935743] tpm tpm0: A TPM error (2314) occurred continue selftest > > >>>>> [ 4.216236] tpm tpm0: TPM self test failed > > >>>>> [ 4.236829] ima: No TPM chip found, activating TPM-bypass! (rc=-19) > > >>>>> > > >>>>> The error is TPM_RC_TESTING, which means it looks like we don't > > >>>>> wait long enough for the selftests to complete. I get this all > > >>>>> the time booting with 4.15. Fortunately I have a 4.13 backup > > >>>>> kernel which is fine (otherwise I'd be a bit hosed since all my > > >>>>> keys now require a TPM). > > >>>>> > > >>>>> I'll debug on the train; my current suspicion is that the > > >>>>> TPM_LONG duration might be a bit short for this chip (A nuvoton > > >>>>> 6xx in a dell XPS-13). > > >>>> > > >>>> Please join the thread [1], where I reported the same problem for > > >>>> the Dell XPS 13 9360. Unfortunately, no solution was found, > > >>>> especially, as I did not use the TPM. Other owners of that system > > >>>> unfortunately didn’t have time to report back if it work for them, > > >>>> so the “conclusion” kind of was, that my TPM was broken, and had to > > >>>> be tested. > > >>> > > >>> OK, I'll try to find a fix. It's clearly a marginal problem since > > >>> I've booted most -rc kernels without issue, so there's some slight > > >>> timing change in 4.15 that triggered it. It could also be a shutdown > > >>> issue. Any NV ram stuff deferred to start up would take a variable > > >>> amount of time. > > >>> > > >>> You'd almost think it's some sort of TPM self protest: the more stuff > > >>> I use it for the more problems it seems to create. I'm definitely > > >>> motivated to fix it because without a TPM I can't actually do much > > >>> with my laptop. > > >> > > >> OK, I investigated but now my TPM has returned to normal (as in it > > >> passes the selftest immediately). There's clearly something that makes > > >> it return TPM_RC_TESTING to every self test probe for seconds at a > > >> time, but I don't know what it is. Sending a different command seems > > >> to cause the problem to clear (Managed to reproduce once with the patch > > >> to verify), so this is my proposed fix. It's clearly nonsensical to > > >> detach the driver because the self test still returns TPM_RC_TESTING, > > >> so convert that return to a TPM_RC_SUCCESS on timeout. It prints a > > >> warning message so we'll see it in the logs if it causes problems. > > >> Given that this seems to be some type of internal TPM issue, I don't > > >> believe changing the timings would work. > > > > > > Maybe Mario can confirm this issue too, now that Linux 4.15 is released. > > > Maybe he also has a way to get the Nuvoton people involved. > > James, > > Did you actually experiment with changing the timings? > > I was told that TPMs that are FIPS validated (such as that in the XPS 13) may > take longer for the self tests to run. > > > > > > >> --- > > >> > > >> diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c > > >> index f40d20671a78..3e1b062d8888 100644 > > >> --- a/drivers/char/tpm/tpm2-cmd.c > > >> +++ b/drivers/char/tpm/tpm2-cmd.c > > >> @@ -872,6 +872,17 @@ static int tpm2_do_selftest(struct tpm_chip *chip) > > >> /* wait longer the next round */ > > >> delay_msec *= 2; > > >> } > > >> + if (rc == TPM2_RC_TESTING) { > > >> + /* > > >> + * A return of RC_TESTING means the TPM is still > > >> + * running self tests. If one fails it will go into > > >> + * failure mode and return RC_FAILED to every command, > > >> + * so treat a still in testing return as a success > > >> + * rather than causing a driver detach. > > >> + */ > > >> + dev_err(&chip->dev,"TPM: Still in testing mode after %dms, > > >> continuing\n", delay_msec); > > >> + rc = TPM2_RC_SUCCESS; > > >> + } > > >> return rc; > > >> } > > > I discussed this with some folks and although it would fix the problem it is not accurately characterizing the situation. What is likely happening here is that issuing the self test command in succession is causing the TPM to restart the self test and not complete. Instead the selfTestDone bit should be polled. I feel Paul is right, if a solution can't be brought up to do this instead, this commit 125a2210541079e8e7c69e629ad06cabed788f8c should be reverted. > > > Alexander replied the following in the other thread. No idea if you read > > > it yet. > > > > > >> The list of "A TPM error (2314) occurred continue selftest" is caused > > >> by my commit 125a2210541079e8e7c69e629ad06cabed788f8c ("tpm: React > > >> correctly to > > >> RC_TESTING from TPM 2.0 self tests") [1]. 2314 is TPM_RC_TESTING, so > > >> the TPM > > >> tells us that self-tests are still running in the background. This > > >> problem was > > >> not visible in previous versions, since it (incorrectly) ignored > > > >> TPM_RC_TESTING. > > > > > > Maybe the commit should be reverted for now until this has cleared up > > > for the Dell XPS 13 9360(?) to adhere to Linux’ no regression policy. > > > > > > > > > Kind regards, > > > > > > Paul > > > > > > > > > PS: Alexander will also be at FOSDEM and mentioned your talk [2]. > > > > > > > > > [1] > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=125 > > a2210541079e8e7c69e629ad06cabed788f8[2] > > > https://lists.01.org/pipermail/tpm2/2018-January/000486.html
On Thu, Feb 01, 2018 at 09:06:55PM +0000, Mario.Limonciello@dell.com wrote: > I discussed this with some folks and although it would fix the > problem it is not accurately characterizing the situation. What is > likely happening here is that issuing the self test command in > succession is causing the TPM to restart the self test and not > complete. Instead the selfTestDone bit should be polled. Oh, I wonder if that is beacuse this was copied from the TPM1 version where we don't have a bit like that?? Should we check this selfTestDone bit before event sending the self test? Jsaon
On Thu, 2018-02-01 at 20:12 +0000, Mario.Limonciello@dell.com wrote: > > > > > -----Original Message----- > > From: Paul Menzel [mailto:pmenzel@molgen.mpg.de] > > Sent: Thursday, February 1, 2018 1:17 PM > > To: James Bottomley <James.Bottomley@HansenPartnership.com> > > Cc: linux-integrity <linux-integrity@vger.kernel.org>; Limonciello, > > Mario > > <Mario_Limonciello@Dell.com>; regressions@leemhuis.info; Alexander > > Steffen > > <Alexander.Steffen@infineon.com> > > Subject: Re: TPM selftest failure in 4.15 (Dell XPS 13, Nuvoton > > 6xx) > > > > [resend with regressions@ address fixed, sorry] > > > > Am 01.02.2018 um 20:16 schrieb Paul Menzel: > > > > > > Dear James, > > > > > > > > > Am 01.02.2018 um 16:24 schrieb James Bottomley: > > > > > > > > On Thu, 2018-02-01 at 12:42 +0000, James Bottomley wrote: > > > > > > > > > > On Thu, 2018-02-01 at 13:21 +0100, Paul Menzel wrote: > > > > > > > > > > > > > > > > > > > > > > > > On 02/01/18 13:16, James Bottomley wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Embarrassingly enough, I'm just on my way to do a TPM > > > > > > > talk at > > > > > > > FOSDEM. I installed my shiny new 4.15 kernel on the > > > > > > > 'plane and > > > > > > > this is what I got after I arrived this morning: > > > > > > > > > > > > > > jejb@jarvis:~> dmesg | grep -i tpm > > > > > > > [ 0.000000] ACPI: TPM2 0x0000000079446CC0 000034 > > > > > > > (v03 Tpm2Tabl 00000001 AMI 00000000) > > > > > > > [ 1.598059] tpm_tis MSFT0101:00: 2.0 TPM (device-id > > > > > > > 0xFE, rev-id 2) > > > > > > > [ 1.608863] tpm tpm0: A TPM error (2314) occurred > > > > > > > continue selftest > > > > > > > [ 1.640052] tpm tpm0: A TPM error (2314) occurred > > > > > > > continue selftest > > > > > > > [ 1.691215] tpm tpm0: A TPM error (2314) occurred > > > > > > > continue selftest > > > > > > > [ 1.782377] tpm tpm0: A TPM error (2314) occurred > > > > > > > continue selftest > > > > > > > [ 1.953539] tpm tpm0: A TPM error (2314) occurred > > > > > > > continue selftest > > > > > > > [ 2.284701] tpm tpm0: A TPM error (2314) occurred > > > > > > > continue selftest > > > > > > > [ 2.935743] tpm tpm0: A TPM error (2314) occurred > > > > > > > continue selftest > > > > > > > [ 4.216236] tpm tpm0: TPM self test failed > > > > > > > [ 4.236829] ima: No TPM chip found, activating TPM- > > > > > > > bypass! (rc=-19) > > > > > > > > > > > > > > The error is TPM_RC_TESTING, which means it looks like we > > > > > > > don't wait long enough for the selftests to complete. I > > > > > > > get this all the time booting with 4.15. Fortunately I > > > > > > > have a 4.13 backup kernel which is fine (otherwise I'd be > > > > > > > a bit hosed since all my keys now require a TPM). > > > > > > > > > > > > > > I'll debug on the train; my current suspicion is that the > > > > > > > TPM_LONG duration might be a bit short for this chip (A > > > > > > > nuvoton 6xx in a dell XPS-13). > > > > > > > > > > > > Please join the thread [1], where I reported the same > > > > > > problem for the Dell XPS 13 9360. Unfortunately, no > > > > > > solution was found, especially, as I did not use the TPM. > > > > > > Other owners of that system unfortunately didn’t have time > > > > > > to report back if it work for them, so the “conclusion” > > > > > > kind of was, that my TPM was broken, and had to be tested. > > > > > > > > > > OK, I'll try to find a fix. It's clearly a marginal problem > > > > > since I've booted most -rc kernels without issue, so there's > > > > > some slight timing change in 4.15 that triggered it. It > > > > > could also be a shutdown issue. Any NV ram stuff deferred to > > > > > start up would take a variable amount of time. > > > > > > > > > > You'd almost think it's some sort of TPM self protest: the > > > > > more stuff I use it for the more problems it seems to create. > > > > > I'm definitely motivated to fix it because without a TPM I > > > > > can't actually do much with my laptop. > > > > > > > > OK, I investigated but now my TPM has returned to normal (as in > > > > it passes the selftest immediately). There's clearly something > > > > that makes it return TPM_RC_TESTING to every self test probe > > > > for seconds at a time, but I don't know what it is. Sending a > > > > different command seems to cause the problem to clear (Managed > > > > to reproduce once with the patch to verify), so this is my > > > > proposed fix. It's clearly nonsensical to detach the driver > > > > because the self test still returns TPM_RC_TESTING, > > > > so convert that return to a TPM_RC_SUCCESS on timeout. It > > > > prints a warning message so we'll see it in the logs if it > > > > causes problems. Given that this seems to be some type of > > > > internal TPM issue, I don't believe changing the timings would > > > > work. > > > > > > Maybe Mario can confirm this issue too, now that Linux 4.15 is > > > released. Maybe he also has a way to get the Nuvoton people > > > involved. > > James, > > Did you actually experiment with changing the timings? No, I already said: waiting 2s for a device driver init is already too great a burden on the boot sequence. I don't honestly think waiting longer would help either ... 2s is a huge amount of time so there's something else going on with the TPM. James > I was told that TPMs that are FIPS validated (such as that in the XPS > 13) may take longer for the self tests to run. > > > > > > > > > > > > > > > > > --- > > > > > > > > diff --git a/drivers/char/tpm/tpm2-cmd.c > > > > b/drivers/char/tpm/tpm2-cmd.c > > > > index f40d20671a78..3e1b062d8888 100644 > > > > --- a/drivers/char/tpm/tpm2-cmd.c > > > > +++ b/drivers/char/tpm/tpm2-cmd.c > > > > @@ -872,6 +872,17 @@ static int tpm2_do_selftest(struct > > > > tpm_chip *chip) > > > > /* wait longer the next round */ > > > > delay_msec *= 2; > > > > } > > > > + if (rc == TPM2_RC_TESTING) { > > > > + /* > > > > + * A return of RC_TESTING means the TPM is still > > > > + * running self tests. If one fails it will go into > > > > + * failure mode and return RC_FAILED to every command, > > > > + * so treat a still in testing return as a success > > > > + * rather than causing a driver detach. > > > > + */ > > > > + dev_err(&chip->dev,"TPM: Still in testing mode after > > > > %dms, > > > > continuing\n", delay_msec); > > > > + rc = TPM2_RC_SUCCESS; > > > > + } > > > > return rc; > > > > } > > > > > > Alexander replied the following in the other thread. No idea if > > > you read > > > it yet. > > > > > > > > > > > The list of "A TPM error (2314) occurred continue selftest" is > > > > caused > > > > by my commit 125a2210541079e8e7c69e629ad06cabed788f8c ("tpm: > > > > React > > > > correctly to > > > > RC_TESTING from TPM 2.0 self tests") [1]. 2314 is > > > > TPM_RC_TESTING, so > > > > the TPM > > > > tells us that self-tests are still running in the background. > > > > This > > > > problem was > > > > not visible in previous versions, since it (incorrectly) > > > > ignored > > > > > TPM_RC_TESTING. > > > > > > Maybe the commit should be reverted for now until this has > > > cleared up > > > for the Dell XPS 13 9360(?) to adhere to Linux’ no regression > > > policy. > > > > > > > > > Kind regards, > > > > > > Paul > > > > > > > > > PS: Alexander will also be at FOSDEM and mentioned your talk [2]. > > > > > > > > > [1] > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ > > commit?id=125 > > a2210541079e8e7c69e629ad06cabed788f8[2] > > > > > > https://lists.01.org/pipermail/tpm2/2018-January/000486.html
On Thu, 2018-02-01 at 15:22 -0700, Jason Gunthorpe wrote: > On Thu, Feb 01, 2018 at 09:06:55PM +0000, Mario.Limonciello@dell.com > wrote: > > > > > I discussed this with some folks and although it would fix the > > problem it is not accurately characterizing the situation. What is > > likely happening here is that issuing the self test command in > > succession is causing the TPM to restart the self test and not > > complete. Instead the selfTestDone bit should be polled. > > Oh, I wonder if that is beacuse this was copied from the TPM1 version > where we don't have a bit like that?? > > Should we check this selfTestDone bit before event sending the self > test? The proposed patch already takes care of that by only sending the self test once. I suppose checking the selfTestDone bit and not sending a self test at all could be an additional optimization on top, though. James
On Thu, Feb 01, 2018 at 03:24:08PM +0000, James Bottomley wrote: > On Thu, 2018-02-01 at 12:42 +0000, James Bottomley wrote: > > On Thu, 2018-02-01 at 13:21 +0100, Paul Menzel wrote: > > > > > > Dear James, > > > > > > > > > On 02/01/18 13:16, James Bottomley wrote: > > > > > > > > > > > > Embarrassingly enough, I'm just on my way to do a TPM talk at > > > > FOSDEM. I installed my shiny new 4.15 kernel on the 'plane and > > > > this is what I got after I arrived this morning: > > > > > > > > jejb@jarvis:~> dmesg | grep -i tpm > > > > [ 0.000000] ACPI: TPM2 0x0000000079446CC0 000034 > > > > (v03 Tpm2Tabl 00000001 AMI 00000000) > > > > [ 1.598059] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev- > > > > id > > > > 2) > > > > [ 1.608863] tpm tpm0: A TPM error (2314) occurred continue > > > > selftest > > > > [ 1.640052] tpm tpm0: A TPM error (2314) occurred continue > > > > selftest > > > > [ 1.691215] tpm tpm0: A TPM error (2314) occurred continue > > > > selftest > > > > [ 1.782377] tpm tpm0: A TPM error (2314) occurred continue > > > > selftest > > > > [ 1.953539] tpm tpm0: A TPM error (2314) occurred continue > > > > selftest > > > > [ 2.284701] tpm tpm0: A TPM error (2314) occurred continue > > > > selftest > > > > [ 2.935743] tpm tpm0: A TPM error (2314) occurred continue > > > > selftest > > > > [ 4.216236] tpm tpm0: TPM self test failed > > > > [ 4.236829] ima: No TPM chip found, activating TPM-bypass! > > > > (rc=- > > > > 19) > > > > > > > > The error is TPM_RC_TESTING, which means it looks like we don't > > > > wait long enough for the selftests to complete. I get this all > > > > the time booting with 4.15. Fortunately I have a 4.13 backup > > > > kernel which is fine (otherwise I'd be a bit hosed since all my > > > > keys now require a TPM). > > > > > > > > I'll debug on the train; my current suspicion is that the > > > > TPM_LONG duration might be a bit short for this chip (A nuvoton > > > > 6xx in a dell XPS-13). > > > > > > Please join the thread [1], where I reported the same problem for > > > the Dell XPS 13 9360. Unfortunately, no solution was found, > > > especially, as I did not use the TPM. Other owners of that system > > > unfortunately didn’t have time to report back if it work for them, > > > so the “conclusion” kind of was, that my TPM was broken, and had to > > > be tested. > > > > OK, I'll try to find a fix. It's clearly a marginal problem since > > I've booted most -rc kernels without issue, so there's some slight > > timing change in 4.15 that triggered it. It could also be a shutdown > > issue. Any NV ram stuff deferred to start up would take a variable > > amount of time. > > > > You'd almost think it's some sort of TPM self protest: the more stuff > > I use it for the more problems it seems to create. I'm definitely > > motivated to fix it because without a TPM I can't actually do much > > with my laptop. > > OK, I investigated but now my TPM has returned to normal (as in it > passes the selftest immediately). There's clearly something that makes > it return TPM_RC_TESTING to every self test probe for seconds at a > time, but I don't know what it is. Sending a different command seems > to cause the problem to clear (Managed to reproduce once with the patch > to verify), so this is my proposed fix. It's clearly nonsensical to > detach the driver because the self test still returns TPM_RC_TESTING, > so convert that return to a TPM_RC_SUCCESS on timeout. It prints a > warning message so we'll see it in the logs if it causes problems. > Given that this seems to be some type of internal TPM issue, I don't > believe changing the timings would work. I don't think this is a sane rationale for a fix if the driver has worked just fine on 4.14. It would be better to first identify the commit that causes the regression and plan after that how to fix it. /Jarkko
On Thu, Feb 01, 2018 at 08:16:26PM +0100, Paul Menzel wrote: > > The list of "A TPM error (2314) occurred continue selftest" is caused by > > my commit 125a2210541079e8e7c69e629ad06cabed788f8c ("tpm: React > > correctly to > > RC_TESTING from TPM 2.0 self tests") [1]. 2314 is TPM_RC_TESTING, so the TPM > > tells us that self-tests are still running in the background. This problem was > > not visible in previous versions, since it (incorrectly) ignored > TPM_RC_TESTING. > > Maybe the commit should be reverted for now until this has cleared up for > the Dell XPS 13 9360(?) to adhere to Linux’ no regression policy. I think this should be done as an immediate action with Cc to stable. /Jarkko
On 2/1/2018 3:12 PM, Mario.Limonciello@dell.com wrote: > > I was told that TPMs that are FIPS validated (such as that in the XPS 13) may > take longer for the self tests to run. I don't understand why the SeftTest command should take longer. I understand that a FIPS TPM must do all the self tests before it returns any results, while a non-FIPS must only test the algorithms required to return the result. So, some other command will take longer. ~~ From the TPM spec: FIPS 140-2 requires that all power-on self-tests be complete before the TPM returns any value that depends on the results of a testable function. If compliance with FIPS 140-2 is required, any command that requires use of an untested function causes the TPM to operate as if TPM2_SelfTest(fullTest = NO) was received. The TPM returns TPM_RC_TESTING and continues to return TPM_RC_TESTING until all tests are complete. Alternatively, it may complete all tests and then complete the command. It may also return TPM_RC_NEEDS_TEST.
On 2/1/2018 12:40 PM, Jason Gunthorpe wrote: > > But if a selftest returns TPM2_RC_TESTING I would expect the next > command to also fail with a testing in progress code? At least by the > spec.. It will return that code if the command requires an untested function. For FIPS, the code is returned if there is any untested function. Commands that don't require testable functions will succeed. > The point of invoking selftest is to get to a state where future TPM > commands will succeed, so returning immediately on RC_TESTING seems > wrong? Blocking vs. non-blocking is a TPM vendor implementation option. Re., "seems wrong", the PC Client TPM changed the requirement, so that a guaranteed non-blocking option is available. ~~ 2. On receipt of TPM2_SelfTest(fullTest == YES), the TPM SHALL perform a full self-test and return the result when all tests are complete. ~~ Of course, at some layer, the device drive is still polling, either for a success code (non-blocking) or a response (blocking).
diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c index f40d20671a78..3e1b062d8888 100644 --- a/drivers/char/tpm/tpm2-cmd.c +++ b/drivers/char/tpm/tpm2-cmd.c @@ -872,6 +872,17 @@ static int tpm2_do_selftest(struct tpm_chip *chip) /* wait longer the next round */ delay_msec *= 2; } + if (rc == TPM2_RC_TESTING) { + /* + * A return of RC_TESTING means the TPM is still + * running self tests. If one fails it will go into + * failure mode and return RC_FAILED to every command, + * so treat a still in testing return as a success + * rather than causing a driver detach. + */ + dev_err(&chip->dev,"TPM: Still in testing mode after %dms, continuing\n", delay_msec); + rc = TPM2_RC_SUCCESS; + } return rc; }