Message ID | 20220706164043.417780-1-jandryuk@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | tpm_tis: Hold locality open during probe | expand |
在 2022/7/7 0:41, Jason Andryuk 写道: > WEC TPMs (in 1.2 mode) and NTC (in 2.0 mode) have been observer to > frequently, but intermittently, fail probe with: > tpm_tis: probe of 00:09 failed with error -1 > > Added debugging output showed that the request_locality in > tpm_tis_core_init succeeds, but then the tpm_chip_start fails when its > call to tpm_request_locality -> request_locality fails. > > The access register in check_locality would show: > 0x80 TPM_ACCESS_VALID > 0x82 TPM_ACCESS_VALID | TPM_ACCESS_REQUEST_USE > 0x80 TPM_ACCESS_VALID > continuing until it times out. TPM_ACCESS_ACTIVE_LOCALITY (0x20) doesn't > get set which would end the wait. > > My best guess is something racy was going on between release_locality's > write and request_locality's write. There is no wait in > release_locality to ensure that the locality is released, so the > subsequent request_locality could confuse the TPM? > > tpm_chip_start grabs locality 0, and updates chip->locality. Call that > before the TPM_INT_ENABLE write, and drop the explicit request/release > calls. tpm_chip_stop performs the release. With this, we switch to > using chip->locality instead of priv->locality. The probe failure is > not seen after this. > > commit 0ef333f5ba7f ("tpm: add request_locality before write > TPM_INT_ENABLE") added a request_locality/release_locality pair around > tpm_tis_write32 TPM_INT_ENABLE, but there is a read of > TPM_INT_ENABLE for the intmask which should also have the locality > grabbed. tpm_chip_start is moved before that to have the locality open > during the read. > > Fixes: 0ef333f5ba7f ("tpm: add request_locality before write TPM_INT_ENABLE") 0ef333f5ba7f is probably not the commit that introduced the problem? As you said the problem was in 5.4 and the commit was merged in 5.16. > CC: stable@vger.kernel.org > Signed-off-by: Jason Andryuk <jandryuk@gmail.com> > --- > The probe failure was seen on 5.4, 5.15 and 5.17. > > commit e42acf104d6e ("tpm_tis: Clean up locality release") removed the > release wait. I haven't tried, but re-introducing that would probably > fix this issue. It's hard to know apriori when a synchronous wait is > needed, and they don't seem to be needed typically. Re-introducing the > wait would re-introduce a wait in all cases. > > Surrounding the read of TPM_INT_ENABLE with grabbing the locality may > not be necessary? It looks like the code only grabs a locality for > writing, but that asymmetry is surprising to me. > > tpm_chip and tpm_tis_data track the locality separately. Should the > tpm_tis_data one be removed so they don't get out of sync? > --- > drivers/char/tpm/tpm_tis_core.c | 20 ++++++++------------ > 1 file changed, 8 insertions(+), 12 deletions(-) > > diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c > index dc56b976d816..529c241800c0 100644 > --- a/drivers/char/tpm/tpm_tis_core.c > +++ b/drivers/char/tpm/tpm_tis_core.c > @@ -986,8 +986,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq, > goto out_err; > } > > + /* Grabs locality 0. */ > + rc = tpm_chip_start(chip); > + if (rc) > + goto out_err; > + > /* Take control of the TPM's interrupt hardware and shut it off */ > - rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); > + rc = tpm_tis_read32(priv, TPM_INT_ENABLE(chip->locality), &intmask); > if (rc < 0) > goto out_err; > > @@ -995,19 +1000,10 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq, > TPM_INTF_DATA_AVAIL_INT | TPM_INTF_STS_VALID_INT; > intmask &= ~TPM_GLOBAL_INT_ENABLE; > > - rc = request_locality(chip, 0); > - if (rc < 0) { > - rc = -ENODEV; > - goto out_err; > - } > - > - tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); > - release_locality(chip, 0); > + tpm_tis_write32(priv, TPM_INT_ENABLE(chip->locality), intmask); > > - rc = tpm_chip_start(chip); > - if (rc) > - goto out_err; > rc = tpm2_probe(chip); > + /* Releases locality 0. */ > tpm_chip_stop(chip); > if (rc) > goto out_err; >
On Thu, Jul 7, 2022 at 3:48 AM chenjun (AM) <chenjun102@huawei.com> wrote: > > 在 2022/7/7 0:41, Jason Andryuk 写道: > > WEC TPMs (in 1.2 mode) and NTC (in 2.0 mode) have been observer to > > frequently, but intermittently, fail probe with: > > tpm_tis: probe of 00:09 failed with error -1 > > > > Added debugging output showed that the request_locality in > > tpm_tis_core_init succeeds, but then the tpm_chip_start fails when its > > call to tpm_request_locality -> request_locality fails. > > > > The access register in check_locality would show: > > 0x80 TPM_ACCESS_VALID > > 0x82 TPM_ACCESS_VALID | TPM_ACCESS_REQUEST_USE > > 0x80 TPM_ACCESS_VALID > > continuing until it times out. TPM_ACCESS_ACTIVE_LOCALITY (0x20) doesn't > > get set which would end the wait. > > > > My best guess is something racy was going on between release_locality's > > write and request_locality's write. There is no wait in > > release_locality to ensure that the locality is released, so the > > subsequent request_locality could confuse the TPM? > > > > tpm_chip_start grabs locality 0, and updates chip->locality. Call that > > before the TPM_INT_ENABLE write, and drop the explicit request/release > > calls. tpm_chip_stop performs the release. With this, we switch to > > using chip->locality instead of priv->locality. The probe failure is > > not seen after this. > > > > commit 0ef333f5ba7f ("tpm: add request_locality before write > > TPM_INT_ENABLE") added a request_locality/release_locality pair around > > tpm_tis_write32 TPM_INT_ENABLE, but there is a read of > > TPM_INT_ENABLE for the intmask which should also have the locality > > grabbed. tpm_chip_start is moved before that to have the locality open > > during the read. > > > > Fixes: 0ef333f5ba7f ("tpm: add request_locality before write TPM_INT_ENABLE") > > 0ef333f5ba7f is probably not the commit that introduced the problem? As > you said the problem was in 5.4 and the commit was merged in 5.16. I was imprecise with my versions. 0ef333f5ba7f was backported to stable-5.4.y as 13af3a9b1ba6 in 5.4.174. I was running 5.4.163 on some systems without problem, but the probe failures started after jumping to 5.4.200. Other systems showing the probe failures were 5.15.29 (which has 0ef333f5ba7f backported as ea1fd8364c9f 5.15.17) and 5.17.y (5.17.5 I think) from a Fedora 36 USB drive. Other machines run fine with 0ef333f5ba7f which is part of why I didn't notice it earlier between 5.4.613 and 5.4.200. At the top I wrote: "frequently, but intermittently, fail probe". To expand on that: Basically on the affected machines, the probe during boot fails. In one instance, I repeatedly ran `echo 00:05 > /sys/bus/pnp/drivers/tpm_tis/bind`, and it successfully probed on the 7th try. So something racy seems to be going on. Regards, Jason
On Wed, Jul 06, 2022 at 12:40:43PM -0400, Jason Andryuk wrote: > WEC TPMs (in 1.2 mode) and NTC (in 2.0 mode) have been observer to > frequently, but intermittently, fail probe with: > tpm_tis: probe of 00:09 failed with error -1 > > Added debugging output showed that the request_locality in > tpm_tis_core_init succeeds, but then the tpm_chip_start fails when its > call to tpm_request_locality -> request_locality fails. > > The access register in check_locality would show: > 0x80 TPM_ACCESS_VALID > 0x82 TPM_ACCESS_VALID | TPM_ACCESS_REQUEST_USE > 0x80 TPM_ACCESS_VALID > continuing until it times out. TPM_ACCESS_ACTIVE_LOCALITY (0x20) doesn't > get set which would end the wait. > > My best guess is something racy was going on between release_locality's > write and request_locality's write. There is no wait in > release_locality to ensure that the locality is released, so the > subsequent request_locality could confuse the TPM? > > tpm_chip_start grabs locality 0, and updates chip->locality. Call that > before the TPM_INT_ENABLE write, and drop the explicit request/release > calls. tpm_chip_stop performs the release. With this, we switch to > using chip->locality instead of priv->locality. The probe failure is > not seen after this. > > commit 0ef333f5ba7f ("tpm: add request_locality before write > TPM_INT_ENABLE") added a request_locality/release_locality pair around > tpm_tis_write32 TPM_INT_ENABLE, but there is a read of > TPM_INT_ENABLE for the intmask which should also have the locality > grabbed. tpm_chip_start is moved before that to have the locality open > during the read. > > Fixes: 0ef333f5ba7f ("tpm: add request_locality before write TPM_INT_ENABLE") > CC: stable@vger.kernel.org > Signed-off-by: Jason Andryuk <jandryuk@gmail.com> > --- > The probe failure was seen on 5.4, 5.15 and 5.17. > > commit e42acf104d6e ("tpm_tis: Clean up locality release") removed the > release wait. I haven't tried, but re-introducing that would probably > fix this issue. It's hard to know apriori when a synchronous wait is > needed, and they don't seem to be needed typically. Re-introducing the > wait would re-introduce a wait in all cases. > > Surrounding the read of TPM_INT_ENABLE with grabbing the locality may > not be necessary? It looks like the code only grabs a locality for > writing, but that asymmetry is surprising to me. > > tpm_chip and tpm_tis_data track the locality separately. Should the > tpm_tis_data one be removed so they don't get out of sync? > --- > drivers/char/tpm/tpm_tis_core.c | 20 ++++++++------------ > 1 file changed, 8 insertions(+), 12 deletions(-) > > diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c > index dc56b976d816..529c241800c0 100644 > --- a/drivers/char/tpm/tpm_tis_core.c > +++ b/drivers/char/tpm/tpm_tis_core.c > @@ -986,8 +986,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq, > goto out_err; > } > > + /* Grabs locality 0. */ > + rc = tpm_chip_start(chip); > + if (rc) > + goto out_err; > + > /* Take control of the TPM's interrupt hardware and shut it off */ > - rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); > + rc = tpm_tis_read32(priv, TPM_INT_ENABLE(chip->locality), &intmask); > if (rc < 0) > goto out_err; > > @@ -995,19 +1000,10 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq, > TPM_INTF_DATA_AVAIL_INT | TPM_INTF_STS_VALID_INT; > intmask &= ~TPM_GLOBAL_INT_ENABLE; > > - rc = request_locality(chip, 0); > - if (rc < 0) { > - rc = -ENODEV; > - goto out_err; > - } > - > - tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); > - release_locality(chip, 0); > + tpm_tis_write32(priv, TPM_INT_ENABLE(chip->locality), intmask); > > - rc = tpm_chip_start(chip); > - if (rc) > - goto out_err; > rc = tpm2_probe(chip); > + /* Releases locality 0. */ > tpm_chip_stop(chip); > if (rc) > goto out_err; > -- > 2.36.1 > Can you test against https://lore.kernel.org/linux-integrity/20220629232653.1306735-1-LinoSanfilippo@gmx.de/T/#t BR, Jarkko
On Sun, Jul 10, 2022 at 10:58 PM Jarkko Sakkinen <jarkko@kernel.org> wrote: > > On Wed, Jul 06, 2022 at 12:40:43PM -0400, Jason Andryuk wrote: > > WEC TPMs (in 1.2 mode) and NTC (in 2.0 mode) have been observer to > > frequently, but intermittently, fail probe with: > > tpm_tis: probe of 00:09 failed with error -1 > > > > Added debugging output showed that the request_locality in > > tpm_tis_core_init succeeds, but then the tpm_chip_start fails when its > > call to tpm_request_locality -> request_locality fails. > > > > The access register in check_locality would show: > > 0x80 TPM_ACCESS_VALID > > 0x82 TPM_ACCESS_VALID | TPM_ACCESS_REQUEST_USE > > 0x80 TPM_ACCESS_VALID > > continuing until it times out. TPM_ACCESS_ACTIVE_LOCALITY (0x20) doesn't > > get set which would end the wait. > > > > My best guess is something racy was going on between release_locality's > > write and request_locality's write. There is no wait in > > release_locality to ensure that the locality is released, so the > > subsequent request_locality could confuse the TPM? > > > > tpm_chip_start grabs locality 0, and updates chip->locality. Call that > > before the TPM_INT_ENABLE write, and drop the explicit request/release > > calls. tpm_chip_stop performs the release. With this, we switch to > > using chip->locality instead of priv->locality. The probe failure is > > not seen after this. > > > > commit 0ef333f5ba7f ("tpm: add request_locality before write > > TPM_INT_ENABLE") added a request_locality/release_locality pair around > > tpm_tis_write32 TPM_INT_ENABLE, but there is a read of > > TPM_INT_ENABLE for the intmask which should also have the locality > > grabbed. tpm_chip_start is moved before that to have the locality open > > during the read. > > > > Fixes: 0ef333f5ba7f ("tpm: add request_locality before write TPM_INT_ENABLE") > > CC: stable@vger.kernel.org > > Signed-off-by: Jason Andryuk <jandryuk@gmail.com> > > --- > > The probe failure was seen on 5.4, 5.15 and 5.17. > > > > commit e42acf104d6e ("tpm_tis: Clean up locality release") removed the > > release wait. I haven't tried, but re-introducing that would probably > > fix this issue. It's hard to know apriori when a synchronous wait is > > needed, and they don't seem to be needed typically. Re-introducing the > > wait would re-introduce a wait in all cases. > > > > Surrounding the read of TPM_INT_ENABLE with grabbing the locality may > > not be necessary? It looks like the code only grabs a locality for > > writing, but that asymmetry is surprising to me. > > > > tpm_chip and tpm_tis_data track the locality separately. Should the > > tpm_tis_data one be removed so they don't get out of sync? > > --- > > drivers/char/tpm/tpm_tis_core.c | 20 ++++++++------------ > > 1 file changed, 8 insertions(+), 12 deletions(-) > > > > diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c > > index dc56b976d816..529c241800c0 100644 > > --- a/drivers/char/tpm/tpm_tis_core.c > > +++ b/drivers/char/tpm/tpm_tis_core.c > > @@ -986,8 +986,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq, > > goto out_err; > > } > > > > + /* Grabs locality 0. */ > > + rc = tpm_chip_start(chip); > > + if (rc) > > + goto out_err; > > + > > /* Take control of the TPM's interrupt hardware and shut it off */ > > - rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); > > + rc = tpm_tis_read32(priv, TPM_INT_ENABLE(chip->locality), &intmask); > > if (rc < 0) > > goto out_err; > > > > @@ -995,19 +1000,10 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq, > > TPM_INTF_DATA_AVAIL_INT | TPM_INTF_STS_VALID_INT; > > intmask &= ~TPM_GLOBAL_INT_ENABLE; > > > > - rc = request_locality(chip, 0); > > - if (rc < 0) { > > - rc = -ENODEV; > > - goto out_err; > > - } > > - > > - tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); > > - release_locality(chip, 0); > > + tpm_tis_write32(priv, TPM_INT_ENABLE(chip->locality), intmask); > > > > - rc = tpm_chip_start(chip); > > - if (rc) > > - goto out_err; > > rc = tpm2_probe(chip); > > + /* Releases locality 0. */ > > tpm_chip_stop(chip); > > if (rc) > > goto out_err; > > -- > > 2.36.1 > > > > Can you test against > > https://lore.kernel.org/linux-integrity/20220629232653.1306735-1-LinoSanfilippo@gmx.de/T/#t I applied on top of 5.15.53, and the probe on boot still fails. Manually probing works intermittently. Regards, Jason
On Mon, Jul 11, 2022 at 3:37 PM Jason Andryuk <jandryuk@gmail.com> wrote: > > On Sun, Jul 10, 2022 at 10:58 PM Jarkko Sakkinen <jarkko@kernel.org> wrote: > > > > Can you test against > > > > https://lore.kernel.org/linux-integrity/20220629232653.1306735-1-LinoSanfilippo@gmx.de/T/#t > > I applied on top of 5.15.53, and the probe on boot still fails. > Manually probing works intermittently. On top of Lino Sanfilippo's patch queue, I added an additional tpm_tis_request_locality and tpm_tis_release_locality to hold the locality open via refcount in tpm_tis_chip_init. Similar to my patch in this thread, it acquires the locality before the TPM_INT_ENABLE write and holds it until after the TPM_RID read. That fixes the probe issue on the box where I tested. While tpm_tis_core_init is definitely a problem, I wonder if there are other code paths that could trigger this acquire -> release -> acquire issue. In that light, restoring a wait by reverting commit e42acf104d6e ("tpm_tis: Clean up locality release") seems safer since it would cover any code path. I just tested reverting that and it still fails to probe on boot and intermittently. I'm surprised since I expected it to solve the issue, but my original debugging was showing TPM_ACCESS_ACTIVE_LOCALITY cleared (and never re-set) so the driver isn't actually waiting. All this locality acquiring and releasing seems rather redundant. If locality 0 is all that is ever used, why not just hold it open? I guess Trench Boot/Secure Launch wants to use locality 2 (https://lore.kernel.org/lkml/1645070085-14255-13-git-send-email-ross.philipson@oracle.com/), but in either case doesn't the system just stay in the assigned locality? I haven't read the spec, so maybe that is disallowed. There is something nice about cleaning up and releasing the locality when not in use, but it's also causing a problem here. Regards, Jason
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c index dc56b976d816..529c241800c0 100644 --- a/drivers/char/tpm/tpm_tis_core.c +++ b/drivers/char/tpm/tpm_tis_core.c @@ -986,8 +986,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq, goto out_err; } + /* Grabs locality 0. */ + rc = tpm_chip_start(chip); + if (rc) + goto out_err; + /* Take control of the TPM's interrupt hardware and shut it off */ - rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); + rc = tpm_tis_read32(priv, TPM_INT_ENABLE(chip->locality), &intmask); if (rc < 0) goto out_err; @@ -995,19 +1000,10 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq, TPM_INTF_DATA_AVAIL_INT | TPM_INTF_STS_VALID_INT; intmask &= ~TPM_GLOBAL_INT_ENABLE; - rc = request_locality(chip, 0); - if (rc < 0) { - rc = -ENODEV; - goto out_err; - } - - tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); - release_locality(chip, 0); + tpm_tis_write32(priv, TPM_INT_ENABLE(chip->locality), intmask); - rc = tpm_chip_start(chip); - if (rc) - goto out_err; rc = tpm2_probe(chip); + /* Releases locality 0. */ tpm_chip_stop(chip); if (rc) goto out_err;
WEC TPMs (in 1.2 mode) and NTC (in 2.0 mode) have been observer to frequently, but intermittently, fail probe with: tpm_tis: probe of 00:09 failed with error -1 Added debugging output showed that the request_locality in tpm_tis_core_init succeeds, but then the tpm_chip_start fails when its call to tpm_request_locality -> request_locality fails. The access register in check_locality would show: 0x80 TPM_ACCESS_VALID 0x82 TPM_ACCESS_VALID | TPM_ACCESS_REQUEST_USE 0x80 TPM_ACCESS_VALID continuing until it times out. TPM_ACCESS_ACTIVE_LOCALITY (0x20) doesn't get set which would end the wait. My best guess is something racy was going on between release_locality's write and request_locality's write. There is no wait in release_locality to ensure that the locality is released, so the subsequent request_locality could confuse the TPM? tpm_chip_start grabs locality 0, and updates chip->locality. Call that before the TPM_INT_ENABLE write, and drop the explicit request/release calls. tpm_chip_stop performs the release. With this, we switch to using chip->locality instead of priv->locality. The probe failure is not seen after this. commit 0ef333f5ba7f ("tpm: add request_locality before write TPM_INT_ENABLE") added a request_locality/release_locality pair around tpm_tis_write32 TPM_INT_ENABLE, but there is a read of TPM_INT_ENABLE for the intmask which should also have the locality grabbed. tpm_chip_start is moved before that to have the locality open during the read. Fixes: 0ef333f5ba7f ("tpm: add request_locality before write TPM_INT_ENABLE") CC: stable@vger.kernel.org Signed-off-by: Jason Andryuk <jandryuk@gmail.com> --- The probe failure was seen on 5.4, 5.15 and 5.17. commit e42acf104d6e ("tpm_tis: Clean up locality release") removed the release wait. I haven't tried, but re-introducing that would probably fix this issue. It's hard to know apriori when a synchronous wait is needed, and they don't seem to be needed typically. Re-introducing the wait would re-introduce a wait in all cases. Surrounding the read of TPM_INT_ENABLE with grabbing the locality may not be necessary? It looks like the code only grabs a locality for writing, but that asymmetry is surprising to me. tpm_chip and tpm_tis_data track the locality separately. Should the tpm_tis_data one be removed so they don't get out of sync? --- drivers/char/tpm/tpm_tis_core.c | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-)