tpm_tis: Add missing start/stop_tpm_chip calls

Message ID	20210123014247.989368-1-lma@semihalf.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-integrity-owner@kernel.org> From: Lukasz Majczak <lma@semihalf.com> To: Peter Huewe <peterhuewe@gmx.de>, Jarkko Sakkinen <jarkko@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-integrity@vger.kernel.org, linux-kernel@vger.kernel.org, Radoslaw Biernacki <rad@semihalf.com>, Marcin Wojtas <mw@semihalf.com>, Alex Levin <levinale@google.com> Subject: [PATCH] tpm_tis: Add missing start/stop_tpm_chip calls Date: Sat, 23 Jan 2021 02:42:47 +0100 Message-Id: <20210123014247.989368-1-lma@semihalf.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	tpm_tis: Add missing start/stop_tpm_chip calls \| expand tpm_tis: Add missing start/stop_tpm_chip calls

Lukasz Majczak Jan. 23, 2021, 1:42 a.m. UTC

There is a missing call to start_tpm_chip before the call to
the tpm_get_timeouts() and tpm_tis_probe_irq_single(). As the current
approach maight work for tpm2, it fails for tpm1.x - in that case
call to tpm_get_timeouts() or tpm_tis_probe_irq_single() tries to
transmit TPM commands on a disabled chip what what doesn't succeed
and in turn causes tpm_tis_core_init() to fail.
Tested on Samsung Chromebook Pro (Caroline).

Signed-off-by: Lukasz Majczak <lma@semihalf.com>
---
 drivers/char/tpm/tpm_tis_core.c | 5 +++++
 1 file changed, 5 insertions(+)

Jarkko Sakkinen Jan. 25, 2021, 5:12 p.m. UTC | #1

On Sat, Jan 23, 2021 at 02:42:47AM +0100, Lukasz Majczak wrote:
> There is a missing call to start_tpm_chip before the call to
> the tpm_get_timeouts() and tpm_tis_probe_irq_single(). As the current
> approach maight work for tpm2, it fails for tpm1.x - in that case
> call to tpm_get_timeouts() or tpm_tis_probe_irq_single() tries to
> transmit TPM commands on a disabled chip what what doesn't succeed
> and in turn causes tpm_tis_core_init() to fail.
> Tested on Samsung Chromebook Pro (Caroline).
> 
> Signed-off-by: Lukasz Majczak <lma@semihalf.com>

Do you have a log demonstrating the failure?

/Jarkko

Guenter Roeck Jan. 25, 2021, 5:18 p.m. UTC | #2

Hi Lukasz,

On Sat, Jan 23, 2021 at 02:42:47AM +0100, Lukasz Majczak wrote:
> There is a missing call to start_tpm_chip before the call to
> the tpm_get_timeouts() and tpm_tis_probe_irq_single(). As the current
> approach maight work for tpm2, it fails for tpm1.x - in that case
> call to tpm_get_timeouts() or tpm_tis_probe_irq_single() tries to
> transmit TPM commands on a disabled chip what what doesn't succeed

s/what what/what/

> and in turn causes tpm_tis_core_init() to fail.
> Tested on Samsung Chromebook Pro (Caroline).
> 
> Signed-off-by: Lukasz Majczak <lma@semihalf.com>
> ---
>  drivers/char/tpm/tpm_tis_core.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 92c51c6cfd1b..ff0e5fe46a9d 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -1063,12 +1063,16 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  	init_waitqueue_head(&priv->read_queue);
>  	init_waitqueue_head(&priv->int_queue);
>  	if (irq != -1) {
> +		rc = tpm_chip_start(chip);

Unless I am missing something, the underlying problem seems to be
the calls to tpm1_getcap(). From other code calling this function,
it looks like it may only require tpm_clk_enable() to work.

With that in mind, would it possibly be better to call tpm_clk_enable()
and tpm_clk_disable() around the calls to tpm1_getcap(), ie in
tpm1_get_timeouts() and in tpm_tis_gen_interrupt() ?

This would avoid the unnecessary calls to tpm_chip_start() and
tpm_chip_stop() for tpm2 chips.

Thanks,
Guenter


> +		if (rc)
> +			goto out_err;
>  		/* Before doing irq testing issue a command to the TPM in polling mode
>  		 * to make sure it works. May as well use that command to set the
>  		 * proper timeouts for the driver.
>  		 */
>  		if (tpm_get_timeouts(chip)) {
>  			dev_err(dev, "Could not get TPM timeouts and durations\n");
> +			tpm_chip_stop(chip);
>  			rc = -ENODEV;
>  			goto out_err;
>  		}
> @@ -1085,6 +1089,7 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  		} else {
>  			tpm_tis_probe_irq(chip, intmask);
>  		}
> +		tpm_chip_stop(chip);
>  	}
>  
>  	rc = tpm_chip_register(chip);

Lukasz Majczak Jan. 26, 2021, 3:46 p.m. UTC | #3

Hi Jarkko, Guenter

Yes, here are the logs when failure occurs -
https://gist.github.com/semihalf-majczak-lukasz/1575461f585f1e7fb1e9366b8eceaab9
Look for a phrase "TPM returned invalid status"

Guenter - good suggestion - I will try to keep it as tight as possible.

Best regards,
Lukasz

pon., 25 sty 2021 o 18:18 Guenter Roeck <linux@roeck-us.net> napisał(a):
>
> Hi Lukasz,
>
> On Sat, Jan 23, 2021 at 02:42:47AM +0100, Lukasz Majczak wrote:
> > There is a missing call to start_tpm_chip before the call to
> > the tpm_get_timeouts() and tpm_tis_probe_irq_single(). As the current
> > approach maight work for tpm2, it fails for tpm1.x - in that case
> > call to tpm_get_timeouts() or tpm_tis_probe_irq_single() tries to
> > transmit TPM commands on a disabled chip what what doesn't succeed
>
> s/what what/what/
>
> > and in turn causes tpm_tis_core_init() to fail.
> > Tested on Samsung Chromebook Pro (Caroline).
> >
> > Signed-off-by: Lukasz Majczak <lma@semihalf.com>
> > ---
> >  drivers/char/tpm/tpm_tis_core.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> > index 92c51c6cfd1b..ff0e5fe46a9d 100644
> > --- a/drivers/char/tpm/tpm_tis_core.c
> > +++ b/drivers/char/tpm/tpm_tis_core.c
> > @@ -1063,12 +1063,16 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
> >       init_waitqueue_head(&priv->read_queue);
> >       init_waitqueue_head(&priv->int_queue);
> >       if (irq != -1) {
> > +             rc = tpm_chip_start(chip);
>
> Unless I am missing something, the underlying problem seems to be
> the calls to tpm1_getcap(). From other code calling this function,
> it looks like it may only require tpm_clk_enable() to work.
>
> With that in mind, would it possibly be better to call tpm_clk_enable()
> and tpm_clk_disable() around the calls to tpm1_getcap(), ie in
> tpm1_get_timeouts() and in tpm_tis_gen_interrupt() ?
>
> This would avoid the unnecessary calls to tpm_chip_start() and
> tpm_chip_stop() for tpm2 chips.
>
> Thanks,
> Guenter
>
>
> > +             if (rc)
> > +                     goto out_err;
> >               /* Before doing irq testing issue a command to the TPM in polling mode
> >                * to make sure it works. May as well use that command to set the
> >                * proper timeouts for the driver.
> >                */
> >               if (tpm_get_timeouts(chip)) {
> >                       dev_err(dev, "Could not get TPM timeouts and durations\n");
> > +                     tpm_chip_stop(chip);
> >                       rc = -ENODEV;
> >                       goto out_err;
> >               }
> > @@ -1085,6 +1089,7 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
> >               } else {
> >                       tpm_tis_probe_irq(chip, intmask);
> >               }
> > +             tpm_chip_stop(chip);
> >       }
> >
> >       rc = tpm_chip_register(chip);

James Bottomley Jan. 26, 2021, 4:46 p.m. UTC | #4

On Tue, 2021-01-26 at 16:46 +0100, Łukasz Majczak wrote:
> Hi Jarkko, Guenter
> 
> Yes, here are the logs when failure occurs -
> https://gist.github.com/semihalf-majczak-lukasz/1575461f585f1e7fb1e9366b8eceaab9
> Look for a phrase "TPM returned invalid status"

We've had other reports of this:

https://lore.kernel.org/linux-integrity/ghsgagsnag.fsf@gouders.net/
https://lore.kernel.org/linux-integrity/374e918c-f167-9308-2bea-ae6bc6a3d2e3@elloe.vision/

The problem is some TIS TPMs don't begin in the correct locality so we
have to set it.  When I proposed the check, I also proposed a fix for
this problem:

https://lore.kernel.org/linux-integrity/20201001180925.13808-5-James.Bottomley@HansenPartnership.com/

But it's part of a series that never went upstream.  Part of the reason
was Jarkko proposed the get/put patch to fix this instead, but that
never went upstream either.  We need to decide an approach and apply
one or other fixes.

James

Tj (Elloe Linux) Jan. 26, 2021, 6:55 p.m. UTC | #5

On 26/01/2021 16:46, James Bottomley wrote:
> On Tue, 2021-01-26 at 16:46 +0100, Łukasz Majczak wrote:
>> Hi Jarkko, Guenter
>>
>> Yes, here are the logs when failure occurs -
>> https://gist.github.com/semihalf-majczak-lukasz/1575461f585f1e7fb1e9366b8eceaab9
>> Look for a phrase "TPM returned invalid status"
> 
> We've had other reports of this:
> 
> https://lore.kernel.org/linux-integrity/ghsgagsnag.fsf@gouders.net/
> https://lore.kernel.org/linux-integrity/374e918c-f167-9308-2bea-ae6bc6a3d2e3@elloe.vision/
> 
> The problem is some TIS TPMs don't begin in the correct locality so we
> have to set it.  When I proposed the check, I also proposed a fix for
> this problem:
> 
> https://lore.kernel.org/linux-integrity/20201001180925.13808-5-James.Bottomley@HansenPartnership.com/
> 
This patch solves the error messages on top of -rc5

> But it's part of a series that never went upstream.  Part of the reason
> was Jarkko proposed the get/put patch to fix this instead, but that
> never went upstream either.  We need to decide an approach and apply
> one or other fixes.
> 
> James
> 
>

Jarkko Sakkinen Jan. 28, 2021, 5:58 a.m. UTC | #6

On Tue, Jan 26, 2021 at 08:46:48AM -0800, James Bottomley wrote:
> On Tue, 2021-01-26 at 16:46 +0100, Łukasz Majczak wrote:
> > Hi Jarkko, Guenter
> > 
> > Yes, here are the logs when failure occurs -
> > https://gist.github.com/semihalf-majczak-lukasz/1575461f585f1e7fb1e9366b8eceaab9
> > Look for a phrase "TPM returned invalid status"
> 
> We've had other reports of this:
> 
> https://lore.kernel.org/linux-integrity/ghsgagsnag.fsf@gouders.net/
> https://lore.kernel.org/linux-integrity/374e918c-f167-9308-2bea-ae6bc6a3d2e3@elloe.vision/
> 
> The problem is some TIS TPMs don't begin in the correct locality so we
> have to set it.  When I proposed the check, I also proposed a fix for
> this problem:
> 
> https://lore.kernel.org/linux-integrity/20201001180925.13808-5-James.Bottomley@HansenPartnership.com/
> 
> But it's part of a series that never went upstream.  Part of the reason
> was Jarkko proposed the get/put patch to fix this instead, but that
> never went upstream either.  We need to decide an approach and apply
> one or other fixes.

Can you remind me what I proposed? I remember only proposing removing
interrupt code.

Can you pick up just 1/5 and 2/5 from that serieis and send them as a
mini series?

I had one remark for 1/5, which can be found here:

https://lore.kernel.org/linux-integrity/20201024120744.GA32607@kernel.org/

I don't think there was never argument on locality changes.

/Jarkko

Jarkko Sakkinen Jan. 29, 2021, 10:49 p.m. UTC | #7

On Mon, Jan 25, 2021 at 09:18:46AM -0800, Guenter Roeck wrote:
> Hi Lukasz,
> 
> On Sat, Jan 23, 2021 at 02:42:47AM +0100, Lukasz Majczak wrote:
> > There is a missing call to start_tpm_chip before the call to
> > the tpm_get_timeouts() and tpm_tis_probe_irq_single(). As the current
> > approach maight work for tpm2, it fails for tpm1.x - in that case
> > call to tpm_get_timeouts() or tpm_tis_probe_irq_single() tries to
> > transmit TPM commands on a disabled chip what what doesn't succeed
> 
> s/what what/what/

s/maight/might/

Also, would be nice to have the capatalization of acronyms correct
and consistent. E.g. tpm1.x should be rather written as "TPM 1.x
chips".

It's also incorrect to state that something fails for TPM 1.x chips,
unless you can somehow make a sense that every single TPM 1.x at wild
fails, which probably is not true.

> > and in turn causes tpm_tis_core_init() to fail.
> > Tested on Samsung Chromebook Pro (Caroline).

Anyone can tell me what does Caroline mean anyway?

> > 
> > Signed-off-by: Lukasz Majczak <lma@semihalf.com>
> > ---
> >  drivers/char/tpm/tpm_tis_core.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> > index 92c51c6cfd1b..ff0e5fe46a9d 100644
> > --- a/drivers/char/tpm/tpm_tis_core.c
> > +++ b/drivers/char/tpm/tpm_tis_core.c
> > @@ -1063,12 +1063,16 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
> >  	init_waitqueue_head(&priv->read_queue);
> >  	init_waitqueue_head(&priv->int_queue);
> >  	if (irq != -1) {
> > +		rc = tpm_chip_start(chip);
> 
> Unless I am missing something, the underlying problem seems to be
> the calls to tpm1_getcap(). From other code calling this function,
> it looks like it may only require tpm_clk_enable() to work.

I don't claim to be bus expert but afaik CLKRUN is used with to power
on/off clock, when using LPC bus.

Also, TPM interface specification speaks about CLKRUN in the section
6.3 "TPM LPC Hardware Protocol".

> With that in mind, would it possibly be better to call tpm_clk_enable()
> and tpm_clk_disable() around the calls to tpm1_getcap(), ie in
> tpm1_get_timeouts() and in tpm_tis_gen_interrupt() ?
> 
> This would avoid the unnecessary calls to tpm_chip_start() and
> tpm_chip_stop() for tpm2 chips.

Not enough information about hardware and no klog. Cannot make much
conclusions with the information available.

> Thanks,
> Guenter

Off-topic: I noticed some dumb code in tpm_tis_core.c:

	if (chip->ops->clk_enable != NULL)
		chip->ops->clk_enable(chip, false);

These should be definitely changed as:

       tpm_tis_clkrun_enable(chip, false);

->clk_enable() should be only used in tpm-interface.c.

Not a bug, just really stupid code (and harder to trace).

/Jarkko

Jarkko Sakkinen Jan. 29, 2021, 10:59 p.m. UTC | #8

On Tue, Jan 26, 2021 at 04:46:07PM +0100, Łukasz Majczak wrote:
> Hi Jarkko, Guenter
> 
> Yes, here are the logs when failure occurs -
> https://gist.github.com/semihalf-majczak-lukasz/1575461f585f1e7fb1e9366b8eceaab9
> Look for a phrase "TPM returned invalid status"
> 
> Guenter - good suggestion - I will try to keep it as tight as possible.
> 
> Best regards,
> Lukasz

Is it possible for you try out with linux-next? Thanks. It's a known
issue, which ought to be fixed by now.

The log message is harmless, it'a warning not panic, and does not
endanger system stability. WARN()'s always dump stack trace. No oops
is happening.

/Jarkko

Jarkko Sakkinen Jan. 29, 2021, 11 p.m. UTC | #9

On Sat, Jan 30, 2021 at 12:59:09AM +0200, Jarkko Sakkinen wrote:
> On Tue, Jan 26, 2021 at 04:46:07PM +0100, Łukasz Majczak wrote:
> > Hi Jarkko, Guenter
> > 
> > Yes, here are the logs when failure occurs -
> > https://gist.github.com/semihalf-majczak-lukasz/1575461f585f1e7fb1e9366b8eceaab9
> > Look for a phrase "TPM returned invalid status"
> > 
> > Guenter - good suggestion - I will try to keep it as tight as possible.
> > 
> > Best regards,
> > Lukasz
> 
> Is it possible for you try out with linux-next? Thanks. It's a known
> issue, which ought to be fixed by now.
> 
> The log message is harmless, it'a warning not panic, and does not
> endanger system stability. WARN()'s always dump stack trace. No oops
> is happening.

The regression itself originates from 2006. It has just been unmasked
with "improved" logging.

/Jarkko

Guenter Roeck Jan. 29, 2021, 11:32 p.m. UTC | #10

On 1/29/21 2:49 PM, Jarkko Sakkinen wrote:
> On Mon, Jan 25, 2021 at 09:18:46AM -0800, Guenter Roeck wrote:
>> Hi Lukasz,
>>
>> On Sat, Jan 23, 2021 at 02:42:47AM +0100, Lukasz Majczak wrote:
>>> There is a missing call to start_tpm_chip before the call to
>>> the tpm_get_timeouts() and tpm_tis_probe_irq_single(). As the current
>>> approach maight work for tpm2, it fails for tpm1.x - in that case
>>> call to tpm_get_timeouts() or tpm_tis_probe_irq_single() tries to
>>> transmit TPM commands on a disabled chip what what doesn't succeed
>>
>> s/what what/what/
> 
> s/maight/might/
> 
> Also, would be nice to have the capatalization of acronyms correct
> and consistent. E.g. tpm1.x should be rather written as "TPM 1.x
> chips".
> 
> It's also incorrect to state that something fails for TPM 1.x chips,
> unless you can somehow make a sense that every single TPM 1.x at wild
> fails, which probably is not true.
> 
>>> and in turn causes tpm_tis_core_init() to fail.
>>> Tested on Samsung Chromebook Pro (Caroline).
> 
> Anyone can tell me what does Caroline mean anyway?
> 

"Caroline" is the code name for Samsung Chromebook Pro. The term
"Samsung Chromebook Pro (Caroline)" is quite widely used for this
system. Or, alternatively, "Caroline (Samsung Chromebook Pro)".

Guenter

Guenter Roeck Jan. 30, 2021, 11:49 p.m. UTC | #11

On 1/29/21 2:59 PM, Jarkko Sakkinen wrote:
> On Tue, Jan 26, 2021 at 04:46:07PM +0100, Łukasz Majczak wrote:
>> Hi Jarkko, Guenter
>>
>> Yes, here are the logs when failure occurs -
>> https://gist.github.com/semihalf-majczak-lukasz/1575461f585f1e7fb1e9366b8eceaab9
>> Look for a phrase "TPM returned invalid status"
>>
>> Guenter - good suggestion - I will try to keep it as tight as possible.
>>
>> Best regards,
>> Lukasz
> 
> Is it possible for you try out with linux-next? Thanks. It's a known
> issue, which ought to be fixed by now.
> 
> The log message is harmless, it'a warning not panic, and does not
> endanger system stability. WARN()'s always dump stack trace. No oops
> is happening.
> 

There is a note in the kernel documentation which states:

Note that the WARN()-family should only be used for "expected to
be unreachable" situations. If you want to warn about "reachable
but undesirable" situations, please use the pr_warn()-family of
functions.

It seems to me that "harmless" doesn't really fit the expected
use of WARN(). Should it possibly be converted to pr_warn() ?

Thanks,
Guenter

James Bottomley Jan. 31, 2021, 12:41 a.m. UTC | #12

On Sat, 2021-01-30 at 15:49 -0800, Guenter Roeck wrote:
> On 1/29/21 2:59 PM, Jarkko Sakkinen wrote:
> > On Tue, Jan 26, 2021 at 04:46:07PM +0100, Łukasz Majczak wrote:
> > > Hi Jarkko, Guenter
> > > 
> > > Yes, here are the logs when failure occurs -
> > > https://gist.github.com/semihalf-majczak-lukasz/1575461f585f1e7fb1e9366b8eceaab9
> > > Look for a phrase "TPM returned invalid status"
> > > 
> > > Guenter - good suggestion - I will try to keep it as tight as
> > > possible.
> > > 
> > > Best regards,
> > > Lukasz
> > 
> > Is it possible for you try out with linux-next? Thanks. It's a
> > known issue, which ought to be fixed by now.
> > 
> > The log message is harmless, it'a warning not panic, and does not
> > endanger system stability. WARN()'s always dump stack trace. No
> > oops is happening.
> > 
> 
> There is a note in the kernel documentation which states:
> 
> Note that the WARN()-family should only be used for "expected to
> be unreachable" situations. If you want to warn about "reachable
> but undesirable" situations, please use the pr_warn()-family of
> functions.

It fits the definition.  The warning only triggers if the access is in
the wrong locality, which should be impossible, so the warning should
be unreachable.

James

> It seems to me that "harmless" doesn't really fit the expected
> use of WARN(). Should it possibly be converted to pr_warn() ?

Guenter Roeck Jan. 31, 2021, 3:36 a.m. UTC | #13

On 1/30/21 4:41 PM, James Bottomley wrote:
> On Sat, 2021-01-30 at 15:49 -0800, Guenter Roeck wrote:
>> On 1/29/21 2:59 PM, Jarkko Sakkinen wrote:
>>> On Tue, Jan 26, 2021 at 04:46:07PM +0100, Łukasz Majczak wrote:
>>>> Hi Jarkko, Guenter
>>>>
>>>> Yes, here are the logs when failure occurs -
>>>> https://gist.github.com/semihalf-majczak-lukasz/1575461f585f1e7fb1e9366b8eceaab9
>>>> Look for a phrase "TPM returned invalid status"
>>>>
>>>> Guenter - good suggestion - I will try to keep it as tight as
>>>> possible.
>>>>
>>>> Best regards,
>>>> Lukasz
>>>
>>> Is it possible for you try out with linux-next? Thanks. It's a
>>> known issue, which ought to be fixed by now.
>>>
>>> The log message is harmless, it'a warning not panic, and does not
>>> endanger system stability. WARN()'s always dump stack trace. No
>>> oops is happening.
>>>
>>
>> There is a note in the kernel documentation which states:
>>
>> Note that the WARN()-family should only be used for "expected to
>> be unreachable" situations. If you want to warn about "reachable
>> but undesirable" situations, please use the pr_warn()-family of
>> functions.
> 
> It fits the definition.  The warning only triggers if the access is in
> the wrong locality, which should be impossible, so the warning should
> be unreachable.
> 
Thanks a lot for the clarification. So a warning traceback in the kernel
doesn't necessarily suggest that there is a serious problem that should
be fixed; it only means that some code is executed which should not be
reachable (but is otherwise harmless).

That makes me wonder, though, if it would make sense to mark such harmless
tracebacks differently. The terms "warning" and "harmless" sound like
a bit of a contradiction to me (especially for systems where panic_on_warn
is set).

Thanks,
Guenter

James Bottomley Jan. 31, 2021, 4:18 a.m. UTC | #14

On Sat, 2021-01-30 at 19:36 -0800, Guenter Roeck wrote:
> On 1/30/21 4:41 PM, James Bottomley wrote:
> > On Sat, 2021-01-30 at 15:49 -0800, Guenter Roeck wrote:
> > > On 1/29/21 2:59 PM, Jarkko Sakkinen wrote:
> > > > On Tue, Jan 26, 2021 at 04:46:07PM +0100, Łukasz Majczak wrote:
> > > > > Hi Jarkko, Guenter
> > > > > 
> > > > > Yes, here are the logs when failure occurs -
> > > > > https://gist.github.com/semihalf-majczak-lukasz/1575461f585f1e7fb1e9366b8eceaab9
> > > > > Look for a phrase "TPM returned invalid status"
> > > > > 
> > > > > Guenter - good suggestion - I will try to keep it as tight as
> > > > > possible.
> > > > > 
> > > > > Best regards,
> > > > > Lukasz
> > > > 
> > > > Is it possible for you try out with linux-next? Thanks. It's a
> > > > known issue, which ought to be fixed by now.
> > > > 
> > > > The log message is harmless, it'a warning not panic, and does
> > > > not endanger system stability. WARN()'s always dump stack
> > > > trace. No oops is happening.
> > > > 
> > > 
> > > There is a note in the kernel documentation which states:
> > > 
> > > Note that the WARN()-family should only be used for "expected to
> > > be unreachable" situations. If you want to warn about "reachable
> > > but undesirable" situations, please use the pr_warn()-family of
> > > functions.
> > 
> > It fits the definition.  The warning only triggers if the access is
> > in the wrong locality, which should be impossible, so the warning
> > should be unreachable.
> > 
> Thanks a lot for the clarification. So a warning traceback in the
> kernel doesn't necessarily suggest that there is a serious problem
> that should be fixed; it only means that some code is executed which
> should not be reachable (but is otherwise harmless).
> 
> That makes me wonder, though, if it would make sense to mark such
> harmless tracebacks differently. The terms "warning" and "harmless"
> sound like a bit of a contradiction to me (especially for systems
> where panic_on_warn is set).

Well, it's not harmless; because it occurs at start of day, it means we
clear the ineffective command and use default values and those happen
to work fine for the TPM in question, so the problem is pretty much
covered up.  If it had occurred anywhere else it would result in a loss
of the command data with unknown ramifications to user space, possibly
leading to a TPM failure.

Hopefully this means this is the only place we screwed up, but you can
see why a scary warning and stack trace is appropriate: if it triggers,
something in the kernel violated the TPM command model.

James

Jarkko Sakkinen Feb. 2, 2021, 3:33 p.m. UTC | #15

On Sat, Jan 30, 2021 at 03:49:09PM -0800, Guenter Roeck wrote:
> On 1/29/21 2:59 PM, Jarkko Sakkinen wrote:
> > On Tue, Jan 26, 2021 at 04:46:07PM +0100, Łukasz Majczak wrote:
> >> Hi Jarkko, Guenter
> >>
> >> Yes, here are the logs when failure occurs -
> >> https://gist.github.com/semihalf-majczak-lukasz/1575461f585f1e7fb1e9366b8eceaab9
> >> Look for a phrase "TPM returned invalid status"
> >>
> >> Guenter - good suggestion - I will try to keep it as tight as possible.
> >>
> >> Best regards,
> >> Lukasz
> > 
> > Is it possible for you try out with linux-next? Thanks. It's a known
> > issue, which ought to be fixed by now.
> > 
> > The log message is harmless, it'a warning not panic, and does not
> > endanger system stability. WARN()'s always dump stack trace. No oops
> > is happening.
> > 
> 
> There is a note in the kernel documentation which states:
> 
> Note that the WARN()-family should only be used for "expected to
> be unreachable" situations. If you want to warn about "reachable
> but undesirable" situations, please use the pr_warn()-family of
> functions.
> 
> It seems to me that "harmless" doesn't really fit the expected
> use of WARN(). Should it possibly be converted to pr_warn() ?

It should, and I agree that it was a mistake to merge the commit
that added this WARN(). I'm sending a late PR to Linus containing
just the James' fixes. I'll include one line change to that PR,
that does just what you suggested.

It also lacks useful information, i.e. the status.

I just send a fixed with your "suggested-by". Can you review and
ack it ASAP so that I can then go on and send PR to Linus?

/Jarkko

Jarkko Sakkinen Feb. 2, 2021, 4:17 p.m. UTC | #16

On Sat, Jan 30, 2021 at 04:41:13PM -0800, James Bottomley wrote:
> On Sat, 2021-01-30 at 15:49 -0800, Guenter Roeck wrote:
> > On 1/29/21 2:59 PM, Jarkko Sakkinen wrote:
> > > On Tue, Jan 26, 2021 at 04:46:07PM +0100, Łukasz Majczak wrote:
> > > > Hi Jarkko, Guenter
> > > > 
> > > > Yes, here are the logs when failure occurs -
> > > > https://gist.github.com/semihalf-majczak-lukasz/1575461f585f1e7fb1e9366b8eceaab9
> > > > Look for a phrase "TPM returned invalid status"
> > > > 
> > > > Guenter - good suggestion - I will try to keep it as tight as
> > > > possible.
> > > > 
> > > > Best regards,
> > > > Lukasz
> > > 
> > > Is it possible for you try out with linux-next? Thanks. It's a
> > > known issue, which ought to be fixed by now.
> > > 
> > > The log message is harmless, it'a warning not panic, and does not
> > > endanger system stability. WARN()'s always dump stack trace. No
> > > oops is happening.
> > > 
> > 
> > There is a note in the kernel documentation which states:
> > 
> > Note that the WARN()-family should only be used for "expected to
> > be unreachable" situations. If you want to warn about "reachable
> > but undesirable" situations, please use the pr_warn()-family of
> > functions.
> 
> It fits the definition.  The warning only triggers if the access is in
> the wrong locality, which should be impossible, so the warning should
> be unreachable.

It's an overkill. Even in perfectly working kernel it's not impossible, as
sometimes hardware gives faulty data. I think that it also lacks the useful
information i.e. the status code.

I would useful WARN() only if the driver state could suffer. In this case
it doesn't. It only results failing transfer but kernel state is still
legit.

/Jarkko
>

tpm_tis: Add missing start/stop_tpm_chip calls

Commit Message

Comments

Patch