diff mbox series

[v3,3/4] tpm_tis: Disable interrupts if interrupt storm detected

Message ID 20201205014340.148235-4-jsnitsel@redhat.com (mailing list archive)
State New, archived
Headers show
Series tpm_tis: Detect interrupt storms | expand

Commit Message

Jerry Snitselaar Dec. 5, 2020, 1:43 a.m. UTC
When enabling the interrupt code for the tpm_tis driver we have
noticed some systems have a bios issue causing an interrupt storm to
occur. The issue isn't limited to a single tpm or system manufacturer
so keeping a denylist of systems with the issue isn't optimal. Instead
try to detect the problem occurring, disable interrupts, and revert to
polling when it happens.

Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Peter Huewe <peterhuewe@gmx.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Matthew Garrett <mjg59@google.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
---
v3: - Change include from linux/kernel_stat.h to linux/irq.h
v2: - drop tpm_tis specific workqueue and use just system_w

drivers/char/tpm/tpm_tis_core.c | 27 +++++++++++++++++++++++++++
 drivers/char/tpm/tpm_tis_core.h |  2 ++
 2 files changed, 29 insertions(+)

Comments

Thomas Gleixner Dec. 6, 2020, 7:26 p.m. UTC | #1
Jerry,

On Fri, Dec 04 2020 at 18:43, Jerry Snitselaar wrote:
> @@ -715,9 +717,23 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id)
>  {
>  	struct tpm_chip *chip = dev_id;
>  	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> +	static bool check_storm = true;
> +	static unsigned int check_start;

So this assumes that there can't be two TPMs which is probably true, but
everything else in this driver has stuff in tpm_tis_data per device.

>  	u32 interrupt;
>  	int i, rc;
>  
> +	if (unlikely(check_storm)) {
> +		if (!check_start) {
> +			check_start = jiffies_to_msecs(jiffies);

Yuck. I had to read that twice to figure out that it's correct vs. the
truncation of the result to unsigned int. You can spare that conversion
by simply doing

   unsigned long end_of_check = jiffies + HZ / 2;

and then the check becomes

    time_before(jiffies, end_of_check)

> +		} else if ((kstat_irqs(priv->irq) > 1000) &&
> +			   (jiffies_to_msecs(jiffies) - check_start < 500)) {

I assume you can't call disable_irq_nosync() here, but shouldn't this
shut up the interrupt at the TPM level right here?

> +			check_storm = false;
> +			schedule_work(&priv->storm_work);
> +		} else if (jiffies_to_msecs(jiffies) - check_start >= 500) {
> +			check_storm = false;
> +		}
> +	}

So back to kstat_irqs(). As this needs two extra variables anyway:

init()
	priv->irq_check = 1;
	priv->end_check = 0;

isr()
	if (unlikely(priv->irq_check)) {
		if (!priv->end_check) {
			priv->end_check = jiffies + HZ / 2;
		} else if (time_before(jiffies, priv->end_check)) {
			if (priv->irq_check++ > 1000)
				schedule_work(...);
		} else {
			priv->irq_check = 0;
		}
	}

Hmm? I still need to see an argument for an kstat_irqs() export being
superior.

Though I wonder whether such an infrastructure should be provided in the
irq core. Let me think about it.

Just as a side note. I was looking at tpm_tis_probe_irq_single() and
that function is leaking the interrupt request if any of the checks
afterwards fails, except for the final interrupt probe check which does
a cleanup. That means on fail before that the interrupt handler stays
requested up to the point where the module is removed. If that's a
shared interrupt and some other device is active on the same line, then
each interrupt from that device will call into the TPM code. Something
like the below is needed.

Also the X86 autoprobe mechanism is interesting:

	if (IS_ENABLED(CONFIG_X86))
		for (i = 3; i <= 15; i++)
			if (!tpm_tis_probe_irq_single(chip, intmask, 0, i))
				return;

The third argument is 'flags' which is handed to request_irq(). So that
won't ever be able to probe a shared interrupt. But if an interrupt
number > 0 is handed to tpm_tis_core_init() the interrupt is requested
with IRQF_SHARED. Same issue when the chip has an interrupt number in
the register. It's also requested exclusive which is pretty likely
to fail on ancient x86 machines.

The vast amount of comments didn't help to figure out what the reasoning
is.

Thanks,

        tglx
---
 drivers/char/tpm/tpm_tis_core.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -782,26 +782,26 @@ static int tpm_tis_probe_irq_single(stru
 	rc = tpm_tis_read8(priv, TPM_INT_VECTOR(priv->locality),
 			   &original_int_vec);
 	if (rc < 0)
-		return rc;
+		goto fail;
 
 	rc = tpm_tis_write8(priv, TPM_INT_VECTOR(priv->locality), irq);
 	if (rc < 0)
-		return rc;
+		goto fail;
 
 	rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &int_status);
 	if (rc < 0)
-		return rc;
+		goto fail;
 
 	/* Clear all existing */
 	rc = tpm_tis_write32(priv, TPM_INT_STATUS(priv->locality), int_status);
 	if (rc < 0)
-		return rc;
+		goto fail;
 
 	/* Turn on */
 	rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality),
 			     intmask | TPM_GLOBAL_INT_ENABLE);
 	if (rc < 0)
-		return rc;
+		goto fail;
 
 	priv->irq_tested = false;
 
@@ -825,6 +825,10 @@ static int tpm_tis_probe_irq_single(stru
 	}
 
 	return 0;
+
+fail:
+	disable_interrupts(chip);
+	return rc;
 }
 
 /* Try to find the IRQ the TPM is using. This is for legacy x86 systems that
Jason Gunthorpe Dec. 7, 2020, 7:28 p.m. UTC | #2
On Sun, Dec 06, 2020 at 08:26:16PM +0100, Thomas Gleixner wrote:
> Just as a side note. I was looking at tpm_tis_probe_irq_single() and
> that function is leaking the interrupt request if any of the checks
> afterwards fails, except for the final interrupt probe check which does
> a cleanup. That means on fail before that the interrupt handler stays
> requested up to the point where the module is removed. If that's a
> shared interrupt and some other device is active on the same line, then
> each interrupt from that device will call into the TPM code. Something
> like the below is needed.
> 
> Also the X86 autoprobe mechanism is interesting:
> 
> 	if (IS_ENABLED(CONFIG_X86))
> 		for (i = 3; i <= 15; i++)
> 			if (!tpm_tis_probe_irq_single(chip, intmask, 0, i))
> 				return;
> 
> The third argument is 'flags' which is handed to request_irq(). So that
> won't ever be able to probe a shared interrupt. But if an interrupt
> number > 0 is handed to tpm_tis_core_init() the interrupt is requested
> with IRQF_SHARED. Same issue when the chip has an interrupt number in
> the register. It's also requested exclusive which is pretty likely
> to fail on ancient x86 machines.

It is very likely none of this works any more, it has been repeatedly
reworked over the years and just left behind out of fear someone needs
it. I've thought it should be deleted for a while now.

I suppose the original logic was to try and probe without SHARED
because a probe would need exclusive access to the interrupt to tell
if the TPM was actually the source, not some other device.

It is all very old and very out of step with current thinking, IMHO. I
skeptical that TPM interrupts were ever valuable enough to deserve
this in the first place.

Jason
James Bottomley Dec. 7, 2020, 7:58 p.m. UTC | #3
On Mon, 2020-12-07 at 15:28 -0400, Jason Gunthorpe wrote:
> On Sun, Dec 06, 2020 at 08:26:16PM +0100, Thomas Gleixner wrote:
> > Just as a side note. I was looking at tpm_tis_probe_irq_single()
> > and that function is leaking the interrupt request if any of the
> > checks afterwards fails, except for the final interrupt probe check
> > which does a cleanup. That means on fail before that the interrupt
> > handler stays requested up to the point where the module is
> > removed. If that's a shared interrupt and some other device is
> > active on the same line, then each interrupt from that device will
> > call into the TPM code. Something like the below is needed.
> > 
> > Also the X86 autoprobe mechanism is interesting:
> > 
> > 	if (IS_ENABLED(CONFIG_X86))
> > 		for (i = 3; i <= 15; i++)
> > 			if (!tpm_tis_probe_irq_single(chip, intmask, 0,
> > i))
> > 				return;
> > 
> > The third argument is 'flags' which is handed to request_irq(). So
> > that won't ever be able to probe a shared interrupt. But if an
> > interrupt number > 0 is handed to tpm_tis_core_init() the interrupt
> > is requested with IRQF_SHARED. Same issue when the chip has an
> > interrupt number in the register. It's also requested exclusive
> > which is pretty likely to fail on ancient x86 machines.
> 
> It is very likely none of this works any more, it has been repeatedly
> reworked over the years and just left behind out of fear someone
> needs it. I've thought it should be deleted for a while now.
> 
> I suppose the original logic was to try and probe without SHARED
> because a probe would need exclusive access to the interrupt to tell
> if the TPM was actually the source, not some other device.
> 
> It is all very old and very out of step with current thinking, IMHO.
> I skeptical that TPM interrupts were ever valuable enough to deserve
> this in the first place.

For what it's worth, I agree.  Trying to probe all 15 ISA interrupts is
last millennium thinking we should completely avoid.  If it's not
described in ACPI then you don't get an interrupt full stop.

James
Jarkko Sakkinen Dec. 8, 2020, 5:42 p.m. UTC | #4
On Mon, Dec 07, 2020 at 03:28:03PM -0400, Jason Gunthorpe wrote:
> On Sun, Dec 06, 2020 at 08:26:16PM +0100, Thomas Gleixner wrote:
> > Just as a side note. I was looking at tpm_tis_probe_irq_single() and
> > that function is leaking the interrupt request if any of the checks
> > afterwards fails, except for the final interrupt probe check which does
> > a cleanup. That means on fail before that the interrupt handler stays
> > requested up to the point where the module is removed. If that's a
> > shared interrupt and some other device is active on the same line, then
> > each interrupt from that device will call into the TPM code. Something
> > like the below is needed.
> > 
> > Also the X86 autoprobe mechanism is interesting:
> > 
> > 	if (IS_ENABLED(CONFIG_X86))
> > 		for (i = 3; i <= 15; i++)
> > 			if (!tpm_tis_probe_irq_single(chip, intmask, 0, i))
> > 				return;
> > 
> > The third argument is 'flags' which is handed to request_irq(). So that
> > won't ever be able to probe a shared interrupt. But if an interrupt
> > number > 0 is handed to tpm_tis_core_init() the interrupt is requested
> > with IRQF_SHARED. Same issue when the chip has an interrupt number in
> > the register. It's also requested exclusive which is pretty likely
> > to fail on ancient x86 machines.
> 
> It is very likely none of this works any more, it has been repeatedly
> reworked over the years and just left behind out of fear someone needs
> it. I've thought it should be deleted for a while now.
> 
> I suppose the original logic was to try and probe without SHARED
> because a probe would need exclusive access to the interrupt to tell
> if the TPM was actually the source, not some other device.
> 
> It is all very old and very out of step with current thinking, IMHO. I
> skeptical that TPM interrupts were ever valuable enough to deserve
> this in the first place.
> 
> Jason

+1 for removing it.

/Jarkko
Jarkko Sakkinen Dec. 8, 2020, 5:43 p.m. UTC | #5
On Mon, Dec 07, 2020 at 11:58:44AM -0800, James Bottomley wrote:
> On Mon, 2020-12-07 at 15:28 -0400, Jason Gunthorpe wrote:
> > On Sun, Dec 06, 2020 at 08:26:16PM +0100, Thomas Gleixner wrote:
> > > Just as a side note. I was looking at tpm_tis_probe_irq_single()
> > > and that function is leaking the interrupt request if any of the
> > > checks afterwards fails, except for the final interrupt probe check
> > > which does a cleanup. That means on fail before that the interrupt
> > > handler stays requested up to the point where the module is
> > > removed. If that's a shared interrupt and some other device is
> > > active on the same line, then each interrupt from that device will
> > > call into the TPM code. Something like the below is needed.
> > > 
> > > Also the X86 autoprobe mechanism is interesting:
> > > 
> > > 	if (IS_ENABLED(CONFIG_X86))
> > > 		for (i = 3; i <= 15; i++)
> > > 			if (!tpm_tis_probe_irq_single(chip, intmask, 0,
> > > i))
> > > 				return;
> > > 
> > > The third argument is 'flags' which is handed to request_irq(). So
> > > that won't ever be able to probe a shared interrupt. But if an
> > > interrupt number > 0 is handed to tpm_tis_core_init() the interrupt
> > > is requested with IRQF_SHARED. Same issue when the chip has an
> > > interrupt number in the register. It's also requested exclusive
> > > which is pretty likely to fail on ancient x86 machines.
> > 
> > It is very likely none of this works any more, it has been repeatedly
> > reworked over the years and just left behind out of fear someone
> > needs it. I've thought it should be deleted for a while now.
> > 
> > I suppose the original logic was to try and probe without SHARED
> > because a probe would need exclusive access to the interrupt to tell
> > if the TPM was actually the source, not some other device.
> > 
> > It is all very old and very out of step with current thinking, IMHO.
> > I skeptical that TPM interrupts were ever valuable enough to deserve
> > this in the first place.
> 
> For what it's worth, I agree.  Trying to probe all 15 ISA interrupts is
> last millennium thinking we should completely avoid.  If it's not
> described in ACPI then you don't get an interrupt full stop.
> 
> James

Maybe you could add this as part of your patches?

/Jarkko
diff mbox series

Patch

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 92c51c6cfd1b..d817ff5664d1 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -24,6 +24,8 @@ 
 #include <linux/wait.h>
 #include <linux/acpi.h>
 #include <linux/freezer.h>
+#include <linux/workqueue.h>
+#include <linux/irq.h>
 #include "tpm.h"
 #include "tpm_tis_core.h"
 
@@ -715,9 +717,23 @@  static irqreturn_t tis_int_handler(int dummy, void *dev_id)
 {
 	struct tpm_chip *chip = dev_id;
 	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
+	static bool check_storm = true;
+	static unsigned int check_start;
 	u32 interrupt;
 	int i, rc;
 
+	if (unlikely(check_storm)) {
+		if (!check_start) {
+			check_start = jiffies_to_msecs(jiffies);
+		} else if ((kstat_irqs(priv->irq) > 1000) &&
+			   (jiffies_to_msecs(jiffies) - check_start < 500)) {
+			check_storm = false;
+			schedule_work(&priv->storm_work);
+		} else if (jiffies_to_msecs(jiffies) - check_start >= 500) {
+			check_storm = false;
+		}
+	}
+
 	rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt);
 	if (rc < 0)
 		return IRQ_NONE;
@@ -943,6 +959,14 @@  static const struct tpm_class_ops tpm_tis = {
 	.clk_enable = tpm_tis_clkrun_enable,
 };
 
+static void tpm_tis_storm_work(struct work_struct *work)
+{
+	struct tpm_tis_data *priv = container_of(work, struct tpm_tis_data, storm_work);
+
+	disable_interrupts(priv->chip);
+	dev_warn(&priv->chip->dev, "Interrupt storm detected, using polling.\n");
+}
+
 int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 		      const struct tpm_tis_phy_ops *phy_ops,
 		      acpi_handle acpi_dev_handle)
@@ -959,6 +983,9 @@  int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	if (IS_ERR(chip))
 		return PTR_ERR(chip);
 
+	priv->chip = chip;
+	INIT_WORK(&priv->storm_work, tpm_tis_storm_work);
+
 #ifdef CONFIG_ACPI
 	chip->acpi_dev_handle = acpi_dev_handle;
 #endif
diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h
index 9b2d32a59f67..973297ee2e16 100644
--- a/drivers/char/tpm/tpm_tis_core.h
+++ b/drivers/char/tpm/tpm_tis_core.h
@@ -95,6 +95,8 @@  struct tpm_tis_data {
 	u16 clkrun_enabled;
 	wait_queue_head_t int_queue;
 	wait_queue_head_t read_queue;
+	struct work_struct storm_work;
+	struct tpm_chip *chip;
 	const struct tpm_tis_phy_ops *phy_ops;
 	unsigned short rng_quality;
 };