diff mbox series

[v6,06/20] irqdomain: Fix mapping-creation race

Message ID 20230213104302.17307-7-johan+linaro@kernel.org (mailing list archive)
State Handled Elsewhere
Headers show
Series irqdomain: fix mapping race and rework locking | expand

Commit Message

Johan Hovold Feb. 13, 2023, 10:42 a.m. UTC
Parallel probing of devices that share interrupts (e.g. when a driver
uses asynchronous probing) can currently result in two mappings for the
same hardware interrupt to be created due to missing serialisation.

Make sure to hold the irq_domain_mutex when creating mappings so that
looking for an existing mapping before creating a new one is done
atomically.

Fixes: 765230b5f084 ("driver-core: add asynchronous probing support for drivers")
Fixes: b62b2cf5759b ("irqdomain: Fix handling of type settings for existing mappings")
Link: https://lore.kernel.org/r/YuJXMHoT4ijUxnRb@hovoldconsulting.com
Cc: stable@vger.kernel.org      # 4.8
Cc: Dmitry Torokhov <dtor@chromium.org>
Cc: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
 kernel/irq/irqdomain.c | 64 ++++++++++++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 18 deletions(-)

Comments

Cyril Brulebois March 8, 2023, 2:41 p.m. UTC | #1
Hi Johan,

And thanks so much for this patch series.

Johan Hovold <johan+linaro@kernel.org> (2023-02-13):
> Parallel probing of devices that share interrupts (e.g. when a driver
> uses asynchronous probing) can currently result in two mappings for the
> same hardware interrupt to be created due to missing serialisation.
> 
> Make sure to hold the irq_domain_mutex when creating mappings so that
> looking for an existing mapping before creating a new one is done
> atomically.

Just for information: This patch fixes a long-standing regression
regarding Raspberry Pi devices, which have been failing to boot (at
least reliably) due to MMC timeouts for a long while; I think that
started between v5.17 and v5.19, but I couldn't bisect at the time
(I was already chasing some other regression).

Example bug report:
  https://bugs.debian.org/1019700

Before trying to pinpoint when the regression appeared, I've checked
these versions, with a Debian testing userspace as of 2023-03-07:
 - v6.1.12: affected.
 - v6.2: affected.
 - v6.3-rc1: not affected.

A bisect between v6.2 and v6.3-rc1 led me to this patch specifically.
Seeing how it's part of a patch series, and how previous patches are
preliminary ones, I've checked that cherry-picking the first 6 patches
on top of v6.1.15 indeed fixes the problem there too, and it does
(git cherry-pick v6.2-rc4..601363cc08da25747feb87c55573dd54de91d66a).


With the following systems:
 - Pi 4 B, using external storage (SD card),
 - CM4 Lite on CM4 IO Board, using external storage (SD card),
 - CM4 on CM4 IO Board, using internal storage (eMMC),

I've been able to verify that v6.1.12 (baseline in Debian testing)
triggers this MMC timeout issue, while v6.1.15 + the aforementioned
range of cherry-picked commits no longer triggers this issue.

(Methodology: cold boot then reboot 20 times, monitoring via serial
console to keep HDMI output of the equation; affected systems stop
booting after 1-4 boots; unaffected systems boot and reboot just fine
all the time.)


This looks like a critical bugfix for Raspberry Pi users.

Seeing the stable@ mention is about 4.8, I suppose this is going to be
considered for a wide range of kernels already… but I'm happy to dig
into this further to pinpoint when the regression appeared, if that's
helpful.


Cheers,
Marc Zyngier March 8, 2023, 2:53 p.m. UTC | #2
On Wed, 08 Mar 2023 14:41:05 +0000,
Cyril Brulebois <kibi@debian.org> wrote:
> 
> Hi Johan,
> 
> And thanks so much for this patch series.
> 
> Johan Hovold <johan+linaro@kernel.org> (2023-02-13):
> > Parallel probing of devices that share interrupts (e.g. when a driver
> > uses asynchronous probing) can currently result in two mappings for the
> > same hardware interrupt to be created due to missing serialisation.
> > 
> > Make sure to hold the irq_domain_mutex when creating mappings so that
> > looking for an existing mapping before creating a new one is done
> > atomically.
> 
> Just for information: This patch fixes a long-standing regression
> regarding Raspberry Pi devices, which have been failing to boot (at
> least reliably) due to MMC timeouts for a long while; I think that
> started between v5.17 and v5.19, but I couldn't bisect at the time
> (I was already chasing some other regression).
> 
> Example bug report:
>   https://bugs.debian.org/1019700
> 
> Before trying to pinpoint when the regression appeared, I've checked
> these versions, with a Debian testing userspace as of 2023-03-07:
>  - v6.1.12: affected.
>  - v6.2: affected.
>  - v6.3-rc1: not affected.
> 
> A bisect between v6.2 and v6.3-rc1 led me to this patch specifically.
> Seeing how it's part of a patch series, and how previous patches are
> preliminary ones, I've checked that cherry-picking the first 6 patches
> on top of v6.1.15 indeed fixes the problem there too, and it does
> (git cherry-pick v6.2-rc4..601363cc08da25747feb87c55573dd54de91d66a).
> 
> 
> With the following systems:
>  - Pi 4 B, using external storage (SD card),
>  - CM4 Lite on CM4 IO Board, using external storage (SD card),
>  - CM4 on CM4 IO Board, using internal storage (eMMC),
> 
> I've been able to verify that v6.1.12 (baseline in Debian testing)
> triggers this MMC timeout issue, while v6.1.15 + the aforementioned
> range of cherry-picked commits no longer triggers this issue.
> 
> (Methodology: cold boot then reboot 20 times, monitoring via serial
> console to keep HDMI output of the equation; affected systems stop
> booting after 1-4 boots; unaffected systems boot and reboot just fine
> all the time.)
> 
> 
> This looks like a critical bugfix for Raspberry Pi users.
> 
> Seeing the stable@ mention is about 4.8, I suppose this is going to be
> considered for a wide range of kernels already… but I'm happy to dig
> into this further to pinpoint when the regression appeared, if that's
> helpful.

If you have an interest in these patches being backported, may I
suggest you look at the backporting failures that have been
reported[1]?

Note that now that 4.9 is out of the picture, nothing is going to be
backported past 4.14.

Thanks,

	M.

[1] https://lore.kernel.org/r/167812853717924@kroah.com
Johan Hovold March 9, 2023, 7:32 a.m. UTC | #3
On Wed, Mar 08, 2023 at 03:41:05PM +0100, Cyril Brulebois wrote:
> Hi Johan,
> 
> And thanks so much for this patch series.
> 
> Johan Hovold <johan+linaro@kernel.org> (2023-02-13):
> > Parallel probing of devices that share interrupts (e.g. when a driver
> > uses asynchronous probing) can currently result in two mappings for the
> > same hardware interrupt to be created due to missing serialisation.
> > 
> > Make sure to hold the irq_domain_mutex when creating mappings so that
> > looking for an existing mapping before creating a new one is done
> > atomically.
> 
> Just for information: This patch fixes a long-standing regression
> regarding Raspberry Pi devices, which have been failing to boot (at
> least reliably) due to MMC timeouts for a long while; I think that
> started between v5.17 and v5.19, but I couldn't bisect at the time
> (I was already chasing some other regression).
> 
> Example bug report:
>   https://bugs.debian.org/1019700
> 
> Before trying to pinpoint when the regression appeared, I've checked
> these versions, with a Debian testing userspace as of 2023-03-07:
>  - v6.1.12: affected.
>  - v6.2: affected.
>  - v6.3-rc1: not affected.
> 
> A bisect between v6.2 and v6.3-rc1 led me to this patch specifically.
> Seeing how it's part of a patch series, and how previous patches are
> preliminary ones, I've checked that cherry-picking the first 6 patches
> on top of v6.1.15 indeed fixes the problem there too, and it does
> (git cherry-pick v6.2-rc4..601363cc08da25747feb87c55573dd54de91d66a).

That's good to hear, thanks for reporting.

> With the following systems:
>  - Pi 4 B, using external storage (SD card),
>  - CM4 Lite on CM4 IO Board, using external storage (SD card),
>  - CM4 on CM4 IO Board, using internal storage (eMMC),
> 
> I've been able to verify that v6.1.12 (baseline in Debian testing)
> triggers this MMC timeout issue, while v6.1.15 + the aforementioned
> range of cherry-picked commits no longer triggers this issue.
> 
> (Methodology: cold boot then reboot 20 times, monitoring via serial
> console to keep HDMI output of the equation; affected systems stop
> booting after 1-4 boots; unaffected systems boot and reboot just fine
> all the time.)
> 
> 
> This looks like a critical bugfix for Raspberry Pi users.
> 
> Seeing the stable@ mention is about 4.8, I suppose this is going to be
> considered for a wide range of kernels already… but I'm happy to dig
> into this further to pinpoint when the regression appeared, if that's
> helpful.

I took a quick look at the rpi-4 devicetree and it looks like the emmc
indeed is sharing an irq line with another device which should make is
susceptible to the mapping race.

The corresponding drivers also use asynchronous probing since commits

	7320915c8861 ("mmc: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in v4.14")
	21b2cec61c04 ("mmc: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in v4.4"

that went into 5.10. So the issue should have been there for longer but
perhaps only manifested itself around 5.17 due to changes in timing.

Johan
diff mbox series

Patch

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 7b57949bc79c..bfda4adc05c0 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -25,6 +25,9 @@  static DEFINE_MUTEX(irq_domain_mutex);
 
 static struct irq_domain *irq_default_domain;
 
+static int irq_domain_alloc_irqs_locked(struct irq_domain *domain, int irq_base,
+					unsigned int nr_irqs, int node, void *arg,
+					bool realloc, const struct irq_affinity_desc *affinity);
 static void irq_domain_check_hierarchy(struct irq_domain *domain);
 
 struct irqchip_fwid {
@@ -682,9 +685,9 @@  unsigned int irq_create_direct_mapping(struct irq_domain *domain)
 EXPORT_SYMBOL_GPL(irq_create_direct_mapping);
 #endif
 
-static unsigned int __irq_create_mapping_affinity(struct irq_domain *domain,
-						  irq_hw_number_t hwirq,
-						  const struct irq_affinity_desc *affinity)
+static unsigned int irq_create_mapping_affinity_locked(struct irq_domain *domain,
+						       irq_hw_number_t hwirq,
+						       const struct irq_affinity_desc *affinity)
 {
 	struct device_node *of_node = irq_domain_get_of_node(domain);
 	int virq;
@@ -699,7 +702,7 @@  static unsigned int __irq_create_mapping_affinity(struct irq_domain *domain,
 		return 0;
 	}
 
-	if (irq_domain_associate(domain, virq, hwirq)) {
+	if (irq_domain_associate_locked(domain, virq, hwirq)) {
 		irq_free_desc(virq);
 		return 0;
 	}
@@ -735,14 +738,20 @@  unsigned int irq_create_mapping_affinity(struct irq_domain *domain,
 		return 0;
 	}
 
+	mutex_lock(&irq_domain_mutex);
+
 	/* Check if mapping already exists */
 	virq = irq_find_mapping(domain, hwirq);
 	if (virq) {
 		pr_debug("existing mapping on virq %d\n", virq);
-		return virq;
+		goto out;
 	}
 
-	return __irq_create_mapping_affinity(domain, hwirq, affinity);
+	virq = irq_create_mapping_affinity_locked(domain, hwirq, affinity);
+out:
+	mutex_unlock(&irq_domain_mutex);
+
+	return virq;
 }
 EXPORT_SYMBOL_GPL(irq_create_mapping_affinity);
 
@@ -809,6 +818,8 @@  unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec)
 	if (WARN_ON(type & ~IRQ_TYPE_SENSE_MASK))
 		type &= IRQ_TYPE_SENSE_MASK;
 
+	mutex_lock(&irq_domain_mutex);
+
 	/*
 	 * If we've already configured this interrupt,
 	 * don't do it again, or hell will break loose.
@@ -821,7 +832,7 @@  unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec)
 		 * interrupt number.
 		 */
 		if (type == IRQ_TYPE_NONE || type == irq_get_trigger_type(virq))
-			return virq;
+			goto out;
 
 		/*
 		 * If the trigger type has not been set yet, then set
@@ -829,35 +840,45 @@  unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec)
 		 */
 		if (irq_get_trigger_type(virq) == IRQ_TYPE_NONE) {
 			irq_data = irq_get_irq_data(virq);
-			if (!irq_data)
-				return 0;
+			if (!irq_data) {
+				virq = 0;
+				goto out;
+			}
 
 			irqd_set_trigger_type(irq_data, type);
-			return virq;
+			goto out;
 		}
 
 		pr_warn("type mismatch, failed to map hwirq-%lu for %s!\n",
 			hwirq, of_node_full_name(to_of_node(fwspec->fwnode)));
-		return 0;
+		virq = 0;
+		goto out;
 	}
 
 	if (irq_domain_is_hierarchy(domain)) {
-		virq = irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, fwspec);
-		if (virq <= 0)
-			return 0;
+		virq = irq_domain_alloc_irqs_locked(domain, -1, 1, NUMA_NO_NODE,
+						    fwspec, false, NULL);
+		if (virq <= 0) {
+			virq = 0;
+			goto out;
+		}
 	} else {
 		/* Create mapping */
-		virq = __irq_create_mapping_affinity(domain, hwirq, NULL);
+		virq = irq_create_mapping_affinity_locked(domain, hwirq, NULL);
 		if (!virq)
-			return virq;
+			goto out;
 	}
 
 	irq_data = irq_get_irq_data(virq);
-	if (WARN_ON(!irq_data))
-		return 0;
+	if (WARN_ON(!irq_data)) {
+		virq = 0;
+		goto out;
+	}
 
 	/* Store trigger type */
 	irqd_set_trigger_type(irq_data, type);
+out:
+	mutex_unlock(&irq_domain_mutex);
 
 	return virq;
 }
@@ -1888,6 +1909,13 @@  void irq_domain_set_info(struct irq_domain *domain, unsigned int virq,
 	irq_set_handler_data(virq, handler_data);
 }
 
+static int irq_domain_alloc_irqs_locked(struct irq_domain *domain, int irq_base,
+					unsigned int nr_irqs, int node, void *arg,
+					bool realloc, const struct irq_affinity_desc *affinity)
+{
+	return -EINVAL;
+}
+
 static void irq_domain_check_hierarchy(struct irq_domain *domain)
 {
 }