[v8,11/16] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs

Message ID 20240426135126.12802-12-Jonathan.Cameron@huawei.com (mailing list archive)
State Superseded, archived
Series ACPI/arm64: add support for virtual cpu hotplug

Commit Message

Jonathan Cameron April 26, 2024, 1:51 p.m. UTC
From: James Morse <james.morse@arm.com>

To support virtual CPU hotplug, ACPI has added an 'online capable' bit
to the MADT GICC entries. This indicates a disabled CPU entry may not
be possible to online via PSCI until firmware has set the enabled bit in
_STA.

This means that a "usable" GIC is one that is marked as either enabled,
or online capable. Therefore, change acpi_gicc_is_usable() to check both
bits. However, we need to change the test in gic_acpi_match_gicc() back
to testing just the enabled bit so the count of enabled redistributors is
correct.

What about the redistributor in the GICC entry? ACPI doesn't want to say.
Assume the worst: When a redistributor is described in the GICC entry,
but the entry is marked as disabled at boot, assume the redistributor
is inaccessible.

The GICv3 driver doesn't support late online of redistributors, so this
means the corresponding CPU can't be brought online either.
Rather than modifying cpu masks that may already have been used,
register a new cpuhp callback to fail this case. This must run earlier
than the main gic_starting_cpu() so that the case can be rejected
before the section of cpuhp that runs on the incoming CPU, because that
section is not allowed to fail. This solution keeps the handling of this
broken firmware corner case local to the GIC driver. As the precise
ordering of this callback doesn't need to be controlled, as long as it
is in that initial prepare phase, use CPUHP_BP_PREPARE_DYN.

Systems that want CPU hotplug in a VM can ensure their redistributors
are always-on, and describe them that way with a GICR entry in the MADT.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
Thanks to Marc for review and suggestions!
v8: Change the handling of broken rdists to fail cpuhp rather than
    modifying the cpu_present and cpu_possible masks.
    Updated commit text to reflect that.
    Added a Suggested-by tag for Marc given this is more or less what he put
    in his review comment.
---
 drivers/irqchip/irq-gic-v3.c | 38 ++++++++++++++++++++++++++++++++++--
 include/linux/acpi.h         |  3 ++-
 2 files changed, 38 insertions(+), 3 deletions(-)

Comments

Marc Zyngier April 26, 2024, 4:26 p.m. UTC | #1
On Fri, 26 Apr 2024 14:51:21 +0100,
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> From: James Morse <james.morse@arm.com>
> 
> To support virtual CPU hotplug, ACPI has added an 'online capable' bit
> to the MADT GICC entries. This indicates a disabled CPU entry may not
> be possible to online via PSCI until firmware has set enabled bit in
> _STA.
> 
> This means that a "usable" GIC is one that is marked as either enabled,

nit: "GIC" usually designates the whole HW infrastructure (distributor,
redistributors, and ITSs). My understanding is that you are only
referring to the redistributors.

> or online capable. Therefore, change acpi_gicc_is_usable() to check both
> bits. However, we need to change the test in gic_acpi_match_gicc() back
> to testing just the enabled bit so the count of enabled distributors is
> correct.
> 
> What about the redistributor in the GICC entry? ACPI doesn't want to say.
> Assume the worst: When a redistributor is described in the GICC entry,
> but the entry is marked as disabled at boot, assume the redistributor
> is inaccessible.
> 
> The GICv3 driver doesn't support late online of redistributors, so this
> means the corresponding CPU can't be brought online either.
> Rather than modifying cpu masks that may already have been used,
> register a new cpuhp callback to fail this case. This must run earlier
> than the main gic_starting_cpu() so that this case can be rejected
> before the section of cpuhp that runs on the CPU that is coming up as
> that is not allowed to fail. This solution keeps the handling of this
> broken firmware corner case local to the GIC driver. As precise ordering
> of this callback doesn't need to be controlled as long as it is
> in that initial prepare phase, use CPUHP_BP_PREPARE_DYN.
> 
> Systems that want CPU hotplug in a VM can ensure their redistributors
> are always-on, and describe them that way with a GICR entry in the MADT.
> 
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Tested-by: Miguel Luis <miguel.luis@oracle.com>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> ---
> Thanks to Marc for review and suggestions!
> v8: Change the handling of broken rdists to fail cpuhp rather than
>     modifying the cpu_present and cpu_possible masks.
>     Updated commit text to reflect that.
>     Added a sb tag for Marc given this is more or less what he put
>     in his review comment.
> ---
>  drivers/irqchip/irq-gic-v3.c | 38 ++++++++++++++++++++++++++++++++++--
>  include/linux/acpi.h         |  3 ++-
>  2 files changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 10af15f93d4d..b4685991953e 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -44,6 +44,8 @@
>  
>  #define GIC_IRQ_TYPE_PARTITION	(GIC_IRQ_TYPE_LPI + 1)
>  
> +static struct cpumask broken_rdists __read_mostly;
> +
>  struct redist_region {
>  	void __iomem		*redist_base;
>  	phys_addr_t		phys_base;
> @@ -1293,6 +1295,18 @@ static void gic_cpu_init(void)
>  #define MPIDR_TO_SGI_RS(mpidr)	(MPIDR_RS(mpidr) << ICC_SGI1R_RS_SHIFT)
>  #define MPIDR_TO_SGI_CLUSTER_ID(mpidr)	((mpidr) & ~0xFUL)
>  
> +/*
> + * gic_starting_cpu() is called after the last point where cpuhp is allowed
> + * to fail. So pre check for problems earlier.
> + */
> +static int gic_check_rdist(unsigned int cpu)
> +{
> +	if (cpumask_test_cpu(cpu, &broken_rdists))
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
>  static int gic_starting_cpu(unsigned int cpu)
>  {
>  	gic_cpu_init();
> @@ -1384,6 +1398,10 @@ static void __init gic_smp_init(void)
>  	};
>  	int base_sgi;
>  
> +	cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN,
> +				  "irqchip/arm/gicv3:checkrdist",
> +				  gic_check_rdist, NULL);
> +
>  	cpuhp_setup_state_nocalls(CPUHP_AP_IRQ_GIC_STARTING,
>  				  "irqchip/arm/gicv3:starting",
>  				  gic_starting_cpu, NULL);
> @@ -2363,11 +2381,24 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
>  				(struct acpi_madt_generic_interrupt *)header;
>  	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
>  	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> +	int cpu = get_cpu_for_acpi_id(gicc->uid);
>  	void __iomem *redist_base;
>  
>  	if (!acpi_gicc_is_usable(gicc))
>  		return 0;
>  
> +	/*
> +	 * Capable but disabled CPUs can be brought online later. What about
> +	 * the redistributor? ACPI doesn't want to say!
> +	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> +	 * Otherwise, prevent such CPUs from being brought online.
> +	 */
> +	if (!(gicc->flags & ACPI_MADT_ENABLED)) {

Now this makes the above acpi_gicc_is_usable() very odd. It checks for
MADT_ENABLED *or* GICC_ONLINE_CAPABLE. But we definitely don't want to
deal with the lack of MADT_ENABLED.

So why don't we explicitly check for individual flags and get rid of
acpi_gicc_is_usable(), as its new definition doesn't tell you anything
useful?
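
For illustration only (this is not part of the patch): a userspace sketch of
what checking the individual flags amounts to. All names are hypothetical, and
the bit positions are stand-ins assumed for this sketch; the kernel takes the
real ACPI_MADT_ENABLED and ACPI_MADT_GICC_ONLINE_CAPABLE values from its ACPI
headers.

```c
#include <stdint.h>

/* Stand-in flag values, assumed for this sketch only. */
#define SKETCH_MADT_ENABLED        (1u << 0)
#define SKETCH_GICC_ONLINE_CAPABLE (1u << 3)

enum sketch_gicc_state {
	SKETCH_GICC_ABSENT,       /* neither bit set: skip the entry */
	SKETCH_GICC_ENABLED,      /* enabled at boot: redistributor assumed accessible */
	SKETCH_GICC_CAPABLE_ONLY  /* disabled but online capable: redistributor assumed inaccessible */
};

/* Classify a GICC entry by testing each flag explicitly. */
static enum sketch_gicc_state sketch_classify_gicc(uint32_t flags)
{
	if (flags & SKETCH_MADT_ENABLED)
		return SKETCH_GICC_ENABLED;
	if (flags & SKETCH_GICC_ONLINE_CAPABLE)
		return SKETCH_GICC_CAPABLE_ONLY;
	return SKETCH_GICC_ABSENT;
}
```

Only the middle case needs the "can't be brought online" handling; a single
"usable" predicate covering both bits cannot tell it apart from the enabled case.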

> +		pr_warn_once("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
> +		cpumask_set_cpu(cpu, &broken_rdists);

Given that get_cpu_for_acpi_id() can return -EINVAL, you'd want to
check that. Also, I'd like to drop the _once on the warning.
Indicating all the broken CPUs is useful information, and only happens
once per boot.

> +		return 0;
> +	}
> +
>  	redist_base = ioremap(gicc->gicr_base_address, size);
>  	if (!redist_base)
>  		return -ENOMEM;
> @@ -2413,9 +2444,12 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
>  
>  	/*
>  	 * If GICC is enabled and has valid gicr base address, then it means
> -	 * GICR base is presented via GICC
> +	 * GICR base is presented via GICC. The redistributor is only known to
> +	 * be accessible if the GICC is marked as enabled. If this bit is not
> +	 * set, we'd need to add the redistributor at runtime, which isn't
> +	 * supported.
>  	 */
> -	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address)
> +	if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
>  		acpi_data.enabled_rdists++;
>  
>  	return 0;
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 9844a3f9c4e5..fcfb7bb6789e 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -239,7 +239,8 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
>  
>  static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
>  {
> -	return gicc->flags & ACPI_MADT_ENABLED;
> +	return gicc->flags & (ACPI_MADT_ENABLED |
> +			      ACPI_MADT_GICC_ONLINE_CAPABLE);
>  }
>  
>  /* the following numa functions are architecture-dependent */

Thanks,

	M.
Jonathan Cameron April 26, 2024, 6:28 p.m. UTC | #2
> > @@ -2363,11 +2381,24 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
> >  				(struct acpi_madt_generic_interrupt *)header;
> >  	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
> >  	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> > +	int cpu = get_cpu_for_acpi_id(gicc->uid);
> >  	void __iomem *redist_base;
> >  
> >  	if (!acpi_gicc_is_usable(gicc))
> >  		return 0;
> >  
> > +	/*
> > +	 * Capable but disabled CPUs can be brought online later. What about
> > +	 * the redistributor? ACPI doesn't want to say!
> > +	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> > +	 * Otherwise, prevent such CPUs from being brought online.
> > +	 */
> > +	if (!(gicc->flags & ACPI_MADT_ENABLED)) {  
> 
> Now this makes the above acpi_gicc_is_usable() very odd. It checks for
> MADT_ENABLED *or* GICC_ONLINE_CAPABLE. But we definitely don't want to
> deal with the lack of MADT_ENABLED.
> 
> So why don't we explicitly check for individual flags and get rid of
> acpi_gicc_is_usable(), as its new definition doesn't tell you anything
> useful?

That does seem to have evolved to something rather odd.

I messed around with various reorganizations of the boolean logic
and ended up with the same two conditions as here, because otherwise
the indent gets deep and the code becomes fiddlier to reason about
(see below for the result).

> 
> > +		return 0;
> > +	}
> > +
> >  	redist_base = ioremap(gicc->gicr_base_address, size);
> >  	if (!redist_base)
> >  		return -ENOMEM;
> > @@ -2413,9 +2444,12 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
> >  
> >  	/*
> >  	 * If GICC is enabled and has valid gicr base address, then it means
> > -	 * GICR base is presented via GICC
> > +	 * GICR base is presented via GICC. The redistributor is only known to
> > +	 * be accessible if the GICC is marked as enabled. If this bit is not
> > +	 * set, we'd need to add the redistributor at runtime, which isn't
> > +	 * supported.
> >  	 */
> > -	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address)
> > +	if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
> >  		acpi_data.enabled_rdists++;
> >  
> >  	return 0;
> > diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> > index 9844a3f9c4e5..fcfb7bb6789e 100644
> > --- a/include/linux/acpi.h
> > +++ b/include/linux/acpi.h
> > @@ -239,7 +239,8 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
> >  
> >  static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
> >  {
> > -	return gicc->flags & ACPI_MADT_ENABLED;
> > +	return gicc->flags & (ACPI_MADT_ENABLED |
> > +			      ACPI_MADT_GICC_ONLINE_CAPABLE);
> >  }
> >  
> >  /* the following numa functions are architecture-dependent */  
> 
> Thanks,

I'll not send a formal v9 until early next week, so here is the current state
if you have time to take another look before then.

From a8a54cfbadccf1782b7cc04b93eb875dedbee7a9 Mon Sep 17 00:00:00 2001
From: James Morse <james.morse@arm.com>
Date: Thu, 18 Apr 2024 14:54:07 +0100
Subject: [PATCH] irqchip/gic-v3: Add support for ACPI's disabled but 'online
 capable' CPUs

To support virtual CPU hotplug, ACPI has added an 'online capable' bit
to the MADT GICC entries. This indicates a disabled CPU entry may not
be possible to online via PSCI until firmware has set the enabled bit in
_STA.

This means that a "usable" GIC redistributor is one that is marked as
either enabled, or online capable. The meaning of the
acpi_gicc_is_usable() would become less clear than just checking the
pair of flags at call sites. As such, drop that helper function.
The test in gic_acpi_match_gicc() remains as testing just the
enabled bit so the count of enabled redistributors is correct.

What about the redistributor in the GICC entry? ACPI doesn't want to say.
Assume the worst: When a redistributor is described in the GICC entry,
but the entry is marked as disabled at boot, assume the redistributor
is inaccessible.

The GICv3 driver doesn't support late online of redistributors, so this
means the corresponding CPU can't be brought online either.
Rather than modifying cpu masks that may already have been used,
register a new cpuhp callback to fail this case. This must run earlier
than the main gic_starting_cpu() so that this case can be rejected
before the section of cpuhp that runs on the CPU that is coming up as
that is not allowed to fail. This solution keeps the handling of this
broken firmware corner case local to the GIC driver. As precise ordering
of this callback doesn't need to be controlled as long as it is
in that initial prepare phase, use CPUHP_BP_PREPARE_DYN.

Systems that want CPU hotplug in a VM can ensure their redistributors
are always-on, and describe them that way with a GICR entry in the MADT.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
v9: Thanks to Marc for quick follow up.
Fix up the description and drop the acpi_gicc_is_usable() helper, given that
it no longer actually means the entries are usable.

Thanks to Marc for review and suggestions!
v8: Change the handling of broken rdists to fail cpuhp rather than
    modifying the cpu_present and cpu_possible masks.
    Updated commit text to reflect that.
    Added a Suggested-by tag for Marc given this is more or less what he put
    in his review comment.
---
 arch/arm64/kernel/smp.c       |  3 ++-
 drivers/acpi/processor_core.c |  3 ++-
 drivers/irqchip/irq-gic-v3.c  | 44 +++++++++++++++++++++++++++++++----
 include/linux/acpi.h          |  5 ----
 4 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 4ced34f62dab..afe835c1cbe2 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -523,7 +523,8 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
 {
 	u64 hwid = processor->arm_mpidr;
 
-	if (!acpi_gicc_is_usable(processor)) {
+	if (!(processor->flags &
+	      (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE))) {
 		pr_debug("skipping disabled CPU entry with 0x%llx MPIDR\n", hwid);
 		return;
 	}
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index b203cfe28550..b04b684f3190 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -90,7 +90,8 @@ static int map_gicc_mpidr(struct acpi_subtable_header *entry,
 	struct acpi_madt_generic_interrupt *gicc =
 	    container_of(entry, struct acpi_madt_generic_interrupt, header);
 
-	if (!acpi_gicc_is_usable(gicc))
+	if (!(gicc->flags &
+	      (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
 		return -ENODEV;
 
 	/* device_declaration means Device object in DSDT, in the
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 10af15f93d4d..45272316d155 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -44,6 +44,8 @@
 
 #define GIC_IRQ_TYPE_PARTITION	(GIC_IRQ_TYPE_LPI + 1)
 
+static struct cpumask broken_rdists __read_mostly;
+
 struct redist_region {
 	void __iomem		*redist_base;
 	phys_addr_t		phys_base;
@@ -1293,6 +1295,18 @@ static void gic_cpu_init(void)
 #define MPIDR_TO_SGI_RS(mpidr)	(MPIDR_RS(mpidr) << ICC_SGI1R_RS_SHIFT)
 #define MPIDR_TO_SGI_CLUSTER_ID(mpidr)	((mpidr) & ~0xFUL)
 
+/*
+ * gic_starting_cpu() is called after the last point where cpuhp is allowed
+ * to fail. So pre check for problems earlier.
+ */
+static int gic_check_rdist(unsigned int cpu)
+{
+	if (cpumask_test_cpu(cpu, &broken_rdists))
+		return -EINVAL;
+
+	return 0;
+}
+
 static int gic_starting_cpu(unsigned int cpu)
 {
 	gic_cpu_init();
@@ -1384,6 +1398,10 @@ static void __init gic_smp_init(void)
 	};
 	int base_sgi;
 
+	cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN,
+				  "irqchip/arm/gicv3:checkrdist",
+				  gic_check_rdist, NULL);
+
 	cpuhp_setup_state_nocalls(CPUHP_AP_IRQ_GIC_STARTING,
 				  "irqchip/arm/gicv3:starting",
 				  gic_starting_cpu, NULL);
@@ -2363,11 +2381,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
 				(struct acpi_madt_generic_interrupt *)header;
 	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
 	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
+	int cpu = get_cpu_for_acpi_id(gicc->uid);
 	void __iomem *redist_base;
 
-	if (!acpi_gicc_is_usable(gicc))
+	/* Neither enabled or online capable means it doesn't exist, skip it */
+	if (!(gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
 		return 0;
 
+	/*
+	 * Capable but disabled CPUs can be brought online later. What about
+	 * the redistributor? ACPI doesn't want to say!
+	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
+	 * Otherwise, prevent such CPUs from being brought online.
+	 */
+	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
+		pr_warn("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
+		cpumask_set_cpu(cpu, &broken_rdists);
+		return 0;
+	}
+
 	redist_base = ioremap(gicc->gicr_base_address, size);
 	if (!redist_base)
 		return -ENOMEM;
@@ -2413,9 +2445,12 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
 
 	/*
 	 * If GICC is enabled and has valid gicr base address, then it means
-	 * GICR base is presented via GICC
+	 * GICR base is presented via GICC. The redistributor is only known to
+	 * be accessible if the GICC is marked as enabled. If this bit is not
+	 * set, we'd need to add the redistributor at runtime, which isn't
+	 * supported.
 	 */
-	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address)
+	if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
 		acpi_data.enabled_rdists++;
 
 	return 0;
@@ -2474,7 +2509,8 @@ static int __init gic_acpi_parse_virt_madt_gicc(union acpi_subtable_headers *hea
 	int maint_irq_mode;
 	static int first_madt = true;
 
-	if (!acpi_gicc_is_usable(gicc))
+	if (!(gicc->flags &
+	      (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
 		return 0;
 
 	maint_irq_mode = (gicc->flags & ACPI_MADT_VGIC_IRQ_MODE) ?
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 9844a3f9c4e5..cf5d2a6950ec 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -237,11 +237,6 @@ acpi_table_parse_cedt(enum acpi_cedt_type id,
 int acpi_parse_mcfg (struct acpi_table_header *header);
 void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
 
-static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
-{
-	return gicc->flags & ACPI_MADT_ENABLED;
-}
-
 /* the following numa functions are architecture-dependent */
 void acpi_numa_slit_init (struct acpi_table_slit *slit);
Marc Zyngier April 28, 2024, 11:28 a.m. UTC | #3
On Fri, 26 Apr 2024 19:28:58 +0100,
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 
> 
> I'll not send a formal v9 until early next week, so here is the current state
> if you have time to take another look before then.

Don't bother resending this on my account -- you only sent it on
Friday and there hasn't been much response to it yet. There is still a
problem (see below), but looks otherwise OK.

[...]

> @@ -2363,11 +2381,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
>  				(struct acpi_madt_generic_interrupt *)header;
>  	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
>  	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> +	int cpu = get_cpu_for_acpi_id(gicc->uid);

I already commented that get_cpu_for_acpi_id() can...

>  	void __iomem *redist_base;
>  
> -	if (!acpi_gicc_is_usable(gicc))
> +	/* Neither enabled or online capable means it doesn't exist, skip it */
> +	if (!(gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
>  		return 0;
>  
> +	/*
> +	 * Capable but disabled CPUs can be brought online later. What about
> +	 * the redistributor? ACPI doesn't want to say!
> +	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> +	 * Otherwise, prevent such CPUs from being brought online.
> +	 */
> +	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
> +		pr_warn("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
> +		cpumask_set_cpu(cpu, &broken_rdists);

... return -EINVAL, and then be passed to cpumask_set_cpu(), with
interesting effects. It shouldn't happen, but I trust anything that
comes from firmware tables as much as I trust a campaigning
politician's promises. This should really result in the RD being
considered unusable, but without affecting any CPU (there is no valid
CPU in the first place).
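
For illustration only: cpumask_set_cpu() takes its CPU argument as an unsigned
int, so a negative error code turns into a very large index. The userspace
sketch below shows just that conversion; the helper name and the SKETCH_EINVAL
constant are hypothetical stand-ins, not kernel code.

```c
#include <limits.h>

#define SKETCH_EINVAL 22 /* stand-in for the errno value */

/* Model what happens when an int lookup result is used as an unsigned CPU index. */
static unsigned int sketch_as_cpu_index(int cpu)
{
	return (unsigned int)cpu;
}
```

Passing -SKETCH_EINVAL through this helper yields UINT_MAX - 21, which is far
beyond any valid CPU number.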

Another question is what get_cpu_for_acpi_id() returns for a disabled
CPU. A valid CPU number? Or -EINVAL?

Thanks,

	M.
Jonathan Cameron April 29, 2024, 9:21 a.m. UTC | #4
On Sun, 28 Apr 2024 12:28:03 +0100
Marc Zyngier <maz@kernel.org> wrote:

> On Fri, 26 Apr 2024 19:28:58 +0100,
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> > 
> > 
> > I'll not send a formal v9 until early next week, so here is the current state
> > if you have time to take another look before then.  
> 
> Don't bother resending this on my account -- you only sent it on
> Friday and there hasn't been much response to it yet. There is still a
> problem (see below), but looks otherwise OK.
> 
> [...]
> 
> > @@ -2363,11 +2381,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
> >  				(struct acpi_madt_generic_interrupt *)header;
> >  	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
> >  	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> > +	int cpu = get_cpu_for_acpi_id(gicc->uid);  
> 
> I already commented that get_cpu_for_acpi_id() can...

Indeed sorry - I blame Friday syndrome for me failing to address that.

> 
> >  	void __iomem *redist_base;
> >  
> > -	if (!acpi_gicc_is_usable(gicc))
> > +	/* Neither enabled or online capable means it doesn't exist, skip it */
> > +	if (!(gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
> >  		return 0;
> >  
> > +	/*
> > +	 * Capable but disabled CPUs can be brought online later. What about
> > +	 * the redistributor? ACPI doesn't want to say!
> > +	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> > +	 * Otherwise, prevent such CPUs from being brought online.
> > +	 */
> > +	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
> > +		pr_warn("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
> > +		cpumask_set_cpu(cpu, &broken_rdists);  
> 
> ... return -EINVAL, and then be passed to cpumask_set_cpu(), with
> interesting effects. It shouldn't happen, but I trust anything that
> comes from firmware tables as much as I trust a campaigning
> politician's promises. This should really result in the RD being
> considered unusable, but without affecting any CPU (there is no valid
> CPU the first place).
> 
> Another question is what get_cpu_for acpi_id() returns for a disabled
> CPU. A valid CPU number? Or -EINVAL?
It's a match function that works by iterating over the CPU numbers below
nr_cpu_ids and testing

if (uid == get_acpi_id_for_cpu(cpu))

So the question becomes whether get_acpi_id_for_cpu() returns a valid
ACPI UID for a disabled CPU.

That uses acpi_cpu_get_madt_gicc(cpu)->uid so this all gets a bit circular.
That looks it up via cpu_madt_gicc[cpu] which after the proposed updated
patch is set if enabled or online capable.  There are however a few other
error checks in acpi_map_gic_cpu_interface() that could lead to it
not being set (MPIDR validity checks). I suspect all of these end up being
fatal elsewhere which is why this hasn't blown up before.

If any of those cases are possible we could get a null pointer
dereference.

Easy to harden this case via the following (which will leave us with
-EINVAL).  There are other call sites that might trip over this.
I'm inclined to harden them as a separate issue though so as not
to get in the way of this patch set.


diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index bc9a6656fc0c..a407f9cd549e 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -124,7 +124,8 @@ static inline int get_cpu_for_acpi_id(u32 uid)
        int cpu;

        for (cpu = 0; cpu < nr_cpu_ids; cpu++)
-               if (uid == get_acpi_id_for_cpu(cpu))
+               if (acpi_cpu_get_madt_gicc(cpu) &&
+                   uid == get_acpi_id_for_cpu(cpu))
                        return cpu;

        return -EINVAL;

I'll spin an additional patch to make that change after testing I haven't
messed it up.
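
For illustration only, a userspace model of the hardened lookup. The table,
struct, and constants are hypothetical stand-ins for cpu_madt_gicc[], the GICC
entry, nr_cpu_ids and EINVAL; the point is only that the slot is checked before
the UID is read through it.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4
#define SKETCH_EINVAL  22 /* stand-in for the errno value */

struct sketch_gicc {
	uint32_t uid;
};

/* Per-CPU table; a NULL slot models a CPU whose GICC entry was never recorded. */
static const struct sketch_gicc *sketch_cpu_madt_gicc[SKETCH_NR_CPUS];

static int sketch_get_cpu_for_acpi_id(uint32_t uid)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		const struct sketch_gicc *gicc = sketch_cpu_madt_gicc[cpu];

		/* Check the slot before dereferencing it to compare UIDs. */
		if (gicc && gicc->uid == uid)
			return cpu;
	}

	return -SKETCH_EINVAL;
}
```

With empty slots in the table, an unknown UID falls through to -SKETCH_EINVAL
instead of dereferencing a NULL pointer.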

At the call site in gic_acpi_parse_madt_gicc() I'm not sure we can do better
than just skipping setting broken_rdists. I'll also pull the declaration of
that cpu variable down into this condition so it's more obvious we only
care about it in this error path.
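
For illustration only, a userspace sketch of that call-site behaviour: record
the CPU as having a broken redistributor only when the lookup produced a valid
CPU number, and have the prepare-stage check reject exactly those CPUs. The
64-bit mask, the names, and the constants are hypothetical stand-ins for
broken_rdists, gic_check_rdist(), nr_cpu_ids and EINVAL.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 64
#define SKETCH_EINVAL  22 /* stand-in for the errno value */

/* One bit per CPU whose redistributor is assumed inaccessible. */
static uint64_t sketch_broken_rdists;

/* Returns true if a CPU was recorded, false if the lookup result was unusable. */
static bool sketch_mark_broken_rdist(int cpu)
{
	if (cpu < 0 || cpu >= SKETCH_NR_CPUS)
		return false; /* no valid CPU: leave the mask untouched */

	sketch_broken_rdists |= UINT64_C(1) << cpu;
	return true;
}

/* Models the early hotplug check: fail for CPUs marked as broken. */
static int sketch_check_rdist(unsigned int cpu)
{
	if (cpu < SKETCH_NR_CPUS && (sketch_broken_rdists & (UINT64_C(1) << cpu)))
		return -SKETCH_EINVAL;

	return 0;
}
```

A failed lookup then affects no CPU at all, while a successfully identified CPU
is refused when something later tries to bring it online.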

Jonathan





> 
> Thanks,
> 
> 	M.
>
Jonathan Cameron April 30, 2024, 12:15 p.m. UTC | #5
On Mon, 29 Apr 2024 10:21:31 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Sun, 28 Apr 2024 12:28:03 +0100
> Marc Zyngier <maz@kernel.org> wrote:
> 
> > On Fri, 26 Apr 2024 19:28:58 +0100,
> > Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:  
> > > 
> > > 
> > > I'll not send a formal v9 until early next week, so here is the current state
> > > if you have time to take another look before then.    
> > 
> > Don't bother resending this on my account -- you only sent it on
> > Friday and there hasn't been much response to it yet. There is still a
> > problem (see below), but looks otherwise OK.
> > 
> > [...]
> >   
> > > @@ -2363,11 +2381,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
> > >  				(struct acpi_madt_generic_interrupt *)header;
> > >  	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
> > >  	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> > > +	int cpu = get_cpu_for_acpi_id(gicc->uid);    
> > 
> > I already commented that get_cpu_for_acpi_id() can...  
> 
> Indeed sorry - I blame Friday syndrome for me failing to address that.
> 
> >   
> > >  	void __iomem *redist_base;
> > >  
> > > -	if (!acpi_gicc_is_usable(gicc))
> > > +	/* Neither enabled or online capable means it doesn't exist, skip it */
> > > +	if (!(gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
> > >  		return 0;
> > >  
> > > +	/*
> > > +	 * Capable but disabled CPUs can be brought online later. What about
> > > +	 * the redistributor? ACPI doesn't want to say!
> > > +	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> > > +	 * Otherwise, prevent such CPUs from being brought online.
> > > +	 */
> > > +	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
> > > +		pr_warn("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
> > > +		cpumask_set_cpu(cpu, &broken_rdists);    
> > 
> > ... return -EINVAL, and then be passed to cpumask_set_cpu(), with
> > interesting effects. It shouldn't happen, but I trust anything that
> > comes from firmware tables as much as I trust a campaigning
> > politician's promises. This should really result in the RD being
> > considered unusable, but without affecting any CPU (there is no valid
> > CPU the first place).
> > 
> > Another question is what get_cpu_for acpi_id() returns for a disabled
> > CPU. A valid CPU number? Or -EINVAL?  
> It's a match function that works by iterating over 0 to nr_cpu_ids and
> 
> if (uid == get_acpi_id_for_cpu(cpu))
> 
> So the question become does get_acpi_id_for_cpu() return a valid CPU
> number for a disabled CPU.
> 
> That uses acpi_cpu_get_madt_gicc(cpu)->uid so this all gets a bit circular.
> That looks it up via cpu_madt_gicc[cpu] which after the proposed updated
> patch is set if enabled or online capable.  There are however a few other
> error checks in acpi_map_gic_cpu_interface() that could lead to it
> not being set (MPIDR validity checks). I suspect all of these end up being
> fatal elsewhere which is why this hasn't blown up before.
> 
> If any of those cases are possible we could get a null pointer
> dereference.
> 
> Easy to harden this case via the following (which will leave us with
> -EINVAL.  There are other call sites that might trip over this.
> I'm inclined to harden them as a separate issue though so as not
> to get in the way of this patch set.
> 
> 
> diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
> index bc9a6656fc0c..a407f9cd549e 100644
> --- a/arch/arm64/include/asm/acpi.h
> +++ b/arch/arm64/include/asm/acpi.h
> @@ -124,7 +124,8 @@ static inline int get_cpu_for_acpi_id(u32 uid)
>         int cpu;
> 
>         for (cpu = 0; cpu < nr_cpu_ids; cpu++)
> -               if (uid == get_acpi_id_for_cpu(cpu))
> +               if (acpi_cpu_get_madt_gicc(cpu) &&
> +                   uid == get_acpi_id_for_cpu(cpu))
>                         return cpu;
> 
>         return -EINVAL;
> 
> I'll spin an additional patch to make that change after testing I haven't
> messed it up.
> 
> At the call site in gic_acpi_parse_madt_gicc() I'm not sure we can do better
> than just skipping setting broken_rdists. I'll also pull the declaration of
> that cpu variable down into this condition so it's more obvious we only
> care about it in this error path.

Just for the record, for my deliberately broken test case it seems that it returns
a valid CPU ID anyway. That's what I'd expect given acpi_parse_and_init_cpus()
doesn't check whether the GICC entries are enabled or not.

Jonathan

> 
> Jonathan
> 
> 
> 
> 
> 
> > 
> > Thanks,
> > 
> > 	M.
> >   
> 
> 
Patch

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 10af15f93d4d..b4685991953e 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -44,6 +44,8 @@ 
 
 #define GIC_IRQ_TYPE_PARTITION	(GIC_IRQ_TYPE_LPI + 1)
 
+static struct cpumask broken_rdists __read_mostly;
+
 struct redist_region {
 	void __iomem		*redist_base;
 	phys_addr_t		phys_base;
@@ -1293,6 +1295,18 @@  static void gic_cpu_init(void)
 #define MPIDR_TO_SGI_RS(mpidr)	(MPIDR_RS(mpidr) << ICC_SGI1R_RS_SHIFT)
 #define MPIDR_TO_SGI_CLUSTER_ID(mpidr)	((mpidr) & ~0xFUL)
 
+/*
+ * gic_starting_cpu() is called after the last point where cpuhp is allowed
+ * to fail. So pre check for problems earlier.
+ */
+static int gic_check_rdist(unsigned int cpu)
+{
+	if (cpumask_test_cpu(cpu, &broken_rdists))
+		return -EINVAL;
+
+	return 0;
+}
+
 static int gic_starting_cpu(unsigned int cpu)
 {
 	gic_cpu_init();
@@ -1384,6 +1398,10 @@  static void __init gic_smp_init(void)
 	};
 	int base_sgi;
 
+	cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN,
+				  "irqchip/arm/gicv3:checkrdist",
+				  gic_check_rdist, NULL);
+
 	cpuhp_setup_state_nocalls(CPUHP_AP_IRQ_GIC_STARTING,
 				  "irqchip/arm/gicv3:starting",
 				  gic_starting_cpu, NULL);
@@ -2363,11 +2381,24 @@  gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
 				(struct acpi_madt_generic_interrupt *)header;
 	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
 	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
+	int cpu = get_cpu_for_acpi_id(gicc->uid);
 	void __iomem *redist_base;
 
 	if (!acpi_gicc_is_usable(gicc))
 		return 0;
 
+	/*
+	 * Capable but disabled CPUs can be brought online later. What about
+	 * the redistributor? ACPI doesn't want to say!
+	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
+	 * Otherwise, prevent such CPUs from being brought online.
+	 */
+	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
+		pr_warn_once("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
+		cpumask_set_cpu(cpu, &broken_rdists);
+		return 0;
+	}
+
 	redist_base = ioremap(gicc->gicr_base_address, size);
 	if (!redist_base)
 		return -ENOMEM;
@@ -2413,9 +2444,12 @@  static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
 
 	/*
 	 * If GICC is enabled and has valid gicr base address, then it means
-	 * GICR base is presented via GICC
+	 * GICR base is presented via GICC. The redistributor is only known to
+	 * be accessible if the GICC is marked as enabled. If this bit is not
+	 * set, we'd need to add the redistributor at runtime, which isn't
+	 * supported.
 	 */
-	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address)
+	if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
 		acpi_data.enabled_rdists++;
 
 	return 0;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 9844a3f9c4e5..fcfb7bb6789e 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -239,7 +239,8 @@  void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
 
 static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
 {
-	return gicc->flags & ACPI_MADT_ENABLED;
+	return gicc->flags & (ACPI_MADT_ENABLED |
+			      ACPI_MADT_GICC_ONLINE_CAPABLE);
 }
 
 /* the following numa functions are architecture-dependent */