Message ID | 20231006151547.1.Ide945748593cffd8ff0feb9ae22b795935b944d6@changeid (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] arm64: Disable GiC priorities on Mediatek devices w/ firmware issues | expand |
On Fri, Oct 06, 2023 at 03:15:51PM -0700, Douglas Anderson wrote: > In commit 44bd78dd2b88 ("irqchip/gic-v3: Disable pseudo NMIs on > Mediatek devices w/ firmware issues") we added a method for detecting > Mediatek devices with broken firmware and disabled pseudo-NMI. While > that worked, it didn't address the problem at a deep enough level. > > The fundamental issue with this broken firmware is that it's not > saving and restoring several important GICR registers. The current > list is believed to be: > * GICR_NUM_IPRIORITYR > * GICR_CTLR > * GICR_ISPENDR0 > * GICR_ISACTIVER0 > * GICR_NSACR > > Pseudo-NMI didn't work because it was the only thing (currently) in > the kernel that relied on the broken registers, so forcing pseudo-NMI > off was an effective fix. However, it could be observed that calling > system_uses_irq_prio_masking() on these systems still returned > "true". That caused confusion and led to the need for > commit a07a59415217 ("arm64: smp: avoid NMI IPIs with broken MediaTek > FW"). It's worried that the incorrect value returned by > system_uses_irq_prio_masking() on these systems will continue to > confuse future developers. > > Let's fix the issue a little more completely by disabling IRQ > priorities at a deeper level in the kernel. Once we do this we can > revert some of the other bits of code dealing with this quirk. > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > > arch/arm64/kernel/cpufeature.c | 21 +++++++++++++++++++++ > 1 file changed, 21 insertions(+) > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c > index 2806a2850e78..e35efab8efa9 100644 > --- a/arch/arm64/kernel/cpufeature.c > +++ b/arch/arm64/kernel/cpufeature.c > @@ -2094,9 +2094,30 @@ static int __init early_enable_pseudo_nmi(char *p) > } > early_param("irqchip.gicv3_pseudo_nmi", early_enable_pseudo_nmi); > > +static bool are_gic_priorities_broken(void) > +{ > + bool is_broken = false; > + struct device_node *np; > + > + /* > + * Detect broken Mediatek firmware that doesn't properly save and > + * restore GIC priorities. > + */ > + np = of_find_compatible_node(NULL, NULL, "arm,gic-v3"); > + if (np) { > + is_broken = of_property_read_bool(np, "mediatek,broken-save-restore-fw"); > + of_node_put(np); > + } > + > + return is_broken; > +} I'm definitely in favour of detecting this in the cpucap, but I think it'd be better to parse the DT once on the boot CPU rather than on each CPU every time it's brought up. I think if we add something like: #ifdef CONFIG_ARM64_PSEUDO_NMI static void detect_system_supports_pseudo_nmi(void) { struct device_node *np; if (!enable_pseudo_nmi) return; /* * Detect broken Mediatek firmware that doesn't properly save and * restore GIC priorities. */ np = of_find_compatible_node(NULL, NULL, "arm,gic-v3"); if (np && of_property_read_bool(np, "mediatek,broken-save-restore-fw")) { pr_info("Pseudo-NMI disabled due to Mediatek Chromebook GICR save problem"); enable_pseudo_nmi = false; } of_node_put(np); } #endif /* CONFIG_ARM64_PSEUDO_NMI */ static inline void detect_system_supports_pseudo_nmi(void) { } #endif ... then we can call that from init_cpu_features() before we call setup_boot_cpu_capabilities(), and then the existing logic in can_use_gic_priorities() should just work as that returns the value of enable_pseudo_nmi. Note: of_node_put(NULL) does nothing, like kfree(NULL), so it's fine for that to be called in the !np case. Would you be happy to fold that in? I'm happy with a Suggested-by tag if so. :) Mark
Hi, On Wed, Oct 18, 2023 at 4:01 AM Mark Rutland <mark.rutland@arm.com> wrote: > > On Fri, Oct 06, 2023 at 03:15:51PM -0700, Douglas Anderson wrote: > > In commit 44bd78dd2b88 ("irqchip/gic-v3: Disable pseudo NMIs on > > Mediatek devices w/ firmware issues") we added a method for detecting > > Mediatek devices with broken firmware and disabled pseudo-NMI. While > > that worked, it didn't address the problem at a deep enough level. > > > > The fundamental issue with this broken firmware is that it's not > > saving and restoring several important GICR registers. The current > > list is believed to be: > > * GICR_NUM_IPRIORITYR > > * GICR_CTLR > > * GICR_ISPENDR0 > > * GICR_ISACTIVER0 > > * GICR_NSACR > > > > Pseudo-NMI didn't work because it was the only thing (currently) in > > the kernel that relied on the broken registers, so forcing pseudo-NMI > > off was an effective fix. However, it could be observed that calling > > system_uses_irq_prio_masking() on these systems still returned > > "true". That caused confusion and led to the need for > > commit a07a59415217 ("arm64: smp: avoid NMI IPIs with broken MediaTek > > FW"). It's worried that the incorrect value returned by > > system_uses_irq_prio_masking() on these systems will continue to > > confuse future developers. > > > > Let's fix the issue a little more completely by disabling IRQ > > priorities at a deeper level in the kernel. Once we do this we can > > revert some of the other bits of code dealing with this quirk. > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > --- > > > > arch/arm64/kernel/cpufeature.c | 21 +++++++++++++++++++++ > > 1 file changed, 21 insertions(+) > > > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c > > index 2806a2850e78..e35efab8efa9 100644 > > --- a/arch/arm64/kernel/cpufeature.c > > +++ b/arch/arm64/kernel/cpufeature.c > > @@ -2094,9 +2094,30 @@ static int __init early_enable_pseudo_nmi(char *p) > > } > > early_param("irqchip.gicv3_pseudo_nmi", early_enable_pseudo_nmi); > > > > +static bool are_gic_priorities_broken(void) > > +{ > > + bool is_broken = false; > > + struct device_node *np; > > + > > + /* > > + * Detect broken Mediatek firmware that doesn't properly save and > > + * restore GIC priorities. > > + */ > > + np = of_find_compatible_node(NULL, NULL, "arm,gic-v3"); > > + if (np) { > > + is_broken = of_property_read_bool(np, "mediatek,broken-save-restore-fw"); > > + of_node_put(np); > > + } > > + > > + return is_broken; > > +} > > I'm definitely in favour of detecting this in the cpucap, but I think it'd be > better to parse the DT once on the boot CPU rather than on each CPU every time > it's brought up. > > I think if we add something like: > > #ifdef CONFIG_ARM64_PSEUDO_NMI > static void detect_system_supports_pseudo_nmi(void) > { > struct device_node *np; > > if (!enable_pseudo_nmi) > return; > > /* > * Detect broken Mediatek firmware that doesn't properly save and > * restore GIC priorities. > */ > np = of_find_compatible_node(NULL, NULL, "arm,gic-v3"); > if (np && of_property_read_bool(np, "mediatek,broken-save-restore-fw")) { > pr_info("Pseudo-NMI disabled due to Mediatek Chromebook GICR save problem"); > enable_pseudo_nmi = false; > } > of_node_put(np); > } > #endif /* CONFIG_ARM64_PSEUDO_NMI */ > static inline void detect_system_supports_pseudo_nmi(void) { } > #endif > > ... then we can call that from init_cpu_features() before we call > setup_boot_cpu_capabilities(), and then the existing logic in > can_use_gic_priorities() should just work as that returns the value of > enable_pseudo_nmi. > > Note: of_node_put(NULL) does nothing, like kfree(NULL), so it's fine for that > to be called in the !np case. > > Would you be happy to fold that in? I'm happy with a Suggested-by tag if so. :) Yup, that looks good to me and I can fold it in (fixing a few nits like missing "\n" and adding __init to the function). I'll wait to get maintainers opinions on whether to fold patch #3 in here and then send a v2. -Doug
On Mon, Oct 30, 2023 at 04:19:55PM -0700, Doug Anderson wrote: > On Wed, Oct 18, 2023 at 4:01 AM Mark Rutland <mark.rutland@arm.com> wrote: > > On Fri, Oct 06, 2023 at 03:15:51PM -0700, Douglas Anderson wrote: > > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c > > > index 2806a2850e78..e35efab8efa9 100644 > > > --- a/arch/arm64/kernel/cpufeature.c > > > +++ b/arch/arm64/kernel/cpufeature.c > > > @@ -2094,9 +2094,30 @@ static int __init early_enable_pseudo_nmi(char *p) > > > } > > > early_param("irqchip.gicv3_pseudo_nmi", early_enable_pseudo_nmi); > > > > > > +static bool are_gic_priorities_broken(void) > > > +{ > > > + bool is_broken = false; > > > + struct device_node *np; > > > + > > > + /* > > > + * Detect broken Mediatek firmware that doesn't properly save and > > > + * restore GIC priorities. > > > + */ > > > + np = of_find_compatible_node(NULL, NULL, "arm,gic-v3"); > > > + if (np) { > > > + is_broken = of_property_read_bool(np, "mediatek,broken-save-restore-fw"); > > > + of_node_put(np); > > > + } > > > + > > > + return is_broken; > > > +} > > > > I'm definitely in favour of detecting this in the cpucap, but I think it'd be > > better to parse the DT once on the boot CPU rather than on each CPU every time > > it's brought up. > > > > I think if we add something like: > > > > #ifdef CONFIG_ARM64_PSEUDO_NMI > > static void detect_system_supports_pseudo_nmi(void) > > { > > struct device_node *np; > > > > if (!enable_pseudo_nmi) > > return; > > > > /* > > * Detect broken Mediatek firmware that doesn't properly save and > > * restore GIC priorities. > > */ > > np = of_find_compatible_node(NULL, NULL, "arm,gic-v3"); > > if (np && of_property_read_bool(np, "mediatek,broken-save-restore-fw")) { > > pr_info("Pseudo-NMI disabled due to Mediatek Chromebook GICR save problem"); > > enable_pseudo_nmi = false; > > } > > of_node_put(np); > > } > > #endif /* CONFIG_ARM64_PSEUDO_NMI */ > > static inline void detect_system_supports_pseudo_nmi(void) { } > > #endif > > > > ... then we can call that from init_cpu_features() before we call > > setup_boot_cpu_capabilities(), and then the existing logic in > > can_use_gic_priorities() should just work as that returns the value of > > enable_pseudo_nmi. > > > > Note: of_node_put(NULL) does nothing, like kfree(NULL), so it's fine for that > > to be called in the !np case. > > > > Would you be happy to fold that in? I'm happy with a Suggested-by tag if so. :) > > Yup, that looks good to me and I can fold it in (fixing a few nits > like missing "\n" and adding __init to the function). I'll wait to get > maintainers opinions on whether to fold patch #3 in here and then send > a v2. No preference from me; I assume this stuff's all going in together anyway. Will
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 2806a2850e78..e35efab8efa9 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -2094,9 +2094,30 @@ static int __init early_enable_pseudo_nmi(char *p) } early_param("irqchip.gicv3_pseudo_nmi", early_enable_pseudo_nmi); +static bool are_gic_priorities_broken(void) +{ + bool is_broken = false; + struct device_node *np; + + /* + * Detect broken Mediatek firmware that doesn't properly save and + * restore GIC priorities. + */ + np = of_find_compatible_node(NULL, NULL, "arm,gic-v3"); + if (np) { + is_broken = of_property_read_bool(np, "mediatek,broken-save-restore-fw"); + of_node_put(np); + } + + return is_broken; +} + static bool can_use_gic_priorities(const struct arm64_cpu_capabilities *entry, int scope) { + if (are_gic_priorities_broken()) + return false; + /* * ARM64_HAS_GIC_CPUIF_SYSREGS has a lower index, and is a boot CPU * feature, so will be detected earlier.
In commit 44bd78dd2b88 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") we added a method for detecting Mediatek devices with broken firmware and disabled pseudo-NMI. While that worked, it didn't address the problem at a deep enough level. The fundamental issue with this broken firmware is that it's not saving and restoring several important GICR registers. The current list is believed to be: * GICR_NUM_IPRIORITYR * GICR_CTLR * GICR_ISPENDR0 * GICR_ISACTIVER0 * GICR_NSACR Pseudo-NMI didn't work because it was the only thing (currently) in the kernel that relied on the broken registers, so forcing pseudo-NMI off was an effective fix. However, it could be observed that calling system_uses_irq_prio_masking() on these systems still returned "true". That caused confusion and led to the need for commit a07a59415217 ("arm64: smp: avoid NMI IPIs with broken MediaTek FW"). It's worried that the incorrect value returned by system_uses_irq_prio_masking() on these systems will continue to confuse future developers. Let's fix the issue a little more completely by disabling IRQ priorities at a deeper level in the kernel. Once we do this we can revert some of the other bits of code dealing with this quirk. Signed-off-by: Douglas Anderson <dianders@chromium.org> --- arch/arm64/kernel/cpufeature.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)