diff mbox series

[18/21] ARM: drop SMP support for ARM11MPCore

Message ID 20230327121317.4081816-19-arnd@kernel.org (mailing list archive)
State Handled Elsewhere
Headers show
Series dma-mapping: unify support for cache flushes | expand

Commit Message

Arnd Bergmann March 27, 2023, 12:13 p.m. UTC
From: Arnd Bergmann <arnd@arndb.de>

The cache management operations for noncoherent DMA on ARMv6 work
in two different ways:

 * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
   DMA buffers lead to data corruption when the prefetched data is written
   back on top of data from the device.

 * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
   is not seen by the other core(s), leading to inconsistent contents
   accross the system.

As a consequence, neither configuration is actually safe to use in a
general-purpose kernel that is used on both MPCore systems and ARM1176
with prefetching enabled.

We could add further workarounds to make the behavior more dynamic based
on the system, but realistically, there are close to zero remaining
users on any ARM11MPCore anyway, and nobody seems too interested in it,
compared to the more popular ARM1176 used in BMC2835 and AST2500.

The Oxnas platform has some minimal support in OpenWRT, but most of the
drivers and dts files never made it into the mainline kernel, while the
Arm Versatile/Realview platform mainly serves as a reference system but
is not necessary to be kept working once all other ARM11MPCore are gone.

Take the easy way out here and drop support for multiprocessing on
ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
management implementation for it. This also helps with other ARMv6
issues, but for the moment leaves the ability to build a kernel that
can run on both ARMv7 SMP and single-processor ARMv6, which we probably
want to stop supporting as well, but not as part of this series.

Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-oxnas@groups.io
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
I could use some help clarifying the above changelog text to describe
the exact problem, and how the CONFIG_DMA_CACHE_RWFO actually works on
MPCore. The TRMs for both 1176 and 11MPCore only describe prefetching
into the instruction cache, not the data cache, but this can end up in
the outercache as a result. The 1176 has some extra control bits to
control prefetching, but I found no reference that explains why an
MPCore does not run into the problem.
---
 arch/arm/mach-oxnas/Kconfig                |  4 -
 arch/arm/mach-oxnas/Makefile               |  1 -
 arch/arm/mach-oxnas/headsmp.S              | 23 ------
 arch/arm/mach-oxnas/platsmp.c              | 96 ----------------------
 arch/arm/mach-versatile/platsmp-realview.c |  4 -
 arch/arm/mm/Kconfig                        | 19 -----
 arch/arm/mm/cache-v6.S                     | 31 -------
 7 files changed, 178 deletions(-)
 delete mode 100644 arch/arm/mach-oxnas/headsmp.S
 delete mode 100644 arch/arm/mach-oxnas/platsmp.c

Comments

Neil Armstrong March 30, 2023, 7:48 a.m. UTC | #1
On 27/03/2023 14:13, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> The cache management operations for noncoherent DMA on ARMv6 work
> in two different ways:
> 
>   * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
>     DMA buffers lead to data corruption when the prefetched data is written
>     back on top of data from the device.
> 
>   * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
>     is not seen by the other core(s), leading to inconsistent contents
>     accross the system.
> 
> As a consequence, neither configuration is actually safe to use in a
> general-purpose kernel that is used on both MPCore systems and ARM1176
> with prefetching enabled.
> 
> We could add further workarounds to make the behavior more dynamic based
> on the system, but realistically, there are close to zero remaining
> users on any ARM11MPCore anyway, and nobody seems too interested in it,
> compared to the more popular ARM1176 used in BMC2835 and AST2500.
> 
> The Oxnas platform has some minimal support in OpenWRT, but most of the
> drivers and dts files never made it into the mainline kernel, while the
> Arm Versatile/Realview platform mainly serves as a reference system but
> is not necessary to be kept working once all other ARM11MPCore are gone.

Acked-by: Neil Armstrong <neil.armstrong@linaro.org>

It's sad but it's the reality, there's no chance full OXNAS support will
ever come upstream and no real work has been done for years.

I think OXNAS support can be programmed for removal for next release,
it would need significant work to rework current support to make it acceptable
before trying to upstream missing bits anyway.

Thanks,
Neil


> 
> Take the easy way out here and drop support for multiprocessing on
> ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
> management implementation for it. This also helps with other ARMv6
> issues, but for the moment leaves the ability to build a kernel that
> can run on both ARMv7 SMP and single-processor ARMv6, which we probably
> want to stop supporting as well, but not as part of this series.
> 
> Cc: Neil Armstrong <neil.armstrong@linaro.org>
> Cc: Daniel Golle <daniel@makrotopia.org>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: linux-oxnas@groups.io
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> I could use some help clarifying the above changelog text to describe
> the exact problem, and how the CONFIG_DMA_CACHE_RWFO actually works on
> MPCore. The TRMs for both 1176 and 11MPCore only describe prefetching
> into the instruction cache, not the data cache, but this can end up in
> the outercache as a result. The 1176 has some extra control bits to
> control prefetching, but I found no reference that explains why an
> MPCore does not run into the problem.
> ---
>   arch/arm/mach-oxnas/Kconfig                |  4 -
>   arch/arm/mach-oxnas/Makefile               |  1 -
>   arch/arm/mach-oxnas/headsmp.S              | 23 ------
>   arch/arm/mach-oxnas/platsmp.c              | 96 ----------------------
>   arch/arm/mach-versatile/platsmp-realview.c |  4 -
>   arch/arm/mm/Kconfig                        | 19 -----
>   arch/arm/mm/cache-v6.S                     | 31 -------
>   7 files changed, 178 deletions(-)
>   delete mode 100644 arch/arm/mach-oxnas/headsmp.S
>   delete mode 100644 arch/arm/mach-oxnas/platsmp.c
> 
> diff --git a/arch/arm/mach-oxnas/Kconfig b/arch/arm/mach-oxnas/Kconfig
> index a9ded7079268..a054235c3d6c 100644
> --- a/arch/arm/mach-oxnas/Kconfig
> +++ b/arch/arm/mach-oxnas/Kconfig
> @@ -28,10 +28,6 @@ config MACH_OX820
>   	bool "Support OX820 Based Products"
>   	depends on ARCH_MULTI_V6
>   	select ARM_GIC
> -	select DMA_CACHE_RWFO if SMP
> -	select HAVE_SMP
> -	select HAVE_ARM_SCU if SMP
> -	select HAVE_ARM_TWD if SMP
>   	help
>   	  Include Support for the Oxford Semiconductor OX820 SoC Based Products.
>   
> diff --git a/arch/arm/mach-oxnas/Makefile b/arch/arm/mach-oxnas/Makefile
> index 0e78ecfe6c49..a4e40e534e6a 100644
> --- a/arch/arm/mach-oxnas/Makefile
> +++ b/arch/arm/mach-oxnas/Makefile
> @@ -1,2 +1 @@
>   # SPDX-License-Identifier: GPL-2.0-only
> -obj-$(CONFIG_SMP)		+= platsmp.o headsmp.o
> diff --git a/arch/arm/mach-oxnas/headsmp.S b/arch/arm/mach-oxnas/headsmp.S
> deleted file mode 100644
> index 9c0f1479f33a..000000000000
> --- a/arch/arm/mach-oxnas/headsmp.S
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -/*
> - * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
> - * Copyright (c) 2003 ARM Limited
> - * All Rights Reserved
> - */
> -#include <linux/linkage.h>
> -#include <linux/init.h>
> -
> -	__INIT
> -
> -/*
> - * OX820 specific entry point for secondary CPUs.
> - */
> -ENTRY(ox820_secondary_startup)
> -	mov r4, #0
> -	/* invalidate both caches and branch target cache */
> -	mcr p15, 0, r4, c7, c7, 0
> -	/*
> -	 * we've been released from the holding pen: secondary_stack
> -	 * should now contain the SVC stack for this core
> -	 */
> -	b	secondary_startup
> diff --git a/arch/arm/mach-oxnas/platsmp.c b/arch/arm/mach-oxnas/platsmp.c
> deleted file mode 100644
> index f0a50b9e61df..000000000000
> --- a/arch/arm/mach-oxnas/platsmp.c
> +++ /dev/null
> @@ -1,96 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0-only
> -/*
> - * Copyright (C) 2016 Neil Armstrong <narmstrong@baylibre.com>
> - * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
> - * Copyright (C) 2002 ARM Ltd.
> - * All Rights Reserved
> - */
> -#include <linux/io.h>
> -#include <linux/delay.h>
> -#include <linux/of.h>
> -#include <linux/of_address.h>
> -
> -#include <asm/cacheflush.h>
> -#include <asm/cp15.h>
> -#include <asm/smp_plat.h>
> -#include <asm/smp_scu.h>
> -
> -extern void ox820_secondary_startup(void);
> -
> -static void __iomem *cpu_ctrl;
> -static void __iomem *gic_cpu_ctrl;
> -
> -#define HOLDINGPEN_CPU_OFFSET		0xc8
> -#define HOLDINGPEN_LOCATION_OFFSET	0xc4
> -
> -#define GIC_NCPU_OFFSET(cpu)		(0x100 + (cpu)*0x100)
> -#define GIC_CPU_CTRL			0x00
> -#define GIC_CPU_CTRL_ENABLE		1
> -
> -static int __init ox820_boot_secondary(unsigned int cpu,
> -		struct task_struct *idle)
> -{
> -	/*
> -	 * Write the address of secondary startup into the
> -	 * system-wide flags register. The BootMonitor waits
> -	 * until it receives a soft interrupt, and then the
> -	 * secondary CPU branches to this address.
> -	 */
> -	writel(virt_to_phys(ox820_secondary_startup),
> -			cpu_ctrl + HOLDINGPEN_LOCATION_OFFSET);
> -
> -	writel(cpu, cpu_ctrl + HOLDINGPEN_CPU_OFFSET);
> -
> -	/*
> -	 * Enable GIC cpu interface in CPU Interface Control Register
> -	 */
> -	writel(GIC_CPU_CTRL_ENABLE,
> -		gic_cpu_ctrl + GIC_NCPU_OFFSET(cpu) + GIC_CPU_CTRL);
> -
> -	/*
> -	 * Send the secondary CPU a soft interrupt, thereby causing
> -	 * the boot monitor to read the system wide flags register,
> -	 * and branch to the address found there.
> -	 */
> -	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
> -
> -	return 0;
> -}
> -
> -static void __init ox820_smp_prepare_cpus(unsigned int max_cpus)
> -{
> -	struct device_node *np;
> -	void __iomem *scu_base;
> -
> -	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-scu");
> -	scu_base = of_iomap(np, 0);
> -	of_node_put(np);
> -	if (!scu_base)
> -		return;
> -
> -	/* Remap CPU Interrupt Interface Registers */
> -	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-gic");
> -	gic_cpu_ctrl = of_iomap(np, 1);
> -	of_node_put(np);
> -	if (!gic_cpu_ctrl)
> -		goto unmap_scu;
> -
> -	np = of_find_compatible_node(NULL, NULL, "oxsemi,ox820-sys-ctrl");
> -	cpu_ctrl = of_iomap(np, 0);
> -	of_node_put(np);
> -	if (!cpu_ctrl)
> -		goto unmap_scu;
> -
> -	scu_enable(scu_base);
> -	flush_cache_all();
> -
> -unmap_scu:
> -	iounmap(scu_base);
> -}
> -
> -static const struct smp_operations ox820_smp_ops __initconst = {
> -	.smp_prepare_cpus	= ox820_smp_prepare_cpus,
> -	.smp_boot_secondary	= ox820_boot_secondary,
> -};
> -
> -CPU_METHOD_OF_DECLARE(ox820_smp, "oxsemi,ox820-smp", &ox820_smp_ops);
> diff --git a/arch/arm/mach-versatile/platsmp-realview.c b/arch/arm/mach-versatile/platsmp-realview.c
> index 5d363385c801..fa31fd2d211d 100644
> --- a/arch/arm/mach-versatile/platsmp-realview.c
> +++ b/arch/arm/mach-versatile/platsmp-realview.c
> @@ -18,16 +18,12 @@
>   #define REALVIEW_SYS_FLAGSSET_OFFSET	0x30
>   
>   static const struct of_device_id realview_scu_match[] = {
> -	{ .compatible = "arm,arm11mp-scu", },
>   	{ .compatible = "arm,cortex-a9-scu", },
>   	{ .compatible = "arm,cortex-a5-scu", },
>   	{ }
>   };
>   
>   static const struct of_device_id realview_syscon_match[] = {
> -        { .compatible = "arm,core-module-integrator", },
> -        { .compatible = "arm,realview-eb-syscon", },
> -        { .compatible = "arm,realview-pb11mp-syscon", },
>           { .compatible = "arm,realview-pbx-syscon", },
>           { },
>   };
> diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
> index c5bbae86f725..16b62bc0a970 100644
> --- a/arch/arm/mm/Kconfig
> +++ b/arch/arm/mm/Kconfig
> @@ -937,25 +937,6 @@ config VDSO
>   	  You must have glibc 2.22 or later for programs to seamlessly
>   	  take advantage of this.
>   
> -config DMA_CACHE_RWFO
> -	bool "Enable read/write for ownership DMA cache maintenance"
> -	depends on CPU_V6K && SMP
> -	default y
> -	help
> -	  The Snoop Control Unit on ARM11MPCore does not detect the
> -	  cache maintenance operations and the dma_{map,unmap}_area()
> -	  functions may leave stale cache entries on other CPUs. By
> -	  enabling this option, Read or Write For Ownership in the ARMv6
> -	  DMA cache maintenance functions is performed. These LDR/STR
> -	  instructions change the cache line state to shared or modified
> -	  so that the cache operation has the desired effect.
> -
> -	  Note that the workaround is only valid on processors that do
> -	  not perform speculative loads into the D-cache. For such
> -	  processors, if cache maintenance operations are not broadcast
> -	  in hardware, other workarounds are needed (e.g. cache
> -	  maintenance broadcasting in software via FIQ).
> -
>   config OUTER_CACHE
>   	bool
>   
> diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
> index abae7ff5defc..f6ee53c1de20 100644
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -201,10 +201,6 @@ ENTRY(v6_flush_kern_dcache_area)
>    *	- end     - virtual end address of region
>    */
>   ENTRY(v6_dma_inv_range)
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrb	r2, [r0]			@ read for ownership
> -	strb	r2, [r0]			@ write for ownership
> -#endif
>   	tst	r0, #D_CACHE_LINE_SIZE - 1
>   	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>   #ifdef HARVARD_CACHE
> @@ -213,10 +209,6 @@ ENTRY(v6_dma_inv_range)
>   	mcrne	p15, 0, r0, c7, c11, 1		@ clean unified line
>   #endif
>   	tst	r1, #D_CACHE_LINE_SIZE - 1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrbne	r2, [r1, #-1]			@ read for ownership
> -	strbne	r2, [r1, #-1]			@ write for ownership
> -#endif
>   	bic	r1, r1, #D_CACHE_LINE_SIZE - 1
>   #ifdef HARVARD_CACHE
>   	mcrne	p15, 0, r1, c7, c14, 1		@ clean & invalidate D line
> @@ -231,10 +223,6 @@ ENTRY(v6_dma_inv_range)
>   #endif
>   	add	r0, r0, #D_CACHE_LINE_SIZE
>   	cmp	r0, r1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrlo	r2, [r0]			@ read for ownership
> -	strlo	r2, [r0]			@ write for ownership
> -#endif
>   	blo	1b
>   	mov	r0, #0
>   	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
> @@ -248,9 +236,6 @@ ENTRY(v6_dma_inv_range)
>   ENTRY(v6_dma_clean_range)
>   	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>   1:
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldr	r2, [r0]			@ read for ownership
> -#endif
>   #ifdef HARVARD_CACHE
>   	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
>   #else
> @@ -269,10 +254,6 @@ ENTRY(v6_dma_clean_range)
>    *	- end     - virtual end address of region
>    */
>   ENTRY(v6_dma_flush_range)
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrb	r2, [r0]		@ read for ownership
> -	strb	r2, [r0]		@ write for ownership
> -#endif
>   	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
>   1:
>   #ifdef HARVARD_CACHE
> @@ -282,10 +263,6 @@ ENTRY(v6_dma_flush_range)
>   #endif
>   	add	r0, r0, #D_CACHE_LINE_SIZE
>   	cmp	r0, r1
> -#ifdef CONFIG_DMA_CACHE_RWFO
> -	ldrblo	r2, [r0]			@ read for ownership
> -	strblo	r2, [r0]			@ write for ownership
> -#endif
>   	blo	1b
>   	mov	r0, #0
>   	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
> @@ -301,13 +278,7 @@ ENTRY(v6_dma_map_area)
>   	add	r1, r1, r0
>   	teq	r2, #DMA_FROM_DEVICE
>   	beq	v6_dma_inv_range
> -#ifndef CONFIG_DMA_CACHE_RWFO
>   	b	v6_dma_clean_range
> -#else
> -	teq	r2, #DMA_TO_DEVICE
> -	beq	v6_dma_clean_range
> -	b	v6_dma_flush_range
> -#endif
>   ENDPROC(v6_dma_map_area)
>   
>   /*
> @@ -317,11 +288,9 @@ ENDPROC(v6_dma_map_area)
>    *	- dir	- DMA direction
>    */
>   ENTRY(v6_dma_unmap_area)
> -#ifndef CONFIG_DMA_CACHE_RWFO
>   	add	r1, r1, r0
>   	teq	r2, #DMA_TO_DEVICE
>   	bne	v6_dma_inv_range
> -#endif
>   	ret	lr
>   ENDPROC(v6_dma_unmap_area)
>
Linus Walleij March 30, 2023, 8:12 a.m. UTC | #2
On Mon, Mar 27, 2023 at 2:16 PM Arnd Bergmann <arnd@kernel.org> wrote:

> From: Arnd Bergmann <arnd@arndb.de>
>
> The cache management operations for noncoherent DMA on ARMv6 work
> in two different ways:
>
>  * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
>    DMA buffers lead to data corruption when the prefetched data is written
>    back on top of data from the device.
>
>  * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
>    is not seen by the other core(s), leading to inconsistent contents
>    accross the system.
>
> As a consequence, neither configuration is actually safe to use in a
> general-purpose kernel that is used on both MPCore systems and ARM1176
> with prefetching enabled.
>
> We could add further workarounds to make the behavior more dynamic based
> on the system, but realistically, there are close to zero remaining
> users on any ARM11MPCore anyway, and nobody seems too interested in it,
> compared to the more popular ARM1176 used in BMC2835 and AST2500.
>
> The Oxnas platform has some minimal support in OpenWRT, but most of the
> drivers and dts files never made it into the mainline kernel, while the
> Arm Versatile/Realview platform mainly serves as a reference system but
> is not necessary to be kept working once all other ARM11MPCore are gone.
>
> Take the easy way out here and drop support for multiprocessing on
> ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
> management implementation for it. This also helps with other ARMv6
> issues, but for the moment leaves the ability to build a kernel that
> can run on both ARMv7 SMP and single-processor ARMv6, which we probably
> want to stop supporting as well, but not as part of this series.
>
> Cc: Neil Armstrong <neil.armstrong@linaro.org>
> Cc: Daniel Golle <daniel@makrotopia.org>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: linux-oxnas@groups.io
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Yeah, we discussed this earlier, let's just drop it. Not worth the effort.
Acked-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij
Arnd Bergmann March 30, 2023, 10:03 a.m. UTC | #3
On Thu, Mar 30, 2023, at 09:48, Neil Armstrong wrote:
> On 27/03/2023 14:13, Arnd Bergmann wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>> 
>> The cache management operations for noncoherent DMA on ARMv6 work
>> in two different ways:
>> 
>>   * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
>>     DMA buffers lead to data corruption when the prefetched data is written
>>     back on top of data from the device.
>> 
>>   * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
>>     is not seen by the other core(s), leading to inconsistent contents
>>     accross the system.
>> 
>> As a consequence, neither configuration is actually safe to use in a
>> general-purpose kernel that is used on both MPCore systems and ARM1176
>> with prefetching enabled.
>> 
>> We could add further workarounds to make the behavior more dynamic based
>> on the system, but realistically, there are close to zero remaining
>> users on any ARM11MPCore anyway, and nobody seems too interested in it,
>> compared to the more popular ARM1176 used in BMC2835 and AST2500.
>> 
>> The Oxnas platform has some minimal support in OpenWRT, but most of the
>> drivers and dts files never made it into the mainline kernel, while the
>> Arm Versatile/Realview platform mainly serves as a reference system but
>> is not necessary to be kept working once all other ARM11MPCore are gone.
>
> Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
>
> It's sad but it's the reality, there's no chance full OXNAS support will
> ever come upstream and no real work has been done for years.
>
> I think OXNAS support can be programmed for removal for next release,
> it would need significant work to rework current support to make it acceptable
> before trying to upstream missing bits anyway.

Ok, thanks for your reply!

To clarify, do you think we should plan for removal after the next
stable release (6.3, removed in 6.4), or after the next LTS
release (probably 6.6, removed in 6.7)? As far as I understand,
the next OpenWRT release (23.x) will be based on linux-5.15,
and the one after that (24.x) would likely still use 6.1, unless
they skip an LTS kernel.

     Arnd
Ard Biesheuvel March 30, 2023, 11:51 a.m. UTC | #4
On Mon, 27 Mar 2023 at 14:18, Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> The cache management operations for noncoherent DMA on ARMv6 work
> in two different ways:
>
>  * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
>    DMA buffers lead to data corruption when the prefetched data is written
>    back on top of data from the device.
>
>  * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
>    is not seen by the other core(s), leading to inconsistent contents
>    accross the system.
>
> As a consequence, neither configuration is actually safe to use in a
> general-purpose kernel that is used on both MPCore systems and ARM1176
> with prefetching enabled.
>
> We could add further workarounds to make the behavior more dynamic based
> on the system, but realistically, there are close to zero remaining
> users on any ARM11MPCore anyway, and nobody seems too interested in it,
> compared to the more popular ARM1176 used in BMC2835 and AST2500.
>
> The Oxnas platform has some minimal support in OpenWRT, but most of the
> drivers and dts files never made it into the mainline kernel, while the
> Arm Versatile/Realview platform mainly serves as a reference system but
> is not necessary to be kept working once all other ARM11MPCore are gone.
>
> Take the easy way out here and drop support for multiprocessing on
> ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
> management implementation for it. This also helps with other ARMv6
> issues, but for the moment leaves the ability to build a kernel that
> can run on both ARMv7 SMP and single-processor ARMv6, which we probably
> want to stop supporting as well, but not as part of this series.
>
> Cc: Neil Armstrong <neil.armstrong@linaro.org>
> Cc: Daniel Golle <daniel@makrotopia.org>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: linux-oxnas@groups.io
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Neil Armstrong March 30, 2023, 4:40 p.m. UTC | #5
Le 30/03/2023 à 12:03, Arnd Bergmann a écrit :
> On Thu, Mar 30, 2023, at 09:48, Neil Armstrong wrote:
>> On 27/03/2023 14:13, Arnd Bergmann wrote:
>>> From: Arnd Bergmann <arnd@arndb.de>
>>>
>>> The cache management operations for noncoherent DMA on ARMv6 work
>>> in two different ways:
>>>
>>>    * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
>>>      DMA buffers lead to data corruption when the prefetched data is written
>>>      back on top of data from the device.
>>>
>>>    * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
>>>      is not seen by the other core(s), leading to inconsistent contents
>>>      accross the system.
>>>
>>> As a consequence, neither configuration is actually safe to use in a
>>> general-purpose kernel that is used on both MPCore systems and ARM1176
>>> with prefetching enabled.
>>>
>>> We could add further workarounds to make the behavior more dynamic based
>>> on the system, but realistically, there are close to zero remaining
>>> users on any ARM11MPCore anyway, and nobody seems too interested in it,
>>> compared to the more popular ARM1176 used in BMC2835 and AST2500.
>>>
>>> The Oxnas platform has some minimal support in OpenWRT, but most of the
>>> drivers and dts files never made it into the mainline kernel, while the
>>> Arm Versatile/Realview platform mainly serves as a reference system but
>>> is not necessary to be kept working once all other ARM11MPCore are gone.
>>
>> Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
>>
>> It's sad but it's the reality, there's no chance full OXNAS support will
>> ever come upstream and no real work has been done for years.
>>
>> I think OXNAS support can be programmed for removal for next release,
>> it would need significant work to rework current support to make it acceptable
>> before trying to upstream missing bits anyway.
> 
> Ok, thanks for your reply!
> 
> To clarify, do you think we should plan for removal after the next
> stable release (6.3, removed in 6.4), or after the next LTS
> release (probably 6.6, removed in 6.7)? As far as I understand,
> the next OpenWRT release (23.x) will be based on linux-5.15,
> and the one after that (24.x) would likely still use 6.1, unless
> they skip an LTS kernel.

I think it's ok to remove it ASAP, or at least before the next LTS,
not having SMP makes the platform barely usable so the earliest is the best.

Neil

> 
>       Arnd
Catalin Marinas March 31, 2023, 5:09 p.m. UTC | #6
On Mon, Mar 27, 2023 at 02:13:14PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> The cache management operations for noncoherent DMA on ARMv6 work
> in two different ways:
> 
>  * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
>    DMA buffers lead to data corruption when the prefetched data is written
>    back on top of data from the device.
> 
>  * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
>    is not seen by the other core(s), leading to inconsistent contents
>    accross the system.
> 
> As a consequence, neither configuration is actually safe to use in a
> general-purpose kernel that is used on both MPCore systems and ARM1176
> with prefetching enabled.

As the author of this terrible hack (created under duress ;))

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

IIRC, RWFO is working in combination with the cache operations. Because
the cache maintenance broadcast did not happen, we forced the cache
lines to migrate to a CPU via a write (for ownership) and doing the
cache maintenance on that CPU (that was the FROM_DEVICE case). For the
TO_DEVICE case, reading on a CPU would cause dirty lines on another CPU
to be evicted (or migrated as dirty to the current CPU IIRC) then the
cache maintenance to clean them to PoC on the local CPU.

But there's always a small window between read/write for ownership and
the actual cache maintenance which can cause a cache line to migrate to
other CPUs if they do speculative prefetches. At the time ARM11MPCore
was deemed safe-ish but I haven't followed what later implementations
actually did (luckily we fixed the architecture in ARMv7).
diff mbox series

Patch

diff --git a/arch/arm/mach-oxnas/Kconfig b/arch/arm/mach-oxnas/Kconfig
index a9ded7079268..a054235c3d6c 100644
--- a/arch/arm/mach-oxnas/Kconfig
+++ b/arch/arm/mach-oxnas/Kconfig
@@ -28,10 +28,6 @@  config MACH_OX820
 	bool "Support OX820 Based Products"
 	depends on ARCH_MULTI_V6
 	select ARM_GIC
-	select DMA_CACHE_RWFO if SMP
-	select HAVE_SMP
-	select HAVE_ARM_SCU if SMP
-	select HAVE_ARM_TWD if SMP
 	help
 	  Include Support for the Oxford Semiconductor OX820 SoC Based Products.
 
diff --git a/arch/arm/mach-oxnas/Makefile b/arch/arm/mach-oxnas/Makefile
index 0e78ecfe6c49..a4e40e534e6a 100644
--- a/arch/arm/mach-oxnas/Makefile
+++ b/arch/arm/mach-oxnas/Makefile
@@ -1,2 +1 @@ 
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_SMP)		+= platsmp.o headsmp.o
diff --git a/arch/arm/mach-oxnas/headsmp.S b/arch/arm/mach-oxnas/headsmp.S
deleted file mode 100644
index 9c0f1479f33a..000000000000
--- a/arch/arm/mach-oxnas/headsmp.S
+++ /dev/null
@@ -1,23 +0,0 @@ 
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
- * Copyright (c) 2003 ARM Limited
- * All Rights Reserved
- */
-#include <linux/linkage.h>
-#include <linux/init.h>
-
-	__INIT
-
-/*
- * OX820 specific entry point for secondary CPUs.
- */
-ENTRY(ox820_secondary_startup)
-	mov r4, #0
-	/* invalidate both caches and branch target cache */
-	mcr p15, 0, r4, c7, c7, 0
-	/*
-	 * we've been released from the holding pen: secondary_stack
-	 * should now contain the SVC stack for this core
-	 */
-	b	secondary_startup
diff --git a/arch/arm/mach-oxnas/platsmp.c b/arch/arm/mach-oxnas/platsmp.c
deleted file mode 100644
index f0a50b9e61df..000000000000
--- a/arch/arm/mach-oxnas/platsmp.c
+++ /dev/null
@@ -1,96 +0,0 @@ 
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (C) 2016 Neil Armstrong <narmstrong@baylibre.com>
- * Copyright (C) 2013 Ma Haijun <mahaijuns@gmail.com>
- * Copyright (C) 2002 ARM Ltd.
- * All Rights Reserved
- */
-#include <linux/io.h>
-#include <linux/delay.h>
-#include <linux/of.h>
-#include <linux/of_address.h>
-
-#include <asm/cacheflush.h>
-#include <asm/cp15.h>
-#include <asm/smp_plat.h>
-#include <asm/smp_scu.h>
-
-extern void ox820_secondary_startup(void);
-
-static void __iomem *cpu_ctrl;
-static void __iomem *gic_cpu_ctrl;
-
-#define HOLDINGPEN_CPU_OFFSET		0xc8
-#define HOLDINGPEN_LOCATION_OFFSET	0xc4
-
-#define GIC_NCPU_OFFSET(cpu)		(0x100 + (cpu)*0x100)
-#define GIC_CPU_CTRL			0x00
-#define GIC_CPU_CTRL_ENABLE		1
-
-static int __init ox820_boot_secondary(unsigned int cpu,
-		struct task_struct *idle)
-{
-	/*
-	 * Write the address of secondary startup into the
-	 * system-wide flags register. The BootMonitor waits
-	 * until it receives a soft interrupt, and then the
-	 * secondary CPU branches to this address.
-	 */
-	writel(virt_to_phys(ox820_secondary_startup),
-			cpu_ctrl + HOLDINGPEN_LOCATION_OFFSET);
-
-	writel(cpu, cpu_ctrl + HOLDINGPEN_CPU_OFFSET);
-
-	/*
-	 * Enable GIC cpu interface in CPU Interface Control Register
-	 */
-	writel(GIC_CPU_CTRL_ENABLE,
-		gic_cpu_ctrl + GIC_NCPU_OFFSET(cpu) + GIC_CPU_CTRL);
-
-	/*
-	 * Send the secondary CPU a soft interrupt, thereby causing
-	 * the boot monitor to read the system wide flags register,
-	 * and branch to the address found there.
-	 */
-	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
-
-	return 0;
-}
-
-static void __init ox820_smp_prepare_cpus(unsigned int max_cpus)
-{
-	struct device_node *np;
-	void __iomem *scu_base;
-
-	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-scu");
-	scu_base = of_iomap(np, 0);
-	of_node_put(np);
-	if (!scu_base)
-		return;
-
-	/* Remap CPU Interrupt Interface Registers */
-	np = of_find_compatible_node(NULL, NULL, "arm,arm11mp-gic");
-	gic_cpu_ctrl = of_iomap(np, 1);
-	of_node_put(np);
-	if (!gic_cpu_ctrl)
-		goto unmap_scu;
-
-	np = of_find_compatible_node(NULL, NULL, "oxsemi,ox820-sys-ctrl");
-	cpu_ctrl = of_iomap(np, 0);
-	of_node_put(np);
-	if (!cpu_ctrl)
-		goto unmap_scu;
-
-	scu_enable(scu_base);
-	flush_cache_all();
-
-unmap_scu:
-	iounmap(scu_base);
-}
-
-static const struct smp_operations ox820_smp_ops __initconst = {
-	.smp_prepare_cpus	= ox820_smp_prepare_cpus,
-	.smp_boot_secondary	= ox820_boot_secondary,
-};
-
-CPU_METHOD_OF_DECLARE(ox820_smp, "oxsemi,ox820-smp", &ox820_smp_ops);
diff --git a/arch/arm/mach-versatile/platsmp-realview.c b/arch/arm/mach-versatile/platsmp-realview.c
index 5d363385c801..fa31fd2d211d 100644
--- a/arch/arm/mach-versatile/platsmp-realview.c
+++ b/arch/arm/mach-versatile/platsmp-realview.c
@@ -18,16 +18,12 @@ 
 #define REALVIEW_SYS_FLAGSSET_OFFSET	0x30
 
 static const struct of_device_id realview_scu_match[] = {
-	{ .compatible = "arm,arm11mp-scu", },
 	{ .compatible = "arm,cortex-a9-scu", },
 	{ .compatible = "arm,cortex-a5-scu", },
 	{ }
 };
 
 static const struct of_device_id realview_syscon_match[] = {
-        { .compatible = "arm,core-module-integrator", },
-        { .compatible = "arm,realview-eb-syscon", },
-        { .compatible = "arm,realview-pb11mp-syscon", },
         { .compatible = "arm,realview-pbx-syscon", },
         { },
 };
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index c5bbae86f725..16b62bc0a970 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -937,25 +937,6 @@  config VDSO
 	  You must have glibc 2.22 or later for programs to seamlessly
 	  take advantage of this.
 
-config DMA_CACHE_RWFO
-	bool "Enable read/write for ownership DMA cache maintenance"
-	depends on CPU_V6K && SMP
-	default y
-	help
-	  The Snoop Control Unit on ARM11MPCore does not detect the
-	  cache maintenance operations and the dma_{map,unmap}_area()
-	  functions may leave stale cache entries on other CPUs. By
-	  enabling this option, Read or Write For Ownership in the ARMv6
-	  DMA cache maintenance functions is performed. These LDR/STR
-	  instructions change the cache line state to shared or modified
-	  so that the cache operation has the desired effect.
-
-	  Note that the workaround is only valid on processors that do
-	  not perform speculative loads into the D-cache. For such
-	  processors, if cache maintenance operations are not broadcast
-	  in hardware, other workarounds are needed (e.g. cache
-	  maintenance broadcasting in software via FIQ).
-
 config OUTER_CACHE
 	bool
 
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index abae7ff5defc..f6ee53c1de20 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -201,10 +201,6 @@  ENTRY(v6_flush_kern_dcache_area)
  *	- end     - virtual end address of region
  */
 ENTRY(v6_dma_inv_range)
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrb	r2, [r0]			@ read for ownership
-	strb	r2, [r0]			@ write for ownership
-#endif
 	tst	r0, #D_CACHE_LINE_SIZE - 1
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 #ifdef HARVARD_CACHE
@@ -213,10 +209,6 @@  ENTRY(v6_dma_inv_range)
 	mcrne	p15, 0, r0, c7, c11, 1		@ clean unified line
 #endif
 	tst	r1, #D_CACHE_LINE_SIZE - 1
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrbne	r2, [r1, #-1]			@ read for ownership
-	strbne	r2, [r1, #-1]			@ write for ownership
-#endif
 	bic	r1, r1, #D_CACHE_LINE_SIZE - 1
 #ifdef HARVARD_CACHE
 	mcrne	p15, 0, r1, c7, c14, 1		@ clean & invalidate D line
@@ -231,10 +223,6 @@  ENTRY(v6_dma_inv_range)
 #endif
 	add	r0, r0, #D_CACHE_LINE_SIZE
 	cmp	r0, r1
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrlo	r2, [r0]			@ read for ownership
-	strlo	r2, [r0]			@ write for ownership
-#endif
 	blo	1b
 	mov	r0, #0
 	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
@@ -248,9 +236,6 @@  ENTRY(v6_dma_inv_range)
 ENTRY(v6_dma_clean_range)
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 1:
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldr	r2, [r0]			@ read for ownership
-#endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c10, 1		@ clean D line
 #else
@@ -269,10 +254,6 @@  ENTRY(v6_dma_clean_range)
  *	- end     - virtual end address of region
  */
 ENTRY(v6_dma_flush_range)
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrb	r2, [r0]		@ read for ownership
-	strb	r2, [r0]		@ write for ownership
-#endif
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 1:
 #ifdef HARVARD_CACHE
@@ -282,10 +263,6 @@  ENTRY(v6_dma_flush_range)
 #endif
 	add	r0, r0, #D_CACHE_LINE_SIZE
 	cmp	r0, r1
-#ifdef CONFIG_DMA_CACHE_RWFO
-	ldrblo	r2, [r0]			@ read for ownership
-	strblo	r2, [r0]			@ write for ownership
-#endif
 	blo	1b
 	mov	r0, #0
 	mcr	p15, 0, r0, c7, c10, 4		@ drain write buffer
@@ -301,13 +278,7 @@  ENTRY(v6_dma_map_area)
 	add	r1, r1, r0
 	teq	r2, #DMA_FROM_DEVICE
 	beq	v6_dma_inv_range
-#ifndef CONFIG_DMA_CACHE_RWFO
 	b	v6_dma_clean_range
-#else
-	teq	r2, #DMA_TO_DEVICE
-	beq	v6_dma_clean_range
-	b	v6_dma_flush_range
-#endif
 ENDPROC(v6_dma_map_area)
 
 /*
@@ -317,11 +288,9 @@  ENDPROC(v6_dma_map_area)
  *	- dir	- DMA direction
  */
 ENTRY(v6_dma_unmap_area)
-#ifndef CONFIG_DMA_CACHE_RWFO
 	add	r1, r1, r0
 	teq	r2, #DMA_TO_DEVICE
 	bne	v6_dma_inv_range
-#endif
 	ret	lr
 ENDPROC(v6_dma_unmap_area)