Message ID | 20240430-arm32-cfi-topping-v1-1-4cb6a15552aa@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ARM: Use conditionals for CFI branches | expand |
On Tue, 30 Apr 2024 at 10:26, Linus Walleij <linus.walleij@linaro.org> wrote: > > Commit 9385/2 introduced a few branches inside function > prototypes when using CFI in order to deal with the situation > where CFI inserts a few bytes of function information in front > of the symbol. > > This is not good for older CPUs where every cycle counts. > > Commit 9386/2 alleviated the situation a bit by using aliases > for the cache functions with identical signatures. > > This leaves the coherent cache flush functions > *_coherent_kern_range() with these branches to the corresponing > *_coherent_user_range() around, since their return type differ and > they therefore cannot be aliased. > > Solve this by a simple ifdef so at least we can use fallthroughs > when compiling without CFI enabled. > > Suggested-by: Ard Biesheuvel <ardb@kernel.org> > Link: https://lore.kernel.org/linux-arm-kernel/Zi+e9M%2Ff5b%2FSto9H@shell.armlinux.org.uk/ > Signed-off-by: Linus Walleij <linus.walleij@linaro.org> > --- > arch/arm/mm/cache-fa.S | 2 ++ > arch/arm/mm/cache-v4.S | 2 ++ > arch/arm/mm/cache-v4wb.S | 4 ++++ > arch/arm/mm/cache-v4wt.S | 2 ++ > arch/arm/mm/cache-v6.S | 2 ++ > arch/arm/mm/cache-v7.S | 2 ++ > arch/arm/mm/cache-v7m.S | 2 ++ > arch/arm/mm/proc-arm1020.S | 2 ++ > arch/arm/mm/proc-arm1020e.S | 2 ++ > arch/arm/mm/proc-arm1022.S | 2 ++ > arch/arm/mm/proc-arm1026.S | 2 ++ > arch/arm/mm/proc-arm920.S | 2 ++ > arch/arm/mm/proc-arm922.S | 2 ++ > arch/arm/mm/proc-arm925.S | 2 ++ > arch/arm/mm/proc-arm926.S | 2 ++ > arch/arm/mm/proc-arm940.S | 2 ++ > arch/arm/mm/proc-arm946.S | 2 ++ > arch/arm/mm/proc-feroceon.S | 2 ++ > arch/arm/mm/proc-mohawk.S | 2 ++ > arch/arm/mm/proc-xsc3.S | 2 ++ > 20 files changed, 42 insertions(+) > > diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S > index db454033b76f..4a3668b52a2d 100644 > --- a/arch/arm/mm/cache-fa.S > +++ b/arch/arm/mm/cache-fa.S > @@ -112,7 +112,9 @@ SYM_FUNC_END(fa_flush_user_cache_range) > * - end - virtual end address > */ > SYM_TYPED_FUNC_START(fa_coherent_kern_range) > +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ These functions are only called indirectly if MULTI_CACHE is enabled, right? If so, this could be #if defined(CONFIG_CFI_CLANG) && defined(MULTI_CACHE) In any case, with or without that tweak Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
On Tue, Apr 30, 2024 at 11:18:55AM +0200, Ard Biesheuvel wrote: > On Tue, 30 Apr 2024 at 10:26, Linus Walleij <linus.walleij@linaro.org> wrote: > > diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S > > index db454033b76f..4a3668b52a2d 100644 > > --- a/arch/arm/mm/cache-fa.S > > +++ b/arch/arm/mm/cache-fa.S > > @@ -112,7 +112,9 @@ SYM_FUNC_END(fa_flush_user_cache_range) > > * - end - virtual end address > > */ > > SYM_TYPED_FUNC_START(fa_coherent_kern_range) > > +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ > > These functions are only called indirectly if MULTI_CACHE is enabled, > right? If so, this could be > > #if defined(CONFIG_CFI_CLANG) && defined(MULTI_CACHE) I don't see that makes any difference. Whether or not they're called indirectly, the symbol is the entry point to the function. If called directly and the useless branch is there, we'll incur the overhead of the BL instruction flushing the pipeline followed immediately by the overhead of the B instruction flushing the pipeline again.
On Tue, 30 Apr 2024 at 11:23, Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > On Tue, Apr 30, 2024 at 11:18:55AM +0200, Ard Biesheuvel wrote: > > On Tue, 30 Apr 2024 at 10:26, Linus Walleij <linus.walleij@linaro.org> wrote: > > > diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S > > > index db454033b76f..4a3668b52a2d 100644 > > > --- a/arch/arm/mm/cache-fa.S > > > +++ b/arch/arm/mm/cache-fa.S > > > @@ -112,7 +112,9 @@ SYM_FUNC_END(fa_flush_user_cache_range) > > > * - end - virtual end address > > > */ > > > SYM_TYPED_FUNC_START(fa_coherent_kern_range) > > > +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ > > > > These functions are only called indirectly if MULTI_CACHE is enabled, > > right? If so, this could be > > > > #if defined(CONFIG_CFI_CLANG) && defined(MULTI_CACHE) > > I don't see that makes any difference. Whether or not they're called > indirectly, the symbol is the entry point to the function. If called > directly and the useless branch is there, we'll incur the overhead of > the BL instruction flushing the pipeline followed immediately by the > overhead of the B instruction flushing the pipeline again. That is not what I meant. I meant that, if you decide to enable CFI clang for these targets, you still only need the branch if MULTI_CACHE is enabled. IOW, we can avoid the branch entirely in even more situations.
On Tue, Apr 30, 2024 at 11:19 AM Ard Biesheuvel <ardb@kernel.org> wrote: > These functions are only called indirectly if MULTI_CACHE is enabled, > right? If so, this could be > > #if defined(CONFIG_CFI_CLANG) && defined(MULTI_CACHE) Hm Sami may know better here, but with !MULTI_CACHE the functions are indeed called directly, but does that mean the compiler will not emit some magic bytes in front of these symbols then? Yours, Linus Walleij
On Tue, 30 Apr 2024 at 13:45, Linus Walleij <linus.walleij@linaro.org> wrote: > > On Tue, Apr 30, 2024 at 11:19 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > These functions are only called indirectly if MULTI_CACHE is enabled, > > right? If so, this could be > > > > #if defined(CONFIG_CFI_CLANG) && defined(MULTI_CACHE) > > Hm Sami may know better here, but with !MULTI_CACHE the > functions are indeed called directly, but does that mean the > compiler will not emit some magic bytes in front of these > symbols then? > Ah yes, of course. So that wouldn't work. So the alternative might be #if defined(CONFIG_CFI_CLANG) && defined(MULTI_CACHE) SYM_TYPED_FUNC_START(v4_flush_kern_dcache_area) b v4_dma_flush_range SYM_FUNC_END(v4_flush_kern_dcache_area) #else SYM_FUNC_ALIAS(v4_flush_kern_dcache_area, v4_dma_flush_range) #endif Of course, unless anyone has an interest in building these older platforms with Clang and CFI, it doesn't really make a difference, so perhaps we should just go with your version.
diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S index db454033b76f..4a3668b52a2d 100644 --- a/arch/arm/mm/cache-fa.S +++ b/arch/arm/mm/cache-fa.S @@ -112,7 +112,9 @@ SYM_FUNC_END(fa_flush_user_cache_range) * - end - virtual end address */ SYM_TYPED_FUNC_START(fa_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b fa_coherent_user_range +#endif SYM_FUNC_END(fa_coherent_kern_range) /* diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S index 0df97a610026..0e94e5193dbd 100644 --- a/arch/arm/mm/cache-v4.S +++ b/arch/arm/mm/cache-v4.S @@ -104,7 +104,9 @@ SYM_FUNC_END(v4_coherent_user_range) * - size - region size */ SYM_TYPED_FUNC_START(v4_flush_kern_dcache_area) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b v4_dma_flush_range +#endif SYM_FUNC_END(v4_flush_kern_dcache_area) /* diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S index 1912f559968c..ce55a2eef5da 100644 --- a/arch/arm/mm/cache-v4wb.S +++ b/arch/arm/mm/cache-v4wb.S @@ -136,7 +136,9 @@ SYM_FUNC_END(v4wb_flush_user_cache_range) */ SYM_TYPED_FUNC_START(v4wb_flush_kern_dcache_area) add r1, r0, r1 +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b v4wb_coherent_user_range +#endif SYM_FUNC_END(v4wb_flush_kern_dcache_area) /* @@ -150,7 +152,9 @@ SYM_FUNC_END(v4wb_flush_kern_dcache_area) * - end - virtual end address */ SYM_TYPED_FUNC_START(v4wb_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b v4wb_coherent_user_range +#endif SYM_FUNC_END(v4wb_coherent_kern_range) /* diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S index 43b4275ab680..a97dc267b3b0 100644 --- a/arch/arm/mm/cache-v4wt.S +++ b/arch/arm/mm/cache-v4wt.S @@ -108,7 +108,9 @@ SYM_FUNC_END(v4wt_flush_user_cache_range) * - end - virtual end address */ SYM_TYPED_FUNC_START(v4wt_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b v4wt_coherent_user_range +#endif SYM_FUNC_END(v4wt_coherent_kern_range) /* diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S index 86affd60d6d4..9f415476e218 100644 --- a/arch/arm/mm/cache-v6.S +++ b/arch/arm/mm/cache-v6.S @@ -117,7 +117,9 @@ SYM_FUNC_END(v6_flush_user_cache_range) * - the Icache does not read data from the write buffer */ SYM_TYPED_FUNC_START(v6_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b v6_coherent_user_range +#endif SYM_FUNC_END(v6_coherent_kern_range) /* diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S index 170b9ac72331..201ca05436fa 100644 --- a/arch/arm/mm/cache-v7.S +++ b/arch/arm/mm/cache-v7.S @@ -261,7 +261,9 @@ SYM_FUNC_END(v7_flush_user_cache_range) * - the Icache does not read data from the write buffer */ SYM_TYPED_FUNC_START(v7_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b v7_coherent_user_range +#endif SYM_FUNC_END(v7_coherent_kern_range) /* diff --git a/arch/arm/mm/cache-v7m.S b/arch/arm/mm/cache-v7m.S index 4e670697eabc..14d719eba729 100644 --- a/arch/arm/mm/cache-v7m.S +++ b/arch/arm/mm/cache-v7m.S @@ -286,7 +286,9 @@ SYM_FUNC_END(v7m_flush_user_cache_range) * - the Icache does not read data from the write buffer */ SYM_TYPED_FUNC_START(v7m_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b v7m_coherent_user_range +#endif SYM_FUNC_END(v7m_coherent_kern_range) /* diff --git a/arch/arm/mm/proc-arm1020.S b/arch/arm/mm/proc-arm1020.S index ddda04929dae..d0ce3414a13e 100644 --- a/arch/arm/mm/proc-arm1020.S +++ b/arch/arm/mm/proc-arm1020.S @@ -203,7 +203,9 @@ SYM_FUNC_END(arm1020_flush_user_cache_range) * - end - virtual end address */ SYM_TYPED_FUNC_START(arm1020_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b arm1020_coherent_user_range +#endif SYM_FUNC_END(arm1020_coherent_kern_range) /* diff --git a/arch/arm/mm/proc-arm1020e.S b/arch/arm/mm/proc-arm1020e.S index 60169f4ca391..64f031bf6eff 100644 --- a/arch/arm/mm/proc-arm1020e.S +++ b/arch/arm/mm/proc-arm1020e.S @@ -200,7 +200,9 @@ SYM_FUNC_END(arm1020e_flush_user_cache_range) * - end - virtual end address */ SYM_TYPED_FUNC_START(arm1020e_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b arm1020e_coherent_user_range +#endif SYM_FUNC_END(arm1020e_coherent_kern_range) /* diff --git a/arch/arm/mm/proc-arm1022.S b/arch/arm/mm/proc-arm1022.S index 30b0e4d05a05..42ed5ed07252 100644 --- a/arch/arm/mm/proc-arm1022.S +++ b/arch/arm/mm/proc-arm1022.S @@ -199,7 +199,9 @@ SYM_FUNC_END(arm1022_flush_user_cache_range) * - end - virtual end address */ SYM_TYPED_FUNC_START(arm1022_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b arm1022_coherent_user_range +#endif SYM_FUNC_END(arm1022_coherent_kern_range) /* diff --git a/arch/arm/mm/proc-arm1026.S b/arch/arm/mm/proc-arm1026.S index ae06262ca779..b3ae62cd553a 100644 --- a/arch/arm/mm/proc-arm1026.S +++ b/arch/arm/mm/proc-arm1026.S @@ -194,7 +194,9 @@ SYM_FUNC_END(arm1026_flush_user_cache_range) * - end - virtual end address */ SYM_TYPED_FUNC_START(arm1026_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b arm1026_coherent_user_range +#endif SYM_FUNC_END(arm1026_coherent_kern_range) /* diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S index e47411243b08..a30df54ad5fa 100644 --- a/arch/arm/mm/proc-arm920.S +++ b/arch/arm/mm/proc-arm920.S @@ -180,7 +180,9 @@ SYM_FUNC_END(arm920_flush_user_cache_range) * - end - virtual end address */ SYM_TYPED_FUNC_START(arm920_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b arm920_coherent_user_range +#endif SYM_FUNC_END(arm920_coherent_kern_range) /* diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S index 092f8b7656a7..aac4e048100d 100644 --- a/arch/arm/mm/proc-arm922.S +++ b/arch/arm/mm/proc-arm922.S @@ -182,7 +182,9 @@ SYM_FUNC_END(arm922_flush_user_cache_range) * - end - virtual end address */ SYM_TYPED_FUNC_START(arm922_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b arm922_coherent_user_range +#endif SYM_FUNC_END(arm922_coherent_kern_range) /* diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S index a06039a3b2a8..035941faeb2e 100644 --- a/arch/arm/mm/proc-arm925.S +++ b/arch/arm/mm/proc-arm925.S @@ -229,7 +229,9 @@ SYM_FUNC_END(arm925_flush_user_cache_range) * - end - virtual end address */ SYM_TYPED_FUNC_START(arm925_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b arm925_coherent_user_range +#endif SYM_FUNC_END(arm925_coherent_kern_range) /* diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S index 2c8b93d446a9..6f43d6af2d9a 100644 --- a/arch/arm/mm/proc-arm926.S +++ b/arch/arm/mm/proc-arm926.S @@ -192,7 +192,9 @@ SYM_FUNC_END(arm926_flush_user_cache_range) * - end - virtual end address */ SYM_TYPED_FUNC_START(arm926_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b arm926_coherent_user_range +#endif SYM_FUNC_END(arm926_coherent_kern_range) /* diff --git a/arch/arm/mm/proc-arm940.S b/arch/arm/mm/proc-arm940.S index fbe168213ec1..0d30bb25c42b 100644 --- a/arch/arm/mm/proc-arm940.S +++ b/arch/arm/mm/proc-arm940.S @@ -153,7 +153,9 @@ SYM_FUNC_END(arm940_coherent_kern_range) * - end - virtual end address */ SYM_TYPED_FUNC_START(arm940_coherent_user_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b arm940_flush_kern_dcache_area +#endif SYM_FUNC_END(arm940_coherent_user_range) /* diff --git a/arch/arm/mm/proc-arm946.S b/arch/arm/mm/proc-arm946.S index 4772b46064e6..27750ace2ced 100644 --- a/arch/arm/mm/proc-arm946.S +++ b/arch/arm/mm/proc-arm946.S @@ -173,7 +173,9 @@ SYM_FUNC_END(arm946_flush_user_cache_range) * - end - virtual end address */ SYM_TYPED_FUNC_START(arm946_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b arm946_coherent_user_range +#endif SYM_FUNC_END(arm946_coherent_kern_range) /* diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S index 8519ff60e512..f67b2ffac854 100644 --- a/arch/arm/mm/proc-feroceon.S +++ b/arch/arm/mm/proc-feroceon.S @@ -208,7 +208,9 @@ SYM_FUNC_END(feroceon_flush_user_cache_range) */ .align 5 SYM_TYPED_FUNC_START(feroceon_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b feroceon_coherent_user_range +#endif SYM_FUNC_END(feroceon_coherent_kern_range) /* diff --git a/arch/arm/mm/proc-mohawk.S b/arch/arm/mm/proc-mohawk.S index 091f6c6719a8..8e9f38da863a 100644 --- a/arch/arm/mm/proc-mohawk.S +++ b/arch/arm/mm/proc-mohawk.S @@ -163,7 +163,9 @@ SYM_FUNC_END(mohawk_flush_user_cache_range) * - end - virtual end address */ SYM_TYPED_FUNC_START(mohawk_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b mohawk_coherent_user_range +#endif SYM_FUNC_END(mohawk_coherent_kern_range) /* diff --git a/arch/arm/mm/proc-xsc3.S b/arch/arm/mm/proc-xsc3.S index f4889911eca2..14927b380452 100644 --- a/arch/arm/mm/proc-xsc3.S +++ b/arch/arm/mm/proc-xsc3.S @@ -223,7 +223,9 @@ SYM_FUNC_END(xsc3_flush_user_cache_range) * it also trashes the mini I-cache used by JTAG debuggers. */ SYM_TYPED_FUNC_START(xsc3_coherent_kern_range) +#ifdef CONFIG_CFI_CLANG /* Fallthrough if !CFI */ b xsc3_coherent_user_range +#endif SYM_FUNC_END(xsc3_coherent_kern_range) SYM_TYPED_FUNC_START(xsc3_coherent_user_range)
Commit 9385/2 introduced a few branches inside function prototypes when using CFI in order to deal with the situation where CFI inserts a few bytes of function information in front of the symbol. This is not good for older CPUs where every cycle counts. Commit 9386/2 alleviated the situation a bit by using aliases for the cache functions with identical signatures. This leaves the coherent cache flush functions *_coherent_kern_range() with these branches to the corresponing *_coherent_user_range() around, since their return type differ and they therefore cannot be aliased. Solve this by a simple ifdef so at least we can use fallthroughs when compiling without CFI enabled. Suggested-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/linux-arm-kernel/Zi+e9M%2Ff5b%2FSto9H@shell.armlinux.org.uk/ Signed-off-by: Linus Walleij <linus.walleij@linaro.org> --- arch/arm/mm/cache-fa.S | 2 ++ arch/arm/mm/cache-v4.S | 2 ++ arch/arm/mm/cache-v4wb.S | 4 ++++ arch/arm/mm/cache-v4wt.S | 2 ++ arch/arm/mm/cache-v6.S | 2 ++ arch/arm/mm/cache-v7.S | 2 ++ arch/arm/mm/cache-v7m.S | 2 ++ arch/arm/mm/proc-arm1020.S | 2 ++ arch/arm/mm/proc-arm1020e.S | 2 ++ arch/arm/mm/proc-arm1022.S | 2 ++ arch/arm/mm/proc-arm1026.S | 2 ++ arch/arm/mm/proc-arm920.S | 2 ++ arch/arm/mm/proc-arm922.S | 2 ++ arch/arm/mm/proc-arm925.S | 2 ++ arch/arm/mm/proc-arm926.S | 2 ++ arch/arm/mm/proc-arm940.S | 2 ++ arch/arm/mm/proc-arm946.S | 2 ++ arch/arm/mm/proc-feroceon.S | 2 ++ arch/arm/mm/proc-mohawk.S | 2 ++ arch/arm/mm/proc-xsc3.S | 2 ++ 20 files changed, 42 insertions(+) --- base-commit: b42bf708a0c9d6c7303bd623a6b615b6d599dc75 change-id: 20240430-arm32-cfi-topping-3253b11d3600 Best regards,