[v5,00/10] CFI for ARM32 using LLVM

Message ID 20240415-arm32-cfi-v5-0-ff11093eeccc@linaro.org (mailing list archive)

Linus Walleij April 15, 2024, 1:43 p.m. UTC
This is a first patch set to support Clang CFI (Control Flow
Integrity) on ARM32.

For information about what CFI is, see:
https://clang.llvm.org/docs/ControlFlowIntegrity.html

For the kernel KCFI flavor, see:
https://lwn.net/Articles/898040/

The base changes required to bring up KCFI on ARM32 were mostly
related to the kernel's use of custom vtables, combined with
defines that call these vtable members directly from the sites
where they are used.

We annotate all assembly functions that are called directly from
C with SYM_TYPED_FUNC_START()/SYM_FUNC_END() so it is easy to
see, while reading the assembly, that these functions are called
from C and can have CFI prototype information prefixed to them.

As the prototype prefix information is just some seemingly random
bytes, it is not possible to "fall through" into an assembly
function tagged with SYM_TYPED_FUNC_START(): there will be binary
noise in front of the function, so this design pattern needs to
be explicitly avoided at each site where it occurred.

The approach to binding the calls to C is two-fold:

- Either convert the affected vtable struct to C and provide
  per-CPU prototypes for all the calls (done for TLB, cache)
  or:

- Provide prototypes in special CFI-only files and mark all
  these functions as addressable.

The permissive mode handles the new breakpoint type (0x03) that
Clang emits.

To runtime-test the patches:
- Enable CONFIG_LKDTM
- echo CFI_FORWARD_PROTO > /sys/kernel/debug/provoke-crash/DIRECT

The patch set has been booted to userspace on the following
test platforms:

- Arm Versatile (QEMU)
- Arm Versatile Express (QEMU)
- multi_v7 booted on Versatile Express (QEMU)
- Footbridge Netwinder (SA110 ARMv4)
- Ux500 (ARMv7 SMP)
- Gemini (FA526)

I am not saying there will not be corner cases that we need
to fix in addition to this, but it is enough to get started.
Looking at what was fixed for arm64 I am a bit wary that
e.g. BPF might need something to trampoline properly.

But hopefully people can get to testing it and help me fix
remaining issues before the final version, or we can fix it
in-tree.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
Changes in v5:
- I started to put the patches into the patch tracker and it
  rightfully complained that the patches tagging all assembly
  with CFI symbol type macros and adding C prototypes were
  too large.
- Split the two patches annotating assembly into one patch
  doing the annotation and one patch adding the C prototypes.
  This is a good split anyway.
- The first patches from the series are unchanged and in the
  patch tracker; I resend them anyway and will soon populate
  the patch tracker with the split patches from this
  series unless there are more comments.
- Link to v4: https://lore.kernel.org/r/20240328-arm32-cfi-v4-0-a11046139125@linaro.org

Changes in v4:
- Rebase on v6.9-rc1
- Use Ard's patch for converting TLB operation vtables to C
- Rewrite the cache vtables in C and use SYM_TYPED_FUNC_START() in the
  assembly to make CFI work all the way down.
- Instead of tagging all the delay functions as __nocfi get to the
  root cause and annotate the loop delay code with SYM_TYPED_FUNC_START()
  and rewrite it using explicit branches so we get CFI all the way
  down.
- Drop the patch turning highmem page accesses into static inlines:
  this was probably a development artifact since this code does
  a lot of cache and TLB flushing, and that assembly is now properly
  annotated.
- Do not define static inlines tagged __nocfi for all the proc functions,
  instead provide proper C prototypes in a separate CFI-only file
  and make these explicitly addressable.
- Link to v3: https://lore.kernel.org/r/20240311-arm32-cfi-v3-0-224a0f0a45c2@linaro.org

Changes in v3:
- Use report_cfi_failure() like everyone else in the breakpoint
  handler.
- I think we cannot implement target and type for the report callback
  without operand bundling compiler extensions, so just leaving these as zero.
- Link to v2: https://lore.kernel.org/r/20240307-arm32-cfi-v2-0-cc74ea0306b3@linaro.org

Changes in v2:
- Add the missing ftrace graph tracer stub.
- Enable permissive mode using a breakpoint handler.
- Link to v1: https://lore.kernel.org/r/20240225-arm32-cfi-v1-0-6943306f065b@linaro.org

---
Ard Biesheuvel (1):
      ARM: mm: Make tlbflush routines CFI safe

Linus Walleij (9):
      ARM: bugs: Check in the vtable instead of defined aliases
      ARM: ftrace: Define ftrace_stub_graph
      ARM: mm: Type-annotate all cache assembly routines
      ARM: mm: Rewrite cacheflush vtables in CFI safe C
      ARM: mm: Type-annotate all per-processor assembly routines
      ARM: mm: Define prototypes for all per-processor calls
      ARM: lib: Annotate loop delay instructions for CFI
      ARM: hw_breakpoint: Handle CFI breakpoints
      ARM: Support CLANG CFI

 arch/arm/Kconfig                     |   1 +
 arch/arm/include/asm/glue-cache.h    |  28 +-
 arch/arm/include/asm/hw_breakpoint.h |   1 +
 arch/arm/kernel/bugs.c               |   2 +-
 arch/arm/kernel/entry-ftrace.S       |   4 +
 arch/arm/kernel/hw_breakpoint.c      |  30 ++
 arch/arm/lib/delay-loop.S            |  16 +-
 arch/arm/mm/Makefile                 |   3 +
 arch/arm/mm/cache-b15-rac.c          |   1 +
 arch/arm/mm/cache-fa.S               |  47 +--
 arch/arm/mm/cache-nop.S              |  57 +--
 arch/arm/mm/cache-v4.S               |  55 +--
 arch/arm/mm/cache-v4wb.S             |  47 ++-
 arch/arm/mm/cache-v4wt.S             |  55 +--
 arch/arm/mm/cache-v6.S               |  49 ++-
 arch/arm/mm/cache-v7.S               |  74 ++--
 arch/arm/mm/cache-v7m.S              |  53 ++-
 arch/arm/mm/cache.c                  | 663 +++++++++++++++++++++++++++++++++++
 arch/arm/mm/proc-arm1020.S           |  69 ++--
 arch/arm/mm/proc-arm1020e.S          |  70 ++--
 arch/arm/mm/proc-arm1022.S           |  69 ++--
 arch/arm/mm/proc-arm1026.S           |  70 ++--
 arch/arm/mm/proc-arm720.S            |  25 +-
 arch/arm/mm/proc-arm740.S            |  26 +-
 arch/arm/mm/proc-arm7tdmi.S          |  34 +-
 arch/arm/mm/proc-arm920.S            |  76 ++--
 arch/arm/mm/proc-arm922.S            |  69 ++--
 arch/arm/mm/proc-arm925.S            |  66 ++--
 arch/arm/mm/proc-arm926.S            |  75 ++--
 arch/arm/mm/proc-arm940.S            |  69 ++--
 arch/arm/mm/proc-arm946.S            |  65 ++--
 arch/arm/mm/proc-arm9tdmi.S          |  26 +-
 arch/arm/mm/proc-fa526.S             |  24 +-
 arch/arm/mm/proc-feroceon.S          | 105 +++---
 arch/arm/mm/proc-macros.S            |  33 --
 arch/arm/mm/proc-mohawk.S            |  74 ++--
 arch/arm/mm/proc-sa110.S             |  23 +-
 arch/arm/mm/proc-sa1100.S            |  31 +-
 arch/arm/mm/proc-v6.S                |  31 +-
 arch/arm/mm/proc-v7-2level.S         |   8 +-
 arch/arm/mm/proc-v7-3level.S         |   8 +-
 arch/arm/mm/proc-v7-bugs.c           |   4 +-
 arch/arm/mm/proc-v7.S                |  66 ++--
 arch/arm/mm/proc-v7m.S               |  41 +--
 arch/arm/mm/proc-xsc3.S              |  75 ++--
 arch/arm/mm/proc-xscale.S            | 127 +++----
 arch/arm/mm/proc.c                   | 500 ++++++++++++++++++++++++++
 arch/arm/mm/tlb-fa.S                 |  12 +-
 arch/arm/mm/tlb-v4.S                 |  15 +-
 arch/arm/mm/tlb-v4wb.S               |  12 +-
 arch/arm/mm/tlb-v4wbi.S              |  12 +-
 arch/arm/mm/tlb-v6.S                 |  12 +-
 arch/arm/mm/tlb-v7.S                 |  14 +-
 arch/arm/mm/tlb.c                    |  84 +++++
 54 files changed, 2334 insertions(+), 972 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240115-arm32-cfi-65d60f201108

Best regards,

Comments

Ard Biesheuvel April 15, 2024, 4:35 p.m. UTC | #1
Hi Linus,

On Mon, 15 Apr 2024 at 15:43, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> Tag all references to assembly functions with SYM_TYPED_FUNC_START()
> and SYM_FUNC_END() so they also become CFI-safe.
>
> When we add SYM_TYPED_FUNC_START() to assembly functions, a function
> prototype signature will be emitted into the object file at
> (pc-4), i.e. right before the function entry point, so that the
> KCFI runtime check at the call site can compare it to the expected
> value. Example:
>
> 8011ae38:       a540670c        .word   0xa540670c
>
> 8011ae3c <v7_flush_icache_all>:
> 8011ae3c:       e3a00000        mov     r0, #0
> 8011ae40:       ee070f11        mcr     15, 0, r0, cr7, cr1, {0}
> 8011ae44:       e12fff1e        bx      lr
>
> This means no "fallthrough" code can enter a SYM_TYPED_FUNC_START()
> call from above it: there will be a function prototype signature
> there, so those are consistently converted to a branch or ret lr
> depending on context.
>
> Tested-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
>  arch/arm/mm/cache-fa.S      | 39 +++++++++++++++++++++-------------
>  arch/arm/mm/cache-nop.S     | 51 ++++++++++++++++++++++++++-------------------
>  arch/arm/mm/cache-v4.S      | 47 ++++++++++++++++++++++++-----------------
>  arch/arm/mm/cache-v4wb.S    | 39 ++++++++++++++++++++--------------
>  arch/arm/mm/cache-v4wt.S    | 47 ++++++++++++++++++++++++-----------------
>  arch/arm/mm/cache-v6.S      | 41 ++++++++++++++++++++----------------
>  arch/arm/mm/cache-v7.S      | 49 ++++++++++++++++++++++---------------------
>  arch/arm/mm/cache-v7m.S     | 45 ++++++++++++++++++++-------------------
>  arch/arm/mm/proc-arm1020.S  | 39 +++++++++++++++++++++-------------
>  arch/arm/mm/proc-arm1020e.S | 40 ++++++++++++++++++++++-------------
>  arch/arm/mm/proc-arm1022.S  | 39 +++++++++++++++++++++-------------
>  arch/arm/mm/proc-arm1026.S  | 40 ++++++++++++++++++++++-------------
>  arch/arm/mm/proc-arm920.S   | 40 +++++++++++++++++++++--------------
>  arch/arm/mm/proc-arm922.S   | 40 +++++++++++++++++++++--------------
>  arch/arm/mm/proc-arm925.S   | 38 ++++++++++++++++++++-------------
>  arch/arm/mm/proc-arm926.S   | 38 ++++++++++++++++++++-------------
>  arch/arm/mm/proc-arm940.S   | 42 ++++++++++++++++++++++---------------
>  arch/arm/mm/proc-arm946.S   | 38 ++++++++++++++++++++-------------
>  arch/arm/mm/proc-feroceon.S | 48 +++++++++++++++++++++++++-----------------
>  arch/arm/mm/proc-mohawk.S   | 38 ++++++++++++++++++++-------------
>  arch/arm/mm/proc-xsc3.S     | 39 +++++++++++++++++++++-------------
>  arch/arm/mm/proc-xscale.S   | 40 +++++++++++++++++++++--------------
>  22 files changed, 544 insertions(+), 373 deletions(-)
>
> diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S
> index 71c64e92dead..c3642d5daf38 100644
> --- a/arch/arm/mm/cache-fa.S
> +++ b/arch/arm/mm/cache-fa.S
> @@ -12,6 +12,7 @@
>   */
>  #include <linux/linkage.h>
>  #include <linux/init.h>
> +#include <linux/cfi_types.h>
>  #include <asm/assembler.h>
>  #include <asm/page.h>
>
> @@ -39,11 +40,11 @@
>   *
>   *     Unconditionally clean and invalidate the entire icache.
>   */
> -ENTRY(fa_flush_icache_all)
> +SYM_TYPED_FUNC_START(fa_flush_icache_all)
>         mov     r0, #0
>         mcr     p15, 0, r0, c7, c5, 0           @ invalidate I cache
>         ret     lr
> -ENDPROC(fa_flush_icache_all)
> +SYM_FUNC_END(fa_flush_icache_all)
>
>  /*
>   *     flush_user_cache_all()
> @@ -51,14 +52,16 @@ ENDPROC(fa_flush_icache_all)
>   *     Clean and invalidate all cache entries in a particular address
>   *     space.
>   */
> -ENTRY(fa_flush_user_cache_all)
> -       /* FALLTHROUGH */
> +SYM_TYPED_FUNC_START(fa_flush_user_cache_all)
> +       b       fa_flush_kern_cache_all
> +SYM_FUNC_END(fa_flush_user_cache_all)
> +

Given that the prototypes are identical (and therefore compatible as
far as kCFI is concerned), we might just define an alias

SYM_FUNC_ALIAS(fa_flush_user_cache_all, fa_flush_kern_cache_all)

rather than generate different code.

>  /*
>   *     flush_kern_cache_all()
>   *
>   *     Clean and invalidate the entire cache.
>   */
> -ENTRY(fa_flush_kern_cache_all)
> +SYM_TYPED_FUNC_START(fa_flush_kern_cache_all)
>         mov     ip, #0
>         mov     r2, #VM_EXEC
>  __flush_whole_cache:
> @@ -69,6 +72,7 @@ __flush_whole_cache:
>         mcrne   p15, 0, ip, c7, c10, 4          @ drain write buffer
>         mcrne   p15, 0, ip, c7, c5, 4           @ prefetch flush
>         ret     lr
> +SYM_FUNC_END(fa_flush_kern_cache_all)
>
>  /*
>   *     flush_user_cache_range(start, end, flags)
> @@ -80,7 +84,7 @@ __flush_whole_cache:
>   *     - end   - end address (exclusive, page aligned)
>   *     - flags - vma_area_struct flags describing address space
>   */
> -ENTRY(fa_flush_user_cache_range)
> +SYM_TYPED_FUNC_START(fa_flush_user_cache_range)
>         mov     ip, #0
>         sub     r3, r1, r0                      @ calculate total size
>         cmp     r3, #CACHE_DLIMIT               @ total size >= limit?
> @@ -97,6 +101,7 @@ ENTRY(fa_flush_user_cache_range)
>         mcrne   p15, 0, ip, c7, c10, 4          @ data write barrier
>         mcrne   p15, 0, ip, c7, c5, 4           @ prefetch flush
>         ret     lr
> +SYM_FUNC_END(fa_flush_user_cache_range)
>
>  /*
>   *     coherent_kern_range(start, end)
> @@ -108,8 +113,9 @@ ENTRY(fa_flush_user_cache_range)
>   *     - start  - virtual start address
>   *     - end    - virtual end address
>   */
> -ENTRY(fa_coherent_kern_range)
> -       /* fall through */
> +SYM_TYPED_FUNC_START(fa_coherent_kern_range)
> +       b       fa_coherent_user_range
> +SYM_FUNC_END(fa_coherent_kern_range)
>

Same here.

>  /*
>   *     coherent_user_range(start, end)
> @@ -121,7 +127,7 @@ ENTRY(fa_coherent_kern_range)
>   *     - start  - virtual start address
>   *     - end    - virtual end address
>   */
> -ENTRY(fa_coherent_user_range)
> +SYM_TYPED_FUNC_START(fa_coherent_user_range)
>         bic     r0, r0, #CACHE_DLINESIZE - 1
>  1:     mcr     p15, 0, r0, c7, c14, 1          @ clean and invalidate D entry
>         mcr     p15, 0, r0, c7, c5, 1           @ invalidate I entry
> @@ -133,6 +139,7 @@ ENTRY(fa_coherent_user_range)
>         mcr     p15, 0, r0, c7, c10, 4          @ drain write buffer
>         mcr     p15, 0, r0, c7, c5, 4           @ prefetch flush
>         ret     lr
> +SYM_FUNC_END(fa_coherent_user_range)
>
>  /*
>   *     flush_kern_dcache_area(void *addr, size_t size)
> @@ -143,7 +150,7 @@ ENTRY(fa_coherent_user_range)
>   *     - addr  - kernel address
>   *     - size  - size of region
>   */
> -ENTRY(fa_flush_kern_dcache_area)
> +SYM_TYPED_FUNC_START(fa_flush_kern_dcache_area)
>         add     r1, r0, r1
>  1:     mcr     p15, 0, r0, c7, c14, 1          @ clean & invalidate D line
>         add     r0, r0, #CACHE_DLINESIZE
> @@ -153,6 +160,7 @@ ENTRY(fa_flush_kern_dcache_area)
>         mcr     p15, 0, r0, c7, c5, 0           @ invalidate I cache
>         mcr     p15, 0, r0, c7, c10, 4          @ drain write buffer
>         ret     lr
> +SYM_FUNC_END(fa_flush_kern_dcache_area)
>
>  /*
>   *     dma_inv_range(start, end)
> @@ -203,7 +211,7 @@ fa_dma_clean_range:
>   *     - start   - virtual start address of region
>   *     - end     - virtual end address of region
>   */
> -ENTRY(fa_dma_flush_range)
> +SYM_TYPED_FUNC_START(fa_dma_flush_range)
>         bic     r0, r0, #CACHE_DLINESIZE - 1
>  1:     mcr     p15, 0, r0, c7, c14, 1          @ clean & invalidate D entry
>         add     r0, r0, #CACHE_DLINESIZE
> @@ -212,6 +220,7 @@ ENTRY(fa_dma_flush_range)
>         mov     r0, #0
>         mcr     p15, 0, r0, c7, c10, 4          @ drain write buffer
>         ret     lr
> +SYM_FUNC_END(fa_dma_flush_range)
>
>  /*
>   *     dma_map_area(start, size, dir)
> @@ -219,13 +228,13 @@ ENTRY(fa_dma_flush_range)
>   *     - size  - size of region
>   *     - dir   - DMA direction
>   */
> -ENTRY(fa_dma_map_area)
> +SYM_TYPED_FUNC_START(fa_dma_map_area)
>         add     r1, r1, r0
>         cmp     r2, #DMA_TO_DEVICE
>         beq     fa_dma_clean_range
>         bcs     fa_dma_inv_range
>         b       fa_dma_flush_range
> -ENDPROC(fa_dma_map_area)
> +SYM_FUNC_END(fa_dma_map_area)
>
>  /*
>   *     dma_unmap_area(start, size, dir)
> @@ -233,9 +242,9 @@ ENDPROC(fa_dma_map_area)
>   *     - size  - size of region
>   *     - dir   - DMA direction
>   */
> -ENTRY(fa_dma_unmap_area)
> +SYM_TYPED_FUNC_START(fa_dma_unmap_area)
>         ret     lr
> -ENDPROC(fa_dma_unmap_area)
> +SYM_FUNC_END(fa_dma_unmap_area)
>
>         .globl  fa_flush_kern_cache_louis
>         .equ    fa_flush_kern_cache_louis, fa_flush_kern_cache_all
> diff --git a/arch/arm/mm/cache-nop.S b/arch/arm/mm/cache-nop.S
> index 72d939ef8798..56e94091a55f 100644
> --- a/arch/arm/mm/cache-nop.S
> +++ b/arch/arm/mm/cache-nop.S
> @@ -1,47 +1,56 @@
>  /* SPDX-License-Identifier: GPL-2.0-only */
>  #include <linux/linkage.h>
>  #include <linux/init.h>
> +#include <linux/cfi_types.h>
>  #include <asm/assembler.h>
>
>  #include "proc-macros.S"
>
> -ENTRY(nop_flush_icache_all)
> +SYM_TYPED_FUNC_START(nop_flush_icache_all)
>         ret     lr
> -ENDPROC(nop_flush_icache_all)
> +SYM_FUNC_END(nop_flush_icache_all)
>
> -       .globl nop_flush_kern_cache_all
> -       .equ nop_flush_kern_cache_all, nop_flush_icache_all
> +SYM_TYPED_FUNC_START(nop_flush_kern_cache_all)
> +       ret     lr
> +SYM_FUNC_END(nop_flush_kern_cache_all)
>
>         .globl nop_flush_kern_cache_louis
>         .equ nop_flush_kern_cache_louis, nop_flush_icache_all
>
> -       .globl nop_flush_user_cache_all
> -       .equ nop_flush_user_cache_all, nop_flush_icache_all
> +SYM_TYPED_FUNC_START(nop_flush_user_cache_all)
> +       ret     lr
> +SYM_FUNC_END(nop_flush_user_cache_all)
>
> -       .globl nop_flush_user_cache_range
> -       .equ nop_flush_user_cache_range, nop_flush_icache_all
> +SYM_TYPED_FUNC_START(nop_flush_user_cache_range)
> +       ret     lr
> +SYM_FUNC_END(nop_flush_user_cache_range)
>
> -       .globl nop_coherent_kern_range
> -       .equ nop_coherent_kern_range, nop_flush_icache_all
> +SYM_TYPED_FUNC_START(nop_coherent_kern_range)
> +       ret     lr
> +SYM_FUNC_END(nop_coherent_kern_range)
>
> -ENTRY(nop_coherent_user_range)
> +SYM_TYPED_FUNC_START(nop_coherent_user_range)
>         mov     r0, 0
>         ret     lr
> -ENDPROC(nop_coherent_user_range)
> -
> -       .globl nop_flush_kern_dcache_area
> -       .equ nop_flush_kern_dcache_area, nop_flush_icache_all
> +SYM_FUNC_END(nop_coherent_user_range)
>
> -       .globl nop_dma_flush_range
> -       .equ nop_dma_flush_range, nop_flush_icache_all
> +SYM_TYPED_FUNC_START(nop_flush_kern_dcache_area)
> +       ret     lr
> +SYM_FUNC_END(nop_flush_kern_dcache_area)
>
> -       .globl nop_dma_map_area
> -       .equ nop_dma_map_area, nop_flush_icache_all
> +SYM_TYPED_FUNC_START(nop_dma_flush_range)
> +       ret     lr
> +SYM_FUNC_END(nop_dma_flush_range)
>
> -       .globl nop_dma_unmap_area
> -       .equ nop_dma_unmap_area, nop_flush_icache_all
> +SYM_TYPED_FUNC_START(nop_dma_map_area)
> +       ret     lr
> +SYM_FUNC_END(nop_dma_map_area)
>
>         __INITDATA
>
>         @ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>         define_cache_functions nop
> +
> +SYM_TYPED_FUNC_START(nop_dma_unmap_area)
> +       ret     lr
> +SYM_FUNC_END(nop_dma_unmap_area)
> diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S
> index 7787057e4990..22d9c9d9e0d7 100644
> --- a/arch/arm/mm/cache-v4.S
> +++ b/arch/arm/mm/cache-v4.S
> @@ -6,6 +6,7 @@
>   */
>  #include <linux/linkage.h>
>  #include <linux/init.h>
> +#include <linux/cfi_types.h>
>  #include <asm/assembler.h>
>  #include <asm/page.h>
>  #include "proc-macros.S"
> @@ -15,9 +16,9 @@
>   *
>   *     Unconditionally clean and invalidate the entire icache.
>   */
> -ENTRY(v4_flush_icache_all)
> +SYM_TYPED_FUNC_START(v4_flush_icache_all)
>         ret     lr
> -ENDPROC(v4_flush_icache_all)
> +SYM_FUNC_END(v4_flush_icache_all)
>
>  /*
>   *     flush_user_cache_all()
> @@ -27,21 +28,24 @@ ENDPROC(v4_flush_icache_all)
>   *
>   *     - mm    - mm_struct describing address space
>   */
> -ENTRY(v4_flush_user_cache_all)
> -       /* FALLTHROUGH */
> +SYM_TYPED_FUNC_START(v4_flush_user_cache_all)
> +       b       v4_flush_kern_cache_all
> +SYM_FUNC_END(v4_flush_user_cache_all)
> +
>  /*
>   *     flush_kern_cache_all()
>   *
>   *     Clean and invalidate the entire cache.
>   */
> -ENTRY(v4_flush_kern_cache_all)
> +SYM_TYPED_FUNC_START(v4_flush_kern_cache_all)
>  #ifdef CONFIG_CPU_CP15
>         mov     r0, #0
>         mcr     p15, 0, r0, c7, c7, 0           @ flush ID cache
>         ret     lr
>  #else
> -       /* FALLTHROUGH */
> +       ret     lr
>  #endif
> +SYM_FUNC_END(v4_flush_kern_cache_all)
>

Yeah, this is needlessly obfuscated, falling through twice to a 'ret lr' :-)

You could still use SYM_FUNC_ALIAS() here if the prototypes are identical.
Linus Walleij April 16, 2024, 7:38 a.m. UTC | #2
On Mon, Apr 15, 2024 at 6:35 PM Ard Biesheuvel <ardb@kernel.org> wrote:

> > -ENTRY(fa_flush_user_cache_all)
> > -       /* FALLTHROUGH */
> > +SYM_TYPED_FUNC_START(fa_flush_user_cache_all)
> > +       b       fa_flush_kern_cache_all
> > +SYM_FUNC_END(fa_flush_user_cache_all)
> > +
>
> Given that the prototypes are identical (and therefore compatible as
> far as kCFI is concerned), we might just define an alias
>
> SYM_FUNC_ALIAS(fa_flush_user_cache_all, fa_flush_kern_cache_all)
>
> rather than generate different code.

Now we are getting to the real meat of the review :)

I will comb over both annotation patches and make sure to use
SYM_FUNC_ALIAS() where applicable, it seems a good number of
sites can be simplified this way.

Yours,
Linus Walleij
Russell King (Oracle) April 29, 2024, 1:18 p.m. UTC | #3
On Mon, Apr 15, 2024 at 03:43:23PM +0200, Linus Walleij wrote:
> -ENTRY(fa_flush_user_cache_all)
> -	/* FALLTHROUGH */
> +SYM_TYPED_FUNC_START(fa_flush_user_cache_all)
> +	b	fa_flush_kern_cache_all
> +SYM_FUNC_END(fa_flush_user_cache_all)

The thing I don't like about this is that it makes these functions
less efficient - the early CPUs do _not_ have branch predictors,
and they flush the pipeline on branches, so rather than just dropping
through, this kind of thing introduces pointless multi-cycle delays
just to keep CFI happy.