[v8,0/9] CFI for ARM32 using LLVM

Message ID 20240423-arm32-cfi-v8-0-08f10f5d9297@linaro.org (mailing list archive)

Message

Linus Walleij April 23, 2024, 7:19 a.m. UTC
This is a first patch set to support CLANG CFI (Control Flow
Integrity) on ARM32.

For information about what CFI is, see:
https://clang.llvm.org/docs/ControlFlowIntegrity.html

For the kernel KCFI flavor, see:
https://lwn.net/Articles/898040/

The base changes required to bring up KCFI on ARM32 were mostly
related to the use of custom vtables in the kernel, combined
with defines to call into these vtable members directly from
the sites where they are used.

We annotate all assembly calls that are called directly from
C with SYM_TYPED_FUNC_START()/SYM_FUNC_END() so it is easy
to see while reading the assembly that these functions are
called from C and can have CFI prototype information prefixed
to them.

As the prototype prefix information is just some random bytes, it is
not possible to "fall through" into an assembly function that
is tagged with SYM_TYPED_FUNC_START(): there will be some
binary noise in front of the function, so this design pattern
needs to be explicitly avoided at each site where it occurred.

The approach to binding the calls to C is two-fold:

- Either convert the affected vtable struct to C and provide
  per-CPU prototypes for all the calls (done for TLB, cache)
  or:

- Provide prototypes in special files used only for CFI and tag
  all these functions as addressable.

The permissive mode handles the new breakpoint type (0x03) that
Clang is emitting.

To runtime-test the patches:
- Enable CONFIG_LKDTM
- echo CFI_FORWARD_PROTO > /sys/kernel/debug/provoke-crash/DIRECT

The patch set has been booted to userspace on the following
test platforms:

- Arm Versatile (QEMU)
- Arm Versatile Express (QEMU)
- multi_v7 booted on Versatile Express (QEMU)
- Footbridge Netwinder (SA110 ARMv4)
- Ux500 (ARMv7 SMP)
- Gemini (FA526)

I am not saying there will not be corner cases that we need
to fix in addition to this, but it is enough to get started.
Looking at what was fixed for arm64, I am a bit wary that
e.g. BPF might need something to trampoline properly.

But hopefully people can get to testing it and help me fix
remaining issues before the final version, or we can fix them
in-tree.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
Changes in v8:
- Drop aliases for the coherent cache maintenance functions; this
  will not work because these have different return types, even though
  the return value is mostly ignored.
- Picked up Sami's Reviewed-by
- Drop the already applied ftrace functions patch.
- Drop the first patch in the series (checking calls using the vtable
  instead of the function); it is not needed anymore after doing the
  deeper fix with tagged symbols.
- I will update the patches in the patch tracker to this version
  and they will become the /2 versions.
- Link to v7: https://lore.kernel.org/r/20240421-arm32-cfi-v7-0-6e132a948cc8@linaro.org

Changes in v7:
- Use report_cfi_failure_noaddr() when reporting CFI faults.
- Leave a better comment on what needs to be done to get better
  target reporting.
- Link to v6: https://lore.kernel.org/r/20240417-arm32-cfi-v6-0-6486385eb136@linaro.org

Changes in v6:
- Add a separate patch adding aliases for some cache functions
  that were just branches to another function.
- Link to v5: https://lore.kernel.org/r/20240415-arm32-cfi-v5-0-ff11093eeccc@linaro.org

Changes in v5:
- I started to put the patches into the patch tracker and it
  rightfully complained that the patches tagging all assembly
  with CFI symbol type macros and adding C prototypes were
  too large.
- Split the two patches annotating assembly into one patch
  doing the annotation and one patch adding the C prototypes.
  This is a good split anyway.
- The first patches from the series are unchanged and in the
  patch tracker, I resend them anyway and will soon populate
  the patch tracker with the split patches from this
  series unless there are more comments.
- Link to v4: https://lore.kernel.org/r/20240328-arm32-cfi-v4-0-a11046139125@linaro.org

Changes in v4:
- Rebase on v6.9-rc1
- Use Ard's patch for converting TLB operation vtables to C
- Rewrite the cache vtables in C and use SYM_TYPED_FUNC_START() in the
  assembly to make CFI work all the way down.
- Instead of tagging all the delay functions as __nocfi get to the
  root cause and annotate the loop delay code with SYM_TYPED_FUNC_START()
  and rewrite it using explicit branches so we get CFI all the way
  down.
- Drop the patch turning highmem page accesses into static inlines:
  this was probably a development artifact since this code does
  a lot of cache and TLB flushing, and that assembly is now properly
  annotated.
- Do not define static inlines tagged __nocfi for all the proc functions,
  instead provide proper C prototypes in a separate CFI-only file
  and make these explicitly addressable.
- Link to v3: https://lore.kernel.org/r/20240311-arm32-cfi-v3-0-224a0f0a45c2@linaro.org

Changes in v3:
- Use report_cfi_failure() like everyone else in the breakpoint
  handler.
- I think we cannot implement target and type for the report callback
  without operand bundling compiler extensions, so just leaving these as zero.
- Link to v2: https://lore.kernel.org/r/20240307-arm32-cfi-v2-0-cc74ea0306b3@linaro.org

Changes in v2:
- Add the missing ftrace graph tracer stub.
- Enable permissive mode using a breakpoint handler.
- Link to v1: https://lore.kernel.org/r/20240225-arm32-cfi-v1-0-6943306f065b@linaro.org

---
Ard Biesheuvel (1):
      ARM: mm: Make tlbflush routines CFI safe

Linus Walleij (8):
      ARM: mm: Type-annotate all cache assembly routines
      ARM: mm: Use symbol alias for two cache functions
      ARM: mm: Rewrite cacheflush vtables in CFI safe C
      ARM: mm: Type-annotate all per-processor assembly routines
      ARM: mm: Define prototypes for all per-processor calls
      ARM: lib: Annotate loop delay instructions for CFI
      ARM: hw_breakpoint: Handle CFI breakpoints
      ARM: Support CLANG CFI

 arch/arm/Kconfig                     |   1 +
 arch/arm/include/asm/glue-cache.h    |  28 +-
 arch/arm/include/asm/hw_breakpoint.h |   1 +
 arch/arm/kernel/hw_breakpoint.c      |  35 ++
 arch/arm/lib/delay-loop.S            |  16 +-
 arch/arm/mm/Makefile                 |   3 +
 arch/arm/mm/cache-b15-rac.c          |   1 +
 arch/arm/mm/cache-fa.S               |  45 ++-
 arch/arm/mm/cache-nop.S              |  61 ++--
 arch/arm/mm/cache-v4.S               |  53 ++-
 arch/arm/mm/cache-v4wb.S             |  45 ++-
 arch/arm/mm/cache-v4wt.S             |  53 ++-
 arch/arm/mm/cache-v6.S               |  49 ++-
 arch/arm/mm/cache-v7.S               |  74 ++--
 arch/arm/mm/cache-v7m.S              |  53 ++-
 arch/arm/mm/cache.c                  | 663 +++++++++++++++++++++++++++++++++++
 arch/arm/mm/proc-arm1020.S           |  67 ++--
 arch/arm/mm/proc-arm1020e.S          |  68 ++--
 arch/arm/mm/proc-arm1022.S           |  67 ++--
 arch/arm/mm/proc-arm1026.S           |  68 ++--
 arch/arm/mm/proc-arm720.S            |  25 +-
 arch/arm/mm/proc-arm740.S            |  26 +-
 arch/arm/mm/proc-arm7tdmi.S          |  34 +-
 arch/arm/mm/proc-arm920.S            |  74 ++--
 arch/arm/mm/proc-arm922.S            |  67 ++--
 arch/arm/mm/proc-arm925.S            |  64 ++--
 arch/arm/mm/proc-arm926.S            |  73 ++--
 arch/arm/mm/proc-arm940.S            |  67 ++--
 arch/arm/mm/proc-arm946.S            |  63 ++--
 arch/arm/mm/proc-arm9tdmi.S          |  26 +-
 arch/arm/mm/proc-fa526.S             |  24 +-
 arch/arm/mm/proc-feroceon.S          | 103 +++---
 arch/arm/mm/proc-macros.S            |  33 --
 arch/arm/mm/proc-mohawk.S            |  72 ++--
 arch/arm/mm/proc-sa110.S             |  23 +-
 arch/arm/mm/proc-sa1100.S            |  31 +-
 arch/arm/mm/proc-v6.S                |  31 +-
 arch/arm/mm/proc-v7-2level.S         |   8 +-
 arch/arm/mm/proc-v7-3level.S         |   8 +-
 arch/arm/mm/proc-v7.S                |  66 ++--
 arch/arm/mm/proc-v7m.S               |  41 +--
 arch/arm/mm/proc-xsc3.S              |  73 ++--
 arch/arm/mm/proc-xscale.S            | 125 +++----
 arch/arm/mm/proc.c                   | 500 ++++++++++++++++++++++++++
 arch/arm/mm/tlb-fa.S                 |  12 +-
 arch/arm/mm/tlb-v4.S                 |  15 +-
 arch/arm/mm/tlb-v4wb.S               |  12 +-
 arch/arm/mm/tlb-v4wbi.S              |  12 +-
 arch/arm/mm/tlb-v6.S                 |  12 +-
 arch/arm/mm/tlb-v7.S                 |  14 +-
 arch/arm/mm/tlb.c                    |  84 +++++
 51 files changed, 2300 insertions(+), 969 deletions(-)
---
base-commit: 43426466485392d6eedc422fdeddd43eb394d8aa
change-id: 20240115-arm32-cfi-65d60f201108

Best regards,

Comments

Russell King (Oracle) April 29, 2024, 1:21 p.m. UTC | #1
I've applied this to a separate branch, and it should be in
linux-next by tomorrow.

As mentioned today on a previous iteration of the patch series,
I'm not all that happy with the introduction of delay-inducing
branches to solve CFI issues, especially on the early CPUs where
a branch causes the CPU's pipeline to be flushed - thus making
branches expensive.

Can we make these branches conditional on the use of CFI?

Didn't Ard mention there was a way to do this using symbol aliases?
Ard Biesheuvel April 29, 2024, 2:54 p.m. UTC | #2
On Mon, 29 Apr 2024 at 15:22, Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> I've applied this to a separate branch, and it should be in
> linux-next by tomorrow.
>
> As mentioned today on a previous iteration of the patch series,
> I'm not all that happy with the introduction of delay-inducing
> branches to solve CFI issues, especially on the early CPUs where
> a branch causes the CPU's pipeline to be flushed - thus making
> branches expensive.
>
> Can we make these branches conditional on the use of CFI?
>
> Didn't Ard mention there was a way to do this using symbol aliases?
>

Yes, but only if the prototypes are identical. Not sure why Linus
decided to keep this separate, but 9386/2 implements this for
flush_user_cache_all() vs. flush_kern_cache_all().

For coherent_user_range() vs. coherent_kern_range(), the return type
is different (void vs. int).

So we could either make coherent_kern_range() return int as well, or
alternatively, we could emit the branch instructions only when CFI is
enabled (as in that case, SYM_TYPED_FUNC_START() expands to something
that prevents a fall through)

E.g.,

--- a/arch/arm/mm/cache-v4wt.S
+++ b/arch/arm/mm/cache-v4wt.S
@@ -108,7 +108,9 @@ SYM_FUNC_END(v4wt_flush_user_cache_range)
  *     - end    - virtual end address
  */
 SYM_TYPED_FUNC_START(v4wt_coherent_kern_range)
+#ifdef CONFIG_CFI_CLANG
        b       v4wt_coherent_user_range
+#endif
 SYM_FUNC_END(v4wt_coherent_kern_range)

 /*

AFAICT, SYM_TYPED_FUNC_START() does not prevent a fall through if
CFI_CLANG is disabled, but someone should double check.
Sami Tolvanen April 29, 2024, 3:06 p.m. UTC | #3
On Mon, Apr 29, 2024 at 7:54 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 29 Apr 2024 at 15:22, Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > I've applied this to a separate branch, and it should be in
> > linux-next by tomorrow.
> >
> > As mentioned today on a previous iteration of the patch series,
> > I'm not all that happy with the introduction of delay-inducing
> > branches to solve CFI issues, especially on the early CPUs where
> > a branch causes the CPU's pipeline to be flushed - thus making
> > branches expensive.
> >
> > Can we make these branches conditional on the use of CFI?
> >
> > Didn't Ard mention there was a way to do this using symbol aliases?
> >
>
> Yes, but only if the prototypes are identical. Not sure why Linus
> decided to keep this separate, but 9386/2 implements this for
> flush_user_cache_all() vs. flush_kern_cache_all().
>
> For coherent_user_range() vs. coherent_kern_range(), the return type
> is different (void vs int)
>
> So we could either make coherent_kern_range() return int as well, or
> alternatively, we could emit the branch instructions only when CFI is
> enabled (as in that case, SYM_TYPED_FUNC_START() expands to something
> that prevents a fall through)
>
> E.g.,
>
> --- a/arch/arm/mm/cache-v4wt.S
> +++ b/arch/arm/mm/cache-v4wt.S
> @@ -108,7 +108,9 @@ SYM_FUNC_END(v4wt_flush_user_cache_range)
>   *     - end    - virtual end address
>   */
>  SYM_TYPED_FUNC_START(v4wt_coherent_kern_range)
> +#ifdef CONFIG_CFI_CLANG
>         b       v4wt_coherent_user_range
> +#endif
>  SYM_FUNC_END(v4wt_coherent_kern_range)
>
>  /*
>
> AFAICT, SYM_TYPED_FUNC_START() does not prevent a fall through if
> CFI_CLANG is disabled, but someone should double check.

Correct, it's a normal function entry without CFI_CLANG and falling
through should work just fine.

Sami
Linus Walleij April 30, 2024, 7:43 a.m. UTC | #4
On Mon, Apr 29, 2024 at 4:54 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> On Mon, 29 Apr 2024 at 15:22, Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > I've applied this to a separate branch, and it should be in
> > linux-next by tomorrow.

Neat, it's nice to have some baseline for this.

> > Can we make these branches conditional on the use of CFI?
> >
> > Didn't Ard mention there was a way to do this using symbol aliases?
>
> Yes, but only if the prototypes are identical. Not sure why Linus
> decided to keep this separate, but 9386/2 implements this for
> flush_user_cache_all() vs. flush_kern_cache_all().

Yes I did it separately for bisectability.

> we could emit the branch instructions only when CFI is
> enabled (as in that case, SYM_TYPED_FUNC_START() expands to something
> that prevents a fall through)
>
> E.g.,
>
> --- a/arch/arm/mm/cache-v4wt.S
> +++ b/arch/arm/mm/cache-v4wt.S
> @@ -108,7 +108,9 @@ SYM_FUNC_END(v4wt_flush_user_cache_range)
>   *     - end    - virtual end address
>   */
>  SYM_TYPED_FUNC_START(v4wt_coherent_kern_range)
> +#ifdef CONFIG_CFI_CLANG
>         b       v4wt_coherent_user_range
> +#endif
>  SYM_FUNC_END(v4wt_coherent_kern_range)

Oh it's a bit ifdeffy but I'll make a patch like this because I
can't think of anything better.

Yours,
Linus Walleij