Message ID | 20220205110500.47430-1-ashimida@linux.alibaba.com |
---|---|
State | Superseded |
Series | [v4,1/1,AARCH64,PR102768] aarch64: Add compiler support for Shadow Call Stack |
Dan Li <ashimida@linux.alibaba.com> writes:
> Shadow Call Stack can be used to protect the return address of a
> function at runtime, and clang already supports this feature[1].
>
> To enable SCS in user mode, in addition to the compiler, other support
> is also required (as discussed in [2]). This patch only adds basic
> support for SCS from the compiler side, and provides convenience
> for users to enable SCS.
>
> For the Linux kernel, only the support of the compiler is required.
>
> [1] https://clang.llvm.org/docs/ShadowCallStack.html
> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768
>
> Signed-off-by: Dan Li <ashimida@linux.alibaba.com>
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64.c (SLOT_REQUIRED):
> 	Rename wb_candidate[12] to wb_push_candidate[12].
> 	(aarch64_layout_frame): Likewise, and
> 	change callee_adjust when scs is enabled.
> 	(aarch64_save_callee_saves):
> 	Rename wb_candidate[12] to wb_push_candidate[12].
> 	(aarch64_restore_callee_saves): Likewise.
> 	(aarch64_get_separate_components): Likewise.
> 	(aarch64_expand_prologue): Push x30 onto SCS before it's
> 	pushed onto stack.
> 	(aarch64_expand_epilogue): Pop x30 from SCS, while
> 	preventing it from being popped from the regular stack again.
> 	(aarch64_override_options_internal): Add SCS compile option check.
> 	(TARGET_HAVE_SHADOW_CALL_STACK): New hook.
> 	* config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled,
> 	wb_pop_candidate[12], and rename wb_candidate[12] to
> 	wb_push_candidate[12].
> 	* config/aarch64/aarch64.md (scs_push): New template.
> 	(scs_pop): Likewise.
> 	* doc/invoke.texi: Document -fsanitize=shadow-call-stack.
> 	* doc/tm.texi: Regenerate.
> 	* doc/tm.texi.in: Add hook have_shadow_call_stack.
> 	* flag-types.h (enum sanitize_code):
> 	Add SANITIZE_SHADOW_CALL_STACK.
> 	* opts.c: Add shadow-call-stack.
> 	* target.def: New hook.
> 	* toplev.c (process_options): Add SCS compile option check.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/shadow_call_stack_1.c: New test.
> 	* gcc.target/aarch64/shadow_call_stack_2.c: New test.
> 	* gcc.target/aarch64/shadow_call_stack_3.c: New test.
> 	* gcc.target/aarch64/shadow_call_stack_4.c: New test.
> 	* gcc.target/aarch64/shadow_call_stack_5.c: New test.
> 	* gcc.target/aarch64/shadow_call_stack_6.c: New test.
> 	* gcc.target/aarch64/shadow_call_stack_7.c: New test.
> 	* gcc.target/aarch64/shadow_call_stack_8.c: New test.
> ---
> V4:
> - Added wb_[push|pop]_candidates[12] to ensure push/pop can
> emit different registers.
>
> V3:
> - Change scs_push/pop to standard move patterns.
> - Optimize scs_pop to avoid popping x30 twice when shadow stack is enabled.

LGTM. Just a few minor comments below.

>
>  gcc/config/aarch64/aarch64.c                  | 121 +++++++++++++-----
>  gcc/config/aarch64/aarch64.h                  |  21 ++-
>  gcc/config/aarch64/aarch64.md                 |  10 ++
>  gcc/doc/invoke.texi                           |  30 +++++
>  gcc/doc/tm.texi                               |   5 +
>  gcc/doc/tm.texi.in                            |   2 +
>  gcc/flag-types.h                              |   2 +
>  gcc/opts.c                                    |   1 +
>  gcc/target.def                                |   8 ++
>  .../gcc.target/aarch64/shadow_call_stack_1.c  |   6 +
>  .../gcc.target/aarch64/shadow_call_stack_2.c  |   6 +
>  .../gcc.target/aarch64/shadow_call_stack_3.c  |  45 +++++++
>  .../gcc.target/aarch64/shadow_call_stack_4.c  |  20 +++
>  .../gcc.target/aarch64/shadow_call_stack_5.c  |  18 +++
>  .../gcc.target/aarch64/shadow_call_stack_6.c  |  18 +++
>  .../gcc.target/aarch64/shadow_call_stack_7.c  |  18 +++
>  .../gcc.target/aarch64/shadow_call_stack_8.c  |  24 ++++
>  gcc/toplev.c                                  |  10 ++
>  18 files changed, 332 insertions(+), 33 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_5.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_6.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_7.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_8.c
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 699c105a42a..f4d962917c4 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -79,6 +79,7 @@
>  #include "tree-ssa-loop-niter.h"
>  #include "fractional-cost.h"
>  #include "rtlanal.h"
> +#include "asan.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -7291,8 +7292,8 @@ aarch64_layout_frame (void)
>  #define SLOT_NOT_REQUIRED (-2)
>  #define SLOT_REQUIRED (-1)
>
> -  frame.wb_candidate1 = INVALID_REGNUM;
> -  frame.wb_candidate2 = INVALID_REGNUM;
> +  frame.wb_push_candidate1 = INVALID_REGNUM;
> +  frame.wb_push_candidate2 = INVALID_REGNUM;
>    frame.spare_pred_reg = INVALID_REGNUM;
>
>    /* First mark all the registers that really need to be saved...  */
> @@ -7407,9 +7408,9 @@ aarch64_layout_frame (void)
>      {
>        /* FP and LR are placed in the linkage record.  */
>        frame.reg_offset[R29_REGNUM] = offset;
> -      frame.wb_candidate1 = R29_REGNUM;
> +      frame.wb_push_candidate1 = R29_REGNUM;
>        frame.reg_offset[R30_REGNUM] = offset + UNITS_PER_WORD;
> -      frame.wb_candidate2 = R30_REGNUM;
> +      frame.wb_push_candidate2 = R30_REGNUM;
>        offset += 2 * UNITS_PER_WORD;
>      }
>
> @@ -7417,10 +7418,10 @@ aarch64_layout_frame (void)
>      if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
>        {
>          frame.reg_offset[regno] = offset;
> -        if (frame.wb_candidate1 == INVALID_REGNUM)
> -          frame.wb_candidate1 = regno;
> -        else if (frame.wb_candidate2 == INVALID_REGNUM)
> -          frame.wb_candidate2 = regno;
> +        if (frame.wb_push_candidate1 == INVALID_REGNUM)
> +          frame.wb_push_candidate1 = regno;
> +        else if (frame.wb_push_candidate2 == INVALID_REGNUM)
> +          frame.wb_push_candidate2 = regno;
>          offset += UNITS_PER_WORD;
>        }
>
> @@ -7443,11 +7444,11 @@ aarch64_layout_frame (void)
>          }
>
>        frame.reg_offset[regno] = offset;
> -      if (frame.wb_candidate1 == INVALID_REGNUM)
> -        frame.wb_candidate1 = regno;
> -      else if (frame.wb_candidate2 == INVALID_REGNUM
> -               && frame.wb_candidate1 >= V0_REGNUM)
> -        frame.wb_candidate2 = regno;
> +      if (frame.wb_push_candidate1 == INVALID_REGNUM)
> +        frame.wb_push_candidate1 = regno;
> +      else if (frame.wb_push_candidate2 == INVALID_REGNUM
> +               && frame.wb_push_candidate1 >= V0_REGNUM)
> +        frame.wb_push_candidate2 = regno;
>        offset += vector_save_size;
>      }
>
> @@ -7478,10 +7479,38 @@ aarch64_layout_frame (void)
>    frame.sve_callee_adjust = 0;
>    frame.callee_offset = 0;
>
> +  frame.wb_pop_candidate1 = frame.wb_push_candidate1;
> +  frame.wb_pop_candidate2 = frame.wb_push_candidate2;
> +
> +  /* Shadow call stack only deals with functions where the LR is pushed
> +     onto the stack and without specifying the "no_sanitize" attribute
> +     with the argument "shadow-call-stack".  */
> +  frame.is_scs_enabled
> +    = (!crtl->calls_eh_return
> +       && sanitize_flags_p (SANITIZE_SHADOW_CALL_STACK)
> +       && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0));
> +
> +  /* When shadow call stack is enabled, the scs_pop in the epilogue will
> +     restore x30, and we don't need to pop x30 again in the traditional
> +     way.  Pop candidates record the registers that need to be popped
> +     eventually.  */
> +  if (frame.is_scs_enabled)
> +    {
> +      if (frame.wb_push_candidate2 == R30_REGNUM)
> +        frame.wb_pop_candidate2 = INVALID_REGNUM;
> +      else if (frame.wb_push_candidate1 == R30_REGNUM)
> +        frame.wb_pop_candidate1 = INVALID_REGNUM;

Although it makes no difference to the behaviour, I think it would be
clearer to use pop rather than push in the checks here.

> +    }
> +
> +  /* If candidate2 is INVALID_REGNUM, we need to adjust max_push_offset to
> +     256 to ensure that the offset meets the requirements of emit_move_insn.
> +     Similarly, if candidate1 is INVALID_REGNUM, we need to set
> +     max_push_offset to 0, because no registers are popped at this time,
> +     so callee_adjust cannot be adjusted.  */
>    HOST_WIDE_INT max_push_offset = 0;
> -  if (frame.wb_candidate2 != INVALID_REGNUM)
> +  if (frame.wb_pop_candidate2 != INVALID_REGNUM)
>      max_push_offset = 512;
> -  else if (frame.wb_candidate1 != INVALID_REGNUM)
> +  else if (frame.wb_pop_candidate1 != INVALID_REGNUM)
>      max_push_offset = 256;
>
>    HOST_WIDE_INT const_size, const_outgoing_args_size, const_fp_offset;
> @@ -7571,8 +7600,8 @@ aarch64_layout_frame (void)
>      {
>        /* We've decided not to associate any register saves with the initial
>           stack allocation.  */
> -      frame.wb_candidate1 = INVALID_REGNUM;
> -      frame.wb_candidate2 = INVALID_REGNUM;
> +      frame.wb_pop_candidate1 = frame.wb_push_candidate1 = INVALID_REGNUM;
> +      frame.wb_pop_candidate2 = frame.wb_push_candidate2 = INVALID_REGNUM;
>      }
>
>    frame.laid_out = true;
> @@ -7885,8 +7914,8 @@ aarch64_save_callee_saves (poly_int64 start_offset,
>        bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
>
>        if (skip_wb
> -          && (regno == cfun->machine->frame.wb_candidate1
> -              || regno == cfun->machine->frame.wb_candidate2))
> +          && (regno == cfun->machine->frame.wb_push_candidate1
> +              || regno == cfun->machine->frame.wb_push_candidate2))
>          continue;
>
>        if (cfun->machine->reg_is_wrapped_separately[regno])
> @@ -7996,8 +8025,8 @@ aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start,
>        rtx reg, mem;
>
>        if (skip_wb
> -          && (regno == cfun->machine->frame.wb_candidate1
> -              || regno == cfun->machine->frame.wb_candidate2))
> +          && (regno == cfun->machine->frame.wb_push_candidate1
> +              || regno == cfun->machine->frame.wb_push_candidate2))

Shouldn't this be using pop rather than push?

>          continue;
>
>        machine_mode mode = aarch64_reg_save_mode (regno);
> @@ -8168,8 +8197,8 @@ aarch64_get_separate_components (void)
>    if (cfun->machine->frame.spare_pred_reg != INVALID_REGNUM)
>      bitmap_clear_bit (components, cfun->machine->frame.spare_pred_reg);
>
> -  unsigned reg1 = cfun->machine->frame.wb_candidate1;
> -  unsigned reg2 = cfun->machine->frame.wb_candidate2;
> +  unsigned reg1 = cfun->machine->frame.wb_push_candidate1;
> +  unsigned reg2 = cfun->machine->frame.wb_push_candidate2;
>    /* If registers have been chosen to be stored/restored with
>       writeback don't interfere with them to avoid having to output explicit
>       stack adjustment instructions.  */
> @@ -8778,8 +8807,8 @@ aarch64_expand_prologue (void)
>    poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
>    poly_int64 below_hard_fp_saved_regs_size
>      = cfun->machine->frame.below_hard_fp_saved_regs_size;
> -  unsigned reg1 = cfun->machine->frame.wb_candidate1;
> -  unsigned reg2 = cfun->machine->frame.wb_candidate2;
> +  unsigned reg1 = cfun->machine->frame.wb_push_candidate1;
> +  unsigned reg2 = cfun->machine->frame.wb_push_candidate2;
>    bool emit_frame_chain = cfun->machine->frame.emit_frame_chain;
>    rtx_insn *insn;
>
> @@ -8810,6 +8839,10 @@ aarch64_expand_prologue (void)
>        RTX_FRAME_RELATED_P (insn) = 1;
>      }
>
> +  /* Push return address to shadow call stack.  */
> +  if (cfun->machine->frame.is_scs_enabled)
> +    emit_insn (gen_scs_push ());
> +
>    if (flag_stack_usage_info)
>      current_function_static_stack_size = constant_lower_bound (frame_size);
>
> @@ -8956,8 +8989,8 @@ aarch64_expand_epilogue (bool for_sibcall)
>    poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
>    poly_int64 below_hard_fp_saved_regs_size
>      = cfun->machine->frame.below_hard_fp_saved_regs_size;
> -  unsigned reg1 = cfun->machine->frame.wb_candidate1;
> -  unsigned reg2 = cfun->machine->frame.wb_candidate2;
> +  unsigned reg1 = cfun->machine->frame.wb_pop_candidate1;
> +  unsigned reg2 = cfun->machine->frame.wb_pop_candidate2;
>    rtx cfi_ops = NULL;
>    rtx_insn *insn;
>    /* A stack clash protection prologue may not have left EP0_REGNUM or
> @@ -9027,9 +9060,19 @@ aarch64_expand_epilogue (bool for_sibcall)
>                                  false, &cfi_ops);
>    if (maybe_ne (sve_callee_adjust, 0))
>      aarch64_add_sp (NULL_RTX, NULL_RTX, sve_callee_adjust, true);
> -  aarch64_restore_callee_saves (callee_offset - sve_callee_adjust,
> -                                R0_REGNUM, R30_REGNUM,
> -                                callee_adjust != 0, &cfi_ops);
> +
> +  /* When shadow call stack is enabled, the scs_pop in the epilogue will
> +     restore x30, we don't need to restore x30 again in the traditional
> +     way.  */
> +  if (cfun->machine->frame.is_scs_enabled)
> +    aarch64_restore_callee_saves (callee_offset - sve_callee_adjust,
> +                                  R0_REGNUM, R29_REGNUM,
> +                                  callee_adjust != 0, &cfi_ops);
> +  else
> +    aarch64_restore_callee_saves (callee_offset - sve_callee_adjust,
> +                                  R0_REGNUM, R30_REGNUM,
> +                                  callee_adjust != 0, &cfi_ops);
> +

Very minor, but I think it would be better to have:

  unsigned int last_gpr = (cfun->machine->frame.is_scs_enabled
                           ? R29_REGNUM : R30_REGNUM);

so that we don't need to repeat the other arguments. There's then
less risk of the two versions getting out of sync.

>
>    if (need_barrier_p)
>      emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
> @@ -9066,6 +9109,17 @@ aarch64_expand_epilogue (bool for_sibcall)
>        RTX_FRAME_RELATED_P (insn) = 1;
>      }
>
> +  /* Pop return address from shadow call stack.  */
> +  if (cfun->machine->frame.is_scs_enabled)
> +    {
> +      machine_mode mode = aarch64_reg_save_mode (R30_REGNUM);
> +      rtx reg = gen_rtx_REG (mode, R30_REGNUM);
> +
> +      insn = emit_insn (gen_scs_pop ());
> +      add_reg_note (insn, REG_CFA_RESTORE, reg);
> +      RTX_FRAME_RELATED_P (insn) = 1;
> +    }
> +
>    /* We prefer to emit the combined return/authenticate instruction RETAA,
>       however there are three cases in which we must instead emit an explicit
>       authentication instruction.
> @@ -16492,6 +16546,10 @@ aarch64_override_options_internal (struct gcc_options *opts)
>        aarch64_stack_protector_guard_offset = offs;
>      }
>
> +  if ((flag_sanitize & SANITIZE_SHADOW_CALL_STACK)
> +      && !fixed_regs[R18_REGNUM])
> +    error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>");
> +
>    initialize_aarch64_code_model (opts);
>    initialize_aarch64_tls_size (opts);
>
> @@ -26505,6 +26563,9 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_ASM_FUNCTION_EPILOGUE
>  #define TARGET_ASM_FUNCTION_EPILOGUE aarch64_sls_emit_blr_function_thunks
>
> +#undef TARGET_HAVE_SHADOW_CALL_STACK
> +#define TARGET_HAVE_SHADOW_CALL_STACK true
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>
>  #include "gt-aarch64.h"
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 2792bb29adb..b5efe083f30 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -906,9 +906,21 @@ struct GTY (()) aarch64_frame
>       Indicated by CALLEE_ADJUST == 0 && EMIT_FRAME_CHAIN.
>
>       These fields indicate which registers we've decided to handle using
> -     (1) or (2), or INVALID_REGNUM if none.  */
> -  unsigned wb_candidate1;
> -  unsigned wb_candidate2;
> +     (1) or (2), or INVALID_REGNUM if none.
> +
> +     In some cases we don't always need to pop all registers in the push
> +     candidates, pop candidates record which registers need to be popped
> +     eventually.  The initial value of a pop candidate is copied from its
> +     corresponding push candidate.
> +
> +     Currently, the pop candidates are only used for shadow call stack.

Maybe s/the/different/, since the variables themselves are used
regardless of -fsanitize.

Thanks,
Richard

> +     When "-fsanitize=shadow-call-stack" is specified, we replace x30 in
> +     the pop candidate with INVALID_REGNUM to ensure that x30 is not
> +     popped twice.  */
> +  unsigned wb_push_candidate1;
> +  unsigned wb_push_candidate2;
> +  unsigned wb_pop_candidate1;
> +  unsigned wb_pop_candidate2;
>
>    /* Big-endian SVE frames need a spare predicate register in order
>       to save vector registers in the correct layout for unwinding.
> @@ -916,6 +928,9 @@ struct GTY (()) aarch64_frame
>    unsigned spare_pred_reg;
>
>    bool laid_out;
> +
> +  /* True if shadow call stack should be enabled for the current function.  */
> +  bool is_scs_enabled;
>  };
>
>  typedef struct GTY (()) machine_function
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 1a39470a1fe..48666b4b218 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -6994,6 +6994,16 @@ (define_insn "xpaclri"
>    "hint\t7 // xpaclri"
>  )
>
> +;; Save X30 in the X18-based POST_INC stack (consistent with clang).
> +(define_expand "scs_push"
> +  [(set (mem:DI (post_inc:DI (reg:DI R18_REGNUM)))
> +        (reg:DI R30_REGNUM))])
> +
> +;; Load X30 from the X18-based PRE_DEC stack (consistent with clang).
> +(define_expand "scs_pop"
> +  [(set (reg:DI R30_REGNUM)
> +        (mem:DI (pre_dec:DI (reg:DI R18_REGNUM))))])
> +
>  ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>  ;; all of memory.  This blocks insns from being moved across this point.
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 71992b8c597..1e580107fab 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -15224,6 +15224,36 @@ add @code{detect_invalid_pointer_pairs=2} to the environment variable
>  @env{ASAN_OPTIONS}. Using @code{detect_invalid_pointer_pairs=1} detects
>  invalid operation only when both pointers are non-null.
>
> +@item -fsanitize=shadow-call-stack
> +@opindex fsanitize=shadow-call-stack
> +Enable ShadowCallStack, a security enhancement mechanism used to protect
> +programs against return address overwrites (e.g. stack buffer overflows).
> +It works by saving a function's return address to a separately allocated
> +shadow call stack in the function prologue and restoring the return address
> +from the shadow call stack in the function epilogue.  Instrumentation only
> +occurs in functions that need to save the return address to the stack.
> +
> +Currently it only supports the aarch64 platform.  It is specifically
> +designed for Linux kernels that enable the CONFIG_SHADOW_CALL_STACK option.
> +For user space programs, runtime support is not currently provided
> +in libc and libgcc.  Users who want to use this feature in user space need
> +to provide their own support for the runtime.  It should be noted that
> +this may cause the ABI rules to be broken.
> +
> +On aarch64, the instrumentation makes use of the platform register @code{x18}.
> +This generally means that any code that may run on the same thread as code
> +compiled with ShadowCallStack must be compiled with the flag
> +@option{-ffixed-x18}, otherwise functions compiled without
> +@option{-ffixed-x18} might clobber @code{x18} and so corrupt the shadow
> +stack pointer.
> +
> +Also, because there is no userspace runtime support, code compiled with
> +ShadowCallStack cannot use exception handling.  Use @option{-fno-exceptions}
> +to turn off exceptions.
> +
> +See @uref{https://clang.llvm.org/docs/ShadowCallStack.html} for more
> +details.
> +
>  @item -fsanitize=thread
>  @opindex fsanitize=thread
>  Enable ThreadSanitizer, a fast data race detector.
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 990152f5b15..19c130d7420 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -12575,3 +12575,8 @@ counters are incremented using atomic operations. Targets not supporting
>  64-bit atomic operations may override the default value and request a 32-bit
>  type.
>  @end deftypefn
> +
> +@deftypevr {Target Hook} bool TARGET_HAVE_SHADOW_CALL_STACK
> +This value is true if the target platform supports
> +@option{-fsanitize=shadow-call-stack}.  The default value is false.
> +@end deftypevr
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 193c9bdd853..01db5f54b5a 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -8179,3 +8179,5 @@ maintainer is familiar with.
>  @hook TARGET_MEMTAG_UNTAGGED_POINTER
>
>  @hook TARGET_GCOV_TYPE_SIZE
> +
> +@hook TARGET_HAVE_SHADOW_CALL_STACK
> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
> index a5a637160d7..c22ef35a289 100644
> --- a/gcc/flag-types.h
> +++ b/gcc/flag-types.h
> @@ -321,6 +321,8 @@ enum sanitize_code {
>    SANITIZE_HWADDRESS = 1UL << 28,
>    SANITIZE_USER_HWADDRESS = 1UL << 29,
>    SANITIZE_KERNEL_HWADDRESS = 1UL << 30,
> +  /* Shadow Call Stack.  */
> +  SANITIZE_SHADOW_CALL_STACK = 1UL << 31,
>    SANITIZE_SHIFT = SANITIZE_SHIFT_BASE | SANITIZE_SHIFT_EXPONENT,
>    SANITIZE_UNDEFINED = SANITIZE_SHIFT | SANITIZE_DIVIDE | SANITIZE_UNREACHABLE
>                         | SANITIZE_VLA | SANITIZE_NULL | SANITIZE_RETURN
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 4472cec1b98..b2e00e8067a 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -1994,6 +1994,7 @@ const struct sanitizer_opts_s sanitizer_opts[] =
>    SANITIZER_OPT (vptr, SANITIZE_VPTR, true),
>    SANITIZER_OPT (pointer-overflow, SANITIZE_POINTER_OVERFLOW, true),
>    SANITIZER_OPT (builtin, SANITIZE_BUILTIN, true),
> +  SANITIZER_OPT (shadow-call-stack, SANITIZE_SHADOW_CALL_STACK, false),
>    SANITIZER_OPT (all, ~0U, true),
>  #undef SANITIZER_OPT
>    { NULL, 0U, 0UL, false }
> diff --git a/gcc/target.def b/gcc/target.def
> index 87feeec2ea1..ce382714399 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -7084,6 +7084,14 @@ counters are incremented using atomic operations. Targets not supporting\n\
>  type.",
>   HOST_WIDE_INT, (void), default_gcov_type_size)
>
> +/* This value represents whether the shadow call stack is implemented on
> +   the target platform.  */
> +DEFHOOKPOD
> +(have_shadow_call_stack,
> + "This value is true if the target platform supports\n\
> +@option{-fsanitize=shadow-call-stack}.  The default value is false.",
> + bool, false)
> +
>  /* Close the 'struct gcc_target' definition.  */
>  HOOK_VECTOR_END (C90_EMPTY_HACK)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
> new file mode 100644
> index 00000000000..ab68d6e8482
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=shadow-call-stack -fno-exceptions" } */
> +
> +int i;
> +
> +/* { dg-error "'-fsanitize=shadow-call-stack' requires '-ffixed-x18'" "" {target "aarch64*-*-*" } 0 } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
> new file mode 100644
> index 00000000000..b5139a24559
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=shadow-call-stack -ffixed-x18 -fexceptions" } */
> +
> +int i;
> +
> +/* { dg-error "'-fsanitize=shadow-call-stack' requires '-fno-exceptions'" "" {target "aarch64*-*-*" } 0 } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
> new file mode 100644
> index 00000000000..b88e490f3ae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
> @@ -0,0 +1,45 @@
> +/* Testing shadow call stack.  */
> +/* scs_push: str x30, [x18], #8 */
> +/* scs_pop: ldr x30, [x18, #-8]! */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fsanitize=shadow-call-stack -ffixed-x18 -fno-exceptions" } */
> +
> +int foo (int);
> +
> +/* function not use x30.  */
> +int func1 (void)
> +{
> +  return 0;
> +}
> +
> +/* function use x30.  */
> +int func2 (void)
> +{
> +  /* scs push */
> +  asm volatile ("":::"x30");
> +
> +  return 0;
> +  /* scs pop */
> +}
> +
> +/* sibcall.  */
> +int func3 (int a, int b)
> +{
> +  /* scs push */
> +  asm volatile ("":::"x30");
> +
> +  return foo (a+b);
> +  /* scs pop */
> +}
> +
> +/* eh_return.  */
> +int func4 (long offset, void *handler)
> +{
> +  /* Do not emit scs push/pop */
> +  asm volatile ("":::"x30");
> +
> +  __builtin_eh_return (offset, handler);
> +}
> +
> +/* { dg-final { scan-assembler-times {str\tx30, \[x18\], #?8} 2 } } */
> +/* { dg-final { scan-assembler-times {ldr\tx30, \[x18, #?-8\]!} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
> new file mode 100644
> index 00000000000..f63169340e1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
> @@ -0,0 +1,20 @@
> +/* Testing the disable of shadow call stack.  */
> +/* scs_push: str x30, [x18], #8 */
> +/* scs_pop: ldr x30, [x18, #-8]! */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-omit-frame-pointer -fsanitize=shadow-call-stack -ffixed-x18 -fno-exceptions" } */
> +
> +int foo (int);
> +
> +/* function disable shadow call stack.  */
> +int __attribute__((no_sanitize("shadow-call-stack"))) func1 (void)
> +{
> +  asm volatile ("":::"x30");
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler-not {str\tx30, \[x18\], #?8} } } */
> +/* { dg-final { scan-assembler-not {ldr\tx30, \[x18, #?-8\]!} } } */
> +/* { dg-final { scan-assembler-times {stp\tx29, x30, \[sp, -[0-9]+\]!} 1 } } */
> +/* { dg-final { scan-assembler-times {ldp\tx29, x30, \[sp\], [0-9]+} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_5.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_5.c
> new file mode 100644
> index 00000000000..d88357ca04d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_5.c
> @@ -0,0 +1,18 @@
> +/* Verify:
> + *   -fno-omit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18.
> + *   without outgoing.
> + *   total frame size <= 512 but > 256.
> + *   callee-saved reg: x29, x30.
> + *   optimized code should use "stp x29, x30, [sp]" to save frame chain.
> + *   optimized code should use "ldr x29, [sp]" to restore x29 only.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-omit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18 --save-temps" } */
> +
> +#include "test_frame_common.h"
> +
> +t_frame_pattern (func1, 400, )
> +
> +/* { dg-final { scan-assembler-times {stp\tx29, x30, \[sp\]} 1 } } */
> +/* { dg-final { scan-assembler {ldr\tx29, \[sp\]} } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_6.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_6.c
> new file mode 100644
> index 00000000000..83b74834c6a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_6.c
> @@ -0,0 +1,18 @@
> +/* Verify:
> + *   -fomit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18.
> + *   without outgoing.
> + *   total frame size <= 256.
> + *   callee-saved reg: x30 only.
> + *   optimized code should use "str x30, [sp]" to save x30 in prologue.
> + *   optimized code should not restore x30 in epilogue.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fomit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18 --save-temps" } */
> +
> +#include "test_frame_common.h"
> +
> +t_frame_pattern (func1, 200, )
> +
> +/* { dg-final { scan-assembler-times {str\tx30, \[sp\]} 1 } } */
> +/* { dg-final { scan-assembler-not {ld[r|p]\tx30, \[sp} } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_7.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_7.c
> new file mode 100644
> index 00000000000..5537fb3293a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_7.c
> @@ -0,0 +1,18 @@
> +/* Verify:
> + *   -fomit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18.
> + *   without outgoing.
> + *   total frame size <= 256.
> + *   callee-saved reg: x19, x30.
> + *   optimized code should use "stp x19, x30, [sp, -x]!" to save x19, x30 in prologue.
> + *   optimized code should use "ldr x19, [sp], x" to restore x19 only.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fomit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18 --save-temps" } */
> +
> +#include "test_frame_common.h"
> +
> +t_frame_pattern (func1, 200, "x19")
> +
> +/* { dg-final { scan-assembler-times {stp\tx19, x30, \[sp, -[0-9]+\]!} 1 } } */
> +/* { dg-final { scan-assembler {ldr\tx19, \[sp\], [0-9]+} } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_8.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_8.c
> new file mode 100644
> index 00000000000..b03f26f7bcf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_8.c
> @@ -0,0 +1,24 @@
> +/* Verify:
> + *   -fomit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18.
> + *   without outgoing.
> + *   total frame <= 512 but > 256.
> + *   callee-saved reg: x19, x20, x30.
> + *   optimized code should use "stp x19, x20, [sp, -x]!" to save x19, x20 in prologue.
> + *   optimized code should use "str x30, [sp " to save x30 in prologue.
> + *   optimized code should use "ldp x19, x20, [sp], x" to restore x19, x20 in epilogue.
> + *   optimized code should not restore x30 in epilogue.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O0 -fomit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18 --save-temps" } */
> +
> +int func1 (void)
> +{
> +  unsigned char a[200];
> +  __asm__ ("":::"x19","x20","x30");
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler-times {stp\tx19, x20, \[sp, -[0-9]+\]!} 1 } } */
> +/* { dg-final { scan-assembler-times {str\tx30, \[sp} 1 } } */
> +/* { dg-final { scan-assembler {ldp\tx19, x20, \[sp\], [0-9]+} } } */
> +/* { dg-final { scan-assembler-not {ld[r|p]\tx30, \[sp} } } */
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index e91f083f8ff..93d17ddbda1 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1677,6 +1677,16 @@ process_options (bool no_backend)
>        flag_sanitize &= ~SANITIZE_HWADDRESS;
>      }
>
> +  if (flag_sanitize & SANITIZE_SHADOW_CALL_STACK)
> +    {
> +      if (!targetm.have_shadow_call_stack)
> +        sorry ("%<-fsanitize=shadow-call-stack%> not supported "
> +               "in current platform");
> +      else if (flag_exceptions)
> +        error_at (UNKNOWN_LOCATION, "%<-fsanitize=shadow-call-stack%> "
> +                  "requires %<-fno-exceptions%>");
> +    }
> +
>    HOST_WIDE_INT patch_area_size, patch_area_start;
>    parse_and_check_patch_area (flag_patchable_function_entry, false,
>                                &patch_area_size, &patch_area_start);
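The push/pop mechanism the patch adds can be illustrated with a small C model. This is a sketch, not GCC code: `cpu_state`, `scs_push`, `scs_pop` and `instrumented_call` are hypothetical names, and treating x18 as a plain `uint64_t *` is an assumption made for illustration. It mirrors the addressing modes used by the scs_push and scs_pop expanders (store then post-increment x18, pre-decrement x18 then load), and shows that a corrupted copy of x30 on the regular stack no longer matters once the epilogue reloads x30 only from the shadow stack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: x18 points into a separately allocated shadow
   stack that grows towards higher addresses; x30 holds the return
   address.  */
typedef struct
{
  uint64_t *x18;   /* shadow call stack pointer */
  uint64_t *base;  /* start of the shadow stack, for bounds checks */
  uint64_t x30;    /* link register */
} cpu_state;

/* scs_push, "str x30, [x18], #8": store x30, then post-increment x18.  */
static void
scs_push (cpu_state *s)
{
  *s->x18 = s->x30;
  s->x18 += 1;  /* advance by one 8-byte slot */
}

/* scs_pop, "ldr x30, [x18, #-8]!": pre-decrement x18, then load x30.  */
static void
scs_pop (cpu_state *s)
{
  assert (s->x18 > s->base);  /* the shadow stack must not be empty */
  s->x18 -= 1;
  s->x30 = *s->x18;
}

/* One instrumented call, in the order the ChangeLog describes: push x30
   onto the SCS, save it on the regular stack, let the body corrupt the
   regular copy and clobber x30, then restore x30 from the SCS only.
   Returns the address the function would return to.  */
static uint64_t
instrumented_call (cpu_state *s, uint64_t *regular_slot)
{
  scs_push (s);
  *regular_slot = s->x30;      /* regular save, e.g. stp x29, x30, ... */
  *regular_slot = 0xdeadbeef;  /* e.g. a stack buffer overflow */
  s->x30 = 0;                  /* the body made calls of its own */
  scs_pop (s);                 /* epilogue: x30 comes from the SCS */
  return s->x30;
}
```

For example, with `uint64_t shadow[4], slot;` and `cpu_state s = { shadow, shadow, 0x400604 };`, calling `instrumented_call (&s, &slot)` should return 0x400604 and leave `s.x18` back at `shadow`, even though `slot` now holds 0xdeadbeef. Nested pushes and pops behave as a LIFO stack, which is why the store and load use complementary post-increment and pre-decrement addressing.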
On 2/9/22 08:08, Richard Sandiford wrote:
> Dan Li <ashimida@linux.alibaba.com> writes:
>> +
>> +  /* When shadow call stack is enabled, the scs_pop in the epilogue will
>> +     restore x30, and we don't need to pop x30 again in the traditional
>> +     way.  Pop candidates record the registers that need to be popped
>> +     eventually.  */
>> +  if (frame.is_scs_enabled)
>> +    {
>> +      if (frame.wb_push_candidate2 == R30_REGNUM)
>> +        frame.wb_pop_candidate2 = INVALID_REGNUM;
>> +      else if (frame.wb_push_candidate1 == R30_REGNUM)
>> +        frame.wb_pop_candidate1 = INVALID_REGNUM;
>
> Although it makes no difference to the behaviour, I think it would be
> clearer to use pop rather than push in the checks here.
>

Got it.

>> @@ -7885,8 +7914,8 @@ aarch64_save_callee_saves (poly_int64 start_offset,
>>        bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
>>
>>        if (skip_wb
>> -          && (regno == cfun->machine->frame.wb_candidate1
>> -              || regno == cfun->machine->frame.wb_candidate2))
>> +          && (regno == cfun->machine->frame.wb_push_candidate1
>> +              || regno == cfun->machine->frame.wb_push_candidate2))
>>          continue;
>>
>>        if (cfun->machine->reg_is_wrapped_separately[regno])
>> @@ -7996,8 +8025,8 @@ aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start,
>>        rtx reg, mem;
>>
>>        if (skip_wb
>> -          && (regno == cfun->machine->frame.wb_candidate1
>> -              || regno == cfun->machine->frame.wb_candidate2))
>> +          && (regno == cfun->machine->frame.wb_push_candidate1
>> +              || regno == cfun->machine->frame.wb_push_candidate2))
>
> Shouldn't this be using pop rather than push?
>

There might be a slight difference:

- Using push candidates means that a register to be ignored in pop
  candidates will not be emitted again during the "restore"
  (pop_candidates should always be a subset of push_candidates, since
  popping a register without a push might not make sense).
- Using pop candidates means that a register to be ignored in pop
  candidates will be re-emitted during the "restore".
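The two bullets can be captured in a simplified model of the epilogue. This is a sketch under stated assumptions, not GCC's code: `wb_candidates` and `count_reloads` are hypothetical, and the model only counts how many saved registers end up reloaded, either by the restore loop or by the writeback pop of the still-valid pop candidates. `last_regno` stands for the upper register bound passed to `aarch64_restore_callee_saves` (R29 when SCS is enabled, R30 otherwise), and `INVALID_REGNUM` is a local stand-in for GCC's sentinel.

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_REGNUM (~0u)  /* local stand-in for GCC's sentinel */

/* Hypothetical subset of the frame's writeback bookkeeping.  */
typedef struct
{
  unsigned push1, push2;  /* wb_push_candidate1/2 */
  unsigned pop1, pop2;    /* wb_pop_candidate1/2 */
} wb_candidates;

/* Count the reloads for saved registers SAVED[0..N-1].  SKIP_USING_POP
   selects whether the restore loop skips the pop or the push candidates.  */
static int
count_reloads (const unsigned *saved, int n, unsigned last_regno,
               wb_candidates c, bool skip_using_pop)
{
  unsigned skip1 = skip_using_pop ? c.pop1 : c.push1;
  unsigned skip2 = skip_using_pop ? c.pop2 : c.push2;
  int reloads = 0;

  assert (n >= 0);

  /* Restore loop over [R0, LAST_REGNO]: one load per register that is
     neither out of range nor reserved for the writeback pop.  */
  for (int i = 0; i < n; i++)
    if (saved[i] <= last_regno && saved[i] != skip1 && saved[i] != skip2)
      reloads++;

  /* Writeback pop (ldr/ldp with post-increment) of the valid pop
     candidates.  */
  if (c.pop1 != INVALID_REGNUM)
    reloads++;
  if (c.pop2 != INVALID_REGNUM)
    reloads++;
  return reloads;
}
```

With saved registers {x19, x20}, push candidates {x19, x20} and x20 dropped from the pop candidates (bound R30), skipping on push candidates reloads only x19, so x20 is never restored, while skipping on pop candidates reloads both, which corresponds to the extra `ldr x20, [sp, #8]` in the disassembly that follows. In the SCS case ({x29, x30}, x30 dropped from the pop candidates, bound R29) both choices reload exactly one register.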
For example, suppose we make the pop candidates ignore x20:

--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7502,6 +7502,8 @@ aarch64_layout_frame (void)
         frame.wb_pop_candidate1 = INVALID_REGNUM;
     }

+  if (frame.wb_pop_candidate2 == R20_REGNUM)
+    frame.wb_pop_candidate2 = INVALID_REGNUM;
   /* If candidate2 is INVALID_REGNUM, we need to adjust max_push_offset to
      256 to ensure that the offset meets the requirements of emit_move_insn.
      Similarly, if candidate1 is INVALID_REGNUM, we need to set

With the test case:

int main(void)
{
  __asm__ ("":::"x19", "x20");
  return 0;
}

When we use "pop_candidate[12]", one more insn is emitted:

0000000000400604 <main>:
   400604:   a9bf53f3   stp   x19, x20, [sp, #-16]!
   400608:   52800000   mov   w0, #0x0
+  40060c:   f94007f4   ldr   x20, [sp, #8]
   400610:   f84107f3   ldr   x19, [sp], #16
   400614:   d65f03c0   ret

But in the case of ignoring a specific register (like scs ignores x30),
there is no difference between the two, because we always need to
explicitly specify which registers to ignore in the parameters of
aarch64_restore_callee_saves.

If pop looks better here, I'd like to change it to pop in the next
version :).

>> +  /* When shadow call stack is enabled, the scs_pop in the epilogue will
>> +     restore x30, we don't need to restore x30 again in the traditional
>> +     way.  */
>> +  if (cfun->machine->frame.is_scs_enabled)
>> +    aarch64_restore_callee_saves (callee_offset - sve_callee_adjust,
>> +                                  R0_REGNUM, R29_REGNUM,
>> +                                  callee_adjust != 0, &cfi_ops);
>> +  else
>> +    aarch64_restore_callee_saves (callee_offset - sve_callee_adjust,
>> +                                  R0_REGNUM, R30_REGNUM,
>> +                                  callee_adjust != 0, &cfi_ops);
>> +
>
> Very minor, but I think it would be better to have:
>
>   unsigned int last_gpr = (cfun->machine->frame.is_scs_enabled
>                            ? R29_REGNUM : R30_REGNUM);
>
> so that we don't need to repeat the other arguments. There's then
> less risk of the two versions getting out of sync.
>

Got it.
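Richard's suggestion can be sketched in isolation. Everything here apart from the aarch64 register numbers is hypothetical: `restore_callee_saves` stands in for `aarch64_restore_callee_saves` and merely records its arguments, `restore_log` and `expand_epilogue_restores` are illustrative names, and `long` replaces `poly_int64`. The point is that only the upper register bound depends on `is_scs_enabled`, so the shared arguments appear in exactly one call.

```c
#include <assert.h>
#include <stdbool.h>

/* Register numbers as in the aarch64 port (R0_REGNUM .. R30_REGNUM).  */
enum { R0_REGNUM = 0, R29_REGNUM = 29, R30_REGNUM = 30 };

/* Hypothetical record of the last call made to the restore routine.  */
typedef struct
{
  long start_offset;
  unsigned start, limit;
  bool skip_wb;
  int calls;
} restore_log;

/* Hypothetical stand-in for aarch64_restore_callee_saves.  */
static void
restore_callee_saves (restore_log *log, long start_offset, unsigned start,
                      unsigned limit, bool skip_wb)
{
  log->start_offset = start_offset;
  log->start = start;
  log->limit = limit;
  log->skip_wb = skip_wb;
  log->calls++;
}

/* The suggested shape: choose the last GPR once, then make a single call,
   so the shared arguments cannot drift apart between two branches.  With
   SCS enabled, x30 is left to scs_pop and the loop stops at x29.  */
static void
expand_epilogue_restores (restore_log *log, bool is_scs_enabled,
                          long callee_offset, long sve_callee_adjust,
                          long callee_adjust)
{
  unsigned int last_gpr = (is_scs_enabled ? R29_REGNUM : R30_REGNUM);

  restore_callee_saves (log, callee_offset - sve_callee_adjust,
                        R0_REGNUM, last_gpr, callee_adjust != 0);
}
```

Calling `expand_epilogue_restores` with `is_scs_enabled` set should record a single call whose limit is 29; without SCS the limit is 30 and every other argument is computed the same way.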
>> >> if (need_barrier_p) >> emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx)); >> @@ -9066,6 +9109,17 @@ aarch64_expand_epilogue (bool for_sibcall) >> RTX_FRAME_RELATED_P (insn) = 1; >> } >> >> + /* Pop return address from shadow call stack. */ >> + if (cfun->machine->frame.is_scs_enabled) >> + { >> + machine_mode mode = aarch64_reg_save_mode (R30_REGNUM); >> + rtx reg = gen_rtx_REG (mode, R30_REGNUM); >> + >> + insn = emit_insn (gen_scs_pop ()); >> + add_reg_note (insn, REG_CFA_RESTORE, reg); >> + RTX_FRAME_RELATED_P (insn) = 1; >> + } >> + >> /* We prefer to emit the combined return/authenticate instruction RETAA, >> however there are three cases in which we must instead emit an explicit >> authentication instruction. >> @@ -16492,6 +16546,10 @@ aarch64_override_options_internal (struct gcc_options *opts) >> aarch64_stack_protector_guard_offset = offs; >> } >> >> + if ((flag_sanitize & SANITIZE_SHADOW_CALL_STACK) >> + && !fixed_regs[R18_REGNUM]) >> + error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>"); >> + >> initialize_aarch64_code_model (opts); >> initialize_aarch64_tls_size (opts); >> >> @@ -26505,6 +26563,9 @@ aarch64_libgcc_floating_mode_supported_p >> #undef TARGET_ASM_FUNCTION_EPILOGUE >> #define TARGET_ASM_FUNCTION_EPILOGUE aarch64_sls_emit_blr_function_thunks >> >> +#undef TARGET_HAVE_SHADOW_CALL_STACK >> +#define TARGET_HAVE_SHADOW_CALL_STACK true >> + >> struct gcc_target targetm = TARGET_INITIALIZER; >> >> #include "gt-aarch64.h" >> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h >> index 2792bb29adb..b5efe083f30 100644 >> --- a/gcc/config/aarch64/aarch64.h >> +++ b/gcc/config/aarch64/aarch64.h >> @@ -906,9 +906,21 @@ struct GTY (()) aarch64_frame >> Indicated by CALLEE_ADJUST == 0 && EMIT_FRAME_CHAIN. >> >> These fields indicate which registers we've decided to handle using >> - (1) or (2), or INVALID_REGNUM if none. 
*/ >> - unsigned wb_candidate1; >> - unsigned wb_candidate2; >> + (1) or (2), or INVALID_REGNUM if none. >> + >> + In some cases we don't always need to pop all registers in the push >> + candidates, pop candidates record which registers need to be popped >> + eventually. The initial value of a pop candidate is copied from its >> + corresponding push candidate. >> + >> + Currently, the pop candidates are only used for shadow call stack. > > Maybe s/the/different/, since the variables themselves are used > regardless of -fsanitize. > Got it. Thanks, Dan
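The frame-layout steps quoted in this exchange can be summarised in a small, hypothetical C model. `struct model_frame`, `model_scs_enabled` and `model_layout_pop_candidates` are illustrative names, not GCC's. The sketch copies the push candidates into the pop candidates, drops x30 from the pop side when SCS is enabled (testing the pop fields, as suggested in review), and then derives `max_push_offset` from the pop candidates using the 512/256/0 thresholds from the patch. The `sanitize_scs` argument stands in for `sanitize_flags_p (SANITIZE_SHADOW_CALL_STACK)`, which also honours the `no_sanitize` attribute.

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_REGNUM 0xffffffffu
#define R29_REGNUM 29u
#define R30_REGNUM 30u

/* Hypothetical, cut-down frame description (not GCC's aarch64_frame).  */
struct model_frame
{
  unsigned wb_push_candidate1, wb_push_candidate2;
  unsigned wb_pop_candidate1, wb_pop_candidate2;
  bool is_scs_enabled;
  long max_push_offset;
};

/* SCS only applies when LR is saved, the function does not call
   __builtin_eh_return, and the sanitizer is active for this function.  */
static bool
model_scs_enabled (bool calls_eh_return, bool sanitize_scs, bool lr_saved)
{
  return !calls_eh_return && sanitize_scs && lr_saved;
}

static void
model_layout_pop_candidates (struct model_frame *frame)
{
  assert (frame != 0);

  /* Pop candidates start out as copies of the push candidates.  */
  frame->wb_pop_candidate1 = frame->wb_push_candidate1;
  frame->wb_pop_candidate2 = frame->wb_push_candidate2;

  /* With SCS, x30 is restored by the shadow-stack pop instead.  */
  if (frame->is_scs_enabled)
    {
      if (frame->wb_pop_candidate2 == R30_REGNUM)
        frame->wb_pop_candidate2 = INVALID_REGNUM;
      else if (frame->wb_pop_candidate1 == R30_REGNUM)
        frame->wb_pop_candidate1 = INVALID_REGNUM;
    }

  /* The writeback offset limit depends on how many registers are popped.  */
  frame->max_push_offset = 0;
  if (frame->wb_pop_candidate2 != INVALID_REGNUM)
    frame->max_push_offset = 512;
  else if (frame->wb_pop_candidate1 != INVALID_REGNUM)
    frame->max_push_offset = 256;
}
```

For a frame-chain pair {x29, x30} with SCS enabled, only x29 is left to pop, so the model falls back to the single-register limit of 256.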
Dan Li <ashimida@linux.alibaba.com> writes: > On 2/9/22 08:08, Richard Sandiford wrote: >> Dan Li <ashimida@linux.alibaba.com> writes: >>> + >>> + /* When shadow call stack is enabled, the scs_pop in the epilogue will >>> + restore x30, and we don't need to pop x30 again in the traditional >>> + way. Pop candidates record the registers that need to be popped >>> + eventually. */ >>> + if (frame.is_scs_enabled) >>> + { >>> + if (frame.wb_push_candidate2 == R30_REGNUM) >>> + frame.wb_pop_candidate2 = INVALID_REGNUM; >>> + else if (frame.wb_push_candidate1 == R30_REGNUM) >>> + frame.wb_pop_candidate1 = INVALID_REGNUM; >> >> Although it makes no difference to the behaviour, I think it would be >> clearer to use pop rather than push in the checks here. >> > > Got it. >>> @@ -7885,8 +7914,8 @@ aarch64_save_callee_saves (poly_int64 start_offset, >>> bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno); >>> >>> if (skip_wb >>> - && (regno == cfun->machine->frame.wb_candidate1 >>> - || regno == cfun->machine->frame.wb_candidate2)) >>> + && (regno == cfun->machine->frame.wb_push_candidate1 >>> + || regno == cfun->machine->frame.wb_push_candidate2)) >>> continue; >>> >>> if (cfun->machine->reg_is_wrapped_separately[regno]) >>> @@ -7996,8 +8025,8 @@ aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start, >>> rtx reg, mem; >>> >>> if (skip_wb >>> - && (regno == cfun->machine->frame.wb_candidate1 >>> - || regno == cfun->machine->frame.wb_candidate2)) >>> + && (regno == cfun->machine->frame.wb_push_candidate1 >>> + || regno == cfun->machine->frame.wb_push_candidate2)) >> >> Shouldn't this be using pop rather than push? >> > > There might be a little difference: > > - Using push candidates means that a register to be ignored in pop > candidates will not be emitted again during the "restore" (pop_candidates > should always be a subset of push_candidates, since popping a register > without a push might not make sense). 
The push candidates are simply a subset of the saved registers though. Similarly, the pop candidates are simply a subset of the restored registers. So I think the requirement operates at that level: the restored registers must be a subset of the saved registers. In other circumstances it could have been the other way around: there might have been a change that stopped us from saving two registers during the allocation, but we wanted to carry on restoring two registers during the deallocation. I don't think there's a reason that the push candidates *have* to be a superset of the pop candidates (even though they are with the current change). > - Using pop candidates means that a registers to be ignored in pop > candidates will be re-emitted during the "restore". For example, > if we specify to ignore the x20 register in pop: > > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -7502,6 +7502,8 @@ aarch64_layout_frame (void) > frame.wb_pop_candidate1 = INVALID_REGNUM; > } > > + if (frame.wb_pop_candidate2 == R20_REGNUM) > + frame.wb_pop_candidate2 = INVALID_REGNUM; > /* If candidate2 is INVALID_REGNUM, we need to adjust max_push_offset to > 256 to ensure that the offset meets the requirements of emit_move_insn. > Similarly, if candidate1 is INVALID_REGNUM, we need to set > > With the test case: > > int main(void) > { > __asm__ ("":::"x19", "x20"); > return 0; > } > > When we use "pop_candidate[12]", one more insn is emitted: > > 0000000000400604 <main>: > 400604: a9bf53f3 stp x19, x20, [sp, #-16]! > 400608: 52800000 mov w0, #0x0 > + 40060c: f94007f4 ldr x20, [sp, #8] > 400610: f84107f3 ldr x19, [sp], #16 > 400614: d65f03c0 ret > > But in the case of ignoring a specific register (like scs ignores x30), > there is no difference between the two (because we always need > to explicitly specify which registers to ignore in the parameter of > aarch64_restore_callee_saves). I think this is the correct behaviour. 
If we don't want to restore a register at all then it should be excluded from the restore list somehow. In your case you're doing that by using a limit of X29_REGNUM instead of X30_REGNUM. FWIW, I did wonder whether aarch64_restore_callee_saves should be doing the scs pop, rather than aarch64_expand_epilogue, and in an earlier draft of the previous review I'd asked for that. It does seem conceptually cleaner, but in practice, it would probably have been awkward to implement. E.g. we'd need to explicitly stop an LDP being formed with X30 as the second register. But treating scs push and scs pop as part of the register save and restore sequences would have one advantage: it would allow the scs push and scs pop to be shrink-wrapped. Thanks, Richard > If pop looks better here, I'd like to change it to pop in the > next version :). > >>> + /* When shadow call stack is enabled, the scs_pop in the epilogue will >>> + restore x30, we don't need to restore x30 again in the traditional >>> + way. */ >>> + if (cfun->machine->frame.is_scs_enabled) >>> + aarch64_restore_callee_saves (callee_offset - sve_callee_adjust, >>> + R0_REGNUM, R29_REGNUM, >>> + callee_adjust != 0, &cfi_ops); >>> + else >>> + aarch64_restore_callee_saves (callee_offset - sve_callee_adjust, >>> + R0_REGNUM, R30_REGNUM, >>> + callee_adjust != 0, &cfi_ops); >>> + >> >> Very minor, but I think it would be better to have: >> >> unsigned int last_gpr = (cfun->machine->frame.is_scs_enabled >> ? R29_REGNUM : R30_REGNUM); >> >> so that we don't need to repeat the other arguments. There's then >> less risk of the two versions getting out of sync. >> > > Got it. > >>> >>> if (need_barrier_p) >>> emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx)); >>> @@ -9066,6 +9109,17 @@ aarch64_expand_epilogue (bool for_sibcall) >>> RTX_FRAME_RELATED_P (insn) = 1; >>> } >>> >>> + /* Pop return address from shadow call stack.
*/ >>> + if (cfun->machine->frame.is_scs_enabled) >>> + { >>> + machine_mode mode = aarch64_reg_save_mode (R30_REGNUM); >>> + rtx reg = gen_rtx_REG (mode, R30_REGNUM); >>> + >>> + insn = emit_insn (gen_scs_pop ()); >>> + add_reg_note (insn, REG_CFA_RESTORE, reg); >>> + RTX_FRAME_RELATED_P (insn) = 1; >>> + } >>> + >>> /* We prefer to emit the combined return/authenticate instruction RETAA, >>> however there are three cases in which we must instead emit an explicit >>> authentication instruction. >>> @@ -16492,6 +16546,10 @@ aarch64_override_options_internal (struct gcc_options *opts) >>> aarch64_stack_protector_guard_offset = offs; >>> } >>> >>> + if ((flag_sanitize & SANITIZE_SHADOW_CALL_STACK) >>> + && !fixed_regs[R18_REGNUM]) >>> + error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>"); >>> + >>> initialize_aarch64_code_model (opts); >>> initialize_aarch64_tls_size (opts); >>> >>> @@ -26505,6 +26563,9 @@ aarch64_libgcc_floating_mode_supported_p >>> #undef TARGET_ASM_FUNCTION_EPILOGUE >>> #define TARGET_ASM_FUNCTION_EPILOGUE aarch64_sls_emit_blr_function_thunks >>> >>> +#undef TARGET_HAVE_SHADOW_CALL_STACK >>> +#define TARGET_HAVE_SHADOW_CALL_STACK true >>> + >>> struct gcc_target targetm = TARGET_INITIALIZER; >>> >>> #include "gt-aarch64.h" >>> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h >>> index 2792bb29adb..b5efe083f30 100644 >>> --- a/gcc/config/aarch64/aarch64.h >>> +++ b/gcc/config/aarch64/aarch64.h >>> @@ -906,9 +906,21 @@ struct GTY (()) aarch64_frame >>> Indicated by CALLEE_ADJUST == 0 && EMIT_FRAME_CHAIN. >>> >>> These fields indicate which registers we've decided to handle using >>> - (1) or (2), or INVALID_REGNUM if none. */ >>> - unsigned wb_candidate1; >>> - unsigned wb_candidate2; >>> + (1) or (2), or INVALID_REGNUM if none. >>> + >>> + In some cases we don't always need to pop all registers in the push >>> + candidates, pop candidates record which registers need to be popped >>> + eventually. 
The initial value of a pop candidate is copied from its >>> + corresponding push candidate. >>> + >>> + Currently, the pop candidates are only used for shadow call stack. >> >> Maybe s/the/different/, since the variables themselves are used >> regardless of -fsanitize. >> > > Got it. > > Thanks, > Dan
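Richard's argument is that the invariant belongs to the full sets of saved and restored registers, not to the writeback candidate pairs. A minimal sketch of that invariant with register bitmasks follows. The `regmask` representation and the function names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation: bit N set means register xN.  */
typedef uint32_t regmask;

static regmask
reg_bit (unsigned regno)
{
  assert (regno < 32);
  return (regmask) 1u << regno;
}

/* Every register restored by the epilogue must have been saved by the
   prologue.  The reverse need not hold: under SCS, x30 is saved to the
   frame but not reloaded from it.  */
static bool
restore_is_valid (regmask saved, regmask restored)
{
  return (restored & ~saved) == 0;
}
```

Nothing in this check constrains which registers are chosen as push or pop candidates. Those pairs can differ freely, provided the overall restored set stays within the saved set.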
On 2/10/22 01:55, Richard Sandiford wrote: >> >> There might be a little difference: >> >> - Using push candidates means that a register to be ignored in pop >> candidates will not be emitted again during the "restore" (pop_candidates >> should always be a subset of push_candidates, since popping a register >> without a push might not make sense). > > The push candidates are simply a subset of the saved registers though. > Similarly, the pop candidates are simply a subset of the restored registers. > So I think the requirement operates at that level: the restored registers > must be a subset of the saved registers. > > In other circumstances it could have been the other way around: > there might have been a change that stopped us from saving two > registers during the allocation, but we wanted to carry on restoring > two registers during the deallocation. I don't think there's a > reason that the push candidates *have* to be a superset of the > pop candidates (even though they are with the current change). > Oh yeah, that sounds more reasonable. >> When we use "pop_candidate[12]", one more insn is emitted: >> >> 0000000000400604 <main>: >> 400604: a9bf53f3 stp x19, x20, [sp, #-16]! >> 400608: 52800000 mov w0, #0x0 >> + 40060c: f94007f4 ldr x20, [sp, #8] >> 400610: f84107f3 ldr x19, [sp], #16 >> 400614: d65f03c0 ret >> >> But in the case of ignoring a specific register (like scs ignores x30), >> there is no difference between the two (because we always need >> to explicitly specify which registers to ignore in the parameter of >> aarch64_restore_callee_saves). > > I think this is the correct behaviour. If we don't want to restore > a register at all then it should be excluded from the restore list > somehow. In your case you're doing that be using a limit of > X29_REGNUM instead of X30_REGNUM. > Got it, I'll use pop candidates in the next version. 
> FWIW, I did wonder whether aarch64_restore_callee_saves should be > doing the scs pop, rather than aarch64_expand_epilogue, and in an > earlier draft of the previous review I'd asked for that. It does > seem conceptually cleaner, but in practice, it would probably have > been awkward to implement. E.g. we'd need to explicitly stop an > LDP being formed with X30 as the second register. > Well, then I think I should keep it the same here :). > But treating scs push and scs pop as part of the register save and > restore sequences would have one advantage: it would allow the > scs push and scs pop to be shrink-wrapped. > Sorry for my limited knowledge of shrink-wrapping; I don't think I follow here (I tried to find a case when compiling the kernel and some gcc test cases, but I still don't have a clue). I see that the bit for LR_REGNUM is cleared in aarch64_get_separate_components, and scs push/pop are x18-based operations. If we handle them in aarch64_restore/save_callee_saves, could scs push/pop be shrink-wrapped in some cases? Thanks, Dan
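Because scs push/pop address the shadow stack through x18, the patch rejects `-fsanitize=shadow-call-stack` unless x18 is fixed. Otherwise the register allocator could reuse x18 as a scratch register and corrupt the shadow stack pointer. Below is a hypothetical C stand-in for that check in `aarch64_override_options_internal`. The function name and the plain `fixed_regs` array are assumptions, and the real code reports the problem through GCC's `error ()` rather than returning a string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define R18_REGNUM 18

/* Hypothetical stand-in for the option check: return the diagnostic text,
   or NULL if the option combination is accepted.  */
static const char *
model_check_scs_options (bool sanitize_scs, const bool fixed_regs[32])
{
  assert (fixed_regs != NULL);
  if (sanitize_scs && !fixed_regs[R18_REGNUM])
    return "-fsanitize=shadow-call-stack requires -ffixed-x18";
  return NULL;
}
```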
Dan Li <ashimida@linux.alibaba.com> writes: > On 2/10/22 01:55, Richard Sandiford wrote: >>> >>> There might be a little difference: >>> >>> - Using push candidates means that a register to be ignored in pop >>> candidates will not be emitted again during the "restore" (pop_candidates >>> should always be a subset of push_candidates, since popping a register >>> without a push might not make sense). >> >> The push candidates are simply a subset of the saved registers though. >> Similarly, the pop candidates are simply a subset of the restored registers. >> So I think the requirement operates at that level: the restored registers >> must be a subset of the saved registers. >> >> In other circumstances it could have been the other way around: >> there might have been a change that stopped us from saving two >> registers during the allocation, but we wanted to carry on restoring >> two registers during the deallocation. I don't think there's a >> reason that the push candidates *have* to be a superset of the >> pop candidates (even though they are with the current change). >> > > Oh yeah, that sounds more reasonable. > >>> When we use "pop_candidate[12]", one more insn is emitted: >>> >>> 0000000000400604 <main>: >>> 400604: a9bf53f3 stp x19, x20, [sp, #-16]! >>> 400608: 52800000 mov w0, #0x0 >>> + 40060c: f94007f4 ldr x20, [sp, #8] >>> 400610: f84107f3 ldr x19, [sp], #16 >>> 400614: d65f03c0 ret >>> >>> But in the case of ignoring a specific register (like scs ignores x30), >>> there is no difference between the two (because we always need >>> to explicitly specify which registers to ignore in the parameter of >>> aarch64_restore_callee_saves). >> >> I think this is the correct behaviour. If we don't want to restore >> a register at all then it should be excluded from the restore list >> somehow. In your case you're doing that be using a limit of >> X29_REGNUM instead of X30_REGNUM. >> > > Got it, I'll use pop candidates in the next version. 
> >> FWIW, I did wonder whether aarch64_restore_callee_saves should be >> doing the scs pop, rather than aarch64_expand_epilogue, and in an >> earlier draft of the previous review I'd asked for that. It does >> seem conceptually cleaner, but in practice, it would probably have >> been awkward to implement. E.g. we'd need to explicitly stop an >> LDP being formed with X30 as the second register. >> > > Well, then I think I should keep it the same here :). > >> But treating scs push and scs pop as part of the register save and >> restore sequences would have one advantage: it would allow the >> scs push and scs pop to be shrink-wrapped. >> > > Sorry for my limited knowledge of shrink warping, I don't think I get > it here (I tried to find a case when compiling the kernel and some > gcc test cases but I still don't have a clue.). > > I see that the bitmap of LR_REGNUM is cleared in > aarch64_get_separate_components and scs push/pop are x18 based operations. > > If we handle them in aarch64_restore/save_callee_saves, > could scs push/pop be shrink-wrapped in some cases? Yeah, I think so. E.g. for: void f(); int g(int x) { if (x == 0) return 1; f(); return 2; } shrink wrapping would allow the scs push and pop to move along with the x30 save: g: cbnz w0, .L9 mov w0, 1 ret .L9: stp x29, x30, [sp, -16]! mov x29, sp bl f mov w0, 2 ldp x29, x30, [sp], 16 ret The idea is that aarch64_save_callee_saves would treat the scs push as part of saving x30 (along with the normal store to the frame chain, when used). aarch64_restore_callee_saves would similarly treat the scs pop as the way of restoring x30 (instead of loading from the frame chain). This is in contrast to the current patch, where the scs push and pop are treated as fixed parts of the prologue and epilogue instead, and where aarch64_restore_callee_saves tries to avoid doing anything for x30. 
If shrink-wrapping decides to treat x30 as a separate “component”, as it does in the example above, then the scs push and pop would be emitted by aarch64_process_components instead. It would be more complex, but it would give better code. Thanks, Richard
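Here is a rough, hypothetical model of the component selection discussed in this exchange. The real `aarch64_get_separate_components` and `aarch64_components_for_bb` operate on dataflow IN/GEN/KILL sets. This sketch only mirrors the bitmap operations: it starts from the saved registers, removes the writeback pair and (as the current code does) LR, and then intersects a block's register uses with the remaining components. The names, the `regmask` type, and the rule that a call counts as a use of LR (because `bl` overwrites x30) are modelling assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INVALID_REGNUM 0xffffffffu
#define LR_REGNUM 30u

typedef uint32_t regmask;   /* Hypothetical: bit N means xN.  */

/* Start from the saved registers, then drop the ones that must stay in the
   fixed prologue/epilogue: the writeback pair and, unless LR_SEPARABLE,
   the link register.  */
static regmask
model_separate_components (regmask saved, unsigned wb1, unsigned wb2,
                           bool lr_separable)
{
  regmask components = saved;
  assert (wb1 == INVALID_REGNUM || wb1 < 32);
  assert (wb2 == INVALID_REGNUM || wb2 < 32);
  if (wb1 != INVALID_REGNUM)
    components &= ~((regmask) 1u << wb1);
  if (wb2 != INVALID_REGNUM)
    components &= ~((regmask) 1u << wb2);
  if (!lr_separable)
    components &= ~((regmask) 1u << LR_REGNUM);
  return components;
}

/* Components a block needs: the registers it uses, plus LR if it contains
   a call, restricted to the separable components.  */
static regmask
model_components_for_block (regmask used, bool has_call, regmask components)
{
  if (has_call)
    used |= (regmask) 1u << LR_REGNUM;
  return used & components;
}
```

With LR separable, the early-return block of `g` needs no components, and only the block that calls `f` needs the LR component. This is where the x30 save, and under this proposal the scs push and pop, would be placed.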
On 2/11/22 01:53, Richard Sandiford wrote: > Dan Li <ashimida@linux.alibaba.com> writes: >> On 2/10/22 01:55, Richard Sandiford wrote: >>>> >>> But treating scs push and scs pop as part of the register save and >>> restore sequences would have one advantage: it would allow the >>> scs push and scs pop to be shrink-wrapped. >>> >> >> Sorry for my limited knowledge of shrink warping, I don't think I get >> it here (I tried to find a case when compiling the kernel and some >> gcc test cases but I still don't have a clue.). >> >> I see that the bitmap of LR_REGNUM is cleared in >> aarch64_get_separate_components and scs push/pop are x18 based operations. >> >> If we handle them in aarch64_restore/save_callee_saves, >> could scs push/pop be shrink-wrapped in some cases? > > Yeah, I think so. E.g. for: > > void f(); > int g(int x) { > if (x == 0) > return 1; > f(); > return 2; > } > > shrink wrapping would allow the scs push and pop to move along with the > x30 save: > > g: > cbnz w0, .L9 > mov w0, 1 > ret > .L9: > stp x29, x30, [sp, -16]! > mov x29, sp > bl f > mov w0, 2 > ldp x29, x30, [sp], 16 > ret > Thanks Richard. (To make sure I understand correctly :)) I think it means that the current patch can already be shrink-wrapped, but X30 cannot be treated as a "component", so it currently generates code like: g: cbnz w0, .L9 mov w0, 1 ret .L9: str x30, [x18], 8 stp x29, x30, [sp, -16]! mov x29, sp bl f ldr x30, [x18, -8]! mov w0, 2 ldr x29, [sp], 16 ret > The idea is that aarch64_save_callee_saves would treat the scs push > as part of saving x30 (along with the normal store to the frame chain, > when used). aarch64_restore_callee_saves would similarly treat the scs > pop as the way of restoring x30 (instead of loading from the frame chain). > This is in contrast to the current patch, where the scs push and pop are > treated as fixed parts of the prologue and epilogue instead, and where > aarch64_restore_callee_saves tries to avoid doing anything for x30.
> > If shrink-wrapping decides to treat x30 as a separate “component”, as it > does in the example above, then the scs push and pop would be emitted > by aarch64_process_components instead. > > It would be more complex, but it would give better code. > Following your idea, I made a PoC to add x30 to the component bitmap: diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 35f6f64f5b2..fc9b5e7af54 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -8359,7 +8359,7 @@ aarch64_get_separate_components (void) if (reg1 != INVALID_REGNUM) bitmap_clear_bit (components, reg1); - bitmap_clear_bit (components, LR_REGNUM); bitmap_clear_bit (components, SP_REGNUM); return components; @@ -8396,7 +8396,7 @@ aarch64_components_for_bb (basic_block bb) /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets. */ for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++) if (!fixed_regs[regno] - && !crtl->abi->clobbers_full_reg_p (regno) + && (!crtl->abi->clobbers_full_reg_p (regno) || regno == R30_REGNUM) && (TEST_HARD_REG_BIT (extra_caller_saves, regno) || bitmap_bit_p (in, regno) || bitmap_bit_p (gen, regno) And with this test code compiled with -fno-omit-frame-pointer: void f(); int g(int x) { if (x == 0) { __asm__ ("":::"x19", "x20"); return 1; } f(); return 2; } it seems X30 is treated as a "component" (the aarch64.exp test results also look fine). g: stp x19, x20, [sp, -32]! cbnz w0, .L2 mov w0, 1 ldp x19, x20, [sp], 32 ret .L2: str x30, [sp, 16] bl f ldr x30, [sp, 16] mov w0, 2 ldp x19, x20, [sp], 32 ret And I think maybe we could handle this through three patches: 1. Keep the current patch (a V5) unchanged for scs. 2. Add shrink-wrapping for X30: logically this might be a separate topic, and I think more testing might be needed here (I'm a little worried that there might be other effects, since I only read this part of the code roughly yesterday). 3. Add scs push/pop to shrink-wrapping (and maybe do the same for the PAC code in the prologue/epilogue, since it also operates on X30). Thanks, Dan
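The `str x30, [x18], 8` / `ldr x30, [x18, -8]!` pair in the listing above is a post-increment push and a pre-decrement pop on a stack addressed by x18. A minimal C model of that discipline follows. `struct shadow_stack`, `scs_push`, `scs_pop` and `model_epilogue_lr` are hypothetical names, and a fixed-size array stands in for the memory that x18 points to.

```c
#include <assert.h>
#include <stdint.h>

#define SCS_DEPTH 64

/* Hypothetical model: SP plays the role of x18 and grows upwards.  */
struct shadow_stack
{
  uint64_t slots[SCS_DEPTH];
  uint64_t *sp;
};

static void
scs_init (struct shadow_stack *s)
{
  s->sp = s->slots;
}

/* Models "str x30, [x18], 8": store, then post-increment.  */
static void
scs_push (struct shadow_stack *s, uint64_t lr)
{
  assert (s->sp < s->slots + SCS_DEPTH);
  *s->sp++ = lr;
}

/* Models "ldr x30, [x18, -8]!": pre-decrement, then load.  */
static uint64_t
scs_pop (struct shadow_stack *s)
{
  assert (s->sp > s->slots);
  return *--s->sp;
}

/* Epilogue model: x30 comes from the shadow stack, so the copy saved in
   the regular frame is never reloaded, even if it has been overwritten.  */
static uint64_t
model_epilogue_lr (struct shadow_stack *s, uint64_t frame_copy)
{
  (void) frame_copy;
  return scs_pop (s);
}
```

The LIFO order is what lets nested calls work: each prologue pushes its own return address and each epilogue pops the most recent one.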
Dan Li <ashimida@linux.alibaba.com> writes: > On 2/11/22 01:53, Richard Sandiford wrote: >> Dan Li <ashimida@linux.alibaba.com> writes: >>> On 2/10/22 01:55, Richard Sandiford wrote: >>>>> >>>> But treating scs push and scs pop as part of the register save and >>>> restore sequences would have one advantage: it would allow the >>>> scs push and scs pop to be shrink-wrapped. >>>> >>> >>> Sorry for my limited knowledge of shrink warping, I don't think I get >>> it here (I tried to find a case when compiling the kernel and some >>> gcc test cases but I still don't have a clue.). >>> >>> I see that the bitmap of LR_REGNUM is cleared in >>> aarch64_get_separate_components and scs push/pop are x18 based operations. >>> >>> If we handle them in aarch64_restore/save_callee_saves, >>> could scs push/pop be shrink-wrapped in some cases? >> >> Yeah, I think so. E.g. for: >> >> void f(); >> int g(int x) { >> if (x == 0) >> return 1; >> f(); >> return 2; >> } >> >> shrink wrapping would allow the scs push and pop to move along with the >> x30 save: >> >> g: >> cbnz w0, .L9 >> mov w0, 1 >> ret >> .L9: >> stp x29, x30, [sp, -16]! >> mov x29, sp >> bl f >> mov w0, 2 >> ldp x29, x30, [sp], 16 >> ret >> > > Thanks Richard, (to make sure I understand correctly :)) I think > it means that the current patch could do a "shrink-wapping", but > the X30 could not be treat as a "component", now it could gen code > like: > > g: > cbnz w0, .L9 > mov w0, 1 > ret > .L9: > str x30, [x18], 8 > stp x29, x30, [sp, -16]! > mov x29, sp > bl f > ldr x30, [x18, -8]! > mov w0, 2 > ldr x29, [sp], 16 > ret Ah, right, sorry. I'd forgotten that this happened independently of the components stuff (and has to, since like you say, we don't treat LR_REGNUM as a separable component). >> The idea is that aarch64_save_callee_saves would treat the scs push >> as part of saving x30 (along with the normal store to the frame chain, >> when used). 
aarch64_restore_callee_saves would similarly treat the scs >> pop as the way of restoring x30 (instead of loading from the frame chain). >> This is in contrast to the current patch, where the scs push and pop are >> treated as fixed parts of the prologue and epilogue instead, and where >> aarch64_restore_callee_saves tries to avoid doing anything for x30. >> >> If shrink-wrapping decides to treat x30 as a separate “component”, as it >> does in the example above, then the scs push and pop would be emitted >> by aarch64_process_components instead. >> >> It would be more complex, but it would give better code. >> > > Following your idea, I made a poc to add x30 in component bitmap: > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 35f6f64f5b2..fc9b5e7af54 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -8359,7 +8359,7 @@ aarch64_get_separate_components (void) > if (reg1 != INVALID_REGNUM) > bitmap_clear_bit (components, reg1); > > - bitmap_clear_bit (components, LR_REGNUM); > bitmap_clear_bit (components, SP_REGNUM); > > return components; > @@ -8396,7 +8396,7 @@ aarch64_components_for_bb (basic_block bb) > /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets. */ > for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++) > if (!fixed_regs[regno] > - && !crtl->abi->clobbers_full_reg_p (regno) > + && (!crtl->abi->clobbers_full_reg_p (regno) || regno == R30_REGNUM) > && (TEST_HARD_REG_BIT (extra_caller_saves, regno) > || bitmap_bit_p (in, regno) > || bitmap_bit_p (gen, regno) > > And with a test code compiled with -fno-omit-frame-pointer: > > void f(); > int g(int x) { > if (x == 0) { > __asm__ ("":::"x19", "x20"); > return 1; > } > f(); > return 2; > } > > Then it seems X30 is treat as a "component" (the test > result of aarch64.exp also seems fine). > > g: > stp x19, x20, [sp, -32]! 
> cbnz w0, .L2 > mov w0, 1 > ldp x19, x20, [sp], 32 > ret > .L2: > str x30, [sp, 16] > bl f > ldr x30, [sp, 16] > mov w0, 2 > ldp x19, x20, [sp], 32 > ret > > And I think maybe we could handle this through three patches: > 1.Keep current patch (a V5) unchanged for scs. > 2.Add shrink-warpping for X30: > logically this might be a separate topic, and I think more testing > might be needed here (Well, I'm a little worried about if there might > be other effects, since I just read this part of the code roughly > yesterday). > 3.Add scs push/pop to shrink-wrapping (and maybe we can do the same for > the PAC code in pro/epilogue, since it's also the operation of the X30). Yeah, that's fair. (Like I said earlier, I wasn't asking for the shrink-wrapping change. It was just a note in passing. But as you point out, the individual shrink-wrapping support would be even more work than I'd imagined.) Thanks, Richard
On 2/11/22 07:35, Richard Sandiford wrote: > Dan Li <ashimida@linux.alibaba.com> writes: >> On 2/11/22 01:53, Richard Sandiford wrote: >>> Dan Li <ashimida@linux.alibaba.com> writes: >>>> On 2/10/22 01:55, Richard Sandiford wrote: >>>>>> >> And I think maybe we could handle this through three patches: >> 1.Keep current patch (a V5) unchanged for scs. >> 2.Add shrink-warpping for X30: >> logically this might be a separate topic, and I think more testing >> might be needed here (Well, I'm a little worried about if there might >> be other effects, since I just read this part of the code roughly >> yesterday). >> 3.Add scs push/pop to shrink-wrapping (and maybe we can do the same for >> the PAC code in pro/epilogue, since it's also the operation of the X30). > > Yeah, that's fair. > > (Like I said earlier, I wasn't asking for the shrink-wrapping change. > It was just a note in passing. But as you point out, the individual > shrink-wrapping support would be even more work than I'd imagined.) > Got it! Thanks, Dan
On 2/11/22 07:35, Richard Sandiford wrote: > Dan Li <ashimida@linux.alibaba.com> writes: >> On 2/11/22 01:53, Richard Sandiford wrote: >>> Dan Li <ashimida@linux.alibaba.com> writes: >>>> On 2/10/22 01:55, Richard Sandiford wrote: >>>>>> >> And I think maybe we could handle this through three patches: >> 1.Keep current patch (a V5) unchanged for scs. >> 2.Add shrink-warpping for X30: >> logically this might be a separate topic, and I think more testing >> might be needed here (Well, I'm a little worried about if there might >> be other effects, since I just read this part of the code roughly >> yesterday). >> 3.Add scs push/pop to shrink-wrapping (and maybe we can do the same for >> the PAC code in pro/epilogue, since it's also the operation of the X30). > > Yeah, that's fair. > > (Like I said earlier, I wasn't asking for the shrink-wrapping change. > It was just a note in passing. But as you point out, the individual > shrink-wrapping support would be even more work than I'd imagined.) Hi Richard, I have sent out v5[1], rebased onto mainline; please let me know if there is anything else I need to do :) [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590311.html Thanks, Dan
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 699c105a42a..f4d962917c4 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -79,6 +79,7 @@ #include "tree-ssa-loop-niter.h" #include "fractional-cost.h" #include "rtlanal.h" +#include "asan.h" /* This file should be included last. */ #include "target-def.h" @@ -7291,8 +7292,8 @@ aarch64_layout_frame (void) #define SLOT_NOT_REQUIRED (-2) #define SLOT_REQUIRED (-1) - frame.wb_candidate1 = INVALID_REGNUM; - frame.wb_candidate2 = INVALID_REGNUM; + frame.wb_push_candidate1 = INVALID_REGNUM; + frame.wb_push_candidate2 = INVALID_REGNUM; frame.spare_pred_reg = INVALID_REGNUM; /* First mark all the registers that really need to be saved... */ @@ -7407,9 +7408,9 @@ aarch64_layout_frame (void) { /* FP and LR are placed in the linkage record. */ frame.reg_offset[R29_REGNUM] = offset; - frame.wb_candidate1 = R29_REGNUM; + frame.wb_push_candidate1 = R29_REGNUM; frame.reg_offset[R30_REGNUM] = offset + UNITS_PER_WORD; - frame.wb_candidate2 = R30_REGNUM; + frame.wb_push_candidate2 = R30_REGNUM; offset += 2 * UNITS_PER_WORD; } @@ -7417,10 +7418,10 @@ aarch64_layout_frame (void) if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED)) { frame.reg_offset[regno] = offset; - if (frame.wb_candidate1 == INVALID_REGNUM) - frame.wb_candidate1 = regno; - else if (frame.wb_candidate2 == INVALID_REGNUM) - frame.wb_candidate2 = regno; + if (frame.wb_push_candidate1 == INVALID_REGNUM) + frame.wb_push_candidate1 = regno; + else if (frame.wb_push_candidate2 == INVALID_REGNUM) + frame.wb_push_candidate2 = regno; offset += UNITS_PER_WORD; } @@ -7443,11 +7444,11 @@ aarch64_layout_frame (void) } frame.reg_offset[regno] = offset; - if (frame.wb_candidate1 == INVALID_REGNUM) - frame.wb_candidate1 = regno; - else if (frame.wb_candidate2 == INVALID_REGNUM - && frame.wb_candidate1 >= V0_REGNUM) - frame.wb_candidate2 = regno; + if (frame.wb_push_candidate1 == INVALID_REGNUM) + frame.wb_push_candidate1 = 
regno; + else if (frame.wb_push_candidate2 == INVALID_REGNUM + && frame.wb_push_candidate1 >= V0_REGNUM) + frame.wb_push_candidate2 = regno; offset += vector_save_size; } @@ -7478,10 +7479,38 @@ aarch64_layout_frame (void) frame.sve_callee_adjust = 0; frame.callee_offset = 0; + frame.wb_pop_candidate1 = frame.wb_push_candidate1; + frame.wb_pop_candidate2 = frame.wb_push_candidate2; + + /* Shadow call stack only deals with functions where the LR is pushed + onto the stack and without specifying the "no_sanitize" attribute + with the argument "shadow-call-stack". */ + frame.is_scs_enabled + = (!crtl->calls_eh_return + && sanitize_flags_p (SANITIZE_SHADOW_CALL_STACK) + && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0)); + + /* When shadow call stack is enabled, the scs_pop in the epilogue will + restore x30, and we don't need to pop x30 again in the traditional + way. Pop candidates record the registers that need to be popped + eventually. */ + if (frame.is_scs_enabled) + { + if (frame.wb_push_candidate2 == R30_REGNUM) + frame.wb_pop_candidate2 = INVALID_REGNUM; + else if (frame.wb_push_candidate1 == R30_REGNUM) + frame.wb_pop_candidate1 = INVALID_REGNUM; + } + + /* If candidate2 is INVALID_REGNUM, we need to adjust max_push_offset to + 256 to ensure that the offset meets the requirements of emit_move_insn. + Similarly, if candidate1 is INVALID_REGNUM, we need to set + max_push_offset to 0, because no registers are popped at this time, + so callee_adjust cannot be adjusted. */ HOST_WIDE_INT max_push_offset = 0; - if (frame.wb_candidate2 != INVALID_REGNUM) + if (frame.wb_pop_candidate2 != INVALID_REGNUM) max_push_offset = 512; - else if (frame.wb_candidate1 != INVALID_REGNUM) + else if (frame.wb_pop_candidate1 != INVALID_REGNUM) max_push_offset = 256; HOST_WIDE_INT const_size, const_outgoing_args_size, const_fp_offset; @@ -7571,8 +7600,8 @@ aarch64_layout_frame (void) { /* We've decided not to associate any register saves with the initial stack allocation. 
*/ - frame.wb_candidate1 = INVALID_REGNUM; - frame.wb_candidate2 = INVALID_REGNUM; + frame.wb_pop_candidate1 = frame.wb_push_candidate1 = INVALID_REGNUM; + frame.wb_pop_candidate2 = frame.wb_push_candidate2 = INVALID_REGNUM; } frame.laid_out = true; @@ -7885,8 +7914,8 @@ aarch64_save_callee_saves (poly_int64 start_offset, bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno); if (skip_wb - && (regno == cfun->machine->frame.wb_candidate1 - || regno == cfun->machine->frame.wb_candidate2)) + && (regno == cfun->machine->frame.wb_push_candidate1 + || regno == cfun->machine->frame.wb_push_candidate2)) continue; if (cfun->machine->reg_is_wrapped_separately[regno]) @@ -7996,8 +8025,8 @@ aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start, rtx reg, mem; if (skip_wb - && (regno == cfun->machine->frame.wb_candidate1 - || regno == cfun->machine->frame.wb_candidate2)) + && (regno == cfun->machine->frame.wb_push_candidate1 + || regno == cfun->machine->frame.wb_push_candidate2)) continue; machine_mode mode = aarch64_reg_save_mode (regno); @@ -8168,8 +8197,8 @@ aarch64_get_separate_components (void) if (cfun->machine->frame.spare_pred_reg != INVALID_REGNUM) bitmap_clear_bit (components, cfun->machine->frame.spare_pred_reg); - unsigned reg1 = cfun->machine->frame.wb_candidate1; - unsigned reg2 = cfun->machine->frame.wb_candidate2; + unsigned reg1 = cfun->machine->frame.wb_push_candidate1; + unsigned reg2 = cfun->machine->frame.wb_push_candidate2; /* If registers have been chosen to be stored/restored with writeback don't interfere with them to avoid having to output explicit stack adjustment instructions. 
*/ @@ -8778,8 +8807,8 @@ aarch64_expand_prologue (void) poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust; poly_int64 below_hard_fp_saved_regs_size = cfun->machine->frame.below_hard_fp_saved_regs_size; - unsigned reg1 = cfun->machine->frame.wb_candidate1; - unsigned reg2 = cfun->machine->frame.wb_candidate2; + unsigned reg1 = cfun->machine->frame.wb_push_candidate1; + unsigned reg2 = cfun->machine->frame.wb_push_candidate2; bool emit_frame_chain = cfun->machine->frame.emit_frame_chain; rtx_insn *insn; @@ -8810,6 +8839,10 @@ aarch64_expand_prologue (void) RTX_FRAME_RELATED_P (insn) = 1; } + /* Push return address to shadow call stack. */ + if (cfun->machine->frame.is_scs_enabled) + emit_insn (gen_scs_push ()); + if (flag_stack_usage_info) current_function_static_stack_size = constant_lower_bound (frame_size); @@ -8956,8 +8989,8 @@ aarch64_expand_epilogue (bool for_sibcall) poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust; poly_int64 below_hard_fp_saved_regs_size = cfun->machine->frame.below_hard_fp_saved_regs_size; - unsigned reg1 = cfun->machine->frame.wb_candidate1; - unsigned reg2 = cfun->machine->frame.wb_candidate2; + unsigned reg1 = cfun->machine->frame.wb_pop_candidate1; + unsigned reg2 = cfun->machine->frame.wb_pop_candidate2; rtx cfi_ops = NULL; rtx_insn *insn; /* A stack clash protection prologue may not have left EP0_REGNUM or @@ -9027,9 +9060,19 @@ aarch64_expand_epilogue (bool for_sibcall) false, &cfi_ops); if (maybe_ne (sve_callee_adjust, 0)) aarch64_add_sp (NULL_RTX, NULL_RTX, sve_callee_adjust, true); - aarch64_restore_callee_saves (callee_offset - sve_callee_adjust, - R0_REGNUM, R30_REGNUM, - callee_adjust != 0, &cfi_ops); + + /* When shadow call stack is enabled, the scs_pop in the epilogue will + restore x30, so we don't need to restore it again in the traditional + way.
*/ + if (cfun->machine->frame.is_scs_enabled) + aarch64_restore_callee_saves (callee_offset - sve_callee_adjust, + R0_REGNUM, R29_REGNUM, + callee_adjust != 0, &cfi_ops); + else + aarch64_restore_callee_saves (callee_offset - sve_callee_adjust, + R0_REGNUM, R30_REGNUM, + callee_adjust != 0, &cfi_ops); + if (need_barrier_p) emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx)); @@ -9066,6 +9109,17 @@ aarch64_expand_epilogue (bool for_sibcall) RTX_FRAME_RELATED_P (insn) = 1; } + /* Pop return address from shadow call stack. */ + if (cfun->machine->frame.is_scs_enabled) + { + machine_mode mode = aarch64_reg_save_mode (R30_REGNUM); + rtx reg = gen_rtx_REG (mode, R30_REGNUM); + + insn = emit_insn (gen_scs_pop ()); + add_reg_note (insn, REG_CFA_RESTORE, reg); + RTX_FRAME_RELATED_P (insn) = 1; + } + /* We prefer to emit the combined return/authenticate instruction RETAA, however there are three cases in which we must instead emit an explicit authentication instruction. @@ -16492,6 +16546,10 @@ aarch64_override_options_internal (struct gcc_options *opts) aarch64_stack_protector_guard_offset = offs; } + if ((flag_sanitize & SANITIZE_SHADOW_CALL_STACK) + && !fixed_regs[R18_REGNUM]) + error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>"); + initialize_aarch64_code_model (opts); initialize_aarch64_tls_size (opts); @@ -26505,6 +26563,9 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_ASM_FUNCTION_EPILOGUE #define TARGET_ASM_FUNCTION_EPILOGUE aarch64_sls_emit_blr_function_thunks +#undef TARGET_HAVE_SHADOW_CALL_STACK +#define TARGET_HAVE_SHADOW_CALL_STACK true + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-aarch64.h" diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 2792bb29adb..b5efe083f30 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -906,9 +906,21 @@ struct GTY (()) aarch64_frame Indicated by CALLEE_ADJUST == 0 && EMIT_FRAME_CHAIN. 
These fields indicate which registers we've decided to handle using - (1) or (2), or INVALID_REGNUM if none. */ - unsigned wb_candidate1; - unsigned wb_candidate2; + (1) or (2), or INVALID_REGNUM if none. + + In some cases, not all of the registers in the push candidates + need to be popped; the pop candidates record which registers are + eventually popped. The initial value of a pop candidate is copied from its + corresponding push candidate. + + Currently, the pop candidates are only used for the shadow call stack. + When "-fsanitize=shadow-call-stack" is specified, we replace x30 in + the pop candidates with INVALID_REGNUM to ensure that x30 is not + popped twice. */ + unsigned wb_push_candidate1; + unsigned wb_push_candidate2; + unsigned wb_pop_candidate1; + unsigned wb_pop_candidate2; /* Big-endian SVE frames need a spare predicate register in order to save vector registers in the correct layout for unwinding. @@ -916,6 +928,9 @@ struct GTY (()) aarch64_frame unsigned spare_pred_reg; bool laid_out; + + /* True if shadow call stack should be enabled for the current function. */ + bool is_scs_enabled; }; typedef struct GTY (()) machine_function diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 1a39470a1fe..48666b4b218 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -6994,6 +6994,16 @@ (define_insn "xpaclri" "hint\t7 // xpaclri" ) +;; Save X30 in the X18-based POST_INC stack (consistent with clang). +(define_expand "scs_push" + [(set (mem:DI (post_inc:DI (reg:DI R18_REGNUM))) + (reg:DI R30_REGNUM))]) + +;; Load X30 from the X18-based PRE_DEC stack (consistent with clang). +(define_expand "scs_pop" + [(set (reg:DI R30_REGNUM) + (mem:DI (pre_dec:DI (reg:DI R18_REGNUM))))]) + ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and ;; all of memory. This blocks insns from being moved across this point.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 71992b8c597..1e580107fab 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -15224,6 +15224,36 @@ add @code{detect_invalid_pointer_pairs=2} to the environment variable @env{ASAN_OPTIONS}. Using @code{detect_invalid_pointer_pairs=1} detects invalid operation only when both pointers are non-null. +@item -fsanitize=shadow-call-stack +@opindex fsanitize=shadow-call-stack +Enable ShadowCallStack, a security enhancement mechanism used to protect +programs against return address overwrites (e.g.@: stack buffer overflows). +It works by saving a function's return address to a separately allocated +shadow call stack in the function prologue and restoring the return address +from the shadow call stack in the function epilogue. Instrumentation only +occurs in functions that need to save the return address to the stack. + +Currently it is only supported on AArch64. It is specifically +designed for Linux kernels that enable the @code{CONFIG_SHADOW_CALL_STACK} option. +For user-space programs, runtime support is not currently provided +in libc or libgcc. Users who want to use this feature in user space need +to provide their own runtime support. Note that doing so +may break ABI rules. + +On AArch64, the instrumentation makes use of the platform register @code{x18}. +This generally means that any code that may run on the same thread as code +compiled with ShadowCallStack must be compiled with the flag +@option{-ffixed-x18}; otherwise, functions compiled without +@option{-ffixed-x18} might clobber @code{x18} and so corrupt the shadow +stack pointer. + +Also, because there is no user-space runtime support, code compiled with +ShadowCallStack cannot use exception handling. Use @option{-fno-exceptions} +to turn off exceptions. + +See @uref{https://clang.llvm.org/docs/ShadowCallStack.html} for more +details.
+ @item -fsanitize=thread @opindex fsanitize=thread Enable ThreadSanitizer, a fast data race detector. diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 990152f5b15..19c130d7420 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -12575,3 +12575,8 @@ counters are incremented using atomic operations. Targets not supporting 64-bit atomic operations may override the default value and request a 32-bit type. @end deftypefn + +@deftypevr {Target Hook} bool TARGET_HAVE_SHADOW_CALL_STACK +This value is true if the target platform supports +@option{-fsanitize=shadow-call-stack}. The default value is false. +@end deftypevr diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 193c9bdd853..01db5f54b5a 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -8179,3 +8179,5 @@ maintainer is familiar with. @hook TARGET_MEMTAG_UNTAGGED_POINTER @hook TARGET_GCOV_TYPE_SIZE + +@hook TARGET_HAVE_SHADOW_CALL_STACK diff --git a/gcc/flag-types.h b/gcc/flag-types.h index a5a637160d7..c22ef35a289 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -321,6 +321,8 @@ enum sanitize_code { SANITIZE_HWADDRESS = 1UL << 28, SANITIZE_USER_HWADDRESS = 1UL << 29, SANITIZE_KERNEL_HWADDRESS = 1UL << 30, + /* Shadow Call Stack. 
*/ + SANITIZE_SHADOW_CALL_STACK = 1UL << 31, SANITIZE_SHIFT = SANITIZE_SHIFT_BASE | SANITIZE_SHIFT_EXPONENT, SANITIZE_UNDEFINED = SANITIZE_SHIFT | SANITIZE_DIVIDE | SANITIZE_UNREACHABLE | SANITIZE_VLA | SANITIZE_NULL | SANITIZE_RETURN diff --git a/gcc/opts.c b/gcc/opts.c index 4472cec1b98..b2e00e8067a 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -1994,6 +1994,7 @@ const struct sanitizer_opts_s sanitizer_opts[] = SANITIZER_OPT (vptr, SANITIZE_VPTR, true), SANITIZER_OPT (pointer-overflow, SANITIZE_POINTER_OVERFLOW, true), SANITIZER_OPT (builtin, SANITIZE_BUILTIN, true), + SANITIZER_OPT (shadow-call-stack, SANITIZE_SHADOW_CALL_STACK, false), SANITIZER_OPT (all, ~0U, true), #undef SANITIZER_OPT { NULL, 0U, 0UL, false } diff --git a/gcc/target.def b/gcc/target.def index 87feeec2ea1..ce382714399 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -7084,6 +7084,14 @@ counters are incremented using atomic operations. Targets not supporting\n\ type.", HOST_WIDE_INT, (void), default_gcov_type_size) +/* This value represents whether the shadow call stack is implemented on + the target platform. */ +DEFHOOKPOD +(have_shadow_call_stack, + "This value is true if the target platform supports\n\ +@option{-fsanitize=shadow-call-stack}. The default value is false.", + bool, false) + /* Close the 'struct gcc_target' definition. 
*/ HOOK_VECTOR_END (C90_EMPTY_HACK) diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c new file mode 100644 index 00000000000..ab68d6e8482 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c @@ -0,0 +1,6 @@ +/* { dg-do compile } */ +/* { dg-options "-fsanitize=shadow-call-stack -fno-exceptions" } */ + +int i; + +/* { dg-error "'-fsanitize=shadow-call-stack' requires '-ffixed-x18'" "" {target "aarch64*-*-*" } 0 } */ diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c new file mode 100644 index 00000000000..b5139a24559 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c @@ -0,0 +1,6 @@ +/* { dg-do compile } */ +/* { dg-options "-fsanitize=shadow-call-stack -ffixed-x18 -fexceptions" } */ + +int i; + +/* { dg-error "'-fsanitize=shadow-call-stack' requires '-fno-exceptions'" "" {target "aarch64*-*-*" } 0 } */ diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c new file mode 100644 index 00000000000..b88e490f3ae --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c @@ -0,0 +1,45 @@ +/* Test the shadow call stack. */ +/* scs_push: str x30, [x18], #8 */ +/* scs_pop: ldr x30, [x18, #-8]! */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fsanitize=shadow-call-stack -ffixed-x18 -fno-exceptions" } */ + +int foo (int); + +/* Function that does not use x30. */ +int func1 (void) +{ + return 0; +} + +/* Function that uses x30. */ +int func2 (void) +{ + /* scs push */ + asm volatile ("":::"x30"); + + return 0; + /* scs pop */ +} + +/* sibcall. */ +int func3 (int a, int b) +{ + /* scs push */ + asm volatile ("":::"x30"); + + return foo (a+b); + /* scs pop */ +} + +/* eh_return.
*/ +int func4 (long offset, void *handler) +{ + /* Do not emit scs push/pop */ + asm volatile ("":::"x30"); + + __builtin_eh_return (offset, handler); +} + +/* { dg-final { scan-assembler-times {str\tx30, \[x18\], #?8} 2 } } */ +/* { dg-final { scan-assembler-times {ldr\tx30, \[x18, #?-8\]!} 2 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c new file mode 100644 index 00000000000..f63169340e1 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c @@ -0,0 +1,20 @@ +/* Test disabling the shadow call stack. */ +/* scs_push: str x30, [x18], #8 */ +/* scs_pop: ldr x30, [x18, #-8]! */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-omit-frame-pointer -fsanitize=shadow-call-stack -ffixed-x18 -fno-exceptions" } */ + +int foo (int); + +/* Function with the shadow call stack disabled. */ +int __attribute__((no_sanitize("shadow-call-stack"))) func1 (void) +{ + asm volatile ("":::"x30"); + + return 0; +} + +/* { dg-final { scan-assembler-not {str\tx30, \[x18\], #?8} } } */ +/* { dg-final { scan-assembler-not {ldr\tx30, \[x18, #?-8\]!} } } */ +/* { dg-final { scan-assembler-times {stp\tx29, x30, \[sp, -[0-9]+\]!} 1 } } */ +/* { dg-final { scan-assembler-times {ldp\tx29, x30, \[sp\], [0-9]+} 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_5.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_5.c new file mode 100644 index 00000000000..d88357ca04d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_5.c @@ -0,0 +1,18 @@ +/* Verify: + * -fno-omit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18. + * without outgoing. + * total frame size <= 512 but > 256. + * callee-saved reg: x29, x30. + * optimized code should use "stp x29, x30, [sp]" to save frame chain. + * optimized code should use "ldr x29, [sp]" to restore x29 only.
*/ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-omit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18 --save-temps" } */ + +#include "test_frame_common.h" + +t_frame_pattern (func1, 400, ) + +/* { dg-final { scan-assembler-times {stp\tx29, x30, \[sp\]} 1 } } */ +/* { dg-final { scan-assembler {ldr\tx29, \[sp\]} } } */ + diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_6.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_6.c new file mode 100644 index 00000000000..83b74834c6a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_6.c @@ -0,0 +1,18 @@ +/* Verify: + * -fomit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18. + * without outgoing. + * total frame size <= 256. + * callee-saved reg: x30 only. + * optimized code should use "str x30, [sp]" to save x30 in prologue. + * optimized code should not restore x30 in epilogue. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fomit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18 --save-temps" } */ + +#include "test_frame_common.h" + +t_frame_pattern (func1, 200, ) + +/* { dg-final { scan-assembler-times {str\tx30, \[sp\]} 1 } } */ +/* { dg-final { scan-assembler-not {ld[r|p]\tx30, \[sp} } } */ + diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_7.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_7.c new file mode 100644 index 00000000000..5537fb3293a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_7.c @@ -0,0 +1,18 @@ +/* Verify: + * -fomit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18. + * without outgoing. + * total frame size <= 256. + * callee-saved reg: x19, x30. + * optimized code should use "stp x19, x30, [sp, -x]!" to save x19, x30 in prologue. + * optimized code should use "ldr x19, [sp], x" to restore x19 only. 
*/ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fomit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18 --save-temps" } */ + +#include "test_frame_common.h" + +t_frame_pattern (func1, 200, "x19") + +/* { dg-final { scan-assembler-times {stp\tx19, x30, \[sp, -[0-9]+\]!} 1 } } */ +/* { dg-final { scan-assembler {ldr\tx19, \[sp\], [0-9]+} } } */ + diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_8.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_8.c new file mode 100644 index 00000000000..b03f26f7bcf --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_8.c @@ -0,0 +1,24 @@ +/* Verify: + * -fomit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18. + * without outgoing. + * total frame size <= 512 but > 256. + * callee-saved reg: x19, x20, x30. + * optimized code should use "stp x19, x20, [sp, -x]!" to save x19, x20 in prologue. + * optimized code should use "str x30, [sp " to save x30 in prologue. + * optimized code should use "ldp x19, x20, [sp], x" to restore x19, x20 in epilogue. + * optimized code should not restore x30 in epilogue.
*/ + +/* { dg-do compile } */ +/* { dg-options "-O0 -fomit-frame-pointer -fsanitize=shadow-call-stack -fno-exceptions -ffixed-x18 --save-temps" } */ + +int func1 (void) +{ + unsigned char a[200]; + __asm__ ("":::"x19","x20","x30"); + return 0; +} + +/* { dg-final { scan-assembler-times {stp\tx19, x20, \[sp, -[0-9]+\]!} 1 } } */ +/* { dg-final { scan-assembler-times {str\tx30, \[sp} 1 } } */ +/* { dg-final { scan-assembler {ldp\tx19, x20, \[sp\], [0-9]+} } } */ +/* { dg-final { scan-assembler-not {ld[r|p]\tx30, \[sp} } } */ diff --git a/gcc/toplev.c b/gcc/toplev.c index e91f083f8ff..93d17ddbda1 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -1677,6 +1677,16 @@ process_options (bool no_backend) flag_sanitize &= ~SANITIZE_HWADDRESS; } + if (flag_sanitize & SANITIZE_SHADOW_CALL_STACK) + { + if (!targetm.have_shadow_call_stack) + sorry ("%<-fsanitize=shadow-call-stack%> not supported " + "in current platform"); + else if (flag_exceptions) + error_at (UNKNOWN_LOCATION, "%<-fsanitize=shadow-call-stack%> " + "requires %<-fno-exceptions%>"); + } + HOST_WIDE_INT patch_area_size, patch_area_start; parse_and_check_patch_area (flag_patchable_function_entry, false, &patch_area_size, &patch_area_start);
Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, compiler support alone is not enough:
runtime support is also required (as discussed in [2]). This patch
only adds basic compiler-side support for SCS and makes it easy for
users to enable it.

For the Linux kernel, only compiler support is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

Signed-off-by: Dan Li <ashimida@linux.alibaba.com>

gcc/ChangeLog:

	* config/aarch64/aarch64.c (SLOT_REQUIRED):
	Rename wb_candidate[12] to wb_push_candidate[12].
	(aarch64_layout_frame): Likewise, and
	change callee_adjust when scs is enabled.
	(aarch64_save_callee_saves):
	Rename wb_candidate[12] to wb_push_candidate[12].
	(aarch64_restore_callee_saves): Likewise.
	(aarch64_get_separate_components): Likewise.
	(aarch64_expand_prologue): Push x30 onto SCS before it's
	pushed onto stack.
	(aarch64_expand_epilogue): Pop x30 from SCS, while
	preventing it from being popped from the regular stack again.
	(aarch64_override_options_internal): Add SCS compile option check.
	(TARGET_HAVE_SHADOW_CALL_STACK): New hook.
	* config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled,
	wb_pop_candidate[12], and rename wb_candidate[12] to
	wb_push_candidate[12].
	* config/aarch64/aarch64.md (scs_push): New template.
	(scs_pop): Likewise.
	* doc/invoke.texi: Document -fsanitize=shadow-call-stack.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in: Add hook have_shadow_call_stack.
	* flag-types.h (enum sanitize_code):
	Add SANITIZE_SHADOW_CALL_STACK.
	* opts.c: Add shadow-call-stack.
	* target.def: New hook.
	* toplev.c (process_options): Add SCS compile option check.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/shadow_call_stack_1.c: New test.
	* gcc.target/aarch64/shadow_call_stack_2.c: New test.
	* gcc.target/aarch64/shadow_call_stack_3.c: New test.
	* gcc.target/aarch64/shadow_call_stack_4.c: New test.
	* gcc.target/aarch64/shadow_call_stack_5.c: New test.
	* gcc.target/aarch64/shadow_call_stack_6.c: New test.
	* gcc.target/aarch64/shadow_call_stack_7.c: New test.
	* gcc.target/aarch64/shadow_call_stack_8.c: New test.

---
V4:
- Added wb_[push|pop]_candidate[12] to ensure push/pop can emit different registers.

V3:
- Change scs_push/pop to standard move patterns.
- Optimize scs_pop to avoid popping x30 twice when shadow stack is enabled.

 gcc/config/aarch64/aarch64.c                 | 121 +++++++++++++-----
 gcc/config/aarch64/aarch64.h                 |  21 ++-
 gcc/config/aarch64/aarch64.md                |  10 ++
 gcc/doc/invoke.texi                          |  30 +++++
 gcc/doc/tm.texi                              |   5 +
 gcc/doc/tm.texi.in                           |   2 +
 gcc/flag-types.h                             |   2 +
 gcc/opts.c                                   |   1 +
 gcc/target.def                               |   8 ++
 .../gcc.target/aarch64/shadow_call_stack_1.c |   6 +
 .../gcc.target/aarch64/shadow_call_stack_2.c |   6 +
 .../gcc.target/aarch64/shadow_call_stack_3.c |  45 +++++++
 .../gcc.target/aarch64/shadow_call_stack_4.c |  20 +++
 .../gcc.target/aarch64/shadow_call_stack_5.c |  18 +++
 .../gcc.target/aarch64/shadow_call_stack_6.c |  18 +++
 .../gcc.target/aarch64/shadow_call_stack_7.c |  18 +++
 .../gcc.target/aarch64/shadow_call_stack_8.c |  24 ++++
 gcc/toplev.c                                 |  10 ++
 18 files changed, 332 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_8.c