Message ID | 8798a3cc-6e64-6966-d5ad-fadca79d92ba@synaptics.com |
---|---|
State | New, archived |
Series | arm64: Add read_mostly declaration/definition to irq stack ptr |
On Mon, Feb 07, 2022 at 04:46:42PM +0800, Jisheng Zhang wrote:
> Add "read-mostly" qualifier to irq_stack_ptr and
> irq_shadow_call_stack_ptr. This is to prevent the false sharing.
>
> Before the patch, I got below percpu layout with one defconfig:
> ffffffc008723050 <mde_ref_count>:
> ffffffc008723050: 00 00 00 00
> ....
>
> ffffffc008723054 <kde_ref_count>:
> ffffffc008723054: 00 00 00 00
> ....
>
> ffffffc008723058 <irq_stack_ptr>:
> ...
>
> ffffffc008723060 <nmi_contexts>:
> ...
>
> ffffffc008723070 <fpsimd_last_state>:
>
> As can be seen, the irq_stack_ptr sits with the heavy read/write percpu
> vars such as fpsimd_last_state etc. at the same cacheline.
>
> After the patch:
>
> ffffffc008723000 <irq_stack_ptr>:
> ...
>
> ffffffc008723008 <cpu_number>:
> ...
>
> ffffffc008723010 <arm64_ssbd_callback_required>:
> ...
>
> ffffffc008723018 <bp_hardening_data>:
> ...
>
> Now, the irq_stack_ptr sits with read mostly percpu vars such as
> cpu_number etc. at the same cacheline.

Were you able to measure any performance difference after this change?

Will
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index e77cdef9ca29..75c142bfdffe 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -66,7 +66,7 @@ struct stackframe {
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 			   const char *loglvl);
 
-DECLARE_PER_CPU(unsigned long *, irq_stack_ptr);
+DECLARE_PER_CPU_READ_MOSTLY(unsigned long *, irq_stack_ptr);
 
 static inline bool on_stack(unsigned long sp, unsigned long size,
 			    unsigned long low, unsigned long high,
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index bda49430c9ea..d2e75e9bb826 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -26,13 +26,12 @@
 /* Only access this in an NMI enter/exit */
 DEFINE_PER_CPU(struct nmi_ctx, nmi_contexts);
 
-DEFINE_PER_CPU(unsigned long *, irq_stack_ptr);
+DEFINE_PER_CPU_READ_MOSTLY(unsigned long *, irq_stack_ptr);
 
-
-DECLARE_PER_CPU(unsigned long *, irq_shadow_call_stack_ptr);
+DECLARE_PER_CPU_READ_MOSTLY(unsigned long *, irq_shadow_call_stack_ptr);
 
 #ifdef CONFIG_SHADOW_CALL_STACK
-DEFINE_PER_CPU(unsigned long *, irq_shadow_call_stack_ptr);
+DEFINE_PER_CPU_READ_MOSTLY(unsigned long *, irq_shadow_call_stack_ptr);
 #endif
 
 static void init_irq_scs(void)
Add "read-mostly" qualifier to irq_stack_ptr and
irq_shadow_call_stack_ptr. This is to prevent the false sharing.

Before the patch, I got below percpu layout with one defconfig:
ffffffc008723050 <mde_ref_count>:
ffffffc008723050: 00 00 00 00
....

ffffffc008723054 <kde_ref_count>:
ffffffc008723054: 00 00 00 00
....

ffffffc008723058 <irq_stack_ptr>:
...

ffffffc008723060 <nmi_contexts>:
...

ffffffc008723070 <fpsimd_last_state>:

As can be seen, the irq_stack_ptr sits with the heavy read/write percpu
vars such as fpsimd_last_state etc. at the same cacheline.

After the patch:

ffffffc008723000 <irq_stack_ptr>:
...

ffffffc008723008 <cpu_number>:
...

ffffffc008723010 <arm64_ssbd_callback_required>:
...

ffffffc008723018 <bp_hardening_data>:
...

Now, the irq_stack_ptr sits with read mostly percpu vars such as
cpu_number etc. at the same cacheline.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
---
 arch/arm64/include/asm/stacktrace.h | 2 +-
 arch/arm64/kernel/irq.c             | 7 +++----
 2 files changed, 4 insertions(+), 5 deletions(-)