Message ID | 20191023123118.386844979@linutronix.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | entry: Provide generic implementation for host and guest entry/exit work | expand |
On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > Interrupt state tracing can be safely done in C code. The few stack > operations in assembly do not need to be covered. > > Remove the now pointless indirection via .Lsyscall_32_done and jump to > swapgs_restore_regs_and_return_to_usermode directly. This doesn't look right. > #define SYSCALL_EXIT_WORK_FLAGS \ > @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc > { > struct thread_info *ti; > > + /* User to kernel transition disabled interrupts. */ > + trace_hardirqs_off(); > + So you just traced IRQs off, but... > enter_from_user_mode(); > local_irq_enable(); Now they're on and traced on again? I also don't see how your patch handles the fastpath case. How about the attached patch instead?
On Wed, Oct 23, 2019 at 2:30 PM Andy Lutomirski <luto@kernel.org> wrote: > > On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > Interrupt state tracing can be safely done in C code. The few stack > > operations in assembly do not need to be covered. > > > > Remove the now pointless indirection via .Lsyscall_32_done and jump to > > swapgs_restore_regs_and_return_to_usermode directly. > > This doesn't look right. > > > #define SYSCALL_EXIT_WORK_FLAGS \ > > @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc > > { > > struct thread_info *ti; > > > > + /* User to kernel transition disabled interrupts. */ > > + trace_hardirqs_off(); > > + > > So you just traced IRQs off, but... > > > enter_from_user_mode(); > > local_irq_enable(); > > Now they're on and traced on again? > > I also don't see how your patch handles the fastpath case. > > How about the attached patch instead? Ignore the attached patch. You have this in your do_exit_to_usermode() later in the series. But I'm still quite confused by this patch.
On Wed, 23 Oct 2019, Andy Lutomirski wrote: > On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > Interrupt state tracing can be safely done in C code. The few stack > > operations in assembly do not need to be covered. > > > > Remove the now pointless indirection via .Lsyscall_32_done and jump to > > swapgs_restore_regs_and_return_to_usermode directly. > > This doesn't look right. > > > #define SYSCALL_EXIT_WORK_FLAGS \ > > @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc > > { > > struct thread_info *ti; > > > > + /* User to kernel transition disabled interrupts. */ > > + trace_hardirqs_off(); > > + > > So you just traced IRQs off, but... > > > enter_from_user_mode(); > > local_irq_enable(); > > Now they're on and traced on again? Yes, because that's what actually happens. usermode syscall <- Disables interrupts, but tracing thinks they are on entry_SYSCALL_64 .... call do_syscall_64 trace_hardirqs_off() <- So before calling anything else, we have to tell the tracer that interrupts are on, which we did so far in the ASM code between entry_SYSCALL_64 and 'call do_syscall_64'. I'm merily lifting this to C-code. enter_from_user_mode() local_irq_enable() > I also don't see how your patch handles the fastpath case. Hmm? All syscalls return through: syscall_return_slowpath() local_irq_disable() prepare_exit_to_usermode() user_enter_irqoff() mds_user_clear_cpu_buffers() trace_hardirqs_on() What am I missing? > How about the attached patch instead? ^^^^^^ Groan. > > user_enter_irqoff(); > > + /* > + * The actual return to usermode will almost certainly turn IRQs on. > + * Trace it here to simplify the asm code. Why would we return to user from a syscall or interrupt with interrupts traced as disabled? Also the existing ASM is inconsistent vs. that: ENTRY(entry_SYSENTER_32) TRACE_IRQS_ON ENTRY(entry_INT80_32) TRACE_IRQS_IRET ENTRY(entry_SYSCALL_64) TRACE_IRQS_IRET ENTRY(ret_from_fork) TRACE_IRQS_ON GLOBAL(retint_user) TRACE_IRQS_IRETQ ENTRY(entry_SYSCALL_compat) TRACE_IRQS_ON ENTRY(entry_INT80_compat) TRACE_IRQS_ON > + */ > + if (likely(regs->flags & X86_EFLAGS_IF)) > + trace_hardirqs_on(); My variant does this unconditionally and after mds_user_clear_cpu_buffers(). > mds_user_clear_cpu_buffers(); > } And your ASM changes keep still all the TRACE_IRQS_OFF invocations in the various syscall entry pathes, which is what I removed and put as the first thing into the C functions. Confused. Thanks, tglx
On Wed, 23 Oct 2019, Andy Lutomirski wrote: > On Wed, Oct 23, 2019 at 2:30 PM Andy Lutomirski <luto@kernel.org> wrote: > > > > On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > > > Interrupt state tracing can be safely done in C code. The few stack > > > operations in assembly do not need to be covered. > > > > > > Remove the now pointless indirection via .Lsyscall_32_done and jump to > > > swapgs_restore_regs_and_return_to_usermode directly. > > > > This doesn't look right. > > > > > #define SYSCALL_EXIT_WORK_FLAGS \ > > > @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc > > > { > > > struct thread_info *ti; > > > > > > + /* User to kernel transition disabled interrupts. */ > > > + trace_hardirqs_off(); > > > + > > > > So you just traced IRQs off, but... > > > > > enter_from_user_mode(); > > > local_irq_enable(); > > > > Now they're on and traced on again? > > > > I also don't see how your patch handles the fastpath case. > > > > How about the attached patch instead? > > Ignore the attached patch. You have this in your > do_exit_to_usermode() later in the series. But I'm still quite > confused by this patch. What's confusing you? It basically does: ENTRY(syscall/int80) - TRACE_IRQS_OFF call C-syscall*() - TRACE_IRQS_ON/IRET and C-syscall*() + trace_hardirqs_off() <- first action .... prepare_exit_to_usermode() <- last action return and prepare_exit_to_usermode() .... + trace_hardirqs_on() <- last action return So this is exactly the same as the ASM today. The only change is that I made it do unconditionally trace_hardirqs_on() for consistency reasons. I tried to split it into bits and pieces, but failed to come up with something sensible. Let me try again tomorrow. Thanks, tglx
On Wed, Oct 23, 2019 at 2:30 PM Andy Lutomirski <luto@kernel.org> wrote: > > On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > Interrupt state tracing can be safely done in C code. The few stack > > operations in assembly do not need to be covered. > > > > Remove the now pointless indirection via .Lsyscall_32_done and jump to > > swapgs_restore_regs_and_return_to_usermode directly. > > This doesn't look right. Well, I feel a bit silly. I read this: > > > #define SYSCALL_EXIT_WORK_FLAGS \ > > @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ and I applied the diff in my head to the wrong function, and I didn't notice that it didn't really apply there. Oddly, gitweb gets this right: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=WIP.core/entry&id=e3158f93138ded84eb44fa97606197f6adcf9366 Looking at the actual code: Acked-by: Andy Lutomirski <luto@kernel.org> with one minor caveat: you are making a subtle and mostly irrelevant semantic change: with your patch, user mode will be traced as IRQs on even if a nasty user has used iopl() to turn off interrupts. This is probably a good thing, but I think you should mention it in the changelog. FWIW, the rest of the series looks pretty good, too.
On Thu, Oct 24, 2019 at 09:24:13AM -0700, Andy Lutomirski wrote: > On Wed, Oct 23, 2019 at 2:30 PM Andy Lutomirski <luto@kernel.org> wrote: > > > > On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > > > Interrupt state tracing can be safely done in C code. The few stack > > > operations in assembly do not need to be covered. > > > > > > Remove the now pointless indirection via .Lsyscall_32_done and jump to > > > swapgs_restore_regs_and_return_to_usermode directly. > > > > This doesn't look right. > > Well, I feel a bit silly. I read this: > > > > > > #define SYSCALL_EXIT_WORK_FLAGS \ > > > @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > and I applied the diff in my head to the wrong function, and I didn't > notice that it didn't really apply there. Oddly, gitweb gets this I had the same when reviewing these patches; I was almost going to ask tglx about it on IRC when the penny dropped.
On Thu, 24 Oct 2019, Peter Zijlstra wrote: > On Thu, Oct 24, 2019 at 09:24:13AM -0700, Andy Lutomirski wrote: > > On Wed, Oct 23, 2019 at 2:30 PM Andy Lutomirski <luto@kernel.org> wrote: > > > > > > On Wed, Oct 23, 2019 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > > > > > Interrupt state tracing can be safely done in C code. The few stack > > > > operations in assembly do not need to be covered. > > > > > > > > Remove the now pointless indirection via .Lsyscall_32_done and jump to > > > > swapgs_restore_regs_and_return_to_usermode directly. > > > > > > This doesn't look right. > > > > Well, I feel a bit silly. I read this: Happened to me before. Don't worry. > > > > > > > #define SYSCALL_EXIT_WORK_FLAGS \ > > > > @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > > and I applied the diff in my head to the wrong function, and I didn't > > notice that it didn't really apply there. Oddly, gitweb gets this > > I had the same when reviewing these patches; I was almost going to ask > tglx about it on IRC when the penny dropped. diff is weird at times. Thanks, tglx
--- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -218,6 +218,9 @@ static void exit_to_usermode_loop(struct user_enter_irqoff(); mds_user_clear_cpu_buffers(); + + /* The return to usermode reenables interrupts. Tell the tracer */ + trace_hardirqs_on(); } #define SYSCALL_EXIT_WORK_FLAGS \ @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc { struct thread_info *ti; + /* User to kernel transition disabled interrupts. */ + trace_hardirqs_off(); + enter_from_user_mode(); local_irq_enable(); ti = current_thread_info(); @@ -351,6 +357,7 @@ static __always_inline void do_syscall_3 /* Handles int $0x80 */ __visible void do_int80_syscall_32(struct pt_regs *regs) { + trace_hardirqs_off(); enter_from_user_mode(); local_irq_enable(); do_syscall_32_irqs_on(regs); @@ -367,6 +374,9 @@ static __always_inline void do_syscall_3 unsigned long landing_pad = (unsigned long)current->mm->context.vdso + vdso_image_32.sym_int80_landing_pad; + /* User to kernel transition disabled interrupts. */ + trace_hardirqs_off(); + /* * SYSENTER loses EIP, and even SYSCALL32 needs us to skip forward * so that 'regs->ip -= 2' lands back on an int $0x80 instruction. --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -924,12 +924,6 @@ ENTRY(entry_SYSENTER_32) jnz .Lsysenter_fix_flags .Lsysenter_flags_fixed: - /* - * User mode is traced as though IRQs are on, and SYSENTER - * turned them off. - */ - TRACE_IRQS_OFF - movl %esp, %eax call do_fast_syscall_32 /* XEN PV guests always use IRET path */ @@ -939,8 +933,6 @@ ENTRY(entry_SYSENTER_32) STACKLEAK_ERASE /* Opportunistic SYSEXIT */ - TRACE_IRQS_ON /* User mode traces as IRQs on. */ - /* * Setup entry stack - we keep the pointer in %eax and do the * switch after almost all user-state is restored. @@ -1039,12 +1031,6 @@ ENTRY(entry_INT80_32) SAVE_ALL pt_regs_ax=$-ENOSYS switch_stacks=1 /* save rest */ - /* - * User mode is traced as though IRQs are on, and the interrupt gate - * turned them off. - */ - TRACE_IRQS_OFF - movl %esp, %eax call do_int80_syscall_32 .Lsyscall_32_done: @@ -1052,11 +1038,8 @@ ENTRY(entry_INT80_32) STACKLEAK_ERASE restore_all: - TRACE_IRQS_IRET SWITCH_TO_ENTRY_STACK -.Lrestore_all_notrace: CHECK_AND_APPLY_ESPFIX -.Lrestore_nocheck: /* Switch back to user CR3 */ SWITCH_TO_USER_CR3 scratch_reg=%eax --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -167,15 +167,11 @@ GLOBAL(entry_SYSCALL_64_after_hwframe) PUSH_AND_CLEAR_REGS rax=$-ENOSYS - TRACE_IRQS_OFF - /* IRQs are off. */ movq %rax, %rdi movq %rsp, %rsi call do_syscall_64 /* returns with IRQs disabled */ - TRACE_IRQS_IRETQ /* we're about to change IF */ - /* * Try to use SYSRET instead of IRET if we're returning to * a completely clean 64-bit userspace context. If we're not, @@ -342,7 +338,6 @@ ENTRY(ret_from_fork) UNWIND_HINT_REGS movq %rsp, %rdi call syscall_return_slowpath /* returns with IRQs disabled */ - TRACE_IRQS_ON /* user mode is traced as IRQS on */ jmp swapgs_restore_regs_and_return_to_usermode 1: @@ -606,7 +601,6 @@ END(common_spurious) GLOBAL(retint_user) mov %rsp,%rdi call prepare_exit_to_usermode - TRACE_IRQS_IRETQ GLOBAL(swapgs_restore_regs_and_return_to_usermode) #ifdef CONFIG_DEBUG_ENTRY --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -129,17 +129,11 @@ ENTRY(entry_SYSENTER_compat) jnz .Lsysenter_fix_flags .Lsysenter_flags_fixed: - /* - * User mode is traced as though IRQs are on, and SYSENTER - * turned them off. - */ - TRACE_IRQS_OFF - movq %rsp, %rdi call do_fast_syscall_32 /* XEN PV guests always use IRET path */ - ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \ - "jmp .Lsyscall_32_done", X86_FEATURE_XENPV + ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \ + "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV jmp sysret32_from_system_call .Lsysenter_fix_flags: @@ -247,17 +241,11 @@ GLOBAL(entry_SYSCALL_compat_after_hwfram pushq $0 /* pt_regs->r15 = 0 */ xorl %r15d, %r15d /* nospec r15 */ - /* - * User mode is traced as though IRQs are on, and SYSENTER - * turned them off. - */ - TRACE_IRQS_OFF - movq %rsp, %rdi call do_fast_syscall_32 /* XEN PV guests always use IRET path */ - ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \ - "jmp .Lsyscall_32_done", X86_FEATURE_XENPV + ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \ + "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV /* Opportunistic SYSRET */ sysret32_from_system_call: @@ -266,7 +254,6 @@ GLOBAL(entry_SYSCALL_compat_after_hwfram * stack. So let's erase the thread stack right now. */ STACKLEAK_ERASE - TRACE_IRQS_ON /* User mode traces as IRQs on. */ movq RBX(%rsp), %rbx /* pt_regs->rbx */ movq RBP(%rsp), %rbp /* pt_regs->rbp */ movq EFLAGS(%rsp), %r11 /* pt_regs->flags (in r11) */ @@ -403,17 +390,8 @@ ENTRY(entry_INT80_compat) xorl %r15d, %r15d /* nospec r15 */ cld - /* - * User mode is traced as though IRQs are on, and the interrupt - * gate turned them off. - */ - TRACE_IRQS_OFF - movq %rsp, %rdi call do_int80_syscall_32 -.Lsyscall_32_done: - /* Go back to user mode. */ - TRACE_IRQS_ON jmp swapgs_restore_regs_and_return_to_usermode END(entry_INT80_compat)
Interrupt state tracing can be safely done in C code. The few stack operations in assembly do not need to be covered. Remove the now pointless indirection via .Lsyscall_32_done and jump to swapgs_restore_regs_and_return_to_usermode directly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- arch/x86/entry/common.c | 10 ++++++++++ arch/x86/entry/entry_32.S | 17 ----------------- arch/x86/entry/entry_64.S | 6 ------ arch/x86/entry/entry_64_compat.S | 30 ++++-------------------------- 4 files changed, 14 insertions(+), 49 deletions(-)