From patchwork Wed Oct 23 12:27:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206563 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CAA1C112B for ; Wed, 23 Oct 2019 12:31:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B1C5821928 for ; Wed, 23 Oct 2019 12:31:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391357AbfJWMbk (ORCPT ); Wed, 23 Oct 2019 08:31:40 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49086 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2389662AbfJWMbj (ORCPT ); Wed, 23 Oct 2019 08:31:39 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFnb-00016u-4Z; Wed, 23 Oct 2019 14:31:35 +0200 Message-Id: <20191023123117.686514045@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:06 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 01/17] x86/entry/32: Remove unused resume_userspace label References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The C reimplementation of SYSENTER left that unused ENTRY() label around. Remove it. Fixes: 5f310f739b4c ("x86/entry/32: Re-implement SYSENTER using the new C path") Originally-by: Peter Zijlstra Signed-off-by: Thomas Gleixner Reviewed-by: Sean Christopherson Reviewed-by: Alexandre Chartre --- arch/x86/entry/entry_32.S | 1 - 1 file changed, 1 deletion(-) --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -825,7 +825,6 @@ END(ret_from_fork) cmpl $USER_RPL, %eax jb restore_all_kernel # not returning to v8086 or userspace -ENTRY(resume_userspace) DISABLE_INTERRUPTS(CLBR_ANY) TRACE_IRQS_OFF movl %esp, %eax From patchwork Wed Oct 23 12:27:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206565 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F2354913 for ; Wed, 23 Oct 2019 12:31:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DA16C222C1 for ; Wed, 23 Oct 2019 12:31:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391365AbfJWMbk (ORCPT ); Wed, 23 Oct 2019 08:31:40 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49083 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1733282AbfJWMbj (ORCPT ); Wed, 23 Oct 2019 08:31:39 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFnb-00016z-Hd; Wed, 23 Oct 2019 14:31:35 +0200 Message-Id: <20191023123117.779277679@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:07 +0200 From: Thomas Gleixner To: LKML Cc: 
x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 02/17] x86/entry/64: Remove pointless jump in paranoid_exit References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Jump directly to restore_regs_and_return_to_kernel instead of making a pointless extra jump through .Lparanoid_exit_restore Signed-off-by: Thomas Gleixner Reviewed-by: Sean Christopherson Reviewed-by: Alexandre Chartre --- arch/x86/entry/entry_64.S | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1272,12 +1272,11 @@ ENTRY(paranoid_exit) /* Always restore stashed CR3 value (see paranoid_entry) */ RESTORE_CR3 scratch_reg=%rbx save_reg=%r14 SWAPGS_UNSAFE_STACK - jmp .Lparanoid_exit_restore + jmp restore_regs_and_return_to_kernel .Lparanoid_exit_no_swapgs: TRACE_IRQS_IRETQ_DEBUG /* Always restore stashed CR3 value (see paranoid_entry) */ RESTORE_CR3 scratch_reg=%rbx save_reg=%r14 -.Lparanoid_exit_restore: jmp restore_regs_and_return_to_kernel END(paranoid_exit) From patchwork Wed Oct 23 12:27:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206611 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 16BAD13BD for ; Wed, 23 Oct 2019 12:33:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id F377521928 for ; Wed, 23 Oct 2019 12:33:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391350AbfJWMbj (ORCPT ); Wed, 23 Oct 2019 08:31:39 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49084 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732144AbfJWMbj (ORCPT ); Wed, 23 Oct 2019 08:31:39 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFnb-000174-UN; Wed, 23 Oct 2019 14:31:35 +0200 Message-Id: <20191023123117.871608831@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:08 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 03/17] x86/traps: Remove pointless irq enable from do_spurious_interrupt_bug() References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org That function returns immediately after conditionally reenabling interrupts which is more than pointless and requires the ASM code to disable interrupts again. 
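For context: cond_local_irq_enable() re-enables interrupts only if the trapped context had them enabled. A minimal sketch of the helper pair, paraphrased from arch/x86/kernel/traps.c (not a verbatim copy):

static void cond_local_irq_enable(struct pt_regs *regs)
{
        /* Enable interrupts only if the trapped context had IF set */
        if (regs->flags & X86_EFLAGS_IF)
                local_irq_enable();
}

static void cond_local_irq_disable(struct pt_regs *regs)
{
        /* Counterpart: go back to interrupts-off before returning */
        if (regs->flags & X86_EFLAGS_IF)
                local_irq_disable();
}

With an empty handler body there is nothing between the enable and the return, so the enable buys nothing and only forces the ASM return path to disable interrupts again.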
Signed-off-by: Thomas Gleixner Reviewed-by: Sean Christopherson Reviewed-by: Alexandre Chartre --- arch/x86/kernel/traps.c | 1 - 1 file changed, 1 deletion(-) --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -871,7 +871,6 @@ do_simd_coprocessor_error(struct pt_regs dotraplinkage void do_spurious_interrupt_bug(struct pt_regs *regs, long error_code) { - cond_local_irq_enable(regs); } dotraplinkage void From patchwork Wed Oct 23 12:27:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206607 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 25136112B for ; Wed, 23 Oct 2019 12:33:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0E656214B2 for ; Wed, 23 Oct 2019 12:33:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391505AbfJWMd0 (ORCPT ); Wed, 23 Oct 2019 08:33:26 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49091 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2389735AbfJWMbk (ORCPT ); Wed, 23 Oct 2019 08:31:40 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFnc-000179-B8; Wed, 23 Oct 2019 14:31:36 +0200 Message-Id: <20191023123117.976831752@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:09 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 04/17] x86/entry: Make DEBUG_ENTRY_ASSERT_IRQS_OFF available for 32bit References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Move the interrupt state verification debug macro to common code and fixup the irqflags and paravirt components so it can be used in 32bit code later. Signed-off-by: Thomas Gleixner Reviewed-by: Sean Christopherson Reviewed-by: Alexandre Chartre --- arch/x86/entry/calling.h | 12 ++++++++++++ arch/x86/entry/entry_64.S | 12 ------------ arch/x86/include/asm/irqflags.h | 8 ++++++-- arch/x86/include/asm/paravirt.h | 9 +++++---- 4 files changed, 23 insertions(+), 18 deletions(-) --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -366,3 +366,15 @@ For 32-bit we have the following convent #else #define GET_CR2_INTO(reg) _ASM_MOV %cr2, reg #endif + +.macro DEBUG_ENTRY_ASSERT_IRQS_OFF +#ifdef CONFIG_DEBUG_ENTRY + push %_ASM_AX + SAVE_FLAGS(CLBR_EAX) + test $X86_EFLAGS_IF, %_ASM_AX + jz .Lokay_\@ + ud2 +.Lokay_\@: + pop %_ASM_AX +#endif +.endm --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -387,18 +387,6 @@ ENTRY(spurious_entries_start) .endr END(spurious_entries_start) -.macro DEBUG_ENTRY_ASSERT_IRQS_OFF -#ifdef CONFIG_DEBUG_ENTRY - pushq %rax - SAVE_FLAGS(CLBR_RAX) - testl $X86_EFLAGS_IF, %eax - jz .Lokay_\@ - ud2 -.Lokay_\@: - popq %rax -#endif -.endm - /* * Enters the IRQ stack if we're not already using it. NMI-safe. Clobbers * flags and puts old RSP into old_rsp, and leaves all other GPRs alone. 
--- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -126,11 +126,15 @@ static inline notrace unsigned long arch #define ENABLE_INTERRUPTS(x) sti #define DISABLE_INTERRUPTS(x) cli -#ifdef CONFIG_X86_64 #ifdef CONFIG_DEBUG_ENTRY -#define SAVE_FLAGS(x) pushfq; popq %rax +# ifdef CONFIG_X86_64 +# define SAVE_FLAGS(x) pushfq; popq %rax +# else +# define SAVE_FLAGS(x) pushfl; popl %eax +# endif #endif +#ifdef CONFIG_X86_64 #define SWAPGS swapgs /* * Currently paravirt can't handle swapgs nicely when we --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -904,6 +904,11 @@ extern void default_banner(void); ANNOTATE_RETPOLINE_SAFE; \ jmp PARA_INDIRECT(pv_ops+PV_CPU_usergs_sysret64);) +#endif /* CONFIG_PARAVIRT_XXL */ +#endif /* CONFIG_X86_64 */ + +#ifdef CONFIG_PARAVIRT_XXL + #ifdef CONFIG_DEBUG_ENTRY #define SAVE_FLAGS(clobbers) \ PARA_SITE(PARA_PATCH(PV_IRQ_save_fl), \ @@ -912,10 +917,6 @@ extern void default_banner(void); call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl); \ PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);) #endif -#endif /* CONFIG_PARAVIRT_XXL */ -#endif /* CONFIG_X86_64 */ - -#ifdef CONFIG_PARAVIRT_XXL #define GET_CR2_INTO_AX \ PARA_SITE(PARA_PATCH(PV_MMU_read_cr2), \ From patchwork Wed Oct 23 12:27:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206599 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 53E80913 for ; Wed, 23 Oct 2019 12:33:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3CB0A21920 for ; Wed, 23 Oct 2019 12:33:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405358AbfJWMdG (ORCPT ); Wed, 23 Oct 2019 08:33:06 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49096 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2389772AbfJWMbl (ORCPT ); Wed, 23 Oct 2019 08:31:41 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFnc-00017I-Ug; Wed, 23 Oct 2019 14:31:37 +0200 Message-Id: <20191023123118.084086112@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:10 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 05/17] x86/traps: Make interrupt enable/disable symmetric in C code References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Traps enable interrupts conditionally but rely on the ASM return code to disable them again. That results in redundant interrupt disable and trace calls. Make the trap handlers disable interrupts before returning to avoid that, which allows simplification of the ASM entry code. 
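The resulting pattern, shown with a hypothetical handler (do_example_trap and its body are illustrative, not taken from the patch):

dotraplinkage void do_example_trap(struct pt_regs *regs, long error_code)
{
        cond_local_irq_enable(regs);    /* enable iff the trapped context had IF set */

        /* ... actual trap handling, signal delivery, fixups ... */

        cond_local_irq_disable(regs);   /* restore the entry state on every return path */
}

Each conditional enable is now paired with a conditional disable on every return path, which is what allows the follow-up patches to drop the unconditional DISABLE_INTERRUPTS/TRACE_IRQS_OFF sequence from the ASM code.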
Originally-by: Peter Zijlstra Signed-off-by: Thomas Gleixner Reviewed-by: Sean Christopherson Reviewed-by: Alexandre Chartre --- arch/x86/kernel/traps.c | 32 +++++++++++++++++++++----------- arch/x86/mm/fault.c | 7 +++++-- 2 files changed, 26 insertions(+), 13 deletions(-) --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -276,6 +276,7 @@ static void do_error_trap(struct pt_regs NOTIFY_STOP) { cond_local_irq_enable(regs); do_trap(trapnr, signr, str, regs, error_code, sicode, addr); + cond_local_irq_disable(regs); } } @@ -501,6 +502,7 @@ dotraplinkage void do_bounds(struct pt_r die("bounds", regs, error_code); } + cond_local_irq_disable(regs); return; exit_trap: @@ -512,6 +514,7 @@ dotraplinkage void do_bounds(struct pt_r * time.. */ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, 0, NULL); + cond_local_irq_disable(regs); } dotraplinkage void @@ -525,19 +528,19 @@ do_general_protection(struct pt_regs *re if (static_cpu_has(X86_FEATURE_UMIP)) { if (user_mode(regs) && fixup_umip_exception(regs)) - return; + goto exit_trap; } if (v8086_mode(regs)) { local_irq_enable(); handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code); - return; + goto exit_trap; } tsk = current; if (!user_mode(regs)) { if (fixup_exception(regs, X86_TRAP_GP, error_code, 0)) - return; + goto exit_trap; tsk->thread.error_code = error_code; tsk->thread.trap_nr = X86_TRAP_GP; @@ -549,12 +552,12 @@ do_general_protection(struct pt_regs *re */ if (!preemptible() && kprobe_running() && kprobe_fault_handler(regs, X86_TRAP_GP)) - return; + goto exit_trap; if (notify_die(DIE_GPF, desc, regs, error_code, X86_TRAP_GP, SIGSEGV) != NOTIFY_STOP) die(desc, regs, error_code); - return; + goto exit_trap; } tsk->thread.error_code = error_code; @@ -563,6 +566,8 @@ do_general_protection(struct pt_regs *re show_signal(tsk, SIGSEGV, "", desc, regs, error_code); force_sig(SIGSEGV); +exit_trap: + cond_local_irq_disable(regs); } NOKPROBE_SYMBOL(do_general_protection); @@ -783,9 +788,7 @@ dotraplinkage void do_debug(struct pt_re if (v8086_mode(regs)) { handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, X86_TRAP_DB); - cond_local_irq_disable(regs); - debug_stack_usage_dec(); - goto exit; + goto exit_irq; } if (WARN_ON_ONCE((dr6 & DR_STEP) && !user_mode(regs))) { @@ -802,6 +805,8 @@ dotraplinkage void do_debug(struct pt_re si_code = get_si_code(tsk->thread.debugreg6); if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS) || user_icebp) send_sigtrap(regs, error_code, si_code); + +exit_irq: cond_local_irq_disable(regs); debug_stack_usage_dec(); @@ -827,7 +832,7 @@ static void math_error(struct pt_regs *r if (!user_mode(regs)) { if (fixup_exception(regs, trapnr, error_code, 0)) - return; + goto exit_trap; task->thread.error_code = error_code; task->thread.trap_nr = trapnr; @@ -835,7 +840,7 @@ static void math_error(struct pt_regs *r if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, SIGFPE) != NOTIFY_STOP) die(str, regs, error_code); - return; + goto exit_trap; } /* @@ -849,10 +854,12 @@ static void math_error(struct pt_regs *r si_code = fpu__exception_code(fpu, trapnr); /* Retry when we get spurious exceptions: */ if (!si_code) - return; + goto exit_trap; force_sig_fault(SIGFPE, si_code, (void __user *)uprobe_get_trap_addr(regs)); +exit_trap: + cond_local_irq_disable(regs); } dotraplinkage void do_coprocessor_error(struct pt_regs *regs, long error_code) @@ -888,6 +895,8 @@ do_device_not_available(struct pt_regs * info.regs = regs; math_emulate(&info); + + cond_local_irq_disable(regs); return; } #endif @@ 
-918,6 +927,7 @@ dotraplinkage void do_iret_error(struct do_trap(X86_TRAP_IRET, SIGILL, "iret exception", regs, error_code, ILL_BADSTK, (void __user *)NULL); } + local_irq_disable(); } #endif --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1500,10 +1500,13 @@ static noinline void return; /* Was the fault on kernel-controlled part of the address space? */ - if (unlikely(fault_in_kernel_space(address))) + if (unlikely(fault_in_kernel_space(address))) { do_kern_addr_fault(regs, hw_error_code, address); - else + } else { do_user_addr_fault(regs, hw_error_code, address); + if (regs->flags & X86_EFLAGS_IF) + local_irq_disable(); + } } NOKPROBE_SYMBOL(__do_page_fault); From patchwork Wed Oct 23 12:27:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206603 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3E681913 for ; Wed, 23 Oct 2019 12:33:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 26537214B2 for ; Wed, 23 Oct 2019 12:33:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391401AbfJWMbl (ORCPT ); Wed, 23 Oct 2019 08:31:41 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49100 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2389776AbfJWMbl (ORCPT ); Wed, 23 Oct 2019 08:31:41 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFnd-00017N-B9; Wed, 23 Oct 2019 14:31:37 +0200 Message-Id: <20191023123118.191230255@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:11 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 06/17] x86/entry/32: Remove redundant interrupt disable References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Now that the trap handlers return with interrupts disabled, the unconditional disabling of interrupts in the low level entry code can be removed along with the trace calls and the misnomed preempt_stop macro. As a consequence ret_from_exception and ret_from_intr collapse. Add a debug check to verify that interrupts are disabled depending on CONFIG_DEBUG_ENTRY. Signed-off-by: Thomas Gleixner Reviewed-by: Sean Christopherson Reviewed-by: Alexandre Chartre --- arch/x86/entry/entry_32.S | 21 ++++++--------------- 1 file changed, 6 insertions(+), 15 deletions(-) --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -63,12 +63,6 @@ * enough to patch inline, increasing performance. */ -#ifdef CONFIG_PREEMPTION -# define preempt_stop(clobbers) DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF -#else -# define preempt_stop(clobbers) -#endif - .macro TRACE_IRQS_IRET #ifdef CONFIG_TRACE_IRQFLAGS testl $X86_EFLAGS_IF, PT_EFLAGS(%esp) # interrupts off? 
@@ -809,8 +803,7 @@ END(ret_from_fork) # userspace resumption stub bypassing syscall exit tracing ALIGN ret_from_exception: - preempt_stop(CLBR_ANY) -ret_from_intr: + DEBUG_ENTRY_ASSERT_IRQS_OFF #ifdef CONFIG_VM86 movl PT_EFLAGS(%esp), %eax # mix EFLAGS and CS movb PT_CS(%esp), %al @@ -825,8 +818,6 @@ END(ret_from_fork) cmpl $USER_RPL, %eax jb restore_all_kernel # not returning to v8086 or userspace - DISABLE_INTERRUPTS(CLBR_ANY) - TRACE_IRQS_OFF movl %esp, %eax call prepare_exit_to_usermode jmp restore_all @@ -1084,7 +1075,7 @@ ENTRY(entry_INT80_32) restore_all_kernel: #ifdef CONFIG_PREEMPTION - DISABLE_INTERRUPTS(CLBR_ANY) + /* Interrupts are disabled and debug-checked */ cmpl $0, PER_CPU_VAR(__preempt_count) jnz .Lno_preempt testl $X86_EFLAGS_IF, PT_EFLAGS(%esp) # interrupts off (exception path) ? @@ -1189,7 +1180,7 @@ END(spurious_entries_start) TRACE_IRQS_OFF movl %esp, %eax call smp_spurious_interrupt - jmp ret_from_intr + jmp ret_from_exception ENDPROC(common_spurious) #endif @@ -1207,7 +1198,7 @@ ENDPROC(common_spurious) TRACE_IRQS_OFF movl %esp, %eax call do_IRQ - jmp ret_from_intr + jmp ret_from_exception ENDPROC(common_interrupt) #define BUILD_INTERRUPT3(name, nr, fn) \ @@ -1219,7 +1210,7 @@ ENTRY(name) \ TRACE_IRQS_OFF \ movl %esp, %eax; \ call fn; \ - jmp ret_from_intr; \ + jmp ret_from_exception; \ ENDPROC(name) #define BUILD_INTERRUPT(name, nr) \ @@ -1366,7 +1357,7 @@ ENTRY(xen_do_upcall) #ifndef CONFIG_PREEMPTION call xen_maybe_preempt_hcall #endif - jmp ret_from_intr + jmp ret_from_exception ENDPROC(xen_hypervisor_callback) /* From patchwork Wed Oct 23 12:27:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206597 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 13076112B for ; Wed, 23 Oct 2019 12:33:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E6F3721A4A for ; Wed, 23 Oct 2019 12:33:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391412AbfJWMbl (ORCPT ); Wed, 23 Oct 2019 08:31:41 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49104 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391367AbfJWMbl (ORCPT ); Wed, 23 Oct 2019 08:31:41 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFnd-00017c-Sd; Wed, 23 Oct 2019 14:31:37 +0200 Message-Id: <20191023123118.296135499@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:12 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 07/17] x86/entry/64: Remove redundant interrupt disable References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Now that the trap handlers return with interrupts disabled, the unconditional disabling of interrupts in the low level entry code can be removed along with the trace calls. Add debug checks where appropriate. 
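For readability, the invariant that DEBUG_ENTRY_ASSERT_IRQS_OFF enforces can be written as C (a sketch only; the real check is the assembly macro added in patch 04, and BUG() stands in for its ud2):

#ifdef CONFIG_DEBUG_ENTRY
        /* The return path must be entered with interrupts disabled */
        if (native_save_fl() & X86_EFLAGS_IF)
                BUG();
#endif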
Signed-off-by: Thomas Gleixner Reviewed-by: Sean Christopherson Reviewed-by: Alexandre Chartre --- arch/x86/entry/entry_64.S | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -595,8 +595,7 @@ END(common_spurious) call do_IRQ /* rdi points to pt_regs */ /* 0(%rsp): old RSP */ ret_from_intr: - DISABLE_INTERRUPTS(CLBR_ANY) - TRACE_IRQS_OFF + DEBUG_ENTRY_ASSERT_IRQS_OFF LEAVE_IRQ_STACK @@ -1252,8 +1251,7 @@ END(paranoid_entry) */ ENTRY(paranoid_exit) UNWIND_HINT_REGS - DISABLE_INTERRUPTS(CLBR_ANY) - TRACE_IRQS_OFF_DEBUG + DEBUG_ENTRY_ASSERT_IRQS_OFF testl %ebx, %ebx /* swapgs needed? */ jnz .Lparanoid_exit_no_swapgs TRACE_IRQS_IRETQ @@ -1356,8 +1354,7 @@ END(error_entry) ENTRY(error_exit) UNWIND_HINT_REGS - DISABLE_INTERRUPTS(CLBR_ANY) - TRACE_IRQS_OFF + DEBUG_ENTRY_ASSERT_IRQS_OFF testb $3, CS(%rsp) jz retint_kernel jmp retint_user From patchwork Wed Oct 23 12:27:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206591 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8AECC913 for ; Wed, 23 Oct 2019 12:33:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6994821A4A for ; Wed, 23 Oct 2019 12:33:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405046AbfJWMc5 (ORCPT ); Wed, 23 Oct 2019 08:32:57 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49112 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391378AbfJWMbm (ORCPT ); Wed, 23 Oct 2019 08:31:42 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFne-00017q-A1; Wed, 23 Oct 2019 14:31:38 +0200 Message-Id: <20191023123118.386844979@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:13 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 08/17] x86/entry: Move syscall irq tracing to C code References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Interrupt state tracing can be safely done in C code. The few stack operations in assembly do not need to be covered. Remove the now pointless indirection via .Lsyscall_32_done and jump to swapgs_restore_regs_and_return_to_usermode directly. Signed-off-by: Thomas Gleixner Acked-by: Andy Lutomirski --- arch/x86/entry/common.c | 10 ++++++++++ arch/x86/entry/entry_32.S | 17 ----------------- arch/x86/entry/entry_64.S | 6 ------ arch/x86/entry/entry_64_compat.S | 30 ++++-------------------------- 4 files changed, 14 insertions(+), 49 deletions(-) --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -218,6 +218,9 @@ static void exit_to_usermode_loop(struct user_enter_irqoff(); mds_user_clear_cpu_buffers(); + + /* The return to usermode reenables interrupts. 
Tell the tracer */ + trace_hardirqs_on(); } #define SYSCALL_EXIT_WORK_FLAGS \ @@ -279,6 +282,9 @@ static void syscall_slow_exit_work(struc { struct thread_info *ti; + /* User to kernel transition disabled interrupts. */ + trace_hardirqs_off(); + enter_from_user_mode(); local_irq_enable(); ti = current_thread_info(); @@ -351,6 +357,7 @@ static __always_inline void do_syscall_3 /* Handles int $0x80 */ __visible void do_int80_syscall_32(struct pt_regs *regs) { + trace_hardirqs_off(); enter_from_user_mode(); local_irq_enable(); do_syscall_32_irqs_on(regs); @@ -367,6 +374,9 @@ static __always_inline void do_syscall_3 unsigned long landing_pad = (unsigned long)current->mm->context.vdso + vdso_image_32.sym_int80_landing_pad; + /* User to kernel transition disabled interrupts. */ + trace_hardirqs_off(); + /* * SYSENTER loses EIP, and even SYSCALL32 needs us to skip forward * so that 'regs->ip -= 2' lands back on an int $0x80 instruction. --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -924,12 +924,6 @@ ENTRY(entry_SYSENTER_32) jnz .Lsysenter_fix_flags .Lsysenter_flags_fixed: - /* - * User mode is traced as though IRQs are on, and SYSENTER - * turned them off. - */ - TRACE_IRQS_OFF - movl %esp, %eax call do_fast_syscall_32 /* XEN PV guests always use IRET path */ @@ -939,8 +933,6 @@ ENTRY(entry_SYSENTER_32) STACKLEAK_ERASE /* Opportunistic SYSEXIT */ - TRACE_IRQS_ON /* User mode traces as IRQs on. */ - /* * Setup entry stack - we keep the pointer in %eax and do the * switch after almost all user-state is restored. @@ -1039,12 +1031,6 @@ ENTRY(entry_INT80_32) SAVE_ALL pt_regs_ax=$-ENOSYS switch_stacks=1 /* save rest */ - /* - * User mode is traced as though IRQs are on, and the interrupt gate - * turned them off. - */ - TRACE_IRQS_OFF - movl %esp, %eax call do_int80_syscall_32 .Lsyscall_32_done: @@ -1052,11 +1038,8 @@ ENTRY(entry_INT80_32) STACKLEAK_ERASE restore_all: - TRACE_IRQS_IRET SWITCH_TO_ENTRY_STACK -.Lrestore_all_notrace: CHECK_AND_APPLY_ESPFIX -.Lrestore_nocheck: /* Switch back to user CR3 */ SWITCH_TO_USER_CR3 scratch_reg=%eax --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -167,15 +167,11 @@ GLOBAL(entry_SYSCALL_64_after_hwframe) PUSH_AND_CLEAR_REGS rax=$-ENOSYS - TRACE_IRQS_OFF - /* IRQs are off. */ movq %rax, %rdi movq %rsp, %rsi call do_syscall_64 /* returns with IRQs disabled */ - TRACE_IRQS_IRETQ /* we're about to change IF */ - /* * Try to use SYSRET instead of IRET if we're returning to * a completely clean 64-bit userspace context. If we're not, @@ -342,7 +338,6 @@ ENTRY(ret_from_fork) UNWIND_HINT_REGS movq %rsp, %rdi call syscall_return_slowpath /* returns with IRQs disabled */ - TRACE_IRQS_ON /* user mode is traced as IRQS on */ jmp swapgs_restore_regs_and_return_to_usermode 1: @@ -606,7 +601,6 @@ END(common_spurious) GLOBAL(retint_user) mov %rsp,%rdi call prepare_exit_to_usermode - TRACE_IRQS_IRETQ GLOBAL(swapgs_restore_regs_and_return_to_usermode) #ifdef CONFIG_DEBUG_ENTRY --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -129,17 +129,11 @@ ENTRY(entry_SYSENTER_compat) jnz .Lsysenter_fix_flags .Lsysenter_flags_fixed: - /* - * User mode is traced as though IRQs are on, and SYSENTER - * turned them off. 
- */ - TRACE_IRQS_OFF - movq %rsp, %rdi call do_fast_syscall_32 /* XEN PV guests always use IRET path */ - ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \ - "jmp .Lsyscall_32_done", X86_FEATURE_XENPV + ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \ + "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV jmp sysret32_from_system_call .Lsysenter_fix_flags: @@ -247,17 +241,11 @@ GLOBAL(entry_SYSCALL_compat_after_hwfram pushq $0 /* pt_regs->r15 = 0 */ xorl %r15d, %r15d /* nospec r15 */ - /* - * User mode is traced as though IRQs are on, and SYSENTER - * turned them off. - */ - TRACE_IRQS_OFF - movq %rsp, %rdi call do_fast_syscall_32 /* XEN PV guests always use IRET path */ - ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \ - "jmp .Lsyscall_32_done", X86_FEATURE_XENPV + ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \ + "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV /* Opportunistic SYSRET */ sysret32_from_system_call: @@ -266,7 +254,6 @@ GLOBAL(entry_SYSCALL_compat_after_hwfram * stack. So let's erase the thread stack right now. */ STACKLEAK_ERASE - TRACE_IRQS_ON /* User mode traces as IRQs on. */ movq RBX(%rsp), %rbx /* pt_regs->rbx */ movq RBP(%rsp), %rbp /* pt_regs->rbp */ movq EFLAGS(%rsp), %r11 /* pt_regs->flags (in r11) */ @@ -403,17 +390,8 @@ ENTRY(entry_INT80_compat) xorl %r15d, %r15d /* nospec r15 */ cld - /* - * User mode is traced as though IRQs are on, and the interrupt - * gate turned them off. - */ - TRACE_IRQS_OFF - movq %rsp, %rdi call do_int80_syscall_32 -.Lsyscall_32_done: - /* Go back to user mode. */ - TRACE_IRQS_ON jmp swapgs_restore_regs_and_return_to_usermode END(entry_INT80_compat) From patchwork Wed Oct 23 12:27:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206593 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 97305913 for ; Wed, 23 Oct 2019 12:33:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7719221920 for ; Wed, 23 Oct 2019 12:33:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391420AbfJWMbm (ORCPT ); Wed, 23 Oct 2019 08:31:42 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49118 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391394AbfJWMbm (ORCPT ); Wed, 23 Oct 2019 08:31:42 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFne-000180-ON; Wed, 23 Oct 2019 14:31:38 +0200 Message-Id: <20191023123118.491328859@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:14 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 09/17] x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Thomas Gleixner Evaluating _TIF_NOHZ to decide whether to use the slow syscall entry path is not only pointless, it's 
actually counterproductive: 1) Context tracking code is invoked unconditionally before that flag is evaluated. 2) If the flag is set, the slow path is invoked for nothing due to #1. Remove it. Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/thread_info.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -133,14 +133,10 @@ struct thread_info { #define _TIF_X32 (1 << TIF_X32) #define _TIF_FSCHECK (1 << TIF_FSCHECK) -/* - * work to do in syscall_trace_enter(). Also includes TIF_NOHZ for - * enter_from_user_mode() - */ +/* Work to do before invoking the actual syscall. */ #define _TIF_WORK_SYSCALL_ENTRY \ (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT | \ - _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \ - _TIF_NOHZ) + _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT) /* flags to check in __switch_to() */ #define _TIF_WORK_CTXSW_BASE \ From patchwork Wed Oct 23 12:27:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206589 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A9FC1913 for ; Wed, 23 Oct 2019 12:32:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 89940214B2 for ; Wed, 23 Oct 2019 12:32:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391534AbfJWMcQ (ORCPT ); Wed, 23 Oct 2019 08:32:16 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49123 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391402AbfJWMbo (ORCPT ); Wed, 23 Oct 2019 08:31:44 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFnf-000187-8s; Wed, 23 Oct 2019 14:31:39 +0200 Message-Id: <20191023123118.596517860@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:15 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 10/17] entry: Provide generic syscall entry functionality References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Thomas Gleixner On syscall entry, certain work needs to be done conditionally, like tracing, seccomp etc. This code is duplicated in all architectures. Provide a generic version.
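The arch_* hooks below use a declare-then-default scheme: linux/entry-common.h supplies a default under #ifndef, and an architecture overrides it by defining both the inline function and a same-named macro in its asm/entry-common.h. A sketch of an override (the trivial body is illustrative):

/* In the architecture's asm/entry-common.h: */
static inline long arch_syscall_enter_seccomp(struct pt_regs *regs)
{
        /* Architecture specific seccomp invocation goes here */
        return 0;
}
#define arch_syscall_enter_seccomp arch_syscall_enter_seccomp

The #define makes the #ifndef block in linux/entry-common.h skip the generic default; the x86 conversion in patch 11 uses exactly this mechanism for its seccomp and audit hooks.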
Signed-off-by: Thomas Gleixner --- V2: Fix function documentation (Mike) Add comment about return value (Andy) --- arch/Kconfig | 3 include/linux/entry-common.h | 132 +++++++++++++++++++++++++++++++++++++++++++ kernel/Makefile | 1 kernel/entry/Makefile | 3 kernel/entry/common.c | 33 ++++++++++ 5 files changed, 172 insertions(+) --- a/arch/Kconfig +++ b/arch/Kconfig @@ -27,6 +27,9 @@ config HAVE_IMA_KEXEC config HOTPLUG_SMT bool +config GENERIC_ENTRY + bool + config OPROFILE tristate "OProfile system profiling" depends on PROFILING --- /dev/null +++ b/include/linux/entry-common.h @@ -0,0 +1,132 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __LINUX_ENTRYCOMMON_H +#define __LINUX_ENTRYCOMMON_H + +#include +#include +#include +#include +#include + +#include + +/* + * Define dummy _TIF work flags if not defined by the architecture or for + * disabled functionality. + */ +#ifndef _TIF_SYSCALL_TRACE +# define _TIF_SYSCALL_TRACE (0) +#endif + +#ifndef _TIF_SYSCALL_EMU +# define _TIF_SYSCALL_EMU (0) +#endif + +#ifndef _TIF_SYSCALL_TRACEPOINT +# define _TIF_SYSCALL_TRACEPOINT (0) +#endif + +#ifndef _TIF_SECCOMP +# define _TIF_SECCOMP (0) +#endif + +#ifndef _TIF_AUDIT +# define _TIF_AUDIT (0) +#endif + +/* + * TIF flags handled in syscall_enter_from_usermode() + */ +#ifndef ARCH_SYSCALL_ENTER_WORK +# define ARCH_SYSCALL_ENTER_WORK (0) +#endif + +#define SYSCALL_ENTER_WORK \ + (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | TIF_SECCOMP | \ + _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_EMU | \ + ARCH_SYSCALL_ENTER_WORK) + +/** + * arch_syscall_enter_tracehook - Wrapper around tracehook_report_syscall_entry() + * @regs: Pointer to currents pt_regs + * + * Returns: 0 on success or an error code to skip the syscall. + * + * Defaults to tracehook_report_syscall_entry(). Can be replaced by + * architecture specific code. + * + * Invoked from syscall_enter_from_usermode() + */ +static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs); + +#ifndef arch_syscall_enter_tracehook +static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs) +{ + return tracehook_report_syscall_entry(regs); +} +#endif + +/** + * arch_syscall_enter_seccomp - Architecture specific seccomp invocation + * @regs: Pointer to currents pt_regs + * + * Returns: The original or a modified syscall number + * + * Invoked from syscall_enter_from_usermode(). Can be replaced by + * architecture specific code. + */ +static inline long arch_syscall_enter_seccomp(struct pt_regs *regs); + +#ifndef arch_syscall_enter_seccomp +static inline long arch_syscall_enter_seccomp(struct pt_regs *regs) +{ + return secure_computing(NULL); +} +#endif + +/** + * arch_syscall_enter_audit - Architecture specific audit invocation + * @regs: Pointer to currents pt_regs + * + * Invoked from syscall_enter_from_usermode(). Must be replaced by + * architecture specific code if the architecture supports audit. + */ +static inline void arch_syscall_enter_audit(struct pt_regs *regs); + +#ifndef arch_syscall_enter_audit +static inline void arch_syscall_enter_audit(struct pt_regs *regs) { } +#endif + +/* Common syscall enter function */ +long core_syscall_enter_from_usermode(struct pt_regs *regs, long syscall); + +/** + * syscall_enter_from_usermode - Check and handle work before invoking + * a syscall + * @regs: Pointer to currents pt_regs + * @syscall: The syscall number + * + * Invoked from architecture specific syscall entry code with interrupts + * enabled. 
+ * + * Returns: The original or a modified syscall number + * + * If the returned syscall number is -1 then the syscall should be + * skipped. In this case the caller may invoke syscall_set_error() or + * syscall_set_return_value() first. If neither of those is called and -1 + * is returned, then the syscall will fail with ENOSYS. + */ +static inline long syscall_enter_from_usermode(struct pt_regs *regs, + long syscall) +{ + unsigned long ti_work = READ_ONCE(current_thread_info()->flags); + + if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) + BUG_ON(regs != task_pt_regs(current)); + + if (ti_work & SYSCALL_ENTER_WORK) + syscall = core_syscall_enter_from_usermode(regs, syscall); + return syscall; +} + +#endif --- a/kernel/Makefile +++ b/kernel/Makefile @@ -43,6 +43,7 @@ obj-y += irq/ obj-y += rcu/ obj-y += livepatch/ obj-y += dma/ +obj-y += entry/ obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o --- /dev/null +++ b/kernel/entry/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-$(CONFIG_GENERIC_ENTRY) += common.o --- /dev/null +++ b/kernel/entry/common.c @@ -0,0 +1,33 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include + +#define CREATE_TRACE_POINTS +#include + +long core_syscall_enter_from_usermode(struct pt_regs *regs, long syscall) +{ + unsigned long ti_work = READ_ONCE(current_thread_info()->flags); + unsigned long ret = 0; + + if (ti_work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) { + ret = arch_syscall_enter_tracehook(regs); + if (ret || (ti_work & _TIF_SYSCALL_EMU)) + return -1L; + } + + /* Do seccomp after ptrace, to catch any tracer changes. */ + if (ti_work & _TIF_SECCOMP) { + ret = arch_syscall_enter_seccomp(regs); + if (ret == -1L) + return ret; + } + + if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT))) + trace_sys_enter(regs, syscall); + + arch_syscall_enter_audit(regs); + + return ret ? 
: syscall; +} From patchwork Wed Oct 23 12:27:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206587 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C5504913 for ; Wed, 23 Oct 2019 12:32:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A3ABA222C1 for ; Wed, 23 Oct 2019 12:32:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391542AbfJWMcQ (ORCPT ); Wed, 23 Oct 2019 08:32:16 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49128 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391410AbfJWMbo (ORCPT ); Wed, 23 Oct 2019 08:31:44 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFnf-00018D-Lh; Wed, 23 Oct 2019 14:31:39 +0200 Message-Id: <20191023123118.687475813@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:16 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 11/17] x86/entry: Use generic syscall entry function References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Thomas Gleixner Replace the syscall entry work handling with the generic version, Provide the necessary helper inlines to handle the real architecture specific parts, e.g. audit and seccomp invocations. Signed-off-by: Thomas Gleixner --- arch/x86/Kconfig | 1 arch/x86/entry/common.c | 108 +++--------------------------------- arch/x86/include/asm/entry-common.h | 59 +++++++++++++++++++ arch/x86/include/asm/thread_info.h | 5 - 4 files changed, 70 insertions(+), 103 deletions(-) --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -111,6 +111,7 @@ config X86 select GENERIC_CPU_AUTOPROBE select GENERIC_CPU_VULNERABILITIES select GENERIC_EARLY_IOREMAP + select GENERIC_ENTRY select GENERIC_FIND_FIRST_BIT select GENERIC_IOMAP select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -10,13 +10,13 @@ #include #include #include +#include #include #include #include #include #include #include -#include #include #include #include @@ -34,7 +34,6 @@ #include #include -#define CREATE_TRACE_POINTS #include #ifdef CONFIG_CONTEXT_TRACKING @@ -48,86 +47,6 @@ static inline void enter_from_user_mode(void) {} #endif -static void do_audit_syscall_entry(struct pt_regs *regs, u32 arch) -{ -#ifdef CONFIG_X86_64 - if (arch == AUDIT_ARCH_X86_64) { - audit_syscall_entry(regs->orig_ax, regs->di, - regs->si, regs->dx, regs->r10); - } else -#endif - { - audit_syscall_entry(regs->orig_ax, regs->bx, - regs->cx, regs->dx, regs->si); - } -} - -/* - * Returns the syscall nr to run (which should match regs->orig_ax) or -1 - * to skip the syscall. - */ -static long syscall_trace_enter(struct pt_regs *regs) -{ - u32 arch = in_ia32_syscall() ? 
AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64; - - struct thread_info *ti = current_thread_info(); - unsigned long ret = 0; - u32 work; - - if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) - BUG_ON(regs != task_pt_regs(current)); - - work = READ_ONCE(ti->flags); - - if (work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) { - ret = tracehook_report_syscall_entry(regs); - if (ret || (work & _TIF_SYSCALL_EMU)) - return -1L; - } - -#ifdef CONFIG_SECCOMP - /* - * Do seccomp after ptrace, to catch any tracer changes. - */ - if (work & _TIF_SECCOMP) { - struct seccomp_data sd; - - sd.arch = arch; - sd.nr = regs->orig_ax; - sd.instruction_pointer = regs->ip; -#ifdef CONFIG_X86_64 - if (arch == AUDIT_ARCH_X86_64) { - sd.args[0] = regs->di; - sd.args[1] = regs->si; - sd.args[2] = regs->dx; - sd.args[3] = regs->r10; - sd.args[4] = regs->r8; - sd.args[5] = regs->r9; - } else -#endif - { - sd.args[0] = regs->bx; - sd.args[1] = regs->cx; - sd.args[2] = regs->dx; - sd.args[3] = regs->si; - sd.args[4] = regs->di; - sd.args[5] = regs->bp; - } - - ret = __secure_computing(&sd); - if (ret == -1) - return ret; - } -#endif - - if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT))) - trace_sys_enter(regs, regs->orig_ax); - - do_audit_syscall_entry(regs, arch); - - return ret ?: regs->orig_ax; -} - #define EXIT_TO_USERMODE_LOOP_FLAGS \ (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY | _TIF_PATCH_PENDING) @@ -280,16 +199,13 @@ static void syscall_slow_exit_work(struc #ifdef CONFIG_X86_64 __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs) { - struct thread_info *ti; - /* User to kernel transition disabled interrupts. */ trace_hardirqs_off(); enter_from_user_mode(); local_irq_enable(); - ti = current_thread_info(); - if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) - nr = syscall_trace_enter(regs); + + nr = syscall_enter_from_usermode(regs, nr); if (likely(nr < NR_syscalls)) { nr = array_index_nospec(nr, NR_syscalls); @@ -316,22 +232,18 @@ static void syscall_slow_exit_work(struc */ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs) { - struct thread_info *ti = current_thread_info(); unsigned int nr = (unsigned int)regs->orig_ax; #ifdef CONFIG_IA32_EMULATION - ti->status |= TS_COMPAT; + current_thread_info()->status |= TS_COMPAT; #endif - if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) { - /* - * Subtlety here: if ptrace pokes something larger than - * 2^32-1 into orig_ax, this truncates it. This may or - * may not be necessary, but it matches the old asm - * behavior. - */ - nr = syscall_trace_enter(regs); - } + /* + * Subtlety here: if ptrace pokes something larger than 2^32-1 into + * orig_ax, this truncates it. This may or may not be necessary, + * but it matches the old asm behavior. + */ + nr = syscall_enter_from_usermode(regs, nr); if (likely(nr < IA32_NR_syscalls)) { nr = array_index_nospec(nr, IA32_NR_syscalls); --- /dev/null +++ b/arch/x86/include/asm/entry-common.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _ASM_X86_ENTRY_COMMON_H +#define _ASM_X86_ENTRY_COMMON_H + +#include +#include + +static inline long arch_syscall_enter_seccomp(struct pt_regs *regs) +{ +#ifdef CONFIG_SECCOMP + u32 arch = in_ia32_syscall() ? 
AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64; + struct seccomp_data sd; + + sd.arch = arch; + sd.nr = regs->orig_ax; + sd.instruction_pointer = regs->ip; + +#ifdef CONFIG_X86_64 + if (arch == AUDIT_ARCH_X86_64) { + sd.args[0] = regs->di; + sd.args[1] = regs->si; + sd.args[2] = regs->dx; + sd.args[3] = regs->r10; + sd.args[4] = regs->r8; + sd.args[5] = regs->r9; + } else +#endif + { + sd.args[0] = regs->bx; + sd.args[1] = regs->cx; + sd.args[2] = regs->dx; + sd.args[3] = regs->si; + sd.args[4] = regs->di; + sd.args[5] = regs->bp; + } + + return __secure_computing(&sd); +#else + return 0; +#endif +} +#define arch_syscall_enter_seccomp arch_syscall_enter_seccomp + +static inline void arch_syscall_enter_audit(struct pt_regs *regs) +{ +#ifdef CONFIG_X86_64 + if (in_ia32_syscall()) { + audit_syscall_entry(regs->orig_ax, regs->di, + regs->si, regs->dx, regs->r10); + } else +#endif + { + audit_syscall_entry(regs->orig_ax, regs->bx, + regs->cx, regs->dx, regs->si); + } +} +#define arch_syscall_enter_audit arch_syscall_enter_audit + +#endif --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -133,11 +133,6 @@ struct thread_info { #define _TIF_X32 (1 << TIF_X32) #define _TIF_FSCHECK (1 << TIF_FSCHECK) -/* Work to do before invoking the actual syscall. */ -#define _TIF_WORK_SYSCALL_ENTRY \ - (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT | \ - _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT) - /* flags to check in __switch_to() */ #define _TIF_WORK_CTXSW_BASE \ (_TIF_IO_BITMAP|_TIF_NOCPUID|_TIF_NOTSC|_TIF_BLOCKSTEP| \ From patchwork Wed Oct 23 12:27:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206583 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 829A7112B for ; Wed, 23 Oct 2019 12:32:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 65F3321BE5 for ; Wed, 23 Oct 2019 12:32:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391550AbfJWMcR (ORCPT ); Wed, 23 Oct 2019 08:32:17 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49134 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391417AbfJWMbo (ORCPT ); Wed, 23 Oct 2019 08:31:44 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFng-00018J-4M; Wed, 23 Oct 2019 14:31:40 +0200 Message-Id: <20191023123118.778776715@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:17 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 12/17] entry: Provide generic syscall exit function References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Thomas Gleixner Like syscall entry all architectures have similar and pointlessly different code to handle pending work before returning from a syscall to user space. Provide a generic version. 
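With this in place, an architecture's syscall exit path reduces to a thin wrapper; the x86 conversion in patch 13 boils down to roughly:

__visible void syscall_return_slowpath(struct pt_regs *regs)
{
        /* On x86 the syscall number and return value live in orig_ax/ax */
        syscall_exit_to_usermode(regs, regs->orig_ax, regs->ax);

        local_irq_disable();
        prepare_exit_to_usermode(regs);
}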
Signed-off-by: Thomas Gleixner --- include/linux/entry-common.h | 31 ++++++++++++++++++++++++ kernel/entry/common.c | 55 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 86 insertions(+) --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -46,6 +46,17 @@ _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_EMU | \ ARCH_SYSCALL_ENTER_WORK) +/* + * TIF flags handled in syscall_exit_to_usermode() + */ +#ifndef ARCH_SYSCALL_EXIT_WORK +# define ARCH_SYSCALL_EXIT_WORK (0) +#endif + +#define SYSCALL_EXIT_WORK \ + (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \ + _TIF_SYSCALL_TRACEPOINT | ARCH_SYSCALL_EXIT_WORK) + /** * arch_syscall_enter_tracehook - Wrapper around tracehook_report_syscall_entry() * @regs: Pointer to currents pt_regs @@ -129,4 +140,24 @@ static inline long syscall_enter_from_us return syscall; } +/** + * arch_syscall_exit_tracehook - Wrapper around tracehook_report_syscall_exit() + * + * Defaults to tracehook_report_syscall_exit(). Can be replaced by + * architecture specific code. + * + * Invoked from syscall_exit_to_usermode() + */ +static inline void arch_syscall_exit_tracehook(struct pt_regs *regs, bool step); + +#ifndef arch_syscall_exit_tracehook +static inline void arch_syscall_exit_tracehook(struct pt_regs *regs, bool step) +{ + tracehook_report_syscall_exit(regs, step); +} +#endif + +/* Common syscall exit function */ +void syscall_exit_to_usermode(struct pt_regs *regs, long syscall, long retval); + #endif --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -31,3 +31,58 @@ long core_syscall_enter_from_usermode(st return ret ? : syscall; } + +#ifndef _TIF_SINGLESTEP +static inline bool report_single_step(unsigned long ti_work) +{ + return false; +} +#else +/* + * If TIF_SYSCALL_EMU is set, then the only reason to report is when + * TIF_SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP). This syscall + * instruction has been already reported in syscall_enter_from_usermode(). + */ +#define SYSEMU_STEP (_TIF_SINGLESTEP | _TIF_SYSCALL_EMU) + +static inline bool report_single_step(unsigned long ti_work) +{ + return (ti_work & SYSEMU_STEP) == _TIF_SINGLESTEP; +} +#endif + +static void syscall_exit_work(struct pt_regs *regs, long retval, + unsigned long ti_work) +{ + bool step; + + audit_syscall_exit(regs); + + if (ti_work & _TIF_SYSCALL_TRACEPOINT) + trace_sys_exit(regs, retval); + + step = report_single_step(ti_work); + if (step || ti_work & _TIF_SYSCALL_TRACE) + arch_syscall_exit_tracehook(regs, step); +} + +void syscall_exit_to_usermode(struct pt_regs *regs, long syscall, long retval) +{ + unsigned long ti_work; + + CT_WARN_ON(ct_state() != CONTEXT_KERNEL); + + if (IS_ENABLED(CONFIG_PROVE_LOCKING) && + WARN(irqs_disabled(), "syscall %ld left IRQs disabled", syscall)) + local_irq_enable(); + + rseq_syscall(regs); + + /* + * Handle work which needs to run exactly once per syscall exit + * with interrupts enabled. 
+ */ + ti_work = READ_ONCE(current_thread_info()->flags); + if (unlikely(ti_work & SYSCALL_EXIT_WORK)) + syscall_exit_work(regs, retval, ti_work); +} From patchwork Wed Oct 23 12:27:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206581 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D1BE7913 for ; Wed, 23 Oct 2019 12:32:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BA94D21920 for ; Wed, 23 Oct 2019 12:32:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391556AbfJWMcS (ORCPT ); Wed, 23 Oct 2019 08:32:18 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49143 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391426AbfJWMbo (ORCPT ); Wed, 23 Oct 2019 08:31:44 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFng-00018T-J0; Wed, 23 Oct 2019 14:31:40 +0200 Message-Id: <20191023123118.871105130@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:18 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 13/17] x86/entry: Use generic syscall exit functionality References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Replace the x86 variant with the generic version. Signed-off-by: Thomas Gleixner --- arch/x86/entry/common.c | 44 ------------------------------------ arch/x86/include/asm/entry-common.h | 2 + 2 files changed, 3 insertions(+), 43 deletions(-) --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -142,55 +142,13 @@ static void exit_to_usermode_loop(struct trace_hardirqs_on(); } -#define SYSCALL_EXIT_WORK_FLAGS \ - (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \ - _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT) - -static void syscall_slow_exit_work(struct pt_regs *regs, u32 cached_flags) -{ - bool step; - - audit_syscall_exit(regs); - - if (cached_flags & _TIF_SYSCALL_TRACEPOINT) - trace_sys_exit(regs, regs->ax); - - /* - * If TIF_SYSCALL_EMU is set, we only get here because of - * TIF_SINGLESTEP (i.e. this is PTRACE_SYSEMU_SINGLESTEP). - * We already reported this syscall instruction in - * syscall_trace_enter(). - */ - step = unlikely( - (cached_flags & (_TIF_SINGLESTEP | _TIF_SYSCALL_EMU)) - == _TIF_SINGLESTEP); - if (step || cached_flags & _TIF_SYSCALL_TRACE) - tracehook_report_syscall_exit(regs, step); -} - /* * Called with IRQs on and fully valid regs. Returns with IRQs off in a * state such that we can immediately switch to user mode. */ __visible inline void syscall_return_slowpath(struct pt_regs *regs) { - struct thread_info *ti = current_thread_info(); - u32 cached_flags = READ_ONCE(ti->flags); - - CT_WARN_ON(ct_state() != CONTEXT_KERNEL); - - if (IS_ENABLED(CONFIG_PROVE_LOCKING) && - WARN(irqs_disabled(), "syscall %ld left IRQs disabled", regs->orig_ax)) - local_irq_enable(); - - rseq_syscall(regs); - - /* - * First do one-time work. 
If these work items are enabled, we - * want to run them exactly once per syscall exit with IRQs on. - */ - if (unlikely(cached_flags & SYSCALL_EXIT_WORK_FLAGS)) - syscall_slow_exit_work(regs, cached_flags); + syscall_exit_to_usermode(regs, regs->orig_ax, regs->ax); local_irq_disable(); prepare_exit_to_usermode(regs); --- a/arch/x86/include/asm/entry-common.h +++ b/arch/x86/include/asm/entry-common.h @@ -5,6 +5,8 @@ #include #include +#define ARCH_SYSCALL_EXIT_WORK (_TIF_SINGLESTEP) + static inline long arch_syscall_enter_seccomp(struct pt_regs *regs) { #ifdef CONFIG_SECCOMP From patchwork Wed Oct 23 12:27:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206569 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EB333913 for ; Wed, 23 Oct 2019 12:31:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CAF9321D81 for ; Wed, 23 Oct 2019 12:31:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391428AbfJWMbr (ORCPT ); Wed, 23 Oct 2019 08:31:47 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49149 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391431AbfJWMbp (ORCPT ); Wed, 23 Oct 2019 08:31:45 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFnh-00018l-5u; Wed, 23 Oct 2019 14:31:41 +0200 Message-Id: <20191023123118.978254388@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:19 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 14/17] entry: Provide generic exit to usermode functionality References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Thomas Gleixner Provide a generic facility to handle the exit to usermode work. That's aimed to replace the pointlessly different copies in each architecture. Signed-off-by: Thomas Gleixner --- V2: Move lockdep and address limit check right to the end of the return sequence. 
(PeterZ) --- include/linux/entry-common.h | 105 +++++++++++++++++++++++++++++++++++++++++++ kernel/entry/common.c | 82 +++++++++++++++++++++++++++++++++ 2 files changed, 187 insertions(+) --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -34,6 +34,30 @@ # define _TIF_AUDIT (0) #endif +#ifndef _TIF_UPROBE +# define _TIF_UPROBE (0) +#endif + +#ifndef _TIF_PATCH_PENDING +# define _TIF_PATCH_PENDING (0) +#endif + +#ifndef _TIF_NOTIFY_RESUME +# define _TIF_NOTIFY_RESUME (0) +#endif + +/* + * TIF flags handled in exit_to_usermode() + */ +#ifndef ARCH_EXIT_TO_USERMODE_WORK +# define ARCH_EXIT_TO_USERMODE_WORK (0) +#endif + +#define EXIT_TO_USERMODE_WORK \ + (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ + _TIF_NEED_RESCHED | _TIF_PATCH_PENDING | \ + ARCH_EXIT_TO_USERMODE_WORK) + /* * TIF flags handled in syscall_enter_from_usermode() */ @@ -58,6 +82,87 @@ _TIF_SYSCALL_TRACEPOINT | ARCH_SYSCALL_EXIT_WORK) /** + * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enable() + * @ti_work: Cached TIF flags gathered with interrupts disabled + * + * Defaults to local_irq_enable(). Can be supplied by architecture specific + * code. + */ +static inline void local_irq_enable_exit_to_user(unsigned long ti_work); + +#ifndef local_irq_enable_exit_to_user +static inline void local_irq_enable_exit_to_user(unsigned long ti_work) +{ + local_irq_enable(); +} +#endif + +/** + * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disable() + * + * Defaults to local_irq_disable(). Can be supplied by architecture specific + * code. + */ +static inline void local_irq_disable_exit_to_user(void); + +#ifndef local_irq_disable_exit_to_user +static inline void local_irq_disable_exit_to_user(void) +{ + local_irq_disable(); +} +#endif + +/** + * arch_exit_to_usermode_work - Architecture specific TIF work for + * exit to user mode. + * @regs: Pointer to current's pt_regs + * @ti_work: Cached TIF flags gathered with interrupts disabled + * + * Invoked from exit_to_usermode() with interrupts disabled + * + * Defaults to NOOP. Can be supplied by architecture specific code. + */ +static inline void arch_exit_to_usermode_work(struct pt_regs *regs, + unsigned long ti_work); + +#ifndef arch_exit_to_usermode_work +static inline void arch_exit_to_usermode_work(struct pt_regs *regs, + unsigned long ti_work) +{ +} +#endif + +/** + * arch_exit_to_usermode - Architecture specific preparation for + * exit to user mode. + * @regs: Pointer to current's pt_regs + * @ti_work: Cached TIF flags gathered with interrupts disabled + * + * Invoked from exit_to_usermode() with interrupts disabled as the last + * function before return.
+ */ +static inline void arch_exit_to_usermode(struct pt_regs *regs, + unsigned long ti_work); + +#ifndef arch_exit_to_usermode +static inline void arch_exit_to_usermode(struct pt_regs *regs, + unsigned long ti_work) +{ +} +#endif + +/* Common exit to usermode function to handle TIF work */ +asmlinkage __visible void exit_to_usermode(struct pt_regs *regs); + +/** + * arch_do_signal - Architecture specific signal delivery function + * @regs: Pointer to current's pt_regs + * + * Invoked from exit_to_usermode() + */ +void arch_do_signal(struct pt_regs *regs); + +/** * arch_syscall_enter_tracehook - Wrapper around tracehook_report_syscall_entry() * @regs: Pointer to currents pt_regs * --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -2,10 +2,86 @@ #include #include +#include +#include #define CREATE_TRACE_POINTS #include +static unsigned long core_exit_to_usermode_work(struct pt_regs *regs, + unsigned long ti_work) +{ + /* + * Before returning to user space ensure that all pending work + * items have been completed. + */ + while (ti_work & EXIT_TO_USERMODE_WORK) { + + local_irq_enable_exit_to_user(ti_work); + + if (ti_work & _TIF_NEED_RESCHED) + schedule(); + + if (ti_work & _TIF_UPROBE) + uprobe_notify_resume(regs); + + if (ti_work & _TIF_PATCH_PENDING) + klp_update_patch_state(current); + + if (ti_work & _TIF_SIGPENDING) + arch_do_signal(regs); + + if (ti_work & _TIF_NOTIFY_RESUME) { + clear_thread_flag(TIF_NOTIFY_RESUME); + tracehook_notify_resume(regs); + rseq_handle_notify_resume(NULL, regs); + } + + /* Architecture specific TIF work */ + arch_exit_to_usermode_work(regs, ti_work); + + /* + * Disable interrupts and reevaluate the work flags as they + * might have changed while interrupts and preemption were + * enabled above. + */ + local_irq_disable_exit_to_user(); + ti_work = READ_ONCE(current_thread_info()->flags); + } + return ti_work; +} + +static void do_exit_to_usermode(struct pt_regs *regs) +{ + unsigned long ti_work = READ_ONCE(current_thread_info()->flags); + + if (unlikely(ti_work & EXIT_TO_USERMODE_WORK)) + ti_work = core_exit_to_usermode_work(regs, ti_work); + + arch_exit_to_usermode(regs, ti_work); + + /* Ensure no locks are held and the address limit is intact */ + lockdep_sys_exit(); + addr_limit_user_check(); + + /* Return to userspace right after this which turns on interrupts */ + trace_hardirqs_on(); +} + +/** + * exit_to_usermode - Check and handle pending work which needs to be + * handled before returning to user mode + * @regs: Pointer to current's pt_regs + * + * Called and returns with interrupts disabled + */ +asmlinkage __visible void exit_to_usermode(struct pt_regs *regs) +{ + trace_hardirqs_off(); + lockdep_assert_irqs_disabled(); + do_exit_to_usermode(regs); +} + long core_syscall_enter_from_usermode(struct pt_regs *regs, long syscall) { unsigned long ti_work = READ_ONCE(current_thread_info()->flags); @@ -85,4 +161,10 @@ void syscall_exit_to_usermode(struct pt_ ti_work = READ_ONCE(current_thread_info()->flags); if (unlikely(ti_work & SYSCALL_EXIT_WORK)) syscall_exit_work(regs, retval, ti_work); + + /* + * Disable interrupts and handle the regular exit to user mode work + */ + local_irq_disable_exit_to_user(); + do_exit_to_usermode(regs); } From patchwork Wed Oct 23 12:27:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206577 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 49676913 for ; Wed, 23 Oct 2019 12:32:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 27E0A21A4A for ; Wed, 23 Oct 2019 12:32:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391474AbfJWMby (ORCPT ); Wed, 23 Oct 2019 08:31:54 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49150 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391434AbfJWMbq (ORCPT ); Wed, 23 Oct 2019 08:31:46 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFnh-00018v-LA; Wed, 23 Oct 2019 14:31:41 +0200 Message-Id: <20191023123119.083470878@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:20 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 15/17] x86/entry: Use generic exit to usermode References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Thomas Gleixner Replace the x86 specific exit to usermode code with the generic implementation. Signed-off-by: Thomas Gleixner --- arch/x86/entry/common.c | 110 ------------------------------------ arch/x86/entry/entry_32.S | 2 arch/x86/entry/entry_64.S | 2 arch/x86/include/asm/entry-common.h | 47 ++++++++++++++- arch/x86/include/asm/signal.h | 1 arch/x86/kernel/signal.c | 2 6 files changed, 51 insertions(+), 113 deletions(-) --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -15,15 +15,9 @@ #include #include #include -#include -#include #include #include -#include -#include #include -#include -#include #include #include @@ -47,101 +41,6 @@ static inline void enter_from_user_mode(void) {} #endif -#define EXIT_TO_USERMODE_LOOP_FLAGS \ - (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ - _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY | _TIF_PATCH_PENDING) - -static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags) -{ - /* - * In order to return to user mode, we need to have IRQs off with - * none of EXIT_TO_USERMODE_LOOP_FLAGS set. Several of these flags - * can be set at any time on preemptible kernels if we have IRQs on, - * so we need to loop. Disabling preemption wouldn't help: doing the - * work to clear some of the flags can sleep. - */ - while (true) { - /* We have work to do. */ - local_irq_enable(); - - if (cached_flags & _TIF_NEED_RESCHED) - schedule(); - - if (cached_flags & _TIF_UPROBE) - uprobe_notify_resume(regs); - - if (cached_flags & _TIF_PATCH_PENDING) - klp_update_patch_state(current); - - /* deal with pending signal delivery */ - if (cached_flags & _TIF_SIGPENDING) - do_signal(regs); - - if (cached_flags & _TIF_NOTIFY_RESUME) { - clear_thread_flag(TIF_NOTIFY_RESUME); - tracehook_notify_resume(regs); - rseq_handle_notify_resume(NULL, regs); - } - - if (cached_flags & _TIF_USER_RETURN_NOTIFY) - fire_user_return_notifiers(); - - /* Disable IRQs and retry */ - local_irq_disable(); - - cached_flags = READ_ONCE(current_thread_info()->flags); - - if (!(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS)) - break; - } -} - -/* Called with IRQs disabled. 
*/ -__visible inline void prepare_exit_to_usermode(struct pt_regs *regs) -{ - struct thread_info *ti = current_thread_info(); - u32 cached_flags; - - addr_limit_user_check(); - - lockdep_assert_irqs_disabled(); - lockdep_sys_exit(); - - cached_flags = READ_ONCE(ti->flags); - - if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS)) - exit_to_usermode_loop(regs, cached_flags); - - /* Reload ti->flags; we may have rescheduled above. */ - cached_flags = READ_ONCE(ti->flags); - - fpregs_assert_state_consistent(); - if (unlikely(cached_flags & _TIF_NEED_FPU_LOAD)) - switch_fpu_return(); - -#ifdef CONFIG_COMPAT - /* - * Compat syscalls set TS_COMPAT. Make sure we clear it before - * returning to user mode. We need to clear it *after* signal - * handling, because syscall restart has a fixup for compat - * syscalls. The fixup is exercised by the ptrace_syscall_32 - * selftest. - * - * We also need to clear TS_REGS_POKED_I386: the 32-bit tracer - * special case only applies after poking regs and before the - * very next return to user mode. - */ - ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED); -#endif - - user_enter_irqoff(); - - mds_user_clear_cpu_buffers(); - - /* The return to usermode reenables interrupts. Tell the tracer */ - trace_hardirqs_on(); -} - /* * Called with IRQs on and fully valid regs. Returns with IRQs off in a * state such that we can immediately switch to user mode. @@ -149,9 +48,6 @@ static void exit_to_usermode_loop(struct __visible inline void syscall_return_slowpath(struct pt_regs *regs) { syscall_exit_to_usermode(regs, regs->orig_ax, regs->ax); - - local_irq_disable(); - prepare_exit_to_usermode(regs); } #ifdef CONFIG_X86_64 @@ -179,7 +75,7 @@ static void exit_to_usermode_loop(struct #endif } - syscall_return_slowpath(regs); + syscall_exit_to_usermode(regs, regs->orig_ax, regs->ax); } #endif @@ -223,7 +119,7 @@ static __always_inline void do_syscall_3 #endif /* CONFIG_IA32_EMULATION */ } - syscall_return_slowpath(regs); + syscall_exit_to_usermode(regs, regs->orig_ax, regs->ax); } /* Handles int $0x80 */ @@ -278,7 +174,7 @@ static __always_inline void do_syscall_3 /* User code screwed up. */ local_irq_disable(); regs->ax = -EFAULT; - prepare_exit_to_usermode(regs); + exit_to_usermode(regs); return 0; /* Keep it simple: use IRET. 
*/ } --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -819,7 +819,7 @@ END(ret_from_fork) jb restore_all_kernel # not returning to v8086 or userspace movl %esp, %eax - call prepare_exit_to_usermode + call exit_to_usermode jmp restore_all END(ret_from_exception) --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -600,7 +600,7 @@ END(common_spurious) /* Interrupt came from user space */ GLOBAL(retint_user) mov %rsp,%rdi - call prepare_exit_to_usermode + call exit_to_usermode GLOBAL(swapgs_restore_regs_and_return_to_usermode) #ifdef CONFIG_DEBUG_ENTRY --- a/arch/x86/include/asm/entry-common.h +++ b/arch/x86/include/asm/entry-common.h @@ -2,11 +2,54 @@ #ifndef _ASM_X86_ENTRY_COMMON_H #define _ASM_X86_ENTRY_COMMON_H -#include -#include +#include +#include + +#include +#include #define ARCH_SYSCALL_EXIT_WORK (_TIF_SINGLESTEP) +#define ARCH_EXIT_TO_USERMODE_WORK (_TIF_USER_RETURN_NOTIFY) + +#define ARCH_EXIT_TO_USER_FROM_SYSCALL_EXIT + +static inline void arch_exit_to_usermode_work(struct pt_regs *regs, + unsigned long ti_work) +{ + if (ti_work & _TIF_USER_RETURN_NOTIFY) + fire_user_return_notifiers(); +} +#define arch_exit_to_usermode_work arch_exit_to_usermode_work + +static inline void arch_exit_to_usermode(struct pt_regs *regs, + unsigned long ti_work) +{ + fpregs_assert_state_consistent(); + if (unlikely(ti_work & _TIF_NEED_FPU_LOAD)) + switch_fpu_return(); + +#ifdef CONFIG_COMPAT + /* + * Compat syscalls set TS_COMPAT. Make sure we clear it before + * returning to user mode. We need to clear it *after* signal + * handling, because syscall restart has a fixup for compat + * syscalls. The fixup is exercised by the ptrace_syscall_32 + * selftest. + * + * We also need to clear TS_REGS_POKED_I386: the 32-bit tracer + * special case only applies after poking regs and before the + * very next return to user mode. + */ + current_thread_info()->status &= ~(TS_COMPAT | TS_I386_REGS_POKED); +#endif + + user_enter_irqoff(); + + mds_user_clear_cpu_buffers(); +} +#define arch_exit_to_usermode arch_exit_to_usermode + static inline long arch_syscall_enter_seccomp(struct pt_regs *regs) { #ifdef CONFIG_SECCOMP --- a/arch/x86/include/asm/signal.h +++ b/arch/x86/include/asm/signal.h @@ -35,7 +35,6 @@ typedef sigset_t compat_sigset_t; #endif /* __ASSEMBLY__ */ #include #ifndef __ASSEMBLY__ -extern void do_signal(struct pt_regs *regs); #define __ARCH_HAS_SA_RESTORER --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -808,7 +808,7 @@ static inline unsigned long get_nr_resta * want to handle. Thus you cannot kill init even with a SIGKILL even by * mistake. 
*/ -void do_signal(struct pt_regs *regs) +void arch_do_signal(struct pt_regs *regs) { struct ksignal ksig; From patchwork Wed Oct 23 12:27:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206575 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AAD5013BD for ; Wed, 23 Oct 2019 12:32:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 93A2321920 for ; Wed, 23 Oct 2019 12:32:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391490AbfJWMb4 (ORCPT ); Wed, 23 Oct 2019 08:31:56 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49153 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391437AbfJWMbq (ORCPT ); Wed, 23 Oct 2019 08:31:46 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFni-000193-7h; Wed, 23 Oct 2019 14:31:42 +0200 Message-Id: <20191023123119.173422855@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:21 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 16/17] kvm/workpending: Provide infrastructure for work before entering a guest References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Entering a guest is similar to exiting to user space. Pending work like handling signals, rescheduling, task work, etc. needs to be handled before that. Provide generic infrastructure to avoid duplication of the same handling code all over the place. The kvm_exit code is split up into a KVM specific part and a generic builtin core part to avoid multiple exports for the actual work functions. The exit to guest mode handling is slightly different from the exit to usermode handling, e.g. rseq does not need to be handled when entering a guest, so a separate function is used.
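As a rough sketch of the intended usage (condensed from the x86 conversion in the final patch of this series; locals, SRCU handling and the blocked-vcpu path are elided, so this is illustrative rather than a drop-in replacement), a hypervisor run loop invokes the helper once per iteration:

	for (;;) {
		/* ... enter guest mode or block the vcpu ... */

		kvm_check_async_pf_completion(vcpu);

		/* Handles signals, rescheduling and notify-resume work */
		r = exit_to_guestmode(kvm, vcpu);
		if (r)
			break;
	}

A pending signal makes exit_to_guestmode() set the exit reason to KVM_EXIT_INTR, bump the signal exit statistics and return -EINTR, which matches the open coded variant it replaces.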
Signed-off-by: Thomas Gleixner --- V2: Moved KVM specific functions to kvm (Paolo) Added lockdep assert (Andy) Dropped live patching from enter guest mode work (Miroslav) --- include/linux/entry-common.h | 12 ++++++++ include/linux/kvm_host.h | 64 +++++++++++++++++++++++++++++++++++++++++++ kernel/entry/common.c | 14 +++++++++ virt/kvm/Kconfig | 3 ++ 4 files changed, 93 insertions(+) --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -265,4 +265,16 @@ static inline void arch_syscall_exit_tra /* Common syscall exit function */ void syscall_exit_to_usermode(struct pt_regs *regs, long syscall, long retval); +/* KVM exit to guest mode */ + +void core_exit_to_guestmode_work(unsigned long ti_work); + +#ifndef ARCH_EXIT_TO_GUESTMODE_WORK +# define ARCH_EXIT_TO_GUESTMODE_WORK (0) +#endif + +#define EXIT_TO_GUESTMODE_WORK \ + (_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_RESUME | \ + ARCH_EXIT_TO_GUESTMODE_WORK) + #endif --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -1382,4 +1383,67 @@ static inline int kvm_arch_vcpu_run_pid_ } #endif /* CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE */ +/* Exit to guest mode work */ +#ifdef CONFIG_KVM_EXIT_TO_GUEST_WORK + +#ifndef arch_exit_to_guestmode_work +/** + * arch_exit_to_guestmode_work - Architecture specific exit to guest mode function + * @kvm: Pointer to the guest instance + * @vcpu: Pointer to current's VCPU data + * @ti_work: Cached TIF flags gathered in exit_to_guestmode() + * + * Invoked from exit_to_guestmode(). Can be replaced by + * architecture specific code. + */ +static inline int arch_exit_to_guestmode_work(struct kvm *kvm, + struct kvm_vcpu *vcpu, + unsigned long ti_work) +{ + return 0; +} +#endif + +/** + * exit_to_guestmode - Check and handle pending work which needs to be + * handled before returning to guest mode + * @kvm: Pointer to the guest instance + * @vcpu: Pointer to current's VCPU data + * + * Returns: 0 or an error code + */ +static inline int exit_to_guestmode(struct kvm *kvm, struct kvm_vcpu *vcpu) +{ + unsigned long ti_work = READ_ONCE(current_thread_info()->flags); + int r = 0; + + if (unlikely(ti_work & EXIT_TO_GUESTMODE_WORK)) { + if (ti_work & _TIF_SIGPENDING) { + vcpu->run->exit_reason = KVM_EXIT_INTR; + vcpu->stat.signal_exits++; + return -EINTR; + } + core_exit_to_guestmode_work(ti_work); + r = arch_exit_to_guestmode_work(kvm, vcpu, ti_work); + } + return r; } + +/** + * exit_to_guestmode_work_pending - Check if work is pending which needs to be + * handled before returning to guest mode + * + * Returns: True if work pending, False otherwise.
+ */ +static inline bool exit_to_guestmode_work_pending(void) +{ + unsigned long ti_work = READ_ONCE(current_thread_info()->flags); + + lockdep_assert_irqs_disabled(); + + return !!(ti_work & EXIT_TO_GUESTMODE_WORK); + +} +#endif /* CONFIG_KVM_EXIT_TO_GUEST_WORK */ + #endif --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -8,6 +8,20 @@ #define CREATE_TRACE_POINTS #include +#ifdef CONFIG_KVM_EXIT_TO_GUEST_WORK +void core_exit_to_guestmode_work(unsigned long ti_work) +{ + if (ti_work & _TIF_NEED_RESCHED) + schedule(); + + if (ti_work & _TIF_NOTIFY_RESUME) { + clear_thread_flag(TIF_NOTIFY_RESUME); + tracehook_notify_resume(NULL); + } +} +EXPORT_SYMBOL_GPL(core_exit_to_guestmode_work); +#endif + static unsigned long core_exit_to_usermode_work(struct pt_regs *regs, unsigned long ti_work) { --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -60,3 +60,6 @@ config HAVE_KVM_VCPU_RUN_PID_CHANGE config HAVE_KVM_NO_POLL bool + +config KVM_EXIT_TO_GUEST_WORK + bool From patchwork Wed Oct 23 12:27:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11206571 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 79B82913 for ; Wed, 23 Oct 2019 12:31:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5A4C721925 for ; Wed, 23 Oct 2019 12:31:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391481AbfJWMbz (ORCPT ); Wed, 23 Oct 2019 08:31:55 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:49155 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391444AbfJWMbq (ORCPT ); Wed, 23 Oct 2019 08:31:46 -0400 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iNFni-00019G-NO; Wed, 23 Oct 2019 14:31:42 +0200 Message-Id: <20191023123119.271229148@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 23 Oct 2019 14:27:22 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Peter Zijlstra , Andy Lutomirski , Will Deacon , Paolo Bonzini , kvm@vger.kernel.org, linux-arch@vger.kernel.org, Mike Rapoport , Josh Poimboeuf , Miroslav Benes Subject: [patch V2 17/17] x86/kvm: Use generic exit to guest work function References: <20191023122705.198339581@linutronix.de> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Use the generic infrastructure to check for and handle pending work before entering into guest mode. 
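The entry side counterpart runs with interrupts disabled immediately before the world switch. Roughly, condensed from the vcpu_enter_guest() hunk in the diff below (the surrounding request and APIC handling is trimmed):

	/* ... vcpu->mode set to IN_GUEST_MODE, interrupts disabled ... */

	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu) ||
	    exit_to_guestmode_work_pending()) {
		vcpu->mode = OUTSIDE_GUEST_MODE;
		smp_wmb();
		local_irq_enable();
		/* Bail out; the outer loop handles the work and retries */
	}

exit_to_guestmode_work_pending() replaces the open coded need_resched() || signal_pending() check and asserts via lockdep that interrupts are disabled.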
Signed-off-by: Thomas Gleixner --- arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/x86.c | 17 +++++------------ 2 files changed, 6 insertions(+), 12 deletions(-) --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -42,6 +42,7 @@ config KVM select HAVE_KVM_MSI select HAVE_KVM_CPU_RELAX_INTERCEPT select HAVE_KVM_NO_POLL + select KVM_EXIT_TO_GUEST_WORK select KVM_GENERIC_DIRTYLOG_READ_PROTECT select KVM_VFIO select SRCU --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -52,6 +52,7 @@ #include #include #include +#include #include #include @@ -8115,8 +8116,8 @@ static int vcpu_enter_guest(struct kvm_v if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active) kvm_x86_ops->sync_pir_to_irr(vcpu); - if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu) - || need_resched() || signal_pending(current)) { + if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu) || + exit_to_guestmode_work_pending()) { vcpu->mode = OUTSIDE_GUEST_MODE; smp_wmb(); local_irq_enable(); @@ -8309,17 +8310,9 @@ static int vcpu_run(struct kvm_vcpu *vcp kvm_check_async_pf_completion(vcpu); - if (signal_pending(current)) { - r = -EINTR; - vcpu->run->exit_reason = KVM_EXIT_INTR; - ++vcpu->stat.signal_exits; + r = exit_to_guestmode(kvm, vcpu); + if (r) break; - } - if (need_resched()) { - srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); - cond_resched(); - vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); - } } srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
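For orientation, the resulting call structure of the generic exit path at the end of the series, condensed from the kernel/entry/common.c hunks above into a comment (indentation denotes callees, no new functionality):

	/*
	 * syscall_exit_to_usermode()			IRQs on
	 *	audit_syscall_exit(), trace_sys_exit(),
	 *	arch_syscall_exit_tracehook()		one-time exit work
	 *	local_irq_disable_exit_to_user()
	 *	do_exit_to_usermode()
	 *		core_exit_to_usermode_work()	TIF work loop
	 *		arch_exit_to_usermode()
	 *		lockdep_sys_exit(), addr_limit_user_check()
	 *		trace_hardirqs_on()		returns with IRQs off
	 */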