From patchwork Sat Jun 29 05:16:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andy Lutomirski X-Patchwork-Id: 11023721 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 844F3138D for ; Sat, 29 Jun 2019 05:22:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7CCA62873C for ; Sat, 29 Jun 2019 05:22:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 702DC2886B; Sat, 29 Jun 2019 05:22:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CA6C528857 for ; Sat, 29 Jun 2019 05:22:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726716AbfF2FQX (ORCPT ); Sat, 29 Jun 2019 01:16:23 -0400 Received: from mail.kernel.org ([198.145.29.99]:52744 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726156AbfF2FQX (ORCPT ); Sat, 29 Jun 2019 01:16:23 -0400 Received: from mail-wr1-f49.google.com (mail-wr1-f49.google.com [209.85.221.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 8AE09214DA for ; Sat, 29 Jun 2019 05:16:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1561785381; bh=z3FyJn0RPdWPH7muJBCpqnvRwUH0LI5rqWKjeEDFOLg=; h=From:Date:Subject:To:From; b=dPBKv29OzftKoFFvlhAcyEJd99QIMLqS9R4vWd7Yoei5aL4fgOtOMRi6KGoA4iKRt P3z58q9Da0LD+Iz1pK/ifw9si9x/uwRIUs1i2j3g+pKUJaf8xpTDciVa9FCe9BMJC0 kJLggiVUStOsOrFzB1o+yjvl1aavxabu3BWL7wOg= Received: by mail-wr1-f49.google.com with SMTP id p13so8217341wru.10 for ; Fri, 28 Jun 2019 22:16:21 -0700 (PDT) X-Gm-Message-State: APjAAAVi81Dsyb6rOEqDHRsalxHoXUh34o2EUzV7XRJAVjdRfxQt2pu0 b1YOa5j2AWm7pA/WYdO1HbeXwExLCQtgMJ9pxSJa+g== X-Google-Smtp-Source: APXvYqwZprAuzRi9IWoNs7Nypmt5csVlha3XHPO27twGejUVL6amCFqCQjLM5JdTFvJLtqdqXM3k1oXD3vQTc6CAa50= X-Received: by 2002:adf:dd0f:: with SMTP id a15mr8134667wrm.265.1561785380133; Fri, 28 Jun 2019 22:16:20 -0700 (PDT) MIME-Version: 1.0 From: Andy Lutomirski Date: Fri, 28 Jun 2019 22:16:09 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: KVM's SYSCALL emulation for GenuineIntel is buggy To: kvm list , Paolo Bonzini , Radim Krcmar , Thomas Gleixner , Vegard Nossum Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP If I do SYSCALL with EFLAGS.TF set from compat mode on Intel hardware with -cpu host and no other funny business, the guest kernel seems to get #DB with the stored IP pointing at the SYSCALL instruction. This is wrong -- SYSCALL is #UD, which is a *fault*, so there shouldn't be a single-step trap. Unless I'm missing something in the code, emulate_ud() is mishandled in general -- it seems to make cause inject_emulated_exception() to return false here: if (ctxt->have_exception) { r = EMULATE_DONE; if (inject_emulated_exception(vcpu)) return r; and then we land here: if (r == EMULATE_DONE && ctxt->tf) kvm_vcpu_do_singlestep(vcpu, &r); if TF was set, which is wrong. You can test this by applying the attached patch, building x86 selftests, and running syscall_arg_fault_32 in a VM. It hangs. It should complete successfully, and it does on bare metal. commit fae8e860584b5a8c2253b522cb478e92b8b0c281 Author: Andy Lutomirski Date: Fri Jun 28 19:54:34 2019 -0700 selftests/x86: Test SYSCALL and SYSENTER manually with TF set Make sure that we exercise both variants of the nasty TF-in-compat-syscall regardless of what vendor's CPU is running the tests. Also change the intentional signal after SYSCALL to use ud2, which is a lot more comprehensible. This crashes the kernel due to an FSGSBASE bug right now. Reported-by: Vegard Nossum Cc: "Bae, Chang Seok" Signed-off-by: Andy Lutomirski diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index 186520198de7..fa07d526fe39 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -12,8 +12,9 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie) TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \ check_initial_reg_state sigreturn iopl mpx-mini-test ioperm \ - protection_keys test_vdso test_vsyscall mov_ss_trap -TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \ + protection_keys test_vdso test_vsyscall mov_ss_trap \ + syscall_arg_fault +TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip diff --git a/tools/testing/selftests/x86/syscall_arg_fault.c b/tools/testing/selftests/x86/syscall_arg_fault.c index 4e25d38c8bbd..939de3c94976 100644 --- a/tools/testing/selftests/x86/syscall_arg_fault.c +++ b/tools/testing/selftests/x86/syscall_arg_fault.c @@ -15,9 +15,30 @@ #include #include +#ifdef __x86_64__ +# define WIDTH "q" +#else +# define WIDTH "l" +#endif + /* Our sigaltstack scratch space. */ static unsigned char altstack_data[SIGSTKSZ]; +static unsigned long get_eflags(void) +{ + unsigned long eflags; + asm volatile ("pushf" WIDTH "\n\tpop" WIDTH " %0" : "=rm" (eflags)); + return eflags; +} + +static void set_eflags(unsigned long eflags) +{ + asm volatile ("push" WIDTH " %0\n\tpopf" WIDTH + : : "rm" (eflags) : "flags"); +} + +#define X86_EFLAGS_TF (1UL << 8) + static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *), int flags) { @@ -35,13 +56,22 @@ static sigjmp_buf jmpbuf; static volatile sig_atomic_t n_errs; +#ifdef __x86_64__ +#define REG_AX REG_RAX +#define REG_IP REG_RIP +#else +#define REG_AX REG_EAX +#define REG_IP REG_EIP +#endif + static void sigsegv_or_sigbus(int sig, siginfo_t *info, void *ctx_void) { ucontext_t *ctx = (ucontext_t*)ctx_void; + long ax = (long)ctx->uc_mcontext.gregs[REG_AX]; - if (ctx->uc_mcontext.gregs[REG_EAX] != -EFAULT) { - printf("[FAIL]\tAX had the wrong value: 0x%x\n", - ctx->uc_mcontext.gregs[REG_EAX]); + if (ax != -EFAULT && ax != -ENOSYS) { + printf("[FAIL]\tAX had the wrong value: 0x%lx\n", + (unsigned long)ax); n_errs++; } else { printf("[OK]\tSeems okay\n"); @@ -50,9 +80,21 @@ static void sigsegv_or_sigbus(int sig, siginfo_t *info, void *ctx_void) siglongjmp(jmpbuf, 1); } +static void sigtrap(int sig, siginfo_t *info, void *ctx_void) +{ +} + static void sigill(int sig, siginfo_t *info, void *ctx_void) { - printf("[SKIP]\tIllegal instruction\n"); + ucontext_t *ctx = (ucontext_t*)ctx_void; + unsigned short *ip = (unsigned short *)ctx->uc_mcontext.gregs[REG_IP]; + + if (*ip == 0x0b0f) { + /* one of the ud2 instructions faulted */ + printf("[OK]\tSYSCALL returned normally\n"); + } else { + printf("[SKIP]\tIllegal instruction\n"); + } siglongjmp(jmpbuf, 1); } @@ -120,9 +162,46 @@ int main() "movl $-1, %%ebp\n\t" "movl $-1, %%esp\n\t" "syscall\n\t" - "pushl $0" /* make sure we segfault cleanly */ + "ud2" /* make sure we recover cleanly */ + : : : "memory", "flags"); + } + + printf("[RUN]\tSYSENTER with TF and invalid state\n"); + sethandler(SIGTRAP, sigtrap, SA_ONSTACK); + + if (sigsetjmp(jmpbuf, 1) == 0) { + set_eflags(get_eflags() | X86_EFLAGS_TF); + asm volatile ( + "movl $-1, %%eax\n\t" + "movl $-1, %%ebx\n\t" + "movl $-1, %%ecx\n\t" + "movl $-1, %%edx\n\t" + "movl $-1, %%esi\n\t" + "movl $-1, %%edi\n\t" + "movl $-1, %%ebp\n\t" + "movl $-1, %%esp\n\t" + "sysenter" + : : : "memory", "flags"); + } + set_eflags(get_eflags() & ~X86_EFLAGS_TF); + + printf("[RUN]\tSYSCALL with TF and invalid state\n"); + if (sigsetjmp(jmpbuf, 1) == 0) { + set_eflags(get_eflags() | X86_EFLAGS_TF); + asm volatile ( + "movl $-1, %%eax\n\t" + "movl $-1, %%ebx\n\t" + "movl $-1, %%ecx\n\t" + "movl $-1, %%edx\n\t" + "movl $-1, %%esi\n\t" + "movl $-1, %%edi\n\t" + "movl $-1, %%ebp\n\t" + "movl $-1, %%esp\n\t" + "syscall\n\t" + "ud2" /* make sure we recover cleanly */ : : : "memory", "flags"); } + set_eflags(get_eflags() & ~X86_EFLAGS_TF); return 0; }