From patchwork Fri Mar 11 20:47:19 2016
X-Patchwork-Submitter: David Matlack
X-Patchwork-Id: 8568991
From: David Matlack <dmatlack@google.com>
To: linux-kernel@vger.kernel.org, x86@kernel.org, kvm@vger.kernel.org
Cc: pbonzini@redhat.com, mingo@redhat.com, luto@kernel.org,
	hpa@zytor.com, digitaleric@google.com,
	David Matlack <dmatlack@google.com>
Subject: [PATCH 0/1] KVM: x86: using the fpu in interrupt context with a guest's xcr0
Date: Fri, 11 Mar 2016 12:47:19 -0800
Message-Id: <1457729240-3846-1-git-send-email-dmatlack@google.com>

We've found that an interrupt handler that uses the fpu can kill a KVM
VM, if it runs under the following conditions:

- the guest's xcr0 register is loaded on the cpu
- the guest's fpu context is not loaded
- the host is using eagerfpu

Note that the guest's xcr0 register and fpu context are not loaded as
part of the atomic world switch into "guest mode".
They are loaded by KVM while the cpu is still in "host mode".

Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
interrupt handler will look something like this:

    if (irq_fpu_usable()) {
        kernel_fpu_begin();

        [... code that uses the fpu ...]

        kernel_fpu_end();
    }

As long as the guest's fpu is not loaded and the host is using eagerfpu,
irq_fpu_usable() returns true (interrupted_kernel_fpu_idle() returns
true). The interrupt handler then proceeds to use the fpu with the
guest's xcr0 live.

kernel_fpu_begin() saves the current fpu context. If this uses
XSAVE[OPT], it may leave the xsave area in an undesirable state.
According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
xcr0[i] == 0 following an XSAVE.

kernel_fpu_end() restores the fpu context. Now if any bit i in
XSTATE_BV is 1 while xcr0[i] is 0, XRSTOR generates a #GP fault. (This
#GP gets trapped and turned into a SIGSEGV, which kills the VM.)

In guests that have access to the same CPU features as the host, this
bug is more likely to reproduce during VM boot, while the guest's xcr0
is 1 (only the x87 bit set). Once the guest's xcr0 is indistinguishable
from the host's, there is no issue.

I have not been able to trigger this bug on Linux 4.3, and suspect that
is due to this commit from Linux 4.2:

    653f52c ("kvm,x86: load guest FPU context more eagerly")

With this commit, as long as the host is using eagerfpu, the guest's
fpu is always loaded just before the guest's xcr0 (vcpu->fpu_active is
always 1 in the following snippet):

    6569         if (vcpu->fpu_active)
    6570                 kvm_load_guest_fpu(vcpu);
    6571         kvm_load_guest_xcr0(vcpu);

When the guest's fpu is loaded, irq_fpu_usable() returns false.

We've included our workaround for this bug, which applies to Linux
3.11. It does not apply cleanly to HEAD since the fpu subsystem was
refactored in Linux 4.2. While the latest kernel does not look
vulnerable, we may want to apply a fix to the vulnerable stable
kernels.
An equally effective solution may be to just backport 653f52c to
stable.

Attached here is a stress module we used to reproduce the bug. It
fires IPIs at all online CPUs and uses the fpu in the IPI handler. We
found that running this module while booting a VM was an extremely
effective way to kill said VM :). For the kernel developers who are
working to make eagerfpu the global default, this module might be a
useful stress test, especially when run in the background during other
tests.

--- 8< ---

 irq_fpu_stress.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)
 create mode 100644 irq_fpu_stress.c

Eric Northup (1):
  KVM: don't allow irq_fpu_usable when the VCPU's XCR0 is loaded

 arch/x86/include/asm/i387.h |  3 +++
 arch/x86/kernel/i387.c      | 10 ++++++++--
 arch/x86/kvm/x86.c          |  4 ++++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/irq_fpu_stress.c b/irq_fpu_stress.c
new file mode 100644
index 0000000..faa6ba3
--- /dev/null
+++ b/irq_fpu_stress.c
@@ -0,0 +1,95 @@
+/*
+ * For the duration of time this module is loaded, this module fires
+ * IPIs at all CPUs and tries to use the FPU on that CPU in irq
+ * context.
+ */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+#include <linux/workqueue.h>
+#include <linux/interrupt.h>
+#include <linux/atomic.h>
+#include <linux/compiler.h>
+
+#include <asm/i387.h>
+#include <asm/processor.h>
+#include <asm/fpu-internal.h>
+
+MODULE_LICENSE("GPL");
+
+#define MODNAME "irq_fpu_stress"
+#undef pr_fmt
+#define pr_fmt(fmt) MODNAME": "fmt
+
+struct workqueue_struct *work_queue;
+struct work_struct work;
+
+struct {
+	atomic_t irq_fpu_usable;
+	atomic_t irq_fpu_unusable;
+	unsigned long num_tests;
+} stats;
+
+bool done;
+
+static void test_irq_fpu(void *info)
+{
+	BUG_ON(!in_interrupt());
+
+	if (irq_fpu_usable()) {
+		atomic_inc(&stats.irq_fpu_usable);
+
+		kernel_fpu_begin();
+		kernel_fpu_end();
+	} else {
+		atomic_inc(&stats.irq_fpu_unusable);
+	}
+}
+
+static void do_work(struct work_struct *w)
+{
+	pr_info("starting test\n");
+
+	stats.num_tests = 0;
+	atomic_set(&stats.irq_fpu_usable, 0);
+	atomic_set(&stats.irq_fpu_unusable, 0);
+
+	while (!ACCESS_ONCE(done)) {
+		preempt_disable();
+		smp_call_function_many(
+			cpu_online_mask, test_irq_fpu, NULL, 1 /* wait */);
+		preempt_enable();
+
+		stats.num_tests++;
+
+		if (need_resched())
+			schedule();
+	}
+
+	pr_info("finished test\n");
+}
+
+int init_module(void)
+{
+	work_queue = create_singlethread_workqueue(MODNAME);
+
+	INIT_WORK(&work, do_work);
+	queue_work(work_queue, &work);
+
+	return 0;
+}
+
+void cleanup_module(void)
+{
+	ACCESS_ONCE(done) = true;
+
+	flush_workqueue(work_queue);
+	destroy_workqueue(work_queue);
+
+	pr_info("num_tests %lu, irq_fpu_usable %d, irq_fpu_unusable %d\n",
+		stats.num_tests,
+		atomic_read(&stats.irq_fpu_usable),
+		atomic_read(&stats.irq_fpu_unusable));
+}
--- 8< ---

(Note: the #include arguments were stripped by the list archive's HTML
rendering; the headers above are a plausible reconstruction for a Linux
3.11-era module, not the exact originals.)