From patchwork Sun May 7 12:38:39 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andy Lutomirski X-Patchwork-Id: 9715557 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 995DA60365 for ; Sun, 7 May 2017 22:09:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8ADAF26419 for ; Sun, 7 May 2017 22:09:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7F84B26C9B; Sun, 7 May 2017 22:09:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 01FDA26419 for ; Sun, 7 May 2017 22:09:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757046AbdEGWJh (ORCPT ); Sun, 7 May 2017 18:09:37 -0400 Received: from mail.kernel.org ([198.145.29.136]:51076 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757023AbdEGWGT (ORCPT ); Sun, 7 May 2017 18:06:19 -0400 Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 6D75B203B5; Sun, 7 May 2017 12:39:33 +0000 (UTC) Received: from localhost (118-171-175-29.dynamic-ip.hinet.net [118.171.175.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 9C080203F3; Sun, 7 May 2017 12:39:31 +0000 (UTC) From: Andy Lutomirski To: X86 ML Cc: "linux-kernel@vger.kernel.org" , Borislav Petkov , Linus Torvalds , Andrew Morton , Mel Gorman , "linux-mm@kvack.org" , Andy Lutomirski , Paolo Bonzini , =?UTF-8?q?Radim=20Kr=C4=8Dm=C3=A1=C5=99?= , kvm@vger.kernel.org, Rik van Riel , Dave Hansen , Nadav Amit , Michal Hocko , Arjan van de Ven Subject: [RFC 10/10] x86, kvm: Teach KVM's VMX code that CR3 isn't a constant Date: Sun, 7 May 2017 05:38:39 -0700 Message-Id: <59ea83a0a78025439e3d15e09b693846fa1f4770.1494160201.git.luto@kernel.org> X-Mailer: git-send-email 2.9.3 In-Reply-To: References: In-Reply-To: References: MIME-Version: 1.0 X-Virus-Scanned: ClamAV using ClamSMTP Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When PCID is enabled, CR3's PCID bits can change during context switches, so KVM won't be able to treat CR3 as a per-mm constant any more. I structured this like the existing CR4 handling. Under ordinary circumstances (PCID disabled or if the current PCID and the value that's already in the VMCS match), then we won't do an extra VMCS write, and we'll never do an extra direct CR3 read. The overhead should be minimal. I disallowed using the new helper in non-atomic context because PCID support will cause CR3 to stop being constant in non-atomic process context. (Frankly, it also scares me a bit that KVM ever treated CR3 as constant, but it looks like it was okay before.) Cc: Paolo Bonzini Cc: Radim Krčmář Cc: kvm@vger.kernel.org Cc: Rik van Riel Cc: Dave Hansen Cc: Nadav Amit Cc: Michal Hocko Cc: Andrew Morton Cc: Arjan van de Ven Signed-off-by: Andy Lutomirski --- arch/x86/include/asm/mmu_context.h | 19 +++++++++++++++++++ arch/x86/kvm/vmx.c | 21 ++++++++++++++++++--- 2 files changed, 37 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 187c39470a0b..f20d7ea47095 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -266,4 +266,23 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, return __pkru_allows_pkey(vma_pkey(vma), write); } + +/* + * This can be used from process context to figure out what the value of + * CR3 is without needing to do a (slow) read_cr3(). + * + * It's intended to be used for code like KVM that sneakily changes CR3 + * and needs to restore it. It needs to be used very carefully. + */ +static inline unsigned long __get_current_cr3_fast(void) +{ + unsigned long cr3 = __pa(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd); + + /* For now, be very restrictive about when this can be called. */ + VM_WARN_ON(in_nmi() || !in_atomic()); + + VM_BUG_ON(cr3 != read_cr3()); + return cr3; +} + #endif /* _ASM_X86_MMU_CONTEXT_H */ diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 535cc065b844..2771235072aa 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -48,6 +48,7 @@ #include #include #include +#include #include "trace.h" #include "pmu.h" @@ -596,6 +597,7 @@ struct vcpu_vmx { int gs_ldt_reload_needed; int fs_reload_needed; u64 msr_host_bndcfgs; + unsigned long vmcs_host_cr3; /* May not match real cr3 */ unsigned long vmcs_host_cr4; /* May not match real cr4 */ } host_state; struct { @@ -5017,12 +5019,19 @@ static void vmx_set_constant_host_state(struct vcpu_vmx *vmx) u32 low32, high32; unsigned long tmpl; struct desc_ptr dt; - unsigned long cr0, cr4; + unsigned long cr0, cr3, cr4; cr0 = read_cr0(); WARN_ON(cr0 & X86_CR0_TS); vmcs_writel(HOST_CR0, cr0); /* 22.2.3 */ - vmcs_writel(HOST_CR3, read_cr3()); /* 22.2.3 FIXME: shadow tables */ + + /* + * Save the most likely value for this task's CR3 in the VMCS. + * We can't use __get_current_cr3_fast() because we're not atomic. + */ + cr3 = read_cr3(); + vmcs_writel(HOST_CR3, cr3); /* 22.2.3 FIXME: shadow tables */ + vmx->host_state.vmcs_host_cr3 = cr3; /* Save the most likely value for this task's CR4 in the VMCS. */ cr4 = cr4_read_shadow(); @@ -8927,7 +8936,7 @@ static void vmx_arm_hv_timer(struct kvm_vcpu *vcpu) static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); - unsigned long debugctlmsr, cr4; + unsigned long debugctlmsr, cr3, cr4; /* Record the guest's net vcpu time for enforced NMI injections. */ if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked)) @@ -8953,6 +8962,12 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty)) vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]); + cr3 = __get_current_cr3_fast(); + if (unlikely(cr3 != vmx->host_state.vmcs_host_cr3)) { + vmcs_writel(HOST_CR3, cr3); + vmx->host_state.vmcs_host_cr3 = cr3; + } + cr4 = cr4_read_shadow(); if (unlikely(cr4 != vmx->host_state.vmcs_host_cr4)) { vmcs_writel(HOST_CR4, cr4);