From patchwork Tue May 9 13:50:06 2023
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13235833
From: Yan Zhao
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, Yan Zhao
Subject: [PATCH v2 1/6] KVM: x86/mmu: add a new mmu zap helper to indicate memtype changes
Date: Tue, 9 May 2023 21:50:06 +0800
Message-Id: <20230509135006.1604-1-yan.y.zhao@intel.com>
In-Reply-To: <20230509134825.1523-1-yan.y.zhao@intel.com>
References: <20230509134825.1523-1-yan.y.zhao@intel.com>
X-Mailing-List: kvm@vger.kernel.org

Add a helper to indicate that a kvm_zap_gfn_range() request is made in
order to update memory type. The zap can then be avoided when:
1. TDP is not enabled.
2. EPT is not enabled.

This is because only the memory type of EPT leaf entries is subject to
change when noncoherent DMA, guest CR0.CD, or guest MTRR settings change.
Signed-off-by: Yan Zhao
---
 arch/x86/kvm/mmu.h     |  1 +
 arch/x86/kvm/mmu/mmu.c | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..a04577afbc71 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -236,6 +236,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 }

 void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+void kvm_zap_gfn_for_memtype(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);

 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8961f45e3b1..2706754794d1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6272,6 +6272,22 @@ static bool kvm_rmap_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_e
 	return flush;
 }

+/*
+ * Invalidate (zap) TDP SPTEs that cover GFNs from gfn_start and up to gfn_end
+ * (not including it) because the memory type is being updated.
+ */
+void kvm_zap_gfn_for_memtype(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
+{
+	/*
+	 * Currently only the memory type of EPT leaf entries is affected by
+	 * guest CR0.CD and guest MTRR, so skip the invalidation (zap) in
+	 * other cases.
+	 */
+	if (!shadow_memtype_mask)
+		return;
+
+	kvm_zap_gfn_range(kvm, gfn_start, gfn_end);
+}
+
 /*
  * Invalidate (zap) SPTEs that cover GFNs from gfn_start and up to gfn_end
  * (not including it)

From patchwork Tue May 9 13:51:10 2023
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13235834
From: Yan Zhao
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, Yan Zhao
Subject: [PATCH v2 2/6] KVM: x86/mmu: only zap EPT when guest CR0_CD changes
Date: Tue, 9 May 2023 21:51:10 +0800
Message-Id: <20230509135110.1664-1-yan.y.zhao@intel.com>
In-Reply-To: <20230509134825.1523-1-yan.y.zhao@intel.com>
References: <20230509134825.1523-1-yan.y.zhao@intel.com>
X-Mailing-List: kvm@vger.kernel.org

Call the new helper kvm_zap_gfn_for_memtype() to skip the mmu zap when EPT
is not enabled.

The guest CR0.CD value affects the memory type of EPT leaf entries when
noncoherent DMA is present, but the mmu zap is unnecessary if EPT is not
enabled.
Suggested-by: Chao Gao
Signed-off-by: Yan Zhao
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e7f78fe79b32..ed1e3939bd05 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -942,7 +942,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
 	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
 	    kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
 	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
-		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
+		kvm_zap_gfn_for_memtype(vcpu->kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
 }
 EXPORT_SYMBOL_GPL(kvm_post_set_cr0);

From patchwork Tue May 9 13:51:43 2023
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13235835
From: Yan Zhao
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, Yan Zhao
Subject: [PATCH v2 3/6] KVM: x86/mmu: only zap EPT when guest MTRR changes
Date: Tue, 9 May 2023 21:51:43 +0800
Message-Id: <20230509135143.1721-1-yan.y.zhao@intel.com>
In-Reply-To: <20230509134825.1523-1-yan.y.zhao@intel.com>
References: <20230509134825.1523-1-yan.y.zhao@intel.com>
X-Mailing-List: kvm@vger.kernel.org

Call the new helper kvm_zap_gfn_for_memtype() to skip the mmu zap when EPT
is not enabled.

When guest MTRR settings change and TDP entries must be zapped to remove
stale mappings, only do so when EPT is enabled, because only the memory
type of EPT leaf entries is affected by guest MTRR when noncoherent DMA is
present.
Signed-off-by: Yan Zhao
---
 arch/x86/kvm/mtrr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 9fac1ec03463..62ebb9978156 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -330,7 +330,7 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
 		var_mtrr_range(&mtrr_state->var_ranges[index], &start, &end);
 	}

-	kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));
+	kvm_zap_gfn_for_memtype(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));
 }

 static bool var_mtrr_range_is_valid(struct kvm_mtrr_range *range)

From patchwork Tue May 9 13:52:26 2023
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13235841
From: Yan Zhao
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, Yan Zhao
Subject: [PATCH v2 4/6] KVM: x86/mmu: Zap all EPT leaf entries according to noncoherent DMA count
Date: Tue, 9 May 2023 21:52:26 +0800
Message-Id: <20230509135226.1780-1-yan.y.zhao@intel.com>
In-Reply-To: <20230509134825.1523-1-yan.y.zhao@intel.com>
References: <20230509134825.1523-1-yan.y.zhao@intel.com>
X-Mailing-List: kvm@vger.kernel.org

Zap all EPT leaf entries when the noncoherent DMA count goes from 0 to 1,
or from 1 to 0.

When there is no noncoherent DMA device, the EPT memory type is
((MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT).

When there are noncoherent DMA devices, the EPT memory type needs to honor
guest CR0.CD and MTRR settings. So, when the noncoherent DMA count changes
between 0 and 1, EPT leaf entries need to be zapped to clear the stale
memory type.

This issue may be hidden when the device is statically assigned, because
VFIO adds/removes the MMIO regions of the noncoherent DMA devices several
times during guest boot, and the current KVM MMU calls
kvm_mmu_zap_all_fast() on memslot removal.

But if the device is hot-plugged, or if the guest has mmio_always_on for
the device, the MMIO regions may be added only once, and then there is no
path that zaps the EPT entries to clear the stale memory type.

Therefore, zap all EPT leaf entries when the present/non-present state of
noncoherent DMA devices changes, to ensure stale entries are cleaned away.

Signed-off-by: Yan Zhao
---
 arch/x86/kvm/x86.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ed1e3939bd05..48b683a305b3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13145,13 +13145,15 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);

 void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
 {
-	atomic_inc(&kvm->arch.noncoherent_dma_count);
+	if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1)
+		kvm_zap_gfn_for_memtype(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
 }
 EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);

 void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
 {
-	atomic_dec(&kvm->arch.noncoherent_dma_count);
+	if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count))
+		kvm_zap_gfn_for_memtype(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
 }
 EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);

From patchwork Tue May 9 13:53:00 2023
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13235836
From: Yan Zhao
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, Yan Zhao
Subject: [PATCH v2 5/6] KVM: x86: Keep a per-VM MTRR state
Date: Tue, 9 May 2023 21:53:00 +0800
Message-Id: <20230509135300.1855-1-yan.y.zhao@intel.com>
In-Reply-To: <20230509134825.1523-1-yan.y.zhao@intel.com>
References: <20230509134825.1523-1-yan.y.zhao@intel.com>
X-Mailing-List: kvm@vger.kernel.org

Keep a per-VM MTRR state and point it to the MTRR state of vCPU 0.

This is a preparation patch for KVM to reference a per-VM guest MTRR to
decide the memory type of EPT leaf entries when noncoherent DMA is
present.

Though each vCPU has its own MTRR state, the MTRR states should be
consistent across the VM, as required by Intel's SDM: "In a multiprocessor
system using a processor in the P6 family or a more recent family, each
processor MUST use the identical MTRR memory map so that software will
have a consistent view of memory."

Therefore, when the memory type of an EPT leaf entry needs to honor guest
MTRR, the per-VM version of the guest MTRR can be referenced. Each vCPU
still has its own MTRR state field so that guest rdmsr() returns the right
value when MTRR updates lag between vCPUs.

Signed-off-by: Yan Zhao
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/mtrr.c             | 22 ++++++++++++++++++++++
 arch/x86/kvm/x86.c              |  2 ++
 arch/x86/kvm/x86.h              |  2 ++
 4 files changed, 29 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2865c3cb3501..a2b6b1e1548f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1444,6 +1444,9 @@ struct kvm_arch {
 	 */
#define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
 	struct kvm_mmu_memory_cache split_desc_cache;
+
+	struct kvm_mtrr *mtrr_state;
+	bool has_mtrr;
 };

 struct kvm_vm_stat {

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 62ebb9978156..1ae80c756797 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -438,6 +438,28 @@ void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu)
 	INIT_LIST_HEAD(&vcpu->arch.mtrr_state.head);
 }

+void kvm_mtrr_init(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+
+	if (vcpu->vcpu_id)
+		return;
+
+	rcu_assign_pointer(kvm->arch.mtrr_state, &vcpu->arch.mtrr_state);
+	kvm->arch.has_mtrr = guest_cpuid_has(vcpu, X86_FEATURE_MTRR);
+}
+
+void kvm_mtrr_destroy(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+
+	if (vcpu->vcpu_id)
+		return;
+
+	rcu_assign_pointer(kvm->arch.mtrr_state, NULL);
+	synchronize_srcu_expedited(&kvm->srcu);
+}
+
 struct mtrr_iter {
 	/* input fields. */
 	struct kvm_mtrr *mtrr_state;

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 48b683a305b3..b8aa18031877 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11879,6 +11879,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
 	kvm_xen_init_vcpu(vcpu);
 	kvm_vcpu_mtrr_init(vcpu);
+	kvm_mtrr_init(vcpu);
 	vcpu_load(vcpu);
 	kvm_set_tsc_khz(vcpu, vcpu->kvm->arch.default_tsc_khz);
 	kvm_vcpu_reset(vcpu, false);
@@ -11948,6 +11949,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	kvfree(vcpu->arch.cpuid_entries);
 	if (!lapic_in_kernel(vcpu))
 		static_branch_dec(&kvm_has_noapic_vcpu);
+	kvm_mtrr_destroy(vcpu);
 }

 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c544602d07a3..d0a7e50de739 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -308,6 +308,8 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
 				   struct kvm_queued_exception *ex);

 void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu);
+void kvm_mtrr_init(struct kvm_vcpu *vcpu);
+void kvm_mtrr_destroy(struct kvm_vcpu *vcpu);
 u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
 bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);

From patchwork Tue May 9 13:53:43 2023
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13235837
From: Yan Zhao
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, Yan Zhao
Subject: [PATCH v2 6/6] KVM: x86/mmu: use per-VM based MTRR for EPT
Date: Tue, 9 May 2023 21:53:43 +0800
Message-Id: <20230509135343.1925-1-yan.y.zhao@intel.com>
In-Reply-To: <20230509134825.1523-1-yan.y.zhao@intel.com>
References: <20230509134825.1523-1-yan.y.zhao@intel.com>
X-Mailing-List: kvm@vger.kernel.org

When the KVM MMU checks guest MTRR, check the per-VM copy, and only zap
EPT when the per-VM MTRR (i.e. the MTRR of vCPU 0) changes.

Before this patch, when noncoherent DMA is present, the EPT violation
handler references the guest MTRR state of the vCPU causing the violation,
and EPT leaf entries are zapped whenever the MTRR settings of any vCPU
change. But since one EPT leaf entry can only have one memory type, this
may still cause problems if vCPUs have different MTRR states; a guest
without consistent MTRR state across its vCPUs only causes problems for
itself.

Therefore, this patch switches to the per-VM MTRR and only zaps EPT when
this per-VM MTRR changes, which avoids several EPT zaps during guest boot.
Reference data (averaged over 10 guest boots) is below:

Physical CPU frequency: 3100 MHz

        | vCPU cnt | memory | EPT zap cnt | EPT zap cycles | bootup time
 before |        8 |     2G |          84 |       4164.57M |      19.38s
 after  |        8 |     2G |          14 |         16.07M |      18.83s
 before |        8 |    16G |          84 |       4163.38M |      24.51s
 after  |        8 |    16G |          14 |         16.68M |      23.94s

Legend:
 before:         before this patch
 after:          after this patch
 vCPU cnt:       guest vCPU count of the VM
 memory:         guest memory size
 EPT zap cnt:    count of EPT zaps caused by update_mtrr() during guest boot
 EPT zap cycles: CPU cycles of EPT zaps caused by update_mtrr() during guest boot
 bootup time:    guest bootup time, measured from starting QEMU to guest rc.local

Signed-off-by: Yan Zhao
---
 arch/x86/kvm/mmu/mmu.c |  2 +-
 arch/x86/kvm/mtrr.c    | 88 +++++++++++++++++++++++++++++++-----------
 arch/x86/kvm/vmx/vmx.c |  2 +-
 arch/x86/kvm/x86.h     |  4 +-
 4 files changed, 70 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2706754794d1..4b05ce1f0241 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4532,7 +4532,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
 		gfn_t base = gfn_round_for_level(fault->gfn, fault->max_level);

-		if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num))
+		if (kvm_mtrr_check_gfn_range_consistency(vcpu->kvm, base, page_num))
 			break;
 	}
 }

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 1ae80c756797..9be8ed40e226 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -105,7 +105,7 @@ static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
 	return mtrr_state->deftype & IA32_MTRR_DEF_TYPE_TYPE_MASK;
 }

-static u8 mtrr_disabled_type(struct kvm_vcpu *vcpu)
+static u8 mtrr_disabled_type(struct kvm *kvm)
 {
 	/*
 	 * Intel SDM 11.11.2.2: all MTRRs are disabled when
@@ -117,10 +117,7 @@ static u8 mtrr_disabled_type(struct kvm_vcpu *vcpu)
 	 * enable MTRRs and it is obviously undesirable to run the
 	 * guest entirely with UC memory and we use WB.
 	 */
-	if (guest_cpuid_has(vcpu, X86_FEATURE_MTRR))
-		return MTRR_TYPE_UNCACHABLE;
-	else
-		return MTRR_TYPE_WRBACK;
+	return kvm->arch.has_mtrr ? MTRR_TYPE_UNCACHABLE : MTRR_TYPE_WRBACK;
 }

 /*
@@ -310,6 +307,12 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
 	gfn_t start, end;
 	int index;

+	/* MTRR state is consistent between all the processors in the system,
+	 * so just update the TDP according to the MTRR settings in vcpu0.
+	 */
+	if (vcpu->vcpu_id)
+		return;
+
 	if (msr == MSR_IA32_CR_PAT || !tdp_enabled ||
 	    !kvm_arch_has_noncoherent_dma(vcpu->kvm))
 		return;
@@ -635,10 +638,11 @@ static void mtrr_lookup_next(struct mtrr_iter *iter)
 	for (mtrr_lookup_init(_iter_, _mtrr_, _gpa_start_, _gpa_end_); \
 	     mtrr_lookup_okay(_iter_); mtrr_lookup_next(_iter_))

-u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
+u8 kvm_mtrr_get_guest_memory_type(struct kvm *kvm, gfn_t gfn)
 {
-	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
+	struct kvm_mtrr *mtrr_state;
 	struct mtrr_iter iter;
+	int srcu_idx;
 	u64 start, end;
 	int type = -1;
 	const int wt_wb_mask = (1 << MTRR_TYPE_WRBACK)
@@ -647,6 +651,16 @@ u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
 	start = gfn_to_gpa(gfn);
 	end = start + PAGE_SIZE;

+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	mtrr_state = srcu_dereference(kvm->arch.mtrr_state, &kvm->srcu);
+	/* kvm mtrr_state points to the mtrr_state of vcpu0;
+	 * it should not be NULL here unless vcpu0 has been destroyed.
+	 */
+	if (WARN_ON(!mtrr_state)) {
+		type = mtrr_disabled_type(kvm);
+		goto out;
+	}
+
 	mtrr_for_each_mem_type(&iter, mtrr_state, start, end) {
 		int curr_type = iter.mem_type;

@@ -694,12 +708,16 @@ u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
 			return MTRR_TYPE_WRBACK;
 	}

-	if (iter.mtrr_disabled)
-		return mtrr_disabled_type(vcpu);
+	if (iter.mtrr_disabled) {
+		type = mtrr_disabled_type(kvm);
+		goto out;
+	}

 	/* not contained in any MTRRs. */
-	if (type == -1)
-		return mtrr_default_type(mtrr_state);
+	if (type == -1) {
+		type = mtrr_default_type(mtrr_state);
+		goto out;
+	}

 	/*
 	 * We just check one page, partially covered by MTRRs is
@@ -707,38 +725,64 @@
 	 */
 	WARN_ON(iter.partial_map);

+out:
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
 	return type;
 }
 EXPORT_SYMBOL_GPL(kvm_mtrr_get_guest_memory_type);

-bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
+bool kvm_mtrr_check_gfn_range_consistency(struct kvm *kvm, gfn_t gfn,
 					  int page_num)
 {
-	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
+	struct kvm_mtrr *mtrr_state;
 	struct mtrr_iter iter;
+	int srcu_idx;
 	u64 start, end;
 	int type = -1;
+	int ret;

 	start = gfn_to_gpa(gfn);
 	end = gfn_to_gpa(gfn + page_num);
+
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	mtrr_state = srcu_dereference(kvm->arch.mtrr_state, &kvm->srcu);
+	/* kvm mtrr_state points to the mtrr_state of vcpu0;
+	 * it should not be NULL here unless vcpu0 has been destroyed.
+	 */
+	if (WARN_ON(!mtrr_state)) {
+		ret = true;
+		goto out;
+	}
+
 	mtrr_for_each_mem_type(&iter, mtrr_state, start, end) {
 		if (type == -1) {
 			type = iter.mem_type;
 			continue;
 		}

-		if (type != iter.mem_type)
-			return false;
+		if (type != iter.mem_type) {
+			ret = false;
+			goto out;
+		}
 	}

-	if (iter.mtrr_disabled)
-		return true;
+	if (iter.mtrr_disabled) {
+		ret = true;
+		goto out;
+	}

-	if (!iter.partial_map)
-		return true;
+	if (!iter.partial_map) {
+		ret = true;
+		goto out;
+	}

-	if (type == -1)
-		return true;
+	if (type == -1) {
+		ret = true;
+		goto out;
+	}

-	return type == mtrr_default_type(mtrr_state);
+	ret = (type == mtrr_default_type(mtrr_state));
+out:
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	return ret;
 }

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..2ae9d5f3da99 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7540,7 +7540,7 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 		return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
 	}

-	return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
+	return kvm_mtrr_get_guest_memory_type(vcpu->kvm, gfn) << VMX_EPT_MT_EPTE_SHIFT;
 }

 static void vmcs_set_secondary_exec_control(struct vcpu_vmx *vmx, u32 new_ctl)

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d0a7e50de739..a7acfeacbc04 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -310,11 +310,11 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
 void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu);
 void kvm_mtrr_init(struct kvm_vcpu *vcpu);
 void kvm_mtrr_destroy(struct kvm_vcpu *vcpu);
-u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
+u8 kvm_mtrr_get_guest_memory_type(struct kvm *kvm, gfn_t gfn);
 bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
-bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
+bool kvm_mtrr_check_gfn_range_consistency(struct kvm *kvm, gfn_t gfn,
 					  int page_num);
 bool kvm_vector_hashing_enabled(void);
 void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);